This workspace implements a production-oriented starter for a hybrid Commercial Due Diligence (CDD) RL environment using Prime Intellect patterns:

- Environment package under `environments/cdd_hybrid/`
- Data pipeline under `scripts/`
- Core validation and metrics under `src/cdd_prime/`
- Reproducible configs under `configs/`
- Tests under `tests/`
This project provides a practical foundation for training and evaluating LLM agents on CDD-style reasoning with strict anti-leakage controls and measurable decision quality. It is designed for Prime Intellect workflows (verifiers, prime eval, and prime-rl) while staying runnable locally.
Train/evaluate models to perform CDD-like reasoning with two reward families:
- Process quality: workstream coverage, evidence usage, internal consistency, output format quality.
- Outcome quality: calibrated probability prediction and decision utility against realized outcomes.
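To make the two reward families concrete, here is a minimal sketch of how they might combine. The field names, weights, and the Brier-based outcome term are illustrative assumptions, not the environment's actual reward code:

```python
def brier_outcome_reward(p_success: float, realized: int) -> float:
    """Outcome quality: 1 minus the Brier score of the predicted
    success probability against the realized 0/1 outcome."""
    return 1.0 - (p_success - realized) ** 2

def hybrid_reward(process_scores: dict, p_success: float, realized: int,
                  w_process: float = 0.5) -> float:
    """Blend process-quality rubric scores (each in [0, 1]) with
    outcome quality; the 50/50 weighting is a placeholder, not the
    repo's default."""
    process = sum(process_scores.values()) / len(process_scores)
    outcome = brier_outcome_reward(p_success, realized)
    return w_process * process + (1.0 - w_process) * outcome
```

Because the outcome term is a proper scoring rule, a policy cannot improve it by hedging all predictions toward 0.5; calibration is rewarded directly.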
- Validate toolchain and Prime OpenAPI contracts (`scripts/check_toolchain.py`, `scripts/check_prime_openapi_contract.py`).
- Build a historical deal universe (`scripts/expand_deals_from_wikipedia.py`, `scripts/merge_deal_sources.py`).
- Enrich with realized outcomes using market data (`scripts/build_outcomes.py`).
- Ingest richer text evidence snippets (`scripts/enrich_text_evidence.py`).
- Build pre-deal-only packets and prompts (`scripts/build_packets.py`).
- Split and validate (`scripts/split_dataset.py`, `scripts/validate_dataset.py`).
- Run the baseline and evaluate (`scripts/run_heuristic_baseline.py`, `scripts/evaluate_predictions.py`, `scripts/evaluate_group_policy.py`).
- Run memorization probes and judge scoring (`scripts/run_memorization_probe.py`, `scripts/run_model_judge.py`).
- Use `environments/cdd_hybrid/` with `prime env install` + `prime eval run`.
Smoke pipeline:

```
set -a; source .env.local; set +a
./scripts/smoke_pipeline.sh
```

Full pipeline:

```
set -a; source .env.local; set +a
./scripts/full_pipeline.sh
```

This runs:
- Toolchain lock checks
- Prime OpenAPI contract checks
- Wikipedia expansion + source merge
- Outcome enrichment + rich text evidence ingestion + packetization
- Split + leakage validation + baseline metrics + group metrics + regression gate + tests
Pinned versions are in `toolchain.lock.toml`. Validate:

```
python3 scripts/check_toolchain.py
```

- Environment source: `environments/cdd_hybrid/cdd_hybrid.py`
- Environment package metadata: `environments/cdd_hybrid/pyproject.toml`
- Eval config template: `configs/eval/cdd_hybrid.toml`
- Prime-RL config template: `configs/prime-rl/cdd_hybrid.toml`
On the currently installed stack (prime 0.4.x + verifiers 0.1.5), local evaluation runs through `vf-eval`:

```
PRIME_API_KEY=... vf-eval cdd_hybrid \
  -m qwen/qwen3-235b-a22b-instruct-2507 \
  -b https://api.pinference.ai/api/v1 \
  -k PRIME_API_KEY \
  -n 2 -r 1 \
  -a '{"dataset_path":"data/processed/train.jsonl","eval_dataset_path":"data/processed/test.jsonl"}' \
  -s
```

Small-model matrix benchmark:

```
set -a; source .env.local; set +a
python3 scripts/run_model_matrix.py --dataset data/processed/test.jsonl --limit 24
```

Outputs:

- `data/interim/model_matrix/benchmark_report.md`
- `data/interim/model_matrix/benchmark_summary.json`
- Tracked copies: `reports/benchmark_report.md`, `reports/benchmark_summary.json`
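The JSON summary can be post-processed directly. A small sketch that ranks models by a metric (the per-model schema assumed here, model name mapped to a metrics dict, may differ from the actual `benchmark_summary.json`):

```python
import json
from pathlib import Path

def rank_models(summary: dict, metric: str = "mean_reward") -> list[tuple[str, float]]:
    """Sort models by a summary metric, best first. Assumes the summary
    maps model name -> {metric: value}."""
    return sorted(
        ((model, scores[metric]) for model, scores in summary.items()),
        key=lambda item: item[1],
        reverse=True,
    )

# Typical use against the benchmark output above:
# summary = json.loads(Path("data/interim/model_matrix/benchmark_summary.json").read_text())
# for model, score in rank_models(summary):
#     print(f"{model}\t{score:.3f}")
```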
Optional multi-sample pass@k run:

```
set -a; source .env.local; set +a
python3 scripts/run_model_matrix.py \
  --dataset data/processed/test.jsonl \
  --models qwen/qwen3-8b \
  --limit 12 \
  --samples-per-deal 3 \
  --temperature 0.1 \
  --out-dir data/interim/model_matrix_passk
```

Three-seed short optimization experiments:

```
set -a; source .env.local; set +a
python3 scripts/run_seed_optimization.py \
  --dataset data/processed/train.jsonl \
  --eval-dataset data/processed/test.jsonl
```

Outputs:

- `data/interim/seed_optimization/seed_results.json`
- `data/interim/seed_optimization/seed_summary.json`
- Tracked copy: `reports/seed_summary.json`
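For the multi-sample run above, pass@k-style metrics are usually computed with the standard unbiased estimator; this is a sketch, and the repo's scripts may define success per deal differently:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of
    k samples drawn without replacement from n completions, of which
    c are correct, is correct."""
    if n - c < k:
        # Every size-k draw must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With `--samples-per-deal 3`, this gives `n = 3` per deal; averaging `pass_at_k` over deals yields the group-level metric.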
Blinded process-quality judge (dry-run heuristic or online model judge):

```
python3 scripts/run_model_judge.py \
  --dataset data/processed/test.jsonl \
  --predictions data/interim/model_matrix/qwen-qwen3-8b.jsonl \
  --dry-run \
  --output data/interim/model_judge_results.jsonl \
  --summary-output reports/model_judge_summary.json
```

Group metrics from multi-sample predictions:

```
python3 scripts/evaluate_group_policy.py \
  --dataset data/processed/test.jsonl \
  --predictions data/interim/heuristic_predictions_test.jsonl \
  --k-values 1 \
  --output data/interim/group_metrics.json
cp data/interim/group_metrics.json reports/group_metrics.json
```

- The current dataset is expanded from public acquisition-list pages plus a curated seed set.
- Time-split evaluation and leakage checks are enforced to reduce label contamination risk.
- Memorization probing is included and can be run in `--dry-run` or online mode.
- Judge rubric scoring is blinded to realized outcomes by design.
- Public-source extraction quality depends on citation availability and page structure.
- For production, enrich evidence from issuer filings/transcripts and internal data room documents.
- Reward design is modular; adjust weights and thresholds for your IC loss function.
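As one way to plug a custom IC loss function into the modular reward, a decision rule can be derived from a calibrated probability and a payoff matrix. The utilities below are placeholders, not values from this repo:

```python
def decide(p_success: float,
           u_proceed_win: float = 1.0,
           u_proceed_lose: float = -2.0,
           u_pass: float = 0.0) -> str:
    """Proceed only when the expected utility of proceeding beats
    passing; the payoffs encode the IC's asymmetric loss and are
    illustrative placeholders."""
    eu_proceed = p_success * u_proceed_win + (1 - p_success) * u_proceed_lose
    return "proceed" if eu_proceed > u_pass else "pass"
```

With these placeholder payoffs the breakeven probability is 2/3, reflecting that a failed deal is penalized twice as heavily as a successful one is rewarded; decision utility can then score the agent's choice against the realized outcome.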