Feat/mlp #1

Merged
jager47X merged 2 commits into main from feat/mlp
Mar 16, 2026

@jager47X (Owner)

No description provided.

jager47X and others added 2 commits March 16, 2026 21:09
…on dataset

MLP reranker (128-64-32 MLP, isotonic calibration) replaces LLM verification
for borderline candidates. Trained on 3,600 labeled query-document pairs across
4 legal domains. Achieves MRR 0.933 vs. 0.665 for semantic-only search (+40%
relative) with no LLM calls at inference time, which suggests that a learned
reranker can stand in for LLM reranking on this dataset.
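
For readers unfamiliar with the metric, here is a minimal sketch of how MRR and the relative gain quoted above can be computed. The function names and the toy rankings are hypothetical and are not taken from benchmarks/run_baseline.py; only the two MRR figures come from the commit message.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranking, relevant) for ranking, relevant in runs) / len(runs)


# Toy rankings (hypothetical, not drawn from eval_dataset.json).
runs = [
    (["d1", "d2", "d3"], {"d1"}),  # first hit at rank 1 -> 1.0
    (["d4", "d5", "d6"], {"d5"}),  # first hit at rank 2 -> 0.5
    (["d7", "d8", "d9"], {"d0"}),  # no hit             -> 0.0
]
mrr = mean_reciprocal_rank(runs)

# Relative improvement implied by the figures in the commit message.
relative_gain = (0.933 - 0.665) / 0.665
```

On the toy data the mean is (1.0 + 0.5 + 0.0) / 3 = 0.5, and the reported figures give a relative gain of about 0.40, which matches the "+40%" in the message.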

New files:
- rag_dependencies/feature_extractor.py: 15-feature vector extraction (75 tests)
- benchmarks/eval_dataset.json: 180 labeled queries across 4 domains
- benchmarks/eval_dataset_schema.py: Pydantic validation for dataset
- benchmarks/generate_eval_dataset.py: Dataset validate/expand/stats tool
- benchmarks/train_reranker.py: Training pipeline (3 models, cross-val, calibration)
- benchmarks/run_baseline.py: Baseline measurement (P@k, MRR, latency, cost)
- benchmarks/run_ablation_full.py: 7-strategy ablation study
- benchmarks/cost_comparison.py: Before/after cost analysis
- benchmarks/retrain_monthly.py: Automated monthly retraining from MongoDB
- benchmarks/generate_graphs.py: Cost/latency/MRR comparison graphs
- media/cost_latency_comparison.png: Visual comparison (ARF vs MongoDB)
- models/.gitkeep: Model directory
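
The calibration step listed for train_reranker.py is not shown in this PR page. As an illustration only, isotonic calibration can be sketched in pure Python with the pool-adjacent-violators algorithm. The actual pipeline may well use a library implementation instead, and `fit_isotonic`, `calibrate`, and the toy scores below are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing step function from raw scores to label frequencies.

    Returns (uppers, values): block i covers scores up to uppers[i] and maps
    them to values[i]. Tied scores are not treated specially in this sketch.
    """
    blocks: List[list] = []  # each block: [label sum, count, largest score in block]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(score: float, uppers: List[float], values: List[float]) -> float:
    """Look up the calibrated value for a raw score, clipping above the last block."""
    idx = bisect_left(uppers, score)
    return values[min(idx, len(values) - 1)]


# Toy raw scores with binary relevance labels (hypothetical data).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
relevance = [0, 1, 0, 1, 1, 1]
uppers, values = fit_isotonic(raw_scores, relevance)
calibrated = calibrate(0.25, uppers, values)
```

In the toy data the out-of-order pair at scores 0.2 and 0.3 is pooled into one block with value 0.5, so the fitted mapping is monotone even though the raw labels are not.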

Modified files:
- rag_dependencies/query_processor.py: MLP integration in _apply_main_abc_gates
- config.py: MLP config keys (use_mlp_reranker, uncertainty thresholds)
- config_schema.py: Optional MLP Pydantic fields
- README.md: Summary, ablation results, MLP architecture, graphs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ency

- query_processor.py: Wire MLP into _apply_main_abc_gates with graceful fallback
- config.py: Add use_mlp_reranker, mlp_uncertainty_low/high to all 4 domains
- config_schema.py: Add optional MLP Pydantic fields
- train_reranker.py: Use FeatureExtractor (15-dim) instead of hand-crafted 22-dim
  features, fix eval_dataset.json format support, add prefix title matching
- README.md: Add summary, ablation table, MLP architecture, comparison graphs
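
The diff for _apply_main_abc_gates is not visible here, so the following is only a sketch of how an uncertainty band with a graceful fallback could behave. It assumes that scores at or above mlp_uncertainty_high are accepted, scores at or below mlp_uncertainty_low are rejected, and everything else (including a disabled or failing model) defers to a fallback verifier. The function `gate_candidate`, the threshold values, and the stub callables are hypothetical; only the three config key names come from the commit message.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical per-domain config; the key names follow the commit message,
# the threshold values are illustrative only.
CONFIG = {
    "use_mlp_reranker": True,
    "mlp_uncertainty_low": 0.3,
    "mlp_uncertainty_high": 0.7,
}


def gate_candidate(
    features: Sequence[float],
    config: dict,
    mlp_score: Optional[Callable[[Sequence[float]], float]],
    fallback_verify: Callable[[Sequence[float]], bool],
) -> Tuple[bool, str]:
    """Return (accepted, source), where source is "mlp" or "fallback"."""
    # Disabled or missing model: defer to the fallback verifier.
    if not config.get("use_mlp_reranker", False) or mlp_score is None:
        return bool(fallback_verify(features)), "fallback"
    try:
        probability = float(mlp_score(features))
    except Exception:
        # Graceful fallback: a scoring failure must not break retrieval.
        return bool(fallback_verify(features)), "fallback"
    if probability >= config["mlp_uncertainty_high"]:
        return True, "mlp"
    if probability <= config["mlp_uncertainty_low"]:
        return False, "mlp"
    # Score falls inside the uncertainty band: defer to the fallback verifier.
    return bool(fallback_verify(features)), "fallback"


def broken_model(features: Sequence[float]) -> float:
    """Stub that simulates a model that fails to load or score."""
    raise RuntimeError("model file missing")


features = [0.0] * 15  # 15-dim vector, matching the FeatureExtractor size
confident = gate_candidate(features, CONFIG, lambda f: 0.9, lambda f: False)
uncertain = gate_candidate(features, CONFIG, lambda f: 0.5, lambda f: True)
failed = gate_candidate(features, CONFIG, broken_model, lambda f: True)
```

Returning the decision source alongside the verdict makes it easy to log how often the fallback path is taken, which is one way to check whether the thresholds are set sensibly.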

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jager47X merged commit d2b6f5b into main on Mar 16, 2026
1 of 3 checks passed