Experiments with DSPy and GEPA optimization for structured information extraction. TL;DR: automatic prompt optimization yields +22pp exact match accuracy over vanilla OpenAI API calls (32.07% to 54.43%).

| Method | Exact Match | Mean Field |
|---|---|---|
| OpenAI Baseline | 32.07% | 84.34% |
| DSPy Baseline | 39.79% | 87.06% |
| DSPy + BAML | 42.74% | 87.86% |
| DSPy + GEPA | 53.84% | 91.64% |
| DSPy + BAML + GEPA | 54.43% | 91.62% |

The task is financial NER extraction from Cleanlab's structured output benchmark: 2,117 samples with 7 entity types (Company, Date, Location, Money, Person, Product, Quantity).
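
The two metric columns in the table are not defined in this README. The sketch below shows one plausible reading, assuming "Exact Match" means all seven fields of a sample are correct and "Mean Field" is the average per-field accuracy. The `normalize` and `score` helpers, the case-insensitive comparison, and the toy records are all illustrative assumptions, not the benchmark's actual scoring code.

```python
from typing import Dict, List

# The seven entity types listed for the benchmark.
FIELDS = ["Company", "Date", "Location", "Money", "Person", "Product", "Quantity"]

Record = Dict[str, List[str]]


def normalize(values: List[str]) -> List[str]:
    """Assumed normalization: ignore case, surrounding whitespace, and order."""
    return sorted(v.strip().lower() for v in values)


def score(preds: List[Record], golds: List[Record]) -> Dict[str, float]:
    """Return exact-match and mean-field accuracy as fractions in [0, 1]."""
    exact_hits = 0
    field_hits = 0
    for pred, gold in zip(preds, golds):
        # A field counts as correct when its normalized value lists are equal.
        matches = [
            normalize(pred.get(f, [])) == normalize(gold.get(f, []))
            for f in FIELDS
        ]
        exact_hits += all(matches)
        field_hits += sum(matches)
    n = len(golds)
    return {
        "exact_match": exact_hits / n,
        "mean_field": field_hits / (n * len(FIELDS)),
    }


# Toy data: the first prediction is fully correct, the second misses Date.
golds = [
    {"Company": ["Acme Corp"], "Money": ["$5 million"]},
    {"Company": ["Globex"], "Date": ["Q3 2021"]},
]
preds = [
    {"Company": ["acme corp"], "Money": ["$5 million"]},
    {"Company": ["Globex"], "Date": ["2021"]},
]
result = score(preds, golds)
```

Under this reading a single wrong field zeroes out a sample's exact-match credit while costing only one seventh of its mean-field credit, which would explain why exact match sits so far below mean field in every row.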

- `get_responses.ipynb` - Original OpenAI baseline
- `get_responses_dspy.ipynb` - DSPy + GEPA experiments
- `generate_comparison_charts.py` - Regenerate comparison charts
- `schema.py` - Pydantic schema for extracted entities
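
The contents of `schema.py` are not shown here. As a rough idea of what a Pydantic schema for the seven entity types might look like, here is a minimal sketch. The class name `FinancialEntities`, the lowercase field names, and the choice of one string list per type are assumptions for illustration and may differ from the actual file.

```python
from typing import List

from pydantic import BaseModel, Field


class FinancialEntities(BaseModel):
    """Hypothetical container: one list of extracted strings per entity type."""

    company: List[str] = Field(default_factory=list)
    date: List[str] = Field(default_factory=list)
    location: List[str] = Field(default_factory=list)
    money: List[str] = Field(default_factory=list)
    person: List[str] = Field(default_factory=list)
    product: List[str] = Field(default_factory=list)
    quantity: List[str] = Field(default_factory=list)


# Entity types absent from the text simply default to empty lists.
example = FinancialEntities(company=["Acme Corp"], money=["$5 million"])
```

Defaulting every field to an empty list keeps "no entity found" distinct from a missing key, which makes field-by-field comparison against gold labels simpler.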

Install the dependencies:

`pip install dspy pandas numpy matplotlib scikit-learn`

Run the DSPy notebook:

`jupyter notebook get_responses_dspy.ipynb`

- DSPy alone doesn't always beat hand-crafted prompts
- GEPA optimization is where the real gains are (~14pp over DSPy baseline)
- Cost-effective: ~$2-3 in API costs for permanent accuracy improvements
- Biggest gains in Product (+12pp) and Date (+9pp) extraction
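
The headline percentage-point figures can be checked directly against the exact-match column of the results table. The snippet below is just that arithmetic; the `pp_gain` helper is a name introduced here for illustration.

```python
# Exact-match accuracy (%) copied from the results table.
exact_match = {
    "OpenAI Baseline": 32.07,
    "DSPy Baseline": 39.79,
    "DSPy + BAML": 42.74,
    "DSPy + GEPA": 53.84,
    "DSPy + BAML + GEPA": 54.43,
}


def pp_gain(method: str, reference: str) -> float:
    """Percentage-point difference in exact match between two methods."""
    return round(exact_match[method] - exact_match[reference], 2)


gepa_over_dspy = pp_gain("DSPy + GEPA", "DSPy Baseline")          # the "~14pp" figure
best_over_openai = pp_gain("DSPy + BAML + GEPA", "OpenAI Baseline")  # the "+22pp" figure
```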

Full writeup: Optimizing Structured Output with DSPy and GEPA

