
Using a dataset from Cleanlab.ai, evaluate improvements from using DSPy + GEPA optimization.


kmad/dspy-optimizer-experiment


Financial Entity Extraction with DSPy + GEPA

Experimenting with DSPy and GEPA optimization for structured information extraction. TL;DR: automatic prompt optimization raises exact-match accuracy by about 22 percentage points over vanilla OpenAI API calls (32.07% to 54.43%).

Results

Method               Exact Match   Mean Field
OpenAI Baseline      32.07%        84.34%
DSPy Baseline        39.79%        87.06%
DSPy + BAML          42.74%        87.86%
DSPy + GEPA          53.84%        91.64%
DSPy + BAML + GEPA   54.43%        91.62%
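
The percentage-point gains quoted in this README can be rechecked from the table directly. The snippet below is only a small arithmetic sketch; the `RESULTS` dictionary and the `gain_pp` helper are illustrative names, not part of the repository.

```python
# Scores copied from the results table: (exact match %, mean field %).
RESULTS = {
    "OpenAI Baseline": (32.07, 84.34),
    "DSPy Baseline": (39.79, 87.06),
    "DSPy + BAML": (42.74, 87.86),
    "DSPy + GEPA": (53.84, 91.64),
    "DSPy + BAML + GEPA": (54.43, 91.62),
}


def gain_pp(method: str, baseline: str, column: int = 0) -> float:
    """Percentage-point difference between two methods.

    column 0 is exact match, column 1 is mean field.
    """
    return round(RESULTS[method][column] - RESULTS[baseline][column], 2)


# Best configuration vs. the vanilla OpenAI baseline (exact match).
print(gain_pp("DSPy + BAML + GEPA", "OpenAI Baseline"))  # 22.36
# GEPA vs. the unoptimized DSPy baseline (exact match).
print(gain_pp("DSPy + GEPA", "DSPy Baseline"))  # 14.05
```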

[Figure: Overall comparison]

[Figure: Per-field comparison]

Dataset

Financial NER extraction from Cleanlab's structured output benchmark - 2,117 samples with 7 entity types: Company, Date, Location, Money, Person, Product, Quantity.
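
This README does not spell out how the two reported metrics are computed. One plausible reading, sketched below, is that "Exact Match" counts a sample as correct only when all seven fields match, while "Mean Field" averages per-field correctness over all samples. The case-insensitive set comparison and the names `normalize`, `field_matches`, and `score` are assumptions made for illustration, not the repository's actual scoring code.

```python
FIELDS = ["Company", "Date", "Location", "Money", "Person", "Product", "Quantity"]


def normalize(values):
    """Treat entity lists as case-insensitive, order-insensitive sets (an assumption)."""
    return {v.strip().lower() for v in values}


def field_matches(gold, pred):
    """Return {field: bool} telling whether each field was extracted correctly."""
    return {
        f: normalize(gold.get(f, [])) == normalize(pred.get(f, []))
        for f in FIELDS
    }


def score(golds, preds):
    """Return (exact_match, mean_field) as fractions in [0, 1]."""
    per_sample = [field_matches(g, p) for g, p in zip(golds, preds)]
    n = len(per_sample)
    # A sample is an exact match only if every one of the seven fields is right.
    exact = sum(all(m.values()) for m in per_sample) / n
    # Mean field accuracy averages correctness over every (sample, field) pair.
    mean_field = sum(sum(m.values()) for m in per_sample) / (n * len(FIELDS))
    return exact, mean_field


golds = [
    {"Company": ["Acme Corp"], "Money": ["$5 million"]},
    {"Person": ["Jane Doe"], "Date": ["2021"]},
]
preds = [
    {"Company": ["acme corp"], "Money": ["$5 million"]},
    {"Person": ["Jane Doe"], "Date": ["2020"]},  # Date is wrong
]
exact, mean_field = score(golds, preds)
print(exact, round(mean_field, 4))  # 0.5 0.9286
```

Under this reading, one wrong field costs the whole sample in exact match but only 1/7 of it in mean field, which would explain why the two columns in the results table sit so far apart.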

Files

  • get_responses.ipynb - Original OpenAI baseline
  • get_responses_dspy.ipynb - DSPy + GEPA experiments
  • generate_comparison_charts.py - Regenerate comparison charts
  • schema.py - Pydantic schema for extracted entities
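
For orientation, a Pydantic schema covering the seven entity types might look roughly like the sketch below. The contents of schema.py are not shown in this README, so the class name `FinancialEntities`, the field names, and the choice of one list of strings per entity type are all assumptions.

```python
from pydantic import BaseModel, Field


class FinancialEntities(BaseModel):
    """Hypothetical schema: one list of extracted strings per entity type."""

    Company: list[str] = Field(default_factory=list)
    Date: list[str] = Field(default_factory=list)
    Location: list[str] = Field(default_factory=list)
    Money: list[str] = Field(default_factory=list)
    Person: list[str] = Field(default_factory=list)
    Product: list[str] = Field(default_factory=list)
    Quantity: list[str] = Field(default_factory=list)


# Fields that are not mentioned in a text simply stay empty.
entities = FinancialEntities(Company=["Acme Corp"], Money=["$5 million"])
print(entities.model_dump())
```

A schema like this can serve both as the structured-output target for the model and as the validation step for its responses.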

Quick Start

# Install dependencies (the notebook package provides the `jupyter notebook` command)
pip install dspy pandas numpy matplotlib scikit-learn notebook

# Run the DSPy notebook
jupyter notebook get_responses_dspy.ipynb

Key Takeaways

  • DSPy alone doesn't always beat hand-crafted prompts
  • GEPA optimization is where the real gains are (~14pp over DSPy baseline)
  • Cost-effective: the optimization run costs ~$2-3 in API calls once, and the optimized prompt can be reused afterward
  • Biggest gains in Product (+12pp) and Date (+9pp) extraction

Blog Post

Full writeup: Optimizing Structured Output with DSPy and GEPA
