19 commits
6412a5d
feat(gepa): add tool description optimization for multi-agent systems
Ju-usc Oct 10, 2025
cf0be4f
style: fix ruff formatting (trailing whitespace)
Ju-usc Oct 10, 2025
aa53fe2
style: apply ruff formatting fixes
Ju-usc Oct 10, 2025
045c6cf
feat(gepa): implement tool-specific proposer for tool descriptions
Ju-usc Oct 10, 2025
c4f2041
docs(gepa): clean up multi-agent example code
Ju-usc Oct 10, 2025
260ca80
refactor(gepa): simplify tool reflective dataset with ReAct context r…
Ju-usc Oct 11, 2025
04f7e3d
fix(gepa): unify custom proposer routing for tools
Ju-usc Oct 12, 2025
f92e184
docs(gepa): clarify tool reflection prompt
Ju-usc Oct 12, 2025
7178869
test: streamline GEPA tool optimization tests
Ju-usc Oct 12, 2025
e34703b
fix(gepa): streamline tool proposer formatting
Ju-usc Oct 12, 2025
3f05311
test(gepa): drop legacy dummy tool fixture
Ju-usc Oct 12, 2025
4df9ce5
docs(gepa): add tool-specific reflection prompt and metric example
Ju-usc Oct 12, 2025
4296ccf
docs(gepa): fix implementation details with accurate code flow
Ju-usc Oct 13, 2025
ea1204a
docs(gepa): remove backward compatibility note
Ju-usc Oct 13, 2025
48d5cd6
docs(gepa): improve usage examples with optimization visualization
Ju-usc Oct 13, 2025
548d9b6
docs(gepa): add design rationale comments for tool context sharing
Ju-usc Oct 13, 2025
e61d0a1
docs(gepa): add tool optimization links to overview and parameter docs
Ju-usc Oct 13, 2025
5c95412
docs(gepa): refine tool optimization scenarios and remove implementat…
Ju-usc Oct 13, 2025
19d7717
docs(gepa): clarify future work section in code comments
Ju-usc Oct 13, 2025
298 changes: 298 additions & 0 deletions docs/docs/api/optimizers/GEPA/GEPA_Advanced.md
@@ -443,3 +443,301 @@ gepa = dspy.GEPA(
auto="medium"
)
```

## Tool Description Optimization

### What is optimize_tool_descriptions?

The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.

Unlike signature instructions that guide reasoning strategies, tool descriptions serve a different purpose: they help agents decide **which tool to use** in a given situation. GEPA applies a specialized reflection prompt tailored for tool selection decisions.

### Tool-Specific Reflection Prompt

GEPA uses a dedicated prompt for optimizing tool descriptions. The prompt receives the complete ReAct trajectory (all thoughts, actions, observations) from executions that used the tool being optimized:

```python
class GenerateImprovedToolDescriptionFromFeedback(dspy.Signature):
    """You are refining a tool description that the assistant currently uses.

    Review the current description along with examples of the assistant's tool decisions
    and the feedback those decisions received.

    Read them together and refine the description.
    So the agent understands when this tool actually helps, what argument or result matters,
    and what misuse the feedback exposed. Keep the tool's voice and only change what the
    evidence justifies.

    Return a refined description that helps the assistant quickly recognize good
    opportunities for the tool."""

    current_tool_description = dspy.InputField(desc="The current description of the tool")
    examples_with_feedback = dspy.InputField(
        desc="Examples showing tool usage decisions and feedback on correctness"
    )

    improved_tool_description = dspy.OutputField(
        desc="An improved description that guides correct tool selection and usage"
    )
```

The `examples_with_feedback` contains full ReAct trajectories showing the complete context in which each tool was selected and used, enabling the reflection LM to understand tool selection patterns.

**Example: Writing Tool-Aware Metrics**

To provide effective feedback for tool optimization, write metrics that examine the trajectory:

```python
def tool_feedback_metric(example, prediction, trace=None, pred_name=None, pred_trace=None):
    """Metric that provides tool-specific feedback for GEPA optimization."""
    correct = prediction.answer == example.answer
    score = 1.0 if correct else 0.0

    # Generate tool-specific feedback if available
    if hasattr(prediction, 'trajectory'):
        tools_used = [
            prediction.trajectory[key]
            for key in prediction.trajectory
            if key.startswith('tool_name_') and prediction.trajectory[key] != 'finish'
        ]
        feedback = f"{'Correct' if correct else 'Wrong'}. Tools: {', '.join(tools_used)}"
    else:
        feedback = "Correct" if correct else "Wrong"

    return dspy.Prediction(score=score, feedback=feedback)
```

This produces feedback like:
```
[Tool 'calculator' from 'agent'] Correct. Tools: calculator
[Tool 'search' from 'agent'] Wrong. Tools: search, calculator
```

The tool-specific prefix `[Tool 'calculator' from 'agent']` is automatically added by GEPA to focus the reflection LM on optimizing that particular tool's description.
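
As a rough illustration of the feedback strings above, the sketch below extracts tool names from a plain dictionary shaped like a ReAct trajectory and then applies the prefix. The `tools_used` helper and the trajectory contents are hypothetical; only the `tool_name_<i>` key pattern and the prefix format come from the text above, and in a real run GEPA adds the prefix, not your metric.

```python
def tools_used(trajectory: dict) -> list[str]:
    """Return called tool names in step order, skipping the terminal 'finish' step."""
    steps = sorted(
        (int(key.rsplit("_", 1)[1]), name)
        for key, name in trajectory.items()
        if key.startswith("tool_name_")
    )
    return [name for _, name in steps if name != "finish"]


# Hypothetical trajectory; keys follow the tool_name_<i> pattern the metric relies on.
trajectory = {
    "thought_0": "I need the population first.",
    "tool_name_0": "search",
    "tool_args_0": {"query": "population of France"},
    "observation_0": "Search results for: population of France",
    "thought_1": "Now divide by the area.",
    "tool_name_1": "calculator",
    "tool_args_1": {"expression": "68000000 / 551695"},
    "observation_1": "123.26",
    "tool_name_2": "finish",
}

# What a metric like the one above would return for a wrong answer.
metric_feedback = f"Wrong. Tools: {', '.join(tools_used(trajectory))}"

# Prefix format copied from the example output; shown here only to make it concrete.
annotated = f"[Tool 'search' from 'agent'] {metric_feedback}"
print(annotated)
```

Sorting by step index keeps the tool order stable even if the dictionary was not built in execution order.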

**Note:** Tool descriptions are treated as components in GEPA's optimization process. The `component_selector` parameter applies to both signature instructions and tool descriptions. For example, `component_selector="all"` optimizes all signatures and tools together, while `component_selector="round_robin"` cycles through them one at a time.
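
The difference between the two selector modes can be pictured with a small stand-alone sketch. This is not GEPA's selector interface: the generator functions and the component names below are assumptions made for illustration, with tool components using the `tool:` prefix.

```python
from itertools import cycle, islice

# Hypothetical component names: two signature instructions and two tools.
components = ["react", "extract", "tool:search", "tool:calculator"]


def select_all(names):
    """Every iteration proposes updates for all components together."""
    while True:
        yield list(names)


def select_round_robin(names):
    """Each iteration proposes an update for one component, cycling in order."""
    for name in cycle(names):
        yield [name]


all_rounds = list(islice(select_all(components), 2))
rr_rounds = list(islice(select_round_robin(components), 5))

print(all_rounds[0])  # all four components in one iteration
print(rr_rounds)      # one component per iteration, wrapping around
```

With `"all"`, a single reflection step can change a signature instruction and a tool description at once; with `"round_robin"`, each change is isolated, which makes it easier to attribute score changes to one component.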

### Default Behavior

By default, GEPA only optimizes signature instructions (`optimize_tool_descriptions=False`):

```python
# Default behavior: only signature optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    # optimize_tool_descriptions=False  # This is the default
    auto="medium"
)
optimized_program = gepa.compile(student, trainset=examples)
```

### How It Works

When enabled, GEPA:

1. **Discovers all tools**: Traverses your program including nested sub-modules to find all `dspy.Tool` instances
2. **Categorizes components**: Separates tools (identified by `tool:` prefix) from signature instructions
3. **Routes components appropriately**:
    - Signature instructions → Default or custom instruction proposer
    - Tool descriptions → ToolProposer (receives ReAct's reflective data with tool-specific annotation)
4. **Optimizes holistically**: Treats tool descriptions as first-class components in the optimization process
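
The discovery step can be sketched without DSPy installed. `StubTool` and `StubModule` below are hypothetical stand-ins for `dspy.Tool` and a module with a `tools` dictionary; the only details taken from this PR are the `tool:` key prefix and the rule that the first description seen for a tool name wins.

```python
from dataclasses import dataclass, field


@dataclass
class StubTool:  # stand-in for dspy.Tool
    name: str
    desc: str


@dataclass
class StubModule:  # stand-in for a module that may own tools and sub-modules
    instructions: str
    tools: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)


def walk(name, module):
    """Yield (name, module) for the module and every nested sub-module."""
    yield name, module
    for child_name, child in module.children.items():
        yield from walk(f"{name}.{child_name}", child)


def build_seed_candidate(root):
    """Collect instructions and tool descriptions into one component dictionary."""
    candidate = {}
    for name, module in walk("self", root):
        candidate[name] = module.instructions
        for tool_name, tool in module.tools.items():
            # First description seen for a tool name is kept.
            candidate.setdefault(f"tool:{tool_name}", tool.desc)
    return candidate


researcher = StubModule("Find facts.", tools={"search": StubTool("search", "Searches")})
assistant = StubModule(
    "Answer the question.",
    tools={
        "research": StubTool("research", "Helps with questions"),
        "calculator": StubTool("calculator", "Computes"),
    },
    children={"researcher": researcher},
)

candidate = build_seed_candidate(assistant)
tool_keys = sorted(key for key in candidate if key.startswith("tool:"))
print(tool_keys)
```

Because keys are built from the tool name alone, two different tools that share a name in different modules would collapse into one component in this sketch.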

### Implementation Details

**Reflective Dataset Construction:**

GEPA constructs the reflective dataset for tool optimization in two passes:

**Pass 1: Build reflective examples for predictors (used by instruction proposer)**

For each predictor (including ReAct modules), GEPA creates reflective examples containing:
- **Inputs**: The predictor's input fields (e.g., `{"question": "..."}`)
- **Generated Outputs**: ALL of the predictor's output fields converted to strings
    - For ReAct: This includes both `answer` AND `trajectory` fields
    - The trajectory contains the complete execution trace with all thoughts, actions, and observations
- **Feedback**: Text feedback returned by your metric function

These examples are used by the instruction proposer to optimize signature instructions.
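
A minimal sketch of one Pass 1 record, assuming it is a plain dictionary with the three fields listed above. The `build_reflective_example` helper and the sample values are hypothetical; the actual construction lives in the GEPA adapter.

```python
def build_reflective_example(inputs: dict, outputs: dict, feedback: str) -> dict:
    """Assemble one reflective example; every output field is stringified."""
    return {
        "Inputs": dict(inputs),
        "Generated Outputs": {name: str(value) for name, value in outputs.items()},
        "Feedback": feedback,
    }


example = build_reflective_example(
    inputs={"question": "What is 12 * 7?"},
    outputs={
        "answer": 84,
        # For ReAct, the trajectory travels with the answer as an output field.
        "trajectory": {"tool_name_0": "calculator", "observation_0": 84},
    },
    feedback="Correct. Tools: calculator",
)
print(example["Generated Outputs"]["trajectory"])
```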

**Pass 2: Copy reflective examples to tools with annotation (used by tool proposer)**

For each tool being optimized, GEPA:
- Identifies ALL ReAct predictors (across all nested modules) that have this tool in their toolset
- Takes ALL reflective examples from those predictors and makes a deep copy for the tool
- Annotates the feedback: `[Tool 'tool_name' from 'predictor_key'] {original_feedback}`
- If multiple ReAct modules use the same tool, their reflective examples are aggregated together

These annotated examples are used by the tool proposer (with the tool-specific reflection prompt shown above) to optimize tool descriptions.

This means:
- A tool receives the FULL ReAct trajectory (thoughts, actions, observations) in the "Generated Outputs" field
- The metric can optionally examine the trajectory and include tool-specific insights in the feedback text
- The reflection LM sees complete context about how and when the tool was used
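
Pass 2 can be sketched as a copy-and-annotate step over the Pass 1 records. The function name and the two input dictionaries are assumptions made for illustration; the deep copy, the aggregation across predictors, and the feedback prefix follow the description above.

```python
import copy


def tool_reflective_examples(tool_name, predictor_examples, predictor_tools):
    """Gather annotated copies of every example from predictors that own the tool."""
    collected = []
    for predictor_key, examples in predictor_examples.items():
        if tool_name not in predictor_tools.get(predictor_key, ()):
            continue
        for example in examples:
            annotated = copy.deepcopy(example)  # leave the predictor's copy untouched
            annotated["Feedback"] = (
                f"[Tool '{tool_name}' from '{predictor_key}'] {example['Feedback']}"
            )
            collected.append(annotated)
    return collected


# Hypothetical Pass 1 output for two ReAct predictors.
predictor_examples = {
    "assistant": [{"Inputs": {"question": "q1"}, "Feedback": "Wrong. Tools: search"}],
    "researcher": [{"Inputs": {"query": "q2"}, "Feedback": "Correct. Tools: search"}],
}
# Which tools each predictor has in its toolset.
predictor_tools = {"assistant": {"search", "calculator"}, "researcher": {"search"}}

search_examples = tool_reflective_examples("search", predictor_examples, predictor_tools)
calc_examples = tool_reflective_examples("calculator", predictor_examples, predictor_tools)
print([ex["Feedback"] for ex in search_examples])
```

Note that `calculator` still receives the `assistant` example even though the trajectory only used `search`: examples are selected by toolset membership, not by whether the tool was actually called.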

**Component Identification & Proposer Routing:**

GEPA discovers tools by traversing ReAct modules and extracting their associated `dspy.Tool` instances. Once identified, GEPA routes components to appropriate proposers:
- **Signature instructions** → Custom instruction proposer (if provided) OR default GEPA proposer
- **Tool descriptions** → Built-in `ToolProposer` (always used, not customizable)

Collaborator:
This is customizable, right? If not, shouldn't we make this customizable too?


The custom instruction proposer affects ONLY signature instructions. Tools always use the specialized `ToolProposer` with the tool-specific reflection prompt, regardless of whether you provide a custom instruction proposer.
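
The routing rule can be sketched as a split on the `tool:` prefix followed by two independent proposer calls. Everything below is illustrative: the `propose` function and both proposer callables are hypothetical and simply tag the text so the routing is visible; they are not the signatures used inside GEPA.

```python
def split_components(names):
    """Separate signature components from tool components by key prefix."""
    tools = [name for name in names if name.startswith("tool:")]
    signatures = [name for name in names if not name.startswith("tool:")]
    return signatures, tools


def propose(candidate, names, instruction_proposer, tool_proposer):
    """Send each group of components to its own proposer and merge the results."""
    signatures, tools = split_components(names)
    updates = {}
    if signatures:
        updates.update(instruction_proposer(candidate, signatures))
    if tools:
        updates.update(tool_proposer(candidate, tools))
    return updates


# Hypothetical proposers; real ones would call the reflection LM.
def custom_instruction_proposer(candidate, names):
    return {name: candidate[name] + " [custom]" for name in names}


def tool_proposer(candidate, names):
    return {name: candidate[name] + " [tool]" for name in names}


candidate = {"react": "Answer the question.", "tool:search": "Finds things"}
updates = propose(candidate, list(candidate), custom_instruction_proposer, tool_proposer)
print(updates)
```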

### When to Use optimize_tool_descriptions

Enable `optimize_tool_descriptions=True` when you use `dspy.Tool` in your program and need better tool selection. Here are common scenarios:

1. **ReAct agents with multiple tools** - Agent with `search` and `calculator` tools keeps searching when it should calculate, or vice versa. GEPA learns from execution feedback to clarify "use search for factual queries, calculator for numerical analysis."

2. **Multi-agent systems with delegation** - Parent agent has delegation tools to specialized sub-agents but doesn't understand when to use each. GEPA optimizes both delegation tools and sub-agent internal tools holistically.

3. **Sequential tool workflows** - Tools like `query_database` → `analyze_results` have dependencies but descriptions don't capture this. GEPA learns the sequence and timing from successful executions.

4. **Domain-specific tools** - Tools like legal vs. medical document search have overlapping but domain-specific purposes. GEPA discovers usage patterns and adds context: "for legal precedents" vs. "for patient records."

5. **Tools with limitations** - Initial description "Does calculations" is too vague. GEPA adds specificity from observed usage: "Use for arithmetic (+, -, *, /, **). Not for date math or string operations."

See the usage examples below for implementations of scenarios 1 and 2.

### Usage Examples

#### Basic ReAct Agent

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    # eval() is for illustration only; never evaluate untrusted input this way.
    return eval(expression)

# Create ReAct agent with tools (poor initial descriptions)
search_tool = dspy.Tool(search_web, name="search", desc="Finds things")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Does calculations")

agent = dspy.ReAct("question -> answer", tools=[search_tool, calc_tool])

# Enable tool optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="all",  # Optimize all components together
    auto="medium"
)

optimized_agent = gepa.compile(agent, trainset=train_examples, valset=val_examples)

# View optimized tool descriptions
print("Optimized search tool:", optimized_agent.tools["search"].desc)
print("Optimized calculator tool:", optimized_agent.tools["calculator"].desc)
```

**Example output after optimization:**
```
Optimized search tool: Use when you need to find current information, facts, or data
from external sources. Provide specific search queries to get relevant results.

Optimized calculator tool: Use for arithmetic operations and mathematical expressions.
Accepts Python-compatible expressions with numbers and operators (+, -, *, /, **).
Do not use for date calculations or string manipulations.
```

#### Multi-Agent System

GEPA automatically discovers and optimizes tools in nested agents:

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    # eval() is for illustration only; never evaluate untrusted input this way.
    return eval(expression)

search_tool = dspy.Tool(search_web, name="search", desc="Searches")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Computes")

class ResearchAssistant(dspy.Module):
    def __init__(self):
        super().__init__()
        self.researcher = dspy.ReAct("query -> findings", tools=[search_tool])

        # Expose the sub-agent to the main agent as a delegation tool.
        def delegate_research(query: str) -> str:
            return self.researcher(query=query).findings

        research_tool = dspy.Tool(delegate_research, name="research", desc="Helps with questions")
        self.assistant = dspy.ReAct("question -> answer", tools=[research_tool, calc_tool])

    def forward(self, question):
        return self.assistant(question=question)

# Optimizes ALL tools: calculator, research, search
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="all",
    auto="medium"
)

optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)

# View optimized nested tool descriptions
print(optimized_system.researcher.tools["search"].desc)
print(optimized_system.assistant.tools["research"].desc)
print(optimized_system.assistant.tools["calculator"].desc)
```

### Inspecting Optimized Tool Descriptions

After optimization, tool descriptions are automatically updated in your program. Access them directly through your module structure:

```python
optimized_agent = gepa.compile(agent, trainset=train, valset=val)

# Access tools directly - descriptions are already updated
print(optimized_agent.tools["search"].desc)
print(optimized_agent.tools["calculator"].desc)
```

For multi-agent systems, access nested tools through your module hierarchy:

```python
optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)

# Access tools at different levels
print(optimized_system.researcher.tools["search"].desc) # Sub-agent tool
print(optimized_system.assistant.tools["research"].desc) # Main agent tool
print(optimized_system.assistant.tools["calculator"].desc)
```

### Compatibility with Custom Instruction Proposers

Tool optimization works seamlessly with custom instruction proposers. When you provide a custom instruction proposer AND enable `optimize_tool_descriptions=True`:

**Component routing:**
- **Signature instructions** → Your custom instruction proposer
- **Tool descriptions** → Built-in `ToolProposer` with specialized tool reflection prompt

**Key points:**
- Both operate independently during the same GEPA run
- Tools receive domain-appropriate optimization guidance (tool selection patterns, usage context)
- Signatures use your custom logic (task-specific reasoning, formatting, etc.)
- The built-in tool proposer is not customizable - it always uses `GenerateImprovedToolDescriptionFromFeedback`

This separation ensures tools and signatures get appropriate optimization strategies without interference.

```python
from dspy.teleprompt.gepa.instruction_proposal import MultiModalInstructionProposer

gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    instruction_proposer=MultiModalInstructionProposer(),  # For signatures
    optimize_tool_descriptions=True,  # Enables ToolProposer for tools
    auto="medium"
)
```
6 changes: 6 additions & 0 deletions docs/docs/api/optimizers/GEPA/overview.md
@@ -117,6 +117,12 @@ Practical Recipe for GEPA-Friendly Feedback:
- **Multi-Objective Tasks** (e.g., PUPA): Decompose aggregate scores to reveal contributions from each objective, highlighting tradeoffs (e.g., quality vs. privacy).
- **Stacked Pipelines** (e.g., code generation: parse → compile → run → profile → evaluate): Expose stage-specific failures; natural-language traces often suffice for LLM self-correction.

## Tool Description Optimization

GEPA can optimize tool descriptions for ReAct agents. When `optimize_tool_descriptions=True`, GEPA discovers all tools in your program (including nested multi-agent systems) and applies a specialized reflection prompt to improve how tools are described. This helps agents make better tool selection decisions by learning from execution traces which tools work well in which contexts.

For details on how tool optimization works, when to use it, and usage examples, see [Tool Description Optimization](GEPA_Advanced.md#tool-description-optimization) in the Advanced Features guide.

## Custom Instruction Proposal

For advanced customization of GEPA's instruction proposal mechanism, including custom instruction proposers and component selectors, see [Advanced Features](GEPA_Advanced.md).
23 changes: 22 additions & 1 deletion dspy/teleprompt/gepa/gepa.py
@@ -273,6 +273,11 @@ def metric(
warn_on_score_mismatch: GEPA (currently) expects the metric to return the same module-level score when
called with and without the pred_name. This flag (defaults to True) determines whether a warning is
raised if a mismatch in module-level and predictor-level score is detected.
optimize_tool_descriptions: Whether to optimize tool descriptions for modules with tools
(e.g., ReAct agents). When enabled, tool descriptions are included in the optimization
process alongside signature instructions. See the
[Tool Description Optimization guide](https://dspy.ai/api/optimizers/GEPA/GEPA_Advanced/#tool-description-optimization)
for details on when to use this feature and how it works. Default is False.
seed: The random seed to use for reproducibility. Default is 0.
gepa_kwargs: (Optional) provide additional kwargs to be passed to [gepa.optimize](https://github.com/gepa-ai/gepa/blob/main/src/gepa/api.py) method

@@ -328,6 +333,7 @@ def __init__(
wandb_init_kwargs: dict[str, Any] | None = None,
track_best_outputs: bool = False,
warn_on_score_mismatch: bool = True,
optimize_tool_descriptions: bool = False,
use_mlflow: bool = False,
# Reproducibility
seed: int | None = 0,
@@ -390,6 +396,7 @@ def __init__(
self.wandb_api_key = wandb_api_key
self.wandb_init_kwargs = wandb_init_kwargs
self.warn_on_score_mismatch = warn_on_score_mismatch
self.optimize_tool_descriptions = optimize_tool_descriptions
self.use_mlflow = use_mlflow

if track_best_outputs:
@@ -518,11 +525,25 @@ def feedback_fn(
rng=rng,
reflection_lm=self.reflection_lm,
custom_instruction_proposer=self.custom_instruction_proposer,
- warn_on_score_mismatch=self.warn_on_score_mismatch
+ warn_on_score_mismatch=self.warn_on_score_mismatch,
+ optimize_tool_descriptions=self.optimize_tool_descriptions
)

# Instantiate GEPA with the simpler adapter-based API
base_program = {name: pred.signature.instructions for name, pred in student.named_predictors()}

if self.optimize_tool_descriptions:
    tool_descriptions = {}
    for _, module in student.named_sub_modules():
        if hasattr(module, "tools"):
            for tool_name, tool in module.tools.items():
                tool_key = f"tool:{tool_name}"
                if tool_key not in tool_descriptions:
                    tool_descriptions[tool_key] = tool.desc
    if tool_descriptions:
        logger.info(f"Including {len(tool_descriptions)} tool descriptions for optimization")
        base_program.update(tool_descriptions)

gepa_result: GEPAResult = optimize(
seed_candidate=base_program,
trainset=trainset,