feat(gepa): add tool description optimization for multi-agent systems #8928
Open
Ju-usc wants to merge 19 commits into stanfordnlp:main from Ju-usc:feature/tool-description-optimization
Commits (19)
6412a5d feat(gepa): add tool description optimization for multi-agent systems (Ju-usc)
cf0be4f style: fix ruff formatting (trailing whitespace) (Ju-usc)
aa53fe2 style: apply ruff formatting fixes (Ju-usc)
045c6cf feat(gepa): implement tool-specific proposer for tool descriptions (Ju-usc)
c4f2041 docs(gepa): clean up multi-agent example code (Ju-usc)
260ca80 refactor(gepa): simplify tool reflective dataset with ReAct context r… (Ju-usc)
04f7e3d fix(gepa): unify custom proposer routing for tools (Ju-usc)
f92e184 docs(gepa): clarify tool reflection prompt (Ju-usc)
7178869 test: streamline GEPA tool optimization tests (Ju-usc)
e34703b fix(gepa): streamline tool proposer formatting (Ju-usc)
3f05311 test(gepa): drop legacy dummy tool fixture (Ju-usc)
4df9ce5 docs(gepa): add tool-specific reflection prompt and metric example (Ju-usc)
4296ccf docs(gepa): fix implementation details with accurate code flow (Ju-usc)
ea1204a docs(gepa): remove backward compatibility note (Ju-usc)
48d5cd6 docs(gepa): improve usage examples with optimization visualization (Ju-usc)
548d9b6 docs(gepa): add design rationale comments for tool context sharing (Ju-usc)
e61d0a1 docs(gepa): add tool optimization links to overview and parameter docs (Ju-usc)
5c95412 docs(gepa): refine tool optimization scenarios and remove implementat… (Ju-usc)
19d7717 docs(gepa): clarify future work section in code comments (Ju-usc)
@@ -443,3 +443,301 @@ gepa = dspy.GEPA(
    auto="medium"
)
```

## Tool Description Optimization

### What is optimize_tool_descriptions?

The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.

Unlike signature instructions, which guide reasoning strategies, tool descriptions serve a different purpose: they help agents decide **which tool to use** in a given situation. GEPA applies a specialized reflection prompt tailored to tool selection decisions.

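Enabling the feature is a single constructor flag. As a minimal sketch (the metric and model below are placeholders; complete examples follow later in this section):

```python
import dspy

# Minimal sketch: turn on tool-description optimization alongside the usual
# GEPA settings. `my_metric` and the model name are placeholders.
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    auto="medium",
)
optimized_agent = gepa.compile(agent, trainset=train_examples, valset=val_examples)
```
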
### Tool-Specific Reflection Prompt

GEPA uses a dedicated prompt for optimizing tool descriptions. The prompt receives the complete ReAct trajectory (all thoughts, actions, and observations) from executions that used the tool being optimized:

```python
class GenerateImprovedToolDescriptionFromFeedback(dspy.Signature):
    """You are refining a tool description that the assistant currently uses.

    Review the current description along with examples of the assistant's tool decisions
    and the feedback those decisions received.

    Read them together and refine the description so the agent understands when this tool
    actually helps, what argument or result matters, and what misuse the feedback exposed.
    Keep the tool's voice and only change what the evidence justifies.

    Return a refined description that helps the assistant quickly recognize good
    opportunities for the tool."""

    current_tool_description = dspy.InputField(desc="The current description of the tool")
    examples_with_feedback = dspy.InputField(
        desc="Examples showing tool usage decisions and feedback on correctness"
    )

    improved_tool_description = dspy.OutputField(
        desc="An improved description that guides correct tool selection and usage"
    )
```

The `examples_with_feedback` field contains full ReAct trajectories showing the complete context in which the tool was selected and used, enabling the reflection LM to understand tool selection patterns.

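The exact serialization is an implementation detail, but schematically each entry carries the predictor's inputs, the full generated outputs (including the ReAct trajectory), and the annotated feedback. A rough, illustrative sketch (the field names follow the reflective-example structure described under Implementation Details below; the values are invented and the concrete layout may differ):

```python
# Illustrative only: the rough shape of one reflective example shown to the
# tool proposer. Values are invented; the real serialization may differ.
example_entry = {
    "Inputs": {"question": "What is 15% of 240?"},
    "Generated Outputs": {
        "trajectory": {
            "thought_0": "This is arithmetic, so the calculator fits best.",
            "tool_name_0": "calculator",
            "tool_args_0": {"expression": "0.15 * 240"},
            "observation_0": "36.0",
            "thought_1": "I have the result and can finish.",
            "tool_name_1": "finish",
        },
        "answer": "36",
    },
    "Feedback": "[Tool 'calculator' from 'agent'] Correct. Tools: calculator",
}
```
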
**Example: Writing Tool-Aware Metrics**

To provide effective feedback for tool optimization, write metrics that examine the trajectory:

```python
def tool_feedback_metric(example, prediction, trace=None, pred_name=None, pred_trace=None):
    """Metric that provides tool-specific feedback for GEPA optimization."""
    correct = prediction.answer == example.answer
    score = 1.0 if correct else 0.0

    # Generate tool-specific feedback if available
    if hasattr(prediction, 'trajectory'):
        tools_used = [
            prediction.trajectory[key]
            for key in prediction.trajectory
            if key.startswith('tool_name_') and prediction.trajectory[key] != 'finish'
        ]
        feedback = f"{'Correct' if correct else 'Wrong'}. Tools: {', '.join(tools_used)}"
    else:
        feedback = "Correct" if correct else "Wrong"

    return dspy.Prediction(score=score, feedback=feedback)
```

Combined with GEPA's automatic annotation, this yields feedback like:

```
[Tool 'calculator' from 'agent'] Correct. Tools: calculator
[Tool 'search' from 'agent'] Wrong. Tools: search, calculator
```

The tool-specific prefix `[Tool 'calculator' from 'agent']` is added automatically by GEPA to focus the reflection LM on the particular tool description being optimized.

**Note:** Tool descriptions are treated as components in GEPA's optimization process, so the `component_selector` parameter applies to both signature instructions and tool descriptions. For example, `component_selector="all"` optimizes all signatures and tools together, while `component_selector="round_robin"` cycles through them one at a time.

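For instance, a sketch of the round-robin setting, which proposes an update for one component per iteration instead of all of them at once:

```python
# Sketch: cycle through components (signature instructions and tool
# descriptions) one at a time rather than updating them all together.
gepa = dspy.GEPA(
    metric=tool_feedback_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="round_robin",
    auto="medium",
)
```
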
### Default Behavior

By default, GEPA only optimizes signature instructions (`optimize_tool_descriptions=False`):

```python
# Default behavior: only signature optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    # optimize_tool_descriptions=False  # This is the default
    auto="medium"
)
optimized_program = gepa.compile(student, trainset=examples)
```

### How It Works

When enabled, GEPA:

1. **Discovers all tools**: Traverses your program, including nested sub-modules, to find all `dspy.Tool` instances
2. **Categorizes components**: Separates tools (identified by a `tool:` prefix, illustrated below) from signature instructions
3. **Routes components appropriately**:
   - Signature instructions → default or custom instruction proposer
   - Tool descriptions → `ToolProposer` (receives ReAct's reflective data with tool-specific annotation)
4. **Optimizes holistically**: Treats tool descriptions as first-class components in the optimization process

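As a rough illustration of the component space (this is not GEPA's internal API), the tools of a single ReAct agent and their `tool:`-prefixed component keys can be listed like this, using the `.tools` dict shown in the examples below:

```python
# Illustrative only: GEPA performs this discovery itself, including for
# nested sub-modules. This just shows how tool components are keyed.
tool_components = {f"tool:{name}": tool.desc for name, tool in agent.tools.items()}
print(tool_components)
# e.g. {'tool:search': 'Finds things', 'tool:calculator': 'Does calculations'}
```
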
### Implementation Details

**Reflective Dataset Construction:**

GEPA constructs the reflective dataset for tool optimization in two passes:

**Pass 1: Build reflective examples for predictors (used by the instruction proposer)**

For each predictor (including ReAct modules), GEPA creates reflective examples containing:

- **Inputs**: The predictor's input fields (e.g., `{"question": "..."}`)
- **Generated Outputs**: ALL of the predictor's output fields, converted to strings
  - For ReAct, this includes both the `answer` AND the `trajectory` fields
  - The trajectory contains the complete execution trace with all thoughts, actions, and observations
- **Feedback**: The text feedback returned by your metric function

These examples are used by the instruction proposer to optimize signature instructions.

**Pass 2: Copy reflective examples to tools with annotation (used by the tool proposer)**

For each tool being optimized, GEPA:

- Identifies ALL ReAct predictors (across all nested modules) that have this tool in their toolset
- Takes ALL reflective examples from those predictors and makes a deep copy for the tool
- Annotates the feedback: `[Tool 'tool_name' from 'predictor_key'] {original_feedback}`
- If multiple ReAct modules use the same tool, their reflective examples are aggregated together

These annotated examples are used by the tool proposer (with the tool-specific reflection prompt shown above) to optimize tool descriptions. A sketch of the annotation step follows below.

This means:

- A tool receives the FULL ReAct trajectory (thoughts, actions, observations) in the "Generated Outputs" field
- The metric can optionally examine the trajectory and include tool-specific insights in the feedback text
- The reflection LM sees complete context about how and when the tool was used

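The Pass 2 annotation step can be pictured with a small sketch (this mirrors the behavior described above rather than reproducing GEPA's actual code):

```python
import copy

def annotate_examples_for_tool(reflective_examples, tool_name, predictor_key):
    """Sketch of Pass 2: deep-copy a predictor's reflective examples for a tool
    and prefix the feedback with the tool/predictor identity."""
    annotated = []
    for example in reflective_examples:
        copied = copy.deepcopy(example)
        copied["Feedback"] = f"[Tool '{tool_name}' from '{predictor_key}'] {copied['Feedback']}"
        annotated.append(copied)
    return annotated
```
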
**Component Identification & Proposer Routing:**

GEPA discovers tools by traversing ReAct modules and extracting their associated `dspy.Tool` instances. Once identified, GEPA routes components to the appropriate proposers:

- **Signature instructions** → custom instruction proposer (if provided) OR the default GEPA proposer
- **Tool descriptions** → built-in `ToolProposer` (always used, not customizable)

> **Reviewer comment:** This is customizable, right? If not, shouldn't we make this customizable too?

The custom instruction proposer affects ONLY signature instructions. Tools always use the specialized `ToolProposer` with the tool-specific reflection prompt, regardless of whether you provide a custom instruction proposer.

### When to Use optimize_tool_descriptions

Enable `optimize_tool_descriptions=True` when you use `dspy.Tool` in your program and need better tool selection. Common scenarios include:

1. **ReAct agents with multiple tools** - An agent with `search` and `calculator` tools keeps searching when it should calculate, or vice versa. GEPA learns from execution feedback to clarify "use search for factual queries, calculator for numerical analysis."

2. **Multi-agent systems with delegation** - A parent agent has delegation tools for specialized sub-agents but doesn't understand when to use each. GEPA optimizes both the delegation tools and the sub-agents' internal tools holistically.

3. **Sequential tool workflows** - Tools like `query_database` → `analyze_results` have dependencies that their descriptions don't capture. GEPA learns the sequence and timing from successful executions.

4. **Domain-specific tools** - Tools such as legal vs. medical document search have overlapping but domain-specific purposes. GEPA discovers usage patterns and adds context: "for legal precedents" vs. "for patient records."

5. **Tools with limitations** - An initial description like "Does calculations" is too vague. GEPA adds specificity from observed usage: "Use for arithmetic (+, -, *, /, **). Not for date math or string operations."

See the usage examples below for implementations of scenarios 1 and 2.

### Usage Examples

#### Basic ReAct Agent

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    return eval(expression)

# Create ReAct agent with tools (poor initial descriptions)
search_tool = dspy.Tool(search_web, name="search", desc="Finds things")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Does calculations")

agent = dspy.ReAct("question -> answer", tools=[search_tool, calc_tool])

# Enable tool optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="all",  # Optimize all components together
    auto="medium"
)

optimized_agent = gepa.compile(agent, trainset=train_examples, valset=val_examples)

# View optimized tool descriptions
print("Optimized search tool:", optimized_agent.tools["search"].desc)
print("Optimized calculator tool:", optimized_agent.tools["calculator"].desc)
```

**Example output after optimization:**

```
Optimized search tool: Use when you need to find current information, facts, or data
from external sources. Provide specific search queries to get relevant results.

Optimized calculator tool: Use for arithmetic operations and mathematical expressions.
Accepts Python-compatible expressions with numbers and operators (+, -, *, /, **).
Do not use for date calculations or string manipulations.
```

#### Multi-Agent System

GEPA automatically discovers and optimizes tools in nested agents:

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    return eval(expression)

search_tool = dspy.Tool(search_web, name="search", desc="Searches")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Computes")

class ResearchAssistant(dspy.Module):
    def __init__(self):
        super().__init__()
        self.researcher = dspy.ReAct("query -> findings", tools=[search_tool])

        # Delegation tool: closes over self to call the nested researcher agent
        def delegate_research(query: str) -> str:
            return self.researcher(query=query).findings

        research_tool = dspy.Tool(delegate_research, name="research", desc="Helps with questions")
        self.assistant = dspy.ReAct("question -> answer", tools=[research_tool, calc_tool])

    def forward(self, question):
        return self.assistant(question=question)

# Optimizes ALL tools: calculator, research, search
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="all",
    auto="medium"
)

optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)

# View optimized nested tool descriptions
print(optimized_system.researcher.tools["search"].desc)
print(optimized_system.assistant.tools["research"].desc)
print(optimized_system.assistant.tools["calculator"].desc)
```

### Inspecting Optimized Tool Descriptions

After optimization, tool descriptions are automatically updated in your program. Access them directly through your module structure:

```python
optimized_agent = gepa.compile(agent, trainset=train, valset=val)

# Access tools directly - descriptions are already updated
print(optimized_agent.tools["search"].desc)
print(optimized_agent.tools["calculator"].desc)
```

For multi-agent systems, access nested tools through your module hierarchy:

```python
optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)

# Access tools at different levels
print(optimized_system.researcher.tools["search"].desc)   # Sub-agent tool
print(optimized_system.assistant.tools["research"].desc)  # Main agent tool
print(optimized_system.assistant.tools["calculator"].desc)
```

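If you prefer not to hard-code module paths, you can also walk the program. This sketch assumes `dspy.Module.named_sub_modules()` and the `.tools` dict used above; adjust it to your own program structure:

```python
def print_all_tool_descriptions(program: dspy.Module) -> None:
    """Walk a program and print the description of every tool on its ReAct modules."""
    for path, module in program.named_sub_modules():
        if isinstance(module, dspy.ReAct):
            for name, tool in module.tools.items():
                print(f"{path}.tools[{name!r}]: {tool.desc}")

print_all_tool_descriptions(optimized_system)
```
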
### Compatibility with Custom Instruction Proposers

Tool optimization works seamlessly with custom instruction proposers. When you provide a custom instruction proposer AND enable `optimize_tool_descriptions=True`:

**Component routing:**

- **Signature instructions** → your custom instruction proposer
- **Tool descriptions** → built-in `ToolProposer` with the specialized tool reflection prompt

**Key points:**

- Both operate independently during the same GEPA run
- Tools receive domain-appropriate optimization guidance (tool selection patterns, usage context)
- Signatures use your custom logic (task-specific reasoning, formatting, etc.)
- The built-in tool proposer is not customizable; it always uses `GenerateImprovedToolDescriptionFromFeedback`

This separation ensures tools and signatures get appropriate optimization strategies without interference.

```python
from dspy.teleprompt.gepa.instruction_proposal import MultiModalInstructionProposer

gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    instruction_proposer=MultiModalInstructionProposer(),  # For signatures
    optimize_tool_descriptions=True,  # Enables ToolProposer for tools
    auto="medium"
)
```