feat(gepa): add tool description optimization for multi-agent systems #8928
Open
Ju-usc wants to merge 19 commits into stanfordnlp:main from Ju-usc:feature/tool-description-optimization
Commits (19)
6412a5d feat(gepa): add tool description optimization for multi-agent systems (Ju-usc)
cf0be4f style: fix ruff formatting (trailing whitespace) (Ju-usc)
aa53fe2 style: apply ruff formatting fixes (Ju-usc)
045c6cf feat(gepa): implement tool-specific proposer for tool descriptions (Ju-usc)
c4f2041 docs(gepa): clean up multi-agent example code (Ju-usc)
260ca80 refactor(gepa): simplify tool reflective dataset with ReAct context r… (Ju-usc)
04f7e3d fix(gepa): unify custom proposer routing for tools (Ju-usc)
f92e184 docs(gepa): clarify tool reflection prompt (Ju-usc)
7178869 test: streamline GEPA tool optimization tests (Ju-usc)
e34703b fix(gepa): streamline tool proposer formatting (Ju-usc)
3f05311 test(gepa): drop legacy dummy tool fixture (Ju-usc)
4df9ce5 docs(gepa): add tool-specific reflection prompt and metric example (Ju-usc)
4296ccf docs(gepa): fix implementation details with accurate code flow (Ju-usc)
ea1204a docs(gepa): remove backward compatibility note (Ju-usc)
48d5cd6 docs(gepa): improve usage examples with optimization visualization (Ju-usc)
548d9b6 docs(gepa): add design rationale comments for tool context sharing (Ju-usc)
e61d0a1 docs(gepa): add tool optimization links to overview and parameter docs (Ju-usc)
5c95412 docs(gepa): refine tool optimization scenarios and remove implementat… (Ju-usc)
19d7717 docs(gepa): clarify future work section in code comments (Ju-usc)
@@ -443,3 +443,301 @@ gepa = dspy.GEPA(
    auto="medium"
)
```

## Tool Description Optimization

### What is optimize_tool_descriptions?

The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.

Unlike signature instructions, which guide reasoning strategies, tool descriptions serve a different purpose: they help agents decide **which tool to use** in a given situation. GEPA applies a specialized reflection prompt tailored to tool selection decisions.

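Enabling the feature is a single constructor flag. As a minimal sketch (the metric and model below are placeholders; complete examples follow later in this section):

```python
import dspy

# Minimal sketch: turn on tool-description optimization alongside the usual
# GEPA settings. `my_metric` and the model name are placeholders.
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    auto="medium",
)
optimized_agent = gepa.compile(agent, trainset=train_examples, valset=val_examples)
```
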
### Tool-Specific Reflection Prompt

GEPA uses a dedicated prompt for optimizing tool descriptions. The prompt receives the complete ReAct trajectory (all thoughts, actions, and observations) from executions that used the tool being optimized:

```python
class GenerateImprovedToolDescriptionFromFeedback(dspy.Signature):
    """You are refining a tool description that the assistant currently uses.

    Review the current description along with examples of the assistant's tool decisions
    and the feedback those decisions received.

    Read them together and refine the description so the agent understands when this tool
    actually helps, what argument or result matters, and what misuse the feedback exposed.
    Keep the tool's voice and only change what the evidence justifies.

    Return a refined description that helps the assistant quickly recognize good
    opportunities for the tool."""

    current_tool_description = dspy.InputField(desc="The current description of the tool")
    examples_with_feedback = dspy.InputField(
        desc="Examples showing tool usage decisions and feedback on correctness"
    )

    improved_tool_description = dspy.OutputField(
        desc="An improved description that guides correct tool selection and usage"
    )
```

The `examples_with_feedback` field contains full ReAct trajectories showing the complete context in which the tool was selected and used, enabling the reflection LM to understand tool selection patterns.

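The exact serialization is an implementation detail, but schematically each entry carries the predictor's inputs, the full generated outputs (including the ReAct trajectory), and the annotated feedback. A rough, illustrative sketch (the field names follow the reflective-example structure described under Implementation Details below; the values are invented and the concrete layout may differ):

```python
# Illustrative only: the rough shape of one reflective example shown to the
# tool proposer. Values are invented; the real serialization may differ.
example_entry = {
    "Inputs": {"question": "What is 15% of 240?"},
    "Generated Outputs": {
        "trajectory": {
            "thought_0": "This is arithmetic, so the calculator fits best.",
            "tool_name_0": "calculator",
            "tool_args_0": {"expression": "0.15 * 240"},
            "observation_0": "36.0",
            "thought_1": "I have the result and can finish.",
            "tool_name_1": "finish",
        },
        "answer": "36",
    },
    "Feedback": "[Tool 'calculator' from 'agent'] Correct. Tools: calculator",
}
```
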
**Example: Writing Tool-Aware Metrics**

To provide effective feedback for tool optimization, write metrics that examine the trajectory:

```python
def tool_feedback_metric(example, prediction, trace=None, pred_name=None, pred_trace=None):
    """Metric that provides tool-specific feedback for GEPA optimization."""
    correct = prediction.answer == example.answer
    score = 1.0 if correct else 0.0

    # Generate tool-specific feedback if available
    if hasattr(prediction, 'trajectory'):
        tools_used = [
            prediction.trajectory[key]
            for key in prediction.trajectory
            if key.startswith('tool_name_') and prediction.trajectory[key] != 'finish'
        ]
        feedback = f"{'Correct' if correct else 'Wrong'}. Tools: {', '.join(tools_used)}"
    else:
        feedback = "Correct" if correct else "Wrong"

    return dspy.Prediction(score=score, feedback=feedback)
```

Combined with GEPA's automatic annotation, this yields feedback like:

```
[Tool 'calculator' from 'agent'] Correct. Tools: calculator
[Tool 'search' from 'agent'] Wrong. Tools: search, calculator
```

The tool-specific prefix `[Tool 'calculator' from 'agent']` is added automatically by GEPA to focus the reflection LM on the particular tool description being optimized.

**Note:** Tool descriptions are treated as components in GEPA's optimization process, so the `component_selector` parameter applies to both signature instructions and tool descriptions. For example, `component_selector="all"` optimizes all signatures and tools together, while `component_selector="round_robin"` cycles through them one at a time.

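For instance, a sketch of the round-robin setting, which proposes an update for one component per iteration instead of all of them at once:

```python
# Sketch: cycle through components (signature instructions and tool
# descriptions) one at a time rather than updating them all together.
gepa = dspy.GEPA(
    metric=tool_feedback_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="round_robin",
    auto="medium",
)
```
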
### Default Behavior

By default, GEPA only optimizes signature instructions (`optimize_tool_descriptions=False`):

```python
# Default behavior: only signature optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    # optimize_tool_descriptions=False  # This is the default
    auto="medium"
)
optimized_program = gepa.compile(student, trainset=examples)
```

### How It Works

When enabled, GEPA:

1. **Discovers all tools**: Traverses your program, including nested sub-modules, to find all `dspy.Tool` instances
2. **Categorizes components**: Separates tools (identified by a `tool:` prefix, illustrated below) from signature instructions
3. **Routes components appropriately**:
   - Signature instructions → default or custom instruction proposer
   - Tool descriptions → `ToolProposer` (receives ReAct's reflective data with tool-specific annotation)
4. **Optimizes holistically**: Treats tool descriptions as first-class components in the optimization process

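As a rough illustration of the component space (this is not GEPA's internal API), the tools of a single ReAct agent and their `tool:`-prefixed component keys can be listed like this, using the `.tools` dict shown in the examples below:

```python
# Illustrative only: GEPA performs this discovery itself, including for
# nested sub-modules. This just shows how tool components are keyed.
tool_components = {f"tool:{name}": tool.desc for name, tool in agent.tools.items()}
print(tool_components)
# e.g. {'tool:search': 'Finds things', 'tool:calculator': 'Does calculations'}
```
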
### Implementation Details

**Reflective Dataset Construction:**

GEPA constructs the reflective dataset for tool optimization in two passes:

**Pass 1: Build reflective examples for predictors (used by the instruction proposer)**

For each predictor (including ReAct modules), GEPA creates reflective examples containing:

- **Inputs**: The predictor's input fields (e.g., `{"question": "..."}`)
- **Generated Outputs**: ALL of the predictor's output fields, converted to strings
  - For ReAct, this includes both the `answer` AND the `trajectory` fields
  - The trajectory contains the complete execution trace with all thoughts, actions, and observations
- **Feedback**: The text feedback returned by your metric function

These examples are used by the instruction proposer to optimize signature instructions.

**Pass 2: Copy reflective examples to tools with annotation (used by the tool proposer)**

For each tool being optimized, GEPA:

- Identifies ALL ReAct predictors (across all nested modules) that have this tool in their toolset
- Takes ALL reflective examples from those predictors and makes a deep copy for the tool
- Annotates the feedback: `[Tool 'tool_name' from 'predictor_key'] {original_feedback}`
- If multiple ReAct modules use the same tool, their reflective examples are aggregated together

These annotated examples are used by the tool proposer (with the tool-specific reflection prompt shown above) to optimize tool descriptions. A sketch of the annotation step follows below.

This means:

- A tool receives the FULL ReAct trajectory (thoughts, actions, observations) in the "Generated Outputs" field
- The metric can optionally examine the trajectory and include tool-specific insights in the feedback text
- The reflection LM sees complete context about how and when the tool was used

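The Pass 2 annotation step can be pictured with a small sketch (this mirrors the behavior described above rather than reproducing GEPA's actual code):

```python
import copy

def annotate_examples_for_tool(reflective_examples, tool_name, predictor_key):
    """Sketch of Pass 2: deep-copy a predictor's reflective examples for a tool
    and prefix the feedback with the tool/predictor identity."""
    annotated = []
    for example in reflective_examples:
        copied = copy.deepcopy(example)
        copied["Feedback"] = f"[Tool '{tool_name}' from '{predictor_key}'] {copied['Feedback']}"
        annotated.append(copied)
    return annotated
```
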
**Component Identification & Proposer Routing:**

GEPA discovers tools by traversing ReAct modules and extracting their associated `dspy.Tool` instances. Once identified, GEPA routes components to the appropriate proposers:

- **Signature instructions** → custom instruction proposer (if provided) OR the default GEPA proposer
- **Tool descriptions** → built-in `ToolProposer` (always used, not customizable)

> **Reviewer comment:** This is customizable, right? If not, shouldn't we make this customizable too?

The custom instruction proposer affects ONLY signature instructions. Tools always use the specialized `ToolProposer` with the tool-specific reflection prompt, regardless of whether you provide a custom instruction proposer.

### When to Use optimize_tool_descriptions

Enable `optimize_tool_descriptions=True` when you use `dspy.Tool` in your program and need better tool selection. Common scenarios include:

1. **ReAct agents with multiple tools** - An agent with `search` and `calculator` tools keeps searching when it should calculate, or vice versa. GEPA learns from execution feedback to clarify "use search for factual queries, calculator for numerical analysis."

2. **Multi-agent systems with delegation** - A parent agent has delegation tools for specialized sub-agents but doesn't understand when to use each. GEPA optimizes both the delegation tools and the sub-agents' internal tools holistically.

3. **Sequential tool workflows** - Tools like `query_database` → `analyze_results` have dependencies that their descriptions don't capture. GEPA learns the sequence and timing from successful executions.

4. **Domain-specific tools** - Tools such as legal vs. medical document search have overlapping but domain-specific purposes. GEPA discovers usage patterns and adds context: "for legal precedents" vs. "for patient records."

5. **Tools with limitations** - An initial description like "Does calculations" is too vague. GEPA adds specificity from observed usage: "Use for arithmetic (+, -, *, /, **). Not for date math or string operations."

See the usage examples below for implementations of scenarios 1 and 2.

### Usage Examples

#### Basic ReAct Agent

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    return eval(expression)

# Create ReAct agent with tools (poor initial descriptions)
search_tool = dspy.Tool(search_web, name="search", desc="Finds things")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Does calculations")

agent = dspy.ReAct("question -> answer", tools=[search_tool, calc_tool])

# Enable tool optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="all",  # Optimize all components together
    auto="medium"
)

optimized_agent = gepa.compile(agent, trainset=train_examples, valset=val_examples)

# View optimized tool descriptions
print("Optimized search tool:", optimized_agent.tools["search"].desc)
print("Optimized calculator tool:", optimized_agent.tools["calculator"].desc)
```

**Example output after optimization:**

```
Optimized search tool: Use when you need to find current information, facts, or data
from external sources. Provide specific search queries to get relevant results.

Optimized calculator tool: Use for arithmetic operations and mathematical expressions.
Accepts Python-compatible expressions with numbers and operators (+, -, *, /, **).
Do not use for date calculations or string manipulations.
```

#### Multi-Agent System

GEPA automatically discovers and optimizes tools in nested agents:

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    return eval(expression)

search_tool = dspy.Tool(search_web, name="search", desc="Searches")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Computes")

class ResearchAssistant(dspy.Module):
    def __init__(self):
        super().__init__()
        self.researcher = dspy.ReAct("query -> findings", tools=[search_tool])

        # Delegation tool: closes over self to call the nested researcher agent
        def delegate_research(query: str) -> str:
            return self.researcher(query=query).findings

        research_tool = dspy.Tool(delegate_research, name="research", desc="Helps with questions")
        self.assistant = dspy.ReAct("question -> answer", tools=[research_tool, calc_tool])

    def forward(self, question):
        return self.assistant(question=question)

# Optimizes ALL tools: calculator, research, search
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    optimize_tool_descriptions=True,
    component_selector="all",
    auto="medium"
)

optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)

# View optimized nested tool descriptions
print(optimized_system.researcher.tools["search"].desc)
print(optimized_system.assistant.tools["research"].desc)
print(optimized_system.assistant.tools["calculator"].desc)
```

### Inspecting Optimized Tool Descriptions

After optimization, tool descriptions are automatically updated in your program. Access them directly through your module structure:

```python
optimized_agent = gepa.compile(agent, trainset=train, valset=val)

# Access tools directly - descriptions are already updated
print(optimized_agent.tools["search"].desc)
print(optimized_agent.tools["calculator"].desc)
```

For multi-agent systems, access nested tools through your module hierarchy:

```python
optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)

# Access tools at different levels
print(optimized_system.researcher.tools["search"].desc)   # Sub-agent tool
print(optimized_system.assistant.tools["research"].desc)  # Main agent tool
print(optimized_system.assistant.tools["calculator"].desc)
```

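If you prefer not to hard-code module paths, you can also walk the program. This sketch assumes `dspy.Module.named_sub_modules()` and the `.tools` dict used above; adjust it to your own program structure:

```python
def print_all_tool_descriptions(program: dspy.Module) -> None:
    """Walk a program and print the description of every tool on its ReAct modules."""
    for path, module in program.named_sub_modules():
        if isinstance(module, dspy.ReAct):
            for name, tool in module.tools.items():
                print(f"{path}.tools[{name!r}]: {tool.desc}")

print_all_tool_descriptions(optimized_system)
```
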
### Compatibility with Custom Instruction Proposers

Tool optimization works seamlessly with custom instruction proposers. When you provide a custom instruction proposer AND enable `optimize_tool_descriptions=True`:

**Component routing:**

- **Signature instructions** → your custom instruction proposer
- **Tool descriptions** → built-in `ToolProposer` with the specialized tool reflection prompt

**Key points:**

- Both operate independently during the same GEPA run
- Tools receive domain-appropriate optimization guidance (tool selection patterns, usage context)
- Signatures use your custom logic (task-specific reasoning, formatting, etc.)
- The built-in tool proposer is not customizable; it always uses `GenerateImprovedToolDescriptionFromFeedback`

This separation ensures tools and signatures get appropriate optimization strategies without interference.

```python
from dspy.teleprompt.gepa.instruction_proposal import MultiModalInstructionProposer

gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    instruction_proposer=MultiModalInstructionProposer(),  # For signatures
    optimize_tool_descriptions=True,  # Enables ToolProposer for tools
    auto="medium"
)
```