Commit 95dbb2b

Add docs

1 parent 7bb085c commit 95dbb2b

1 file changed

docs/evals.md

Lines changed: 46 additions & 0 deletions
@@ -766,6 +766,52 @@ async def main():

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main(answer))` to run `main`)_

### Generating from an Existing Agent
If you already have an agent, you can use [`generate_evals_from_agent`][pydantic_evals.generation.generate_evals_from_agent] to automatically extract types from the agent and generate test cases. This is simpler than `generate_dataset` because you don't need to manually specify the dataset type or generic parameters.
```python {title="generate_from_agent_example.py"}
from pydantic import BaseModel
from pydantic_ai import Agent

from pydantic_evals.generation import generate_evals_from_agent


class AnswerOutput(BaseModel):
    """Model for expected answer outputs."""

    answer: str
    confidence: float


agent = Agent(  # (1)!
    'openai:gpt-4o',
    output_type=AnswerOutput,
    system_prompt='You are a helpful assistant that answers questions about world geography.',
)


async def main():
    dataset = await generate_evals_from_agent(  # (2)!
        agent=agent,
        n_examples=3,
        model='openai:gpt-4o',
        path='agent_test_cases.json',
        extra_instructions='Generate questions about world capitals and landmarks.',
    )
    print(f'Generated {len(dataset.cases)} test cases')
```

1. Create an agent with a defined output type and system prompt.
2. Generate test cases by extracting types from the agent. The function will:
    - Use an LLM to generate diverse input prompts based on the agent's configuration
    - Run each input through the actual agent to get real outputs
    - Save the inputs and outputs as test cases

This approach ensures your test cases use realistic outputs from your actual agent, rather than having an LLM imagine what the outputs should be.

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_
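
Once the cases have been written to `agent_test_cases.json`, the saved dataset can be reloaded and evaluated like any other dataset. The snippet below is a minimal sketch rather than part of the example above: it assumes the generated inputs are plain strings, reuses `AnswerOutput` and `agent` from `generate_from_agent_example.py`, and runs the same agent as the task under evaluation; the file name and the `answer_question` helper are just for illustration.

```python {title="evaluate_generated_cases_example.py"}
from typing import Any

from pydantic_evals import Dataset

from generate_from_agent_example import AnswerOutput, agent


async def answer_question(question: str) -> AnswerOutput:
    """Task under test: run the real agent and return its structured output."""
    result = await agent.run(question)
    return result.output


def main():
    # Reload the generated cases; the inputs are assumed to be plain strings.
    dataset = Dataset[str, AnswerOutput, Any].from_file('agent_test_cases.json')

    # Run every case through the task function and collect a report.
    report = dataset.evaluate_sync(answer_question)
    report.print(include_input=True, include_output=True)


if __name__ == '__main__':
    main()
```

Since no evaluators are attached in this sketch, the report only shows inputs, outputs, and durations; if you want assertions or scores, attach evaluators (such as `LLMJudge` or your own custom evaluators) to the dataset before evaluating.
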
## Integration with Logfire

Pydantic Evals is implemented using OpenTelemetry to record traces of the evaluation process. These traces contain all
