
Conversation

@dbermuehler commented Dec 17, 2025

Description

Related Issues

#74

Documentation PR

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@dbermuehler changed the title from "feat: Optional Case specific Goal for GoalSuccessRateEvaluator #74" to "feat: Optional Case specific Goal for GoalSuccessRateEvaluator" on Dec 17, 2025

@strands-agent left a comment

Review: ✅ LGTM with Minor Suggestions

Great feature addition! This allows users to provide case-specific goals for the GoalSuccessRateEvaluator, which is very useful for more targeted evaluations.

What's Good:

  1. Clean Implementation: The goal is read from evaluation_case.metadata.get("goal") - elegant and backward-compatible (see the sketch after this list)
  2. New Prompt Template (v1): Good versioning approach with a new v1 template that explicitly documents the optional goal
  3. Sensible Default: Changed default version to "v1" - users get the new feature by default
  4. Refactored _format_prompt: Now takes the full evaluation case instead of just session_input, allowing access to metadata
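
As an illustration for item 1, here is a minimal sketch of the goal-resolution pattern described above. The _format_prompt body, the default-goal string, and the template.format call are assumptions for illustration, not the PR's actual code:

    def _format_prompt(self, evaluation_case) -> str:
        # Case-specific goal when provided; falling back keeps the
        # pre-existing behavior, so existing cases work unchanged.
        goal = (evaluation_case.metadata or {}).get("goal")
        template = get_template(self.version)  # default version is now "v1"
        return template.format(
            session_input=evaluation_case.input,
            goal=goal or "Infer the user's goal from the session input.",
        )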

Questions/Suggestions:

  1. Missing __init__.py update? - Does src/strands_evals/evaluators/prompt_templates/goal_success_rate/__init__.py need to be updated to include goal_success_rate_v1? Without this, the get_template("v1") call won't find the new template.
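
     A hypothetical sketch of that registry update (module and attribute names are illustrative; the package's real layout may differ):

    # prompt_templates/goal_success_rate/__init__.py (illustrative)
    from . import goal_success_rate_v0, goal_success_rate_v1

    _TEMPLATES = {
        "v0": goal_success_rate_v0.TEMPLATE,
        "v1": goal_success_rate_v1.TEMPLATE,
    }

    def get_template(version: str) -> str:
        # Fail loudly on unknown versions rather than raising a bare KeyError.
        if version not in _TEMPLATES:
            raise ValueError(f"Unknown goal_success_rate template version: {version!r}")
        return _TEMPLATES[version]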

  2. Tests: The checklist indicates no tests were added. Consider adding a test case like:

    def test_goal_success_rate_evaluator_with_custom_goal():
        evaluator = GoalSuccessRateEvaluator()
        case = EvaluationData(
            input="What's the weather?",
            output="It's sunny!",
            metadata={"goal": "Get accurate weather information"}
        )
        # Verify the case-specific goal lands in the rendered prompt
        # (assuming _format_prompt returns the formatted template text)
        prompt = evaluator._format_prompt(case)
        assert "Get accurate weather information" in prompt
  3. Documentation: It would be helpful to add a brief example in the feature request issue (#74) showing how to use the new metadata field; for instance, something like the sketch below.
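
For example, a usage snippet along these lines (import paths assumed):

    from strands_evals import EvaluationData  # import path assumed
    from strands_evals.evaluators import GoalSuccessRateEvaluator

    evaluator = GoalSuccessRateEvaluator()  # v1 template by default
    case = EvaluationData(
        input="Book a table for two at 7pm.",
        output="Your table for two is confirmed for 7pm.",
        # Optional: a case-specific goal; omit it to keep the old behavior.
        metadata={"goal": "Complete the reservation as requested"},
    )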

Non-Blocking:

  • Consider running hatch run prepare to ensure all checks pass (checklist item)

Overall this is a clean, useful feature. Happy to approve once the above items are addressed! 🦆


🤖 This is an experimental AI agent response from the Strands team, powered by Strands Agents. We're exploring how AI agents can help with community support and development. Your feedback helps us improve! If you'd prefer human assistance, please let us know.
