Fix: Eval hash mismatch due to parameter truncation in DB storage #1523

Merged
rlundeen2 merged 7 commits into microsoft:main from rlundeen2:users/rlundeen/2026_03_19_eval_hash_bug
Mar 20, 2026
Conversation

@rlundeen2
Contributor

@rlundeen2 rlundeen2 commented Mar 19, 2026

Bug: Running await printer.print_summary_async(scenario_result) in 1_configuring_scenarios.ipynb prints "official evaluation has not been run yet for this specific configuration" — even when evals have been run.

Root cause: Long scorer params (e.g., system prompt templates) are truncated to 80 characters when stored in the DB via ComponentIdentifier.to_dict(max_value_length=80). The identity .hash is correctly preserved through the round-trip, but eval_hash is recomputed from the truncated params by EvaluationIdentifier, producing a different hash than what was stored during the eval run. This causes the metrics lookup to fail silently.
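
The mismatch can be reproduced in isolation. The following is a minimal sketch, not PyRIT code: `params_hash` and `truncate_values` are hypothetical stand-ins for the eval hash computation and for the truncation applied on storage, and SHA-256 over sorted JSON is an assumed hashing scheme. Only the 80-character limit comes from the description above.

```python
import hashlib
import json

MAX_VALUE_LENGTH = 80  # truncation limit named in the PR description


def params_hash(params: dict) -> str:
    """Hash a params dict deterministically (hypothetical stand-in for eval_hash)."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def truncate_values(params: dict, max_value_length: int) -> dict:
    """Cut long string values, roughly as a storage layer might (assumed behavior)."""
    return {
        key: value[:max_value_length] if isinstance(value, str) else value
        for key, value in params.items()
    }


# A long scorer param, e.g. a system prompt template (280 characters here).
original = {"system_prompt": "You are a strict evaluator. " * 10, "threshold": 0.5}
stored = truncate_values(original, MAX_VALUE_LENGTH)

hash_at_eval_time = params_hash(original)      # what the eval run recorded
hash_after_round_trip = params_hash(stored)    # what a recomputation yields

# The two hashes differ, so a metrics lookup keyed by the recomputed hash misses.
print(hash_at_eval_time == hash_after_round_trip)
```

Params shorter than the limit are unaffected, which is why the problem only shows up for scorers with long values.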

Fix: Store eval_hash inside the ComponentIdentifier serialization (to_dict/from_dict) so it survives DB round-trips without recomputation from truncated params.

  • ComponentIdentifier: Added stored_eval_hash field and KEY_EVAL_HASH. to_dict(eval_hash=...) includes it in the JSON; from_dict() restores it.
  • EvaluationIdentifier: Uses stored_eval_hash when available instead of recomputing from (potentially truncated) params.
  • ScenarioResultEntry/ScoreEntry/AttackResultEntry: Compute eval_hash from untruncated identifiers before truncation and pass to to_dict().
  • atomic_attack.py: Same fix for the enriched identifier persistence path.

No DB schema migration needed — eval_hash is stored inside the existing JSON columns. Old data without it falls back to recomputation (same as prior behavior).
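
The shape of the fix can be sketched as follows. This is an illustration under stated assumptions, not the actual PyRIT implementation: `SketchIdentifier` and `compute_eval_hash` are hypothetical, the value of `KEY_EVAL_HASH` is assumed, and the real `ComponentIdentifier` has more fields. Only the names `stored_eval_hash`, `KEY_EVAL_HASH`, `to_dict(eval_hash=...)`, and `from_dict()` come from the description above.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

KEY_EVAL_HASH = "eval_hash"  # constant name is from the PR; its value is assumed


def compute_eval_hash(params: Dict[str, Any]) -> str:
    """Hypothetical hash function: SHA-256 over sorted JSON."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class SketchIdentifier:
    """Hypothetical, simplified stand-in for ComponentIdentifier."""

    params: Dict[str, Any] = field(default_factory=dict)
    stored_eval_hash: Optional[str] = None

    def to_dict(
        self, max_value_length: Optional[int] = None, eval_hash: Optional[str] = None
    ) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for key, value in self.params.items():
            if max_value_length is not None and isinstance(value, str):
                value = value[:max_value_length]  # lossy truncation for storage
            out[key] = value
        # Prefer an explicitly passed hash; otherwise carry a restored one forward
        # so that a second round-trip keeps it (an assumption of this sketch).
        hash_to_store = eval_hash if eval_hash is not None else self.stored_eval_hash
        if hash_to_store is not None:
            out[KEY_EVAL_HASH] = hash_to_store
        return out

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchIdentifier":
        data = dict(data)  # do not mutate the caller's dict
        stored = data.pop(KEY_EVAL_HASH, None)
        return cls(params=data, stored_eval_hash=stored)

    def eval_hash(self) -> str:
        # Use the stored hash when present; fall back to recomputation for old rows.
        if self.stored_eval_hash is not None:
            return self.stored_eval_hash
        return compute_eval_hash(self.params)


# Compute the hash from untruncated params, then pass it through storage.
original = SketchIdentifier(params={"system_prompt": "x" * 200})
expected = original.eval_hash()
row = original.to_dict(max_value_length=80, eval_hash=expected)
restored = SketchIdentifier.from_dict(row)
print(restored.eval_hash() == expected)

# An old row without the key falls back to recomputation from truncated params.
legacy = SketchIdentifier.from_dict({"system_prompt": "x" * 80})
print(legacy.stored_eval_hash is None)
```

Because the hash travels inside the serialized dict, the sketch needs no extra column, which mirrors the point that no schema migration is required.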

@rlundeen2 rlundeen2 force-pushed the users/rlundeen/2026_03_19_eval_hash_bug branch from 409d1b7 to 7084a21 on March 20, 2026 01:17
@hannahwestra25
Contributor

Added a few comments. They can be addressed in a follow-up PR if you want to get the release started / out.

@rlundeen2 rlundeen2 force-pushed the users/rlundeen/2026_03_19_eval_hash_bug branch from 7084a21 to 94dbf6a on March 20, 2026 16:40
Store eval_hash inside ComponentIdentifier serialization (to_dict/from_dict)
so it survives DB round-trips without recomputation from truncated params.

- ComponentIdentifier: added stored_eval_hash field and KEY_EVAL_HASH
- EvaluationIdentifier: uses stored_eval_hash when available
- ScenarioResultEntry/ScoreEntry/AttackResultEntry: compute eval_hash before truncation
- atomic_attack.py: same fix for enriched identifier persistence
- Tests: round-trip, double round-trip, and regression tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rlundeen2 rlundeen2 force-pushed the users/rlundeen/2026_03_19_eval_hash_bug branch from 94dbf6a to 2e09b7b on March 20, 2026 17:12
@rlundeen2 rlundeen2 merged commit 7822646 into microsoft:main Mar 20, 2026
30 checks passed
riyosha pushed a commit to riyosha/PyRIT that referenced this pull request Mar 24, 2026
…icrosoft#1523)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jbolor21 pushed a commit to jbolor21/jbolor-PyRIT that referenced this pull request Mar 25, 2026
…icrosoft#1523)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>