Eval regression detected — 2026-04-19 #183

@github-actions

The weekly eval-full run detected a regression on 2026-04-19.

Per-suite exit codes:

  • skill: 0
  • workflow: 1
  • integration: 1
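
The exit codes above can be read as follows, assuming the usual convention that a non-zero exit code marks a failing suite (the report itself does not define the codes). This is a minimal sketch; `failing_suites` is a hypothetical helper and is not taken from `scripts/eval-weekly-dispatch.sh`.

```python
# Hypothetical helper, not part of the Specwright scripts.
# Assumption: exit code 0 means the suite passed; anything else means it failed.
def failing_suites(exit_codes: dict[str, int]) -> list[str]:
    """Return the suites whose exit code is non-zero, in input order."""
    return [suite for suite, code in exit_codes.items() if code != 0]


# Values copied from the per-suite exit codes in this report.
exit_codes = {"skill": 0, "workflow": 1, "integration": 1}
failed = failing_suites(exit_codes)
regression_detected = bool(failed)
```

Under that assumption, the run is flagged because of `workflow` and `integration`, while `skill` exited cleanly.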

Regressions

skill

| Eval | Pass Rate | Duration | Tokens (input+output) | Verdict |
|---|---|---|---|---|
| grader-function-tests | 1.00 (+0.00) | 494ms (-1000ms) | 0 (+0) | improved |
| structural-handoff-template | 1.00 (+0.00) | 32ms (-11ms) | 0 (+0) | improved |
| structural-skill-validation | 1.00 (+0.00) | 1124ms (-1434ms) | 0 (+0) | improved |
| structural-state-enforcement | 1.00 (+0.00) | 81ms (-10ms) | 0 (+0) | improved |
| sw-audit-planted-debt | 0.00 | 1494ms | 0 | (new — no baseline) |
| sw-build-after-build-integration | 0.43 | 935ms | 0 | (new — no baseline) |
| sw-build-malformed-spec | 0.50 | 1583ms | 0 | (new — no baseline) |
| sw-build-rate-limiter | 0.00 | 15126ms | 0 | (new — no baseline) |
| sw-build-simple-function | 0.40 | 3627ms | 14748 | (new — no baseline) |
| sw-debug-known-bug | 0.00 | 2593ms | 0 | (new — no baseline) |
| sw-design-vague-request | 1.00 | 3557ms | 0 | (new — no baseline) |
| sw-init-fresh-ts | 0.00 | 0ms | 0 | (new — no baseline) |
| sw-verify-semantic-gate | 0.00 | 2520ms | 0 | (new — no baseline) |
| workflow-yaml-validation | 1.00 (+0.00) | 1167ms (-133ms) | 0 (+0) | improved |

Posted by scripts/eval-weekly-dispatch.sh (Specwright eval-full workflow).
