The weekly eval-full run detected a regression on 2026-04-19.
Per-suite exit codes:
- skill: 0
- workflow: 1
- integration: 1
Regressions
skill
| Eval | Pass Rate | Duration | Tokens (input+output) | Verdict |
| --- | --- | --- | --- | --- |
| grader-function-tests | 1.00 (+0.00) | 494ms (-1000ms) | 0 (+0) | improved |
| structural-handoff-template | 1.00 (+0.00) | 32ms (-11ms) | 0 (+0) | improved |
| structural-skill-validation | 1.00 (+0.00) | 1124ms (-1434ms) | 0 (+0) | improved |
| structural-state-enforcement | 1.00 (+0.00) | 81ms (-10ms) | 0 (+0) | improved |
| sw-audit-planted-debt | 0.00 | 1494ms | 0 | (new — no baseline) |
| sw-build-after-build-integration | 0.43 | 935ms | 0 | (new — no baseline) |
| sw-build-malformed-spec | 0.50 | 1583ms | 0 | (new — no baseline) |
| sw-build-rate-limiter | 0.00 | 15126ms | 0 | (new — no baseline) |
| sw-build-simple-function | 0.40 | 3627ms | 14748 | (new — no baseline) |
| sw-debug-known-bug | 0.00 | 2593ms | 0 | (new — no baseline) |
| sw-design-vague-request | 1.00 | 3557ms | 0 | (new — no baseline) |
| sw-init-fresh-ts | 0.00 | 0ms | 0 | (new — no baseline) |
| sw-verify-semantic-gate | 0.00 | 2520ms | 0 | (new — no baseline) |
| workflow-yaml-validation | 1.00 (+0.00) | 1167ms (-133ms) | 0 (+0) | improved |
Posted by scripts/eval-weekly-dispatch.sh (Specwright eval-full workflow).