The weekly eval-full run detected a regression on 2026-04-12.
Per-suite exit codes:
- skill: 0
- workflow: 1
- integration: 1
Regressions
skill
| Eval | Pass Rate | Duration | Tokens (input+output) | Verdict |
| --- | --- | --- | --- | --- |
| grader-function-tests | 1.00 (+0.00) | 540ms (-954ms) | 0 (+0) | ok |
| structural-handoff-template | 1.00 (+0.00) | 31ms (-12ms) | 0 (+0) | ok |
| structural-skill-validation | 1.00 (+0.00) | 1682ms (-246ms) | 0 (+0) | ok |
| structural-state-enforcement | 1.00 (+0.00) | 77ms (-14ms) | 0 (+0) | ok |
| sw-audit-planted-debt | 0.00 | 2341ms | 2610 | (new — no baseline) |
| sw-build-after-build-integration | 0.43 | 1028ms | 2808 | (new — no baseline) |
| sw-build-malformed-spec | 1.00 | 4069ms | 5510 | (new — no baseline) |
| sw-build-rate-limiter | 0.00 | 14632ms | 0 | (new — no baseline) |
| sw-build-simple-function | 0.50 | 3524ms | 15686 | (new — no baseline) |
| sw-debug-known-bug | 0.50 | 4427ms | 921 | (new — no baseline) |
| sw-design-vague-request | 0.50 | 15145ms | 9395 | (new — no baseline) |
| sw-init-fresh-ts | 0.00 | 0ms | 13408 | (new — no baseline) |
| sw-verify-semantic-gate | 0.00 | 3400ms | 8853 | (new — no baseline) |
| workflow-yaml-validation | 1.00 (+0.00) | 842ms (+225ms) | 0 (+0) | ok |
Posted by scripts/eval-weekly-dispatch.sh (Specwright eval-full workflow).