The weekly eval-full run detected a regression on 2026-04-26.
Per-suite exit codes:
- skill: 0
- workflow: 1
- integration: 1
Regressions
skill
| Eval | Pass Rate | Duration | Tokens (input+output) | Verdict |
| --- | --- | --- | --- | --- |
| grader-function-tests | 1.00 (+0.00) | 408ms (-1086ms) | 0 (+0) | improved |
| structural-handoff-template | 1.00 (+0.00) | 30ms (-13ms) | 0 (+0) | improved |
| structural-skill-validation | 1.00 (+0.00) | 4666ms (+205ms) | 0 (+0) | ok |
| structural-state-enforcement | 1.00 (+0.00) | 80ms (-11ms) | 0 (+0) | improved |
| sw-audit-planted-debt | 0.00 | 2116ms | 0 | (new — no baseline) |
| sw-build-after-build-integration | 0.43 | 992ms | 0 | (new — no baseline) |
| sw-build-malformed-spec | 0.50 | 2592ms | 0 | (new — no baseline) |
| sw-build-rate-limiter | 0.00 | 14771ms | 0 | (new — no baseline) |
| sw-build-simple-function | 0.50 | 3623ms | 20692 | (new — no baseline) |
| sw-debug-known-bug | 0.00 | 2862ms | 0 | (new — no baseline) |
| sw-design-vague-request | 1.00 | 4268ms | 0 | (new — no baseline) |
| sw-init-fresh-ts | 0.00 | 0ms | 0 | (new — no baseline) |
| sw-verify-semantic-gate | 0.00 | 3341ms | 0 | (new — no baseline) |
| workflow-yaml-validation | 1.00 (+0.00) | 1326ms (+26ms) | 0 (+0) | ok |
Posted by scripts/eval-weekly-dispatch.sh (Specwright eval-full workflow).