You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
lib/tasks/headline_voice.rake's verify task reports a Criterion 4 "concrete lead" failure when a headline doesn't match any of its regex patterns in the first 40 characters. In the 2026-04-11 validation run against 56 homepage-eligible briefings, the rake task reported 26/56 concrete leads (below the 38/56 threshold) — but manual inspection of the 10 flagged headlines revealed that 7 of them are actually specific and well-written, and the regex is simply missing them.
Example false negatives
From the actual verification run:
"Two Rivers weighs ending two TIF districts early and putting up to $27,536 into motel/hotel upgrades" — the $27,536 appears past the 40-char window
"Harbor plan lists dredging, seawall rebuild, and a breakwater extension for 2027–2029 WisDOT funding" — "2027" appears past the 40-char window
"Construction signs: Council weighs a 30-day removal rule after work ends or before occupancy" — "30-day removal rule" is specific but the patterns only match years, ordinals, dollars, percentages
"Kay Koach honored; Tracey Koach appointed to the Plan Commission seat she held" — named people and named commission, but no pattern catches proper nouns
"Two PUD projects: St. Mark's Square amendment at 1110 Victory St, and Elite Builds housing plan at 3000 Forest Ave" — addresses and street names present, but "Victory" and "1110" aren't in the regex
"Council lined up votes to end TIF district 13 and 16 early; TIF district 16 value returns to tax rolls in 2027" — "Council lined up votes" doesn't match \bCouncil\s+(?:votes|picks) because "lined" is between them
Actual concrete-lead count on 2026-04-11 was closer to 53/56 (~95%), well above the 2/3 threshold. The verifier's threshold failure was misleading.
40-char window is too tight. A headline like "Two Rivers weighs ending two TIF districts early and putting up to \$27,536..." has its concrete detail at char 64+.
Missing pattern families: generic numeric specifics (30-day, 2,200 feet, Bump importmap-rails from 2.2.2 to 2.2.3 #3), proper nouns (Hamilton, St. Mark's, Kay Koach), named actions beyond "votes/picks" (weighs, picks, honored, lined up, reviews), full street-name library (only 6 streets currently, Two Rivers has dozens).
"Council lined up votes" regression: the pattern \bCouncil\s+(?:votes|picks) requires immediate adjacency, which misses any intervening verb or modifier.
Suggested fixes
Two paths, in order of preference:
Option A: Widen the regex set (cheap, low-risk)
Expand window from 40 chars to 60 or 80
Add number + unit patterns: \b\d+(?:-day|-year|-month|-week)\b, \b\d[\d,]*\s*(?:feet|acres|households?|lots?|blocks?|units?)\b
Replace the fixed street-name list with a broader pattern like \b(?:Lincoln|Washington|Main|Memorial|Forest|Twin|Hamilton|Jefferson|Victory|Jackson|Roosevelt|Pierce|Mishicot|Zlatnik|Emmet|Neshotah|Lighthouse|Cool City)\b OR auto-derive the list from city GIS data
Allow "Council" action to accept any verb within 2 words: \bCouncil\s+\w+(?:\s+\w+)?\s+(?:votes?|picks?|approves?|denies?|weighs?|reviews?)
Option B: Invert the logic — check for meta-commentary instead of specificity
Rather than asking "does this lead with a concrete detail?", ask "does this lead with a vague abstract noun or meta-commentary phrase?". That's a shorter banlist and easier to maintain:
Leading with abstract nouns: "Updates", "Discussion", "Review", "Activity", "Status"
Bureaucratic openings: "The city", "Committee", "Board"
This is more aligned with how the prompt itself is structured (banned closers + banned phrases rather than required phrases).
Priority
Low. The verifier is a one-off dev tool used after prompt changes; it doesn't run in production. Real homepage quality is measured by reading the cards, not by the rake task output. This issue exists so the verifier gives more trustworthy automated signals on the next prompt iteration rather than requiring manual review every time.
Context
Found during: the 2026-04-11 prod backfill validation after the homepage headline voice prompt rewrite (spec: `docs/superpowers/specs/2026-04-10-homepage-headline-voice-design.md`). The verifier reported FAIL on Criterion 4 but manual inspection showed the actual pass rate was ~95%.
Related files:
`lib/tasks/headline_voice.rake` (the verify task)
`docs/superpowers/specs/2026-04-10-homepage-headline-voice-design.md` (the spec that defines the success criteria)
Observation
lib/tasks/headline_voice.rake'sverifytask reports a Criterion 4 "concrete lead" failure when a headline doesn't match any of its regex patterns in the first 40 characters. In the 2026-04-11 validation run against 56 homepage-eligible briefings, the rake task reported 26/56 concrete leads (below the 38/56 threshold) — but manual inspection of the 10 flagged headlines revealed that 7 of them are actually specific and well-written, and the regex is simply missing them.Example false negatives
From the actual verification run:
\bCouncil\s+(?:votes|picks)because "lined" is between themActual concrete-lead count on 2026-04-11 was closer to 53/56 (~95%), well above the 2/3 threshold. The verifier's threshold failure was misleading.
Root cause
The regex set is too narrow:
Issues:
"Two Rivers weighs ending two TIF districts early and putting up to \$27,536..."has its concrete detail at char 64+.\bCouncil\s+(?:votes|picks)requires immediate adjacency, which misses any intervening verb or modifier.Suggested fixes
Two paths, in order of preference:
Option A: Widen the regex set (cheap, low-risk)
\b\d+(?:-day|-year|-month|-week)\b,\b\d[\d,]*\s*(?:feet|acres|households?|lots?|blocks?|units?)\b\b[A-Z][a-z]+\s+[A-Z][a-z]+for two-word proper nouns (catches "Sandy Bay", "Kay Koach", "St. Mark's", etc.)\b(?:Lincoln|Washington|Main|Memorial|Forest|Twin|Hamilton|Jefferson|Victory|Jackson|Roosevelt|Pierce|Mishicot|Zlatnik|Emmet|Neshotah|Lighthouse|Cool City)\bOR auto-derive the list from city GIS data\bCouncil\s+\w+(?:\s+\w+)?\s+(?:votes?|picks?|approves?|denies?|weighs?|reviews?)Option B: Invert the logic — check for meta-commentary instead of specificity
Rather than asking "does this lead with a concrete detail?", ask "does this lead with a vague abstract noun or meta-commentary phrase?". That's a shorter banlist and easier to maintain:
This is more aligned with how the prompt itself is structured (banned closers + banned phrases rather than required phrases).
Priority
Low. The verifier is a one-off dev tool used after prompt changes; it doesn't run in production. Real homepage quality is measured by reading the cards, not by the rake task output. This issue exists so the verifier gives more trustworthy automated signals on the next prompt iteration rather than requiring manual review every time.
Context
Found during: the 2026-04-11 prod backfill validation after the homepage headline voice prompt rewrite (spec: `docs/superpowers/specs/2026-04-10-homepage-headline-voice-design.md`). The verifier reported FAIL on Criterion 4 but manual inspection showed the actual pass rate was ~95%.
Related files: