content(sentence-library): Day 2 byDay variants for all 5 phonemes #129

Merged
youtalk merged 90 commits into main from content/dictation-day2-bydays
Apr 27, 2026

@youtalk commented Apr 27, 2026

Summary

Adds the Day 2 authored variants in byDay for every sentence across all 5 phoneme directories under Packages/MoraEngines/Sources/MoraEngines/Resources/SentenceLibrary/{f,r,sh,short_a,th}/. Together with the existing Day 1 (#116, #117, #127) and Day 5 (full sentence at the top level), this completes a Day 1 → Day 2 → Day 5 length ramp for the dictation (ShortSentences) phase.

The companion source-code wiring (SentenceDayPicker, SentenceLibrary.sentences(dayInWeek:…), SessionContainerView.bootstrap clamp on sessionCompletionCount + 1) was already merged in #116 — this PR is content-only.
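
As a rough illustration of the clamp mentioned above, the sketch below maps a completed-session count to a day index. It is written in Python rather than the app's Swift, and the function name `day_in_week` and the 1 to 5 bounds are assumptions made for illustration; the PR only states that the clamp is applied to `sessionCompletionCount + 1`.

```python
def day_in_week(session_completion_count: int,
                first_day: int = 1,
                last_day: int = 5) -> int:
    """Clamp session_completion_count + 1 into [first_day, last_day].

    The bounds are assumed defaults, not values confirmed by the PR.
    """
    return max(first_day, min(last_day, session_completion_count + 1))
```

Under these assumptions, a count of 0 maps to day 1 and a count of 1 maps to day 2, while larger counts saturate at the last day.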

Day-2 length ramp (averaged over 360 sentences per phoneme)

| phoneme | extended / 360 | avg D1 | avg D2 | avg D5 (= full) |
|---|---|---|---|---|
| f | 125 | 4.01 | 4.36 | 9.43 |
| th | 219 | 4.15 | 4.76 | 9.49 |
| sh | 215 | 4.06 | 4.66 | 9.36 |
| r | 95 | 4.03 | 4.30 | 9.28 |
| short_a | 350 | 4.27 | 6.29 | 9.18 |

All 5 phonemes satisfy avg D1 < avg D2 < avg D5.
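
The monotonicity claim and the per-phoneme extension rates can be re-derived from the table with a few lines of Python. This is only a sanity-check sketch over the published numbers; the names `RAMP`, `ramp_is_monotonic`, and `extension_rates` are made up for this example and are not part of the repository.

```python
# Values copied from the table: (extended count, avg D1, avg D2, avg D5).
RAMP = {
    "f": (125, 4.01, 4.36, 9.43),
    "th": (219, 4.15, 4.76, 9.49),
    "sh": (215, 4.06, 4.66, 9.36),
    "r": (95, 4.03, 4.30, 9.28),
    "short_a": (350, 4.27, 6.29, 9.18),
}
TOTAL_SENTENCES = 360  # sentences per phoneme, as stated in the PR


def ramp_is_monotonic(ramp):
    """True if avg D1 < avg D2 < avg D5 holds for every phoneme."""
    return all(d1 < d2 < d5 for _, d1, d2, d5 in ramp.values())


def extension_rates(ramp, total=TOTAL_SENTENCES):
    """Percentage of sentences whose Day-2 entry differs from Day-1."""
    return {p: round(100 * row[0] / total, 1) for p, row in ramp.items()}


print(ramp_is_monotonic(RAMP))
print(extension_rates(RAMP))
```

The rates for f, th, sh, and r come out between roughly 26 % and 61 %, which matches the range quoted for the strict strategy.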

Authoring strategies

Two distinct strategies, by phoneme:

  1. Strict superset of Day 1, +1 word, article+adj+noun slot only: used by f, th, sh, r. Day-2 is Day-1 with exactly one adjective inserted between an article and the noun (e.g. Shen hid a rex. → Shen hid a shy rex.). Where no such slot exists, the entry falls back to byDay["2"] = byDay["1"] (Day-2 reads identically to Day-1). Every extension produced this way is grammatical, and the extension rate ranges from 26 % (r) to 61 % (th).
  2. Relaxed superset — used by short_a. Day-2 may pick any 5–7-word grammatical subset of the original (not constrained to be a superset of Day 1). When no valid subset exists, byDay["2"] is omitted entirely so the runtime falls through to the full Day-5 sentence rather than collapsing to Day-1. This was necessary because short_a Day-1 sentences extract the simplest clause from compound originals, leaving only conjunctions/proper nouns that cannot grammatically be added one at a time. 350/360 sentences received a Day-2 entry.
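
The strict rule in item 1 can be checked mechanically. The sketch below is an assumption about how such a check might look, not the tooling actually used for this PR: it verifies that a Day-2 sentence is either identical to Day-1 or is Day-1 plus exactly one word placed directly after an article and before another word. It does not verify that the inserted word is an adjective, and the `ARTICLES` set is an assumed list.

```python
import string

ARTICLES = {"a", "an", "the"}  # assumed article list


def tokens(sentence):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in sentence.split())
    return [w for w in words if w]


def inserted_word_index(day1, day2):
    """Index in day2 of the single inserted word, or None if day2 is not
    day1 plus exactly one word."""
    a, b = tokens(day1), tokens(day2)
    if len(b) != len(a) + 1:
        return None
    for i in range(len(b)):
        if b[:i] + b[i + 1:] == a:
            return i
    return None


def is_strict_day2(day1, day2):
    """True for the identical fallback, or for one word inserted between an
    article and a following word."""
    if tokens(day1) == tokens(day2):
        return True
    b = tokens(day2)
    i = inserted_word_index(day1, day2)
    if i is None or i == 0 or i == len(b) - 1:
        return False
    return b[i - 1] in ARTICLES


print(is_strict_day2("Shen hid a rex.", "Shen hid a shy rex."))
```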

The relaxed strategy was added after the strict-superset rule produced 332/360 collapse-to-Day-1 fallbacks for short_a in an earlier attempt — that effectively zeroed the ramp for the most-taught phoneme. The omit-on-failure rule preserves the ramp by allowing fallback to the full sentence instead.
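
The difference between the two failure modes (copying Day-1 into `byDay["2"]` versus omitting the key) can be sketched as a lookup with a default. The entry shape used here, a `"text"` key holding the full sentence next to a `"byDay"` map, is a hypothetical stand-in for the real JSON schema, and the sample sentences other than the Shen pair are invented for the example.

```python
def sentence_for_day(entry, day):
    """Return the authored variant for `day`, else the full sentence.

    `entry` uses an assumed shape: {"text": <full>, "byDay": {"1": ..., "2": ...}}.
    """
    return entry.get("byDay", {}).get(str(day), entry["text"])


# Strict strategy, no slot available: Day 2 collapses to Day 1.
collapsed = {
    "text": "Shen hid a rex in the shed.",  # invented full sentence
    "byDay": {"1": "Shen hid a rex.", "2": "Shen hid a rex."},
}

# Relaxed strategy, no valid subset: the key is omitted, so Day 2
# falls through to the full sentence.
omitted = {
    "text": "Shen hid a rex in the shed.",
    "byDay": {"1": "Shen hid a rex."},
}

print(sentence_for_day(collapsed, 2))
print(sentence_for_day(omitted, 2))
```

With the key omitted, the reader gets the harder full sentence on Day 2 instead of a repeat of Day 1, which is the behaviour the omit-on-failure rule is meant to preserve.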

Quality notes

  • All extensions pass an automated audit: word order preserved, subset of original, no trailing prepositions / conjunctions / articles per blacklist, JSON parses cleanly.
  • Sample audit (12 random extensions per phoneme): f, th, sh, r → 100 % natural-reading. short_a → ~60–70 % natural, ~25 % borderline (e.g. trailing dangling NP …and an Anky.), ~5 % awkward (trailing adjective …and a fast. — adjective wasn't in the strand blacklist). Acceptable for v1; further polishing is a follow-up content task.
  • th, r, short_a JSON files were reformatted by their authoring sub-agents from the repo's compact one-line-per-word layout to a verbose multi-line layout. The data is identical (audited by structural equality); only whitespace differs. A pure-formatting follow-up commit could restore the compact layout if that's preferred.
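
For reference, the checks listed above could be approximated as follows. This is a minimal sketch, not the audit that was actually run: the blacklist contents are assumed (the PR does not list them), the sample sentences are invented, and `same_data` simply compares parsed JSON so that whitespace-only reformatting is ignored.

```python
import json
import string

# Assumed blacklist of articles, conjunctions, and prepositions.
TRAILING_BLACKLIST = {
    "a", "an", "the", "and", "or", "but",
    "of", "in", "on", "at", "to", "with",
}


def tokens(sentence):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in sentence.split())
    return [w for w in words if w]


def is_ordered_subset(candidate, original):
    """True if candidate's words occur in original in the same order."""
    remaining = iter(tokens(original))
    return all(word in remaining for word in tokens(candidate))


def audit(candidate, original):
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    words = tokens(candidate)
    if not is_ordered_subset(candidate, original):
        problems.append("not an ordered subset of the original")
    if not words or words[-1] in TRAILING_BLACKLIST:
        problems.append("trailing blacklisted word")
    return problems


def same_data(json_a, json_b):
    """Structural equality of two JSON documents, ignoring whitespace."""
    return json.loads(json_a) == json.loads(json_b)


original = "Sam had a cat and a fast van."
print(audit("Sam had a cat.", original))
print(audit("Sam had a cat and a.", original))
print(audit("Sam had a cat and a fast.", original))
```

The last call returns no problems even though the sentence ends on a dangling adjective, which mirrors the gap described in the sample audit: a word-list check only catches endings that are on the list.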

Verification

  • xcodegen generate
  • xcodebuild build -project Mora.xcodeproj -scheme Mora -destination 'generic/platform=iOS Simulator' -configuration Debug CODE_SIGNING_ALLOWED=NO ✅ BUILD SUCCEEDED
  • swift test per package: MoraEngines 309 ✅, MoraCore 131 ✅, MoraUI 63 ✅, MoraTesting 22 ✅ (525 tests, 0 failures, 12 skipped)
  • swift-format lint --strict --recursive Mora Packages/*/Sources Packages/*/Tests ✅ clean

Test plan

  • Run a fresh A-day session (sessionCompletionCount == 0) on each phoneme and confirm Day-2 ShortSentences read shorter than the full sentence and grammatically reasonable.
  • Run a Day-1 session and a Day-5 session back-to-back, confirm the perceived difficulty ramp is monotonic.
  • Spot-check one short_a cell on Day 2 and confirm a fallback sentence renders as the full original (not as the Day-1 trim).
  • Scan the short_a worktree for any session whose Day-2 reading ends with an awkward dangling adjective; if frequent in practice, file a quality-cleanup follow-up.

🤖 Generated with Claude Code

youtalk added 30 commits April 27, 2026 06:59
youtalk and others added 26 commits April 27, 2026 07:13
Co-Authored-By: Claude <noreply@anthropic.com>
19/20 sentences get byDay["2"] (S13 omitted — all valid prefix lengths
end with conjunction or article). D2 lengths are in [d1+1, n-1].

Co-Authored-By: Claude <noreply@anthropic.com>
All 20 sentences get byDay["2"]. D2 lengths are in [d1+1, n-1].
Non-prefix D1 sentences use parent-sentence prefix for D2.

Co-Authored-By: Claude <noreply@anthropic.com>
19/20 sentences get byDay["2"] (S18 omitted — all valid prefix lengths
end with preposition or article). D2 lengths are in [d1+1, n-1].

Co-Authored-By: Claude <noreply@anthropic.com>
All 20 sentences get byDay["2"]. D2 lengths are in [d1+1, n-1].
Non-prefix D1 sentences use parent-sentence prefix for D2.

Co-Authored-By: Claude <noreply@anthropic.com>
19/20 sentences get a D2 entry; S18 omitted (all valid prefix
lengths either end with a conjunction/article or lack a verb).

Co-Authored-By: Claude <noreply@anthropic.com>
20/20 sentences get a D2 entry (0 omitted).

Co-Authored-By: Claude <noreply@anthropic.com>
20/20 sentences get a D2 entry (0 omitted).

Co-Authored-By: Claude <noreply@anthropic.com>
20/20 sentences get a D2 entry (0 omitted).

Co-Authored-By: Claude <noreply@anthropic.com>
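
The prefix rule described in the commit notes above (D2 length in [d1+1, n-1], no prefix ending in a conjunction, preposition, or article, omit when nothing qualifies) might be sketched like this. Choosing the shortest qualifying prefix is an assumption, the word list is assumed, the sample sentences are invented, and the "lacks a verb" check mentioned in one commit is not modelled.

```python
# Assumed list of words a Day-2 prefix must not end on.
STOP_ENDINGS = {
    "a", "an", "the", "and", "or", "but",
    "of", "in", "on", "at", "to", "with",
}


def pick_day2_prefix(full_words, d1_len):
    """Return the shortest prefix of length k in [d1_len + 1, n - 1] whose
    last word is not in STOP_ENDINGS, or None to signal omission."""
    n = len(full_words)
    for k in range(d1_len + 1, n):  # k runs from d1_len + 1 to n - 1
        if full_words[k - 1].lower() not in STOP_ENDINGS:
            return full_words[:k]
    return None


full = "Sam had a cat and a fast van".split()
print(pick_day2_prefix(full, 4))
print(pick_day2_prefix("Dan sat and the cat".split(), 2))
```

A `None` result corresponds to the omitted cases in the commit notes (for example S13 or S18), where the runtime falls through to the full sentence.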
Copilot AI review requested due to automatic review settings April 27, 2026 17:41
Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@youtalk merged commit c247420 into main Apr 27, 2026
5 checks passed
@youtalk deleted the content/dictation-day2-bydays branch April 27, 2026 18:00