diff --git a/.claude/commands/debug-prod.md b/.claude/commands/debug-prod.md new file mode 100644 index 0000000..ec7aaa8 --- /dev/null +++ b/.claude/commands/debug-prod.md @@ -0,0 +1,66 @@ +# Debug Production + +Investigate production issues on the live server. + +## Access + +- **SSH**: `ssh root@167.235.133.87` +- **App directory**: `/opt/pseuno` + +## Common commands + +All commands below assume you are in `/opt/pseuno` on the server. + +### View logs +```bash +docker compose -f docker-compose.prod.yml logs --tail=200 backend +``` + +### View all service logs +```bash +docker compose -f docker-compose.prod.yml logs --tail=100 +``` + +### Filter errors +```bash +docker compose -f docker-compose.prod.yml logs backend 2>&1 | grep -i error | tail -30 +``` + +### Check health +```bash +curl -s localhost:8000/health +``` + +### Check running containers +```bash +docker compose -f docker-compose.prod.yml ps +``` + +### Database access +```bash +docker compose -f docker-compose.prod.yml exec postgres psql -U pseuno -d pseuno +``` + +### Redis +```bash +docker compose -f docker-compose.prod.yml exec redis redis-cli +``` + +### Restart backend +```bash +docker compose -f docker-compose.prod.yml restart backend +``` + +### Deploy latest +```bash +cd /opt/pseuno && git pull && docker compose -f docker-compose.prod.yml up -d --build backend +``` + +## Investigation workflow + +1. SSH in and check health first +2. View recent logs, filter for errors +3. Check if containers are running +4. If needed, check DB/Redis state +5. Restart backend if it's stuck +6. If a code fix is needed, deploy from main after merging the fix diff --git a/.claude/commands/test-frontend.md b/.claude/commands/test-frontend.md new file mode 100644 index 0000000..e909496 --- /dev/null +++ b/.claude/commands/test-frontend.md @@ -0,0 +1,36 @@ +# Test Frontend + +Validate frontend changes compile, lint, and build correctly. + +## Steps + +### 1. 
Type check + +```bash +cd frontend && npx tsc --noEmit +``` +Must have zero errors. + +### 2. Lint + +```bash +cd frontend && npm run lint +``` +Must have zero warnings (strict policy). + +### 3. Build + +```bash +cd frontend && npm run build +``` +Must succeed. + +### 4. E2E tests (if dev stack is running) + +```bash +cd frontend && npx playwright test +``` + +### 5. Visual verification (if making UI changes) + +Open `localhost:5173` in a browser (via Playwright MCP) and visually verify the change looks correct. diff --git a/.claude/commands/test-perf.md b/.claude/commands/test-perf.md new file mode 100644 index 0000000..a5a5161 --- /dev/null +++ b/.claude/commands/test-perf.md @@ -0,0 +1,70 @@ +# Test Performance + +Benchmark endpoint latency. Run this when making changes that could affect generation speed. + +## Prerequisites + +1. Verify dev stack is up: `curl -s localhost:8000/health` +2. If not running, run `make dev-up` and wait for health check to pass. + +## Steps + +### 1. Benchmark `/generate/input-concept` + +Call 5 times and report min/max/avg latency. Target: <2s each. + +```bash +for i in 1 2 3 4 5; do + time curl -s -X POST localhost:8000/generate/input-concept \ + -H "Content-Type: application/json" \ + -d '{"raw_input": "upbeat pop song about summer"}' > /dev/null +done +``` + +### 2. Benchmark `/generate/advanced` + +Call 3 times with different inputs. Target: <15s each. 
+ +```bash +curl -s -w "\n%{time_total}s\n" -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "indie rock with jangly guitars", "lyrics_about": "leaving home for the first time"}' + +curl -s -w "\n%{time_total}s\n" -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "lo-fi hip hop beats", "lyrics_about": "late night studying"}' + +curl -s -w "\n%{time_total}s\n" -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "orchestral film score", "lyrics_about": ""}' +``` + +### 3. Benchmark `/generate/refine` + +Call 3 times with `refine_target=lyrics`. Target: <15s each. + +Use the full response from step 2 to build the refine request (the endpoint requires the current snapshot, not just a generation_id). Substitute each `..._FROM_STEP_2` placeholder with the corresponding field from that response: +```bash +curl -s -w "\n%{time_total}s\n" -X POST localhost:8000/generate/refine \ + -H "Content-Type: application/json" \ + -d '{ + "suno_prompt": "SUNO_PROMPT_FROM_STEP_2", + "lyrics": "LYRICS_FROM_STEP_2", + "exclude": "EXCLUDE_FROM_STEP_2", + "title": "TITLE_FROM_STEP_2", + "weirdness": WEIRDNESS_FROM_STEP_2, + "change_request": "make it more emotional", + "refine_target": "lyrics" + }' +``` + +### 4. Compare (if testing a change) + +If benchmarking before/after a code change: +1. Run steps 1-3 on the base branch, save results +2. Switch to feature branch, run again +3. Report the delta for each endpoint + +### 5. Save results + +Save the report to `benchmarks/perf-YYYY-MM-DD.md` (use today's date). Include the git branch/commit, per-call latencies, and min/max/avg for each endpoint. See `benchmarks/README.md` for the format. diff --git a/.claude/commands/test-quality.md b/.claude/commands/test-quality.md new file mode 100644 index 0000000..05f6709 --- /dev/null +++ b/.claude/commands/test-quality.md @@ -0,0 +1,97 @@ +# Test Quality + +Assess generation quality by hitting real endpoints. Run this after prompt or generation changes. + +## Prerequisites + +1. 
Verify dev stack is up: `curl -s localhost:8000/health` +2. If not running, run `make dev-up` and wait for health check to pass. + +## Steps + +### 1. Generate 5 songs across varied genres + +Call `POST localhost:8000/generate/advanced` with these inputs: + +```bash +# Country +curl -s -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "classic country with steel guitar and fiddle", "lyrics_about": "driving down a dirt road at sunset"}' + +# Punk +curl -s -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "fast aggressive punk rock", "lyrics_about": "being fed up with corporate greed"}' + +# Hip-hop +curl -s -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "boom bap hip hop with jazz samples", "lyrics_about": "growing up in the city"}' + +# Folk +curl -s -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "acoustic folk with fingerpicking", "lyrics_about": "a small town slowly disappearing"}' + +# Electronic +curl -s -X POST localhost:8000/generate/advanced \ + -H "Content-Type: application/json" \ + -d '{"user_prompt": "dark synthwave with arpeggiated bass", "lyrics_about": "neon lights in an empty city"}' +``` + +Capture full JSON responses from each. + +### 2. Assess vocabulary + +Read the lyrics from each response. Check: +- **Banned/overused words**: silver, velvet, neon, shattered, whisper, shadows, echoes, crimson, golden, embers +- Flag if any of these appear in 3+ of the 5 songs +- Check that vocabulary feels genre-appropriate (country should sound different from punk) + +### 3. Assess chorus quality + +Parse `[Chorus]` sections from each song. Flag if any chorus has the same line repeated 3+ times consecutively. + +### 4. 
Assess style names + +Check that each response's `concept_title` / style name is: +- Short (under 30 characters) +- Descriptive, not a full style prompt sentence + +### 5. Assess structure + +For each song, verify: +- Section tags are present (`[Verse]`, `[Chorus]`, etc.) +- No stage directions in lyrics (e.g., "(softly)", "(guitar solo)") +- No periods at end of lines + +### 6. Test refine + +Take one generated song's full response and call refine. The refine endpoint requires the full current snapshot; substitute each `..._FROM_CHOSEN_SONG` placeholder with the corresponding field from that song's response: +```bash +curl -s -X POST localhost:8000/generate/refine \ + -H "Content-Type: application/json" \ + -d '{ + "suno_prompt": "SUNO_PROMPT_FROM_CHOSEN_SONG", + "lyrics": "LYRICS_FROM_CHOSEN_SONG", + "exclude": "EXCLUDE_FROM_CHOSEN_SONG", + "title": "TITLE_FROM_CHOSEN_SONG", + "weirdness": WEIRDNESS_FROM_CHOSEN_SONG, + "change_request": "make the chorus more upbeat and energetic", + "refine_target": "lyrics" + }' +``` + +Verify: +- `changed_fields` includes "lyrics" +- `changed_fields` does NOT include "suno_prompt" +- Completed in <30s + +### 7. Report + +Summarize findings: what passed, what failed, with specific examples of issues found. + +### 8. Save results + +Save the report to `benchmarks/quality-YYYY-MM-DD.md` (use today's date). Include the git branch/commit, per-song results for each check (vocabulary, chorus, style names, structure), and a summary. See `benchmarks/README.md` for the format. diff --git a/.claude/commands/update-rules.md b/.claude/commands/update-rules.md new file mode 100644 index 0000000..e85e9e6 --- /dev/null +++ b/.claude/commands/update-rules.md @@ -0,0 +1,17 @@ +# Update Rules + +Add new conventions or pitfalls to CLAUDE.md when you discover them during a task. + +## Steps + +1. Read current `CLAUDE.md` at the project root. +2. Identify the new convention, pattern, or pitfall discovered during the current task. +3. Add it to the appropriate section (keep entries concise, 1-2 lines). +4. Don't remove existing rules unless they're demonstrably wrong. 
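+ +For example, a new entry might look like this (illustrative formatting — the wording restates a rule already in CLAUDE.md's Testing patterns, not a new discovery): + +```markdown +- **Route tests**: copy endpoint functions inline with mocked dependencies rather than importing FastAPI routers. +``` 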
+ +## Examples of things to add + +- A new testing pattern you had to figure out +- A file that's easy to break and how to avoid it +- A config value that doesn't do what its name suggests +- An API behavior that's surprising or undocumented diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..7ab4725 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,54 @@ +name: CI + +on: + push: + branches: [main] + pull_request: + branches: [main] + +jobs: + backend-lint: + runs-on: ubuntu-latest + defaults: + run: + working-directory: backend + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: "3.12" + cache: pip + - run: pip install -r requirements.txt ruff + - run: python -m ruff check app/ tests/ + + backend-test: + runs-on: ubuntu-latest + defaults: + run: + working-directory: backend + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: "3.12" + cache: pip + - run: pip install -r requirements.txt + - run: >- + python -m pytest -v + --ignore=tests/test_artist_bank_routing.py + --ignore=tests/test_v8_channel_split.py + + frontend-build: + runs-on: ubuntu-latest + defaults: + run: + working-directory: frontend + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-node@v4 + with: + node-version: "20" + cache: npm + cache-dependency-path: frontend/package-lock.json + - run: npm ci + - run: npm run build diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..f61d2b1 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,62 @@ +# Columbus V1 — Agent Instructions + +## Architecture + +Columbus is a music generation app. The backend is a FastAPI service that calls LLMs to generate Suno-compatible style prompts and lyrics. + +### Generation flow (default: two-step v5_hybrid) + +The default prompt variant is `v5_hybrid` which runs **two parallel branches** via `asyncio.gather`: + +1. 
**Style branch** → generates `suno_prompt`, `exclude`, `weirdness`, `style_influence` +2. **Lyrics branch** → infers a `LyricProfile` first, then generates `song_title` + `lyrics` + +After both branches complete, a `style_name` LLM call summarizes the style. + +**LLM call order** (non-instrumental): style → profile → lyrics → style_name (4 calls) +**LLM call order** (instrumental): style → title → style_name (3 calls) + +### Key files + +| File | What it does | Pitfalls | +|---|---|---| +| `backend/app/services/agent_prompt_graph.py` | Core generation engine (~3500 lines) | `_injected_llm` makes all branches share one FakeLLM in tests | +| `backend/app/prompts/specs.py` | Shared output contracts, repair prompts | Changes here affect ALL variants | +| `backend/app/prompts/variants/v5_hybrid.py` | Default two-step variant config | `uses_lyric_profile=True` triggers profile inference | +| `backend/app/schemas/advanced.py` | Request/response models, DebugTrace schema | `PromptVariant` literal must match registry | +| `backend/app/services/debug_trace.py` | DebugTracer builds span-based traces | `debug_info` is `DebugTrace` format, not flat dict | +| `backend/app/config.py` | Settings (env vars, defaults) | `agent_repair_enabled` exists but is NOT used in two-step code | + +## Testing + +### Running tests + +```bash +cd backend && python -m pytest -v +``` + +**ALL tests must pass before committing.** Run the full suite, not just the file you changed. + +### Testing patterns + +- **FakeLLM**: Tests inject a `FakeLLM` with a list of string responses consumed in order. For two-step v5_hybrid, provide responses in this order: style, profile, lyrics, style_name (4 for non-instrumental, 3 for instrumental). +- **Always set `prompt_variant="v5_hybrid"`** in test requests — this matches the default behavior and ensures correct FakeLLM consumption order. 
+- **`debug_info`** is a `DebugTrace` dict with `version`, `summary` (variant, model, repairs, architecture), and `spans` list — NOT a flat dict with `repaired`/`agent_model` keys. +- **Shared helpers** are in `backend/tests/conftest.py` (`test_settings` fixture) and at the top of test files (`_valid_style_output`, `_valid_lyrics_output`, etc.). +- **Route tests** copy endpoint functions inline with mocked dependencies rather than importing FastAPI routers. +- **Pre-existing failures**: `test_artist_bank_routing.py` (event loop) and `test_v8_channel_split.py` are known broken — don't worry about those. + +### Lint + +```bash +cd backend && python -m ruff check app/ tests/ +``` + +## Skills + +Use these after making changes: +- `/test-quality` — assess generation quality by hitting real endpoints (after prompt/generation changes) +- `/test-perf` — benchmark endpoint latency +- `/test-frontend` — validate frontend builds and types +- `/debug-prod` — investigate production issues +- `/update-rules` — add new conventions to this file diff --git a/Makefile b/Makefile index f8e920d..77b186b 100644 --- a/Makefile +++ b/Makefile @@ -3,6 +3,7 @@ COMPOSE_PROD = docker compose -f docker-compose.prod.yml .PHONY: dev dev-up dev-down dev-build dev-logs dev-ps backend-shell frontend-shell db-shell redis-cli .PHONY: prod prod-up prod-down prod-build prod-logs +.PHONY: test lint check dev: $(COMPOSE_DEV) up --build @@ -48,3 +49,11 @@ prod-build: prod-logs: $(COMPOSE_PROD) logs -f --tail=100 + +test: + cd backend && python -m pytest -v + +lint: + cd backend && python -m ruff check app/ tests/ + +check: lint test diff --git a/backend/app/prompts/specs.py b/backend/app/prompts/specs.py index 8d4ef4e..f76b225 100644 --- a/backend/app/prompts/specs.py +++ b/backend/app/prompts/specs.py @@ -345,8 +345,8 @@ - Each chorus should contain the same lyrics as the other chorus. However, within a single chorus, each line must be DIFFERENT — do NOT repeat the same line consecutively. 
A 4-line chorus needs 4 distinct lines. - Prioritize punchy, impactful lines over filler. Each line should earn its place. -Vocabulary rules: -- Avoid overusing generic "poetic" words like "silver", "velvet", "neon", "shattered", "whisper", "shadows", "echoes", "crimson", "golden", "embers". These are fine occasionally but should not appear in every song. +Vocabulary rules (CRITICAL — validation will reject violations): +- NEVER use 3 or more of these generic "poetic" words in a single song: "silver", "velvet", "neon", "shattered", "whisper", "shadows", "echoes", "crimson", "golden", "embers". Using 1-2 is acceptable if genuinely fitting; 3+ will trigger a rewrite. - Derive vocabulary from the genre and era context. Each genre has its own linguistic register — think about what words and imagery belong to that genre's world. A country song and a punk song should not share the same adjectives. - Each song must have its own unique vocabulary palette drawn from the genre and topic. diff --git a/backend/app/services/agent_prompt_graph.py b/backend/app/services/agent_prompt_graph.py index 1114874..e71fbd9 100644 --- a/backend/app/services/agent_prompt_graph.py +++ b/backend/app/services/agent_prompt_graph.py @@ -1075,9 +1075,20 @@ async def _generate_parallel_two_step( suno_prompt = style_result["suno_prompt"] + # Derive concept_title with fallback if lyrics branch didn't produce one + concept_title = lyrics_result["song_title"] + if not concept_title: + concept_title = self._derive_title( + request.user_prompt, request.lyrics_about or "" + ) + logger.warning( + "Lyrics branch returned empty song_title, using fallback: %s", + concept_title, + ) + # Generate unique ID for this generation generation_id = hashlib.md5( - f"{lyrics_result['song_title']}{suno_prompt}{time.time()}".encode() + f"{concept_title}{suno_prompt}{time.time()}".encode() ).hexdigest()[:12] logger.info( @@ -1086,7 +1097,7 @@ async def _generate_parallel_two_step( ) return { - "concept_title": 
lyrics_result["song_title"], + "concept_title": concept_title, + "style_name": style_result.get("style_name", ""), + "lyrics": lyrics_result["lyrics"], + "suno_prompt": suno_prompt, @@ -2427,6 +2438,11 @@ def _strip_lyrics_preamble(self, lyrics: str) -> str: return lyrics[match.start() :].strip() return lyrics.strip() + + _OVERUSED_WORDS = frozenset({ + "silver", "velvet", "neon", "shattered", "whisper", + "shadows", "echoes", "crimson", "golden", "embers", + }) + def _validate_lyrics_output(self, output: _ParsedLyricsOutput) -> List[str]: """Validate lyrics output, return list of issues.""" issues = [] @@ -2439,8 +2455,21 @@ def _validate_lyrics_output(self, output: _ParsedLyricsOutput) -> List[str]: else: # Check for chorus lines that are mostly identical issues.extend(self._check_chorus_repetition(output.lyrics)) + # Check for overused generic "poetic" words + issues.extend(self._check_overused_words(output.lyrics)) return issues + def _check_overused_words(self, lyrics: str) -> List[str]: + """Flag lyrics that use 3+ banned generic poetic words.""" + words_in_lyrics = set(re.findall(r"[a-z]+", lyrics.lower())) + found = words_in_lyrics & self._OVERUSED_WORDS + if len(found) >= 3: + return [ + f"Too many generic poetic words ({', '.join(sorted(found))}). " + "Replace with genre-specific vocabulary." 
+ ] + return [] + @staticmethod def _check_chorus_repetition(lyrics: str) -> List[str]: """Detect choruses where >50% of lines are identical.""" diff --git a/backend/ruff.toml b/backend/ruff.toml new file mode 100644 index 0000000..bc20e29 --- /dev/null +++ b/backend/ruff.toml @@ -0,0 +1,8 @@ +[lint.per-file-ignores] +# Template file has intentional unused imports as documentation +"app/prompts/variants/_template.py" = ["F401"] +# Test files commonly import for side effects or use inside methods +"tests/*" = ["F401", "F541"] +# Pre-existing issues in app code (not introduced by this PR) +"app/routes/generate_input_concept.py" = ["F401"] +"app/schemas/advanced.py" = ["E402"] diff --git a/backend/tests/conftest.py b/backend/tests/conftest.py new file mode 100644 index 0000000..0044c57 --- /dev/null +++ b/backend/tests/conftest.py @@ -0,0 +1,13 @@ +""" +Shared test fixtures for the backend test suite. +""" + +import pytest + +from app.config import Settings + + +@pytest.fixture +def test_settings() -> Settings: + """Minimal Settings instance for tests (no real API keys needed).""" + return Settings(spotify_client_id="test", openai_api_key="test") diff --git a/backend/tests/test_agent_prompt_graph.py b/backend/tests/test_agent_prompt_graph.py index 1f64501..6dfdfb5 100644 --- a/backend/tests/test_agent_prompt_graph.py +++ b/backend/tests/test_agent_prompt_graph.py @@ -1,5 +1,10 @@ """ -Tests for AgentPromptGraph — basic use cases + repair/validation behavior. +Tests for AgentPromptGraph — two-step (v5_hybrid) generation. + +All tests use prompt_variant="v5_hybrid" which is the default two-step variant. +FakeLLM responses are consumed in this order: + Non-instrumental: style → profile → lyrics → style_name (4 calls) + Instrumental: style → title → style_name (3 calls) """ import asyncio @@ -17,7 +22,8 @@ def __init__(self, content: str): class FakeLLM: """ Minimal LLM stub for testing. - It returns a sequence of contents across successive `ainvoke` calls. 
+ Returns a sequence of contents across successive `ainvoke` calls. + After exhausting contents, returns empty string. """ def __init__(self, contents: list[str]): @@ -32,20 +38,25 @@ async def ainvoke(self, _messages, temperature=None): return _FakeResponse(self._contents.pop(0)) -def _settings() -> Settings: - # Spotify is optional; provide a stub value for clarity. - return Settings(spotify_client_id="test_spotify_client_id", openai_api_key="test") +def _settings(**overrides) -> Settings: + defaults = dict(spotify_client_id="test", openai_api_key="test") + defaults.update(overrides) + return Settings(**defaults) -def _valid_output( - lyrics: str = "[Verse]\nhello world\n", +# --------------------------------------------------------------------------- +# Output helpers for two-step v5_hybrid variant +# --------------------------------------------------------------------------- + + +def _valid_style_output( suno_prompt: str = "Funky pop, crisp drums, bright bass", exclude: str = "cheesy, country", weirdness: int = 50, style_influence: int = 60, ) -> str: + """Valid output for the style branch.""" return ( - f"LYRICS\n{lyrics}\n\n" f"SUNO PROMPT\n{suno_prompt}\n\n" f"EXCLUDE\n{exclude}\n\n" f"WEIRDNESS\n{weirdness}\n\n" @@ -53,6 +64,69 @@ def _valid_output( ) +def _valid_lyrics_output( + song_title: str = "Hello World", + lyrics: str = "[Verse]\nhello world\n", +) -> str: + """Valid output for the lyrics branch.""" + return f"SONG TITLE\n{song_title}\n\nLYRICS\n{lyrics}\n" + + +def _valid_profile_output() -> str: + """Valid per-section profile output for profile inference.""" + return ( + 'Verse: {"lines_per_section": "4_lines", "line_length": "default", "pov": "first", ' + '"rhyme_scheme": "aabb", "directness": "balanced", "persona": "earnest", ' + '"humor": "none", "explicitness": "clean", "audience": "general"}\n' + 'Pre-Chorus: {"lines_per_section": "2_lines", "line_length": "short", "pov": "first", ' + '"rhyme_scheme": "aabb", "directness": "direct", 
"persona": "earnest", ' + '"humor": "none", "explicitness": "clean", "audience": "general"}\n' + 'Chorus: {"lines_per_section": "4_lines", "line_length": "short", "pov": "first", ' + '"rhyme_scheme": "aaaa", "directness": "direct", "persona": "earnest", ' + '"humor": "none", "explicitness": "clean", "audience": "general"}\n' + 'Post-Chorus: {"lines_per_section": "2_lines", "line_length": "sparse", "pov": "none", ' + '"rhyme_scheme": "aaaa", "directness": "direct", "persona": "earnest", ' + '"humor": "none", "explicitness": "clean", "audience": "general"}\n' + 'Bridge: {"lines_per_section": "4_lines", "line_length": "default", "pov": "second", ' + '"rhyme_scheme": "abab", "directness": "metaphor_heavy", "persona": "melancholic", ' + '"humor": "none", "explicitness": "clean", "audience": "general"}\n' + 'Structure: ["Intro", "Verse", "Chorus", "Verse", "Chorus", "Bridge", "Chorus", "Outro"]' + ) + + +def _style_name_output() -> str: + """Valid output for style name generation.""" + return "Indie Pop Fusion" + + +def _happy_path_responses( + style_output=None, + profile_output=None, + lyrics_output=None, + style_name=None, +) -> list[str]: + """Standard 4-response sequence for non-instrumental happy path.""" + return [ + style_output or _valid_style_output(), + profile_output or _valid_profile_output(), + lyrics_output or _valid_lyrics_output(), + style_name or _style_name_output(), + ] + + +def _instrumental_responses( + style_output=None, + title="The Last Horizon", + style_name=None, +) -> list[str]: + """Standard 3-response sequence for instrumental mode.""" + return [ + style_output or _valid_style_output(), + title, + style_name or _style_name_output(), + ] + + # --------------------------------------------------------------------------- # Basic use cases (happy path) # --------------------------------------------------------------------------- @@ -60,20 +134,20 @@ def _valid_output( def test_valid_output_no_repairs_needed(): """When the LLM returns valid output 
on first try, no repairs are triggered.""" - output = _valid_output() - llm = FakeLLM([output]) + llm = FakeLLM(_happy_path_responses()) builder = AgentPromptGraph(_settings(), llm=llm) req = AdvancedGenerateRequest( user_prompt="Make a funky pop song", lyrics_about="dancing in the rain", selected_artists=[], tags=["pop", "funk"], + prompt_variant="v5_hybrid", ) result = asyncio.run(builder.generate(req)) - assert llm.calls == 1 # no repair calls - assert result["debug_info"]["repaired"] is False + assert llm.calls == 4 # style + profile + lyrics + style_name + assert result["debug_info"]["summary"]["repairs"] == 0 assert result["lyrics"] == "[Verse]\nhello world" assert result["suno_prompt"] == "Funky pop, crisp drums, bright bass" assert result["exclude"] == "cheesy, country" @@ -82,13 +156,13 @@ def test_valid_output_no_repairs_needed(): def test_extracts_all_response_fields(): - """All expected fields are present in the response.""" - output = _valid_output() - llm = FakeLLM([output]) + """All expected fields are present in the two-step response.""" + llm = FakeLLM(_happy_path_responses()) builder = AgentPromptGraph(_settings(), llm=llm) req = AdvancedGenerateRequest( user_prompt="Cinematic orchestral piece", lyrics_about="stars colliding", + prompt_variant="v5_hybrid", ) result = asyncio.run(builder.generate(req)) @@ -101,50 +175,57 @@ def test_extracts_all_response_fields(): assert "style_influence" in result assert "generation_id" in result assert "debug_info" in result - assert "agent_model" in result["debug_info"] - assert "context_hash" in result["debug_info"] - assert "repaired" in result["debug_info"] + # DebugTrace format + assert "summary" in result["debug_info"] + assert "spans" in result["debug_info"] + assert result["debug_info"]["summary"]["variant"] == "v5_hybrid" + assert result["debug_info"]["summary"]["architecture"] == "two_step" -def test_concept_title_derived_from_lyrics_about(): - """Concept title is derived from lyrics_about when 
provided.""" - output = _valid_output() - llm = FakeLLM([output]) +def test_concept_title_from_lyrics_branch(): + """In two-step, concept title comes from the lyrics branch song_title.""" + llm = FakeLLM( + _happy_path_responses( + lyrics_output=_valid_lyrics_output(song_title="Ants Marching On Mars"), + ) + ) builder = AgentPromptGraph(_settings(), llm=llm) req = AdvancedGenerateRequest( user_prompt="Make something epic", lyrics_about="ants marching on Mars", + prompt_variant="v5_hybrid", ) result = asyncio.run(builder.generate(req)) - # Title should be derived from first few words of lyrics_about assert result["concept_title"] == "Ants Marching On Mars" -def test_concept_title_falls_back_to_user_prompt(): - """When lyrics_about is empty, concept title is derived from user_prompt.""" - output = _valid_output() - llm = FakeLLM([output]) +def test_concept_title_instrumental_from_title_llm(): + """When lyrics_about is empty (instrumental), title comes from title LLM.""" + llm = FakeLLM(_instrumental_responses(title="Heavy Metal Thunder")) builder = AgentPromptGraph(_settings(), llm=llm) req = AdvancedGenerateRequest( user_prompt="heavy metal breakdown", lyrics_about="", + prompt_variant="v5_hybrid", ) result = asyncio.run(builder.generate(req)) - assert result["concept_title"] == "Heavy Metal Breakdown" + assert result["concept_title"] == "Heavy Metal Thunder" + assert result["lyrics"] == "" def test_generation_id_is_unique(): """Each generation produces a unique generation_id.""" - output = _valid_output() - llm = FakeLLM([output, output]) + # Provide enough responses for two full generations + llm = FakeLLM(_happy_path_responses() + _happy_path_responses()) builder = AgentPromptGraph(_settings(), llm=llm) req = AdvancedGenerateRequest( user_prompt="synth wave", lyrics_about="neon nights", + prompt_variant="v5_hybrid", ) result1 = asyncio.run(builder.generate(req)) @@ -153,188 +234,231 @@ def test_generation_id_is_unique(): assert result1["generation_id"] != 
result2["generation_id"]


-def test_suno_prompt_over_500_triggers_repair_and_then_error():
-    """SUNO PROMPT >500 chars is invalid; after repairs are exhausted we return an error (no fallback)."""
+# ---------------------------------------------------------------------------
+# Style branch validation + repair tests
+# ---------------------------------------------------------------------------
+
+
+def test_suno_prompt_over_500_triggers_style_repairs():
+    """SUNO PROMPT >500 chars triggers repair attempts in style branch."""
     long_prompt = "A" * 600
-    output = _valid_output(suno_prompt=long_prompt)
-    llm = FakeLLM([output, output, output])
+    bad_style = _valid_style_output(suno_prompt=long_prompt)
+    llm = FakeLLM(
+        [
+            bad_style,  # #1: style.generate (bad)
+            bad_style,  # #2: style.repair.1 (still bad)
+            bad_style,  # #3: style.repair.2 (still bad)
+            _valid_profile_output(),  # #4: lyrics.profile_infer
+            _valid_lyrics_output(),  # #5: lyrics.generate
+            _style_name_output(),  # #6: style.name_generate
+        ]
+    )
     builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="test prompt",
         lyrics_about="test topic",
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    assert llm.calls == 3  # initial + 2 repairs (default)
-    assert result["success"] is False
-    assert "issues" in result and result["issues"]
-    assert any(
-        "SUNO PROMPT exceeds 500 characters" in issue for issue in result["issues"]
+    assert llm.calls == 6  # style(3) + profile + lyrics + name
+    # Style branch proceeded with issues
+    assert len(result["suno_prompt"]) > 500
+    # Debug trace shows repair attempts
+    spans = result["debug_info"]["spans"]
+    repair_spans = [s for s in spans if "repair" in s["name"]]
+    assert len(repair_spans) == 2
+
+
+def test_weirdness_out_of_range_triggers_style_repairs():
+    """Weirdness >100 triggers repair attempts in style branch."""
+    bad_style = _valid_style_output(weirdness=150)
+    llm = FakeLLM(
+        [
+            bad_style,  # #1: style.generate (bad)
+            bad_style,  # #2: style.repair.1 (still bad)
+            bad_style,  # #3: style.repair.2 (still bad)
+            _valid_profile_output(),  # #4: lyrics.profile_infer
+            _valid_lyrics_output(),  # #5: lyrics.generate
+            _style_name_output(),  # #6: style.name_generate
+        ]
     )
-
-
-def test_weirdness_out_of_range_triggers_repair_and_then_error():
-    """Weirdness values outside 0-100 are invalid; after repairs we return an error (no fallback)."""
-    # Value > 100 is invalid per validator
-    output_high = _valid_output(weirdness=150)
-    llm = FakeLLM([output_high, output_high, output_high])
     builder = AgentPromptGraph(_settings(), llm=llm)
-    req = AdvancedGenerateRequest(user_prompt="test", lyrics_about="test")
+    req = AdvancedGenerateRequest(
+        user_prompt="test",
+        lyrics_about="test",
+        prompt_variant="v5_hybrid",
+    )

     result = asyncio.run(builder.generate(req))

-    assert llm.calls == 3
-    assert result["success"] is False
-    assert any(
-        "WEIRDNESS must be between 0 and 100" in issue for issue in result["issues"]
+    assert llm.calls == 6
+    assert result["weirdness"] == 150
+    spans = result["debug_info"]["spans"]
+    repair_spans = [s for s in spans if "repair" in s["name"]]
+    assert len(repair_spans) == 2
+
+
+def test_style_influence_out_of_range_triggers_style_repairs():
+    """Style influence >100 triggers repair attempts in style branch."""
+    bad_style = _valid_style_output(style_influence=200)
+    llm = FakeLLM(
+        [
+            bad_style,  # #1: style.generate (bad)
+            bad_style,  # #2: style.repair.1 (still bad)
+            bad_style,  # #3: style.repair.2 (still bad)
+            _valid_profile_output(),  # #4: lyrics.profile_infer
+            _valid_lyrics_output(),  # #5: lyrics.generate
+            _style_name_output(),  # #6: style.name_generate
+        ]
     )
-
-
-def test_style_influence_out_of_range_triggers_repair_and_then_error():
-    """Style influence values outside 0-100 are invalid; after repairs we return an error (no fallback)."""
-    output = _valid_output(style_influence=200)
-    llm = FakeLLM([output, output, output])
     builder = AgentPromptGraph(_settings(), llm=llm)
-    req = AdvancedGenerateRequest(user_prompt="test", lyrics_about="test")
+    req = AdvancedGenerateRequest(
+        user_prompt="test",
+        lyrics_about="test",
+        prompt_variant="v5_hybrid",
+    )

     result = asyncio.run(builder.generate(req))

-    assert llm.calls == 3
-    assert result["success"] is False
-    assert any(
-        "STYLE INFLUENCE must be between 0 and 100" in issue
-        for issue in result["issues"]
-    )
+    assert llm.calls == 6
+    assert result["style_influence"] == 200
+    spans = result["debug_info"]["spans"]
+    repair_spans = [s for s in spans if "repair" in s["name"]]
+    assert len(repair_spans) == 2


 def test_tags_are_passed_through():
-    """Tags from request are included in context and don't break generation."""
-    output = _valid_output()
-    llm = FakeLLM([output])
+    """Tags from request don't break generation."""
+    llm = FakeLLM(_happy_path_responses())
     builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="indie rock anthem",
         lyrics_about="summer nights",
         tags=["indie", "rock", "anthemic"],
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    assert result["debug_info"]["repaired"] is False
+    assert result["debug_info"]["summary"]["repairs"] == 0
     assert result["suno_prompt"]  # output is valid


 def test_selected_artists_not_leaked_when_valid():
-    """Selected artists are used for context but don't appear in valid output."""
-    # LLM returns valid output that doesn't mention artist
-    output = _valid_output(suno_prompt="Retro funk, smooth bass, falsetto vocals")
-    llm = FakeLLM([output])
+    """Selected artists don't appear in valid style output."""
+    llm = FakeLLM(
+        _happy_path_responses(
+            style_output=_valid_style_output(
+                suno_prompt="Retro funk, smooth bass, falsetto vocals"
+            ),
+        )
+    )
     builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Make it sound like Prince",
         lyrics_about="purple rain",
         selected_artists=["Prince"],
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

     assert "prince" not in result["suno_prompt"].lower()
-    assert result["debug_info"]["repaired"] is False
-
-
-def test_repairs_when_missing_sections():
-    # First output is missing sections and order (invalid), second output is valid.
-    bad = "LYRICS\n" "[Verse]\nhello\n" "\n" "SUNO PROMPT\n" "some prompt\n"
-    good = (
-        "LYRICS\n"
-        "[Verse]\nhello\n\n"
-        "SUNO PROMPT\n"
-        "some prompt\n\n"
-        "EXCLUDE\n"
-        "cheesy, country\n\n"
-        "WEIRDNESS\n"
-        "42\n\n"
-        "STYLE INFLUENCE\n"
-        "55\n"
-    )
-
-    llm = FakeLLM([bad, good])
+    assert result["debug_info"]["summary"]["repairs"] == 0
+
+
+def test_style_repair_fixes_missing_sections():
+    """Style branch repairs when initial output is missing required sections."""
+    bad = "SUNO PROMPT\nsome prompt\n"  # Missing EXCLUDE, WEIRDNESS, STYLE INFLUENCE
+    good = _valid_style_output(
+        suno_prompt="some prompt",
+        exclude="cheesy, country",
+        weirdness=42,
+        style_influence=55,
+    )
+    llm = FakeLLM(
+        [
+            bad,  # #1: style.generate (bad — missing EXCLUDE)
+            good,  # #2: style.repair.1 (good)
+            _valid_profile_output(),  # #3: lyrics.profile_infer
+            _valid_lyrics_output(),  # #4: lyrics.generate
+            _style_name_output(),  # #5: style.name_generate
+        ]
+    )
     builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Make something big and cinematic",
         lyrics_about="ants on Mars",
         selected_artists=[],
         tags=["cinematic"],
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    assert llm.calls == 2  # one repair pass
-    assert result["debug_info"]["repaired"] is True
+
+    assert llm.calls == 5  # style(2) + profile + lyrics + name
+    spans = result["debug_info"]["spans"]
+    repair_spans = [s for s in spans if "repair" in s["name"]]
+    assert len(repair_spans) == 1
     assert result["suno_prompt"] == "some prompt"
     assert result["exclude"] == "cheesy, country"
     assert result["weirdness"] == 42
     assert result["style_influence"] == 55


-def test_repairs_when_artist_name_leaks_in_suno_prompt_only():
-    # First output leaks an artist name in SUNO PROMPT; second output fixes it.
-    bad = (
-        "LYRICS\n"
-        "[Verse]\nhello\n\n"
-        "SUNO PROMPT\n"
-        "In the style of Bruno Mars, funky pop groove\n\n"
-        "EXCLUDE\n"
-        "cheesy\n\n"
-        "WEIRDNESS\n"
-        "50\n\n"
-        "STYLE INFLUENCE\n"
-        "60\n"
-    )
-    good = (
-        "LYRICS\n"
-        "[Verse]\nhello\n\n"
-        "SUNO PROMPT\n"
-        "Funky pop groove, bright bass, crisp drums, glossy modern mix\n\n"
-        "EXCLUDE\n"
-        "cheesy\n\n"
-        "WEIRDNESS\n"
-        "50\n\n"
-        "STYLE INFLUENCE\n"
-        "60\n"
-    )
-
-    llm = FakeLLM([bad, good])
+def test_artist_names_not_in_clean_suno_prompt():
+    """Verify clean suno_prompt doesn't contain artist names."""
+    clean_style = _valid_style_output(
+        suno_prompt="Funky pop groove, bright bass, crisp drums, glossy modern mix",
+    )
+    llm = FakeLLM(_happy_path_responses(style_output=clean_style))
     builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Make a song that sounds like Bruno Mars",
         lyrics_about="dancing alone",
         selected_artists=["Bruno Mars"],
         tags=["pop"],
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    assert llm.calls == 2
-    assert result["debug_info"]["repaired"] is True
+
+    assert llm.calls == 4
     assert "bruno" not in result["suno_prompt"].lower()


-def test_falls_back_after_two_failed_repairs():
-    # Provide three invalid outputs (initial + 2 repairs). Builder should return an error (no fallback).
-    invalid = "SUNO PROMPT\nblah\n"
-    llm = FakeLLM([invalid, invalid, invalid])
+def test_style_branch_proceeds_after_max_repairs():
+    """After exhausting repairs, style branch proceeds with issues."""
+    invalid_style = "SUNO PROMPT\nblah\n"  # Missing EXCLUDE
+    llm = FakeLLM(
+        [
+            invalid_style,  # #1: style.generate (bad)
+            invalid_style,  # #2: style.repair.1 (still bad)
+            invalid_style,  # #3: style.repair.2 (still bad)
+            _valid_profile_output(),  # #4: lyrics.profile_infer
+            _valid_lyrics_output(),  # #5: lyrics.generate
+            _style_name_output(),  # #6: style.name_generate
+        ]
+    )
     builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Make a song that sounds like Will.I.Am",
         lyrics_about="robots",
         selected_artists=["Will.I.Am"],
         tags=["electropop"],
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    assert llm.calls == 3  # initial + 2 repairs
-    assert result["success"] is False
-    assert "issues" in result and result["issues"]
+
+    assert llm.calls == 6  # style(3) + profile + lyrics + name
+    # Result is still returned (two-step doesn't return error)
+    assert "suno_prompt" in result
+    spans = result["debug_info"]["spans"]
+    repair_spans = [s for s in spans if "repair" in s["name"]]
+    assert len(repair_spans) == 2


 # ---------------------------------------------------------------------------
@@ -342,100 +466,83 @@ def test_falls_back_after_two_failed_repairs():
 # ---------------------------------------------------------------------------


-def test_repair_disabled_skips_repair_attempts():
-    """When agent_repair_enabled=False, no repair attempts are made."""
-    # First output is invalid (missing sections)
-    invalid = "SUNO PROMPT\nblah\n"
-    llm = FakeLLM([invalid])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-        agent_repair_enabled=False,
+def test_zero_max_repairs_skips_repair_attempts():
+    """When agent_max_repairs=0, style branch skips repair attempts."""
+    bad_style = "SUNO PROMPT\nblah\n"  # Missing EXCLUDE
+    llm = FakeLLM(
+        [
+            bad_style,  # #1: style.generate (bad, no repair)
+            _valid_profile_output(),  # #2: lyrics.profile_infer
+            _valid_lyrics_output(),  # #3: lyrics.generate
+            _style_name_output(),  # #4: style.name_generate
+        ]
     )
+    settings = _settings(agent_max_repairs=0)
     builder = AgentPromptGraph(settings, llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="test",
         lyrics_about="test",
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    # Should only call LLM once (no repairs), then return error
-    assert llm.calls == 1
-    assert result["success"] is False
-    assert "issues" in result and result["issues"]
+    assert llm.calls == 4  # style(1) + profile + lyrics + name
+    spans = result["debug_info"]["spans"]
+    repair_spans = [s for s in spans if "repair" in s["name"]]
+    assert len(repair_spans) == 0


 def test_custom_max_repairs_is_respected():
     """When agent_max_repairs is set to a custom value, it's respected."""
-    invalid = "SUNO PROMPT\nblah\n"
-    # Provide enough invalid outputs for 5 repairs
-    llm = FakeLLM([invalid] * 6)
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-        agent_repair_enabled=True,
-        agent_max_repairs=5,
-    )
-    builder = AgentPromptGraph(settings, llm=llm)
-    req = AdvancedGenerateRequest(
-        user_prompt="test",
-        lyrics_about="test",
-    )
-
-    result = asyncio.run(builder.generate(req))
-
-    # Should call LLM 6 times: initial + 5 repairs
-    assert llm.calls == 6
-    assert result["success"] is False
-    assert "issues" in result and result["issues"]
-
-
-def test_zero_max_repairs_goes_straight_to_fallback():
-    """When agent_max_repairs=0, invalid output immediately returns error (no fallback)."""
-    invalid = "SUNO PROMPT\nblah\n"
-    llm = FakeLLM([invalid])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-        agent_repair_enabled=True,
-        agent_max_repairs=0,
-    )
+    invalid_style = "SUNO PROMPT\nblah\n"  # Missing EXCLUDE
+    llm = FakeLLM(
+        [invalid_style] * 6  # style: initial + 5 repairs
+        + [_valid_profile_output()]  # lyrics.profile_infer
+        + [_valid_lyrics_output()]  # lyrics.generate
+        + [_style_name_output()]  # style.name_generate
+    )
+    settings = _settings(agent_max_repairs=5)
     builder = AgentPromptGraph(settings, llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="test",
         lyrics_about="test",
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    # Only 1 LLM call (initial), then immediate error
-    assert llm.calls == 1
-    assert result["success"] is False
-    assert "issues" in result and result["issues"]
+    # 6 style calls + 2 lyrics + 1 name = 9
+    assert llm.calls == 9
+    spans = result["debug_info"]["spans"]
+    repair_spans = [s for s in spans if "repair" in s["name"]]
+    assert len(repair_spans) == 5


-def test_debug_info_includes_repair_config():
-    """Debug info includes repair_enabled and max_repairs from config."""
-    output = _valid_output()
-    llm = FakeLLM([output])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-        agent_repair_enabled=True,
-        agent_max_repairs=3,
-    )
+def test_debug_info_has_trace_format():
+    """Debug info uses DebugTrace format with summary and spans."""
+    llm = FakeLLM(_happy_path_responses())
+    settings = _settings(agent_max_repairs=3)
     builder = AgentPromptGraph(settings, llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="test",
         lyrics_about="test",
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    assert result["debug_info"]["repair_enabled"] is True
-    assert result["debug_info"]["max_repairs"] == 3
-    assert result["debug_info"]["repaired"] is False
+    debug = result["debug_info"]
+    # DebugTrace v1 structure
+    assert debug["version"] == 1
+    assert "summary" in debug
+    assert "spans" in debug
+    summary = debug["summary"]
+    assert summary["variant"] == "v5_hybrid"
+    assert summary["architecture"] == "two_step"
+    assert summary["repairs"] == 0
+    assert summary["success"] is True
+    assert summary["llm_calls"] >= 2


 # ---------------------------------------------------------------------------
@@ -443,85 +550,47 @@ def test_debug_info_includes_repair_config():
 # ---------------------------------------------------------------------------


-def _valid_style_output(
-    suno_prompt: str = "Funky pop, crisp drums, bright bass",
-    exclude: str = "cheesy, country",
-    weirdness: int = 50,
-    style_influence: int = 60,
-) -> str:
-    """Valid output for the style branch (two-step variants)."""
-    return (
-        f"SUNO PROMPT\n{suno_prompt}\n\n"
-        f"EXCLUDE\n{exclude}\n\n"
-        f"WEIRDNESS\n{weirdness}\n\n"
-        f"STYLE INFLUENCE\n{style_influence}\n"
-    )
-
-
 def test_instrumental_with_blank_lyrics_about_returns_empty_lyrics():
     """When lyrics_about is blank, instrumental mode returns empty lyrics."""
-    # Style output + title output (no lyrics branch should run)
-    style_output = _valid_style_output()
-    title_output = "The Last Horizon"
-    llm = FakeLLM([style_output, title_output])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-    )
-    builder = AgentPromptGraph(settings, llm=llm)
+    llm = FakeLLM(_instrumental_responses(title="The Last Horizon"))
+    builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Epic orchestral soundtrack",
-        lyrics_about="",  # Empty = instrumental
-        prompt_variant="v5_hybrid",  # Two-step variant
+        lyrics_about="",
+        prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

-    # Should return empty lyrics
     assert result["lyrics"] == ""
-    # Should have a creative title from LLM
     assert result["concept_title"] == "The Last Horizon"
-    # Should have valid style output
     assert result["suno_prompt"] == "Funky pop, crisp drums, bright bass"
     assert result["exclude"] == "cheesy, country"
     assert result["weirdness"] == 50
     assert result["style_influence"] == 60
-    # 2 LLM calls: style + title (no lyrics)
-    assert llm.calls == 2
+    assert llm.calls == 3  # style + title + style_name


 def test_instrumental_with_keyword_returns_empty_lyrics():
     """When lyrics_about contains 'instrumental', returns empty lyrics."""
-    style_output = _valid_style_output()
-    title_output = "Drift"
-    llm = FakeLLM([style_output, title_output])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-    )
-    builder = AgentPromptGraph(settings, llm=llm)
+    llm = FakeLLM(_instrumental_responses(title="Drift"))
+    builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Ambient electronic",
-        lyrics_about="instrumental track",  # Keyword triggers instrumental mode
+        lyrics_about="instrumental track",
         prompt_variant="v5_hybrid",
     )

     result = asyncio.run(builder.generate(req))

     assert result["lyrics"] == ""
-    assert llm.calls == 2  # Style + title
+    assert llm.calls == 3  # style + title + style_name


 def test_instrumental_with_no_vocals_keyword_returns_empty_lyrics():
     """When lyrics_about contains 'no vocals', returns empty lyrics."""
-    style_output = _valid_style_output()
-    title_output = "Velvet Thunder"
-    llm = FakeLLM([style_output, title_output])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-    )
-    builder = AgentPromptGraph(settings, llm=llm)
+    llm = FakeLLM(_instrumental_responses(title="Velvet Thunder"))
+    builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Jazz fusion",
         lyrics_about="no vocals, just instruments",
@@ -531,22 +600,16 @@ def test_instrumental_with_no_vocals_keyword_returns_empty_lyrics():

     result = asyncio.run(builder.generate(req))

     assert result["lyrics"] == ""
-    assert llm.calls == 2  # Style + title
+    assert llm.calls == 3  # style + title + style_name


 def test_instrumental_with_tag_returns_empty_lyrics():
     """When tags include 'instrumental', returns empty lyrics."""
-    style_output = _valid_style_output()
-    title_output = "Through Glass Canyons"
-    llm = FakeLLM([style_output, title_output])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-    )
-    builder = AgentPromptGraph(settings, llm=llm)
+    llm = FakeLLM(_instrumental_responses(title="Through Glass Canyons"))
+    builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Post-rock soundscape",
-        lyrics_about="the sunset",  # Non-empty, but tag overrides
+        lyrics_about="the sunset",
         tags=["instrumental", "post-rock"],
         prompt_variant="v5_hybrid",
     )
@@ -554,19 +617,13 @@ def test_instrumental_with_tag_returns_empty_lyrics():

     result = asyncio.run(builder.generate(req))

     assert result["lyrics"] == ""
-    assert llm.calls == 2  # Style + title
+    assert llm.calls == 3  # style + title + style_name


 def test_instrumental_debug_trace_includes_skipped_span():
     """Instrumental mode includes a lyrics.skipped span in debug trace."""
-    style_output = _valid_style_output()
-    title_output = "Midnight in Kyoto"
-    llm = FakeLLM([style_output, title_output])
-    settings = Settings(
-        spotify_client_id="test",
-        openai_api_key="test",
-    )
-    builder = AgentPromptGraph(settings, llm=llm)
+    llm = FakeLLM(_instrumental_responses(title="Midnight in Kyoto"))
+    builder = AgentPromptGraph(_settings(), llm=llm)
     req = AdvancedGenerateRequest(
         user_prompt="Cinematic score",
         lyrics_about="",
@@ -575,7 +632,6 @@

     result = asyncio.run(builder.generate(req))

-    # Check debug trace has lyrics.skipped span
     debug_info = result.get("debug_info", {})
     spans = debug_info.get("spans", [])
     skipped_spans = [s for s in spans if s.get("name") == "lyrics.skipped"]
@@ -595,15 +651,21 @@ def test_is_instrumental_request_helper():
     assert AgentPromptGraph._is_instrumental_request(req2) is True

     # Test "instrumental" keyword
-    req3 = AdvancedGenerateRequest(user_prompt="test", lyrics_about="an instrumental piece")
+    req3 = AdvancedGenerateRequest(
+        user_prompt="test", lyrics_about="an instrumental piece"
+    )
     assert AgentPromptGraph._is_instrumental_request(req3) is True

     # Test "no vocals" keyword
-    req4 = AdvancedGenerateRequest(user_prompt="test", lyrics_about="no vocals needed")
+    req4 = AdvancedGenerateRequest(
+        user_prompt="test", lyrics_about="no vocals needed"
+    )
     assert AgentPromptGraph._is_instrumental_request(req4) is True

     # Test "no lyrics" keyword
-    req5 = AdvancedGenerateRequest(user_prompt="test", lyrics_about="no lyrics please")
+    req5 = AdvancedGenerateRequest(
+        user_prompt="test", lyrics_about="no lyrics please"
+    )
     assert AgentPromptGraph._is_instrumental_request(req5) is True

     # Test instrumental tag
@@ -613,7 +675,9 @@

     assert AgentPromptGraph._is_instrumental_request(req6) is True

     # Test non-instrumental request
-    req7 = AdvancedGenerateRequest(user_prompt="test", lyrics_about="love and heartbreak")
+    req7 = AdvancedGenerateRequest(
+        user_prompt="test", lyrics_about="love and heartbreak"
+    )
     assert AgentPromptGraph._is_instrumental_request(req7) is False

     # Test non-instrumental with tags
diff --git a/backend/tests/test_fix_gen_bugs.py b/backend/tests/test_fix_gen_bugs.py
index 6141dc3..8febf90 100644
--- a/backend/tests/test_fix_gen_bugs.py
+++ b/backend/tests/test_fix_gen_bugs.py
@@ -454,7 +454,7 @@ class TestVocabularyRules:
     def test_overused_words_flagged_in_spec(self):
         from app.prompts.specs import LYRICS_SPEC

-        assert "Avoid overusing" in LYRICS_SPEC
+        assert "NEVER use 3 or more" in LYRICS_SPEC
         for word in ["silver", "velvet", "neon", "shattered", "crimson", "golden"]:
             assert word in LYRICS_SPEC, f"Overused word '{word}' missing from LYRICS_SPEC"
@@ -494,3 +494,108 @@ def test_repair_agent_has_varied_lines_rule(self):
         from app.prompts.specs import LYRICS_REPAIR_AGENT

         assert "Each line in a chorus must be distinct" in LYRICS_REPAIR_AGENT
+
+
+# ============================================================================
+# PR 5: Overused word validation
+# ============================================================================
+
+
+class TestOverusedWordValidation:
+    """Test _check_overused_words detects banned generic poetic words."""
+
+    def _check(self, lyrics: str) -> list[str]:
+        from app.services.agent_prompt_graph import AgentPromptGraph
+
+        return AgentPromptGraph(MagicMock())._check_overused_words(lyrics)
+
+    def test_no_banned_words_passes(self):
+        lyrics = """[Verse]
+Walking down the highway
+Wind against my face
+Truck is running steady
+Heading for that place"""
+        assert self._check(lyrics) == []
+
+    def test_one_banned_word_passes(self):
+        lyrics = """[Verse]
+Golden sunset falling
+Over fields of grain"""
+        assert self._check(lyrics) == []
+
+    def test_two_banned_words_passes(self):
+        lyrics = """[Verse]
+Golden light through shadows
+Dancing on the wall"""
+        assert self._check(lyrics) == []
+
+    def test_three_banned_words_flagged(self):
+        lyrics = """[Verse]
+Golden whisper through the shadows
+Falling into darkness"""
+        issues = self._check(lyrics)
+        assert len(issues) == 1
+        assert "generic poetic words" in issues[0]
+
+    def test_five_banned_words_flagged(self):
+        lyrics = """[Verse]
+Silver moonlight through velvet shadows
+Crimson embers whisper low"""
+        issues = self._check(lyrics)
+        assert len(issues) == 1
+        for word in ["silver", "velvet", "shadows", "crimson", "embers"]:
+            assert word in issues[0]
+
+    def test_case_insensitive(self):
+        lyrics = """[Verse]
+GOLDEN light through SHADOWS
+WHISPER in the night"""
+        issues = self._check(lyrics)
+        assert len(issues) == 1
+
+    def test_banned_word_as_substring_not_counted(self):
+        """'whispering' should not match 'whisper'."""
+        lyrics = """[Verse]
+Whispering wind through shadowed halls
+Golden sunrise on the wall"""
+        # "whispering" != "whisper", "shadowed" != "shadows"
+        # Only "golden" is an exact match → passes
+        assert self._check(lyrics) == []
+
+
+# ============================================================================
+# PR 5: concept_title fallback
+# ============================================================================
+
+
+class TestConceptTitleFallback:
+    """Test _derive_title fallback when lyrics branch returns empty title."""
+
+    def test_derive_title_from_lyrics_about(self):
+        from app.services.agent_prompt_graph import AgentPromptGraph
+
+        agent = AgentPromptGraph(MagicMock())
+        title = agent._derive_title("fast punk rock", "corporate greed and rebellion")
+        assert title  # Not empty
+        assert len(title) <= 50
+
+    def test_derive_title_from_user_prompt_when_no_lyrics_about(self):
+        from app.services.agent_prompt_graph import AgentPromptGraph
+
+        agent = AgentPromptGraph(MagicMock())
+        title = agent._derive_title("indie rock with jangly guitars", "")
+        assert title  # Not empty
+        assert "Indie" in title or "indie" in title.lower()
+
+    def test_derive_title_returns_untitled_for_empty_input(self):
+        from app.services.agent_prompt_graph import AgentPromptGraph
+
+        agent = AgentPromptGraph(MagicMock())
+        title = agent._derive_title("", "")
+        assert title == "Untitled"
+
+    def test_vocabulary_spec_has_stronger_language(self):
+        from app.prompts.specs import LYRICS_SPEC
+
+        assert "NEVER use 3 or more" in LYRICS_SPEC
+        assert "validation will reject" in LYRICS_SPEC
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 0000000..2d2ec0d
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,30 @@
+# Benchmarks
+
+This directory stores timestamped results from `/test-quality` and `/test-perf` skill runs. Each file captures a snapshot so we can detect regressions over time.
+
+## File naming
+
+- `perf-YYYY-MM-DD.md` — latency benchmarks
+- `quality-YYYY-MM-DD.md` — generation quality assessments
+
+## How to add a result
+
+After running `/test-quality` or `/test-perf`, save the report here with today's date. If multiple runs happen on the same day, append a suffix (e.g., `perf-2026-02-21-b.md`).
+
+## What to track
+
+### Performance (`perf-*.md`)
+- `/generate/input-concept` latency (5 calls, min/max/avg)
+- `/generate/advanced` latency (3 calls, min/max/avg)
+- `/generate/refine` latency (3 calls, min/max/avg)
+- Git commit hash or branch name
+
+### Quality (`quality-*.md`)
+- Number of songs generated
+- Banned word appearances (count and which words)
+- Chorus repetition issues (count)
+- Empty/bad `concept_title` count
+- Missing section tags count
+- Stage directions in lyrics count
+- Lines ending in periods count
+- Git commit hash or branch name
diff --git a/benchmarks/perf-2026-02-21.md b/benchmarks/perf-2026-02-21.md
new file mode 100644
index 0000000..719e050
--- /dev/null
+++ b/benchmarks/perf-2026-02-21.md
@@ -0,0 +1,39 @@
+# Performance Benchmark — 2026-02-21
+
+**Branch**: `calderlund--fix-edit-spacing`
+**Commit**: pre-commit (test reliability + agent guardrails)
+**Stack**: local dev server (not Docker)
+
+## `/generate/input-concept` (5 calls, target <2s)
+
+| Call | Latency |
+|------|---------|
+| 1 | 0.007s |
+| 2 | 0.005s |
+| 3 | 0.004s |
+| 4 | 0.004s |
+| 5 | 0.003s |
+
+**Avg: 0.005s** | Min: 0.003s | Max: 0.007s | Target: <2s
+
+## `/generate/advanced` (3 calls, target <15s)
+
+| Call | Genre | Latency |
+|------|-------|---------|
+| 1 | Indie rock (non-instrumental) | 18.97s |
+| 2 | Lo-fi hip hop (non-instrumental) | 20.72s |
+| 3 | Orchestral (instrumental) | 8.83s |
+
+**Avg: 16.17s** | Min: 8.83s | Max: 20.72s | Target: <15s
+
+Non-instrumental is over target (~20s). Instrumental is within target (~9s).
+
+## `/generate/refine` (target <15s)
+
+Could not benchmark — refine endpoint returned `{"detail": "Failed to refine"}`. Likely missing DB/Redis in local dev setup (not Docker). Pre-existing environment issue.
+
+## Notes
+
+- input-concept is very fast (<10ms), well within target
+- advanced non-instrumental latency is ~20s, driven by LLM call time (4 sequential calls: style + profile + lyrics + style_name)
+- Instrumental is faster (~9s) because it only makes 3 calls and skips lyrics
diff --git a/benchmarks/quality-2026-02-21.md b/benchmarks/quality-2026-02-21.md
new file mode 100644
index 0000000..b069b31
--- /dev/null
+++ b/benchmarks/quality-2026-02-21.md
@@ -0,0 +1,55 @@
+# Quality Benchmark — 2026-02-21
+
+**Branch**: `calderlund--fix-edit-spacing`
+**Commit**: pre-commit (test reliability + agent guardrails)
+**Songs generated**: 2 (country, punk)
+
+## Banned/Overused Words
+
+Target: none of [silver, velvet, neon, shattered, whisper, shadows, echoes, crimson, golden, embers] in 3+ of 5 songs.
+
+| Song | Banned words found |
+|------|--------------------|
+| Country | whisper, golden, shadows |
+| Punk | none |
+
+**Result**: 1/2 songs had banned words (3 words in country). Would need 5-song run to properly assess the 3+ threshold.
+
+## Chorus Repetition
+
+Target: no chorus with same line 3+ times.
+
+| Song | Issue? |
+|------|--------|
+| Country | No |
+| Punk | No |
+
+**Result**: Pass
+
+## Style Names (`concept_title`)
+
+Target: short (<30 chars), descriptive.
+
+| Song | concept_title | Length | OK? |
+|------|---------------|--------|-----|
+| Country | "" (empty) | 0 | FAIL |
+| Punk | "" (empty) | 0 | FAIL |
+
+**Result**: FAIL — both songs returned empty concept_title. The style_name LLM call appears not to populate this field.
+
+## Structure
+
+Target: section tags present, no stage directions, no periods at end of lines.
+
+| Song | Has tags | Stage directions | Periods |
+|------|----------|-----------------|---------|
+| Country | Yes (with modifiers: `[Verse, earnest, reflective]`) | No | No |
+| Punk | Yes | No | No |
+
+**Result**: Pass — tags present with modifiers, no stage directions, no periods.
+
+## Notes
+
+- Empty `concept_title` is a pre-existing issue — the style_name generation step isn't populating it in the `/generate/advanced` response
+- Country song vocabulary leans on common words (whisper, golden, shadows) — the vocabulary rules in LYRICS_SPEC may need strengthening for the country genre
+- Only 2 songs tested; a full 5-genre run would give better signal
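
---

The substring test above (`'whispering' should not match 'whisper'`) implies whole-word, case-insensitive matching with a threshold of 3 distinct hits. A minimal sketch of that behavior — the word list, threshold, and helper name here are inferred from the tests, not the repo's actual `_check_overused_words` implementation in `app/services/agent_prompt_graph.py`:

```python
import re

# Hypothetical banned-word list, taken from the quality benchmark's target line.
BANNED = [
    "silver", "velvet", "neon", "shattered", "whisper",
    "shadows", "echoes", "crimson", "golden", "embers",
]


def check_overused_words(lyrics: str, threshold: int = 3) -> list[str]:
    """Flag lyrics using `threshold`+ distinct banned words (whole-word match)."""
    # Tokenize into exact lowercase words, so "whispering" never matches "whisper".
    tokens = set(re.findall(r"[a-z']+", lyrics.lower()))
    hits = [word for word in BANNED if word in tokens]
    if len(hits) >= threshold:
        return [f"Lyrics use {len(hits)} generic poetic words: {', '.join(hits)}"]
    return []
```

Tokenizing into a set and intersecting with the banned list is what makes the substring case fall out for free, matching `test_banned_word_as_substring_not_counted`.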