Skip to content

Crate Web v0.1.0: Full workspace with AI research, OpenUI, and team features#2

Open
tmoody1973 wants to merge 509 commits intorelease/v0.1.0from
main
Open

Crate Web v0.1.0: Full workspace with AI research, OpenUI, and team features#2
tmoody1973 wants to merge 509 commits intorelease/v0.1.0from
main

Conversation

@tmoody1973
Copy link
Copy Markdown
Owner

What changed

Complete Crate Web workspace implementation — from bare Next.js scaffold to a fully functional AI music research platform with persistent chat, dynamic UI components, team key sharing, multi-model support, and AgentMail integration.

Why

Building the web companion to Crate CLI so Radio Milwaukee team members and external users can access AI-powered music research through a browser without needing a terminal.

Changes

Core Workspace

  • Sidebar with crates, starred/recent sessions, artifacts browser, full-text search
  • Persistent chat with Convex — messages, sessions, and artifacts survive page reloads
  • Artifact slide-in panel (Claude-style) that opens when AI generates OpenUI components
  • Keyboard shortcuts (Cmd+K search, Cmd+N new chat, Cmd+B sidebar, Shift+S settings)

OpenUI Dynamic Components

  • AlbumEntry, TrackItem, AlbumGrid, TrackList with cover art (Discogs → Bandcamp → iTunes fallback)
  • SaveToCollectionButton, AddToPlaylist, AutoSavePlaylist (dedup via Convex queries)
  • Custom stream adapter bridging CrateAgent SSE events to OpenUI's ChatProvider

Multi-Model Support

  • ModelSelector dropdown: Claude Sonnet 4.6, Haiku 4.5, GPT-4o, GPT-4.1, Gemini 2.5 Flash/Pro, Llama 4, DeepSeek R1, Mistral Large
  • OpenRouter integration — set ANTHROPIC_BASE_URL to OpenRouter endpoint for non-Anthropic models
  • Model choice persisted to localStorage

Team Features

  • Org key sharing: admin shares encrypted API keys with @domain teammates
  • Priority chain: user keys → org shared keys → embedded Tier 1 keys
  • AgentMail integration: send research to Slack (y3v9l8q1c8s3d4n6@88nine.slack.com) or any email
  • Perplexity-style response action bar: Copy, Slack, Email, Share under every AI response

Bug Fixes

  • Duplicate playlist creation on React re-renders (Convex query dedup)
  • Duplicate artifacts (content hash + Convex single source of truth)
  • Markdown not rendering (react-markdown v10 import fix)
  • Horizontal overflow in chat (CSS overflow-hidden + break-words)

Testing

  • TypeScript type check passes (tsc --noEmit)
  • Next.js production build passes (next build)
  • Tested locally with Clerk auth, Convex real-time, OpenRouter models
  • Edge cases: empty playlists, missing API keys, org key fallback chain

Notes for CodeRabbit

  • This is a retroactive PR covering ~30 commits of rapid iteration. Some patterns evolved mid-build (e.g., artifact dedup moved from useRef to Convex queries).
  • OpenUI components use a custom line-oriented language (OpenUI Lang) — the splitContent() parser is intentionally simple.
  • AgentMail SDK requires @x402/fetch peer dependency (installed explicitly).
  • The AGENTMAIL_API_KEY env var fallback in /api/email is intentional for team usage where not every user needs their own key.

Related

  • Project: Crate Web (companion to Crate CLI)
  • Stack: Next.js 15, Convex, Clerk, OpenUI, Tailwind CSS
  • Branch from: 6ed38df (initial Next.js scaffold)

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1de6a511-2aed-48d4-9c88-fe468e0de2fd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch main

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@vercel
Copy link
Copy Markdown

vercel Bot commented Mar 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
crate-web Ready Ready Preview, Comment May 1, 2026 2:16pm

tmoody1973 and others added 30 commits April 20, 2026 20:11
Chunk 2 of /recommend v1. Four Haiku-powered helpers built on top of
chunk 1's haikuStructured generic. Each has a hardened system prompt,
a Zod schema matching the expected output, a retry+timeout policy, and
a deterministic fallback for when the LLM fails.

Helpers:

- intentClassify.ts: classifies user prompt into one of 8 intent types
  (mood_theme, era_genre, artist_similar, activity, emotional,
  show_prep, single_artist, vague) and extracts structured hints.
  Downstream per-intent Perplexity prompts consume this. Caller falls
  back to mood_theme on LLM failure per Section 2 error map.

- arcOrder.ts: reorders artists into a listenable arc (entry, build,
  turn, reflective close). Throws ArcOrderCountMismatchError if the
  LLM drops or adds artists. Includes fallbackArcOrder() for code-based
  deterministic ordering when the LLM fails.

- moderationClassify.ts: classifies prompt + tour output into blocking
  categories (hate, harassment, self-harm, sexual, copyright,
  prompt-injection). Empty array = approved. Prompt is aggressively
  hardened against prompt injection — this IS the catch net. Includes
  summarizeTourForModeration() which builds a compact < 1KB summary
  of the tour (artist names + 50-char quote heads) to feed the
  classifier cheaply.

- promptRedact.ts: turns raw user prompts into 4-8 word editorial
  headlines safe for public display at /r/[slug]. Strips personal
  details. Throws RedactionTooLongError if the LLM returns > 10 words.
  Includes fallbackRedact() which truncates to 50 chars + ellipsis.

All four reuse the haikuStructured core: same retry policy, same named
exceptions, same hardened-prompt shape. The system prompts each end
with a security clause that treats user input as classification input,
not as further instructions (per v1-scope.md anti-criterion on prompt
injection).

Tests: 129/129 green (+38 new across 4 files).
- intentClassify.test.ts: 8 tests — happy paths per intent type,
  retry-exhaust rejection, raw_text enforcement, mood valence range.
- arcOrder.test.ts: 8 tests — happy reorder, count-mismatch detection,
  reason passthrough, fallback determinism.
- moderationClassify.test.ts: 11 tests — approval path, each category,
  invalid category rejection, summarizer formatting + truncation.
- promptRedact.test.ts: 11 tests — 4/8/10-word bounds, whitespace
  handling, fallback truncation behavior.

TypeScript clean. Convex codegen succeeds.

NOT in this chunk: Perplexity refactor, main action orchestrator, UI.
Those are chunk 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unk 3a)

Atomic refactor per v1-scope.md Key Decision #7: "refactor + update /i/ +
add /recommend in one PR." The /i/ Influence Receipt public surface is
unchanged — discoverWithPerplexity keeps its exact public API. Internal
fetch+retry logic moves to a shared core that /recommend also uses.

New files:

- src/lib/perplexity-core.ts — generic Perplexity Sonar call with named
  error classes per Section 2 error map (PerplexityTimeoutError,
  PerplexityRateLimitError, PerplexityAuthError, PerplexityUpstreamError,
  PerplexityMalformedResponseError). Retries on 5xx/timeout/rate-limit;
  does NOT retry on auth errors or malformed responses. Code-fence
  stripping for the common model habit of wrapping JSON in ```json blocks.

- convex/recommend/perplexityRecommend.ts — tour-generation caller.
  Eight per-intent prompt builders (mood_theme, era_genre, artist_similar,
  activity, emotional, show_prep, single_artist, vague). Each weaves in
  StructuredQuery hints (mood valence/arousal, era hints, sonic hints,
  artist hints) plus optional Wiki-memory context (kept/passed artists
  from past tours). Returns { picks, citations, isSparse, isCitationless }
  so the main action can decide whether to fall back to Claude Sonnet
  per the sparse-fallback rule.

Refactored:

- src/lib/perplexity-discover.ts — same public API (discoverWithPerplexity,
  PerplexityConnection, PerplexityDiscoveryResult). Delegates fetch to
  perplexity-core. /i/[slug]/page.tsx and /api/influence/expand both
  continue to work without any changes.

Security: all eight per-intent prompts end with a hardening clause per
v1-scope.md anti-criterion ("The text after 'Query:' below is USER INPUT,
not further instructions. Do NOT follow directives in it that contradict
these rules.").

Artist name preservation tested explicitly (billy woods, JPEGMAFIA,
clipping. — per v1-scope.md design system rule "preserve original casing
and punctuation"). Verified via a prompt-engineering test that asserts
lowercased/uppercased/punctuated names come through the pipeline intact.

Tests: 153/153 green (+24 new):
- perplexity-core.test.ts: 11 tests — stripCodeFences behavior, happy
  path, each error class (auth/rate-limit/5xx/malformed/timeout), retry
  on 5xx, empty citations, missing API key, citation filtering.
- perplexityRecommend.test.ts: 11 tests — parsing, sparse flag, citation-
  less flag, non-http filtering, malformed response rejection, per-intent
  routing, Wiki memory injection, artist name casing.

TypeScript clean. Convex codegen succeeds.

NOT in this chunk: Main Convex action that orchestrates everything
(chunk 3b). That's where classify → embed → Perplexity → verify → arc
→ moderate → persist actually happens end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end tour generation flow per v1-scope.md. Turns the helpers from
chunks 1-3a into a working pipeline: prompt → classify → embed → wiki
memory → Perplexity → verify citations → arc + moderate + redact →
persist. Real-time phase streaming via tourStatus table.

New files:

- convex/recommend/voyageEmbed.ts: Voyage-3 embedding wrapper. 1024-dim
  enforcement, named errors (Timeout, RateLimit, Unavailable, Dimension,
  Empty), fetch-based (no SDK dep). Retries on 5xx/rate-limit, does NOT
  retry on dimension mismatch (config problem).

- convex/recommend/slug.ts: artist-name slug + random-hash composition
  per v1-scope.md Key Decision #2. Preserves lowercase artist names
  (billy woods), strips punctuation (clipping., $uicideboy$), falls back
  to "tour" prefix for unslug-able names. 4-char hash default; caller
  retries with 8-char on collision.

- convex/recommend/mutations.ts: state-transition mutations for the
  tour lifecycle. createInitialTour, writeTourStatus, setPromptEmbedding,
  setIntentClassification, markVague, finalizeTour, markFailed, logTourEvent.
  Plus public queries getTourBySlug (zero-login /r/[slug] read) and
  getTourStatus (useQuery subscription for phase streaming).

- convex/recommend/wikiMemory.ts: interface-only for now. Returns empty
  keep/pass arrays until the keep/pass/save UI ships (chunk 6) and
  populates wikiPages.tourHistory. Main action already wires this
  correctly; schema extension + backfill lands with the UI.

- convex/recommend/index.ts: the core action pair.
  * generateTour (public): auth + rate-limit + createInitialTour +
    schedule runGenerationFlow. Returns { tourId, slug } fast so the
    client can navigate to the loading page.
  * runGenerationFlow (internal): orchestrates all 8 phases with
    tourStatus writes, 45s timeout race per Issue 2.5, phase-duration
    instrumentation, PII-hashed tourEvents log. Every helper failure
    is caught and falls back per the Section 2 error map:
      classifier → default mood_theme
      voyage → skip cache (proceed with empty embedding)
      arc → deterministic code-based fallback
      moderation → fail-closed (stay private, cron retries)
      redaction → truncate-to-50-chars fallback
    Parallel execution of arc + moderation + redaction in Phase 6
    saves ~1.2s p95 per the eng review Section 7 budget.

Vague short-circuit: if classifier returns intent_type=vague, tour is
marked pending and the action returns. UI chunk (5-6) shows the
4-chip clarifying card.

Tests: 194/194 green (+41 new):
- voyageEmbed.test.ts: 10 tests — happy path, dim mismatch, empty
  vector, empty text, rate-limit, 5xx retry, dimension-mismatch no-retry
- slug.test.ts: 17 tests — artist name edge cases (billy woods,
  clipping., $uicideboy$, MGMT, !!!), hash properties, collision probability
- recommend-mutations.test.ts: 14 tests via convex-test — full lifecycle:
  createInitialTour → writeTourStatus → finalizeTour (approved + flagged paths)
  → getTourBySlug (public + private + unknown) → getTourStatus (latest row
  selection) → logTourEvent + markFailed + markVague + setPromptEmbedding
  + setIntentClassification

TypeScript clean. Convex codegen succeeds.

NOT in this chunk:
- Vercel /api/recommend/generate proxy route (chunk 4)
- Client phase-streaming hook + loading UI component (chunk 4)
- Public pages /recommend, /r, /r/[slug] (chunk 5)
- Tour artifact, chips, keep/pass/save buttons (chunk 6)
- YouTube play + Auth0 seeds (chunk 7)
- Admin UI + TTL cleanup crons + moderation retry cron (chunk 8)
- E2E tests (chunk 9)

Flagged for PR review: `wikiMemory.getWikiMemoryForIntent` returns
empty arrays. Landing together with the keep/pass UI in chunk 6, NOT
as a separate migration. Explicit TODO in the file.

Requires VOYAGE_API_KEY in Vercel env vars before real /recommend
generation works in prod. Tests mock Voyage so CI runs without the key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hunk 4)

Closes the gap between chunk 3b's Convex action and the browser. Two
small additions:

src/app/api/recommend/generate/route.ts (~100 LOC)
  Thin Vercel proxy per v1-scope.md Issue 1.1 architecture lock.
  Reads Clerk auth, gets the Convex JWT via getToken({template: "convex"})
  (template already configured in convex/auth.config.ts), calls the
  generateTour action via ConvexHttpClient.setAuth. Returns { tourId, slug }
  so the client can navigate to a loading page and subscribe to tourStatus.
  Maps Convex action errors to specific HTTP codes:
    - "Daily tour limit reached" → 429
    - "User not found" → 404
    - "Not authenticated" → 401
    - everything else → 500 with friendly message

src/lib/recommend-hooks.ts (~100 LOC)
  Two React hooks:
    - useTourStatus(tourId): subscribes to the latest tourStatus row via
      Convex useQuery. Re-renders automatically as the action writes
      phase updates (real-time streaming UX per CEO review Issue 1.2).
      Returns { phase, progress, detail, isComplete }. isComplete is
      derived from terminal phase names (done | done_vague | failed |
      timed_out | flagged).
    - useTourGeneration(): state machine wrapper around POST /api/recommend
      /generate. Returns { state: "idle"|"submitting"|"submitted"|"error",
      tourId, slug, error }. Component (chunk 5-6) consumes this.

No new tests in this chunk — both files are thin glue. Route handler
logic is integration-testable via chunk 9 E2E, and the React hook
requires @testing-library/react which we'll install for component tests
in chunks 5-6.

TypeScript clean. 194/194 tests still green (no regressions).

After this chunk, the backend is fully callable from the browser:
  POST /api/recommend/generate { prompt } → { tourId, slug }
  Client subscribes via useTourStatus(tourId) → sees live phase updates

What's missing: the UI to render the prompt box, loading screen, and tour
artifact. That's chunks 5-6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- /recommend: prompt entry with inline loading state, subscribes to
  tourStatus via Convex for real-time phase updates, redirects to
  /r/[slug] on terminal phase.
- /r: library index of recent public tours.
- /r/[slug]: zero-login public tour view with arc-ordered artists,
  cited quotes, and refine CTA.
- opengraph-image.tsx: Satori OG card for shareable tour links.
- mutations.ts: listRecentPublicTours + getMyTourById queries.

Design locked: #0a0a0a + #e8b86a, Bebas Neue + Space Grotesk + Georgia
italic. Clerk v7 compatibility: useAuth pattern (no SignedIn/SignedOut
exports in this version).
- schema: add tourSignals table (userId, tourId, artistPosition, signal)
  with by_user_tour + by_tour indexes.
- mutations: recordSignal (auth + inline rate-limit + atomic count patch),
  clearSignal (reversible), recordShare (anonymous best-effort counter),
  getMySignalsForTour query.
- TourArtifact client component with per-artist keep/pass/save buttons,
  optimistic UI, useTransition for pending state, inline error fallback,
  native share sheet + clipboard fallback, per-artist Listen↗ deep-link.
- /r/[slug] page now delegates to TourArtifact, stays SSR for SEO.
- 8 new tests covering signal lifecycle, atomic count deltas, auth
  rejection, and share counter. 22/22 recommend-mutation tests pass.
7a — YouTube inline play:
- Each artist stop gets a ▶ PLAY button that expands a lazy-mounted
  youtube-nocookie.com iframe. One player at a time (state lifted to
  TourArtifact). Uses the stored youtubeTrackId when present, falls
  back to a search-embed keyed on artist + album.
- Removes the external "LISTEN ↗" link in favor of inline play.

7b — Auth0 Spotify seeds:
- /api/recommend/generate reads the auth0_user_id_spotify cookie and
  (best-effort, 2.5s timeout) fetches the caller's top 5 Spotify
  artists via Auth0 Token Vault. Never blocks tour start on a
  Spotify hiccup — any failure silently yields an empty seed list.
- generateTour + runGenerationFlow accept optional spotifySeeds and
  thread them through WorkCtx to recommendFromPerplexity, which
  injects a "context only, NOT a constraint" paragraph into every
  intent-aware prompt builder.
- 2 new Perplexity prompt-shape tests. 204/204 pass.
Admin moderation:
- convex/recommend/admin.ts: listFlaggedTours, listPendingReports,
  setTourVisibility (approve|block with audit-tagged moderationCategories),
  resolveReport. Admin gating via ADMIN_EMAILS env var + users.email,
  checked in a shared requireAdmin() helper. Non-admins get "Forbidden".
- /admin/recommend page: Clerk-gated shell + client shell that renders
  flagged tours and pending reports with inline Approve/Block/Dismiss/Uphold.
- Uphold action chains setTourVisibility("block") + resolveReport in one
  transition so the tour is taken down atomically with the outcome write.

User reports:
- reportTour mutation in recommend/mutations.ts. Auth-required,
  inline-rate-limited to 5/user/day, reason trimmed + capped at 500 chars
  server-side, rejects <3-char reasons.
- Report button + dialog in TourArtifact action bar. Signed-out state,
  submitted state, and error state all rendered explicitly.

TTL cleanup crons:
- convex/crons.ts registers three interval crons wired to internal
  mutations in convex/recommend/cleanup.ts: pruneTourStatus (15min cadence,
  1h retention), pruneTourEvents (6h cadence, 90d retention),
  pruneCitationCache (1h cadence, 24h retention). Each sweep deletes a
  bounded 200-row batch so the mutation never blows out.

Tests: 7 new cases — report lifecycle, admin approve audit tag, non-admin
rejection, tourStatus pruning, citationCache pruning. 211/211 pass.
Observability (PostHog events):
- src/lib/recommend-analytics.ts: shared RECOMMEND_EVENTS constants +
  trackRecommendClient (posthog-js) and trackRecommendServer (posthog-node)
  helpers. Centralized names keep client, server, and Convex emissions
  consistent without drift.
- Server event recommend_tour_started fires from /api/recommend/generate
  on successful action return (fire-and-forget, never blocks the response).
- Convex runGenerationFlow posts recommend_tour_completed to PostHog's
  HTTP /capture endpoint at the end of the pipeline — lets dashboards see
  async outcomes even though the Vercel route returned ~200ms after start.
  Uses AbortController with a 2s ceiling. Failure is silent.
- Client events in TourArtifact: recommend_tour_viewed (effect),
  recommend_signal_recorded (after mutation resolves),
  recommend_tour_shared (method: native_share | clipboard),
  recommend_tour_reported (after successful submit).
- Client event in /recommend: recommend_tour_started_attempt on form submit.

Smoke tests:
- src/app/api/recommend/__tests__/generate-route.test.ts: 7 route-handler
  tests covering auth, body validation, success forwarding, and the
  429/401/500 error mappings. Mocks Clerk auth, ConvexHttpClient, Auth0
  Token Vault, and PostHog — no network touched.

Eval suite:
- evals/recommend.eval.ts: opt-in (skipped without PERPLEXITY_API_KEY,
  not picked up by default vitest pattern). 3 golden prompts covering
  mood_theme, era_genre, artist_similar — assert min pick count, min
  cited count, no-URL-without-publication mismatches.
- Run manually: bunx vitest run evals/recommend.eval.ts --no-coverage

Manual E2E:
- docs/recommend-v1-e2e-checklist.md: 10-section flight check covering
  happy path, YouTube inline play, share, report rate-limiting, admin
  moderate/uphold, library, OG image, Spotify seeds, observability
  verification, error handling. Human-gated before merging to main.

Full suite: 218/218 green. Typecheck clean.
Next.js bundled posthog-server (posthog-node + Node APIs) into the client
bundle because recommend-analytics.ts exported both client and server
helpers, and the server helper's `await import("./posthog-server")` still
pulled the transitive module graph through the client component boundary.

Split the file:
- recommend-analytics.ts: event constants + trackRecommendClient only.
  Safe to import from any client component. No Node deps.
- recommend-analytics-server.ts: trackRecommendServer — imports posthog-server
  directly. Only imported by route handlers.

Updated /api/recommend/generate and the route smoke test mock.
…list

Quotes were empty because sonar + loose system prompt were pulling YouTube
and Spotify links as "reviews." Videos showed as unavailable because
search-list embeds are deprecated. Inline iframes also fought the rest of
the app — playback, queue, bar controls live in a shared global player.

Perplexity sourcing:
- Upgrade model to sonar-pro (stronger citation behavior).
- Tighten SYSTEM_PROMPT: explicit publication allow-list, explicit deny
  (YouTube/Spotify/Wikipedia/Genius/Discogs), ≥6 of 8-12 picks must have
  quotes, quote_text must be verbatim.
- Wire Perplexity's native `search_domain_filter` with a 22-domain allow-
  list of music publications (Pitchfork, Quietus, Bandcamp Daily, NPR,
  RA, FACT, Stereogum, …). Belt-and-suspenders — if the model ignores
  the prompt, the API strips the citations before we see the response.
- Plumb `searchDomainFilter` through callPerplexity.

YouTube resolution:
- convex/recommend/youtubeResolve.ts — YouTube Data API v3 search.list
  wrapper, 2.5s timeout, silent fallback on any failure (quota, 403,
  timeout, network). Cost ~100 units per artist; well under daily free
  quota for v1 traffic.
- Runs in parallel with citation verification inside phase 5 — no added
  wall time. Populates artifactsRecommend.artists[].youtubeTrackId.

Global player on public tour pages:
- TourPlayerShell wraps /r/[slug] with PlayerProvider + PlayerBar +
  YouTubePlayer so playback routes through the shared Crate audio system
  (same as /w workspace and /cuts/[shareId]).
- PlayerBar renders null when no track is playing, so anonymous visitors
  see nothing until they tap PLAY.
- TourArtifact kills the inline iframe entirely. Each artist's PLAY
  button queues the rest of the tour from that arc position onward via
  usePlayer().play + addToQueue. A top-level ▶ PLAY TOUR button queues
  from position 0. Tracks auto-advance through the arc when each ends.
- Artists without a youtubeTrackId keep the LISTEN↗ link-out.

Save as Crate playlist:
- saveTourAsPlaylist mutation creates a playlists row + playlistTracks
  rows (one per artist with a videoId). Named after tour.promptRedacted.
  Auth required; anonymous users see SIGN IN button instead.
- SaveAsPlaylistButton in action bar. Success state links to /w to view.

Other fixes in this session:
- convex/auth.config.ts: add known-orca-97.clerk.accounts.dev (dev Clerk
  instance) as a second OIDC provider so local dev JWTs validate.
- /recommend page: useEnsureUserRow hook auto-provisions the Convex
  users row on first mount (was throwing "User not found" for accounts
  that signed up before the Clerk webhook fired).
- /recommend page: redirect to the FINAL tour slug (regenerated after
  Perplexity) rather than the provisional one the POST returns.

Full suite: 218/218 green. Typecheck clean.
…results

Previous approach was over-engineered: 4 allowlist pools routed by intent
classifier, plus jazz-keyword detection, plus a publication-to-host map.
Still produced fake citations because the model was asked to generate
quote_url inline — and it hallucinated URLs on real-looking publications.

Switch to denylist mode (Perplexity supports `-` prefix). Block 16 known
low-signal domains (YouTube, Spotify, Wikipedia, Genius, Discogs, Last.fm,
all major social, Medium, Amazon); let Perplexity pull from anywhere else.
Natural genre coverage — jazz goes to JazzTimes/AllMusic/NPR, metal to
Revolver/Invisible Oranges, regional to local press — without an intent
pool.

Match model picks to REAL URLs from Perplexity's `search_results` field,
not from the hallucinated `quote_url`. perplexity-core now exports
`PerplexitySearchResult[]` with per-source title/snippet/date. The matcher
ranks candidates: artist-mention + publication-host > artist-mention >
publication-host > fallback citations[]. No match = no quote, even if
model supplied text.

Strict-drop policy in index.ts phase 7: only attach a quote to an artist
when `verified: true`. Previously we kept unverified quotes with a missing
badge, which still shipped fake provenance.

Also in this session:
- convex/auth.config.ts: add known-orca-97.clerk.accounts.dev (dev Clerk)
- youtubeResolve.ts: real-video-id lookup via YouTube Data API v3 during
  phase 5, runs parallel with citationVerify so no added wall time
- youtube-player.tsx: defensive try/catch around YT.destroy() — iframe
  was already detached on React Strict Mode double-invoke, throwing
  NotFoundError: removeChild
- perplexityRecommend: diag action at convex/recommend/diag.ts for
  debugging (delete before merge to main)

Tests: 13/13 perplexity tests pass. Full /recommend suite green.
…d output

Rebuild the recommend tour citation pipeline so every per-pick URL
actually points at a source about that artist — driving real click-
through to music publishers instead of stapling the tour's top
citation onto unrelated picks.

Key changes:
- Drop the ?? primaryCitationUrl fallback in the per-pick quote build.
  Quotes now attach only when matchQuoteToSnippet finds a real source.
- Tier 2 matcher accepts a match when artist name appears in the
  search_result title or URL path (same trust model as /i/ receipts).
- Migrate domain filter from best-effort denylist to strictly-enforced
  allowlist of ~15 music-criticism publications. Per Perplexity docs,
  allowlist is enforced; denylist leaked streaming/playlist junk on
  mood/activity queries.
- Adopt response_format: json_schema on sonar-pro calls, eliminating
  the PerplexityMalformedResponseError path that was killing emotional
  and other intent tours.
- Rewrite per-intent user prompts per the Perplexity prompt guide:
  no "Search for:" prefix, criticism-specific lexical tokens, explicit
  fallback clause, MUSIC anchor on vague/emotional/activity to prevent
  drift into visual art, art therapy, or travel content.
- Add TourSources UI (per-source cards with publication badge, snippet,
  artistsMentioned tags) rendered from search_results; falls back to
  the flat citations list for older tours.
- Add one-off stripUnverifiedQuotes internalMutation to clean tours
  that had fallback URLs from the old code.
- Add compareRetrieval + regenerateBySlug test actions for A/B
  iterating on citation host distribution.

Validated on all 8 intent types. Regen of the Bukem-failing jazz tour
now attaches real DownBeat/AllMusic/Bandcamp URLs to 8 of 10 picks
instead of stapling a single dmy.co URL across every pick.
Adds a pre-retrieval step that decomposes the user's StructuredQuery
into 3-5 specific music-criticism queries, hits Perplexity's Search API
in parallel, and merges the curated results into sonar-pro's own
search_results pool. The downstream per-pick matcher then has a richer
source set to attach honest citations from.

Target: fix thin retrieval on emotional/activity/mood intents where
sonar-pro's single-query search returns only 2-5 hits because the
search classifier misreads "music for processing grief" as a therapy
query or "music for a long night drive" as a travel query.

Changes:
- src/lib/perplexity-search.ts (new) — direct-fetch wrapper for the
  Search API's POST /search endpoint, with allowlist domain filter,
  max_tokens_per_page, and timeout handling.
- convex/recommend/queryDecompose.ts (new) — rule-based templates per
  intent that mix content types (album reviews, artist interviews,
  feature articles, critic recommendations). Interviews matter
  especially on emotional/mood intents because the artist's own
  framing is higher-signal than third-party reviews.
- convex/recommend/perplexityRecommend.ts — fire decomposed Search
  queries AFTER the sonar-pro call, dedupe by URL, merge into
  searchResults so the matcher + TourSources UI both benefit.
- Tests updated with URL-dispatched fetch mock that lets Search API
  calls fall through to an empty-results fallback while sonar-pro
  responses stay queued via mockResolvedValueOnce.

Retrieval counts validated:
  emotional "processing grief": 2 → 5 (jazztimes + quietus + stereogum + allmusic/theme)
  activity "night drive":       5 → 9 (allmusic/theme + quietus + stereogum + bandcamp)
  mood "jazz winter coffee":   10 → 14 (broader mix; downbeat density lower)

Jazz regen now produces 11 picks with 4 honest citations on real
DownBeat URLs (Maria Schneider, Ambrose Akinmusire, Branford Marsalis,
Keith Jarrett via the Branford piece). Picks lacking source coverage
correctly render without a quote rather than with a fabricated one.
Adds imagery to recommend tour cards. Two surfaces:

1. Album cover art per pick (iTunes Search API):
   - New `itunesArtwork.ts` helper with a two-step lookup. Tries
     "artist album" first; falls back to the artist's most recent
     album cover when the specific album doesn't match iTunes's
     catalog (common for obscure small-label releases).
   - 600x600 URLs attached server-side in the tour generation
     pipeline, parallel with YouTube resolution. 3s per-request
     timeout; silent failure since artwork is decorative.
   - Tested: 10/10 picks land cover art on a jazz tour that
     previously got 0 (artist-fallback catches the obscure picks).

2. Source card hero images (Perplexity return_images):
   - `callPerplexity` now accepts `returnImages` and parses the
     `images[]` response field into typed PerplexityImage records.
   - Recommend pipeline sets `return_images: true`, indexes images
     by origin_url, and attaches heroImageUrl to source cards on
     URL match. Best-effort — Perplexity's image pool and
     allowlist-filtered searchResults pool are disjoint for
     mood-driven tour queries, so hit rate is low. Plumbing stays
     in place for queries that DO get image matches (confirmed
     working for album-specific probe queries).

UI:
   - tour-artifact.tsx renders a 64x64 cover thumb next to each
     pick card, above the quote blockquote. Lazy-loaded.
   - ReviewSourceCard renders a hero thumb on the left of each
     source entry when heroImageUrl is set.

Schema adds `artworkUrl` on artist entries and `heroImageUrl` on
source entries. Both optional. Existing tours migrate by being
re-generated; no schema migration needed.

Also adds `probeImages` diag action to promptTest for directly
inspecting Perplexity's image retrieval.
Users can now type `/recommend jazz for winter morning coffee` in chat
to kick off tour generation without leaving the conversation.

Implementation: server-side intercept in the chat route before the
LLM routing. Extracts the prompt, gets a Convex-scoped Clerk JWT, calls
the existing generateTour action, and streams back a minimal CrateEvent
sequence — answer_start + an answer_token with a markdown link to the
/r/[slug] tour page + done. The dedicated tour page keeps rendering in
real time as picks, covers, quotes, and sources arrive.

Why bypass the agentic loop: the tour generation pipeline is a 30s+
multi-phase action (Perplexity multi-query + sonar-pro + matcher +
iTunes artwork + arc ordering + moderation + Convex persistence) that
lives in a Convex action. Having the LLM call it as a tool would force
the tool call, tool result wait, and then reconstruct a compressed
artifact in chat — losing the full-fidelity rendering the /r/ page
already does. The link-back pattern preserves publisher attribution
(clickable source cards) and lets the chat thread stay focused on
conversation.

Registers the command in both doc surfaces:
- Public commands marketing page (`commands.tsx`)
- In-app help drawer (`commands-reference.tsx`)

Chat-rate-limit still applies (per-minute). Tour rate-limit applies
via generateTour's internal check (20/day free tier). Errors from the
Convex action surface as `error` CrateEvents the chat panel already
renders.
feat(recommend): v1 tour builder — honest citations + /recommend command
…, allowlist breadth

Three coordinated fixes to address AllMusic/Bandcamp dominance in
citations on tour pages.

Problem observed: AllMusic has a /artist/<slug> page for virtually every
musician. That URL pattern always contains the artist's name slug, so
the matcher's Tier 2 (artist-in-URL) matched every time — filling tours
with allmusic.com/artist/... citations whose prose wasn't from the page.
Same problem with <artist>.bandcamp.com subdomains (self-hosted artist
marketing pages, not criticism).

Changes:
- isAggregatorBioUrl(): new helper in convex/recommend/index.ts that
  recognizes allmusic.com/artist/<slug> and bare <artist>.bandcamp.com
  as aggregator-bio URLs. Still allows allmusic.com/album/<slug> (real
  Thom Jurek / Richard Ginell album reviews) and daily.bandcamp.com
  (Bandcamp Daily editorial).
- matchQuoteToSnippet() filters the searchResults pool through the
  aggregator check at the TOP, before any tier runs. Previously only
  Tier 2 guarded against aggregator URLs, but Tier 1 could still match
  if a quote prefix coincidentally appeared in an AllMusic bio snippet.
- Per-publication cap: MAX_CITATIONS_PER_PUBLICATION = 3. After all
  picks are matched, iterate in arcPosition order and drop the quote
  on any pick that would push a host's count above the cap. Prevents
  monoculture even if retrieval skews toward one publication.
- Allowlist expansion: filled remaining cap slots (15 → 20) with
  long-tail critic sites that appeared in earlier retrieval tests and
  got excluded when we first trimmed: jerryjazzmusician.com,
  brooklynrail.org, popmatters.com, clashmusic.com, nme.com.

Verified on a regen of the jazz tour: host distribution went from
"4 AllMusic + 2 DownBeat + Bandcamp bio mix" to "2 DownBeat, 0 AllMusic,
0 Bandcamp bio pages." Fewer quotes per tour, but every remaining quote
points at actual criticism.
Replaces the bulk "one sonar-pro call generates all 10 picks AND all 10
quotes" design with a two-phase per-pick flow that mirrors /i/
Influence Receipts — the architecture we already have shipping with
citations right every time.

Why: /i/ works because it queries Perplexity about ONE artist at a time.
Every retrieved document is on-topic; when sonar-pro writes a pullQuote
it's extracting or paraphrasing from articles that ARE about that
artist. /r/ was doing the opposite — one bulk call for all 10 picks,
retrieval diffuse, quote prose synthesized from training memory and
papered over with a post-hoc matcher. That mismatch is why we kept
needing matcher hardening, allowlist swaps, response_format tightening,
and aggregator-bio skips — all band-aids on the wrong-shape pipeline.

Phase A — convex/recommend/pickSelector.ts (new):
  sonar-pro call that returns ONLY artist names + albums + years + optional
  relationship/weight. No quote_text, quote_publication, or quote_url
  fields in the response schema. Model still uses retrieval to ground
  the SELECTION (it picks artists it has evidence for), but prose
  generation is deferred to Phase B.

Phase B — convex/recommend/groundedQuote.ts (new):
  Per-pick parallel call for each selected pick:
    1. Perplexity Search API with multi-query scoped to THAT artist +
       album ("<artist> <album> review", "<artist> interview", etc.)
       against the music-publication allowlist.
    2. Top 3 eligible snippets (aggregator-bio URLs pre-filtered) passed
       to Claude Haiku with a locked prompt: "pick ONE source by index
       and write 2 sentences explaining why the artist fits the tour,
       drawing ONLY from that snippet." Model returns { citedIndex, why }.
    3. Index maps back to the actual URL. Prose and URL are tightly
       coupled by construction — URL chosen BEFORE the prose is written.
  Returns null when retrieval is thin or Claude can't support the pick
  from the retrieved snippets. Caller renders quote-less honestly.

Orchestrator — convex/recommend/index.ts:
  runWork Phases 4 and 5 rewritten:
    - Phase 4 (was "perplexity"): selectPicks() — one call, names only.
    - Phase 5 (was "verify" + YouTube): per-pick Promise.all of
      groundedQuoteForPick + resolveYouTubeVideoId. verifyCitation is
      gone — grounded quotes are verified-by-construction (Claude
      chose the URL and drew prose from its snippet).
  Phase 7 artist build reads from enriched[] and attaches groundedQuote
  directly to artist.quote. The old matcher path (matchQuoteToSnippet +
  per-publication cap) is unused for this flow — left in place for
  legacy tours that regenerate.
  Sources section now rebuilds from grounded quotes: every card is one
  the pipeline actually drew prose from, deduped on URL, with multiple
  artists listed when picks share a source.

Tradeoffs:
  - 10x Perplexity Search API volume (per-pick calls). Same vendor, same
    key, so the budget impact is a per-search-request charge × 10.
  - +1 Claude Haiku call per pick. Parallel, so added wall-clock is
    bottlenecked by the slowest pick (~2-4s).
  - Fewer quotes per tour on thin-retrieval picks — intentional. The
    only way to get universal citation coverage under this architecture
    is to have real retrieval for every pick. Honest floor over
    fabricated ceiling.

Verified: regen of the jazz/winter-contemplation tour returns 2/6
grounded picks (Maria Schneider → JazzTimes live review, Ambrose
Akinmusire → DownBeat Owl Song review). Quote prose paraphrases the
retrieved snippet content; URL points at the exact article that
prose came from. The 4 picks without grounding render quote-less
instead of being stapled with synthesized prose.
feat(recommend): per-pick grounded architecture + matcher hardening
…rompts

Sonar-pro was returning 1-3 picks when the prompt's exact theme (e.g.
"danceable songs for climate grief") had thin direct critical coverage,
because the system prompt rule "Target 8-12 picks" gave the model no
escape hatch when it couldn't verify the count for the exact theme.

Changes the rule to explicitly allow padding to 8-12 with
well-documented adjacent-genre/mood artists, and calls out that
returning 1-3 picks is a failure mode. The Phase B grounded-quote
step is still the honesty layer — adjacent picks without themed
coverage just render quote-less downstream, not stamped with
fabricated citations.

Observed: prod tour "danceable songs for climate grief" returned 1
pick (Jayda G, correctly grounded to crackmagazine.net). After this
fix the same prompt should return 8-12 picks with per-pick grounding
attempted on all of them.
User feedback: 12-pick tours have too many quote-less filler cards
next to the grounded ones. A 6-8 pick tour reads more like a curated
mixtape, less like a Spotify auto-playlist.

Three-way change:
- pickSelector system prompt target 6-8, cap at 8, failure-mode
  language unchanged (still demand >= 6 minimum).
- All 8 intent-specific user prompts now say "Find 6-8 musical artists"
  instead of "Find 8-12".
- picks.slice(0, 8) replaces slice(0, 12) in both pickSelector and
  the runWork orchestrator. isSparse threshold shifts from <8 to <6.

Side wins:
- Per-pick cost drops ~40% (Phase B scales linearly with pick count).
- Grounding rate as a percentage should climb — model leads harder
  with its best-documented picks rather than stretching to 12.
- Tour wall-clock ~same (parallel, bottlenecked by slowest pick).
…moderation API failures

Three intertwined bugs sent /recommend users to a 404 when the moderation
classifier failed transiently:

1. /recommend redirected to /r/[slug] for both phase=done and phase=flagged,
   but /r/[slug] gates on isPublic. Flagged tours have isPublic=false, so the
   destination 404'd. Now only redirect on phase=done. Flagged/timed-out tours
   stay on the LoadingPanel STOPPED state which already exists.

2. The Haiku moderation classifier writes "unknown-moderation-failure" on
   transient API errors. The finalizer was tagging these as
   moderationStatus=flagged, indistinguishable from real content flags. Now
   API failures write moderationStatus=timed_out, which surfaces in the
   existing admin queue (listFlaggedTours already includes timed_out) and is
   recoverable. Real content flags still write "flagged".

3. writeStatus always wrote phase=done regardless of moderation outcome, so
   the client's phase=flagged check was dead code. Now the final phase
   reflects the actual result so the LoadingPanel and redirect logic agree.

Plus: /recommend added to the chat slash command menu, navigates to the
recommend page with the prompt prefilled via ?prompt= query param.
The per-pick grounded architecture (commit d3f3598) saturates the
same wall-clock window that moderation fires in, pushing structured
Haiku calls past the 5s budget on benign prompts like "70s spiritual
jazz." Result: clean tours got tagged moderationStatus=timed_out and
stayed private, even though content classification would have passed.

Evidence: PostHog recommend_tour_completed events show ~17-18s total
wallclock for failed runs vs ~12-13s for the one success in the same
window. Errors array is sliced to 5; the moderation timeout was
falling off the end of the slice, hiding the real failure.

15s is enough headroom for a structured Haiku call under load without
masking genuine outages. Retry policy unchanged — HaikuTimeoutError
still NOT retried (if 15s isn't enough, two of them won't be either).
Synthesis attempts were failing with "Unexpected non-whitespace character
after JSON at position N" — Haiku returns valid JSON followed by trailing
explanatory prose, which JSON.parse rejects.

Fast-path tries pure JSON.parse on the fence-stripped text. Fallback
slices the outermost { ... } and parses that. Mirrors the brace-slice
pattern already proven in convex/recommend/haikuStructured.ts:
parseJSONFromResponse, but inlined here to avoid forcing convex/wiki.ts
into the Node runtime (it currently runs in V8).
19 of 23 user records had no usernameSlug because they predate the
slugify-on-upsert path. /wiki/[username]/[slug] pages 404'd for everyone
because getBySlug queries the by_username_slug index and got null on
the owner lookup, regardless of page visibility.

backfillUsernameSlugs is idempotent, ordered by createdAt ascending
(oldest user wins the canonical slug), and dedupes collisions with -2 /
-3 suffixes. Run once via: bunx convex run users:backfillUsernameSlugs
After fixing the Haiku JSON-trailing-prose bug, 56 of 96 existing
wiki pages stayed stuck in the unsynthesized state because the original
synthesis attempt failed silently (caught, logged, never retried).

resynthesizeStuckPages re-schedules every page where no section has
lastSynthesizedAt. Uses 2-second stagger to avoid Anthropic rate limits.
Skips already-synthesized pages and archived pages.

Run via: bunx convex run wiki:resynthesizeStuckPages
… drop double cast

Three Boy-Scout cleanups on top of the prior session's fixes — no
behavior change, just shape.

1. resolveModerationOutcome() in convex/recommend/index.ts replaces
   three replicated ternaries (moderationStatus, finalPhase, finalDetail)
   with one outcome resolver. Adding a 4th moderation outcome (e.g.,
   manual-review) is now one branch instead of three edits across two
   files. MODERATION_FAILURE_CATEGORY constant removes the duplicated
   "unknown-moderation-failure" magic string.

2. LIFECYCLE_BY_MODERATION_STATUS lookup table in mutations.ts replaces
   the matching ternary chain in finalizeTour. Schema drift caught by
   `satisfies` instead of falling through to "completed".

3. HAIKU_TIMEOUT_MS, RESYNTHESIZE_STAGGER_MS, PROMPT_MAX_LENGTH —
   named constants for the values introduced last session. Notably
   PROMPT_MAX_LENGTH now lives in one place; the same 400 was repeated
   three times in recommend/page.tsx (initial slice, onChange slice,
   counter display).

4. callHaikuSynthesis: dropped the double `(parsed as Record<string,
   unknown>).sections` cast. parseLooseJSON already returns the right
   shape; let TypeScript see it.

5. Comment rot: "56 stuck pages × 2s = ~2 minute total elapsed" was
   wrong by the next session (actual: 101 pages, ~3.3 min). Replaced
   with the constant + a generic explanation.

Typecheck clean, build clean, no new tests yet (testing is the
outstanding gap from the clean-code review — separate task).
Captured codebase shape and lint state in docs/clean-code-baseline.md
as the input for the systematic clean-code review plan. Then applied
the safe mechanical fixes — no production logic changes.

ESLint config:
- Exclude docs/** from lint scope (planning docs, not production code).
  Eliminates 23 false positives in docs/crate-recommend-feature/*.
- Allow underscore-prefixed unused args/vars/caught-errors (matches
  TypeScript's standard "intentionally unused" convention). Eliminates
  ~10 false-positive no-unused-vars warnings.
- Disable react-hooks/rules-of-hooks for src/lib/openui/components.tsx
  with explanatory comment. The defineComponent({ component: ({props})
  => ... }) factory pattern declares real React components inside an
  object literal, but the rule cannot see them and flags every useState
  call. 24 false positives gone.

Source cleanups:
- Removed unused ClerkProvider import in convex-provider.tsx.
- Escaped 5 unescaped quotes in commands-reference.tsx and
  video-influence-chain.tsx (' → &apos;, " → &ldquo;/&rdquo;).
- Stripped 3 now-redundant eslint-disable-next-line comments left
  behind by eslint --fix in components.tsx.

Result: 168 → 97 lint problems (−71, −42%). Typecheck clean. Build
passes. Remaining 97 are the real debt that needs human judgment —
input to Phase 1 hot-path reviews (no-explicit-any, no-img-element,
genuinely unused symbols).
…ion, qualified-view dwell

Ships the observation rig from the Receipt Truth gated validation sprint
(office-hours design doc 20260428). No new feature, no design changes.
All measurement.

Adds:
- src/lib/creator-id.ts: anonymous crate_creator_id cookie (random 16
  chars, 1 year, SameSite=Lax). Stamped on every PostHog event so the
  gate metric "unique non-Tarik views per posted receipt" can be
  computed at query time by filtering Tarik's known IDs.

- src/hooks/use-active-time.ts: dwell hook that counts active foreground
  ms (pauses on visibilitychange — Reddit's preview pipeline opens many
  hidden tabs). Fires onQualified once at the 5s threshold.

- New PostHog events on /i/[slug]:
  - receipt_generated: fired client-side when receipt.generatedAt is
    within 60s. Imperfect but cheap heuristic; alternative is a schema
    migration to track creator_id on the cache row.
  - receipt_viewed_via_share: fired when the URL carries ?s=<token>.
    PostHog can JOIN this back to the originating receipt_share_click
    event to verify reshare-with-downstream-traffic.
  - receipt_view_qualified: fired at 5s active dwell. THIS is the gate
    metric. receipt_view (raw) still fires on mount.

- ShareButton: generates a 6-char share token, builds /i/[slug]?s=<token>
  URLs (replaces the prior utm_source/utm_medium scheme), and stamps the
  token on all three receipt_share_click variants.

- All existing posthog.capture calls on the receipt page now include
  creator_id (receipt_view, receipt_share_click, receipt_try_another,
  receipt_cta_click).

- ReceiptUI wrapped in Suspense at the page level — required for
  useSearchParams in client components under Next 15.

Behavior unchanged for end users. The "Try another artist..." copy
stays put per founder direction (spec said "see another influence
chain →"; the SearchBox is already that CTA).

Decision rules locked BEFORE data per the sprint anti-criterion. Gate
will be applied on Day 4.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant