
Improve CLI interaction flow#8

Open
meanaverage wants to merge 185 commits into gobbleyourdong:main from meanaverage:cli-updates

Conversation


meanaverage commented Apr 2, 2026

This PR improves the CLI interaction flow and makes the terminal UI feel much closer to a real shell while staying agent-first.

What’s included:

  1. stronger slash-command autocomplete
  2. clearer /project and /serve help text that matches actual command usage
  3. ↑/↓ navigation for suggestions and prompt history recall
  4. Tab and Enter completion behavior that is less confusing in practice
  5. cursor placement fixes after accepting completions
  6. file name and path completion for /attach and /unattach
  7. /unattach support for removing attached files
  8. /project del support
  9. Esc and /stop support for stopping active runs without killing the app
  10. 't' to toggle a more useful trace-tail view during active runs
  11. persistent CLI prompt history across app restarts
  12. Ctrl-C now clears any text in the prompt first; a second Ctrl-C is a true abort

Files changed:

cli/app.jsx
tsunami/server.py
Validation:

CLI rebundled successfully with esbuild after changes
python3 -m py_compile tsunami/server.py passed

gobbleyourdong and others added 30 commits March 31, 2026 18:18
Default: 9B wave + 2B eddies (auto-scale to leftover mem) + SD-Turbo
Lite: 2B only, no image gen (for 4GB systems)
SD-Turbo (2GB) now included in full mode memory budget.
README simplified to two modes. 13 scaling tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Smoke test validates full one-click experience:
  models respond → wave reasons → eddies parallel → image gen → agent e2e
7/8 pass (image gen needs diffusers in system python — now fixed in installer).
Setup.sh now installs diffusers+torch+accelerate alongside core deps.
One command, everything works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 9B wave built this page with zero human intervention:
- Pure HTML + Tailwind CSS CDN (no build step)
- Hero with wave background + typewriter CLI animation
- Stats bar, architecture diagram, features grid
- Install section with curl command
- 258 lines, 12.8KB, oceanic theme

4 LLM calls, 34K tokens, $0.00 (local model).
This page was built by the tool it describes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ints

- tool_choice: required (Ark: MUST respond with function calling)
- Format: NEVER use bullets for deliverables, paragraphs mandatory
- Search: description now says use 3 query variants, visit sources
- Code: stronger "save to file first" rule, no inline complex code
- Skills/waveforms removed — dead code, AGI doesn't need plugins
- Disclosure protection already existed (verified)
- 607 tests passing, live 9B verified with tool_choice: required

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Solves the monolith problem — instead of the wave writing one huge file,
it specs components, dispatches eddies to write each one, then assembles.
Eddies return code via done tool (no filesystem write needed).
Registered in bootstrap tools. 607 tests passing.

Also: Snake game built autonomously by the 9B wave (209 lines).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tool_choice: required overflows the 9B server when combined with
20 tool schemas (2515 tokens) + system prompt (4000 tokens).
The prompt rule "MUST respond with exactly one tool call" enforces
the same behavior without crashing the server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Swell: more elegant, more violent. When agents spawn, the swell rises.
tide.py → swell.py, tide_analyze → swell_analyze, tide_build → swell_build.

README rewritten from scratch — no jargon, no walls of tables.
One command install, what it does, how it works, what you need.
Written for people who want to use it, not read about it.

607 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
20 tools (2515 tokens) + system prompt overflowed 9B context on
complex prompts causing 500 Internal Server Errors. Moved
swell_analyze, swell_build, shell_view, plan_advance, file_append
to loadable toolbox. 15 tools (1829 tokens) fits comfortably.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Break: where the wave meets the shore. Results converge at the break.
Full oceanic naming: wave → swell → eddies → break → output.

9B now runs with 32K context (was 16K) — fixes 500 errors on
large file generation. Alphabet tracer building.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRITICAL FIX: Verify loop now in prompt as hard rule:
"NEVER call message_result on code you haven't verified.
Write → verify → fix → deliver."

Snake game: 3 iterations (broken) → 13 iterations (working) after
adding verify. This was the #1 reason all 4 apps were broken.

Also: agent loop nudges "save to files" every 5 iterations if
no file writes detected. Prevents context overflow on long tasks.
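The periodic "save to files" nudge could look something like the sketch below. This is illustrative only; the function name, message shape, and counter handling are assumptions, not the actual tsunami agent-loop API.

```python
# Hypothetical sketch of the "save to files" nudge described above.
NUDGE_INTERVAL = 5  # iterations between checks

def maybe_nudge(iteration: int, file_writes_since_nudge: int, messages: list) -> int:
    """Every NUDGE_INTERVAL iterations, remind the model to persist work
    to files if no file_write calls were observed in the window.
    Returns the (possibly reset) per-window write counter."""
    if iteration > 0 and iteration % NUDGE_INTERVAL == 0:
        if file_writes_since_nudge == 0:
            messages.append({
                "role": "user",
                "content": "Reminder: save your work to files instead of "
                           "keeping long outputs in context.",
            })
        return 0  # reset the per-window counter at each checkpoint
    return file_writes_since_nudge
```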

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Watcher caught 6 issues in alphabet tracer build that would have
shipped broken. Config updated: 9B wave, 8192 max tokens, watcher
on by default with 2B eddy at interval 3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wave builds → drag tests → wave fixes → drag retests → ship.
Static analysis (bracket matching, canvas checks, Three.js checks)
+ headless Playwright (console errors, page errors, canvas dims).

Found pinball bug immediately: "Cannot access scoreDisplay before
initialization." Snake, alphabet tracer, node editor all pass.

Named "drag" — the undertow that pulls back what's not ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…k, undertow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three layers enforce fact-checking:
1. Prompt: mandatory 4-step triangulation (hypothesis → search → cross-ref → deduce)
2. Watcher: 2B checks if message_result has unverified claims
3. Agent loop: triangulation gate blocks delivery of factual claims
   that were never search-verified. Injects warning, forces verification.

From Manus's methodology: parametric memory is unreliable for specifics.
External sources win over training data. Always.
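The triangulation gate in step 3 might be sketched as below. The heuristic for "factual-looking claim" (years, percentages, "according to") and the function name are assumptions made for illustration; the real gate may use different signals.

```python
# Illustrative sketch of a delivery gate that blocks unverified claims.
import re

# Assumed heuristic: years, percentages, or attribution phrases
FACTUAL_PATTERN = re.compile(r"\b\d{4}\b|\b\d+(?:\.\d+)?%|according to", re.I)

def gate_delivery(response: str, search_performed: bool):
    """Block message_result when the text contains factual-looking claims
    that were never search-verified; otherwise allow delivery."""
    if FACTUAL_PATTERN.search(response) and not search_performed:
        return (False, "Factual claims detected but no search performed. "
                       "Verify via external sources before delivering.")
    return (True, None)
```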

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Current: measures tension (0.0 grounded → 1.0 hallucinating)
  - Heuristic: red flags, quality anchors, text patterns
  - Model probe: 2B eddy evaluates factual reliability
  - Correction: iterative re-asking until tension drops

Circulation: routes based on tension reading
  - Low → deliver directly
  - Capability gap → force search
  - Truth gap → explain contradiction
  - Critical → refuse ("I don't know")
  - Post-tool validation: reject results that increase tension

Pressure: tracks tension over time
  - Escalates: calm → moderate → heavy → crushing
  - Forces search after 2 consecutive high readings
  - Forces refusal after 4 consecutive high readings

Validated against THEGAP.md: correctly flags unverified claims,
delivers well-sourced content, forces search on hallucinated stats.
27 new tests, 634 total, all green.
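The pressure escalation above could be realized roughly as follows. The class shape, the 0.6 "high tension" cutoff, and the action strings are all assumptions; only the 2-reading and 4-reading thresholds come from the commit message.

```python
# Minimal sketch of pressure escalation over consecutive tension readings.
HIGH_TENSION = 0.6  # assumed cutoff for a "high" reading

class Pressure:
    LEVELS = ["calm", "moderate", "heavy", "crushing"]

    def __init__(self):
        self.consecutive_high = 0

    def record(self, tension: float) -> str:
        """Record a tension reading and return the forced action, if any."""
        if tension >= HIGH_TENSION:
            self.consecutive_high += 1
        else:
            self.consecutive_high = 0
        if self.consecutive_high >= 4:
            return "refuse"        # forced refusal after 4 high readings
        if self.consecutive_high >= 2:
            return "force_search"  # forced search after 2 high readings
        return "deliver"

    @property
    def level(self) -> str:
        return self.LEVELS[min(self.consecutive_high, 3)]
```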

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces hacky keyword triangulation gate with real tension measurement.
Before delivery: measure_heuristic scores the response (0.0-1.0).
Circulation routes based on score:
  - deliver (grounded)
  - force search (elevated tension, no prior search)
  - refuse (critical tension — say "I don't know")
Pressure tracks tension across session, escalates over time.

The wave can't hallucinate past the current anymore.
634 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Current/circulation/pressure now checks BOTH:
1. Tool choice: "is this the right tool?" (before execution)
2. Response: "is this grounded?" (before delivery)

Pressure tracks consecutive high-tension decisions:
  2+ → force search to ground the agent
  4+ → force message_ask for user guidance

Watcher (2B text reviewer) removed — replaced by the tension
system which is more rigorous and doesn't need an extra LLM call.

634 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
duckduckgo_search renamed to ddgs — was returning 0 results.
Added arXiv API search for research queries (https, follow redirects).
Both tested and working.
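For the arXiv side, a query against the real export.arxiv.org Atom endpoint might be built as below; the wrapper function itself is illustrative, not the actual tsunami search tool.

```python
# Sketch of building an arXiv API search URL (Atom response format).
import urllib.parse

ARXIV_API = "https://export.arxiv.org/api/query"

def arxiv_query_url(query: str, max_results: int = 5) -> str:
    """Build a search URL for the arXiv Atom API. When fetching, follow
    redirects and send a User-Agent header to avoid 429s."""
    params = {
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"
```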

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tide→swell throughout (tagline, footer, architecture)
- 607→634 tests
- Architecture cards: wave/eddies/swell/break/undertow (was flow/tide/whirlpool)
- Install URL: github raw (was hallucinated tsunami.ark.sh)
- Scaling table: real auto-scaling tiers (was hallucinated linear speedup)
- Eddies: up to 32 (was 4)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The undertow tests code. This tests logic.

After the wave writes a synthesis with reasoning claims, an eddy
receives it as a hostile peer reviewer: "assume this is wrong,
find every logical error, scope error, unstated assumption."

If the reviewer finds flaws → objections injected back to wave,
delivery blocked until addressed.
If the reviewer finds no flaws → deliver.

Would have caught the tsunami_gap.md errors:
- "gap is narrower" (wrong — rescaling moves T³ to R³)
- "KNSS proven on R³" (wrong — only axisymmetric cases)
- Sobolev chain applied in wrong context (T³ vs R³ after rescaling)

Three quality gates now:
1. Current/tension — catches hallucinated facts
2. Undertow — catches broken code
3. Adversarial — catches broken reasoning

634 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Undertow completely rewritten as a lever-puller. The wave provides
what to test, the undertow just does it and reports facts. No diagnosis.

- Lever types: screenshot, press, click, read_text, console, wait
- Auto-generates levers from HTML (every ID, key binding, button)
- Eddy vision: screenshot description vs user intent comparison
- DOM + pixel diff for interaction levers (catches subtle changes)
- Visibility check before clicking (no more 30s timeouts)
- code_tension metric (lever fail ratio) feeds into pressure
- Research-before-building mandatory in prompt
- 500 retryable with backoff in model layer
- Delivery gate capped at 2 blocks (was infinite loop)
- Info loop detector tightened to 3/6 (was 5/10)
- arXiv User-Agent header (fixes 429s)
- Current measures prose tension only (undertow measures code)

Results: pinball went from black screen (0.62 tension) to
rendered 3D table (0.21 tension) in fewer iterations (13 vs 25).
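The tightened 3/6 info-loop detector mentioned above could be sketched like this; the class and method names are assumptions.

```python
# Hedged sketch: flag 3 identical calls within a sliding window of 6.
from collections import deque

class LoopDetector:
    def __init__(self, threshold: int = 3, window: int = 6):
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # oldest calls fall off the window

    def record(self, tool: str, args_key: str) -> bool:
        """Return True when the same (tool, args) call appears `threshold`
        times in the window, signalling the agent is looping."""
        call = (tool, args_key)
        self.recent.append(call)
        return self.recent.count(call) >= self.threshold
```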

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root now has only run.py (main entry) and serve_diffusion.py (SD-Turbo).
All test harnesses, stress tests, and verification scripts in tests/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added sections for current/circulation/pressure (the tension system),
the undertow lever-puller architecture, and research-before-building.
Includes before/after results: black screen → rendered 3D pinball.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New search_type="code" hits GitHub API for repos and source code.
No auth needed (public API). Returns repos sorted by stars with
direct links to browse code. Wave reads real implementations
instead of hallucinating API calls.

Tested: "three.js pinball physics" → found pinball-xr (cannon-es),
Three.js forum discussions with working CCD physics examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gobbleyourdong and others added 28 commits April 2, 2026 09:36
Left: session/project list (click to switch, + New Session)
Center: chat/prompt area
Right: workspace with self-contained file tree + code + preview + terminal

Code view has its own sidebar showing project files.
Click a file to view. Tabs switch between Code/Preview/Terminal.
Sessions maintain separate message histories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o-create

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added overflow:hidden + min-height:0 to chat container and messages.
Prevents long message logs from pushing the input off screen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…plicates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
./workspace/deliverables/x, workspace/deliverables/x, deliverables/x
all resolve correctly now. Was breaking on ./workspace/ prefix causing
'Need to use relative path' errors in CLI builds.
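The normalization described above could be sketched as follows; the function name and the exact resolver in tsunami/server.py are assumptions.

```python
# Illustrative prefix normalization for deliverable paths.
WORKSPACE_PREFIXES = ("./workspace/", "workspace/")

def resolve_deliverable(path: str) -> str:
    """Map ./workspace/deliverables/x, workspace/deliverables/x, and
    deliverables/x to the same workspace-relative path."""
    for prefix in WORKSPACE_PREFIXES:
        if path.startswith(prefix):
            path = path[len(prefix):]
            break  # strip at most one prefix
    return path
```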

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent loop:
- Remove max_iterations hard cap — while True, runs until task_complete
- Research gate — nudge search before first write on visual projects
- generate_image + vision_ground in bootstrap (19 tools)
- Vision grounding: VL model extracts element positions → writes layout.css
  with ratio-based CSS (aspect-ratio, percentages, resolution-independent)
- Auto-ground pipeline: generate_image → Qwen-VL → layout.css → agent uses it
- Mid-loop auto-wire: App.tsx wired when 2+ components exist, not just on exit
- Reference save: image search results auto-saved to reference.md
- Swell compile gate: vite build must pass at delivery for React projects
- Skip static HTML undertow for React projects (dev server handles it)
- Dedup loop detection: 3+ identical cached calls → nudge different approach
- Generate nudge every 12 iterations for visual projects
- DDG image search: curl-based fallback bypasses TLS fingerprint detection
- generate.py path fix: /workspace/... resolved relative to actual workspace

Prompt:
- RESEARCH FIRST mandatory, GENERATE ASSETS step, EXTRACT POSITIONS step
- COMPARE to reference, no iteration limit, iterate relentlessly

Infrastructure:
- Dockerfile + docker-compose.yml + docker-entrypoint.sh for containerized runs
- serve_daemon.py — persistent dev server like ComfyUI (:9876, auto-detects projects)
- cli.py: auto-serve latest deliverable after each task, fix manus→tsunami import

Windows:
- setup.ps1: VRAM detection (not RAM) for mode selection, matching setup.bat behavior
- Pre-built llama-server download (pinned b8628 from ggml-org) — no cmake needed
- CUDA DLL bundling alongside llama-server
- exit→return so shell doesn't close on error
- wave/eddy naming (removed queen/bee references)

Scaffolds:
- Fix PixiJS SpriteAnimator: BaseTexture removed in v8, use Texture.from()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Double-clicking the .ps1 opens a new window that closes when the
script finishes. Added ReadKey pause at end and before early returns
so users can read the output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
generate_image now runs SD-Turbo directly in the agent process.
Auto-downloads the 2GB model on first use via HuggingFace diffusers.
No separate server needed. 1 inference step, <1s on GPU.

- generate.py: _try_sd_turbo_local as primary backend, placeholder fallback
- serve_diffusion.py: rewritten for SD-Turbo (was Qwen-Image-2512/13GB)
- setup.sh: installs diffusers+torch+transformers+accelerate

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 2B isn't "gimped" — it's the same model acting as wave on lite boxes.
All eddy endpoint references now read from config.eddy_endpoint (propagated
via TSUNAMI_EDDY_ENDPOINT env var). On lite mode, eddy_endpoint points at
the wave's port (:8090) — one server, one model, both roles.

- config.py: added eddy_endpoint field
- agent.py: propagates config.eddy_endpoint to env var at init
- Replaced all hardcoded :8092 / TSUNAMI_BEE_ENDPOINT across 10 files
- launcher.py: lite mode starts ONE server, sets TSUNAMI_EDDY_ENDPOINT=:8090
- docker-entrypoint.sh: same lite mode fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When file_write sees useState/useEffect/useRef in a .tsx file without
a React import, auto-prepend the import. The 2B (lite mode wave)
consistently forgets this — builds pass but runtime crashes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Battle-tested on actual Windows hardware. Key fixes over previous version:
- tsu.ps1: model priority chain (27B → MoE → 8B/2B), vision mmproj
  auto-detect, Windows JSON escaping for --chat-template-kwargs,
  FastAPI backend on :3000, Node CLI with Python REPL fallback
- setup.ps1: encoding and escaping fixes from real Windows debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SD-Turbo goes to CPU on tight VRAM (~1min instead of <1s). Worth it
for the 9B wave quality over 2B. 8GB cards get the full stack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tsu.ps1: hide llama-server internals from user (no model names,
  no localhost URLs, no "waiting 120s"). Just "Loading model...",
  "Starting up...", "Ready".
- requirements.txt: added fastapi, uvicorn, websockets, ddgs, pillow
  as required (were optional/missing, tsu.ps1 backend needs them)
- setup.sh: DEPS includes all core packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- vision_ground removed from bootstrap (17 tools). The 2B calls it
  repeatedly when no VL model exists. Auto-ground in agent.py still
  works — it imports the tool directly when generate_image fires.
- message_info + message_result strip non-ASCII before printing.
  Windows cp1252 console crashes on emoji from the 2B.
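The strip itself is likely a one-liner along these lines (function name assumed):

```python
# Minimal sketch of the console-safety strip described above.
def ascii_safe(text: str) -> str:
    """Drop non-ASCII characters (emoji etc.) so output survives a
    Windows cp1252 console."""
    return text.encode("ascii", "ignore").decode("ascii")
```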

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both tsu (bash) and tsu.ps1 (Windows) silently check for updates
on every launch. Fetch, compare, pull if behind. No user action
needed. Offline gracefully ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
meanaverage (Author) commented:

Hello Mr. Gobbleyourdong,

Please review this PR before other commits that could create merge conflicts; the surface area here is minimal.
