
Functional acceptance test battery — scripted with intent annotations #20

@arcavenai

Summary

During alpha testing on a Raspberry Pi (aarch64, tmux 3.5a), a battery of functional acceptance tests was developed through bug discovery and fix verification. These tests verify the installed binary on real hardware; they are not unit or integration tests in the codebase.

These should be formalized as a dual-layer test suite:

  1. Intent layer — human/AI-readable description: what is being tested, why it matters, what the user experience should be, what failure looks like. This layer is for agents and humans who need to understand the test, not just run it.
  2. Script layer — executable commands with assertions. Automatable where possible, flagged as manual-only where human judgment is needed (e.g. "does the TUI look right?").

Tests that require subjective evaluation (TUI rendering, persona voice quality) should describe the expected experience so an AI agent or human tester can make the call.
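
One way the two layers could sit side by side is a small runner that prints the intent before executing the script. This is only a sketch: run_test, its argument order, and the three counters are hypothetical names, not part of aclaude.

```shell
# Hypothetical dual-layer runner: every name here is illustrative.
PASSED=0
FAILED=0
MANUAL=0

# run_test ID INTENT MODE [COMMAND...]
#   MODE "auto"   runs COMMAND and records its exit status.
#   MODE "manual" only prints the intent so a human or agent can make the call.
run_test() {
    local id="$1" intent="$2" mode="$3"
    shift 3
    echo "[$id] intent: $intent"
    if [ "$mode" = "manual" ]; then
        MANUAL=$((MANUAL + 1))
        echo "[$id] MANUAL: needs human or agent judgment"
        return 0
    fi
    if "$@"; then
        PASSED=$((PASSED + 1))
        echo "[$id] PASS"
    else
        FAILED=$((FAILED + 1))
        echo "[$id] FAIL"
    fi
}
```

A scripted test would then be registered as run_test 2.6 "stop --all leaves no tmux server" auto some_check_function, and a subjective one as run_test 4.2 "responses carry persona voice" manual.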

Test environment

  • Linux aarch64 (Raspberry Pi OS)
  • tmux 3.5a
  • Binary installed at ~/.local/bin/aclaude-a
  • No sudo access
  • Terminal: xterm-256color, truecolor, no image protocol (no kitty/sixel)

Group 1: Self-update

1.1 — Version identity

Intent: The binary must report its version, commit, build timestamp, and release channel. This is how users and support identify what's installed. Two formats exist: version (detailed) and --version (compact). Both must agree on the commit hash.

Why it matters: Mismatch between reported version and actual binary caused confusion during early testing — we couldn't tell which build was running. Checksum verification was added to the test workflow because of this. (Issues #1, #2, tmux-cmc#1)

Automatable: Yes

# Verify both version formats report the same commit
FULL=$(aclaude-a version 2>&1)
COMPACT=$(aclaude-a --version 2>&1)
COMMIT_FULL=$(echo "$FULL" | grep 'commit:' | awk '{print $2}')
COMMIT_COMPACT=$(echo "$COMPACT" | grep -oP '[0-9a-f]{7}' | head -1)
[[ "$COMMIT_FULL" == "$COMMIT_COMPACT" ]] || echo "FAIL: commit mismatch"

# Verify channel is reported
echo "$FULL" | grep -q 'channel:' || echo "FAIL: no channel in version output"

# Record checksum for comparison
sha256sum ~/.local/bin/aclaude-a

1.2 — Update cycle

Intent: The self-updater must detect a newer release on the correct channel, download it, replace the binary, and the new binary must report a different commit and have a different checksum. The update is the primary distribution mechanism — if it breaks, users are stuck on old versions forever.

Why it matters: The updater was initially broken — it detected updates but didn't install them (issue #5). Later, it picked stable releases instead of alpha (issue #19). This test catches both failure modes.

Automatable: Partially — requires a newer release to exist

BEFORE_SUM=$(sha256sum ~/.local/bin/aclaude-a | awk '{print $1}')
BEFORE_COMMIT=$(aclaude-a version 2>&1 | grep 'commit:' | awk '{print $2}')

OUTPUT=$(aclaude-a update 2>&1)
echo "$OUTPUT"

AFTER_SUM=$(sha256sum ~/.local/bin/aclaude-a | awk '{print $1}')
AFTER_COMMIT=$(aclaude-a version 2>&1 | grep 'commit:' | awk '{print $2}')

# If an update was installed, both checksum and commit must differ
if echo "$OUTPUT" | grep -q "Updated"; then
    if [[ "$BEFORE_SUM" == "$AFTER_SUM" ]]; then
        echo "FAIL: checksum unchanged after update"
    elif [[ "$BEFORE_COMMIT" == "$AFTER_COMMIT" ]]; then
        echo "FAIL: commit unchanged after update"
    else
        echo "PASS: update installed (${BEFORE_COMMIT} -> ${AFTER_COMMIT})"
    fi
fi

1.3 — Channel filtering

Intent: An alpha binary must only see alpha releases. A stable binary must only see stable releases. Cross-channel pollution caused a regression where alpha users received a main-branch build missing develop-branch features (issue #18, caused by #6).

Why it matters: This is the guardrail for the dual-track release model. Without it, alpha users get stable code (missing features) or stable users get alpha code (untested).

Automatable: Yes

OUTPUT=$(aclaude-a update 2>&1)
CHANNEL=$(aclaude-a version 2>&1 | grep 'channel:' | awk '{print $2}')

# The "Latest:" line must match the current channel
if echo "$OUTPUT" | grep -q "Latest:"; then
    LATEST=$(echo "$OUTPUT" | grep "Latest:" | awk '{print $2}')
    if [[ "$CHANNEL" == "alpha" ]]; then
        echo "$LATEST" | grep -q "^alpha-" || echo "FAIL: alpha channel saw non-alpha release: $LATEST"
    elif [[ "$CHANNEL" == "stable" ]]; then
        echo "$LATEST" | grep -q "^stable-" || echo "FAIL: stable channel saw non-stable release: $LATEST"
    fi
fi
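
The prefix rule in the script above can be isolated into a pure function, which makes it testable without a network or a real release. A minimal sketch, assuming release names carry an alpha- or stable- prefix as the script expects; release_matches_channel and the sample release names are hypothetical.

```shell
# release_matches_channel CHANNEL RELEASE
# Returns 0 if RELEASE starts with "<CHANNEL>-", 1 if it does not,
# and 2 if CHANNEL is neither alpha nor stable.
release_matches_channel() {
    local channel="$1" release="$2"
    case "$channel" in
        alpha|stable) ;;
        *) return 2 ;;   # unknown channel: neither pass nor fail
    esac
    case "$release" in
        "$channel"-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, release_matches_channel alpha stable-1.0 returns 1, which is the cross-channel case from issue #18.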

Group 2: Session lifecycle

2.1 — Clean start

Intent: Starting a session from a clean state (no tmux server) must create a named session, launch aclaude inside it, and provide a correct reattach command. The reattach hint must use the actual binary name (aclaude-a, not aclaude). This is the first-run experience.

Why it matters: The attach hint was hardcoded to aclaude (issue #11). The send-keys targeted the wrong pane (issue #10), leaving users with a bare bash shell instead of aclaude running.

Automatable: Yes (with --no-attach to avoid interactive tmux)

tmux -L ac kill-server 2>/dev/null
BINARY_NAME=$(basename "$(command -v aclaude-a)")
OUTPUT=$(aclaude-a session start --no-attach 2>&1)
echo "$OUTPUT"

# Session was created
echo "$OUTPUT" | grep -q "Creating session" || echo "FAIL: no creation message"

# Attach hint uses correct binary name
echo "$OUTPUT" | grep -q "${BINARY_NAME} session attach" || echo "FAIL: attach hint uses wrong binary name"

# Session exists in tmux
tmux -L ac list-sessions 2>&1 | grep -v "^_ctrl" | grep -q "windows" || echo "FAIL: no user session found"

2.2 — Session listing

Intent: Users must be able to discover their sessions without knowing tmux commands. list shows full details, --names shows just names (for scripting), --all includes internal control sessions. Users should never need tmux -L ac ls.

Why it matters: Before these commands existed, users had to know about the -L ac socket (issue #9). These were part of a regression in build 69453ca (issue #18).

Automatable: Yes

# Full list
aclaude-a session list 2>&1 | grep -q "windows" || echo "FAIL: list shows no sessions"

# Names only
NAMES=$(aclaude-a session list --names 2>&1)
[[ -n "$NAMES" ]] || echo "FAIL: --names is empty"

# --all includes control sessions (only if multiple user sessions exist)
aclaude-a session start --no-attach -t second 2>/dev/null
ALL=$(aclaude-a session list --all 2>&1)
echo "$ALL" | grep -q "_ctrl" || echo "FAIL: --all doesn't show control sessions"
aclaude-a session stop -t second 2>/dev/null

2.3 — Session status

Intent: Status gives a formatted table with session name, window count, creation timestamp, and state (attached/detached). The CREATED column must show an actual timestamp. Control sessions are hidden by default with a count footer.

Why it matters: CREATED column was empty (issue #16). Control sessions cluttered the output (issue #12).

Automatable: Yes

OUTPUT=$(aclaude-a session status 2>&1)
echo "$OUTPUT"

# Header present
echo "$OUTPUT" | head -1 | grep -q "SESSION" || echo "FAIL: no header"

# CREATED column has a timestamp (YYYY-MM-DD HH:MM:SS pattern)
echo "$OUTPUT" | grep -qP '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}' || echo "FAIL: no timestamp in CREATED"

# Control sessions hidden
echo "$OUTPUT" | grep -q "^_ctrl" && echo "FAIL: _ctrl visible without --all"
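
The timestamp check above relies on grep -P, which needs PCRE support that some grep builds (BusyBox, BSD) do not have. A portable variant using POSIX extended regular expressions might look like the following; has_created_timestamp is a hypothetical helper name and the sample status rows in the usage note are invented for illustration.

```shell
# has_created_timestamp: read status output on stdin and succeed if any
# line contains a YYYY-MM-DD HH:MM:SS timestamp. Uses ERE, not PCRE.
has_created_timestamp() {
    grep -Eq '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}
```

Usage would be aclaude-a session status 2>&1 | has_created_timestamp || echo "FAIL: no timestamp in CREATED".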

2.4 — Detach/reattach

Intent: A user must be able to detach from a session (Ctrl-b d) and reattach later. The session persists across detach. This is fundamental to the tmux value proposition — sessions survive disconnection.

Why it matters: This is core tmux behavior surfaced through aclaude commands. Verified working.

Automatable: No — requires interactive terminal for Ctrl-b d. Manual test only.

Manual steps:

  1. aclaude-a session start (attaches)
  2. Press Ctrl-b d (detaches, returns to shell)
  3. aclaude-a session attach (reattaches)
  4. Verify: same session, same state, aclaude still running

2.5 — Stop individual session

Intent: Stopping a named session must kill only that session. Other sessions and the control session must be unaffected.

Automatable: Yes

aclaude-a session start --no-attach -t keep 2>/dev/null
aclaude-a session start --no-attach -t remove 2>/dev/null
aclaude-a session stop -t remove
aclaude-a session list --names 2>&1 | grep -q "keep" || echo "FAIL: keep session was removed"
aclaude-a session list --names 2>&1 | grep -q "remove" && echo "FAIL: remove session still exists"
aclaude-a session stop -t keep 2>/dev/null

2.6 — Stop all

Intent: session stop --all must kill all user sessions, all control sessions, and the tmux server. Clean slate. The user should not need to know tmux -L ac kill-server.

Why it matters: Verified working since early builds. This is the cleanup command.

Automatable: Yes

aclaude-a session start --no-attach 2>/dev/null
aclaude-a session stop --all
# Check the exit status rather than the error text: list-sessions fails when no server is running
tmux -L ac list-sessions >/dev/null 2>&1 && echo "FAIL: tmux server still running"

Group 3: Control session behavior

3.1 — Single session creates no control session

Intent: When only one user session exists, tmux-cmc should reuse that session's server without creating a separate _ctrl session. This reduces clutter and resource usage.

Why it matters: Originally every session created its own control session 1:1 (issue #15). For marvel managing dozens of agents, this doesn't scale.

Automatable: Yes

tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach
COUNT=$(tmux -L ac list-sessions 2>&1 | wc -l)
[[ "$COUNT" -eq 1 ]] || echo "FAIL: expected 1 session, got $COUNT"
tmux -L ac list-sessions 2>&1 | grep -q "_ctrl" && echo "FAIL: _ctrl exists with single session"
aclaude-a session stop --all 2>/dev/null

3.2 — Multiple sessions share one control session

Intent: When a second session is created, a single shared _ctrl session appears. Not one per user session. This is the scalability design for marvel.

Automatable: Yes

tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach -t first
aclaude-a session start --no-attach -t second
CTRL_COUNT=$(tmux -L ac list-sessions 2>&1 | grep -c "^_ctrl")
[[ "$CTRL_COUNT" -eq 1 ]] || echo "FAIL: expected 1 _ctrl, got $CTRL_COUNT"
aclaude-a session stop --all 2>/dev/null

3.3 — Control session is idle

Intent: The control session must not run a shell profile, Claude Code, or aclaude. It is internal plumbing — a tmux-cmc protocol endpoint only.

Why it matters: Early builds ran Claude Code in the control session (issue #8), wasting an API connection. Later, send-keys targeted the control session instead of the user session (issue #10), launching aclaude in the wrong place.

Automatable: Partially — requires judgment on "idle"

# Create two sessions to force _ctrl
tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach -t first
aclaude-a session start --no-attach -t second
CTRL_CONTENT=$(tmux -L ac capture-pane -t _ctrl -p 2>&1)

# Should be empty or minimal — NOT showing Claude Code TUI
echo "$CTRL_CONTENT" | grep -q "Claude Code" && echo "FAIL: Claude Code running in _ctrl"
echo "$CTRL_CONTENT" | grep -q "Welcome" && echo "FAIL: aclaude running in _ctrl"
aclaude-a session stop --all 2>/dev/null

3.4 — Cleanup removes control session

Intent: When the last user session is stopped, the control session and tmux server must be cleaned up. No orphaned tmux processes.

Automatable: Yes — covered by test 2.6


Group 4: Session content

4.1 — aclaude launches in the correct pane

Intent: After session start, the user's session pane must be running aclaude (which spawns Claude Code). The tmux status bar should show the binary name. This is the whole point — the user gets an aclaude session, not a bare shell.

Why it matters: send-keys targeted the wrong pane (issue #10), leaving users with bash. This was the most user-visible bug.

Automatable: Partially — TUI rendering requires judgment

tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach
SESSION=$(aclaude-a session list --names 2>&1 | head -1)
sleep 3  # give aclaude time to start
CONTENT=$(tmux -L ac capture-pane -t "$SESSION" -p 2>&1)

# Should show Claude Code TUI indicators
echo "$CONTENT" | grep -q "Claude Code" || echo "WARN: Claude Code TUI not visible (may need more startup time)"
aclaude-a session stop --all 2>/dev/null

Manual verification: Attach to the session. You should see the Claude Code TUI with the persona prompt, not a bash shell. The tmux status bar should show aclaude | 0:aclaude-a*.
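
The fixed sleep 3 in the script is the likely source of the "may need more startup time" warning on slow hardware. A polling helper is one possible alternative. This is a sketch under assumptions: wait_until and pane_shows are hypothetical names, and the right timeout for a Raspberry Pi is not known from this issue.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds (returns 0)
# or TIMEOUT_SECONDS have elapsed (returns 1).
wait_until() {
    local timeout="$1" elapsed=0
    shift
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# pane_shows SESSION TEXT: succeed if the captured pane contains TEXT.
pane_shows() {
    tmux -L ac capture-pane -t "$1" -p 2>/dev/null | grep -qF -- "$2"
}
```

The script line would then become wait_until 15 pane_shows "$SESSION" "Claude Code" || echo "WARN: Claude Code TUI not visible", where 15 is an arbitrary example value.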

4.2 — Persona voice in session

Intent: When aclaude launches with a persona theme configured, the AI's responses should reflect that persona. The default theme is the-expanse with role dev — responses should use Belter Creole inflections ("Sa, bossmang", "Cooked for...").

Why it matters: Persona theming is aclaude's primary differentiator from vanilla Claude Code.

Automatable: No — requires subjective judgment on voice quality. AI agent or human tester should evaluate.

Manual steps:

  1. Attach to an aclaude session
  2. Ask a question
  3. Verify: response has persona flavor, not generic Claude voice

Group 5: One-shot prompt

5.1 — Text output (default)

Intent: aclaude-a -p "prompt" must output human-readable text, not raw JSON. This matches Claude Code's own -p behavior. JSON output requires an explicit flag.

Why it matters: Initially returned raw NDJSON (issue #13), making -p useless for quick shell queries.

Automatable: Yes

OUTPUT=$(aclaude-a -p "say the word hello" 2>&1)
# Should be plain text, not JSON
echo "$OUTPUT" | grep -q "^{" && echo "FAIL: -p returned JSON instead of text"
# Should contain a response (not empty)
[[ -n "$OUTPUT" ]] || echo "FAIL: empty response"

5.2 — JSON output (explicit)

Intent: --output-format json must return the full NDJSON response with type, result, usage, cost fields. This is for programmatic use.

Automatable: Yes

OUTPUT=$(aclaude-a -p "say hello" --output-format json 2>&1)
echo "$OUTPUT" | grep -q '"type":"result"' || echo "FAIL: no JSON result object"
echo "$OUTPUT" | grep -q '"usage"' || echo "FAIL: no usage data in JSON"

Group 6: Configuration

6.1 — Config display

Intent: aclaude-a config must show the resolved configuration — all layers merged (defaults → global → local → env → CLI). Must show config file paths so the user knows where to edit.

Automatable: Yes

OUTPUT=$(aclaude-a config 2>&1)
echo "$OUTPUT" | grep -q "Config paths:" || echo "FAIL: no config paths"
echo "$OUTPUT" | grep -qF "[tmux]" || echo "FAIL: no tmux config section"
echo "$OUTPUT" | grep -q "socket" || echo "FAIL: no socket in tmux config"
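
The layer order in the intent (defaults → global → local → env → CLI) can be read as "the last layer that sets a key wins". That reading is an assumption, and the sketch below only illustrates it; resolve_setting is a hypothetical function and does not reflect how aclaude actually merges configuration.

```shell
# resolve_setting DEFAULT GLOBAL LOCAL ENV CLI
# Prints the value from the highest-precedence layer that is non-empty.
# An empty string means "this layer does not set the key".
resolve_setting() {
    local value="" layer
    for layer in "$@"; do
        if [ -n "$layer" ]; then
            value="$layer"
        fi
    done
    printf '%s\n' "$value"
}
```

With a default socket of ac and only the local layer overriding it, resolve_setting ac "" custom "" "" prints custom; with no overrides it prints ac.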

6.2 — Persona list

Intent: aclaude-a persona list must show available themes with descriptions. This is how users discover what personas are available.

Automatable: Yes

OUTPUT=$(aclaude-a persona list 2>&1)
# Should contain known themes
echo "$OUTPUT" | grep -q "the-expanse" || echo "FAIL: the-expanse not in persona list"
# Should have multiple themes (at least 10 lines of output)
COUNT=$(echo "$OUTPUT" | wc -l)
[[ "$COUNT" -ge 10 ]] || echo "FAIL: only $COUNT lines in persona list"

Group 7: tmux-cmc protocol (diagnostic)

7.1 — Debug trace

Intent: TMUX_CMC_DEBUG=1 must produce a line-by-line protocol trace on stderr. This is the primary diagnostic tool for tmux-cmc issues. Each line shows the parsed text and raw bytes.

Why it matters: This is how we diagnosed the serial mismatch (tmux-cmc#2), DCS prefix (tmux-cmc#1), and pty echo issues. Without it, failures present as opaque timeouts.

Automatable: Yes

tmux -L ac kill-server 2>/dev/null
OUTPUT=$(TMUX_CMC_DEBUG=1 aclaude-a session start --no-attach 2>&1)

# Debug lines present
echo "$OUTPUT" | grep -q "\[tmux-cmc\]" || echo "FAIL: no debug output"

# Handshake visible
echo "$OUTPUT" | grep -q "%begin" || echo "FAIL: no %begin in trace"
echo "$OUTPUT" | grep -q "%end" || echo "FAIL: no %end in trace"

aclaude-a session stop --all 2>/dev/null
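
In tmux control mode each command's output is wrapped in a %begin line and closed by either %end or %error, so a stricter check than "both strings appear" is that the counts balance. A rough sketch follows; trace_is_balanced is a hypothetical name, and it assumes each guard appears on its own trace line, which the debug format described here does not guarantee.

```shell
# trace_is_balanced: read a debug trace on stdin and succeed when there is
# at least one %begin line and as many %end/%error lines as %begin lines.
trace_is_balanced() {
    local trace begins closes
    trace=$(cat)
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    begins=$(printf '%s\n' "$trace" | grep -c '%begin' || true)
    closes=$(printf '%s\n' "$trace" | grep -Ec '%(end|error)' || true)
    [ "$begins" -gt 0 ] && [ "$begins" -eq "$closes" ]
}
```

Usage would be echo "$OUTPUT" | trace_is_balanced || echo "FAIL: unbalanced %begin/%end in trace". A trace that stops after a %begin, as in an opaque timeout, fails this check.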

7.2 — No tcgetattr regression

Intent: The original bug was tmux control mode failing with tcgetattr failed: Inappropriate ioctl for device because stdin was a pipe instead of a pty. This must never regress.

Why it matters: This was the first bug discovered (tmux-cmc#1) and blocked all tmux functionality. The fix required spawning tmux with a pty.

Automatable: Yes

tmux -L ac kill-server 2>/dev/null
OUTPUT=$(TMUX_CMC_DEBUG=1 aclaude-a session start --no-attach 2>&1)
echo "$OUTPUT" | grep -q "tcgetattr" && echo "FAIL: tcgetattr regression — pty fix broken"
echo "$OUTPUT" | grep -q "Session ready" || echo "FAIL: session not created"
aclaude-a session stop --all 2>/dev/null
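
"Inappropriate ioctl for device" is the error text for ENOTTY, which tcgetattr returns when the file descriptor is not a terminal. The same condition can be observed from the shell with the -t test, which is a quick way to reproduce the pipe-versus-pty distinction. stdin_kind is a hypothetical helper name used only for illustration.

```shell
# stdin_kind: print "tty" when standard input is a terminal, otherwise
# "not-a-tty" (a pipe, a regular file, or /dev/null).
stdin_kind() {
    if [ -t 0 ]; then
        echo "tty"
    else
        echo "not-a-tty"
    fi
}
```

Running echo | stdin_kind prints not-a-tty, which is the situation the child tmux process was in before the pty fix.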

Design notes

Two audiences

Each test has:

  • Intent — what it verifies, why it exists, what issue it traces to. Readable by humans and AI agents who need to understand whether a failure is expected, a regression, or a new bug.
  • Script — executable assertions. Run by CI, a justfile recipe, or an AI agent doing automated verification.

Manual vs automated

Tests marked Automatable: No require human or AI judgment — TUI rendering, persona voice quality, interactive terminal behavior. These should have descriptive acceptance criteria so an agent can evaluate them.

Regression anchors

Tests trace to the issue that motivated them where one exists. The issue provides the full discovery context: what went wrong, what the debug output looked like, what the fix was. If a test fails, the linked issue tells you where to look.
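
The test-to-issue links stated in this battery could be kept as a lookup so a failing test can print where to look. The mapping below is copied from the issue references in each test's text; issues_for_test is a hypothetical function name, and tests that cite no issue return a non-zero status.

```shell
# issues_for_test ID: print the issues cited by that test in this battery.
# Bare numbers refer to aclaude issues; tmux-cmc#N refers to tmux-cmc.
issues_for_test() {
    case "$1" in
        1.1) echo "#1 #2 tmux-cmc#1" ;;
        1.2) echo "#5 #19" ;;
        1.3) echo "#18 #6" ;;
        2.1) echo "#11 #10" ;;
        2.2) echo "#9 #18" ;;
        2.3) echo "#16 #12" ;;
        3.1) echo "#15" ;;
        3.3) echo "#8 #10" ;;
        4.1) echo "#10" ;;
        5.1) echo "#13" ;;
        7.1) echo "tmux-cmc#2 tmux-cmc#1" ;;
        7.2) echo "tmux-cmc#1" ;;
        *)   return 1 ;;   # no issue cited for this test
    esac
}
```

A failure handler could then append "see $(issues_for_test 1.2)" to its FAIL message.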

Origin

Developed during alpha testing sessions (2026-04-02 through 2026-04-05) on Raspberry Pi. Issues #1 through #19 across ArcavenAE/aclaude and ArcavenAE/tmux-cmc.
