Functional acceptance test battery — scripted with intent annotations

## Summary

During alpha testing on a Raspberry Pi (aarch64, tmux 3.5a), a battery of functional acceptance tests was developed through bug discovery and fix verification. These verify the installed binary on real hardware — not unit or integration tests in the codebase.

These should be formalized as a **dual-layer test suite**:

1. **Intent layer** — human/AI-readable description: what is being tested, why it matters, what the user experience should be, what failure looks like. This layer is for agents and humans who need to understand the test, not just run it.
2. **Script layer** — executable commands with assertions. Automatable where possible, flagged as manual-only where human judgment is needed (e.g. "does the TUI look right?").

Tests that require subjective evaluation (TUI rendering, persona voice quality) should describe the expected experience so an AI agent or human tester can make the call.

## Test environment

- Linux aarch64 (Raspberry Pi OS)
- tmux 3.5a
- Binary installed at `~/.local/bin/aclaude-a`
- No sudo access
- Terminal: xterm-256color, truecolor, no image protocol (no kitty/sixel)

---

## Group 1: Self-update

### 1.1 — Version identity

**Intent:** The binary must report its version, commit, build timestamp, and release channel. This is how users and support identify what's installed. Two formats exist: `version` (detailed) and `--version` (compact). Both must agree on the commit hash.

**Why it matters:** Mismatch between reported version and actual binary caused confusion during early testing — we couldn't tell which build was running. Checksum verification was added to the test workflow because of this. (Issues #1, #2, tmux-cmc#1)

**Automatable:** Yes

```bash
# Verify both version formats report the same commit
FULL=$(aclaude-a version 2>&1)
COMPACT=$(aclaude-a --version 2>&1)
COMMIT_FULL=$(echo "$FULL" | grep 'commit:' | awk '{print $2}')
COMMIT_COMPACT=$(echo "$COMPACT" | grep -oP '[0-9a-f]{7}' | head -1)
[[ "$COMMIT_FULL" == "$COMMIT_COMPACT" ]] || echo "FAIL: commit mismatch"

# Verify channel is reported
echo "$FULL" | grep -q 'channel:' || echo "FAIL: no channel in version output"

# Record checksum for comparison
sha256sum ~/.local/bin/aclaude-a
```

### 1.2 — Update cycle

**Intent:** The self-updater must detect a newer release on the correct channel, download it, replace the binary, and the new binary must report a different commit and have a different checksum. The update is the primary distribution mechanism — if it breaks, users are stuck on old versions forever.

**Why it matters:** The updater was initially broken — it detected updates but didn't install them (issue #5). Later, it picked stable releases instead of alpha (issue #19). This test catches both failure modes.

**Automatable:** Partially — requires a newer release to exist

```bash
BEFORE_SUM=$(sha256sum ~/.local/bin/aclaude-a | awk '{print $1}')
BEFORE_COMMIT=$(aclaude-a version 2>&1 | grep 'commit:' | awk '{print $2}')

OUTPUT=$(aclaude-a update 2>&1)
echo "$OUTPUT"

AFTER_SUM=$(sha256sum ~/.local/bin/aclaude-a | awk '{print $1}')
AFTER_COMMIT=$(aclaude-a version 2>&1 | grep 'commit:' | awk '{print $2}')

# If update was available, checksums must differ
if echo "$OUTPUT" | grep -q "Updated"; then
    [[ "$BEFORE_SUM" != "$AFTER_SUM" ]] || echo "FAIL: checksum unchanged after update"
    [[ "$BEFORE_COMMIT" != "$AFTER_COMMIT" ]] || echo "FAIL: commit unchanged after update"
    echo "PASS: update installed (${BEFORE_COMMIT} -> ${AFTER_COMMIT})"
fi
```

### 1.3 — Channel filtering

**Intent:** An alpha binary must only see alpha releases. A stable binary must only see stable releases. Cross-channel pollution caused a regression where alpha users received a main-branch build missing develop-branch features (issue #18, caused by #6).

**Why it matters:** This is the guardrail for the dual-track release model. Without it, alpha users get stable code (missing features) or stable users get alpha code (untested).

**Automatable:** Yes

```bash
OUTPUT=$(aclaude-a update 2>&1)
CHANNEL=$(aclaude-a version 2>&1 | grep 'channel:' | awk '{print $2}')

# The "Latest:" line must match the current channel
if echo "$OUTPUT" | grep -q "Latest:"; then
    LATEST=$(echo "$OUTPUT" | grep "Latest:" | awk '{print $2}')
    if [[ "$CHANNEL" == "alpha" ]]; then
        echo "$LATEST" | grep -q "^alpha-" || echo "FAIL: alpha channel saw non-alpha release: $LATEST"
    elif [[ "$CHANNEL" == "stable" ]]; then
        echo "$LATEST" | grep -q "^stable-" || echo "FAIL: stable channel saw non-stable release: $LATEST"
    fi
fi
```

---

## Group 2: Session lifecycle

### 2.1 — Clean start

**Intent:** Starting a session from a clean state (no tmux server) must create a named session, launch aclaude inside it, and provide a correct reattach command. The reattach hint must use the actual binary name (`aclaude-a`, not `aclaude`). This is the first-run experience.

**Why it matters:** The attach hint was hardcoded to `aclaude` (issue #11). The send-keys targeted the wrong pane (issue #10), leaving users with a bare bash shell instead of aclaude running.

**Automatable:** Yes (with `--no-attach` to avoid interactive tmux)

```bash
tmux -L ac kill-server 2>/dev/null
BINARY_NAME=$(basename $(which aclaude-a))
OUTPUT=$(aclaude-a session start --no-attach 2>&1)
echo "$OUTPUT"

# Session was created
echo "$OUTPUT" | grep -q "Creating session" || echo "FAIL: no creation message"

# Attach hint uses correct binary name
echo "$OUTPUT" | grep -q "${BINARY_NAME} session attach" || echo "FAIL: attach hint uses wrong binary name"

# Session exists in tmux
tmux -L ac list-sessions 2>&1 | grep -v "^_ctrl" | grep -q "windows" || echo "FAIL: no user session found"
```

### 2.2 — Session listing

**Intent:** Users must be able to discover their sessions without knowing tmux commands. `list` shows full details, `--names` shows just names (for scripting), `--all` includes internal control sessions. Users should never need `tmux -L ac ls`.

**Why it matters:** Before these commands existed, users had to know about the `-L ac` socket (issue #9). These were part of a regression in build 69453ca (issue #18).

**Automatable:** Yes

```bash
# Full list
aclaude-a session list 2>&1 | grep -q "windows" || echo "FAIL: list shows no sessions"

# Names only
NAMES=$(aclaude-a session list --names 2>&1)
[[ -n "$NAMES" ]] || echo "FAIL: --names is empty"

# --all includes control sessions (only if multiple user sessions exist)
aclaude-a session start --no-attach -t second 2>/dev/null
ALL=$(aclaude-a session list --all 2>&1)
echo "$ALL" | grep -q "_ctrl" || echo "FAIL: --all doesn't show control sessions"
aclaude-a session stop -t second 2>/dev/null
```

### 2.3 — Session status

**Intent:** Status gives a formatted table with session name, window count, creation timestamp, and state (attached/detached). The CREATED column must show an actual timestamp. Control sessions are hidden by default with a count footer.

**Why it matters:** CREATED column was empty (issue #16). Control sessions cluttered the output (issue #12).

**Automatable:** Yes

```bash
OUTPUT=$(aclaude-a session status 2>&1)
echo "$OUTPUT"

# Header present
echo "$OUTPUT" | head -1 | grep -q "SESSION" || echo "FAIL: no header"

# CREATED column has a timestamp (YYYY-MM-DD HH:MM:SS pattern)
echo "$OUTPUT" | grep -qP '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}' || echo "FAIL: no timestamp in CREATED"

# Control sessions hidden
echo "$OUTPUT" | grep -qv "^_ctrl" || echo "FAIL: _ctrl visible without --all"
```

### 2.4 — Detach/reattach

**Intent:** A user must be able to detach from a session (Ctrl-b d) and reattach later. The session persists across detach. This is fundamental to the tmux value proposition — sessions survive disconnection.

**Why it matters:** This is core tmux behavior surfaced through aclaude commands. Verified working.

**Automatable:** No — requires interactive terminal for Ctrl-b d. Manual test only.

**Manual steps:**
1. `aclaude-a session start` (attaches)
2. Press `Ctrl-b d` (detaches, returns to shell)
3. `aclaude-a session attach` (reattaches)
4. Verify: same session, same state, aclaude still running

### 2.5 — Stop individual session

**Intent:** Stopping a named session must kill only that session. Other sessions and the control session must be unaffected.

**Automatable:** Yes

```bash
aclaude-a session start --no-attach -t keep 2>/dev/null
aclaude-a session start --no-attach -t remove 2>/dev/null
aclaude-a session stop -t remove
aclaude-a session list --names 2>&1 | grep -q "keep" || echo "FAIL: keep session was removed"
aclaude-a session list --names 2>&1 | grep -q "remove" && echo "FAIL: remove session still exists"
aclaude-a session stop -t keep 2>/dev/null
```

### 2.6 — Stop all

**Intent:** `session stop --all` must kill all user sessions, all control sessions, and the tmux server. Clean slate. The user should not need to know `tmux -L ac kill-server`.

**Why it matters:** Verified working since early builds. This is the cleanup command.

**Automatable:** Yes

```bash
aclaude-a session start --no-attach 2>/dev/null
aclaude-a session stop --all
tmux -L ac list-sessions 2>&1 | grep -q "no server" || echo "FAIL: tmux server still running"
```

---

## Group 3: Control session behavior

### 3.1 — Single session creates no control session

**Intent:** When only one user session exists, tmux-cmc should reuse that session's server without creating a separate `_ctrl` session. This reduces clutter and resource usage.

**Why it matters:** Originally every session created its own control session 1:1 (issue #15). For marvel managing dozens of agents, this doesn't scale.

**Automatable:** Yes

```bash
tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach
COUNT=$(tmux -L ac list-sessions 2>&1 | wc -l)
[[ "$COUNT" -eq 1 ]] || echo "FAIL: expected 1 session, got $COUNT"
tmux -L ac list-sessions 2>&1 | grep -q "_ctrl" && echo "FAIL: _ctrl exists with single session"
aclaude-a session stop --all 2>/dev/null
```

### 3.2 — Multiple sessions share one control session

**Intent:** When a second session is created, a single shared `_ctrl` session appears. Not one per user session. This is the scalability design for marvel.

**Automatable:** Yes

```bash
tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach -t first
aclaude-a session start --no-attach -t second
CTRL_COUNT=$(tmux -L ac list-sessions 2>&1 | grep "^_ctrl" | wc -l)
[[ "$CTRL_COUNT" -eq 1 ]] || echo "FAIL: expected 1 _ctrl, got $CTRL_COUNT"
aclaude-a session stop --all 2>/dev/null
```

### 3.3 — Control session is idle

**Intent:** The control session must not run a shell profile, Claude Code, or aclaude. It is internal plumbing — a tmux-cmc protocol endpoint only.

**Why it matters:** Early builds ran Claude Code in the control session (issue #8), wasting an API connection. Later, send-keys targeted the control session instead of the user session (issue #10), launching aclaude in the wrong place.

**Automatable:** Partially — requires judgment on "idle"

```bash
# Create two sessions to force _ctrl
tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach -t first
aclaude-a session start --no-attach -t second
CTRL_CONTENT=$(tmux -L ac capture-pane -t _ctrl -p 2>&1)

# Should be empty or minimal — NOT showing Claude Code TUI
echo "$CTRL_CONTENT" | grep -q "Claude Code" && echo "FAIL: Claude Code running in _ctrl"
echo "$CTRL_CONTENT" | grep -q "Welcome" && echo "FAIL: aclaude running in _ctrl"
aclaude-a session stop --all 2>/dev/null
```

### 3.4 — Cleanup removes control session

**Intent:** When the last user session is stopped, the control session and tmux server must be cleaned up. No orphaned tmux processes.

**Automatable:** Yes — covered by test 2.6

---

## Group 4: Session content

### 4.1 — aclaude launches in the correct pane

**Intent:** After `session start`, the user's session pane must be running aclaude (which spawns Claude Code). The tmux status bar should show the binary name. This is the whole point — the user gets an aclaude session, not a bare shell.

**Why it matters:** send-keys targeted the wrong pane (issue #10), leaving users with bash. This was the most user-visible bug.

**Automatable:** Partially — TUI rendering requires judgment

```bash
tmux -L ac kill-server 2>/dev/null
aclaude-a session start --no-attach
SESSION=$(aclaude-a session list --names 2>&1 | head -1)
sleep 3  # give aclaude time to start
CONTENT=$(tmux -L ac capture-pane -t "$SESSION" -p 2>&1)

# Should show Claude Code TUI indicators
echo "$CONTENT" | grep -q "Claude Code" || echo "WARN: Claude Code TUI not visible (may need more startup time)"
aclaude-a session stop --all 2>/dev/null
```

**Manual verification:** Attach to the session. You should see the Claude Code TUI with the persona prompt, not a bash shell. The tmux status bar should show `aclaude | 0:aclaude-a*`.

### 4.2 — Persona voice in session

**Intent:** When aclaude launches with a persona theme configured, the AI's responses should reflect that persona. The default theme is `the-expanse` with role `dev` — responses should use Belter Creole inflections ("Sa, bossmang", "Cooked for...").

**Why it matters:** Persona theming is aclaude's primary differentiator from vanilla Claude Code.

**Automatable:** No — requires subjective judgment on voice quality. AI agent or human tester should evaluate.

**Manual steps:**
1. Attach to an aclaude session
2. Ask a question
3. Verify: response has persona flavor, not generic Claude voice

---

## Group 5: One-shot prompt

### 5.1 — Text output (default)

**Intent:** `aclaude-a -p "prompt"` must output human-readable text, not raw JSON. This matches Claude Code's own `-p` behavior. JSON output requires an explicit flag.

**Why it matters:** Initially returned raw NDJSON (issue #13), making `-p` useless for quick shell queries.

**Automatable:** Yes

```bash
OUTPUT=$(aclaude-a -p "say the word hello" 2>&1)
# Should be plain text, not JSON
echo "$OUTPUT" | grep -q "^{" && echo "FAIL: -p returned JSON instead of text"
# Should contain a response (not empty)
[[ -n "$OUTPUT" ]] || echo "FAIL: empty response"
```

### 5.2 — JSON output (explicit)

**Intent:** `--output-format json` must return the full NDJSON response with type, result, usage, cost fields. This is for programmatic use.

**Automatable:** Yes

```bash
OUTPUT=$(aclaude-a -p "say hello" --output-format json 2>&1)
echo "$OUTPUT" | grep -q '"type":"result"' || echo "FAIL: no JSON result object"
echo "$OUTPUT" | grep -q '"usage"' || echo "FAIL: no usage data in JSON"
```

---

## Group 6: Configuration

### 6.1 — Config display

**Intent:** `aclaude-a config` must show the resolved configuration — all layers merged (defaults → global → local → env → CLI). Must show config file paths so the user knows where to edit.

**Automatable:** Yes

```bash
OUTPUT=$(aclaude-a config 2>&1)
echo "$OUTPUT" | grep -q "Config paths:" || echo "FAIL: no config paths"
echo "$OUTPUT" | grep -q "[tmux]" || echo "FAIL: no tmux config section"
echo "$OUTPUT" | grep -q "socket" || echo "FAIL: no socket in tmux config"
```

### 6.2 — Persona list

**Intent:** `aclaude-a persona list` must show available themes with descriptions. This is how users discover what personas are available.

**Automatable:** Yes

```bash
OUTPUT=$(aclaude-a persona list 2>&1)
# Should contain known themes
echo "$OUTPUT" | grep -q "the-expanse" || echo "FAIL: the-expanse not in persona list"
# Should have multiple themes (at least 10)
COUNT=$(echo "$OUTPUT" | wc -l)
[[ "$COUNT" -gt 10 ]] || echo "FAIL: only $COUNT themes listed"
```

---

## Group 7: tmux-cmc protocol (diagnostic)

### 7.1 — Debug trace

**Intent:** `TMUX_CMC_DEBUG=1` must produce a line-by-line protocol trace on stderr. This is the primary diagnostic tool for tmux-cmc issues. Each line shows the parsed text and raw bytes.

**Why it matters:** This is how we diagnosed the serial mismatch (tmux-cmc#2), DCS prefix (tmux-cmc#1), and pty echo issues. Without it, failures present as opaque timeouts.

**Automatable:** Yes

```bash
tmux -L ac kill-server 2>/dev/null
OUTPUT=$(TMUX_CMC_DEBUG=1 aclaude-a session start --no-attach 2>&1)

# Debug lines present
echo "$OUTPUT" | grep -q "\[tmux-cmc\]" || echo "FAIL: no debug output"

# Handshake visible
echo "$OUTPUT" | grep -q "%begin" || echo "FAIL: no %begin in trace"
echo "$OUTPUT" | grep -q "%end" || echo "FAIL: no %end in trace"

aclaude-a session stop --all 2>/dev/null
```

### 7.2 — No tcgetattr regression

**Intent:** The original bug — tmux control mode failing with `tcgetattr failed: Inappropriate ioctl for device` because stdin was a pipe instead of a pty. This must never regress.

**Why it matters:** This was the first bug discovered (tmux-cmc#1) and blocked all tmux functionality. The fix required spawning tmux with a pty.

**Automatable:** Yes

```bash
tmux -L ac kill-server 2>/dev/null
OUTPUT=$(TMUX_CMC_DEBUG=1 aclaude-a session start --no-attach 2>&1)
echo "$OUTPUT" | grep -q "tcgetattr" && echo "FAIL: tcgetattr regression — pty fix broken"
echo "$OUTPUT" | grep -q "Session ready" || echo "FAIL: session not created"
aclaude-a session stop --all 2>/dev/null
```

---

## Design notes

### Two audiences

Each test has:
- **Intent** — what it verifies, why it exists, what issue it traces to. Readable by humans and AI agents who need to understand whether a failure is expected, a regression, or a new bug.
- **Script** — executable assertions. Run by CI, a justfile recipe, or an AI agent doing automated verification.

### Manual vs automated

Tests marked **Automatable: No** require human or AI judgment — TUI rendering, persona voice quality, interactive terminal behavior. These should have descriptive acceptance criteria so an agent can evaluate them.

### Regression anchors

Every test traces to at least one issue. The issue provides the full discovery context — what went wrong, what the debug output looked like, what the fix was. If a test fails, the linked issue tells you where to look.

## Origin

Developed during alpha testing sessions (2026-04-02 through 2026-04-05) on Raspberry Pi. Issues #1 through #19 across ArcavenAE/aclaude and ArcavenAE/tmux-cmc.

Functional acceptance test battery — scripted with intent annotations #20

Description

Summary

Test environment

Group 1: Self-update

1.1 — Version identity

1.2 — Update cycle

1.3 — Channel filtering

Group 2: Session lifecycle

2.1 — Clean start

2.2 — Session listing

2.3 — Session status

2.4 — Detach/reattach

2.5 — Stop individual session

2.6 — Stop all

Group 3: Control session behavior

3.1 — Single session creates no control session

3.2 — Multiple sessions share one control session

3.3 — Control session is idle

3.4 — Cleanup removes control session

Group 4: Session content

4.1 — aclaude launches in the correct pane

4.2 — Persona voice in session

Group 5: One-shot prompt

5.1 — Text output (default)

5.2 — JSON output (explicit)

Group 6: Configuration

6.1 — Config display

6.2 — Persona list

Group 7: tmux-cmc protocol (diagnostic)

7.1 — Debug trace

7.2 — No tcgetattr regression

Design notes

Two audiences

Manual vs automated

Regression anchors

Origin

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions