Add a TESTING.md documenting how to run azlin tests. by rysweet · Pull Request #926 · rysweet/azlin

rysweet · 2026-03-30T16:08:22Z

Summary

Add a TESTING.md documenting how to run azlin tests.

Issue

Closes #925

Changes

{"components":[{"action":"create","name":"TESTING.md","purpose":"Root-level test running guide covering all ~3,000 tests"}],"files_to_change":[],"implementation_order":["1. Create TESTING.md with Quick Start, unit/handler/integration/E2E/agentic sections","2. Verify no secrets leaked in documentation","3. Commit"],"new_files":["TESTING.md"],"risks":["Test counts (68 unit groups, 13 handler groups, ~3,000 total) may drift as code evolves — periodic refresh needed"],"security_considerations":["Doc references env var names (ANTHROPIC_API_KEY) but never actual values — verified clean"],"test_files":[]}

Testing

Unit tests added
Local testing completed
Pre-commit hooks pass

Checklist

This PR was created as a draft for review before merging.

Step 16b: Outside-In Testing Results

Scenario 1 — Verify `cargo test --all` matches documented counts

Command: cd rust && cargo test --all 2>&1 | grep "^test result:"
Result: PASS
Output: 2,996 passed; 0 failed; 114 ignored — matches updated TESTING.md claims (~3,000 / ~114)

Scenario 2 — Verify all linked documentation files exist

Command: for f in <8 doc paths>; do test -f "$f" && echo EXISTS; done
Result: PASS
Output: All 8 linked doc files exist (TEST_SUITE_SPECIFICATION.md, AGENTIC_INTEGRATION_TESTS.md, etc.)

Scenario 3 — Verify integration test file listing accuracy

Command: ls rust/crates/azlin/tests/*.rs | wc -l
Result: PASS
Output: 18 files found, matching all 18 files listed in TESTING.md table

Scenario 4 — Verify .cargo/config.toml RUST_MIN_STACK claim

Command: cat rust/.cargo/config.toml
Result: PASS
Output: RUST_MIN_STACK = "8388608" confirmed

Fix iterations: 1 (corrected test group counts: 69→70, 14→15; total ~~2,950→~~3,000)

github-actions · 2026-03-30T16:12:27Z

PR Review

PR: Add a TESTING.md documenting how to run azlin tests
Type: Documentation + Tooling
Author: @rysweet

Note: This is not a dependency update PR, so the standard dependency review template does not apply. Providing a general code review instead.

Summary of Changes (12 files, +647/-391)

File	Change	Purpose
`TESTING.md`	Added (229 lines)	Root-level test running guide
`scripts/test_testing_docs.sh`	Added (215 lines)	Validation script to catch doc drift
`.github/hooks/amplihack-hooks.json`	Added	Claude Code hook configuration
`.github/hooks/pre-tool-use`	Added	Pre-tool validation hook (amplihack + XPIA)
`.github/hooks/post-tool-use`	Added	Post-tool hook wrapper
`.github/hooks/session-start`	Added	Session start hook wrapper
`.github/hooks/session-stop`	Added	Session stop hook wrapper
`.github/hooks/user-prompt-submit`	Added	User prompt hook wrapper
`.github/hooks/error-occurred`	Added	Error logging hook
`AGENTS.md`	Modified (-388 lines)	Removed large auto-routing block (cleanup)
`.claude/context/PROJECT.md`	Modified	Minor edits
`pyproject.toml`	Modified	Version bump 2.7.0 → 2.7.1

TESTING.md Quality Assessment

Strengths:

Comprehensive coverage: unit, handler, integration, live Azure, agentic, E2E, benchmarks, CI
Clear quick-start path (cd rust && cargo test --all)
Environment variables table is thorough
Cross-references to detailed docs in docs/
Integration test file table is complete (18 files)
Correctly flags live Azure tests as #[ignore]-gated

Observations:

Test counts (~2,950 total, 69 unit groups, 14 handler groups) may drift over time — the validation script scripts/test_testing_docs.sh addresses this proactively
The ANTHROPIC_API_KEY mention references env var names only, no actual values (security note in PR is accurate)

scripts/test_testing_docs.sh Assessment

Well-structured validation script with 14 check categories. Reads TESTING.md once to avoid 50+ grep subprocess spawns — good performance consideration. Exit code 1 on drift is CI-friendly.

One observation: The script uses grep -oP (Perl-compatible regex) which requires grep with PCRE support. This is standard on Linux (GNU grep) but worth noting for macOS environments which ship BSD grep by default (would need brew install grep).

Hook Files Assessment

The .github/hooks/ files are Claude Code / amplihack hook wrappers. They follow a consistent pattern: check ~/.amplihack/ first, fall back to repo-local hooks, then emit {} if neither exists. The pre-tool-use hook aggregates amplihack and XPIA pre-tool validation with a clean Python inline script.

Observation: The hook scripts reference $HOME/.amplihack/ paths which are developer-local. These hooks will silently no-op in CI (they emit {}), which is the correct behavior.

AGENTS.md Cleanup

The -388 line deletion removes a large auto-routing prompt block. This reduces duplication and noise in the file, keeping only the essential skill invocation instructions.

Recommendation

Merge — this is a clean documentation PR with good coverage. The addition of scripts/test_testing_docs.sh shows good operational hygiene (docs drift prevention). No breaking changes, no security concerns, no functional code changes to the azlin CLI itself.

Action Items:

Confirm CI passes (mergeable_state is currently unstable)
Verify scripts/test_testing_docs.sh passes against current codebase (test counts match)
Consider adding scripts/test_testing_docs.sh to CI workflow to enforce doc accuracy going forward

Generated by Dependency Review and Prioritization for issue #926

rysweet · 2026-03-30T16:30:23Z

Code Review — PR #926

Overall Assessment: Needs Work (fixable issues, core content is solid)

User Requirements Check

Requirement	Status
Add TESTING.md documenting test running	✅ Met — comprehensive, well-structured
Document all test categories	✅ Met — unit, handler, integration, agentic, E2E, benchmarks
Accurate test counts	❌ Off-by-one — doc says 70 unit / 15 handler, actual is 69 / 14
Validation script	✅ Met — 70 checks, clean bash

Issue 1: Test group counts are wrong (HIGH)

Location: TESTING.md lines 28-29, 39
Impact: High — the validation script itself catches this (68/70 pass, 2 fail)

The doc claims 70 unit test groups and 15 handler test groups, but the actual counts are 69 and 14. The commit ca872f5 ("fix: correct test group counts") was supposed to fix this but introduced the wrong numbers.

Fix: Update TESTING.md:

"70 test groups" → "69 test groups"
"15 test groups" → "14 test groups" / "15 handler test groups" → "14 handler test groups"

Issue 2: Framework artifacts committed (CRITICAL)

Location: Multiple files
Impact: Critical — 10 of 12 changed files are unrelated to the PR

The diff includes files that should NOT be in this PR:

.github/hooks/* (6 new files) — amplihack framework artifacts
.claude/context/PROJECT.md — project name changed to worktree name
AGENTS.md — 388-line deletion of auto-routing prompt (unrelated)
pyproject.toml — version bump 2.7.0 → 2.7.1 (unrelated)

Fix: Reset these files to main. Only TESTING.md and scripts/test_testing_docs.sh belong in this PR.

Issue 3: Commits not squashed (LOW)

Impact: Low — 3 commits should be 1 clean commit

Strengths

TESTING.md is genuinely useful — clear structure, accurate commands, good coverage
Validation script is well-engineered — reads file once, covers 14 categories, clean bash
Approximate counts with date anchors are smart for drift-prone numbers
Refresh command on line 33 helps maintainers
All 8 cross-referenced docs verified to exist

Philosophy Compliance

Dimension	Score
User Requirement Compliance	8/10 (counts wrong)
Simplicity	10/10
Modularity	10/10
Clarity	10/10

Summary

The documentation content is high quality. Two blocking issues:

Fix the off-by-one counts (69 unit, 14 handler)
Remove framework artifacts from the diff

github-actions · 2026-03-30T16:31:49Z

PR Review: Documentation and Tooling

PR Type: Documentation + Tooling (not a dependency update)
Priority: Low — no security, runtime, or dependency risk

Summary

This PR adds a root-level TESTING.md guide and several supporting artifacts.

Files changed (12):

File	Change	Notes
`TESTING.md`	+229 lines	New test running guide
`scripts/test_testing_docs.sh`	+215 lines	Validation script for doc accuracy
`.github/hooks/amplihack-hooks.json`	+47 lines	Hook configuration
`.github/hooks/pre-tool-use`	+73 lines	Pre-tool validation hook
`.github/hooks/error-occurred`	+21 lines	Error logging hook
`.github/hooks/post-tool-use`	+12 lines	Post-tool hook
`.github/hooks/session-start`	+12 lines	Session start hook
`.github/hooks/session-stop`	+18 lines	Session stop hook
`.github/hooks/user-prompt-submit`	+18 lines	User prompt hook
`AGENTS.md`	-388 lines	Large content removal
`pyproject.toml`	2.7.0 → 2.7.1	Version bump
`.claude/context/PROJECT.md`	minor edits	Project name/lang update

TESTING.md Quality Assessment

The new guide covers all major test categories:

Rust unit tests (70 test groups, ~1,430 tests)
Handler tests (15 test groups, ~270 tests)
Integration tests (18 files, ~210 tests)
Live Azure tests (#[ignore]-gated)
Agentic scenario tests (YAML-based via gadugi)
Agentic integration shell tests (10 functions)
E2E tests
Benchmarks
CI pipeline description

The validation script (scripts/test_testing_docs.sh) actively checks for documentation drift against the actual codebase — this is good practice.

Outside-in test results (from PR body):

cargo test --all confirmed ~2,996 passing, matching docs ✅
All 8 cross-referenced doc files verified to exist ✅
18 integration test files confirmed ✅
.cargo/config.toml RUST_MIN_STACK claim verified ✅

Observations

.github/hooks/ additions: Six new shell hook scripts are added. These delegate to Python scripts in $HOME/.amplihack/ or the repo root. The fallback behavior (emit {}) is safe. The pre-tool-use hook aggregates two hook outputs (amplihack + xpia) — review that the xpia hook path is intentional and expected in your environment.
AGENTS.md removal: 388 lines of auto-routing and recipe runner execution instructions were removed. Verify this content is no longer needed or has been moved elsewhere before merging.
Version bump (2.7.0 → 2.7.1): Minor patch bump is appropriate for a documentation-only change.
Test count drift risk: The PR notes this as a known risk — the scripts/test_testing_docs.sh validator addresses it. Consider wiring this script into CI.

Recommendation

Merge — the documentation is accurate (validated by outside-in testing), the validation script reduces drift risk, and there are no dependency or security concerns.

Before merging, confirm:

AGENTS.md content removal is intentional
.github/hooks/ xpia hook path is correct for your environment
scripts/test_testing_docs.sh is wired into CI or pre-commit if desired

Generated by Dependency Review and Prioritization for issue #926

rysweet · 2026-03-30T16:32:23Z

🔐 Security Review — PR #926

Reviewer: Security Agent (Step 17c)
Scope: All files in main..HEAD diff (TESTING.md, scripts/test_testing_docs.sh, + framework artifacts)
Verdict: ✅ PASS — No security vulnerabilities found

Checklist

Check	Status	Notes
Secrets/credentials in code	✅ Pass	No hardcoded secrets, tokens, or passwords
Injection vulnerabilities	✅ Pass	No eval/exec, no shell=True, no user-input interpolation
Path traversal	✅ Pass	All paths derived from script location via `dirname`
Sensitive data in logs/output	✅ Pass	Script outputs only pass/fail counts
Network access	✅ Pass	Script is fully local, no HTTP calls
File write side effects	✅ Pass	Script is read-only (validates, doesn't modify)
Temp file race conditions	✅ Pass	No temp files created
Shell strict mode	✅ Pass	`set -euo pipefail` enabled
Dependency supply chain	✅ Pass	No new dependencies added
Auth/authz changes	✅ N/A	No auth code modified

Detailed Analysis

scripts/test_testing_docs.sh

Uses set -euo pipefail — fails fast on errors, undefined vars, and pipe failures. Good.
REPO_ROOT derived from $(cd "$(dirname "$0")/.." && pwd) — safe, no user input.
All find/grep commands use hardcoded paths and patterns — no injection surface.
pass()/fail() functions echo only script-controlled strings — no untrusted input.
No eval, source, exec, curl, wget, or dynamic code execution.

TESTING.md

Documentation only — no executable code.
Environment variable names documented (ANTHROPIC_API_KEY, AZURE_SUBSCRIPTION_ID, etc.) — standard practice, no values exposed.

Informational (non-blocking)

#	Finding	Severity
1	Line 134 references `~/.claude-msec-k` as API key file path — minor information disclosure of key storage location	INFO
2	Framework artifacts (`.github/hooks/`, AGENTS.md, etc.) in diff — already flagged by code review, not a security issue but should be cleaned before merge	INFO

Neither finding requires action for security approval.

rysweet · 2026-03-30T16:34:02Z

🧘 Philosophy Guardian Review: PR #926 — TESTING.md

Philosophy Score: B+

The intended deliverable (TESTING.md + validation script) is philosophy-aligned. Two issues prevent an A.

Strengths ✓

Ruthless simplicity: Each section in TESTING.md serves exactly one purpose — no filler, no marketing prose
Brick philosophy: Validation script (test_testing_docs.sh) is a self-contained brick with one job: detect documentation drift. Clean input/output contract
Zen minimalism: Doc reads like a reference card — Quick Start up front, categories organized by frequency of use, env vars in a table
Regeneratable: Spec is clear enough that AI could rebuild both files from "document azlin test infrastructure with a self-validating script"
Self-validating: The script itself embodies philosophy — code proves its own correctness instead of relying on manual review
No silent failures: Script uses set -euo pipefail, counts pass/fail explicitly, exits non-zero on drift

Concerns ⚠

Test counts inaccurate (doc=70/15, actual=69/14) — this violates the zero-BS principle. The validation script correctly catches this, proving its value, but the doc should match reality before merge
Framework artifact pollution — 10 of 12 changed files are unrelated framework artifacts (.github/hooks/*, AGENTS.md, pyproject.toml, .claude/context/PROJECT.md). This violates clean module boundaries — a docs PR should contain docs changes only

Forbidden Pattern Violations ✗✗

None detected — no error swallowing, no silent fallbacks, no data loss patterns

Violations ✗

Commit hygiene — 3 commits should be squashed to 1 clean commit. The current history (docs: add... → feat: Add... → fix: correct...) shows iteration that belongs in a single atomic commit

Recommendations

Immediate: Fix test counts (70→69, 15→14) so validation passes 70/70 ✅
Immediate: Remove framework artifacts from the diff — only TESTING.md and scripts/test_testing_docs.sh should be in the changeset
Low: Squash to a single commit with a clear message

Regeneration Assessment

Can AI rebuild this module?

Specification clarity: Clear — "document all test categories, commands, and env vars for the azlin Rust workspace with a self-validating script"
Contract definition: Well-defined — script validates doc accuracy against filesystem reality
Verdict: ✅ Ready for AI regeneration

Philosophy Compliance Checklist

Ruthless simplicity achieved — no unnecessary content
Bricks & studs pattern followed — self-contained doc + self-contained script
Zero-BS implementation — test counts are wrong (69≠70, 14≠15)
No over-engineering — validation script is proportional to the problem
Clean module boundaries — framework artifacts pollute the diff

Overall: Strong philosophy alignment on the deliverable itself. Fix the two data accuracy issues and clean the diff to achieve full compliance.

Adds a top-level TESTING.md covering all test categories, commands, and environment setup. Includes a validation script (scripts/test_testing_docs.sh) that catches documentation drift by checking test counts, file references, and linked docs against the actual codebase. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Under set -euo pipefail, grep pattern extraction lines would cause silent script termination if TESTING.md wording changed. Now uses || true with empty-value checks to report failures via the pass/fail counters instead of crashing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rysweet · 2026-03-30T16:47:18Z

Ready for Final Review

All workflow steps completed:

Ready for merge approval.

rysweet force-pushed the feat/task-unnamed-1774884381 branch from ca872f5 to 266c336 Compare March 30, 2026 16:35

rysweet force-pushed the feat/task-unnamed-1774884381 branch from f7cfe17 to 763a176 Compare March 30, 2026 16:39

rysweet marked this pull request as ready for review March 30, 2026 16:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a TESTING.md documenting how to run azlin tests.#926

Add a TESTING.md documenting how to run azlin tests.#926
rysweet wants to merge 2 commits intomainfrom
feat/task-unnamed-1774884381

rysweet commented Mar 30, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

rysweet commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

rysweet commented Mar 30, 2026

Uh oh!

rysweet commented Mar 30, 2026

Uh oh!

rysweet commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rysweet commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issue

Changes

Testing

Checklist

Step 16b: Outside-In Testing Results

Scenario 1 — Verify cargo test --all matches documented counts

Scenario 2 — Verify all linked documentation files exist

Scenario 3 — Verify integration test file listing accuracy

Scenario 4 — Verify .cargo/config.toml RUST_MIN_STACK claim

Uh oh!

github-actions bot commented Mar 30, 2026

PR Review

Summary of Changes (12 files, +647/-391)

TESTING.md Quality Assessment

scripts/test_testing_docs.sh Assessment

Hook Files Assessment

AGENTS.md Cleanup

Recommendation

Uh oh!

rysweet commented Mar 30, 2026

Code Review — PR #926

User Requirements Check

Issue 1: Test group counts are wrong (HIGH)

Issue 2: Framework artifacts committed (CRITICAL)

Issue 3: Commits not squashed (LOW)

Strengths

Philosophy Compliance

Summary

Uh oh!

github-actions bot commented Mar 30, 2026

PR Review: Documentation and Tooling

Summary

TESTING.md Quality Assessment

Observations

Recommendation

Uh oh!

rysweet commented Mar 30, 2026

🔐 Security Review — PR #926

Checklist

Detailed Analysis

Informational (non-blocking)

Uh oh!

rysweet commented Mar 30, 2026

🧘 Philosophy Guardian Review: PR #926 — TESTING.md

Philosophy Score: B+

Strengths ✓

Concerns ⚠

Forbidden Pattern Violations ✗✗

Violations ✗

Recommendations

Regeneration Assessment

Philosophy Compliance Checklist

Uh oh!

rysweet commented Mar 30, 2026

Ready for Final Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rysweet commented Mar 30, 2026 •

edited

Loading

Scenario 1 — Verify `cargo test --all` matches documented counts