Skip to content

Add a TESTING.md documenting how to run azlin tests.#926

Open
rysweet wants to merge 2 commits intomainfrom
feat/task-unnamed-1774884381
Open

Add a TESTING.md documenting how to run azlin tests.#926
rysweet wants to merge 2 commits intomainfrom
feat/task-unnamed-1774884381

Conversation

@rysweet
Copy link
Copy Markdown
Owner

@rysweet rysweet commented Mar 30, 2026

Summary

Add a TESTING.md documenting how to run azlin tests.

Issue

Closes #925

Changes

{"components":[{"action":"create","name":"TESTING.md","purpose":"Root-level test running guide covering all ~3,000 tests"}],"files_to_change":[],"implementation_order":["1. Create TESTING.md with Quick Start, unit/handler/integration/E2E/agentic sections","2. Verify no secrets leaked in documentation","3. Commit"],"new_files":["TESTING.md"],"risks":["Test counts (68 unit groups, 13 handler groups, ~3,000 total) may drift as code evolves — periodic refresh needed"],"security_considerations":["Doc references env var names (ANTHROPIC_API_KEY) but never actual values — verified clean"],"test_files":[]}

Testing

  • Unit tests added
  • Local testing completed
  • Pre-commit hooks pass

Checklist

  • Tests pass
  • Documentation updated
  • No stubs or TODOs
  • Code review completed
  • Philosophy check passed

This PR was created as a draft for review before merging.

Step 16b: Outside-In Testing Results

Scenario 1 — Verify cargo test --all matches documented counts

Command: cd rust && cargo test --all 2>&1 | grep "^test result:"
Result: PASS
Output: 2,996 passed; 0 failed; 114 ignored — matches updated TESTING.md claims (~3,000 / ~114)

Scenario 2 — Verify all linked documentation files exist

Command: for f in <8 doc paths>; do test -f "$f" && echo EXISTS; done
Result: PASS
Output: All 8 linked doc files exist (TEST_SUITE_SPECIFICATION.md, AGENTIC_INTEGRATION_TESTS.md, etc.)

Scenario 3 — Verify integration test file listing accuracy

Command: ls rust/crates/azlin/tests/*.rs | wc -l
Result: PASS
Output: 18 files found, matching all 18 files listed in TESTING.md table

Scenario 4 — Verify .cargo/config.toml RUST_MIN_STACK claim

Command: cat rust/.cargo/config.toml
Result: PASS
Output: RUST_MIN_STACK = "8388608" confirmed

Fix iterations: 1 (corrected test group counts: 69→70, 14→15; total 2,950→3,000)

@github-actions
Copy link
Copy Markdown
Contributor

PR Review

PR: Add a TESTING.md documenting how to run azlin tests
Type: Documentation + Tooling
Author: @rysweet

Note: This is not a dependency update PR, so the standard dependency review template does not apply. Providing a general code review instead.


Summary of Changes (12 files, +647/-391)

File Change Purpose
TESTING.md Added (229 lines) Root-level test running guide
scripts/test_testing_docs.sh Added (215 lines) Validation script to catch doc drift
.github/hooks/amplihack-hooks.json Added Claude Code hook configuration
.github/hooks/pre-tool-use Added Pre-tool validation hook (amplihack + XPIA)
.github/hooks/post-tool-use Added Post-tool hook wrapper
.github/hooks/session-start Added Session start hook wrapper
.github/hooks/session-stop Added Session stop hook wrapper
.github/hooks/user-prompt-submit Added User prompt hook wrapper
.github/hooks/error-occurred Added Error logging hook
AGENTS.md Modified (-388 lines) Removed large auto-routing block (cleanup)
.claude/context/PROJECT.md Modified Minor edits
pyproject.toml Modified Version bump 2.7.0 → 2.7.1

TESTING.md Quality Assessment

Strengths:

  • Comprehensive coverage: unit, handler, integration, live Azure, agentic, E2E, benchmarks, CI
  • Clear quick-start path (cd rust && cargo test --all)
  • Environment variables table is thorough
  • Cross-references to detailed docs in docs/
  • Integration test file table is complete (18 files)
  • Correctly flags live Azure tests as #[ignore]-gated

Observations:

  • Test counts (~2,950 total, 69 unit groups, 14 handler groups) may drift over time — the validation script scripts/test_testing_docs.sh addresses this proactively
  • The ANTHROPIC_API_KEY mention references env var names only, no actual values (security note in PR is accurate)

scripts/test_testing_docs.sh Assessment

Well-structured validation script with 14 check categories. Reads TESTING.md once to avoid 50+ grep subprocess spawns — good performance consideration. Exit code 1 on drift is CI-friendly.

One observation: The script uses grep -oP (Perl-compatible regex) which requires grep with PCRE support. This is standard on Linux (GNU grep) but worth noting for macOS environments which ship BSD grep by default (would need brew install grep).

Hook Files Assessment

The .github/hooks/ files are Claude Code / amplihack hook wrappers. They follow a consistent pattern: check ~/.amplihack/ first, fall back to repo-local hooks, then emit {} if neither exists. The pre-tool-use hook aggregates amplihack and XPIA pre-tool validation with a clean Python inline script.

Observation: The hook scripts reference $HOME/.amplihack/ paths which are developer-local. These hooks will silently no-op in CI (they emit {}), which is the correct behavior.

AGENTS.md Cleanup

The -388 line deletion removes a large auto-routing prompt block. This reduces duplication and noise in the file, keeping only the essential skill invocation instructions.


Recommendation

Merge — this is a clean documentation PR with good coverage. The addition of scripts/test_testing_docs.sh shows good operational hygiene (docs drift prevention). No breaking changes, no security concerns, no functional code changes to the azlin CLI itself.

Action Items:

  • Confirm CI passes (mergeable_state is currently unstable)
  • Verify scripts/test_testing_docs.sh passes against current codebase (test counts match)
  • Consider adding scripts/test_testing_docs.sh to CI workflow to enforce doc accuracy going forward

Generated by Dependency Review and Prioritization for issue #926

@rysweet
Copy link
Copy Markdown
Owner Author

rysweet commented Mar 30, 2026

Code Review — PR #926

Overall Assessment: Needs Work (fixable issues, core content is solid)

User Requirements Check

Requirement Status
Add TESTING.md documenting test running ✅ Met — comprehensive, well-structured
Document all test categories ✅ Met — unit, handler, integration, agentic, E2E, benchmarks
Accurate test counts ❌ Off-by-one — doc says 70 unit / 15 handler, actual is 69 / 14
Validation script ✅ Met — 70 checks, clean bash

Issue 1: Test group counts are wrong (HIGH)

Location: TESTING.md lines 28-29, 39
Impact: High — the validation script itself catches this (68/70 pass, 2 fail)

The doc claims 70 unit test groups and 15 handler test groups, but the actual counts are 69 and 14. The commit ca872f5 ("fix: correct test group counts") was supposed to fix this but introduced the wrong numbers.

Fix: Update TESTING.md:

  • "70 test groups" → "69 test groups"
  • "15 test groups" → "14 test groups" / "15 handler test groups" → "14 handler test groups"

Issue 2: Framework artifacts committed (CRITICAL)

Location: Multiple files
Impact: Critical — 10 of 12 changed files are unrelated to the PR

The diff includes files that should NOT be in this PR:

  • .github/hooks/* (6 new files) — amplihack framework artifacts
  • .claude/context/PROJECT.md — project name changed to worktree name
  • AGENTS.md — 388-line deletion of auto-routing prompt (unrelated)
  • pyproject.toml — version bump 2.7.0 → 2.7.1 (unrelated)

Fix: Reset these files to main. Only TESTING.md and scripts/test_testing_docs.sh belong in this PR.


Issue 3: Commits not squashed (LOW)

Impact: Low — 3 commits should be 1 clean commit


Strengths

  • TESTING.md is genuinely useful — clear structure, accurate commands, good coverage
  • Validation script is well-engineered — reads file once, covers 14 categories, clean bash
  • Approximate counts with date anchors are smart for drift-prone numbers
  • Refresh command on line 33 helps maintainers
  • All 8 cross-referenced docs verified to exist

Philosophy Compliance

Dimension Score
User Requirement Compliance 8/10 (counts wrong)
Simplicity 10/10
Modularity 10/10
Clarity 10/10

Summary

The documentation content is high quality. Two blocking issues:

  1. Fix the off-by-one counts (69 unit, 14 handler)
  2. Remove framework artifacts from the diff

@github-actions
Copy link
Copy Markdown
Contributor

PR Review: Documentation and Tooling

PR Type: Documentation + Tooling (not a dependency update)
Priority: Low — no security, runtime, or dependency risk


Summary

This PR adds a root-level TESTING.md guide and several supporting artifacts.

Files changed (12):

File Change Notes
TESTING.md +229 lines New test running guide
scripts/test_testing_docs.sh +215 lines Validation script for doc accuracy
.github/hooks/amplihack-hooks.json +47 lines Hook configuration
.github/hooks/pre-tool-use +73 lines Pre-tool validation hook
.github/hooks/error-occurred +21 lines Error logging hook
.github/hooks/post-tool-use +12 lines Post-tool hook
.github/hooks/session-start +12 lines Session start hook
.github/hooks/session-stop +18 lines Session stop hook
.github/hooks/user-prompt-submit +18 lines User prompt hook
AGENTS.md -388 lines Large content removal
pyproject.toml 2.7.0 → 2.7.1 Version bump
.claude/context/PROJECT.md minor edits Project name/lang update

TESTING.md Quality Assessment

The new guide covers all major test categories:

  • Rust unit tests (70 test groups, ~1,430 tests)
  • Handler tests (15 test groups, ~270 tests)
  • Integration tests (18 files, ~210 tests)
  • Live Azure tests (#[ignore]-gated)
  • Agentic scenario tests (YAML-based via gadugi)
  • Agentic integration shell tests (10 functions)
  • E2E tests
  • Benchmarks
  • CI pipeline description

The validation script (scripts/test_testing_docs.sh) actively checks for documentation drift against the actual codebase — this is good practice.

Outside-in test results (from PR body):

  • cargo test --all confirmed ~2,996 passing, matching docs ✅
  • All 8 cross-referenced doc files verified to exist ✅
  • 18 integration test files confirmed ✅
  • .cargo/config.toml RUST_MIN_STACK claim verified ✅

Observations

  1. .github/hooks/ additions: Six new shell hook scripts are added. These delegate to Python scripts in $HOME/.amplihack/ or the repo root. The fallback behavior (emit {}) is safe. The pre-tool-use hook aggregates two hook outputs (amplihack + xpia) — review that the xpia hook path is intentional and expected in your environment.

  2. AGENTS.md removal: 388 lines of auto-routing and recipe runner execution instructions were removed. Verify this content is no longer needed or has been moved elsewhere before merging.

  3. Version bump (2.7.0 → 2.7.1): Minor patch bump is appropriate for a documentation-only change.

  4. Test count drift risk: The PR notes this as a known risk — the scripts/test_testing_docs.sh validator addresses it. Consider wiring this script into CI.


Recommendation

Merge — the documentation is accurate (validated by outside-in testing), the validation script reduces drift risk, and there are no dependency or security concerns.

Before merging, confirm:

  • AGENTS.md content removal is intentional
  • .github/hooks/ xpia hook path is correct for your environment
  • scripts/test_testing_docs.sh is wired into CI or pre-commit if desired

Generated by Dependency Review and Prioritization for issue #926

@rysweet
Copy link
Copy Markdown
Owner Author

rysweet commented Mar 30, 2026

🔐 Security Review — PR #926

Reviewer: Security Agent (Step 17c)
Scope: All files in main..HEAD diff (TESTING.md, scripts/test_testing_docs.sh, + framework artifacts)
Verdict: ✅ PASS — No security vulnerabilities found


Checklist

Check Status Notes
Secrets/credentials in code ✅ Pass No hardcoded secrets, tokens, or passwords
Injection vulnerabilities ✅ Pass No eval/exec, no shell=True, no user-input interpolation
Path traversal ✅ Pass All paths derived from script location via dirname
Sensitive data in logs/output ✅ Pass Script outputs only pass/fail counts
Network access ✅ Pass Script is fully local, no HTTP calls
File write side effects ✅ Pass Script is read-only (validates, doesn't modify)
Temp file race conditions ✅ Pass No temp files created
Shell strict mode ✅ Pass set -euo pipefail enabled
Dependency supply chain ✅ Pass No new dependencies added
Auth/authz changes ✅ N/A No auth code modified

Detailed Analysis

scripts/test_testing_docs.sh

  • Uses set -euo pipefail — fails fast on errors, undefined vars, and pipe failures. Good.
  • REPO_ROOT derived from $(cd "$(dirname "$0")/.." && pwd) — safe, no user input.
  • All find/grep commands use hardcoded paths and patterns — no injection surface.
  • pass()/fail() functions echo only script-controlled strings — no untrusted input.
  • No eval, source, exec, curl, wget, or dynamic code execution.

TESTING.md

  • Documentation only — no executable code.
  • Environment variable names documented (ANTHROPIC_API_KEY, AZURE_SUBSCRIPTION_ID, etc.) — standard practice, no values exposed.

Informational (non-blocking)

# Finding Severity
1 Line 134 references ~/.claude-msec-k as API key file path — minor information disclosure of key storage location INFO
2 Framework artifacts (.github/hooks/, AGENTS.md, etc.) in diff — already flagged by code review, not a security issue but should be cleaned before merge INFO

Neither finding requires action for security approval.

@rysweet
Copy link
Copy Markdown
Owner Author

rysweet commented Mar 30, 2026

🧘 Philosophy Guardian Review: PR #926 — TESTING.md

Philosophy Score: B+

The intended deliverable (TESTING.md + validation script) is philosophy-aligned. Two issues prevent an A.


Strengths ✓

  • Ruthless simplicity: Each section in TESTING.md serves exactly one purpose — no filler, no marketing prose
  • Brick philosophy: Validation script (test_testing_docs.sh) is a self-contained brick with one job: detect documentation drift. Clean input/output contract
  • Zen minimalism: Doc reads like a reference card — Quick Start up front, categories organized by frequency of use, env vars in a table
  • Regeneratable: Spec is clear enough that AI could rebuild both files from "document azlin test infrastructure with a self-validating script"
  • Self-validating: The script itself embodies philosophy — code proves its own correctness instead of relying on manual review
  • No silent failures: Script uses set -euo pipefail, counts pass/fail explicitly, exits non-zero on drift

Concerns ⚠

  • Test counts inaccurate (doc=70/15, actual=69/14) — this violates the zero-BS principle. The validation script correctly catches this, proving its value, but the doc should match reality before merge
  • Framework artifact pollution — 10 of 12 changed files are unrelated framework artifacts (.github/hooks/*, AGENTS.md, pyproject.toml, .claude/context/PROJECT.md). This violates clean module boundaries — a docs PR should contain docs changes only

Forbidden Pattern Violations ✗✗

  • None detected — no error swallowing, no silent fallbacks, no data loss patterns

Violations ✗

  • Commit hygiene — 3 commits should be squashed to 1 clean commit. The current history (docs: add...feat: Add...fix: correct...) shows iteration that belongs in a single atomic commit

Recommendations

  1. Immediate: Fix test counts (70→69, 15→14) so validation passes 70/70 ✅
  2. Immediate: Remove framework artifacts from the diff — only TESTING.md and scripts/test_testing_docs.sh should be in the changeset
  3. Low: Squash to a single commit with a clear message

Regeneration Assessment

Can AI rebuild this module?

  • Specification clarity: Clear — "document all test categories, commands, and env vars for the azlin Rust workspace with a self-validating script"
  • Contract definition: Well-defined — script validates doc accuracy against filesystem reality
  • Verdict: ✅ Ready for AI regeneration

Philosophy Compliance Checklist

  • Ruthless simplicity achieved — no unnecessary content
  • Bricks & studs pattern followed — self-contained doc + self-contained script
  • Zero-BS implementation — test counts are wrong (69≠70, 14≠15)
  • No over-engineering — validation script is proportional to the problem
  • Clean module boundaries — framework artifacts pollute the diff

Overall: Strong philosophy alignment on the deliverable itself. Fix the two data accuracy issues and clean the diff to achieve full compliance.

@rysweet rysweet force-pushed the feat/task-unnamed-1774884381 branch from ca872f5 to 266c336 Compare March 30, 2026 16:35
Adds a top-level TESTING.md covering all test categories, commands, and
environment setup. Includes a validation script (scripts/test_testing_docs.sh)
that catches documentation drift by checking test counts, file references,
and linked docs against the actual codebase.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rysweet rysweet force-pushed the feat/task-unnamed-1774884381 branch from f7cfe17 to 763a176 Compare March 30, 2026 16:39
Under set -euo pipefail, grep pattern extraction lines would cause
silent script termination if TESTING.md wording changed. Now uses
|| true with empty-value checks to report failures via the pass/fail
counters instead of crashing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rysweet rysweet marked this pull request as ready for review March 30, 2026 16:47
@rysweet
Copy link
Copy Markdown
Owner Author

rysweet commented Mar 30, 2026

Ready for Final Review

All workflow steps completed:

  • Requirements clarified
  • Design approved
  • Implementation complete
  • Tests passing
  • Code review addressed
  • Philosophy compliance verified
  • Final cleanup complete
  • Quality audit passed

Ready for merge approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a TESTING.md documenting how to run azlin tests.

1 participant