feat(ambient): add agent orchestration to ambient mode#149
Conversation
Replace passive skill-only ambient mode with full agent orchestration. Two depth tiers: QUICK (zero overhead) and ORCHESTRATED (skills + agent pipelines). Three new orchestration skills drive intent-specific pipelines: - implementation-orchestration: pre-flight → Coder → quality gates - debug-orchestration: competing hypotheses → parallel Explores → root cause - plan-orchestration: Skimmer → Explores → Plan agent → gap validation Key changes: - Remove ELEVATE tier, replace GUIDED with ORCHESTRATED - Rename BUILD intent to IMPLEMENT for clarity - Skills loaded via Skill tool instead of Read (fixes broken loading) - Add TDD skill to Coder agent frontmatter permanently - Ambient plugin now includes 7 agents + 4 skills - Update ambient-prompt hook preamble for new tiers - Classification conservatism: default to QUICK Closes #84 (superseded by Skill tool approach).
Code Review Summary: Ambient Mode Agent OrchestrationPR: #149 Blocking Issues (Must Address Before Merge)Critical: Stale Integration TestsSeverity: CRITICAL | Confidence: 100% (3 reviewers flagged) The integration test helpers use regex patterns matching the old taxonomy ( Required Fix:
High: Plugin README Not UpdatedSeverity: HIGH | Confidence: 100% (2 reviewers flagged) README still documents the removed 3-tier model (QUICK/GUIDED/ELEVATE), old intent names (BUILD), and claims "no agents spawned." Users will see incorrect information about how the plugin works. Required Fix: Rewrite README to document:
High: Missing Agent/Skill DeclarationsSeverity: HIGH | Confidence: 100% (2 reviewers flagged) Orchestration skills reference "Explore agents" and "Plan agents" in phases, but these are not declared in plugin.json agents array. Also missing: Required Fix:
High:
|
| Dimension | Score | Issues | Key Concern |
|---|---|---|---|
| Security | 8/10 | 1 CRITICAL, 3 HIGH, 1 Should-Fix | Bash tool usage scope needs documentation |
| Architecture | 7/10 | 3 HIGH, 2 MEDIUM | Missing agent declarations; undocumented delta vs /implement//debug |
| Performance | 7/10 | 2 HIGH, 3 MEDIUM | No lightweight path for trivial IMPLEMENT; unbounded DEBUG budget |
| Complexity | 7/10 | 2 HIGH, 3 MEDIUM | Pipeline tables duplicated 3 places; orchestration skills crosstalk |
| Consistency | 5/10 | 2 HIGH, 2 MEDIUM | Task tool unprecedented; search-first dropped; 3→2 tier collapse undocumented |
| Regression | 3/10 | 1 CRITICAL, 3 HIGH | Tests broken; README stale; 5 intentional behavioral changes undocumented |
| Tests | 3/10 | 1 CRITICAL, 1 HIGH | Zero tests for new orchestration skills; integration tests have wrong taxonomy |
| TypeScript | 8/10 | 0 CRITICAL, 0 HIGH | One should-fix on untyped JSON.parse; passes strict compilation |
| Documentation | 5/10 | 2 HIGH, 3 MEDIUM | Plugin README not updated; integration tests stale; no CHANGELOG entry |
Pre-existing Issues (Not Blocking)
Should Consider in Follow-up:
- Settings file written without atomic write (fs.writeFile not atomic)
- Coder agent skill frontmatter growing without bounds (7 skills, may add domain skills)
- No shared Explore/Plan agent definitions (4+ commands define inline, inconsistent behavior)
- Classification conservatism relies on LLM compliance, no mechanical enforcement
- CLAUDE.md mentions "3-tier" system but orchestration skills don't fit existing tiers
Commit Message Note
The commit message states "add agent orchestration to ambient mode" which implies an additive change. In reality, this PR:
- Renames intent BUILD → IMPLEMENT
- Removes two depth tiers (GUIDED/ELEVATE) → single ORCHESTRATED
- Removes two skills from IMPLEMENT (search-first, test-driven-development)
- Changes enforcement model (main-session skills → agent orchestration)
- Downgrades analytical EXPLORE (QUICK/GUIDED split → always QUICK)
Consider updating the commit message to reflect the scope: refactor(ambient): orchestrate agents with taxonomy rename and tier collapse
Recommendation
CHANGES_REQUESTED
This PR is a significant architectural improvement for ambient mode, but requires fixes in 4 blocking areas (integration tests, plugin README, agent declarations, search-first skill) before merge. All blocking items are documentation/manifest corrections, not design changes.
Unblock Sequence:
- Fix integration test regex + assertions (15 min)
- Rewrite plugin README (20 min)
- Audit & add missing agents/skills to plugin.json (10 min)
- Decide on search-first restoration or removal documentation (5 min)
High-priority should-fix items (performance scope gate, debug budget cap, JSON.parse error handling) recommended as follow-ups or pre-merge if time permits.
Automated review via Claude Code + Snyk security scanning
| name: implementation-orchestration | ||
| description: Agent orchestration for IMPLEMENT intent — pre-flight, Coder, quality gates | ||
| user-invocable: false | ||
| allowed-tools: Read, Grep, Glob, Bash, Task, AskUserQuestion |
There was a problem hiding this comment.
Orchestration skills declare Task tool with no precedent
The 'Task' tool appears in allowed-tools for the first time across 32+ existing skills. No other skill in the codebase lists Task.
This is either:
- A new tool permission that should be documented in as a convention for orchestration-tier skills, OR
- A mistaken addition that should be removed
Suggested fix: Document why orchestration skills need Task access, or remove it if not required.
Confidence: HIGH - verified against all 32 pre-existing skills
| | test-driven-development | Always for BUILD | `*.ts`, `*.tsx`, `*.js`, `*.jsx`, `*.py` | | ||
| | implementation-patterns | Always for BUILD | Any code file | | ||
| | search-first | Always for BUILD | Any code file | | ||
| | implementation-orchestration | Always for IMPLEMENT | Any — orchestrates agent pipeline | |
There was a problem hiding this comment.
search-first skill silently dropped from IMPLEMENT intent
The skill-catalog previously listed 'search-first' as 'Always for BUILD'. This PR removes it without migration or explanation.
The search-first skill enforces 'look before you leap' — ensuring developers check for existing implementations before writing new code. This is still valid for ambient IMPLEMENT work.
Suggested fix:
- Add search-first back to the IMPLEMENT intent in skill-catalog, OR
- Document the intentional removal with rationale (e.g., 'redundant with Coder agent's core-patterns skill')
Confidence: HIGH - confirmed across 3 review reports
|
|
||
| Based on classified intent, invoke each selected skill using the Skill tool. | ||
|
|
||
| | Intent | Primary Skills | Secondary (if file type matches) | |
There was a problem hiding this comment.
Duplicate pipeline mapping tables across 3 files create maintenance burden
The intent-to-pipeline mapping appears in 3 places with inconsistent structure:
- ambient.md Phase 4 (lines 64-69)
- ambient-router Step 3 (lines 53-58)
- ambient-router Step 5 (lines 79-86)
When the next orchestration skill is added, all 3 tables must update in lockstep. Any mismatch causes confusing behavior where documentation contradicts implementation.
Suggested fix: Make ambient-router the single source of truth. Replace ambient.md Phase 4 table with:
**ORCHESTRATED:**
Invoke each selected skill using the Skill tool per the ambient-router's
Step 3 (skill selection) and Step 5 (agent orchestration). The ambient-router
skill is the single source of truth for intent-to-pipeline mapping.
This collapses a 6-line duplication into a 2-line reference.
Confidence: HIGH - verified by Complexity and Architecture reviews
|
|
||
| Pass FILES_CHANGED to all quality gate agents. | ||
|
|
||
| ## Phase 5: Quality Gates |
There was a problem hiding this comment.
IMPLEMENT pipeline has no scope gate — trivial changes pay full cost
Phase 5 (Quality Gates) runs the full pipeline (Validator, Simplifier, Scrutinizer, Shepherd) regardless of change scope. A one-line config update gets the same 5-6 agent treatment as a multi-file feature.
The Iron Law says 'no shortcut' for quality, but proportionality matters: trivial changes should have trivial overhead.
Suggested fix: Add a Phase 2.5 scope gate after plan synthesis:
If EXECUTION_PLAN affects <= 1 file and <= 20 lines:
- Run LIGHT pipeline: Coder → Validator (single pass, no retry)
- Skip Simplifier, Scrutinizer, Shepherd
Otherwise: full pipeline (Phase 3-6)
This preserves the Iron Law for substantive changes while avoiding expensive pipelines for trivial work.
Confidence: HIGH - Performance review H1
Code Review Summary: PR #149 - Ambient Mode Agent OrchestrationStatus: CHANGES_REQUESTEDNine comprehensive reviews identified 25+ findings across architecture, documentation, complexity, consistency, performance, regression, security, and tests. 5 HIGH-confidence blocking issues plus additional issues requiring documentation updates. Blocking Issues (Must Fix Before Merge)1. Integration Tests Broken by Taxonomy RenameFiles: The integration test helpers use hardcoded regex for old taxonomy ( Fix: Update regex patterns to match new taxonomy in helpers.ts and test assertions in ambient-activation.test.ts. 2. Unprecedented 'Task' Tool in allowed-toolsFiles: All 3 orchestration skills No other skill in the codebase requests 'Task' access. This is the first precedent. Fix: Document why orchestration skills need Task in docs/reference/skills-architecture.md, OR remove it. 3. search-first Skill Silently DroppedFile: Quality gate removed without explanation. Fix: Restore search-first to IMPLEMENT, OR document intentional removal. 4. Plugin README.md Completely StaleSeverity: HIGH (Not in current diff) Shows old 3-tier model and BUILD intent — all removed by this PR. Fix: Rewrite README for QUICK/ORCHESTRATED model with IMPLEMENT intent and 4 pipelines. 5. Empty CHANGELOG for Breaking ChangesFile: CHANGELOG.md No [Unreleased] entry for breaking changes (intent rename, tier removal, skill removal). Fix: Add comprehensive [Unreleased] section documenting all breaking changes. Additional High-Confidence IssuesTesting Gaps (CRITICAL)
Architectural Documentation
Performance Issues
Consistency Issues
Inline Comments CreatedThe following inline comments have been added for high-confidence code issues:
Recommended Fix Priority
Review Summary: Architecture sound, implementation pace exceeded documentation/testing. Core two-tier QUICK/ORCHESTRATED model is solid. All 9 reviewers agree on blocking issues. |
…dget Code review fixes for PR #149: - Reinstate GUIDED as middle tier: QUICK / GUIDED / ORCHESTRATED - GUIDED: small scope (≤2 files), main session + skills + Simplifier - ORCHESTRATED: large scope (>2 files), full agent pipeline - Scope-based split per intent (IMPLEMENT, DEBUG, PLAN, REVIEW) - Add search-first to Coder agent permanent skills - Add hard cap of 8 Explore agents for DEBUG pipeline - Fix stale integration tests: BUILD→IMPLEMENT, ELEVATE→ORCHESTRATED - Update plugin README, ambient command, hook preamble for three tiers - Add CHANGELOG entry under [Unreleased]
| |--------|---------------|----------------------------------| | ||
| | **BUILD** | test-driven-development, implementation-patterns, search-first | typescript (.ts), react (.tsx/.jsx), go (.go), java (.java), python (.py), rust (.rs), frontend-design (CSS/UI), input-validation (forms/API), security-patterns (auth/crypto) | | ||
| | **DEBUG** | test-patterns, core-patterns | git-safety (if git operations involved) | | ||
| | **IMPLEMENT** | implementation-patterns, search-first | typescript (.ts), react (.tsx/.jsx), go (.go), java (.java), python (.py), rust (.rs), frontend-design (CSS/UI), input-validation (forms/API), security-patterns (auth/crypto) | |
There was a problem hiding this comment.
Missing TDD in GUIDED IMPLEMENT ❌
The GUIDED-depth IMPLEMENT row lists implementation-patterns, search-first but omits test-driven-development. However:
- Line 91 contains a conditional: "If test-driven-development is selected (IMPLEMENT intent), you MUST write the failing test"
- The
test-driven-development/SKILL.mddocuments: "IMPLEMENT/GUIDED -> TDD enforced in main session" plugins/devflow-ambient/README.md:70advertises: "test-driven-development -- TDD enforcement for IMPLEMENT (GUIDED + ORCHESTRATED)"
This creates a broken contract: TDD is documented as enforced at GUIDED depth, but the router never selects it there.
Fix: Add test-driven-development to the GUIDED IMPLEMENT primary skills:
| **IMPLEMENT** | implementation-patterns, search-first, test-driven-development | typescript (.ts), ... |
Confidence: 95% (HIGH - Regression)
scripts/hooks/ambient-prompt
Outdated
| @@ -32,13 +32,23 @@ fi | |||
|
|
|||
| # Inject classification preamble | |||
| PREAMBLE="AMBIENT MODE ACTIVE: Before responding, silently classify this prompt: | |||
There was a problem hiding this comment.
Hook preamble bloat undermines QUICK zero-overhead promise ⚠
The PREAMBLE string grew from ~40 words to ~150 words and is injected as additionalContext on EVERY user prompt ≥2 words. This contradicts the design promise: "QUICK: Zero overhead."
Impact: Over a 50-prompt session, ~7,500 tokens consumed by preamble injection alone, adding latency to every prompt.
Suggested fix: Reduce preamble to minimal trigger (~25 words):
PREAMBLE="AMBIENT MODE ACTIVE: Classify this prompt using the ambient-router skill in your session context.
Default to QUICK. Only state classification for GUIDED/ORCHESTRATED."This cuts injection from ~150 words to ~25 words (83% reduction). The full classification logic lives in ambient-router SKILL.md which is already loaded.
Confidence: 95% (HIGH - Performance)
|
|
||
| Pass FILES_CHANGED to all quality gate agents. | ||
|
|
||
| ## Phase 5: Quality Gates |
There was a problem hiding this comment.
IMPLEMENT pipeline missing agent budget cap ⚠
The debug-orchestration skill correctly caps Explore agents at 8 total (your commit 15849ce). The implementation-orchestration pipeline lacks an equivalent cap.
In worst case (both Validator and Shepherd exhaust retries), the pipeline spawns:
- Coder → Validator (retry 2x) → Coder-fix → Validator → Simplifier → Scrutinizer → Validator (re-validate) → Shepherd (retry 2x) = up to 12 agent invocations per ORCHESTRATED prompt
Impact: 300K-600K tokens, 5-15 minutes per prompt. This is the most expensive path in the system.
Suggested fix: Add explicit agent budget to Phase 5:
## Agent Budget
Hard cap: **8 total agent spawns** across all phases.
| Phase | Allocation |
|-------|-----------|
| Phase 3 (Coder) | 1 (+ up to 1 retry = 2 max) |
| Phase 5 (Quality Gates) | Up to 5 (Validator + Simplifier + Scrutinizer + re-Validate + Shepherd) |
| Phase 5 (Retries) | Retries consume from remaining budget |
If budget exhausted, halt and report what passed.Confidence: 90% (HIGH - Performance)
| "orchestration", | ||
| "agents" | ||
| ], | ||
| "agents": [ |
There was a problem hiding this comment.
Missing Explore agent in plugin manifest ❌
The agents array lists: coder, validator, simplifier, scrutinizer, shepherd, skimmer, reviewer
But the orchestration skills explicitly reference Explore agents:
debug-orchestration/SKILL.md:34-- "Spawn one Explore agent per hypothesis"debug-orchestration/SKILL.md:46-- "Spawn 2-3 Explore agents"plan-orchestration/SKILL.md:33-- "spawn 2-3 Explore agents"
These are ephemeral Task subagents, not declared shared agent definitions. However, the architectural contract is unclear. Either:
- Add an Explore agent definition to
shared/agents/and include "explore" in the agents array, OR - Document the ephemeral subagent pattern in the orchestration skills clarifying that "Explore" means Task(subagent_type="Explore"), not a declared agent
Option 1 is stronger (single source of truth across all commands using Explore agents).
Confidence: 85% (HIGH - Architecture)
Code Review Comments — PR #1491. TDD Enforcement Gap (HIGH — Consistency + Regression)Files: The GUIDED-depth IMPLEMENT row omits
Root cause: TDD was removed from the matrix but documentation and conditionals remain. Fix: Add Also add to skill-catalog.md IMPLEMENT row. 2. Hook Preamble 3x Oversized (HIGH — Performance)File: The PREAMBLE injected on every ambient prompt grew from ~40 words to ~150 words. This contradicts the "QUICK: zero overhead" promise:
Fix: Trim to minimal trigger (~25 words): PREAMBLE="AMBIENT MODE ACTIVE: Classify this prompt using the ambient-router skill in your session context. Default to QUICK. Only state classification for GUIDED/ORCHESTRATED."3. IMPLEMENT Pipeline Missing Agent Budget Cap (HIGH — Performance)File: The DEBUG pipeline has a hard cap (8 agents), but IMPLEMENT has none. Worst case: 12 agent invocations (Coder + Validator with retries + Simplifier + Scrutinizer + Shepherd with retries). Fix: Add budget section: ## Agent Budget
Hard cap: 8 total agent spawns across all phases.
| Phase | Allocation |
|-------|-----------|
| Phase 3 (Coder) | 1 (+ up to 1 retry = 2 max) |
| Phase 5 (Quality Gates) | Up to 5 (Validator + Simplifier + Scrutinizer + Shepherd) |
If budget is exhausted, halt and report results.4. Missing git Agent in Ambient Plugin (HIGH — Security + Architecture)Files: The Coder performs git operations (branch setup, commit, push) but the git agent is missing. The Fix: Add 5. Missing Explore Agent in Plugin Manifest (HIGH — Architecture)Files: Both Fix: Add 6. Plugin Description Drifts Across Locations (HIGH — Consistency)Files: Multiple locations Inconsistent descriptions found:
Fix:
7. Pipeline Logic Duplication (HIGH — Architecture)Files: The Recommendation: Document the intentional delta explicitly at the top of ## Relationship to /implement Command
This is the lightweight ambient variant. Excluded from ambient ORCHESTRATED:
- Git agent (branch management)
- Skimmer + Explore codebase orientation
- Parallel Plan agents
- Sequential/parallel Coder strategies
- PR creation
- Knowledge persistence
For full lifecycle, use `/implement` directly.Lower-Confidence Issues (60-79% — Summary Comment Only)The following issues are captured in the summary comment below due to lower confidence or architectural nuance. Consider addressing in follow-ups:
Review Status: Approve after addressing the 7 HIGH issues above. All are straightforward fixes with clear suggested implementations. |
| @@ -1,18 +1,21 @@ | |||
| # devflow-ambient | |||
There was a problem hiding this comment.
⚠ Plugin README is entirely stale -- shows removed model
This README still documents the old architecture:
- Line 3: "AUTO-LOADS RELEVANT SKILLS" -- updated to "agent orchestration" elsewhere
- Lines 13-20: Examples show old depth tiers (GUIDED, ELEVATE) that no longer exist
- Lines 41-45: "Depth Tiers" table has stale ELEVATE tier
- Line 45: ELEVATE description "workflow nudge" -- this tier was replaced with ORCHESTRATED agent pipelines
The README was updated in the first commit but describes a model that doesn't match the actual code. Users reading this will see:
- "BUILD/GUIDED" when the code implements "IMPLEMENT/GUIDED"
- "ELEVATE" tier when it's now "ORCHESTRATED"
- "GUIDED (2-3 skills)" when GUIDED now means main session + Simplifier
- "ELEVATE (workflow nudge)" when ORCHESTRATED means full agent pipelines
This is the primary user-facing documentation for ambient mode.
Fix: Complete rewrite to match the new three-tier model (QUICK/GUIDED/ORCHESTRATED) with examples showing new behavior.
Confidence: 100% (HIGH - Documentation)
Medium-Priority Issues & Architectural NotesArchitecture & Design
Performance Optimization (Non-blocking)
Test Coverage Gaps
Documentation & Consistency (Non-blocking)
Summary: What's Working Well✅ BUILD → IMPLEMENT, ELEVATE → ORCHESTRATED terminology thoroughly applied RecommendationConditional Approve: Merge after fixing the 7 HIGH issues (TDD, preamble bloat, IMPLEMENT budget, git agent, Explore manifest, descriptions, init.ts/marketplace). The 10 MEDIUM issues (cross-skill coupling, Plan ambiguity, fast-path, caching, router density, duplication, Task tool, agent references, EXPLORE downgrade, README change) are valuable improvements but not blocking if documented in CHANGELOG. |
Code Review SummaryPR: #149 | Branch: feat/ambient-orchestration → main Overall AssessmentThis PR implements ambient mode agent orchestration with three well-structured new orchestration skills and a revised classification system (QUICK/GUIDED/ORCHESTRATED). The architecture is sound with clear phased pipelines and defined error handling. However, 5 blocking issues (≥80% confidence) require attention before merge: Blocking Issues (≥80% Confidence)HIGH - Consistency/Regression Issues:
HIGH - Architecture/Performance Issues:
HIGH - Documentation Issues:
Lower-Confidence Findings (60-79%)Suggested improvements for follow-up (not blocking):
What's Going Well
Files with Inline CommentsSee discussion threads for specific line-by-line feedback:
Next Steps: Address the 5 blocking issues above, then this PR is ready to merge. Estimated effort: ~2-3 hours. Review completed using github-patterns skill with rate-limiting awareness (1-2s between API calls, 60s wait when approaching rate limit). All findings deduplicated across 9 reviewer reports. |
Code Review Summary — PR #149Branch: feat/ambient-orchestration → main VerdictStatus: CHANGES REQUESTED7 HIGH blocking issues require fixes before merge:
10 MEDIUM architectural issues recommended for same PR or follow-up:
Scores by Dimension
Overall: 6.4/10 — Well-designed feature with significant implementation gaps Key Strengths✅ Scope-based classification (≤2 vs >2 files) is practical and clear What Must Be Fixed
Effort Estimate
Total: ~65 minutes Next Steps
Generated by: Claude Code Review Agents |
B1: Restore TDD to GUIDED/IMPLEMENT skill selection (was missing) B2: Trim hook preamble to ~30 words + add git keyword fast-path B3: Remove hard-capped 8-agent budget from debug-orchestration S1: Document pipeline delta (vs /implement, /debug) in orchestration skills S3: Add ORCHESTRATED classification integration test S4: Update stale marketplace.json + init.ts descriptions S5: Fix zero-assertion tests (remove if-guard, add expect) S6: Change debug Phase 5 from implementation-orchestration to GUIDED fix S7: Standardize agent references to Task(subagent_type="X") format S8: Extract CLASSIFICATION_PATTERN constant in test helpers S9: Document behavioral changes in CHANGELOG S10: Full rewrite of ambient plugin README (hook-based, no /ambient) Also removes redundant /ambient slash command — ambient mode is hook-only.
Summary
implementation-orchestration,debug-orchestration,plan-orchestration/ambientslash command — ambient mode is hook-only viadevflow ambient --enableWhat Changed
New skills (3):
implementation-orchestration— Lightweight/implementpipeline: pre-flight → Coder → Validator → Simplifier → Scrutinizer → Shepherddebug-orchestration— Lightweight/debugpipeline: hypotheses → parallel Explores → convergence → report → offer fix (GUIDED, not orchestrated)plan-orchestration— Lightweight Plan pipeline: Skimmer → Explores → Plan agent → gap validationModified:
ambient-routerSKILL.md — Three-tier depth classification, ORCHESTRATED skill selection, Step 5 agent orchestrationambient-prompthook — Trimmed preamble, git keyword fast-pathplugin.json— Added orchestration skills + agents to manifestskill-catalog.md— Full rewrite with GUIDED/ORCHESTRATED depth columnsRemoved:
/ambientcommand (ambient.md) — hook handles everythingBehavioral Changes
Test Plan
npm run build— 35 skills, 17 plugins, all assets distributednpm test— 250/250 tests passnpm run test:integration— Integration tests (requires claude CLI + ambient enabled)devflow ambient --statusworks