
feat: create agent skill for nemoclaw CLI usage and multi-step workflows #83

@johntmyers


Problem Statement

LLMs do not know how to use the nemoclaw CLI by default. To enable agents to operate NemoClaw effectively — from basic provider management and sandbox creation through complex iterative policy refinement and BYOC workflows — we need a dedicated agent skill that teaches the CLI's command structure, guides multi-step workflows, and provides a fallback self-teaching mechanism via --help.

Additionally, the existing generate-sandbox-policy skill has drifted from the actual policy schema. The critical case is a field-name mismatch: the skill uses allowed_routing_hints where the schema expects allowed_routes, so agent-generated policies would fail to parse. Since the new CLI skill will delegate policy content authoring to generate-sandbox-policy, that skill must be fixed as part of this work.

Technical Context

The nemoclaw CLI (crates/navigator-cli) is a comprehensive tool with 30+ commands across 7 command groups (cluster, sandbox, provider, inference, policy, image, forward). The codebase already has 12 agent skills in .agents/skills/ following a well-established pattern: YAML frontmatter, numbered workflow steps, command reference tables, and concrete example scenarios. The new skill needs to cover basic CRUD operations, but more importantly, it must guide agents through multi-step workflows that involve coordinating across command groups (e.g., create providers → create sandbox with policy → monitor logs → pull policy → edit → push → verify reload).

The generate-sandbox-policy skill is a companion to the new CLI skill — the CLI skill handles command orchestration while generate-sandbox-policy handles policy content authoring. The CLI skill should cross-reference it at handoff points rather than duplicating policy semantics.

Affected Components

| Component | Key files | Role |
|---|---|---|
| New CLI skill | .agents/skills/nemoclaw-cli/SKILL.md (to create) | CLI workflow guidance for agents |
| Policy generation skill | .agents/skills/generate-sandbox-policy/SKILL.md, examples.md | Companion skill; needs schema drift fixes |
| Architecture docs | architecture/security-policy.md | Has the same allowed_routing_hints error; fix alongside |
| Navigator CLI | crates/navigator-cli/src/main.rs, run.rs, ssh.rs | CLI command definitions (reference for skill content) |
| Policy system | crates/navigator-policy/src/lib.rs, dev-sandbox-policy.yaml | Authoritative schema (source of truth for fixes) |

Proposed Approach

Part 1: Fix generate-sandbox-policy schema drift

  • Rename allowed_routing_hints → allowed_routes in SKILL.md, examples.md, and architecture/security-policy.md
  • Reconcile default read_only paths (decide whether skill should match dev-sandbox-policy.yaml or keep tighter defaults with a note)
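Before closing Part 1, a quick check can confirm that no stale field name survives in the three files this issue identifies as carrying the drift. This is a minimal sketch, not existing repository tooling: the helper names find_stale_field and check_repo are hypothetical, the script is assumed to run from the repository root, and it does a plain substring match rather than parsing YAML.

```python
from pathlib import Path

STALE_FIELD = "allowed_routing_hints"  # wrong name currently used in the skill docs
CORRECT_FIELD = "allowed_routes"       # name the policy schema expects

# Files named in this issue as containing the drift (paths relative to the repo root).
AFFECTED_FILES = [
    ".agents/skills/generate-sandbox-policy/SKILL.md",
    ".agents/skills/generate-sandbox-policy/examples.md",
    "architecture/security-policy.md",
]


def find_stale_field(text: str) -> list[int]:
    """Return 1-based line numbers that still mention the stale field name."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if STALE_FIELD in line]


def check_repo(root: str = ".") -> dict[str, list[int]]:
    """Map each affected file that exists under root to its offending line numbers."""
    results: dict[str, list[int]] = {}
    for rel in AFFECTED_FILES:
        path = Path(root) / rel
        if path.is_file():
            hits = find_stale_field(path.read_text(encoding="utf-8"))
            if hits:
                results[rel] = hits
    return results
```

Once the rename is done, check_repo() run from the repository root should return an empty dict. The substring match is deliberately crude: it also flags prose mentions of the old name, which is what you want when cleaning documentation.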

Part 2: Create nemoclaw-cli agent skill

  • Follow the multi-step workflow archetype (like build-from-issue and generate-sandbox-policy)
  • Organize into tiered workflows: basic operations → intermediate → advanced multi-step
  • Cross-reference generate-sandbox-policy at policy content handoff points
  • Include --help fallback mechanism for self-teaching
  • Consider a supplementary reference file for the full command tree
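The --help fallback amounts to building the help invocation for whatever command path the skill does not cover, then reading its output. This Python sketch assumes the binary is installed on PATH as nemoclaw and that it accepts --help at every level of the command tree, as is common for Rust CLIs built with clap (not confirmed here for navigator-cli). help_argv and read_help are hypothetical helper names.

```python
import shutil
import subprocess

# Command groups listed in this issue; used only to catch typos before shelling out.
COMMAND_GROUPS = {"cluster", "sandbox", "provider", "inference", "policy", "image", "forward"}


def help_argv(*path: str, binary: str = "nemoclaw") -> list[str]:
    """Build argv for `<binary> [group [subcommand ...]] --help`."""
    if path and path[0] not in COMMAND_GROUPS:
        raise ValueError(f"unknown command group: {path[0]!r}")
    return [binary, *path, "--help"]


def read_help(*path: str, binary: str = "nemoclaw") -> str:
    """Run the help command and return its text (assumes the binary is on PATH)."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    proc = subprocess.run(help_argv(*path, binary=binary), capture_output=True, text=True, check=False)
    # Some CLIs print help to stderr or exit nonzero; return whichever stream has content.
    return proc.stdout or proc.stderr
```

For example, help_argv("sandbox", "create") yields ["nemoclaw", "sandbox", "create", "--help"]. Validating only the first token against the seven command groups catches a mistyped group early; deeper subcommands pass through unchecked because this issue does not enumerate them.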

Key workflows to document:

  1. Provider management (CRUD, auto-creation from local credentials)
  2. Sandbox lifecycle (create, connect, sync, logs, delete)
  3. Policy iteration loop (logs → pull policy → delegate to generate-sandbox-policy → push → verify reload)
  4. BYOC pipeline (build image → push → create sandbox → port forward)
  5. Agent-assisted sandbox session (parallel monitoring + policy refinement)
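Workflow 3 is a loop rather than a straight line, so the skill has to tell the agent when to go around again and when to stop. One way to make that control flow explicit is a small state machine. This is a hypothetical sketch: the Step names are illustrative, and routing a failed reload back to the edit step (rather than, for example, recreating the sandbox) is an assumption of this sketch, not documented CLI behavior.

```python
from enum import Enum, auto


class Step(Enum):
    READ_LOGS = auto()      # inspect sandbox logs for denied actions
    PULL_POLICY = auto()    # fetch the sandbox's current policy
    EDIT_POLICY = auto()    # hand off to generate-sandbox-policy for content changes
    PUSH_POLICY = auto()    # push the edited policy back to the sandbox
    VERIFY_RELOAD = auto()  # confirm the sandbox picked up the new policy
    DONE = auto()


_LINEAR = [Step.READ_LOGS, Step.PULL_POLICY, Step.EDIT_POLICY, Step.PUSH_POLICY, Step.VERIFY_RELOAD]


def next_step(current: Step, *, denials_remaining: bool = False, reload_ok: bool = True) -> Step:
    """Return the step that follows `current` in the policy iteration loop."""
    if current is Step.DONE:
        return Step.DONE
    if current is Step.READ_LOGS and not denials_remaining:
        return Step.DONE  # logs show nothing left to fix
    if current is Step.VERIFY_RELOAD:
        # Successful reload: go back to the logs to look for further denials.
        # Failed reload: the pushed policy was not applied, so revisit the edit.
        return Step.READ_LOGS if reload_ok else Step.EDIT_POLICY
    return _LINEAR[_LINEAR.index(current) + 1]
```

The agent exits only when a fresh read of the logs shows no remaining denials; every successful reload sends it back to the logs to check for the next one.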

Scope Assessment

  • Complexity: Medium — substantial content but follows established patterns; mostly documentation with a critical bug fix
  • Confidence: High — 12 existing skills provide a proven template, CLI structure is well-defined, schema drift is clearly identified
  • Estimated files to change: 4-6 (new SKILL.md + optional reference.md, fix SKILL.md + examples.md in generate-sandbox-policy, fix security-policy.md)
  • Issue type: feat

Risks & Open Questions

  • Skill length vs. context window: A comprehensive skill covering all workflows could be 600-800+ lines. Consider splitting it into a core SKILL.md with supplementary reference files.
  • Trigger keyword collision: Need keywords that activate for CLI usage without colliding with generate-sandbox-policy (which triggers on policy generation/authoring).
  • Provider auto-creation shortcut: Should the skill teach ncl sandbox create -- claude (auto-creates providers) or always use explicit provider management? Probably both, with the shortcut as the quick-start path.
  • Dynamic vs static policy fields: Agents must understand which policy fields can be hot-reloaded (network_policies, inference) and which require sandbox recreation (filesystem_policy, landlock, process). Getting this wrong leads to confusing errors.
  • read_only defaults decision: Should the generate-sandbox-policy skill match the actual reference file (/proc + /var/log) or keep its tighter defaults (/proc/self, no /var/log) with an explicit note?
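The dynamic-vs-static distinction is mechanical enough to check before pushing. This sketch assumes the field names listed above are top-level keys of the parsed policy document, which should be confirmed against crates/navigator-policy/src/lib.rs. update_strategy and its return values are hypothetical, and any changed section it does not recognize raises an error instead of being guessed at.

```python
# Top-level policy sections, per this issue: hot-reloadable vs. requiring a new sandbox.
HOT_RELOADABLE = {"network_policies", "inference"}
REQUIRES_RECREATION = {"filesystem_policy", "landlock", "process"}


def changed_sections(old: dict, new: dict) -> set[str]:
    """Top-level keys whose values differ between two parsed policies."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def update_strategy(old: dict, new: dict) -> str:
    """Return 'none', 'push', or 'recreate' for moving a sandbox from old to new policy."""
    changed = changed_sections(old, new)
    if not changed:
        return "none"
    if changed & REQUIRES_RECREATION:
        return "recreate"  # any static change forces recreation, even alongside dynamic ones
    unknown = changed - HOT_RELOADABLE
    if unknown:
        raise ValueError(f"unclassified policy sections changed: {sorted(unknown)}")
    return "push"
```

Comparing parsed documents rather than raw text keeps whitespace or key-order changes from triggering a needless sandbox recreation.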

Test Considerations

  • Manual validation: have an agent use both skills to execute each documented workflow end-to-end
  • Verify the --help fallback works for commands deliberately omitted from the skill
  • Test that trigger keywords activate the correct skill and don't conflict
  • Validate the policy iteration loop end-to-end: create sandbox → observe logs → pull policy → modify (via generate-sandbox-policy) → push → verify reload
  • Verify the allowed_routes fix by generating a policy with inference config and confirming it parses

Created by spike investigation. Use build-from-issue to plan and implement.

Metadata

Labels

  • area:cli (CLI-related work)
  • area:policy (Policy engine and policy lifecycle work)
  • area:sandbox (Sandbox runtime and isolation work)
  • spike
  • state:agent-ready (Approved for agent implementation)
  • state:pr-opened (PR has been opened for this issue)
