diff --git a/docs/design/agent-skills/agent-skills-spike.md b/docs/design/agent-skills/agent-skills-spike.md
new file mode 100644
index 000000000..abaf4f879
--- /dev/null
+++ b/docs/design/agent-skills/agent-skills-spike.md
@@ -0,0 +1,409 @@
+# Spike for Agent Skills Support
+
+## Overview
+
+**The problem**: Lightspeed Core has no mechanism for extending agent capabilities with specialized instructions or domain knowledge. Users cannot package reusable workflows, troubleshooting guides, or domain expertise into portable, discoverable units that the LLM can use on demand.
+
+**The recommendation**: Implement the [Agent Skills open standard](https://agentskills.io) with filesystem-based discovery. Config specifies paths to skill directories; skill metadata (name, description) is read from `SKILL.md` frontmatter at startup. The LLM discovers available skills via a `list_skills` tool and loads full instructions on demand via an `activate_skill` tool. The system prompt contains behavioral instructions for how to use these tools, not the skill catalog itself.
+
+**PoC validation**: Not applicable for this spike. The feature is well-defined by the agentskills.io specification and has been implemented by 30+ agent products, including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex.
+
+## Decisions for @sbunciak and @ptisnovs
+
+These are the high-level decisions that determine scope, approach, and cost. Each has a recommendation confirmed during spike research.
+
+### Decision 1: Which skill types to support?
+
+| Option | Description |
+|--------|-------------|
+| A | Built-in skills only (Lightspeed Core developers ship pre-defined skills) |
+| B | Product team-defined only (LS app teams like RHEL Lightspeed define their own skills) |
+| C | Both built-in and product team-defined |
+
+**Recommendation**: **B** (Product team-defined only). This allows LS app teams (e.g., RHEL Lightspeed, Ansible Lightspeed) to extend Lightspeed with domain-specific skills without requiring changes to Lightspeed Core. Product teams ship skills alongside the lightspeed-stack container by mounting skill directories via ConfigMaps or container volumes, then specifying the paths in `lightspeed-stack.yaml`. Skill content is read from `SKILL.md` files at startup. Built-in skills can be added to Lightspeed Core later if common patterns emerge.
+
+**Note**: End users of LS app products do NOT have the ability to add skills, similar to how they cannot add MCP servers. Skill configuration is controlled by product teams at deployment time.
+
+### Decision 2: Discovery mechanism?
+
+| Option | Description |
+|--------|-------------|
+| A | Filesystem-based (config specifies paths, skill metadata read from `SKILL.md` files) |
+| B | Config-based (full skill definitions inlined in `lightspeed-stack.yaml`) |
+| C | API-based (skills registered/managed via REST API) |
+| D | Hybrid (built-in via config, product team-defined via filesystem or API) |
+
+**Recommendation**: **A** (Filesystem-based). Config specifies paths to skill directories (or a single directory containing multiple skills). Skill metadata (name, description) is read from `SKILL.md` frontmatter at startup. This keeps config lightweight, avoids bloating the config CR in k8s deployments, and allows skill content to be updated independently of config changes. Skills can be mounted via ConfigMaps, volumes, or any standard filesystem mechanism.
+
+### Decision 3: Script execution support?
+
+| Option | Description |
+|--------|-------------|
+| A | No scripts (only `SKILL.md` instructions) |
+| B | Scripts allowed (full spec compliance) |
+| C | Deferred (start with no scripts, add later) |
+
+**Recommendation**: **C** (Deferred). As noted in LCORE-1339, there are security concerns with executing arbitrary scripts. Script support will not be implemented until sandbox support (running scripts in an isolated environment) is added. The core value of skills is in the instructions; scripts can be added in a future phase once sandboxing is available.
+
+## Technical decisions for @ptisnovs
+
+Architecture-level and implementation-level decisions.
+
+### Decision 4: Skill content source
+
+| Option | Description |
+|--------|-------------|
+| A | Path-based (config points to directory containing `SKILL.md`) |
+| B | Inline (instructions embedded directly in YAML) |
+| C | Both (support either path or inline content) |
+
+**Recommendation**: **A** (Path-based). This follows the agentskills.io spec directory structure, keeps the YAML config clean, and allows skills to include `references/` subdirectories for additional documentation that can be loaded on demand.
+
+### Decision 5: Activation mechanism
+
+| Option | Description |
+|--------|-------------|
+| A | System prompt catalog (skill catalog embedded in system prompt, LLM decides) |
+| B | Tool-based discovery (`list_skills` tool returns catalog, `activate_skill` loads instructions) |
+| C | Per-request parameter (client specifies which skills to activate) |
+| D | Hybrid (catalog in system prompt if < N skills, tool-based discovery otherwise) |
+
+**Recommendation**: **B** (Tool-based discovery). This approach separates behavioral instructions from skill inventory:
+
+1. **System prompt** contains behavioral instructions only:
+   - How to discover skills (`list_skills`)
+   - How to activate skills (`activate_skill`)
+   - Requirement to load full instructions before proceeding
+
+2. **`list_skills` tool** returns the skill catalog (name + description for each skill)
+
+3. **`activate_skill` tool** loads full `SKILL.md` instructions when the LLM decides a skill is relevant
+
+**Rationale**: This pattern follows Google ADK's approach and provides a clean evolution path:
+
+| Phase | `list_skills` behavior | Scales to |
+|-------|------------------------|-----------|
+| Phase 1 (initial) | Returns full catalog | ~20 skills |
+| Phase 2 (future) | Accepts optional `query` param for keyword/semantic search | 100+ skills |
+
+The phase 2 extension requires only a tool implementation change, not an architectural change. The system prompt instructions and `activate_skill` tool remain unchanged.
+
+**Alternative considered**: Option A (system prompt catalog) was considered but rejected because:
+- Token cost grows linearly with skill count (~50-100 tokens/skill)
+- Large skill catalogs risk overwhelming the model context
+- There is no clean evolution path to search-based discovery
+
+Option D (hybrid) remains viable for deployments with predictable skill counts, but adds complexity. Teams can revisit it if phase 2 search proves unreliable for small catalogs.
+
+### Decision 6: Skill context management
+
+| Option | Description |
+|--------|-------------|
+| A | Always loaded (all skills' full instructions in every request) |
+| B | Progressive disclosure (catalog via tool, full content loaded when LLM requests) |
+
+**Recommendation**: **B** (Progressive disclosure). This follows the agentskills.io pattern:
+
+1. **Catalog** (~50-100 tokens/skill) - name + description returned by `list_skills` tool
+2. **Instructions** (<5000 tokens) - full `SKILL.md` body loaded via `activate_skill` tool when needed
+3. **Resources** (on-demand) - `references/` files loaded via file-read tool when referenced
+
+This keeps the base context small while giving the LLM access to specialized knowledge on demand.
+
+### Decision 7: Configuration structure
+
+Skills are configured by specifying paths to skill directories in `lightspeed-stack.yaml`. Two forms are supported:
+
+**Option A: Directory of skills** (recommended for most deployments)
+```yaml
+skills:
+  paths:
+    - "/var/skills/"  # Directory containing skill subdirectories
+```
+
+**Option B: Individual skill paths** (for fine-grained control)
+```yaml
+skills:
+  paths:
+    - "/var/skills/code-review/"
+    - "/var/skills/troubleshooting/"
+```
+
+Each path points to either:
+- A directory containing a `SKILL.md` file (single skill)
+- A directory containing subdirectories, each with a `SKILL.md` file (multiple skills)
+
+Skill metadata (`name`, `description`) is read from the `SKILL.md` frontmatter at startup. This keeps config minimal and allows skill content to be managed independently.
+
+**Recommendation**: Approved. This structure keeps the config CR lightweight and decouples skill content from configuration.
+
+## Proposed JIRAs
+
+Each JIRA includes an agentic tool instruction pointing to the spec doc.
+
+### LCORE-???? Add skill configuration model
+
+**Description**: Add the `SkillsConfiguration` Pydantic model to the main configuration. This enables specifying skill directory paths in `lightspeed-stack.yaml`.
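The startup scanning and validation steps in this ticket come down to turning each `SKILL.md` into a validated (name, description) pair. The following is a minimal, stdlib-only sketch of that step, built on several assumptions. The frontmatter is treated as flat `key: value` lines, whereas a real implementation would use a YAML parser. The names `parse_frontmatter`, `validate_metadata`, and `NAME_PATTERN` are hypothetical. The regex only approximates the naming rule summarized in the frontmatter table, so the agentskills.io specification remains the authority.

```python
import re
from pathlib import Path

# Hypothetical approximation of the spec's naming rule: runs of lowercase
# letters/digits joined by single hyphens (no leading or trailing hyphen).
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between the leading `---` fences.

    Indented lines (e.g. entries of a nested `metadata:` map) are skipped;
    a production version would hand the block to a YAML parser instead.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # nested or non key/value line
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip().strip("\"'")
    raise ValueError("SKILL.md frontmatter is missing its closing '---'")


def validate_metadata(fields: dict[str, str], source: Path) -> tuple[str, str]:
    """Return (name, description), or raise a ValueError that names the file."""
    name = fields.get("name", "")
    description = fields.get("description", "")
    if len(name) > 64 or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"{source}: invalid or missing skill name {name!r}")
    if not 1 <= len(description) <= 1024:
        raise ValueError(f"{source}: description must be 1-1024 characters")
    return name, description
```

Including the file path in every error message keeps startup failures actionable, which is what the "clear error" acceptance criterion asks for.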
+
+**Scope**:
+
+- Create `SkillsConfiguration` class in `src/models/config.py` with `paths: list[str]` field
+- Add `skills: Optional[SkillsConfiguration]` field to `Configuration` class
+- Implement startup scanning: resolve paths, find `SKILL.md` files, parse frontmatter
+- Add validation: paths exist, contain valid `SKILL.md` files with required frontmatter
+- Store parsed skill metadata (name, description, path) for runtime use
+
+**Acceptance criteria**:
+
+- Skill paths can be configured in `lightspeed-stack.yaml` using the documented format
+- Startup scans configured paths and discovers all valid skills
+- Startup fails with a clear error if a path doesn't exist or lacks a valid `SKILL.md`
+- Duplicate skill names across paths are detected and rejected
+- Unit tests cover path scanning and validation scenarios
+
+**Agentic tool instruction**:
+
+```text
+Read the "Configuration" section in docs/design/agent-skills/agent-skills.md.
+Key files: src/models/config.py (around line 1817, Configuration class).
+Follow the MCP server config pattern (ModelContextProtocolServer class, line 468).
+```
+
+### LCORE-???? Implement list_skills tool
+
+**Description**: Register a `list_skills` tool that the LLM can call to discover available skills. Returns the skill catalog (name + description for each configured skill).
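Registering `list_skills` is only half the work: when the model emits a call to it, the call still has to be routed to a handler. The following is a minimal sketch of that routing. It assumes OpenAI-style function calls that carry a tool `name` and a JSON-encoded `arguments` string. `CATALOG`, `HANDLERS`, `list_skills`, and `dispatch` are hypothetical names, not existing lightspeed-stack code. The catalog is rendered in an XML shape similar to the spec doc's `handle_list_skills`, with values escaped via `xml.sax.saxutils.escape`.

```python
import json
from typing import Callable
from xml.sax.saxutils import escape

# Hypothetical catalog built at startup: skill name -> description.
CATALOG = {
    "code-review": "Review a pull request diff for common bugs.",
    "troubleshooting": "Diagnose failing pods & deployments.",
}


def list_skills(args: dict) -> str:
    """Render the catalog as XML; phase 1 ignores any arguments."""
    lines = ["<available_skills>"]
    for name, description in sorted(CATALOG.items()):
        lines.append(
            f"  <skill><name>{escape(name)}</name>"
            f"<description>{escape(description)}</description></skill>"
        )
    lines.append("</available_skills>")
    return "\n".join(lines)


# Handlers keyed by the function name the model calls.
HANDLERS: dict[str, Callable[[dict], str]] = {"list_skills": list_skills}


def dispatch(name: str, arguments: str) -> str:
    """Route one model function call to its handler.

    `arguments` is the call's JSON-encoded argument string; an empty string
    is treated as "no arguments". Unknown tool names produce an error string
    instead of an exception so the model can see the failure and recover.
    """
    handler = HANDLERS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    return handler(json.loads(arguments or "{}"))
```

Escaping matters because descriptions are free text written by product teams. An unescaped `&` or `<` would make the catalog ambiguous to anything that parses it.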
+
+**Scope**:
+
+- Create `src/utils/skills.py` module with skill catalog management
+- Add `list_skills` tool registration in `prepare_tools()` in `src/utils/responses.py`
+- Implement tool handler that returns formatted skill catalog (name + description)
+- Modify `get_system_prompt()` in `src/utils/prompts.py` to add behavioral instructions for skill discovery and activation
+
+**Acceptance criteria**:
+
+- LLM can call `list_skills()` to get the catalog of available skills
+- Tool returns name and description for each configured skill
+- Tool is not registered when no skills are configured (the handler returns an empty catalog if called anyway)
+- System prompt includes behavioral instructions (how to use `list_skills` and `activate_skill`)
+- Unit tests verify tool registration, catalog formatting, and system prompt instructions
+
+**Agentic tool instruction**:
+
+```text
+Read the "Tool-based discovery" section in docs/design/agent-skills/agent-skills.md.
+Key files: src/utils/responses.py (prepare_tools function, line 204),
+src/utils/prompts.py (get_system_prompt function),
+src/utils/skills.py (new module).
+```
+
+### LCORE-???? Implement activate_skill tool
+
+**Description**: Register an `activate_skill` tool that the LLM can call to load full skill instructions when it decides a skill is relevant. This complements `list_skills` by providing the detailed instructions for a specific skill.
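The handler has two filesystem jobs: extracting the `SKILL.md` body after the frontmatter, and listing files under `references/`. The following is a minimal sketch under stated assumptions. `read_skill_body` and `list_references` are hypothetical helpers. The frontmatter is assumed to be delimited by `---` lines. Reference paths are returned relative to the skill directory (e.g. `references/common-errors.md`), so they match the relative-path convention skill authors use.

```python
from pathlib import Path


def read_skill_body(skill_md: Path) -> str:
    """Return the SKILL.md body: everything after the closing '---' fence."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError(f"{skill_md}: missing frontmatter")
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[i + 1:]).strip()
    raise ValueError(f"{skill_md}: frontmatter is not closed with '---'")


def list_references(skill_dir: Path) -> list[str]:
    """List files under references/ as sorted POSIX paths relative to skill_dir."""
    ref_dir = skill_dir / "references"
    if not ref_dir.is_dir():
        return []  # references/ is optional
    return sorted(
        p.relative_to(skill_dir).as_posix()
        for p in ref_dir.rglob("*")
        if p.is_file()
    )
```

Listing names without reading the files keeps activation cheap. The model pays for a reference file's tokens only when it actually reads it.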
+
+**Scope**:
+
+- Add `activate_skill` tool registration in `prepare_tools()` in `src/utils/responses.py`
+- Implement tool handler that reads `SKILL.md` body content
+- Return structured response with skill content and base directory path
+- Optionally list `references/` files if present
+
+**Acceptance criteria**:
+
+- LLM can call `activate_skill(name="skill-name")` to load skill instructions
+- Tool returns full `SKILL.md` body content (after frontmatter)
+- Tool returns skill directory path so LLM can resolve relative references
+- Tool returns list of available reference files if `references/` directory exists
+- Tool returns an error if the skill name is invalid or not found in the catalog
+- Unit tests cover tool registration and invocation
+
+**Agentic tool instruction**:
+
+```text
+Read the "Skill activation tool" section in docs/design/agent-skills/agent-skills.md.
+Key files: src/utils/responses.py (prepare_tools function, line 204),
+src/utils/skills.py.
+```
+
+### LCORE-???? Add skill reference file access
+
+**Description**: Enable the LLM to read files from the skill's `references/` subdirectory when skill instructions reference them.
+
+**Scope**:
+
+- Ensure existing file-read tool can access skill reference directories
+- Add path validation to restrict access to configured skill directories only
+- Document the pattern for skill authors to reference files
+
+**Acceptance criteria**:
+
+- LLM can read files from a skill's `references/` subdirectory using existing file-read capability
+- Access is restricted to configured skill directories (no arbitrary filesystem access)
+- Skill instructions can use relative paths like `references/troubleshooting-guide.md`
+- Integration test verifies reference file access works end-to-end
+
+**Agentic tool instruction**:
+
+```text
+Read the "Reference file access" section in docs/design/agent-skills/agent-skills.md.
+Key files: src/utils/responses.py, existing file-read tool implementation.
+```
+
+### LCORE-???? Document Agent Skills feature
+
+**Description**: Create user-facing documentation for the Agent Skills feature, including a configuration guide, a skill authoring guide, and examples.
+
+**Scope**:
+
+- Add configuration documentation to existing config docs
+- Create skill authoring guide (SKILL.md format, directory structure)
+- Add example skills to `examples/skills/`
+- Update README with feature overview
+
+**Acceptance criteria**:
+
+- Users can configure skills by following the documentation
+- Skill authors can create compliant skills using the authoring guide
+- Example skills demonstrate common use cases (troubleshooting, code review, etc.)
+
+**Agentic tool instruction**:
+
+```text
+Read the full spec doc at docs/design/agent-skills/agent-skills.md.
+Reference the agentskills.io specification for SKILL.md format details.
+```
+
+### LCORE-???? Add integration and E2E tests for skills
+
+**Description**: Add integration tests and E2E tests to verify the skills feature works correctly end-to-end.
+
+**Scope**:
+
+- Integration tests: skill loading, catalog generation, tool invocation with mocked LLM
+- E2E tests: full flow with real LLM, verify skill activation and usage
+
+**Acceptance criteria**:
+
+- Integration tests cover skill configuration, catalog generation, and tool handling
+- E2E tests verify a configured skill can be discovered and used by the LLM
+- Tests use example skills from `examples/skills/`
+
+**Agentic tool instruction**:
+
+```text
+Read the "Testing" section in docs/design/agent-skills/agent-skills.md.
+Key test files: tests/integration/endpoints/, tests/e2e/features/.
+Follow existing test patterns in the codebase.
+```
+
+## Background sections
+
+### Agent Skills specification
+
+The [Agent Skills open standard](https://agentskills.io) defines a portable format for giving AI agents specialized capabilities. Key elements:
+
+**Directory structure**:
+```
+skill-name/
+├── SKILL.md          # Required: metadata + instructions
+├── references/       # Optional: additional documentation
+└── assets/           # Optional: templates, resources
+```
+
+**SKILL.md format**:
+```markdown
+---
+name: skill-name
+description: What this skill does and when to use it.
+---
+
+# Instructions
+
+Step-by-step instructions for the agent...
+```
+
+**Frontmatter fields**:
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `name` | Yes | 1-64 chars; lowercase letters, numbers, and hyphens only |
+| `description` | Yes | Max 1024 chars; describes what the skill does and when to use it |
+| `license` | No | License name or file reference |
+| `compatibility` | No | Environment requirements |
+| `metadata` | No | Arbitrary key-value pairs |
+
+### Progressive disclosure
+
+The agentskills.io spec recommends a three-tier loading strategy:
+
+| Tier | What's loaded | When | Token cost |
+|------|---------------|------|------------|
+| 1. Catalog | Name + description | LLM calls `list_skills` | ~50-100 tokens/skill |
+| 2. Instructions | Full SKILL.md body | LLM calls `activate_skill` | <5000 tokens (recommended) |
+| 3. Resources | References, assets | LLM reads reference files | Varies |
+
+This keeps base context small while giving the LLM access to specialized knowledge on demand. The system prompt contains only behavioral instructions (~200 tokens) regardless of skill count.
+
+### Adoption
+
+Agent Skills are supported by 30+ agent products, including:
+- Claude Code, Claude.ai
+- GitHub Copilot, VS Code
+- Cursor, OpenAI Codex
+- Gemini CLI, JetBrains Junie
+- Goose, Letta, Spring AI
+
+OpenAI's SDK already includes `LocalSkill` and `Skill` types in its responses module.
+
+### Security considerations
+
+**Scripts deferred**: The `scripts/` subdirectory is not supported in this implementation. As noted in LCORE-1339, executing arbitrary scripts poses security risks. Script support will be added in a future phase once sandbox support (running scripts in an isolated environment) is available.
+
+**Path restrictions**: The `activate_skill` tool and reference file access are restricted to configured skill directories. The LLM cannot access arbitrary filesystem paths through skills.
+
+**Trust model**: Skills are configured by LS app teams (e.g., RHEL Lightspeed) at deployment time, not by end users. Product teams mount skill directories into the container via ConfigMaps or volumes and specify the paths in `lightspeed-stack.yaml`. This mirrors the MCP server trust model: end users cannot add arbitrary skills, only use the skills their product team has deployed.
+
+## Appendix A: Existing approaches research
+
+### How other tools handle skills
+
+| Tool | Discovery | Activation | Script support |
+|------|-----------|------------|----------------|
+| Claude Code | Filesystem scan | File-read or /command | Yes |
+| GitHub Copilot | Filesystem scan | System prompt + tool | Yes |
+| Cursor | Filesystem scan | System prompt + tool | Yes |
+| OpenAI Codex | API-based | API-based | Yes |
+| Google ADK | `list_skills` tool | `load_skill` + `load_skill_resource` tools | Yes (`run_skill_script`) |
+
+**Note**: Google ADK's approach aligns with our recommendation. Their system prompt contains behavioral instructions (how to use the tools), not the skill catalog itself. The `list_skills` tool returns the catalog on demand.
+
+### Alternative designs considered
+
+**Full config-based definitions**: Rejected in favor of filesystem-based discovery with config paths. Inlining full skill definitions (name, description, instructions) in `lightspeed-stack.yaml` would bloat the config CR in k8s deployments and couple skill content updates to config changes. Instead, config specifies paths and skill metadata is read from `SKILL.md` files.
+
+**Inline content**: Rejected in favor of path-based. Inline content would clutter the YAML config for multi-skill deployments and doesn't support reference files.
+
+**Always-loaded instructions**: Rejected in favor of progressive disclosure. Loading all skill instructions upfront wastes context tokens and doesn't scale to many skills.
+
+**System prompt catalog injection**: Rejected in favor of tool-based discovery (`list_skills`). Embedding the skill catalog directly in the system prompt:
+- Increases token cost linearly with skill count (~50-100 tokens/skill)
+- Risks overwhelming the model context with large skill catalogs
+- Provides no clean evolution path to search-based discovery
+
+The tool-based approach keeps the system prompt constant (behavioral instructions only) and allows phase 2 extension to semantic search without architectural changes.
+
+## Appendix B: Reference sources
+
+- Agent Skills Specification: https://agentskills.io/specification
+- Agent Skills Implementation Guide: https://agentskills.io/client-implementation/adding-skills-support
+- Agent Skills GitHub: https://github.com/agentskills/agentskills
+- Example Skills: https://github.com/anthropics/skills
diff --git a/docs/design/agent-skills/agent-skills.md b/docs/design/agent-skills/agent-skills.md
new file mode 100644
index 000000000..56d5cf287
--- /dev/null
+++ b/docs/design/agent-skills/agent-skills.md
@@ -0,0 +1,701 @@
+# Feature design for Agent Skills Support
+
+|                    |                                           |
+|--------------------|-------------------------------------------|
+| **Date**           | 2026-04-09                                |
+| **Component**      | Core / Configuration / Utils              |
+| **Authors**        | @jboos                                    |
+| **Feature**        | [LCORE-1339](https://redhat.atlassian.net/browse/LCORE-1339) |
+| **Spike**          | [LCORE-1594](https://redhat.atlassian.net/browse/LCORE-1594) |
+| **Links**          | [agentskills.io](https://agentskills.io)  |
+
+## What
+
+Agent Skills support allows LS app teams (e.g., RHEL Lightspeed, Ansible Lightspeed) to extend Lightspeed Core with specialized instructions and domain knowledge packaged as portable skill directories. Skills follow the [Agent Skills open standard](https://agentskills.io). Product teams ship skills alongside the lightspeed-stack container by mounting skill directories via ConfigMaps or container volumes, then specifying the paths in `lightspeed-stack.yaml`. Skill metadata (name, description) is read from `SKILL.md` frontmatter at startup.
+
+**Note**: End users of LS app products do NOT have the ability to add skills, similar to how they cannot add MCP servers. Skill configuration is controlled by product teams at deployment time.
+
+The LLM discovers available skills via a `list_skills` tool and loads full instructions on demand via an `activate_skill` tool. The system prompt contains behavioral instructions for how to use these tools, not the skill catalog itself.
+
+## Why
+
+Today, Lightspeed Core has limited customization options:
+- System prompt can be overridden globally or per-request
+- MCP tools provide external capabilities
+
+However, there's no mechanism for:
+- Packaging reusable workflows or troubleshooting guides
+- Providing domain-specific expertise the LLM can use on demand
+- Sharing knowledge across deployments in a portable format
+
+Skills solve this by giving the LLM access to procedural knowledge and domain-specific context it can load when relevant to the current task.
+
+## Requirements
+
+- **R1:** Skill paths are specified in `lightspeed-stack.yaml`; name and description are read from `SKILL.md` frontmatter
+- **R2:** Each skill path must point to a directory containing a valid `SKILL.md` file
+- **R3:** The system prompt contains behavioral instructions for skill discovery and activation
+- **R4:** The LLM can discover skills via the `list_skills` tool and load full instructions via the `activate_skill` tool
+- **R5:** Skill content is returned with structured wrapping (`<skill_content>` tags) per the agentskills.io spec
+- **R6:** The LLM can read files from a skill's `references/` subdirectory (allowlisted paths)
+- **R7:** Script execution (`scripts/` subdirectory) is not supported
+- **R8:** Skill configuration is validated at startup with clear error messages
+- **R9:** Activated skills are tracked per conversation to prevent duplicate injection
+- **R10:** Skill content is protected from context compaction
+
+## Use Cases
+
+- **U1:** As an LS app team administrator, I want to configure troubleshooting skills so that the LLM can help users diagnose common issues
+- **U2:** As a skill author, I want to create a SKILL.md file with instructions so that I can package domain expertise portably
+- **U3:** As a user, I want the LLM to automatically use relevant skills so that I get better answers without manually specifying which skill to use
+- **U4:** As an LS app team, I want to deploy domain-specific skills so that the LLM understands product-specific processes and terminology
+
+## Architecture
+
+### Overview
+
+```text
+Startup:
+  lightspeed-stack.yaml
+           │
+           ▼
+  ┌─────────────────┐
+  │  Parse skills   │──► Validate paths, read SKILL.md frontmatter
+  │  configuration  │
+  └─────────────────┘
+           │
+           ▼
+  ┌─────────────────┐
+  │  Build skill    │──► name + description for each skill
+  │  catalog        │
+  └─────────────────┘
+
+Request flow:
+  ┌─────────────────┐
+  │  Query request  │
+  └────────┬────────┘
+           │
+           ▼
+  ┌─────────────────┐
+  │  Build system   │──► Append behavioral instructions (how to use skill tools)
+  │  prompt         │
+  └────────┬────────┘
+           │
+           ▼
+  ┌─────────────────┐
+  │ Register tools  │──► Add list_skills + activate_skill tools
+  └────────┬────────┘
+           │
+           ▼
+  ┌─────────────────┐
+  │  LLM processes  │──► May call list_skills to discover available skills
+  │  request        │
+  └────────┬────────┘
+           │
+           ▼ (if skill relevant)
+  ┌─────────────────┐
+  │ activate_skill  │──► Returns <skill_content> with body + resources
+  │ tool invocation │
+  └────────┬────────┘
+           │
+           ▼
+  ┌─────────────────┐
+  │ LLM uses skill  │──► May read reference files if needed
+  │ instructions    │
+  └─────────────────┘
+```
+
+### Configuration
+
+Skills are configured by specifying paths in `lightspeed-stack.yaml`. Two forms are supported:
+
+**Option A: Directory of skills** (recommended for most deployments)
+```yaml
+skills:
+  paths:
+    - "/var/skills/"  # Directory containing skill subdirectories
+```
+
+**Option B: Individual skill paths** (for fine-grained control)
+```yaml
+skills:
+  paths:
+    - "/var/skills/openshift-troubleshooting/"
+    - "/var/skills/code-review/"
+```
+
+Each path points to either:
+- A directory containing a `SKILL.md` file (single skill)
+- A directory containing subdirectories, each with a `SKILL.md` file (multiple skills)
+
+Skill metadata (`name`, `description`) is read from the `SKILL.md` frontmatter at startup.
+
+Each skill directory must contain a `SKILL.md` file:
+
+```
+/var/skills/openshift-troubleshooting/
+├── SKILL.md              # Required
+└── references/           # Optional
+    ├── common-errors.md
+    └── networking-guide.md
+```
+
+Configuration class:
+
+```python
+class SkillsConfiguration(ConfigurationBase):
+    """Agent skills configuration.
+
+    Specifies paths to skill directories. Skill metadata (name, description)
+    is read from SKILL.md frontmatter at startup.
+    """
+
+    paths: list[str] = Field(
+        default_factory=list,
+        title="Skill paths",
+        description="Paths to skill directories or directories containing skill subdirectories.",
+    )
+```
+
+Add to `Configuration` class:
+
+```python
+skills: Optional[SkillsConfiguration] = Field(
+    default=None,
+    title="Agent skills",
+    description="Agent skills configuration. Specifies paths to skill directories.",
+)
+```
+
+Runtime skill data (populated at startup after scanning paths):
+
+```python
+from dataclasses import dataclass, field
+from pathlib import Path
+
+
+@dataclass
+class LoadedSkill:
+    """Skill loaded from filesystem at startup."""
+
+    name: str
+    description: str
+    path: Path
+    content: str  # SKILL.md body after frontmatter
+    references: list[str] = field(default_factory=list)  # Files in references/ subdirectory
+```
+
+Startup scanning logic:
+
+```python
+from pathlib import Path
+
+
+def load_skills(config: SkillsConfiguration) -> list[LoadedSkill]:
+    """Scan configured paths and load all valid skills.
+
+    Each path can be:
+    - A directory containing SKILL.md (single skill)
+    - A directory containing subdirectories with SKILL.md (multiple skills)
+
+    Raises ValueError for missing paths, paths without any skill, and
+    duplicate skill names, so misconfiguration fails at startup.
+    """
+    skills: list[LoadedSkill] = []
+    seen_names: set[str] = set()
+
+    def register(skill_dir: Path) -> None:
+        # _load_skill_from_dir parses and validates the SKILL.md frontmatter
+        skill = _load_skill_from_dir(skill_dir)
+        if skill.name in seen_names:
+            raise ValueError(f"Duplicate skill name: {skill.name} ({skill_dir})")
+        seen_names.add(skill.name)
+        skills.append(skill)
+
+    for path_str in config.paths:
+        path = Path(path_str)
+        if not path.is_dir():
+            raise ValueError(f"Skill path does not exist or is not a directory: {path}")
+
+        if (path / "SKILL.md").is_file():
+            # Single skill directory
+            register(path)
+            continue
+
+        # Directory of skill subdirectories; sorted for a deterministic catalog order
+        skill_dirs = [
+            subdir
+            for subdir in sorted(path.iterdir())
+            if subdir.is_dir() and (subdir / "SKILL.md").is_file()
+        ]
+        if not skill_dirs:
+            raise ValueError(f"No SKILL.md found in {path} or its subdirectories")
+        for subdir in skill_dirs:
+            register(subdir)
+
+    return skills
+```
+
+### System prompt injection
+
+Behavioral instructions are appended to the system prompt following the [agentskills.io implementation guide](https://agentskills.io/client-implementation/adding-skills-support). The system prompt contains instructions for how to use the skill tools, not the skill catalog itself.
+
+#### Behavioral instructions
+
+The system prompt includes instructions telling the model how to discover and use skills:
+
+```
+# Agent Skills
+
+You have access to specialized skills that provide domain-specific instructions.
+
+To discover available skills, call the list_skills tool. This returns the skill
+catalog with name and description for each skill.
+
+When a task matches a skill's description, call the activate_skill tool with
+the skill's name to load its full instructions. You MUST load and follow the
+skill instructions before proceeding with the task.
+```
+
+#### Implementation
+
+```python
+SKILL_BEHAVIORAL_INSTRUCTIONS = """
+# Agent Skills
+
+You have access to specialized skills that provide domain-specific instructions.
+
+To discover available skills, call the list_skills tool. This returns the skill
+catalog with name and description for each skill.
+
+When a task matches a skill's description, call the activate_skill tool with
+the skill's name to load its full instructions. You MUST load and follow the
+skill instructions before proceeding with the task.
+"""
+
+
+def get_skill_instructions(skills: list[LoadedSkill]) -> str:
+    """Get behavioral instructions for skill tools.
+
+    Returns an empty string if no skills are configured.
+    """
+    if not skills:
+        return ""
+    return SKILL_BEHAVIORAL_INSTRUCTIONS
+```
+
+Modify `get_system_prompt()` in `src/utils/prompts.py`:
+
+```python
+def get_system_prompt(system_prompt: Optional[str], loaded_skills: list[LoadedSkill], ...) -> str:
+    # ... existing logic to resolve base system prompt ...
+
+    # Append skill behavioral instructions if skills are loaded
+    skill_instructions = get_skill_instructions(loaded_skills)
+    if skill_instructions:
+        resolved_prompt = resolved_prompt + "\n" + skill_instructions
+
+    return resolved_prompt
+```
+
+**Note**: If no skills are configured, omit the instructions entirely. The `list_skills` and `activate_skill` tools are also not registered when no skills are configured.
+
+### Skill tools
+
+Two tools are registered for skill discovery and activation. This follows the [tool-based activation pattern](https://agentskills.io/client-implementation/adding-skills-support#dedicated-tool-activation) from agentskills.io and aligns with Google ADK's approach.
+
+#### list_skills tool
+
+The `list_skills` tool returns the skill catalog (name + description for each skill):
+
+```python
+from xml.sax.saxutils import escape
+
+
+def get_list_skills_tool(skills: list[LoadedSkill]) -> Optional[InputTool]:
+    """Create the list_skills tool if skills are configured.
+
+    The tool returns the skill catalog so the LLM can discover available skills.
+    """
+    if not skills:
+        return None
+
+    return InputTool(
+        type="function",
+        function={
+            "name": "list_skills",
+            "description": "List available skills with their names and descriptions. Call this to discover what skills are available.",
+            "parameters": {
+                "type": "object",
+                "properties": {},
+            },
+        },
+    )
+
+
+def handle_list_skills(skills: list[LoadedSkill]) -> str:
+    """Handle list_skills tool invocation.
+
+    Returns the skill catalog in XML format. Values are XML-escaped because
+    descriptions are free text and may contain characters such as "&" or "<".
+    """
+    if not skills:
+        return ""
+
+    lines = ["<available_skills>"]
+    for skill in skills:
+        lines.extend([
+            "  <skill>",
+            f"    <name>{escape(skill.name)}</name>",
+            f"    <description>{escape(skill.description)}</description>",
+            "  </skill>",
+        ])
+    lines.append("</available_skills>")
+    return "\n".join(lines)
+```
+
+**Phase 2 evolution**: In a future phase, `list_skills` can accept an optional `query` parameter for keyword/semantic search when the skill catalog grows large (100+ skills).
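The phase 2 idea can be made concrete with the simplest possible `query` handling: naive keyword overlap between the query and each skill's name and description. The sketch below is a stand-in for whatever keyword or semantic search is eventually chosen. `search_skills` and its `limit` parameter are hypothetical, and an embedding-based ranker would replace the scoring loop. With no query, it falls back to phase 1 behavior and returns every skill.

```python
import re
from typing import Optional

_WORD = re.compile(r"[a-z0-9]+")


def search_skills(
    catalog: dict[str, str], query: Optional[str] = None, limit: int = 10
) -> list[str]:
    """Rank skill names by keyword overlap between `query` and name + description.

    Without a query, return every skill name (phase 1 behavior).
    Skills sharing no words with the query are left out.
    """
    if not query:
        return sorted(catalog)
    terms = set(_WORD.findall(query.lower()))
    scored: list[tuple[int, str]] = []
    for name, description in catalog.items():
        words = set(_WORD.findall(f"{name} {description}".lower()))
        overlap = len(terms & words)
        if overlap:
            scored.append((-overlap, name))  # negated so sorted() puts best first
    # Tuples sort element-wise, so ties are broken alphabetically by name.
    return [name for _, name in sorted(scored)[:limit]]
```

Because the phase 1 path is just the `query is None` case, `activate_skill` and the system prompt instructions stay unchanged, which matches the "tool implementation change only" claim.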

#### activate_skill tool

The `activate_skill` tool loads the full instructions for a specific skill:

```python
def get_activate_skill_tool(skills: list[LoadedSkill]) -> Optional[InputTool]:
    """Create the activate_skill tool if skills are configured.

    The name parameter is constrained to valid skill names (as an enum)
    to prevent the model from hallucinating nonexistent skill names.
    """
    if not skills:
        return None

    skill_names = [skill.name for skill in skills]

    return InputTool(
        type="function",
        function={
            "name": "activate_skill",
            "description": "Load full instructions for a skill. Call this when a task matches a skill's description.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "enum": skill_names,
                        "description": "The name of the skill to load",
                    }
                },
                "required": ["name"],
            },
        },
    )
```

#### Structured wrapping

The tool response wraps skill content in identifying tags, following the [agentskills.io structured wrapping pattern](https://agentskills.io/client-implementation/adding-skills-support#structured-wrapping). This wrapping has three benefits:
- The model can clearly distinguish skill instructions from other conversation content
- The harness can identify skill content during context compaction
- Bundled resources can be surfaced without being eagerly loaded

```python
def handle_activate_skill(name: str, skills: list[LoadedSkill]) -> str:
    """Handle activate_skill tool invocation.

    Returns skill content wrapped in <skill_content> tags.
    """
    skill = next((s for s in skills if s.name == name), None)
    if not skill:
        return f"Unknown skill: {name}"

    lines = [
        f'<skill_content name="{skill.name}">',
        skill.content,
        "",
        f"Skill directory: {skill.path}",
        "Relative paths in this skill are relative to the skill directory.",
    ]

    # List bundled resources without eagerly loading them
    if skill.references:
        lines.append("")
        lines.append("<skill_resources>")
        for ref in skill.references:
            lines.append(f"  <file>{ref}</file>")
        lines.append("</skill_resources>")

    lines.append("</skill_content>")
    return "\n".join(lines)
```

#### Example tool response

```xml
<skill_content name="openshift-troubleshooting">
# OpenShift Troubleshooting

## When to use this skill
Use this skill when:
- A user reports pods not starting or crashing
- Deployments are stuck in pending state
...

Skill directory: /var/skills/openshift-troubleshooting
Relative paths in this skill are relative to the skill directory.

<skill_resources>
  <file>references/common-errors.md</file>
  <file>references/networking-guide.md</file>
</skill_resources>
</skill_content>
```

### Reference file access

Skills can include a `references/` subdirectory with additional documentation. When the skill instructions refer to these files, the LLM can read them using existing file-read capabilities.

**Path restriction**: File reads are restricted to configured skill directories. The `activate_skill` response includes the skill directory on its `Skill directory:` line. The LLM can therefore construct valid paths such as `<skill directory>/references/guide.md`.

#### Permission allowlisting

Following the [agentskills.io guidance](https://agentskills.io/client-implementation/adding-skills-support#permission-allowlisting), skill directories should be allowlisted for file access so the model can read bundled resources without triggering permission prompts. Without allowlisting, every read of a bundled file raises a permission dialog and breaks the flow.

```python
from pathlib import Path


def is_path_in_skill_directory(path: str, skills: list[LoadedSkill]) -> bool:
    """Check if a path is within a configured skill directory."""
    resolved_path = Path(path).resolve()
    for skill in skills:
        skill_dir = skill.path.resolve()
        try:
            resolved_path.relative_to(skill_dir)
            return True
        except ValueError:
            continue
    return False
```

### Context management

Once skill instructions are in the conversation context, they must remain effective for the rest of the session.

#### Protect skill content from compaction

If lightspeed-stack implements context compaction (summarizing conversation history), skill content must be exempt from pruning. Skill instructions are durable behavioral guidance; losing them mid-conversation silently degrades response quality.

The `<skill_content>` tags added by structured wrapping make skill content identifiable during compaction:

```python
def is_skill_content(message: str) -> bool:
    """Check if a message contains skill content that should be protected."""
    # Match the opening tag prefix so the name="..." attribute does not matter
    return "<skill_content" in message
```

#### Deduplicate activations

Track which skills have been activated in the current conversation.
If the model tries to load a skill that is already in context, return a short note instead of re-injecting its content:

```python
class SkillTracker:
    """Track activated skills per conversation."""

    def __init__(self) -> None:
        # Maps conversation_id -> names of skills activated in that conversation
        self._activated: dict[str, set[str]] = {}

    def is_activated(self, conversation_id: str, skill_name: str) -> bool:
        return skill_name in self._activated.get(conversation_id, set())

    def mark_activated(self, conversation_id: str, skill_name: str) -> None:
        self._activated.setdefault(conversation_id, set()).add(skill_name)

    def clear(self, conversation_id: str) -> None:
        self._activated.pop(conversation_id, None)
```

When a skill is already activated:

```python
if skill_tracker.is_activated(conversation_id, name):
    return f"Skill '{name}' is already loaded in this conversation."
```

### Error handling

| Scenario | Error |
|----------|-------|
| Skill path doesn't exist | Startup fails: "Skill path does not exist: {path}" |
| `SKILL.md` not found | Startup fails: "SKILL.md not found at {path}/SKILL.md" |
| Name mismatch | Startup fails: "Skill name mismatch: directory is '{x}' but SKILL.md has '{y}'" |
| Invalid YAML frontmatter | Startup fails: "Invalid SKILL.md frontmatter: {error}" |
| Unknown skill passed to `activate_skill` | Tool returns: "Unknown skill: {name}" |

## Implementation Suggestions

### Key files and insertion points

| File | What to do |
|------|------------|
| `src/models/config.py` | Add `SkillsConfiguration` class and `skills` field to `Configuration` |
| `src/utils/skills.py` | New module: `LoadedSkill`, `load_skills()`, `parse_skill_md()`, `get_skill_instructions()`, `handle_list_skills()`, `handle_activate_skill()` |
| `src/utils/prompts.py` | Modify `get_system_prompt()` to append behavioral instructions |
| `src/utils/responses.py` | Modify `prepare_tools()` to include the `list_skills` and `activate_skill` tools |
| `src/constants.py` | Add skill-related constants |

### Insertion point detail

**Configuration loading** (`src/models/config.py`):
- Add `SkillsConfiguration` class with `paths: list[str]` field
- Add `skills: Optional[SkillsConfiguration]` to `Configuration` class (around line 1852)
- Paths are validated at startup by `load_skills()`

**System prompt injection** (`src/utils/prompts.py`):
- `get_system_prompt()` currently returns the resolved prompt at line 80
- Append the behavioral instructions to the resolved prompt just before the return statement
- Import `get_skill_instructions` from the new `utils/skills.py` module

**Tool registration** (`src/utils/responses.py`):
- `prepare_tools()` at line 204 builds the tool list
- Add `list_skills` and `activate_skill` tools after MCP tools (around line 260)
- Register handlers for both the `list_skills` and `activate_skill` function calls

### SKILL.md parsing

```python
import re
import yaml
from pathlib import Path


def parse_skill_md(content: str) -> tuple[dict, str]:
    """Parse SKILL.md into frontmatter dict and body string."""
    # Match YAML frontmatter between --- delimiters
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n(.*)$", content, re.DOTALL)
    if not match:
        raise ValueError("SKILL.md must have YAML frontmatter between --- delimiters")

    frontmatter_text, body = match.groups()

    try:
        frontmatter = yaml.safe_load(frontmatter_text)
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML frontmatter: {e}") from e

    if not isinstance(frontmatter, dict):
        raise ValueError("Frontmatter must be a YAML mapping")

    if "name" not in frontmatter:
        raise ValueError("SKILL.md frontmatter must include 'name' field")

    if "description" not in frontmatter:
        raise ValueError("SKILL.md frontmatter must include 'description' field")

    return frontmatter, body.strip()


def list_references(skill_path: Path) -> list[str]:
    """List files in the
skill's references/ subdirectory.""" + refs_dir = skill_path / "references" + if not refs_dir.is_dir(): + return [] + + return [ + str(f.relative_to(skill_path)) + for f in refs_dir.rglob("*") + if f.is_file() + ] +``` + +### Config pattern + +All config classes extend `ConfigurationBase` which sets `extra="forbid"`. Use `Field()` with defaults, title, and description. Add `@model_validator(mode="after")` for path validation. + +Example config file: `examples/lightspeed-stack-skills.yaml` + +### Test patterns + +- Framework: pytest + pytest-asyncio + pytest-mock +- Config validation tests: `tests/unit/models/config/test_skills_configuration.py` +- Skill loading/parsing tests: `tests/unit/utils/test_skills.py` +- Integration tests: `tests/integration/endpoints/test_query_with_skills.py` + +**Test fixtures**: +```python +@pytest.fixture +def sample_skill_dir(tmp_path): + """Create a sample skill directory for testing.""" + skill_dir = tmp_path / "test-skill" + skill_dir.mkdir() + + skill_md = skill_dir / "SKILL.md" + skill_md.write_text("""--- +name: test-skill +description: A test skill for unit tests. +--- + +# Test Skill Instructions + +These are test instructions. +""") + + refs_dir = skill_dir / "references" + refs_dir.mkdir() + (refs_dir / "guide.md").write_text("# Guide\n\nReference content.") + + return skill_dir +``` + +## Open Questions for Future Work + +- **Script support**: Should `scripts/` subdirectory execution be added in a future phase? Requires security review. +- **Built-in skills**: Should Lightspeed ship pre-defined skills for common use cases? +- **Skill versioning**: Should skills support version metadata for compatibility tracking? +- **Remote skills**: Should skills be loadable from URLs or registries? +- **Skill metrics**: Should skill activation be tracked in Prometheus metrics? 

## Changelog

| Date | Change | Reason |
|------|--------|--------|
| 2026-04-27 | Tool-based discovery: `list_skills` returns catalog, system prompt has behavioral instructions only | Scales better, clean evolution path to search-based discovery |
| 2026-04-27 | Config specifies paths only; name/description read from SKILL.md | Keep config lightweight, avoid bloating CR |
| 2026-04-09 | Initial version | Spike completion |

## Appendix A: Agent Skills Specification

The full specification is at https://agentskills.io/specification.

Key points:
- `SKILL.md` must have YAML frontmatter with `name` (required) and `description` (required)
- `name` must be 1-64 characters of lowercase letters, numbers, and hyphens, and must match the parent directory name
- `description` must be 1-1024 characters and should describe both what the skill does and when to use it
- Body content after the frontmatter contains the instructions (no format restrictions)
- Keeping `SKILL.md` under 500 lines is recommended; move detailed reference material to separate files

## Appendix B: Example Skill

```markdown
---
name: openshift-troubleshooting
description: Diagnose and fix common OpenShift deployment issues including pod failures, networking problems, and resource constraints. Use when users report deployment failures or application issues on OpenShift.
---

# OpenShift Troubleshooting

## When to use this skill

Use this skill when:
- A user reports pods not starting or crashing
- Deployments are stuck in pending state
- Services are unreachable
- Resource quota issues are suspected

## Diagnostic steps

### 1. Check pod status

First, identify the problematic pods:

oc get pods -n | grep -v Running

For each failing pod, get detailed status:

oc describe pod -n

Look for:
- **Pending**: Usually resource constraints or scheduling issues
- **CrashLoopBackOff**: Application crash, check logs
- **ImagePullBackOff**: Image registry access issues

### 2. Check events

oc get events -n --sort-by='.lastTimestamp'

### 3. Check logs

oc logs -n
oc logs -n --previous # For crashed pods

## Common issues and solutions

See [references/common-errors.md](references/common-errors.md) for detailed solutions.
```