
LCORE-1594: Agent Skills feature breakdown #1484

Open
jrobertboos wants to merge 5 commits into lightspeed-core:main from jrobertboos:lcore-1594-spike-agent-skills

Conversation

@jrobertboos
Contributor

@jrobertboos jrobertboos commented Apr 10, 2026

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Documentation
    • Added design and spike documents for Agent Skills: describes a configuration-driven skills catalog, required skill metadata and instruction files, startup validation and error handling, on-demand loading of skill instructions, controlled access to skill resources, protections for skill content in prompts, deduplication of activations, security/trust model, and example skills, tests, and integration guidance.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 10, 2026

Walkthrough

Adds two documentation files introducing a configuration-driven "Agent Skills" model for Lightspeed Core: filesystem-discovered skills via SKILL.md, startup validation, an <available_skills> catalog in the system prompt, and an activate_skill tool for on-demand instruction/resource loading with directory-limited access and security constraints.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Agent Skills Design**<br>`docs/design/agent-skills/agent-skills.md` | New full specification: config-driven skill directories listed in `lightspeed-stack.yaml`, required SKILL.md frontmatter, startup scanning/validation rules, `<available_skills>` catalog format, list_skills/activate_skill tool contracts, resource access limits, deduplication/tracking, and test/implementation notes. |
| **Agent Skills Spike**<br>`docs/design/agent-skills/agent-skills-spike.md` | New implementation spike: recommends filesystem discovery, describes tool-driven interaction model, enumerates JIRA implementation tasks (config validation, tool handlers, secure file access), documents SKILL.md structure, progressive disclosure tiers, security/trust model, and examples. |

Sequence Diagram(s)

sequenceDiagram
  participant Startup
  participant AgentRuntime
  participant LLM
  participant FS as Filesystem

  Startup->>AgentRuntime: Read lightspeed-stack.yaml (skill entries)
  AgentRuntime->>FS: Validate skill paths and parse `SKILL.md` frontmatter
  AgentRuntime-->>LLM: Inject system prompt with <available_skills> catalog
  LLM->>AgentRuntime: Call `list_skills` or `activate_skill(name)`
  AgentRuntime->>FS: On activate, read SKILL.md body and allowed references
  FS-->>AgentRuntime: Return skill content and resource metadata
  AgentRuntime-->>LLM: Return structured <skill_content> and <skill_resources>
  LLM->>AgentRuntime: Use activated skill instructions/resources in conversation

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: introducing design documentation for the Agent Skills feature. It references the JIRA ticket (LCORE-1594) and accurately describes the PR's primary contribution as a feature breakdown/design specification. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@jrobertboos jrobertboos marked this pull request as ready for review April 15, 2026 15:21
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/agent-skills/agent-skills-spike.md`:
- Around line 297-304: The SkillConfiguration model currently only enforces
length for the name field; add a Pydantic field validator on SkillConfiguration
for the name attribute (using field_validator('name') as a `@classmethod`, e.g.,
validate_name_format) that checks the regex ^[a-z0-9-]+$ and raises a ValueError
if it fails, preserving the existing min_length/max_length constraints so names
remain 1-64 chars but now also must be lowercase letters, numbers, and hyphens
only.
- Line 9: Update the "PoC validation" section header and sentence: replace the
confusing line starting with "**PoC validation**: Not applicable for this
spike." followed by the product list with a clear validation statement such as
"PoC validation: Validated by 30+ production implementations including Claude
Code, GitHub Copilot, Cursor, and OpenAI Codex" so the intent (that the standard
is proven) is explicit; edit the sentence under the "PoC validation" header to
use this rewording.
- Around line 133-135: The comment flags fragile line-number references; update
the docs to remove hard-coded line numbers and instead reference durable symbols
like the Configuration class definition in src/models/config.py and the
ModelContextProtocolServer class (MCP server config pattern), or add a note that
any line numbers should be verified/updated if code moves; ensure all
occurrences in agent-skills-spike.md use class names or file+symbol references
rather than specific line numbers.
- Around line 145-148: Clarify in the design/acceptance notes whether
build_skill_catalog() output is appended to per-request system prompts returned
by get_system_prompt() or only to resolved default prompts: update the
implementation plan and acceptance criteria to state the exact behavior (e.g.,
"skills are injected into the final resolved system prompt even when a
per-request system_prompt is provided" or "providing a per-request system_prompt
disables skill injection"), and mention interaction with the
disable_query_system_prompt flag (which when True prevents per-request prompts
and skill injection). Reference get_system_prompt(), build_skill_catalog(), and
disable_query_system_prompt in the note so reviewers and implementers know
exactly where to enforce or check this behavior.

In `@docs/design/agent-skills/agent-skills.md`:
- Around line 164-165: Add a brief comment above the PrivateAttr declarations
for _content and _references explaining these fields are runtime-only caches
(not serialized), are populated once during startup config validation (from
parsing SKILL.md and scanning the references directory), and should not be
relied on for persisted state; reference the _content, _references fields and
the use of PrivateAttr so future maintainers know their purpose and lifecycle.
- Around line 264-276: The pseudo-code in the PR appends the skill catalog to a
non-existent `resolved_prompt` variable but the real get_system_prompt() has
multiple early returns; update get_system_prompt() to first compute and store
the resolved prompt in a single variable (e.g., `base_prompt`) following the
existing precedence (explicit system_prompt, customization.custom_profile
default, customization.system_prompt, constants.DEFAULT_SYSTEM_PROMPT), then
call build_skill_catalog(configuration.skills) and if it returns content append
it to `base_prompt` and return that combined string, otherwise return
`base_prompt`; reference the get_system_prompt function, the build_skill_catalog
call, configuration.skills, and constants.DEFAULT_SYSTEM_PROMPT when making the
change.
- Around line 412-416: The current is_skill_content function uses simple
substring checks which can produce false positives; update is_skill_content to
use a regex that matches the full tag structure instead: compile and use the
pattern r'<skill_content\s+name="[^"]+">.*</skill_content>' with the DOTALL flag
(and import re if not present) and return whether the regex finds a match; keep
the function name is_skill_content and ensure the regex-based check replaces the
existing substring checks.
- Around line 423-439: SkillTracker currently keeps an unbounded _activated dict
(class SkillTracker, methods is_activated, mark_activated, clear) which can leak
memory; update it so entries are removed when conversations end and stale
entries expire: hook clear(conversation_id) into the conversation
deletion/cleanup pipeline, and replace or wrap _activated with a size-limited
LRU or TTL cache (e.g., implement an LRU eviction policy with a max_entries
parameter and/or per-entry expiry timestamps to purge abandoned conversations)
so that mark_activated and is_activated operate against the capped/expiring
storage and expired entries are evicted on access or by a periodic cleanup.
- Around line 520-531: The list_references function currently returns files
under refs_dir without excluding symbolic links, allowing symlinks to point
outside the skill directory; update list_references (the function and its use of
refs_dir and the rglob/f.is_file() check) to skip symlinks by filtering out
entries where f.is_symlink() is True (or use
f.resolve().is_relative_to(skill_path) as an extra safety check) so only regular
files inside the skill_path/references tree are returned; ensure the returned
paths remain relative_to(skill_path).
- Line 485: The line currently references the wrong tool name `read_skill`;
update it to the correct tool name `activate_skill` so the document consistently
uses `activate_skill` (verify and replace any stray `read_skill` mentions with
`activate_skill` in the same section), ensuring references to the tool handler
and function type match the rest of the doc and examples.
- Around line 167-186: In validate_skill_path (the `@model_validator` method
validate_skill_path) add an explicit check that Path(self.path) exists and is a
directory before checking for SKILL.md: compute skill_dir = Path(self.path), if
not skill_dir.exists() or not skill_dir.is_dir() raise a ValueError like "Skill
path not found or not a directory: {skill_dir}" so callers get a clear error;
then continue to build skill_md_path = skill_dir / "SKILL.md" and keep the
existing parse_skill_md, name comparison, setting of self._content and
self._references (list_references(Path(self.path))).
- Around line 327-354: The tool response in handle_activate_skill interpolates
unescaped values into XML; escape XML special characters for skill.name
(attribute), skill._content (body), skill.path, and each item in
skill._references before building lines. Use a proper XML/HTML escape utility
(e.g., html.escape or xml.sax.saxutils.escape) to replace &, <, >, ", ' and
apply it where skill.name, skill._content, skill.path and ref are inserted so
that the attribute, element text, and file entries are safe; keep the same tags
and structure but feed escaped strings into the list assembly.
- Around line 494-517: Update parse_skill_md to use a more flexible frontmatter
regex and stronger field validation: relax the regex to accept CRLF or LF line
endings and optional whitespace after the closing --- (e.g., use \r?\n and allow
whitespace or no trailing newline so frontmatter/body split still works), keep
re.DOTALL, and fall back to a match that allows end-of-file without an extra
newline; then validate that frontmatter (the YAML-safe_load result) is a dict
and that frontmatter["name"] and frontmatter["description"] exist, are instances
of str, and are non-empty after stripping (raise clear ValueError messages if
not). Ensure you update error messages from parse_skill_md to mention the
specific missing/invalid field names and include the variable names frontmatter
and body in the checks to make locating the code easier.
- Around line 252-258: The loop that builds XML for each skill currently
interpolates skill.name and skill.description directly (see the for skill in
skills: block) which can produce malformed XML or injection; update this to
escape XML special characters before interpolation (e.g., call a sanitizer like
html.escape or xml.sax.saxutils.escape on skill.name and skill.description) so
the lines appended use the escaped values; ensure both name and description are
escaped and keep the surrounding tags intact in the block that extends lines.
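Several of the inline comments above ask for XML-escaping skill fields before interpolation. A minimal stdlib sketch of the idea (the helper name `render_skill_entry` is illustrative, not from the design doc):

```python
from xml.sax.saxutils import escape, quoteattr


def render_skill_entry(name: str, description: str) -> str:
    """Build one <skill> catalog entry with escaped name and description."""
    # quoteattr() escapes the value and wraps it in quotes, so no extra
    # quote characters are needed around the attribute.
    return f"<skill name={quoteattr(name)}>{escape(description)}</skill>"


print(render_skill_entry("code-review", "Review <best> practices & security"))
# → <skill name="code-review">Review &lt;best&gt; practices &amp; security</skill>
```

The same escaping would apply wherever `skill._content`, `skill.path`, and reference paths are interpolated in `handle_activate_skill`.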

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: adb4ced2-4778-4147-9a10-41eeb61a92eb

📥 Commits

Reviewing files that changed from the base of the PR and between 07ca1b1 and 6208b62.

📒 Files selected for processing (2)
  • docs/design/agent-skills/agent-skills-spike.md
  • docs/design/agent-skills/agent-skills.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md
📚 Learning: 2026-03-02T16:38:30.287Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md

Applied to files:

  • docs/design/agent-skills/agent-skills.md
🪛 LanguageTool
docs/design/agent-skills/agent-skills.md

[uncategorized] ~26-~26: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...main-specific expertise the LLM can use on demand - Sharing knowledge across deployments ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[style] ~47-~47: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...n issues - U2: As a skill author, I want to create a SKILL.md file with instruction...

(REP_WANT_TO_VB)


[style] ~49-~49: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... - U4: As an enterprise customer, I want to deploy custom skills so that the LLM un...

(REP_WANT_TO_VB)

🔇 Additional comments (4)
docs/design/agent-skills/agent-skills.md (2)

286-317: LGTM: Well-designed tool registration.

The get_skill_tool implementation follows best practices:

  • Only registers when skills are configured (no empty tool)
  • Uses enum constraint to prevent hallucinated skill names
  • Clear description for the model

The use of an enum for the name parameter is particularly good—it prevents the model from requesting non-existent skills.
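A sketch of what that enum-constrained registration could look like; the exact tool-registration API is not shown in this excerpt, so the dict shape here is an assumption:

```python
def get_skill_tool(skill_names: list[str]) -> dict:
    """Build a tool spec whose `name` parameter only accepts configured skills."""
    return {
        "name": "activate_skill",
        "description": "Load the full instructions for one of the available skills.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    # enum prevents the model from requesting non-existent skills
                    "enum": skill_names,
                },
            },
            "required": ["name"],
        },
    }
```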


389-400: LGTM: Secure path validation.

The is_path_in_skill_directory implementation correctly prevents directory traversal attacks by:

  • Using Path.resolve() to normalize and resolve symlinks
  • Using relative_to() to verify the path is within the skill directory
  • Properly handling the ValueError exception

This approach handles edge cases like parent directory traversal (../../../etc/passwd) and symlinks correctly.

docs/design/agent-skills/agent-skills-spike.md (2)

329-336: 🧹 Nitpick | 🔵 Trivial

Consider documenting recommended file permissions.

The security considerations note that configured skill paths are trusted, but don't address the risk of skill directories being writable by the application. If an attacker gains code execution or the application user account is compromised, they could modify SKILL.md files to inject malicious instructions.

Consider adding a recommendation that skill directories should be:

  • Owned by root or a dedicated user (not the application user)
  • Read-only for the application user
  • Not writable by the application process
⛔ Skipped due to learnings
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

196-218: Design appropriately addresses path security requirements.

The design document provides a sound validation approach: the is_path_in_skill_directory() function at lines 389–399 uses Path.resolve() and relative_to() to prevent directory traversal attacks. The activate_skill tool returns the skill's base_path, constraining the LLM to construct paths like {base_path}/references/guide.md. Permission allowlisting at the file-read tool level (per agentskills.io guidance) ensures bundled resources are accessible without prompts.

The specific MCP or file-read tool integration can be confirmed during implementation, but the design-level validation and path restriction strategy is correct.


**The recommendation**: Implement the [Agent Skills open standard](https://agentskills.io) with a config-based discovery model. Skills are defined in `lightspeed-stack.yaml` pointing to directories containing `SKILL.md` files. The LLM sees a skill catalog (name + description) in the system prompt and can request full instructions on demand via a dedicated tool.

**PoC validation**: Not applicable for this spike. The feature is well-defined by the agentskills.io specification and has been implemented by 30+ agent products including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex.
Contributor


🧹 Nitpick | 🔵 Trivial

Clarify PoC validation statement.

The statement "Not applicable for this spike" followed by listing 30+ products that have implemented the feature is potentially confusing. Consider rewording to something like "PoC validation: Validated by 30+ production implementations including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex" to clarify that the standard itself is proven rather than requiring a separate PoC.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills-spike.md` at line 9, Update the "PoC
validation" section header and sentence: replace the confusing line starting
with "**PoC validation**: Not applicable for this spike." followed by the
product list with a clear validation statement such as "PoC validation:
Validated by 30+ production implementations including Claude Code, GitHub
Copilot, Cursor, and OpenAI Codex" so the intent (that the standard is proven)
is explicit; edit the sentence under the "PoC validation" header to use this
rewording.

Comment thread docs/design/agent-skills/agent-skills-spike.md
Comment thread docs/design/agent-skills/agent-skills-spike.md Outdated
Comment on lines +297 to +304
**Frontmatter fields**:
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | 1-64 chars, lowercase, hyphens only |
| `description` | Yes | Max 1024 chars, describes what and when |
| `license` | No | License name or file reference |
| `compatibility` | No | Environment requirements |
| `metadata` | No | Arbitrary key-value pairs |
Contributor


🛠️ Refactor suggestion | 🟠 Major

Validation discrepancy: name format not enforced.

The Agent Skills specification (line 300) states that name must be "1-64 chars, lowercase, hyphens only", but the proposed SkillConfiguration model in the design doc (agent-skills.md lines 141-147) only validates length (min_length=1, max_length=64) without enforcing the lowercase/hyphens-only constraint.

This could lead to configuration that violates the agentskills.io standard. Consider adding a Pydantic validator to enforce the format:

```python
@field_validator('name')
@classmethod
def validate_name_format(cls, v: str) -> str:
    if not re.match(r'^[a-z0-9-]+$', v):
        raise ValueError('Skill name must contain only lowercase letters, numbers, and hyphens')
    return v
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills-spike.md` around lines 297 - 304, The
SkillConfiguration model currently only enforces length for the name field; add
a Pydantic field validator on SkillConfiguration for the name attribute (using
field_validator('name') as a `@classmethod`, e.g., validate_name_format) that
checks the regex ^[a-z0-9-]+$ and raises a ValueError if it fails, preserving
the existing min_length/max_length constraints so names remain 1-64 chars but
now also must be lowercase letters, numbers, and hyphens only.
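The core of the suggested validator can be exercised as a standalone function (stdlib only; `validate_skill_name` is a hypothetical name — in the design it would live on `SkillConfiguration` as a Pydantic `field_validator`):

```python
import re

_NAME_RE = re.compile(r"^[a-z0-9-]+$")


def validate_skill_name(name: str) -> str:
    """Enforce 1-64 chars, lowercase letters, digits, and hyphens only."""
    if not 1 <= len(name) <= 64:
        raise ValueError("Skill name must be 1-64 characters")
    if not _NAME_RE.match(name):
        raise ValueError(
            "Skill name must contain only lowercase letters, numbers, and hyphens"
        )
    return name
```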

Comment thread docs/design/agent-skills/agent-skills.md Outdated
Comment on lines +412 to +416
```python
def is_skill_content(message: str) -> bool:
    """Check if a message contains skill content that should be protected."""
    return "<skill_content" in message and "</skill_content>" in message
```
Contributor


🧹 Nitpick | 🔵 Trivial

String-based tag check is conservative but acceptable.

The is_skill_content function uses simple string matching to identify skill content. This is conservative (may have false positives if users mention these tags) but appropriate for a protection mechanism—better to over-preserve content than accidentally remove skill instructions.

If false positives become an issue, consider using regex to match the full tag structure: r'<skill_content\s+name="[^"]+">.*</skill_content>' with DOTALL flag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 412 - 416, The current
is_skill_content function uses simple substring checks which can produce false
positives; update is_skill_content to use a regex that matches the full tag
structure instead: compile and use the pattern
r'<skill_content\s+name="[^"]+">.*</skill_content>' with the DOTALL flag (and
import re if not present) and return whether the regex finds a match; keep the
function name is_skill_content and ensure the regex-based check replaces the
existing substring checks.
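The regex variant the comment describes can be sketched as a drop-in replacement:

```python
import re

# DOTALL lets .* span newlines inside multi-line skill content.
_SKILL_CONTENT_RE = re.compile(
    r'<skill_content\s+name="[^"]+">.*</skill_content>', re.DOTALL
)


def is_skill_content(message: str) -> bool:
    """Check if a message contains a complete <skill_content> block."""
    return _SKILL_CONTENT_RE.search(message) is not None
```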

Comment on lines +423 to +439
```python
class SkillTracker:
    """Track activated skills per conversation."""

    def __init__(self):
        self._activated: dict[str, set[str]] = {}  # conversation_id -> skill names

    def is_activated(self, conversation_id: str, skill_name: str) -> bool:
        return skill_name in self._activated.get(conversation_id, set())

    def mark_activated(self, conversation_id: str, skill_name: str) -> None:
        if conversation_id not in self._activated:
            self._activated[conversation_id] = set()
        self._activated[conversation_id].add(skill_name)

    def clear(self, conversation_id: str) -> None:
        self._activated.pop(conversation_id, None)
```
Contributor


⚠️ Potential issue | 🟠 Major

SkillTracker needs conversation lifecycle integration.

The SkillTracker class maintains an unbounded dictionary mapping conversation IDs to activated skills. This will grow indefinitely unless clear() is called when conversations end.

Consider:

  1. Integrating with conversation cleanup/deletion events
  2. Implementing TTL-based expiry for abandoned conversations
  3. Adding a maximum size with LRU eviction

Without this, long-running instances could accumulate thousands of conversation tracking entries, causing a memory leak.

Would you like me to propose an implementation with LRU caching or TTL-based expiry?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 423 - 439,
SkillTracker currently keeps an unbounded _activated dict (class SkillTracker,
methods is_activated, mark_activated, clear) which can leak memory; update it so
entries are removed when conversations end and stale entries expire: hook
clear(conversation_id) into the conversation deletion/cleanup pipeline, and
replace or wrap _activated with a size-limited LRU or TTL cache (e.g., implement
an LRU eviction policy with a max_entries parameter and/or per-entry expiry
timestamps to purge abandoned conversations) so that mark_activated and
is_activated operate against the capped/expiring storage and expired entries are
evicted on access or by a periodic cleanup.
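One stdlib-only sketch of the capped/expiring storage the comment asks for; the class name and the `max_entries`/`ttl_seconds` parameters are illustrative, not part of the original design:

```python
import time
from collections import OrderedDict


class BoundedSkillTracker:
    """SkillTracker variant with LRU eviction and TTL expiry (illustrative)."""

    def __init__(self, max_entries: int = 10_000, ttl_seconds: float = 3600.0):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        # conversation_id -> (last_touched, activated skill names), LRU-ordered
        self._activated: OrderedDict[str, tuple[float, set[str]]] = OrderedDict()

    def _purge_expired(self) -> None:
        now = time.monotonic()
        for cid in [c for c, (ts, _) in self._activated.items() if now - ts > self._ttl]:
            del self._activated[cid]

    def is_activated(self, conversation_id: str, skill_name: str) -> bool:
        self._purge_expired()
        entry = self._activated.get(conversation_id)
        return entry is not None and skill_name in entry[1]

    def mark_activated(self, conversation_id: str, skill_name: str) -> None:
        self._purge_expired()
        _, skills = self._activated.pop(conversation_id, (0.0, set()))
        skills.add(skill_name)
        self._activated[conversation_id] = (time.monotonic(), skills)
        while len(self._activated) > self._max_entries:
            self._activated.popitem(last=False)  # drop least recently used

    def clear(self, conversation_id: str) -> None:
        """Hook this into the conversation deletion/cleanup pipeline."""
        self._activated.pop(conversation_id, None)
```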

Comment thread docs/design/agent-skills/agent-skills.md Outdated
Comment thread docs/design/agent-skills/agent-skills.md
Comment on lines +520 to +531
```python
def list_references(skill_path: Path) -> list[str]:
    """List files in the skill's references/ subdirectory."""
    refs_dir = skill_path / "references"
    if not refs_dir.is_dir():
        return []

    return [
        str(f.relative_to(skill_path))
        for f in refs_dir.rglob("*")
        if f.is_file()
    ]
```
Contributor


🧹 Nitpick | 🔵 Trivial

Consider symlink handling in reference listing.

The list_references function lists all files in the references/ directory but doesn't exclude symlinks. A symbolic link could point outside the skill directory, potentially exposing unintended files.

🔒 Filter out symlinks for security
```diff
 def list_references(skill_path: Path) -> list[str]:
     """List files in the skill's references/ subdirectory."""
     refs_dir = skill_path / "references"
     if not refs_dir.is_dir():
         return []

     return [
         str(f.relative_to(skill_path))
         for f in refs_dir.rglob("*")
-        if f.is_file()
+        if f.is_file() and not f.is_symlink()
     ]
```

This prevents accidentally listing files outside the skill directory via symlinks. If symlink support is desired for legitimate use cases, add explicit documentation about the security implications.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 520 - 531, The
list_references function currently returns files under refs_dir without
excluding symbolic links, allowing symlinks to point outside the skill
directory; update list_references (the function and its use of refs_dir and the
rglob/f.is_file() check) to skip symlinks by filtering out entries where
f.is_symlink() is True (or use f.resolve().is_relative_to(skill_path) as an
extra safety check) so only regular files inside the skill_path/references tree
are returned; ensure the returned paths remain relative_to(skill_path).
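Combining both suggestions from the comment (skip symlinks and verify the resolved target stays inside the skill directory), a hardened sketch:

```python
from pathlib import Path


def list_references(skill_path: Path) -> list[str]:
    """List regular files under references/, excluding anything reachable via symlinks."""
    refs_dir = skill_path / "references"
    if not refs_dir.is_dir():
        return []

    skill_root = skill_path.resolve()
    return [
        str(f.relative_to(skill_path))
        for f in refs_dir.rglob("*")
        if f.is_file()
        and not f.is_symlink()
        # belt and braces: the resolved target must remain inside the skill dir
        and f.resolve().is_relative_to(skill_root)
    ]
```

Note that `Path.is_relative_to()` requires Python 3.9+.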


**The problem**: Lightspeed Core has no mechanism for extending agent capabilities with specialized instructions or domain knowledge. Users cannot package reusable workflows, troubleshooting guides, or domain expertise into portable, discoverable units that the LLM can use on demand.

**The recommendation**: Implement the [Agent Skills open standard](https://agentskills.io) with a config-based discovery model. Skills are defined in `lightspeed-stack.yaml` pointing to directories containing `SKILL.md` files. The LLM sees a skill catalog (name + description) in the system prompt and can request full instructions on demand via a dedicated tool.
Contributor


I've noticed this "pointing to directories" pattern throughout this project, but want to explore a new approach via this feature if possible -

If we look to the Kubernetes world to guide us in this for example, helm charts can be found in https://artifacthub.io, and some helm charts are distributed using container registries, some distributed using GitHub, and some using standalone websites. Here's a popular helm chart, https://artifacthub.io/packages/helm/percona/ps-db, distributed using GitHub - https://percona.github.io/percona-helm-charts/

Notice that the way for me to use the ps-db helm chart artifacts (which are files in a folder) is not "clone the Helm GitHub repository and use this directory" - that wouldn't be scalable once it got to 100+ helm charts.

Which means it won't be scalable for us when we get 10+ agent skill contributions.

Here's examples of the three scenarios if we were to use them for distributing agent skills:

  1. using container registries - https://docker.io/anik120/markdown-to-gdoc-converter-agent-skill
  2. using GitHub - https://anik120.github.io/markdown-to-gdoc-converter-agent-skill/
  3. Using standalone website - https://my-personal-website/markdown-to-gdoc-converter-agent-skill

I think for our use case, the best first step is to start with a GitHub distribution mechanism. That would mean we'd need to build in a mechanism that uses the GitHub APIs to pull down these artifacts (and make sure they contain artifacts in the filesystem that conform to the standards put forth in agentskills.io).

This will help Red Hat productize agent skills too - we can have Red Hat owned repositories that contain these skills, submitted by ecosystem partners (and others) but vetted by Red Hat.

...the same pattern can then be developed for "providers" and other artifacts, but we can start with agent skills as the implementation guidance.

Contributor


i.e. to use the example you have below, we'd modify it to look like:

```yaml
skills:
  - name: "code-review"
    description: "Review code for best practices and security issues."
    path: https://redhat.github.io/code-review-agent/
  - name: "troubleshooting"
    description: "Diagnose and fix OpenShift deployment issues."
    path: https://redhat.github.io/openshift-troubleshooter-agent/
```

Contributor

pulling skills down from external URLs introduces significant security/trust concerns. Not only "do I trust this URL today" but "what happens if the content at that URL is compromised or updated in the future?"

For that reason alone I think starting w/ reading from the filesystem is correct.

But in addition, there are many ways within k8s to inject content into a container filesystem. So there's no reason that in the future you can't add adapters that pull content from GitHub, copy content over from a sidecar container, use image volumes to pull content from a "skills library" image, or even define skills in configmaps that get mounted in.

So my recommendation would be to start by assuming LCORE gets its skills from the filesystem. Then later you can worry about the best ways to get those skills onto the filesystem from various marketplace/distribution sources.

Contributor Author

I like this idea, however I believe this is something that could be added in a later feature, as there would not be much architectural change in the system whether it be a local "directory"/path or a remote "directory"/path. I do think that a local directory is required, while the remote aspect is a "would be nice" though.

Contributor Author

I can add somewhere, though, that the path should be handled by some sort of interface so we can have local and remote handlers for the skill directories.

Contributor

Okay, I think that makes sense. With

> So there's no reason that in the future, you can't add adapters that pull content from GitHub, or copy content over from a sidecar container, or use image volumes to pull content from a "skills library" image, or even define skills in configmaps that get mounted in.

we don't even have to do

> I can add somewhere, though, that the path should be handled by some sort of interface so we can have local and remote handlers for the skill directories.

It can just be a plain simple path in the filesystem, pointed to in the configuration - the abstraction for how the content is pulled is then taken care of at the sidecar level.
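If the interface idea ever does come back, a minimal sketch of what it might look like (all names here, `SkillSource`, `LocalDirectorySource`, and `fetch`, are hypothetical and not existing LCORE code):

```python
from pathlib import Path
from typing import Protocol


class SkillSource(Protocol):
    """Hypothetical adapter: anything that can materialize skill files locally."""

    def fetch(self) -> Path:
        """Return a local directory containing SKILL.md."""
        ...


class LocalDirectorySource:
    """The phase-1 case: the skill is already on the filesystem."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def fetch(self) -> Path:
        # Fail fast, mirroring the startup-validation error in the doc.
        if not (self.path / "SKILL.md").is_file():
            raise FileNotFoundError(f"SKILL.md not found at {self.path}/SKILL.md")
        return self.path
```

A remote adapter (GitHub, sidecar, image volume) would implement the same `fetch()` and hand back a local path, so the rest of the loader never cares where the files came from.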

| B | Scripts allowed (full spec compliance) |
| C | Deferred (start with no scripts, add later) |

**Recommendation**: **A** (No scripts). As noted in LCORE-1339, there are security concerns with executing arbitrary scripts. The core value of skills is in the instructions — scripts can be added in a future phase after security review if needed.
Contributor

If we adopt the agent skill distribution I suggested above, we can have script execution capabilities. Red Hat verified agent skills (with full script execution capabilities) in a Red Hat owned repo would be the business value we bring: if you're an enterprise customer using agent skills from the Red Hat owned repos, you can be sure your system won't be compromised by script execution.

Contributor

until LCORE provides a sandbox mechanism to run said scripts in an ephemeral, isolated environment, I wholeheartedly agree w/ not supporting script execution.

Contributor Author

I like the idea of supporting more than just local paths for skills; however, it would be just as easy to execute scripts from a local directory. The only reason I am recommending not implementing that functionality is that, according to the feature ticket, it is out of scope.

Contributor

I was thinking of not opening skill support in OpenStack Lightspeed to users (at least initially), it would be a mechanism used by the Red Hat team to be able to add our skills to our product, so the code in scripts would be:

  - Created by Red Hat
  - Delivered and configured by our installer

So we would not need skill catalog sources and we wouldn't have code execution of unknown origin.

For code execution we could have a plugin system, like ADK does. They have a couple of default options: run locally, run in Vertex system, run with docker...
For OpenStack Lightspeed (and probably for OpenShift Lightspeed) we would use a local runner or a pod runner.

And I believe having script support gives something that we don't have right now any other way: non-global tools. In this case they would be per-skill tools.

So I would vote for option B or C.

Contributor

Sounds like option C is the best route here - start with no script execution, roll it out to product teams, and then iterate on script execution after MVP is in the hands of product teams and we have a chance to take in feedback.


**The problem**: Lightspeed Core has no mechanism for extending agent capabilities with specialized instructions or domain knowledge. Users cannot package reusable workflows, troubleshooting guides, or domain expertise into portable, discoverable units that the LLM can use on demand.

**The recommendation**: Implement the [Agent Skills open standard](https://agentskills.io) with a config-based discovery model. Skills are defined in `lightspeed-stack.yaml` pointing to directories containing `SKILL.md` files. The LLM sees a skill catalog (name + description) in the system prompt and can request full instructions on demand via a dedicated tool.

| B | Customer-defined only (end users import their own skill definitions) |
| C | Both built-in and customer-defined |

**Recommendation**: **B** (Customer-defined only). This allows end users to extend Lightspeed with their own domain expertise without requiring changes to the core product. Built-in skills can be added later if needed.
Contributor

This seems to propose that the only way stack products (e.g. RHEL Lightspeed) would be able to ship skills would be if they were a part of (shipped with) Lightspeed Core? Is that the intended statement?

It seems like the primary value of this feature would be to allow product teams to inject better agentic guidance into LCORE that serves their specific intended use cases. And LCORE should enable that in a way that does not require those skills be packaged with LCORE itself. (Such as by allowing skills to be injected into the LCORE filesystem via configmaps or container volumes provided by product teams)

Customers adding their own skills also makes sense, but it opens product teams up to the risk that a customer adds a skill that conflicts w/ their own logic, for better or worse. So I'd expect a lot of careful thought to go into what it means to allow customers to do that. It's effectively like allowing them to edit the system prompt.

Contributor Author

@jrobertboos jrobertboos Apr 21, 2026

So by "customer" I mean an LS app team (e.g. RHEL Lightspeed). They should be able to have/maintain their own skills, host those skills in the container alongside lightspeed-stack, and point to the included skills using the lightspeed-stack configuration.

The customers (non-Red Hat) of the LS app teams would not have the ability to add skills, just like they do not have the ability to add MCP servers.

Contributor

Ok, in that case I think we are in agreement, but I'd suggest a terminology cleanup because "customers" and "end users" are easy to misunderstand.

Contributor

I read "customers" as OpenShift/RHEL/Ansible users too; might be worth clarifying this as "product team" (or something that clearly identifies the OpenShift/RHEL/Ansible teams).

Contributor

Agree on the terminology change here. I also initially read "customers" as the final customers and not a Lightspeed flavor (RHEL LS, OpenStack LS, etc...)

| C | API-based (skills registered/managed via REST API) |
| D | Hybrid (built-in via config, customer-defined via filesystem or API) |

**Recommendation**: **B** (Config-based). Skills are defined in `lightspeed-stack.yaml` similar to `mcp_servers`. This provides explicit control over which skills are available, integrates with existing configuration patterns, and avoids filesystem scanning complexity.
Contributor

So the skill definitions would be inlined in the lightspeed-stack.yaml? I think that is a mistake. For one thing, it means the config CR that drives LCORE needs to contain those skills, which potentially makes the CR huge (unless you're going to stitch the lightspeed-stack.yaml together from a config CR plus various Skill CRs).

It also makes the yaml file huge and hard to update when one particular skill needs to be modified.

MCP servers are not an equivalent config concept here because they just have a few well specified config fields that need to be represented.

Loading skills from the filesystem should not be complicated. Getting the skill files onto the filesystem could be complicated, but again k8s offers numerous ways to accomplish this (and for non-k8s deployments, there are also many ways to solve this problem, none of them particularly novel or complicated)

Contributor Author

I agree that skill definitions don't necessarily need to be defined in the config (I will update that tomorrow), since they can be fetched from the actual files. The only reason I did it this way was that I was thinking of the config as being the skills catalog.

I am not sure, though, why the skills config should not be similar to the MCP config, at least in the sense that at a bare minimum we need a path.

And it could be that we supply just one path, or a list of paths, where each path is expected to lead to a directory of skills, e.g.:

```text
~/.agents/skills/
├── pdf-processing/
│   ├── SKILL.md          ← discovered
│   └── scripts/
│       └── extract.py
├── data-analysis/
│   └── SKILL.md          ← discovered
└── README.md             ← ignored (not a skill directory)
```

p.s. sorry for the broken English, I am tired haha
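The first-level discovery sketched in that tree could look roughly like this (a hypothetical sketch; `discover_skills` is not an existing function):

```python
from pathlib import Path


def discover_skills(root: str) -> dict[str, Path]:
    """Scan only first-level subdirectories of `root` for SKILL.md files.

    Plain files (like README.md) and subdirectories without a SKILL.md
    are ignored, matching the tree shown above.
    """
    skills: dict[str, Path] = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir() and (entry / "SKILL.md").is_file():
            skills[entry.name] = entry / "SKILL.md"
    return skills
```

Note this deliberately does not recurse, which is exactly the flat-structure limitation discussed further down in this thread.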

Contributor

> I am not sure, though, why the skills config should not be similar to the MCP config, at least in the sense that at a bare minimum we need a path.

No disagreement there; my "this is not the same as mcp config" was only intended to mean "we don't need the full config definition inlined". Having the skill config consist entirely of either a path per skill, or just a path to a directory of all the skills, is what I had in mind.

Contributor Author

ahhhh, ok that makes sense :)

Contributor

If the configured path leads to a directory of skills and lightspeed-stack recursively crawls all sub-directories looking for SKILL.md files, then we are limiting things to a flat structure where all skills are at the top level, right?

I think that's an unnecessary limitation, since a nice feature of skills for complex systems is the possibility of nested skills.

And if we limit the auto-discovery of skills from the configured path to the first level of sub-directories, then we are also making it difficult to reuse some skills in multiple first-level skills.

Personally I think having the path configuration point to a directory with a SKILL.md file is the basic functionality to have. If we want we could also have a recursive_path or something like that as well that auto-discovers skills.

Contributor

From @Akrog's comment below:

> Or if we may want in the future to have a recursive_path or something else:
>
> skills:
>   - path: /var/skills/code-review
>   - path: /var/skills/troubleshooting

☝🏽 looks like the best direction to move forward with. This way the API we expose doesn't have to change, and we can add recursive_path as an implementation detail later.

i.e. you can either choose to configure as

```yaml
skills:
  - path: /var/skills/code-review
  - path: /var/skills/troubleshooting
```

or

```yaml
skills:
  - path: /var/skills/
```

but we'll add the second capability later since that'll involve a bit of "path content validation" logic.


| B | Inline (instructions embedded directly in YAML) |
| C | Both (support either path or inline content) |

**Recommendation**: **A** (Path-based). This follows the agentskills.io spec directory structure, keeps the YAML config clean, and allows skills to include `references/` subdirectories for additional documentation that can be loaded on demand.
Contributor

per my other comments, i'm +1 on path-based.

Contributor

+1 to path-based

| B | Dedicated tool (register `activate_skill` tool, LLM calls to load) |
| C | Per-request parameter (client specifies which skills to activate) |

**Recommendation**: **A** (System prompt injection) for the catalog, combined with **B** (dedicated tool) for loading full instructions. The system prompt contains the skill catalog (name + description, ~50-100 tokens per skill). When the LLM decides a skill is relevant, it calls the `activate_skill` tool to load the full instructions.
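A minimal sketch of what that recommendation could look like in practice (the skill names come from the example config earlier in this thread; the tool schema is a generic JSON-schema shape and everything here is illustrative, not actual LCORE code):

```python
# Hypothetical catalog built from configured skills: name -> description.
SKILLS = {
    "code-review": "Review code for best practices and security issues.",
    "troubleshooting": "Diagnose and fix OpenShift deployment issues.",
}


def catalog_block() -> str:
    """Compact catalog for the system prompt: roughly one line per skill."""
    lines = [f"- {name}: {desc}" for name, desc in sorted(SKILLS.items())]
    return "Available skills (call activate_skill to load one):\n" + "\n".join(lines)


# Tool definition the LLM can call to load full instructions on demand.
ACTIVATE_SKILL_TOOL = {
    "name": "activate_skill",
    "description": "Load the full instructions for one of the available skills.",
    "parameters": {
        "type": "object",
        "properties": {"name": {"type": "string", "enum": sorted(SKILLS)}},
        "required": ["name"],
    },
}
```

Constraining the `name` parameter with an `enum` of known skills is one cheap way to reduce hallucinated skill names in tool calls.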
Contributor

I think this is ok, but I'm going to offer up an alternative approach: don't even include the skill descriptions in the system prompt. Just give the LLM a "skill_search" tool. Let it invoke the skill_search tool to try to identify skills that may match the intent, and then it can pick from the search results to invoke activate_skill. (The skill searching would be some sort of keyword+semantic hybrid search.)

This might be a "phase 2" implementation. The risk is that searching is ineffective at finding the right skills, but the upside is you don't risk overwhelming the model with too many skill descriptions in the system prompt.

Maybe consider a hybrid approach: as long as there are fewer than X skills, just include them all in the system prompt; when we get beyond X skills, move to a skill search approach.

Contributor Author

So the main reason I did it this way (adding the skills to the context) is because of this. I guess searching could work well, especially once a lot of skills get added, but you're right that it would be a "phase 2" thing. I might change the PR, though, so that instead of appending to the context there is a required skill_search tool that "for now" does not search but just returns the skill catalog list (names + descriptions). That would make it easy to extend the "search" part of the tool later so the model is not overwhelmed by a large number of skills.

Contributor

In ADK they have multiple tools for the LLM to handle skills:

  - list_skills
  - load_skill
  - load_skill_resource
  - run_skill_script

And they include instructions: https://github.com/google/adk-python/blob/909a8c2ad4d06ad485173f40f278c503cd66a063/src/google/adk/tools/skill_toolset.py#L60-L87

@bostrt bostrt Apr 24, 2026

Ask Red Hat (access.redhat.com/ask) injects the Skill name + description into a dedicated Skill Router's system prompt, which is very effective for a finite number of Skills, and I believe the workable number is proportional to the model's size + quality (i.e. you can't rely on 100 Skills in a small language model's system prompt). I just want to add this to the phase 1 vs 2 discussion.

Something else I want to share: Ask Red Hat opted to use Structured Output to choose skills instead of Tool Calling (we tried both). We've struggled with reliable tool calling across the open-weight models we tested, and Structured Output really helped a lot. We do lag behind on testing frontier models though, so take this with a grain of salt.

EDIT: The Structured Output format is like:

```json
{
    "selected_skill_names": [
        "cve-search"
    ],
    "reasoning": "The user is seeking assistance with CVE-2026-.... This falls under cve-search ... blah blah"
}
```

For transparency, Ask Red Hat is currently using a Small Language Model (Granite 4.0 3B Micro) for Skill selection, which keeps routing fast and cheap but means our constraints may be more extreme than others.
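Validating that structured output client-side is straightforward; a hedged sketch (field names taken from the example above, function name hypothetical):

```python
import json


def parse_skill_selection(raw: str, known_skills: set[str]) -> list[str]:
    """Parse the router's structured output and drop any skill names
    that are not in the known catalog (i.e. hallucinated names)."""
    data = json.loads(raw)
    selected = data.get("selected_skill_names", [])
    return [name for name in selected if name in known_skills]
```

Filtering against the known catalog matters because even with structured output a small model can emit a plausible-looking but nonexistent skill name.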


I want to clarify that Ask Red Hat uses a dedicated Skill Router agent (the SLM) that runs before the final answer generation agent — it's a two-stage LLM pipeline rather than the main agent activating skills itself via tool calls. This is a fundamentally different approach from what's being proposed here, but it's worked well for us.

Contributor

This is a lot of GREAT feedback thank you all.

@jrobertboos looks like we can try what ADK does and still keep the phase 1 implementation scoped? I'd capture what Ask Red Hat does as "Out of scope for phase 1 but a follow-up to be explored in phase 2" to account for the fact that we'll have to experiment with large language models and see the results.

- `description`: What the skill does and when to use it
- `path`: Absolute path to directory containing `SKILL.md`

**Recommendation**: Approved. This structure is consistent with existing config patterns and provides explicit control.
Contributor

Aren't SKILL.md files self-describing? What is the benefit of explicitly configuring individual skills vs just pointing to a directory of skills, the same way claude/cursor/etc do? (a skills directory containing a directory per skill)

I can envision a few benefits to this approach, but it also adds significant config management overhead and opportunity for mistakes (what happens when the path is incorrect? what happens when the name/description don't match the SKILL.md file the path points to?)


If lightspeed-stack implements context compaction (conversation history summarization), skill content must be exempted from pruning. Skill instructions are durable behavioral guidance — losing them mid-conversation silently degrades performance.

The `<skill_content>` tags from structured wrapping enable identification during compaction:
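A hedged sketch of how such an exemption could work (my illustration under the assumption that compaction operates on raw text; `summarize` stands in for whatever summarizer lightspeed-stack would actually use):

```python
import re

# Matches a whole wrapped skill block, including the tags.
SKILL_BLOCK = re.compile(r"<skill_content>.*?</skill_content>", re.DOTALL)


def compact(message: str, summarize) -> str:
    """Summarize everything except <skill_content>...</skill_content> spans.

    Skill blocks pass through untouched, so activated instructions
    survive history compaction.
    """
    parts: list[str] = []
    last = 0
    for match in SKILL_BLOCK.finditer(message):
        parts.append(summarize(message[last:match.start()]))
        parts.append(match.group(0))  # exempt from pruning
        last = match.end()
    parts.append(summarize(message[last:]))
    return "".join(parts)
```

This is exactly as fragile as the next comment notes: a literal `<skill_content>` string appearing in the context for any other reason would also be spared.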
Contributor

this seems potentially fragile. what happens if this string shows up in the context for other reasons?

Also at some point you may be unable to compact if you're unwilling to compact skill content.

Contributor Author

@jrobertboos jrobertboos Apr 21, 2026

So the main reason I did this is that it's the recommendation from the agent skills spec. Also, I did not think about the compaction problem, but I think that would only really be a problem if there were tons and TONS of skills, which I think we could cover in a "phase 2".

"Phase 2" could basically be increasing scalability to allow for integration of more than 100 (or any limiting number of) skills. That would cover working on the skill_search needed for large numbers of skills, and how to handle compaction with a very large skill catalog (I think this is less of an issue because I am not finding anyone having problems with it online, but it would still be something to safeguard against).

Also, the reason we avoid compaction is that it's defined in the spec here.

```python
self._activated: dict[str, set[str]] = {}  # conversation_id -> skill names

def is_activated(self, conversation_id: str, skill_name: str) -> bool:
    return skill_name in self._activated.get(conversation_id, set())
```
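Filling out the rest of that tracker for illustration (the dict and `is_activated` come from the snippet above; the class wrapper and `mark_activated` are my additions, and the in-memory dict means activations are lost on restart):

```python
class SkillActivationTracker:
    """Per-conversation dedup so a skill's instructions are injected once."""

    def __init__(self) -> None:
        self._activated: dict[str, set[str]] = {}  # conversation_id -> skill names

    def is_activated(self, conversation_id: str, skill_name: str) -> bool:
        return skill_name in self._activated.get(conversation_id, set())

    def mark_activated(self, conversation_id: str, skill_name: str) -> None:
        self._activated.setdefault(conversation_id, set()).add(skill_name)
```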
Contributor

Is this persisted somewhere? If not, what are the implications for persisting+resuming conversations?

|----------|-------|
| Skill path doesn't exist | Startup fails: "Skill path does not exist: {path}" |
| SKILL.md not found | Startup fails: "SKILL.md not found at {path}/SKILL.md" |
| Name mismatch | Startup fails: "Skill name mismatch: config has '{x}' but SKILL.md has '{y}'" |
Contributor

per my earlier comment, you could avoid this entire class of error by simplifying the config.yaml to not include this information.

Contributor

@Akrog Akrog left a comment

Thank you for starting this work. :-)



```yaml
skills:
  - name: "code-review"
```

Contributor

-1: I would prefer not duplicating the name and description that is already present in the SKILL.md. It adds complexity to the configuration process, additional checks to the code that needs to confirm that this matches the metadata in the SKILL.md file...

In my opinion it should be either:

```yaml
skills:
  - /var/skills/code-review
  - /var/skills/troubleshooting
```

Or, if we may want in the future to have a recursive_path or something else:

```yaml
skills:
  - path: /var/skills/code-review
  - path: /var/skills/troubleshooting
```
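Both spellings could even coexist by normalizing entries at load time; a hypothetical sketch (assuming the YAML has already been parsed into Python lists/dicts):

```python
def normalize_skill_entries(entries: list) -> list[str]:
    """Accept either bare path strings or {'path': ...} mappings,
    covering both example configs above."""
    paths: list[str] = []
    for entry in entries:
        if isinstance(entry, str):
            paths.append(entry)
        elif isinstance(entry, dict) and "path" in entry:
            paths.append(entry["path"])
        else:
            raise ValueError(f"Invalid skill entry: {entry!r}")
    return paths
```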

<!-- key: LCORE-???? -->
### LCORE-???? Implement skill catalog injection

**Description**: Inject the skill catalog (name + description for each skill) into the system prompt so the LLM knows what skills are available.
Contributor

Not sure if it would be better to have the list_skills tool instead as discussed above.

@bostrt

bostrt commented Apr 24, 2026

One thing worth considering (maybe out of scope for this design?) is observability/tracing for skill activation decisions.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3


Inline comments:
In `@docs/design/agent-skills/agent-skills-spike.md`:
- Around line 279-317: Add fence language identifiers (e.g., ```markdown and
```text) to all fenced code blocks (the "Directory structure" and "SKILL.md
format" examples) and ensure there is a blank line before and after each fenced
block; also normalize table spacing in the "Frontmatter fields" and "Progressive
disclosure" tables so pipes are consistently spaced (e.g., single space around
cell content) and header separator rows use matching pipe counts. Update the
SKILL.md example frontmatter block to include a language tag and ensure blank
lines around it, and make the three tables' columns and separator rows
consistent to satisfy markdownlint rules.

In `@docs/design/agent-skills/agent-skills.md`:
- Around line 124-130: The markdown has lint issues: add an explicit language
tag to each fenced code block (e.g., replace ``` with ```text or ```bash for the
directory tree and other fenced regions referenced around lines ~124, ~226),
ensure there is a blank line before and after every fenced code block (fix the
missing blank lines noted around ~549), and ensure the file ends with exactly
one trailing newline (fix at ~646); update the code fence around the
SKILL.md/references example and the other fences (the blocks at the ranges
called out) accordingly.
- Around line 450-459: Pick a single canonical error payload format (use JSON)
and update both places to match: change the Error handling table entry for
"Unknown skill in read_skill" and the handle_activate_skill() example so they
both return {"error":"Unknown skill: {name}"} (ensure exact key "error" and same
message format), updating any surrounding text to reflect JSON rather than XML
so the tool contract is consistent between read_skill and handle_activate_skill.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5e894381-9c86-4cfa-a27c-51b1df3cccfe

📥 Commits

Reviewing files that changed from the base of the PR and between 6208b62 and 1afd720.

📒 Files selected for processing (2)
  • docs/design/agent-skills/agent-skills-spike.md
  • docs/design/agent-skills/agent-skills.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Pylinter
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md
📚 Learning: 2026-03-02T16:38:30.287Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md

Applied to files:

  • docs/design/agent-skills/agent-skills.md
📚 Learning: 2026-03-02T15:57:15.929Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-stack PR: 1250
File: AGENTS.md:11-12
Timestamp: 2026-03-02T15:57:15.929Z
Learning: In lightspeed-stack's AGENTS.md, the Linting Tools section lists actual tool names (e.g., mypy, pylint, pyright, ruff, black), not make target names. For example, `check-types` is a make target that runs mypy, but mypy is the tool name documented in the Linting Tools section.

Applied to files:

  • docs/design/agent-skills/agent-skills.md
🪛 LanguageTool
docs/design/agent-skills/agent-skills.md

[style] ~16-~16: The phrase ‘do NOT have the ability to’ might be wordy. Consider using “can't”.
Context: ... Note: End users of LS app products do NOT have the ability to add skills, similar to how they cannot ...

(HAS_THE_ABILITY_TO)


[uncategorized] ~28-~28: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...main-specific expertise the LLM can use on demand - Sharing knowledge across deployments ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[style] ~49-~49: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...n issues - U2: As a skill author, I want to create a SKILL.md file with instruction...

(REP_WANT_TO_VB)


[style] ~51-~51: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...l to use - U4: As an LS app team, I want to deploy domain-specific skills so that t...

(REP_WANT_TO_VB)

docs/design/agent-skills/agent-skills-spike.md

[style] ~25-~25: The phrase ‘do NOT have the ability to’ might be wordy. Consider using “can't”.
Context: ... Note: End users of LS app products do NOT have the ability to add skills, similar to how they cannot ...

(HAS_THE_ABILITY_TO)

🪛 markdownlint-cli2 (0.22.1)
docs/design/agent-skills/agent-skills.md

[warning] 124-124: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 226-226: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 549-549: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 646-646: Files should end with a single newline character

(MD047, single-trailing-newline)

docs/design/agent-skills/agent-skills-spike.md

[warning] 280-280: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 280-280: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 288-288: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 300-300: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

🔇 Additional comments (12)
docs/design/agent-skills/agent-skills-spike.md (3)

9-9: Clarify “PoC validation” wording.

The current text (“PoC validation: Not applicable for this spike.” + product list) is still ambiguous about what is “validated” (the standard vs this PoC). Reword to explicitly say the standard is validated by production implementations.

🛠️ Proposed fix
-**PoC validation**: Not applicable for this spike. The feature is well-defined by the agentskills.io specification and has been implemented by 30+ agent products including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex.
+**PoC validation**: Validated by 30+ production implementations of the Agent Skills approach, including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex.

133-137: Remove fragile hard-coded line references in “Agentic tool instruction”.

The tool instruction uses specific file line numbers (e.g., “around line 1817” / “line 468”) which are likely to drift. Prefer durable references like “Configuration class definition” and “ModelContextProtocolServer class”.


141-158: Document per-request system_prompt vs skill-catalog injection behavior.

The catalog injection JIRA specifies appending the catalog in get_system_prompt(), but it doesn’t state whether skill injection occurs when a per-request system_prompt is provided (and how disable_query_system_prompt affects that). Add explicit acceptance criteria to remove ambiguity.

docs/design/agent-skills/agent-skills.md (9)

169-188: Add explicit “path exists + is directory” validation before checking SKILL.md.

The validator checks SKILL.md directly but doesn’t first validate that self.path exists and is a directory. This can produce confusing errors when path is wrong (file vs dir vs missing). Add explicit checks to improve diagnosability.


143-149: Enforce full agent-skill name format, not just length.

The spec says name must be 1–64 chars, lowercase, hyphenated. The snippet currently only enforces min_length/max_length (no lowercase/hyphen/character-class validator), allowing invalid skill configs.


234-264: Escape XML special characters when building <available_skills>.

build_skill_catalog() interpolates skill.name and skill.description directly into XML. If either contains <, >, &, or quotes, the resulting XML becomes malformed and could be injection-prone. Escape/sanitize values before interpolation.


266-279: Fix / clarify the get_system_prompt() pseudo-code to match intended control flow.

The snippet shows appending to resolved_prompt, but it doesn’t demonstrate how resolved_prompt is guaranteed to be defined in all branches (and real get_system_prompt() implementations often have multiple return points). Update the design snippet to show the correct pattern (compute base prompt, then append, then return).


329-356: Escape XML special characters in handle_activate_skill() output.

The tool response wraps unescaped skill.name, skill._content, skill.path, and skill._references entries inside XML. Since _content can contain arbitrary markdown, escaping is needed to keep XML well-formed and avoid injection/malformed tags.


414-418: Consider a more robust <skill_content> detector than substring checks.

is_skill_content() uses simple "<skill_content" in message and "</skill_content>" in message, which can false-positive if those substrings appear in unrelated text. A regex matching the full tag structure is safer.


425-441: Prevent SkillTracker from growing unbounded (conversation lifecycle integration).

SkillTracker keeps an unbounded dict of activated skills per conversation. Without cleanup/TTL/LRU, long-running instances may accumulate entries and leak memory. Integrate with conversation deletion/cleanup and/or add eviction.


484-488: New: Tool registration mapping references read_skill instead of activate_skill.

The insertion-point table says “Tool handler needs to be registered for the read_skill function type”, but the design throughout uses activate_skill as the dedicated tool. Ensure the tool name is consistent in both narrative and tables.


496-533: Harden SKILL.md parsing and reference listing.

The SKILL.md parsing snippet uses a brittle frontmatter regex (LF-only assumptions, whitespace edge cases, and newline handling) and the reference listing doesn’t exclude symlinks. Improve robustness (CRLF + flexible delimiters) and security (skip symlinks / ensure resolved paths stay under the allowlisted directory).

Comment thread docs/design/agent-skills/agent-skills-spike.md
Comment thread docs/design/agent-skills/agent-skills.md
Comment on lines +450 to +459
### Error handling

| Scenario | Error |
|----------|-------|
| Skill path doesn't exist | Startup fails: "Skill path does not exist: {path}" |
| SKILL.md not found | Startup fails: "SKILL.md not found at {path}/SKILL.md" |
| Name mismatch | Startup fails: "Skill name mismatch: config has '{x}' but SKILL.md has '{y}'" |
| Invalid YAML frontmatter | Startup fails: "Invalid SKILL.md frontmatter: {error}" |
| Unknown skill in read_skill | Tool returns: {"error": "Unknown skill: {name}"} |

Contributor

⚠️ Potential issue | 🟠 Major

New: Error payload format is inconsistent (JSON vs XML).

In the Error handling table, “Unknown skill in read_skill” states the tool returns JSON: {"error": "Unknown skill: {name}"}. But the handle_activate_skill() example returns an XML fragment: <error>Unknown skill: {name}</error>. Pick one canonical error format for the tool contract and make both doc locations agree.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 450 - 459, Pick a
single canonical error payload format (use JSON) and update both places to
match: change the Error handling table entry for "Unknown skill in read_skill"
and the handle_activate_skill() example so they both return {"error":"Unknown
skill: {name}"} (ensure exact key "error" and same message format), updating any
surrounding text to reflect JSON rather than XML so the tool contract is
consistent between read_skill and handle_activate_skill.
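If JSON is picked as the canonical contract, a single helper keeps the error-handling table and the tool handler from drifting apart again — names here are illustrative:

```python
import json


def skill_error(message):
    """Serialize the canonical tool error payload: {"error": "..."}."""
    return json.dumps({"error": message})


def handle_unknown_skill(name):
    """Produce the exact payload documented in the error-handling table."""
    return skill_error(f"Unknown skill: {name}")
```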

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (8)
docs/design/agent-skills/agent-skills-spike.md (3)

9-9: ⚠️ Potential issue | 🟡 Minor

Clarify contradictory PoC wording.

Line [9] says “Not applicable for this spike” and immediately provides production validation evidence; that reads internally inconsistent. Please reword this as an explicit “standard already validated” statement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills-spike.md` at line 9, Replace the
contradictory "PoC validation" sentence so it explicitly states that this
capability is already validated in production: locate the "PoC validation"
section (heading "PoC validation") and change the text from "Not applicable for
this spike" plus the following evidence sentence to a single clear statement
such as "Already validated in production: implemented by 30+ agent products
including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex," ensuring the
wording removes the "Not applicable" phrasing and presents the production
validation as the rationale.

167-171: ⚠️ Potential issue | 🟡 Minor

Remove hard-coded source line references from agentic instructions.

These instructions depend on exact line numbers (e.g., Line [169], Line [170], Line [198], Line [229], Line [306]) and will drift quickly. Reference file + symbol/function names only.

Also applies to: 196-201, 227-231, 254-257, 304-307

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills-spike.md` around lines 167 - 171,
Remove all hard-coded line-number references in agentic instructions and replace
them with file-and-symbol references; specifically, update references to point
to docs/design/agent-skills/agent-skills.md and the Configuration class in
src/models/config.py (symbol: Configuration) instead of Line
[169]/[170]/[198]/[229]/[306] (also fix the duplicated ranges 196-201, 227-231,
254-257, 304-307). Likewise, where the MCP server pattern is mentioned,
reference the ModelContextProtocolServer class (symbol:
ModelContextProtocolServer) and its implementation rather than any line numbers.
Ensure every instruction that currently cites a line number now cites the target
file and symbol/method name so the references remain stable as the codebase
changes.

117-121: ⚠️ Potential issue | 🟡 Minor

Doc lint failures are still present in fenced blocks/tables.

The current formatting still triggers markdownlint (blank lines around fences, missing language tag on the directory-tree fence, and table spacing around Line [337]).

Also applies to: 124-129, 317-323, 337-343

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills-spike.md` around lines 117 - 121, Fix
markdownlint failures by ensuring every fenced code block has a blank line
before and after the fence and that the directory-tree fence includes an
explicit language tag (e.g., use ```text or ```bash for the directory listing);
in particular update the YAML block containing "skills: paths: -
\"/var/skills/\"" and the nearby directory-tree fence to follow this rule. Also
add blank lines around and normalize spacing in the Markdown table near the
directory-tree (ensure a blank line before the table, a blank line after it, and
consistent single spaces around pipe separators) so the linter stops flagging
missing fence spacing and table formatting.
docs/design/agent-skills/agent-skills.md (5)

478-494: ⚠️ Potential issue | 🟠 Major

SkillTracker storage is unbounded and needs lifecycle/expiry controls.

Line [482] keeps a growing in-memory map with no TTL/size cap and no documented integration with conversation cleanup events, which risks long-lived memory growth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 478 - 494, The
SkillTracker currently keeps an unbounded in-memory map (_activated) that can
grow indefinitely; update SkillTracker to add lifecycle controls by (1)
supporting TTL-based expiry for entries (store timestamps and expire on access
or via background cleaner) or a size-limited LRU eviction policy, (2) exposing
explicit cleanup hooks (e.g., clear(conversation_id) already exists—add
clear_all or integrate with conversation deletion events) so callers can remove
conversation state when conversations end, and (3) making operations in methods
is_activated, mark_activated, and clear thread-safe (use a lock or a concurrent
map) and document the chosen strategy in the class docstring so consumers know
how to trigger cleanup.

324-333: ⚠️ Potential issue | 🟠 Major

Escape dynamic values before embedding in XML.

Line [328]-[329], Line [392], Line [395], and Line [404] interpolate unescaped content into XML tags/attributes. Skill metadata/content can break XML structure or inject unintended tags.

Also applies to: 389-405

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 324 - 333, The XML is
built by concatenating unescaped skill fields (skill.name, skill.description,
and other interpolated content), which can break XML or allow injection; fix by
escaping all dynamic values before embedding (use a standard XML escape function
like xml.sax.saxutils.escape or build the XML with an XML library), and update
the loop that appends f"<name>{skill.name}</name>" and
f"<description>{skill.description}</description>" (and the other interpolations
referenced) to use the escaped values consistently so no raw user content is
inserted into tags or attributes.

575-585: ⚠️ Potential issue | 🟠 Major

Reference listing should exclude symlinks to avoid path trust bypass.

The current listing includes any file under references/. Symlink entries can point outside the trusted skill directory and should be filtered out.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 575 - 585, The
reference listing in list_references currently returns any file under refs_dir,
which can include symlinks that bypass the trusted skill directory; update the
function (list_references, refs_dir) to skip symlinks by adding a check (e.g.,
ensure not f.is_symlink()) in the list comprehension (or filter) and/or resolve
the file and assert it lies within skill_path.resolve() before including it so
only regular files inside the skill's directory are returned.
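A symlink-safe listing sketch along these lines — `list_references` and the `references/` layout follow the doc, while the exact checks are one reasonable combination (skip symlinks, then confirm the resolved path stays under the skill root). Requires Python 3.9+ for `Path.is_relative_to`:

```python
from pathlib import Path


def list_references(skill_path):
    """Return regular files under references/ that resolve inside the skill dir."""
    skill_root = Path(skill_path).resolve()
    refs_dir = skill_root / "references"
    if not refs_dir.is_dir():
        return []
    files = []
    for f in sorted(refs_dir.rglob("*")):
        if f.is_symlink() or not f.is_file():
            continue  # skip symlinks and directories
        if not f.resolve().is_relative_to(skill_root):
            continue  # resolved path escaped the trusted directory
        files.append(f)
    return files
```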

113-117: ⚠️ Potential issue | 🟡 Minor

Fix remaining markdown fence/table formatting issues (including bad closing fence).

There are still lint-affecting formatting problems, and Line [701] uses an invalid closing fence (```markdown) instead of a proper terminator.

Also applies to: 120-125, 135-141, 317-323, 337-340, 656-701

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 113 - 117, The
markdown has broken/incorrect fenced-code and table terminators—specifically
replace any erroneous closing fence token like "```markdown" with the proper
terminator "```", ensure code blocks that show YAML (for example the block
containing the skills: paths: - "/var/skills/") begin with the correct language
tag (e.g., ```yaml) and end with ``` and verify all other fenced blocks and
tables use consistent opening/closing fences and proper pipe/heading syntax so
the linter no longer flags formatting (check occurrences around the blocks that
include the skills YAML, other fenced examples, and table regions and fix their
opening/closing fences and table pipe rows).

389-390: ⚠️ Potential issue | 🟠 Major

Unify tool error contract and tool name.

Line [511] still references read_skill and JSON, while Line [389] returns XML from activate_skill. Pick one canonical contract and use the same tool name/payload in both places.

Suggested doc fix (XML-consistent option)
-| Unknown skill in read_skill | Tool returns: {"error": "Unknown skill: {name}"} |
+| Unknown skill in activate_skill | Tool returns: "<error>Unknown skill: {name}</error>" |

Also applies to: 505-512

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 389 - 390, The docs
currently mix two error/payload formats and tool names: activate_skill returns
XML ("<error>...") while read_skill and its sample JSON responses use a
different contract; pick a single canonical contract (e.g., XML) and unify both
tools to use it. Update the read_skill examples and any references to the tool
name so they match the activate_skill contract and tool naming (ensure
read_skill/activate_skill both emit the same XML error format and identical
skill/tool name strings across examples and samples such as the sections
previously showing JSON), and update any sample payloads/schemas in the
surrounding text (including the parts referenced near the read_skill examples)
so all examples use the same unified contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/agent-skills/agent-skills-spike.md`:
- Around line 144-145: Replace the placeholder ticket keys "LCORE-????" used in
the headings (e.g., the "### LCORE-???? Add skill configuration model" entry)
with the actual Jira/issue tracker ticket IDs or a clear temporary marker (e.g.,
"LCORE-TBD") before merging; update every occurrence noted in the file (the
repeated headings at the other sections) so each heading is actionable and
traceable, and run a quick search for "LCORE-????" to ensure no instances
remain.

In `@docs/design/agent-skills/agent-skills.md`:
- Around line 549-570: The parser parse_skill_md currently only checks for
presence of "name" and "description" keys; update it to enforce the documented
Appendix constraints by validating that frontmatter["name"] matches the required
pattern (e.g., allowed characters/format from the Appendix), is non-empty after
trimming, and that frontmatter["description"] is a non-empty string and meets
any minimum/maximum length rules described in the Appendix; if any constraint
fails raise a ValueError with a clear message indicating which constraint (name
format, name emptiness, description emptiness/length) was violated so callers
get actionable feedback.
- Around line 437-438: The documentation's "Path restriction" paragraph claims
the skill tool returns a structured `base_path`, but the example response only
shows free-form text ("Skill directory: ..."), causing inconsistency; update the
example response (around the "Skill directory: ..." example at lines ~395-397)
to show a machine-readable payload that includes the `base_path` field (e.g., a
JSON object with a `base_path` key) and adjust the surrounding text to state
explicitly that the tool returns `base_path` for downstream path
construction/validation so consumers can parse it reliably.
- Around line 529-540: The implementation notes currently reference exact line
numbers which will drift; update the doc and diffs to use symbol-level anchors
instead: instruct editors to add an optional "skills:
Optional[SkillsConfiguration]" attribute to the Configuration class (refer to
the Configuration class symbol), perform path validation inside the
load_skills() function (refer to load_skills symbol) rather than by line number,
modify get_system_prompt() in src/utils/prompts.py to import
get_skill_instructions from utils/skills and append the behavioral instructions
before the function's return (refer to get_system_prompt symbol), and update
prepare_tools() in src/utils/responses.py to append list_skills and
activate_skill tools after the existing MCP tool construction and ensure
handlers for both list_skills and activate_skill function types are registered
(refer to prepare_tools symbol); replace any "line X" guidance in the docs with
these symbol-based anchors.

---

Duplicate comments:
In `@docs/design/agent-skills/agent-skills-spike.md`:
- Line 9: Replace the contradictory "PoC validation" sentence so it explicitly
states that this capability is already validated in production: locate the "PoC
validation" section (heading "PoC validation") and change the text from "Not
applicable for this spike" plus the following evidence sentence to a single
clear statement such as "Already validated in production: implemented by 30+
agent products including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex,"
ensuring the wording removes the "Not applicable" phrasing and presents the
production validation as the rationale.
- Around line 167-171: Remove all hard-coded line-number references in agentic
instructions and replace them with file-and-symbol references; specifically,
update references to point to docs/design/agent-skills/agent-skills.md and the
Configuration class in src/models/config.py (symbol: Configuration) instead of
Line [169]/[170]/[198]/[229]/[306] (also fix the duplicated ranges 196-201,
227-231, 254-257, 304-307). Likewise, where the MCP server pattern is mentioned,
reference the ModelContextProtocolServer class (symbol:
ModelContextProtocolServer) and its implementation rather than any line numbers.
Ensure every instruction that currently cites a line number now cites the target
file and symbol/method name so the references remain stable as the codebase
changes.
- Around line 117-121: Fix markdownlint failures by ensuring every fenced code
block has a blank line before and after the fence and that the directory-tree
fence includes an explicit language tag (e.g., use ```text or ```bash for the
directory listing); in particular update the YAML block containing "skills:
paths: - \"/var/skills/\"" and the nearby directory-tree fence to follow this
rule. Also add blank lines around and normalize spacing in the Markdown table
near the directory-tree (ensure a blank line before the table, a blank line
after it, and consistent single spaces around pipe separators) so the linter
stops flagging missing fence spacing and table formatting.

In `@docs/design/agent-skills/agent-skills.md`:
- Around line 478-494: The SkillTracker currently keeps an unbounded in-memory
map (_activated) that can grow indefinitely; update SkillTracker to add
lifecycle controls by (1) supporting TTL-based expiry for entries (store
timestamps and expire on access or via background cleaner) or a size-limited LRU
eviction policy, (2) exposing explicit cleanup hooks (e.g.,
clear(conversation_id) already exists—add clear_all or integrate with
conversation deletion events) so callers can remove conversation state when
conversations end, and (3) making operations in methods is_activated,
mark_activated, and clear thread-safe (use a lock or a concurrent map) and
document the chosen strategy in the class docstring so consumers know how to
trigger cleanup.
- Around line 324-333: The XML is built by concatenating unescaped skill fields
(skill.name, skill.description, and other interpolated content), which can break
XML or allow injection; fix by escaping all dynamic values before embedding (use
a standard XML escape function like xml.sax.saxutils.escape or build the XML
with an XML library), and update the loop that appends
f"<name>{skill.name}</name>" and
f"<description>{skill.description}</description>" (and the other interpolations
referenced) to use the escaped values consistently so no raw user content is
inserted into tags or attributes.
- Around line 575-585: The reference listing in list_references currently
returns any file under refs_dir, which can include symlinks that bypass the
trusted skill directory; update the function (list_references, refs_dir) to skip
symlinks by adding a check (e.g., ensure not f.is_symlink()) in the list
comprehension (or filter) and/or resolve the file and assert it lies within
skill_path.resolve() before including it so only regular files inside the
skill's directory are returned.
- Around line 113-117: The markdown has broken/incorrect fenced-code and table
terminators—specifically replace any erroneous closing fence token like
"```markdown" with the proper terminator "```", ensure code blocks that show
YAML (for example the block containing the skills: paths: - "/var/skills/")
begin with the correct language tag (e.g., ```yaml) and end with ``` and verify
all other fenced blocks and tables use consistent opening/closing fences and
proper pipe/heading syntax so the linter no longer flags formatting (check
occurrences around the blocks that include the skills YAML, other fenced
examples, and table regions and fix their opening/closing fences and table pipe
rows).
- Around line 389-390: The docs currently mix two error/payload formats and tool
names: activate_skill returns XML ("<error>...") while read_skill and its sample
JSON responses use a different contract; pick a single canonical contract (e.g.,
XML) and unify both tools to use it. Update the read_skill examples and any
references to the tool name so they match the activate_skill contract and tool
naming (ensure read_skill/activate_skill both emit the same XML error format and
identical skill/tool name strings across examples and samples such as the
sections previously showing JSON), and update any sample payloads/schemas in the
surrounding text (including the parts referenced near the read_skill examples)
so all examples use the same unified contract.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ee53cbbc-c7e7-4916-89a8-c9befa176836

📥 Commits

Reviewing files that changed from the base of the PR and between 1afd720 and 6367c48.

📒 Files selected for processing (2)
  • docs/design/agent-skills/agent-skills-spike.md
  • docs/design/agent-skills/agent-skills.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md
📚 Learning: 2026-03-02T16:38:30.287Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md

Applied to files:

  • docs/design/agent-skills/agent-skills.md
📚 Learning: 2026-03-02T15:57:15.929Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-stack PR: 1250
File: AGENTS.md:11-12
Timestamp: 2026-03-02T15:57:15.929Z
Learning: In lightspeed-stack's AGENTS.md, the Linting Tools section lists actual tool names (e.g., mypy, pylint, pyright, ruff, black), not make target names. For example, `check-types` is a make target that runs mypy, but mypy is the tool name documented in the Linting Tools section.

Applied to files:

  • docs/design/agent-skills/agent-skills.md
🪛 LanguageTool
docs/design/agent-skills/agent-skills-spike.md

[style] ~25-~25: The phrase ‘do NOT have the ability to’ might be wordy. Consider using “can't”.
Context: ... Note: End users of LS app products do NOT have the ability to add skills, similar to how they cannot ...

(HAS_THE_ABILITY_TO)

docs/design/agent-skills/agent-skills.md

[style] ~16-~16: The phrase ‘do NOT have the ability to’ might be wordy. Consider using “can't”.
Context: ... Note: End users of LS app products do NOT have the ability to add skills, similar to how they cannot ...

(HAS_THE_ABILITY_TO)


[uncategorized] ~28-~28: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...main-specific expertise the LLM can use on demand - Sharing knowledge across deployments ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[style] ~49-~49: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...n issues - U2: As a skill author, I want to create a SKILL.md file with instruction...

(REP_WANT_TO_VB)


[style] ~51-~51: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...l to use - U4: As an LS app team, I want to deploy domain-specific skills so that t...

(REP_WANT_TO_VB)

🪛 markdownlint-cli2 (0.22.1)
docs/design/agent-skills/agent-skills-spike.md

[warning] 117-117: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 124-124: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 317-317: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 317-317: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 325-325: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 337-337: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

docs/design/agent-skills/agent-skills.md

[warning] 117-117: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 124-124: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 317-317: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 317-317: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 325-325: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 337-337: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

Comment on lines +144 to +145
<!-- key: LCORE-???? -->
### LCORE-???? Add skill configuration model

⚠️ Potential issue | 🟠 Major

Replace placeholder ticket keys before merge.

The repeated LCORE-???? placeholders make the implementation plan non-actionable and untraceable in delivery tracking.

Also applies to: 174-175, 204-205, 234-235, 260-261, 286-287

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills-spike.md` around lines 144-145, replace the placeholder ticket keys "LCORE-????" used in the headings (e.g., "### LCORE-???? Add skill configuration model") with the actual issue-tracker ticket IDs, or with a clear temporary marker (e.g., "LCORE-TBD"), before merging; update every occurrence noted in the file (the repeated headings in the other sections) so each heading is actionable and traceable, and run a quick search for "LCORE-????" to confirm no instances remain.

Comment on lines +437 to +438
**Path restriction**: File reads are restricted to configured skill directories. The skill tool returns the `base_path` so the LLM can construct valid paths like `{base_path}/references/guide.md`.

⚠️ Potential issue | 🟠 Major

Reference access contract is inconsistent (base_path vs returned payload).

Line [437] says the tool returns base_path, but the example response only emits free-form text (Skill directory: ...) at Line [395]. This is brittle for downstream path construction/validation.

Also applies to: 395-397

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 437-438, the "Path restriction" paragraph claims the skill tool returns a structured `base_path`, but the example response only shows free-form text ("Skill directory: ..."), which is inconsistent; update the example response (around lines ~395-397) to show a machine-readable payload that includes a `base_path` field (e.g., a JSON object with a `base_path` key), and state explicitly in the surrounding text that the tool returns `base_path` for downstream path construction/validation so consumers can parse it reliably.
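A minimal sketch of what such a machine-readable activation payload could look like, assuming a JSON response; the helper and the field names (`skill`, `base_path`, `instructions`) are illustrative, not taken from the design doc:

```python
import json


def build_activation_response(skill_name: str, base_path: str, instructions: str) -> str:
    """Return a structured payload so consumers can parse base_path reliably.

    Hypothetical helper; the real tool response shape is defined by the
    design doc, not here.
    """
    return json.dumps(
        {
            "skill": skill_name,
            # Explicit field instead of free-form "Skill directory: ..." text,
            # so the LLM can build paths like {base_path}/references/guide.md.
            "base_path": base_path,
            "instructions": instructions,
        }
    )


payload = json.loads(build_activation_response("k8s-debugging", "/etc/skills/k8s-debugging", "..."))
print(payload["base_path"])  # /etc/skills/k8s-debugging
```

A consumer can then validate the parsed `base_path` against the configured skill directories instead of scraping it out of prose.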

Comment on lines +529 to +540
- Add `skills: Optional[SkillsConfiguration]` to `Configuration` class (around line 1852)
- Path validation happens at startup in `load_skills()` function

**System prompt injection** (`src/utils/prompts.py`):
- `get_system_prompt()` currently returns the resolved prompt at line 80
- Insert behavioral instructions append before the return statement
- Import `get_skill_instructions` from new `utils/skills.py` module

**Tool registration** (`src/utils/responses.py`):
- `prepare_tools()` at line 204 builds the tool list
- Add `list_skills` and `activate_skill` tools after MCP tools (around line 260)
- Tool handlers need to be registered for both `list_skills` and `activate_skill` function types

⚠️ Potential issue | 🟡 Minor

Avoid line-number-based insertion guidance in implementation notes.

Line [529], Line [533], and Line [538] anchor changes to exact line numbers, which will drift. Use symbol-level anchors (class/function names) for durable guidance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 529-540, the implementation notes reference exact line numbers, which will drift; anchor the guidance to symbols instead: add the optional `skills: Optional[SkillsConfiguration]` attribute to the `Configuration` class, perform path validation inside the `load_skills()` function, modify `get_system_prompt()` in `src/utils/prompts.py` to import `get_skill_instructions` from `utils/skills` and append the behavioral instructions before the function's return, and update `prepare_tools()` in `src/utils/responses.py` to append the `list_skills` and `activate_skill` tools after the existing MCP tool construction, ensuring handlers are registered for both function types; replace any "line X" guidance in the docs with these symbol-based anchors.

Comment on lines +549 to +570
import re

import yaml  # third-party: PyYAML


def parse_skill_md(content: str) -> tuple[dict, str]:
    """Parse SKILL.md into frontmatter dict and body string."""
    # Match YAML frontmatter between --- delimiters
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n(.*)$", content, re.DOTALL)
    if not match:
        raise ValueError("SKILL.md must have YAML frontmatter between --- delimiters")

    frontmatter_text, body = match.groups()

    try:
        frontmatter = yaml.safe_load(frontmatter_text)
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML frontmatter: {e}") from e

    if not isinstance(frontmatter, dict):
        raise ValueError("Frontmatter must be a YAML mapping")

    if "name" not in frontmatter:
        raise ValueError("SKILL.md frontmatter must include 'name' field")

    if "description" not in frontmatter:
        raise ValueError("SKILL.md frontmatter must include 'description' field")

    return frontmatter, body

⚠️ Potential issue | 🟠 Major

Parser validation does not enforce documented frontmatter constraints.

The parser only checks key presence (Line [566]-[570]), while Appendix constraints (Line [649]-[650]) require stricter name/description validity. Align parsing validation with the documented contract.

Also applies to: 647-650

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 549-570, `parse_skill_md` currently only checks for the presence of the "name" and "description" keys; update it to enforce the documented Appendix constraints by validating that `frontmatter["name"]` is non-empty after trimming and matches the required pattern (the allowed characters/format from the Appendix), and that `frontmatter["description"]` is a non-empty string that meets any minimum/maximum length rules described there; if any constraint fails, raise a `ValueError` with a clear message naming the violated constraint (name format, empty name, empty or over-length description) so callers get actionable feedback.
