Investigation: pluggable content policy engine for org-specific validation rules #67

@dacharyc

Description

Summary

Investigate how skill-validator could support organizations defining their own content policies and passing them to the validator for checking. The goal is to let orgs enforce domain-specific rules (e.g., forbidden patterns, required disclaimers, restricted domains) without those policies being baked into the core validator.

Motivation

skill-validator currently performs structural validation: well-formedness, frontmatter schema, token budgets, link integrity, path traversal, orphan detection. These checks are universal — they apply to every skill regardless of where it's deployed.

However, organizations running internal skill marketplaces or curated registries often have content policies that go beyond structural correctness. Examples:

  • Privilege escalation keywords: Flag skills that instruct agents to use sudo, --no-verify, chmod 777, or phrases like "bypass security" or "disable authentication." These are rarely legitimate in skills about project workflows but may be fine in skills about Linux system administration.
  • Destructive command patterns: Flag rm -rf, DROP TABLE, git push --force (without --force-with-lease), DELETE FROM (without WHERE), kill -9, dd if=. Some skills (like git-workflow skills) legitimately reference these as documentation, so these should be flaggable rather than hard-blocked.
  • Data exfiltration patterns: Flag suspicious domains (ngrok.io, requestbin.com, webhook.site, raw IPs in URLs), pipe-to-shell patterns (curl ... | sh, wget ... | bash), non-HTTPS external URLs, and large base64-encoded blobs.
  • Hidden content in HTML comments: Flag <!-- ... --> blocks in Markdown files, which are invisible when rendered but present in the agent's raw context window. Whether this is a concern depends on the org's threat model and skill content — skills that include HTML templates as assets may legitimately contain comments.

These are all legitimate concerns, but they're inherently subjective and domain-dependent. A "destructive command" policy makes sense for a corporate skill marketplace but would be noise for an open-source skill about database administration.

Questions to investigate

  1. What format should policy definitions use? Options include:

    • A YAML/JSON config file with pattern lists and severity levels
    • A directory of rule files (one per check) with metadata
    • A Go plugin interface for compiled rules
    • Some combination
  2. How should policies integrate with the CLI? Options include:

    • --policy-file rules.yaml flag
    • A .skill-validator/policy.yaml convention in the repo root
    • Environment variable pointing to a policy directory
  3. How should policy findings be reported? Should they use the same result types (ERROR/WARNING/INFO) as structural checks, or a separate "POLICY" category? Should they be distinguishable in JSON output?

  4. What's the right boundary between "structural" and "policy"? The current implicit rule is: if a check can have legitimate false positives depending on the skill's domain, it's policy. If it catches something that's never legitimate in any skill file, it's structural. Is this the right line?

  5. Should policies be able to scope by file type or path? An org might want to flag HTML comments in SKILL.md (where hidden instructions are suspicious) but allow them in assets/*.html (where they're part of a template). Glob-based scoping could handle this.
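To ground questions 1, 3, and 5, a first-iteration policy file could be as simple as the following YAML. The schema (rule ids, `severity` values, `applies_to`/`exclude` globs) is purely illustrative, not an existing skill-validator format:

```yaml
# Hypothetical .skill-validator/policy.yaml
version: 1
rules:
  - id: privilege-escalation
    description: Instructions that escalate privileges or bypass checks
    severity: WARNING            # or a separate POLICY category, per question 3
    patterns:
      - '\bsudo\b'
      - '--no-verify'
      - 'chmod 777'
  - id: hidden-html-comments
    description: Content invisible when rendered but present in raw context
    severity: WARNING
    patterns:
      - '<!--[\s\S]*?-->'
    applies_to: ['SKILL.md']     # glob scoping, per question 5
    exclude: ['assets/**']       # templates may legitimately contain comments
```

A file like this could be passed via a `--policy-file` flag or discovered by convention, per question 2; either way the structural checks would run unchanged and policy findings would be appended to the report.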

Non-goals

  • Building a full rules engine or DSL. The first iteration should be simple enough to define in a config file.
  • Handling natural-language analysis (e.g., "does this skill encourage bad practices?"). That's the domain of the LLM judge, not static analysis.
  • Replacing platform-specific CI linting. Orgs will always have some checks that are too specific for a general-purpose config format. The goal is to cover the common 80%.

Desired outcome

A design proposal (can be an issue comment or a short doc in the repo) covering the recommended approach, with enough detail to implement a first version. The first iteration doesn't need to be perfect — it just needs to be a credible foundation that orgs can start using.

Metadata

Assignees: no one assigned
Labels: enhancement (New feature or request)
Milestone: no milestone