Split llms-txt-directive into separate HTML and markdown checks #58

@dacharyc

Summary

The current llms-txt-directive check has two bugs and a design gap. It should be split into two checks with more precise detection logic, following spec updates in agent-ecosystem/agent-docs-spec#23.

Bugs in current implementation

1. False positive: sidebar nav matches

The regex /llms\.txt/gi matches any occurrence of the string "llms.txt" in the HTML body, including sidebar navigation items. Mintlify passes this check only because every page carries a nav link to its docs page about llms.txt:

<li id="/ai/llmstxt" data-title="llms.txt">
  <a href="/docs/ai/llmstxt"><span>llms.txt</span></a>
</li>

This link appears about 7.7% of the way into the body (under the 10% "near top" threshold), so every page passes. But Mintlify's actual directive is a blockquote that exists only in the markdown version of each page (for example /docs/quickstart.md), which the check never fetches.
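
To make the false positive concrete, here is a minimal sketch of a position-based check. It is not the code in llms-txt-directive.ts: the function name `firstMatchPosition`, the constant `NEAR_TOP_THRESHOLD`, and the synthetic page body are all assumptions made for illustration. Only the regex and the 10% threshold come from the description above.

```typescript
// Hypothetical reproduction; names and the sample body are illustrative only.
const NEAR_TOP_THRESHOLD = 0.1; // the 10% "near top" threshold described above

// Returns the offset of the first match as a fraction of the body length,
// or null when there is no match.
function firstMatchPosition(body: string, pattern: RegExp): number | null {
  const match = pattern.exec(body);
  if (match === null || body.length === 0) return null;
  return match.index / body.length;
}

// A body whose only mention of llms.txt is the sidebar nav item.
const navItem =
  '<li id="/ai/llmstxt" data-title="llms.txt">' +
  '<a href="/docs/ai/llmstxt"><span>llms.txt</span></a></li>';
const body =
  "<nav>" + navItem + "</nav><main>" + "x".repeat(5000) + "</main>";

// No "g" flag here, so exec() carries no lastIndex state between calls.
const position = firstMatchPosition(body, /llms\.txt/i);
const passes = position !== null && position <= NEAR_TOP_THRESHOLD;
console.log({ position, passes });
```

In this sketch `passes` is true even though the page contains no directive at all, which is the behavior reported for Mintlify.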

2. Markdown pages never checked for HTML-discovered URLs

The fallback at line 124 of llms-txt-directive.ts only tries the original .md URL if the discovered URL was already a .md URL. Because discoverAndSamplePages returns HTML page URLs (from sitemap/llms.txt), toHtmlUrl() is a no-op and the fallback never fires. As a result, the markdown version of the page is never fetched.
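
The shape of this bug can be sketched as follows. This is an assumption about the logic, not a copy of it: `toHtmlUrlSketch` is a hypothetical stand-in for toHtmlUrl() that simply strips a trailing ".md", and `fallbackWouldFire` is an illustrative name for the condition guarding the fallback.

```typescript
// Hypothetical stand-in for toHtmlUrl(): strips a trailing ".md" if present.
// The real helper may handle more cases.
function toHtmlUrlSketch(url: string): string {
  return url.endsWith(".md") ? url.slice(0, -3) : url;
}

// Mirrors the condition described above: the markdown fallback only runs
// when the discovered URL differs from its HTML form, i.e. it was a .md URL.
function fallbackWouldFire(discoveredUrl: string): boolean {
  return toHtmlUrlSketch(discoveredUrl) !== discoveredUrl;
}

// example.com URLs are placeholders.
const discoveredHtmlUrl = "https://example.com/docs/quickstart";
const discoveredMdUrl = "https://example.com/docs/quickstart.md";

console.log(fallbackWouldFire(discoveredHtmlUrl)); // false
console.log(fallbackWouldFire(discoveredMdUrl)); // true
```

Under these assumptions, any URL discovered from a sitemap takes the first path, so the markdown fetch is unreachable for exactly the pages that need it.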

Proposed fix

Blocked on spec updates in agent-ecosystem/agent-docs-spec#23. Once the spec defines these as two separate checks:

llms-txt-directive-html

  • Fetch the HTML page
  • Search within the <body> but exclude content inside <nav>, <script>, <style>, and JSON-LD/structured data blocks
  • Look for more specific patterns: links whose href points to /llms.txt, or visually-hidden elements (sr-only, clip-rect) containing "llms.txt" or "documentation index"
  • Incidental mentions in navigation or page content discussing llms.txt as a feature should not count
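
The exclusion and link-matching steps above could look roughly like this. It is a sketch under stated assumptions: `stripExcludedRegions` and `hasLlmsTxtLink` are hypothetical names, the regex-based stripping does not handle nested <nav> elements, and the visually-hidden element pattern is left out. A real implementation would likely use an HTML parser instead.

```typescript
// Remove regions that should not count toward the directive check.
// JSON-LD lives in <script type="application/ld+json">, so stripping
// <script> covers it as well. Nested <nav> elements are not handled.
function stripExcludedRegions(body: string): string {
  return body
    .replace(/<nav\b[^>]*>[\s\S]*?<\/nav>/gi, "")
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, "");
}

// True when an anchor outside the excluded regions has an href whose path
// ends in /llms.txt (optionally followed by a query string or fragment).
function hasLlmsTxtLink(body: string): boolean {
  const cleaned = stripExcludedRegions(body);
  const linkPattern =
    /<a\b[^>]*\bhref\s*=\s*["'][^"']*\/llms\.txt(?:[?#][^"']*)?["']/i;
  return linkPattern.test(cleaned);
}

// The Mintlify sidebar item from the bug report: mentions llms.txt,
// but its href points to a docs page, not to /llms.txt.
const sidebarItem =
  '<li id="/ai/llmstxt" data-title="llms.txt">' +
  '<a href="/docs/ai/llmstxt"><span>llms.txt</span></a></li>';

console.log(hasLlmsTxtLink(sidebarItem)); // false
console.log(hasLlmsTxtLink('<p><a href="/llms.txt">Documentation index</a></p>')); // true
```

Matching on the href rather than the visible text is what keeps the sidebar item from counting, even when it is not wrapped in a <nav> element.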

llms-txt-directive-md

  • For each sampled page URL, probe for the markdown version using toMdUrls() candidates
  • Search the markdown content for a directive (blockquote or text block) near the top
  • This check should have its own pass/warn/fail criteria independent of the HTML check
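
A rough sketch of the markdown side, under several assumptions: `mdUrlCandidates` is a hypothetical stand-in for toMdUrls() (whose real signature and candidate list may differ), `hasDirectiveNearTop` is an illustrative name, and "near the top" is interpreted here as the first 10 non-empty lines, which the spec may define differently. Fetching is omitted so the example stays self-contained.

```typescript
// Hypothetical stand-in for toMdUrls(); ignores query strings and fragments.
function mdUrlCandidates(pageUrl: string): string[] {
  const base = pageUrl.replace(/\/+$/, "");
  return [`${base}.md`, `${base}/index.md`];
}

// True when a line mentioning llms.txt appears within the first `maxLines`
// non-empty lines. Both blockquote lines and plain text lines qualify.
// The default of 10 is an assumption, not a value from the spec.
function hasDirectiveNearTop(markdown: string, maxLines = 10): boolean {
  const leadingLines = markdown
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .slice(0, maxLines);
  return leadingLines.some((line) => /llms\.txt/i.test(line));
}

// Illustrative markdown, not Mintlify's actual directive text.
const sampleMarkdown = [
  "# Quickstart",
  "",
  "> For the full documentation index, see /llms.txt",
  "",
  "Body text starts here.",
].join("\n");

console.log(mdUrlCandidates("https://example.com/docs/quickstart/"));
console.log(hasDirectiveNearTop(sampleMarkdown)); // true
```

Because the markdown source has no sidebar or script content, a plain line-based search is less prone to the false positives seen in the HTML check.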

Files involved

  • src/checks/content-discoverability/llms-txt-directive.ts — main check logic
  • test/unit/checks/llms-txt-directive.test.ts — tests
  • docs/checks/content-discoverability.md — check documentation
  • src/helpers/to-md-urls.ts — URL conversion helpers (already has toMdUrls())
