Split llms-txt-directive into separate HTML and markdown checks #23

@dacharyc

Summary

The current llms-txt-directive check conflates two distinct signals that serve different purposes and audiences. The spec should be updated to define these as separate checks with independent pass/fail criteria.

Background

The check was originally based on observing that Mintlify (used by Claude Code docs) adds a blockquote directive at the top of every markdown page:

> ## Documentation Index
> Fetch the complete documentation index at: https://www.mintlify.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

This directive lives in the markdown version of the page (e.g., /docs/quickstart.md). It does not appear in the HTML version. However, an equivalent directive in the HTML is at least as valuable for agents that fetch HTML pages, because it tells them that markdown versions of pages and/or an llms.txt index exist.

The two signals

HTML directive

  • Audience: Agents fetching rendered HTML pages
  • Purpose: Tell agents that a documentation index exists at /llms.txt and that markdown versions of pages may be available
  • Implementation: A visually hidden element (e.g., an sr-only class or a CSS clip: rect() rule) or a visible link in the DOM. Example:
    <div class="sr-only" style="position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip:rect(0,0,0,0);white-space:nowrap;border:0">
      For AI agents: a documentation index is available at /llms.txt — markdown
      versions of all pages are available by appending index.md to any URL path.
    </div>
  • Detection considerations: Must exclude incidental matches in nav items, JSON-LD metadata, <script> blocks, or page content that merely discusses llms.txt as a feature
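A minimal sketch of how the HTML detection could work, using only Python's standard library. The skipped tag set, the wording cues in HINT, the 200-character window, and the function name has_html_directive are all assumptions for illustration, not part of the current spec. It is a heuristic: it drops text inside script, style, and nav elements, but it cannot fully rule out prose that discusses llms.txt using similar wording.

```python
import re
from html.parser import HTMLParser

# Assumption: incidental matches mostly live in these elements.
# JSON-LD is covered because it is embedded in a <script> element.
SKIPPED_TAGS = {"script", "style", "nav"}
# Assumed wording cues; the spec would need to define the real criteria.
HINT = re.compile(r"for ai agents|documentation index", re.IGNORECASE)
LLMS_TXT = re.compile(r"llms\.txt", re.IGNORECASE)


class _TextOutsideSkippedTags(HTMLParser):
    """Collect text nodes that are not nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def has_html_directive(html: str, window: int = 200) -> bool:
    """Return True if llms.txt appears near a directive-like phrase."""
    parser = _TextOutsideSkippedTags()
    parser.feed(html)
    parser.close()
    # Normalize whitespace so the window is measured in visible characters.
    text = " ".join(" ".join(parser.chunks).split())
    for match in LLMS_TXT.finditer(text):
        start = max(0, match.start() - window)
        if HINT.search(text[start:match.end() + window]):
            return True
    return False
```

Because only text nodes are inspected, attribute values such as data-title="llms.txt" never count as a match on their own.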

Markdown directive

  • Audience: Agents fetching markdown versions of pages
  • Purpose: Point agents to the llms.txt index so they can discover the full documentation map
  • Implementation: A blockquote or text block near the top of the markdown content. Example:
    > ## Documentation Index
    > Fetch the complete documentation index at: https://example.com/llms.txt
    > Use this file to discover all available pages before exploring further.
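A comparable sketch for the markdown side. It simplifies the description above by accepting only blockquote lines (not arbitrary text blocks), and it treats "near the top" as the first 20 non-blank lines. Both choices, and the name has_markdown_directive, are assumptions rather than proposed spec values.

```python
import re

# Matches a path or URL ending in /llms.txt, e.g. https://example.com/llms.txt
LLMS_TXT_PATH = re.compile(r"/llms\.txt\b", re.IGNORECASE)


def has_markdown_directive(markdown: str, max_lines: int = 20) -> bool:
    """Return True if a blockquote line near the top points at /llms.txt."""
    nonblank = [line for line in markdown.splitlines() if line.strip()]
    for line in nonblank[:max_lines]:
        # Only blockquote lines count in this simplified version.
        if line.lstrip().startswith(">") and LLMS_TXT_PATH.search(line):
            return True
    return False
```

Restricting the search to the top of the file keeps a page that merely documents llms.txt further down from counting as a directive.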

Proposed changes

  1. Split the current llms-txt-directive spec entry into two checks:
    • llms-txt-directive-html: Checks for a directive in the HTML DOM
    • llms-txt-directive-md: Checks for a directive in the markdown version of pages
  2. Define distinct detection criteria, result levels, and recommended actions for each
  3. Clarify that incidental mentions of "llms.txt" (in navigation, metadata, or content discussing the standard) do not count as a directive
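To make "independent result levels" concrete, here is one possible shape for the per-check result. The Status levels, the CheckResult fields, and especially the thresholds (all pages means PASS, some means WARN, none means FAIL) are hypothetical placeholders; the spec would need to define the real ones for each check.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


@dataclass(frozen=True)
class CheckResult:
    check_id: str
    status: Status
    pages_with_directive: int
    pages_sampled: int


def summarize(check_id: str, page_hits: list[bool]) -> CheckResult:
    """Turn per-page detection results into one result for a single check."""
    hits = sum(page_hits)
    # Hypothetical thresholds: all pages -> PASS, some -> WARN, none -> FAIL.
    if page_hits and hits == len(page_hits):
        status = Status.PASS
    elif hits > 0:
        status = Status.WARN
    else:
        status = Status.FAIL
    return CheckResult(check_id, status, hits, len(page_hits))


# Each check is scored from its own page sample, so one cannot mask the other.
html_result = summarize("llms-txt-directive-html", [False, False, False])
md_result = summarize("llms-txt-directive-md", [True, True, True])
```

Under these assumed thresholds, a site whose directive exists only in its markdown pages would fail the HTML check and pass the markdown check, instead of receiving one blended result.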

Context

This came up while investigating why Mintlify docs pass the current check but Fern docs get a WARN. Mintlify passes because every HTML page has a sidebar nav link to its docs page about llms.txt (<li data-title="llms.txt">), which the check's broad regex matches. The actual directive only exists in Mintlify's markdown pages, which the check never fetches. Meanwhile, the 6 of 50 Fern pages that match are likely pages that discuss llms.txt as a feature rather than pages containing an agent-facing directive.
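The false positive is easy to reproduce in isolation. The pattern below is an assumed stand-in for the current check's regex, which is not quoted in this issue; the point is only that any pattern matching the bare string llms.txt in raw HTML will fire on a nav item.

```python
import re

# Assumption: a stand-in for the current broad pattern, not the actual one.
BROAD_PATTERN = re.compile(r"llms\.txt", re.IGNORECASE)


def broad_check(raw_html: str) -> bool:
    """Mimic a check that searches the raw HTML source for 'llms.txt'."""
    return BROAD_PATTERN.search(raw_html) is not None


# A sidebar entry with no agent-facing directive still satisfies the check.
sidebar_item = '<li data-title="llms.txt">llms.txt</li>'
matched = broad_check(sidebar_item)
```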
