diff --git a/README.md b/README.md
index 2fc6df4..56fbabb 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ docs agent-friendly. The spec focuses on meeting the technical constraints of
 agent platforms (truncation limits, content negotiation, discovery); it does not
 consider qualitative evaluation of content.
 
-**Status**: Draft (v0.3.0)
+**Status**: Draft (v0.4.0)
 
 **Full spec**: [SPEC.md](SPEC.md) | **Website**: [agentdocsspec.com](https://agentdocsspec.com)
 
@@ -38,7 +38,7 @@ The spec defines **22 checks across 7 categories**:
 | Page Size | 4 | Rendering strategy (SPA/CSR detection), markdown size, HTML size (pre/post conversion), content start position |
 | Content Structure | 3 | Tabbed content serialization blowup, section header quality, code fence validity |
 | URL Stability | 2 | Soft 404 detection, redirect behavior |
-| Observability | 3 | `llms.txt` freshness, markdown/HTML content parity, cache header hygiene |
+| Observability | 3 | `llms.txt` coverage, markdown/HTML content parity, cache header hygiene |
 | Authentication | 2 | Auth gate detection, alternative access paths for gated content |
 
 Each check has defined pass/warn/fail criteria, an automation level, and
diff --git a/SPEC.md b/SPEC.md
index eeac780..8bea41f 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -3,8 +3,8 @@
 | | |
 |--------------|--------------------------------------------------------------|
 | **Status** | Draft |
-| **Version** | 0.3.0 |
-| **Date** | 2026-03-31 |
+| **Version** | 0.4.0 |
+| **Date** | 2026-04-21 |
 | **Author** | Dachary Carey + community contributors |
 | **URL** | https://agentdocsspec.com |
 | **Repository** | https://github.com/agent-ecosystem/agent-docs-spec |
@@ -133,7 +133,7 @@ are ordered by impact based on observed agent behavior:
 6. **Monitor your agent-facing resources.** Treat `llms.txt` and markdown
    endpoints like any other production surface: check freshness, verify content
    parity with HTML, and ensure cache headers allow timely updates.
-   Checks: `llms-txt-freshness`, `markdown-content-parity`,
+   Checks: `llms-txt-coverage`, `markdown-content-parity`,
    `cache-header-hygiene`
 
 ## Spec Structure
@@ -166,7 +166,7 @@ Some checks depend on the results of others:
 - `markdown-code-fence-validity` only runs if `markdown-url-support` or
   `content-negotiation` passes (the site must serve markdown for this check to
   apply). It also runs against any discovered `llms.txt` files.
-- `llms-txt-freshness` only runs if `llms-txt-exists` passes.
+- `llms-txt-coverage` only runs if `llms-txt-exists` passes.
 - `auth-alternative-access` only runs if `auth-gate-detection` returns warn or
   fail (the site must have auth-gated content for alternative access paths to be
   relevant).
@@ -712,28 +712,33 @@ that's only optimized for the markdown path is leaving most agents behind.
 ### `page-size-html`
 
 - **What it checks**: The character count of the HTML response, and the
-  character count after simulating an HTML-to-markdown conversion (using a
-  Turndown-equivalent pipeline). Reports both numbers.
+  character count after converting HTML to markdown (simulating what an
+  agent's processing pipeline produces). Reports both numbers.
 - **Why it matters**: Most agents receive HTML, not markdown. The raw HTML size
   determines whether the page even fits in the fetch buffer (Claude Code caps
-  at ~10MB). The post-conversion size is closer to what the agent's
-  summarization model actually sees, but conversion is lossy and
-  unpredictable. A 500KB HTML page might convert to 50KB of useful markdown
-  (safe) or 400KB of markdown including raw CSS text that survived conversion
-  (not safe). Both numbers matter.
+  at ~10MB). The post-conversion size is closer to what the agent actually
+  processes, but conversion pipelines vary across agents and are lossy and
+  unpredictable. Navigation boilerplate, serialized tabbed content, and
+  deeply nested page structure can all inflate the converted output well
+  beyond the documentation content itself. Both raw and post-conversion
+  sizes matter.
 - **Result levels** (based on post-conversion size, since that's what the model
   receives):
   - **Pass**: Converted content under 50,000 characters.
   - **Warn**: Converted content between 50,000 and 100,000 characters.
   - **Fail**: Converted content over 100,000 characters.
 - **Recommended action**:
-  - **Warn**: Review pages for reducible inline CSS/JS. Consider providing
-    markdown versions as a smaller alternative path for agents.
-  - **Fail**: Reduce inline CSS/JS, break large pages into smaller units, or
-    provide markdown versions that bypass the HTML conversion overhead.
-- **Automation**: Full. Use a Turndown-equivalent library with default
-  configuration (no explicit `