Validate llms.txt has the required H1 heading #4
Open
federicobartoli wants to merge 3 commits into addyosmani:main
The llms.txt spec requires a single H1 with the project or site name as the first element in the ordered structure. The checker didn't verify this, so a file with no H1 (or multiple H1s) passed as well-formed.

- error when no H1 is present
- warning when multiple H1s are present
- warning when the H1 is not the first content in the file

Fenced code blocks are stripped before matching so '# comment' lines inside bash etc. aren't counted as H1s.

Spec: https://llmstxt.org
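
The fence-stripping step described above could look roughly like the sketch below. The names `stripFencedCodeBlocks` and `findH1s` are hypothetical and are not taken from the PR; the sketch only illustrates why a `# comment` line inside a fence should not count as an H1.

```javascript
// Hypothetical helpers for illustration; the PR's real implementation may differ.

// Drop fenced code blocks (backtick or tilde fences) line by line, so that
// lines such as "# install deps" inside a bash fence are not seen as headings.
function stripFencedCodeBlocks(text) {
  const kept = [];
  let fence = null; // marker that opened the current fence, or null outside one
  for (const line of text.split(/\r?\n/)) {
    const match = line.match(/^ {0,3}(`{3,}|~{3,})/);
    if (fence === null) {
      if (match) {
        fence = match[1]; // opening fence: start skipping lines
      } else {
        kept.push(line);
      }
    } else if (
      match &&
      match[1][0] === fence[0] &&
      match[1].length >= fence.length &&
      line.trim() === match[1]
    ) {
      fence = null; // closing fence: same character, at least as long
    }
  }
  return kept.join("\n");
}

// Return the titles of all ATX H1 lines ("# Title") outside fenced code blocks.
function findH1s(text) {
  return stripFencedCodeBlocks(text)
    .split("\n")
    .filter((line) => /^#\s+\S/.test(line))
    .map((line) => line.replace(/^#\s+/, "").trim());
}

const sample = ["# Project", "", "~~~bash", "# not a heading", "~~~", "", "## Docs"].join("\n");
console.log(findH1s(sample)); // [ 'Project' ]
```

An empty result would map to the error finding, and more than one entry to the multiple-H1 warning.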
- Check the H1 position against the original content, not the code-block-stripped version. Previously, a file starting with a fenced code block followed by an H1 was treated as "H1 first" because stripping hoisted the H1 to the top of the analyzed text.
- Allow up to 3 spaces of leading indentation before the '#' as CommonMark does.

Adds a regression fixture (code block before the H1) that now correctly produces the "not the first content" warning.
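
A minimal sketch of the position check, assuming it only needs to look at the first non-blank line of the original (unstripped) text. `ATX_H1` and `isH1FirstContent` are hypothetical names, not the PR's code.

```javascript
// Hypothetical helper for illustration; not the PR's actual code.

// ATX H1: up to 3 spaces of indentation, a single '#', then whitespace or end of line.
const ATX_H1 = /^ {0,3}#(?:[ \t]+|$)/;

// Look at the ORIGINAL text: if fenced blocks were stripped first, a leading
// code block would vanish and the H1 after it would wrongly appear to be first.
function isH1FirstContent(original) {
  const firstContentLine = original
    .split(/\r?\n/)
    .find((line) => line.trim() !== "");
  return firstContentLine !== undefined && ATX_H1.test(firstContentLine);
}

console.log(isH1FirstContent("\n   # Title\nbody"));             // true
console.log(isH1FirstContent("~~~\n# comment\n~~~\n# Title\n")); // false
```

The second call mirrors the regression fixture: the file opens with a fence, so the H1 is not the first content even though it is the only heading.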
- Strip a leading BOM before running the position check, so files saved by editors that add one are not incorrectly flagged as "H1 not first content".
- Normalize setext H1 syntax (Title\n=====) to ATX before matching, so a spec-compliant setext H1 is recognized.

Fixtures and tests added for both cases.
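
The two normalizations might be sketched as follows. `stripBom` and `normalizeSetextH1` are hypothetical names, and the setext handling is deliberately simplified: it treats only a single non-blank line directly above a run of `=` as the title, which is narrower than full CommonMark.

```javascript
// Hypothetical helpers for illustration; not the PR's actual code.

// Remove a leading byte order mark (U+FEFF) if present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Rewrite a setext H1 ("Title" followed by a line of "=") as ATX ("# Title").
// Simplification: only the single line directly above the underline is used.
function normalizeSetextH1(text) {
  const lines = text.split(/\r?\n/);
  const out = [];
  for (let i = 0; i < lines.length; i++) {
    const next = lines[i + 1];
    const hasUnderline = next !== undefined && /^ {0,3}=+[ \t]*$/.test(next);
    const isTitleLine = lines[i].trim() !== "" && !/^ {0,3}#/.test(lines[i]);
    if (hasUnderline && isTitleLine) {
      out.push("# " + lines[i].trim());
      i++; // skip the "=====" underline
    } else {
      out.push(lines[i]);
    }
  }
  return out.join("\n");
}

console.log(normalizeSetextH1(stripBom("\uFEFFTitle\n=====\n\nBody")));
// # Title
//
// Body
```

Running both steps before the position check means the first content line starts with `#` rather than with an invisible BOM or a bare title line.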
The llms.txt spec requires a single H1 with the project or site name as the first element in the ordered structure. The checker didn't verify this, so a file without an H1 (or with multiple H1s, or with the H1 appearing after other content) was treated as well-formed.
This PR adds three findings, implemented as a small `validateH1` helper so `check()` stays readable:

- `error` when no H1 is present
- `warning` when multiple H1s are present
- `warning` when the H1 is not the first content in the file
- `info` with the detected H1 on success

Fenced code blocks (triple-backtick or `~~~` fences) are stripped before matching so `# comment` lines inside bash etc. aren't counted as H1s.

Scoring is unchanged: this PR is purely additive, so no existing assertion or user-facing score shifts. Follow-ups can validate the other spec rules (blockquote summary, "Optional" H2 section, file-list link format) the same way.
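
For reviewers who want the severity mapping at a glance, here is a rough sketch of how the findings listed above could be assembled. The function name `h1Findings`, the `{ level, message }` shape, and the message strings are all assumptions for illustration. So is the choice to emit `info` only when there are no warnings.

```javascript
// Hypothetical finding shape and messages; the PR's validateH1 may differ.
function h1Findings(h1Titles, h1IsFirstContent) {
  if (h1Titles.length === 0) {
    return [{ level: "error", message: "llms.txt has no H1 heading" }];
  }
  const findings = [];
  if (h1Titles.length > 1) {
    findings.push({
      level: "warning",
      message: "llms.txt has " + h1Titles.length + " H1 headings; expected one",
    });
  }
  if (!h1IsFirstContent) {
    findings.push({
      level: "warning",
      message: "H1 is not the first content in the file",
    });
  }
  if (findings.length === 0) {
    // Assumption: "success" means no warnings were raised.
    findings.push({ level: "info", message: 'H1 found: "' + h1Titles[0] + '"' });
  }
  return findings;
}

console.log(h1Findings([], false).map((f) => f.level));           // [ 'error' ]
console.log(h1Findings(["Example", "Other"], false).map((f) => f.level)); // [ 'warning', 'warning' ]
console.log(h1Findings(["Example"], true).map((f) => f.level));   // [ 'info' ]
```

Keeping this mapping separate from the parsing keeps each severity decision in one place, which fits the stated goal of keeping `check()` readable.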
Test plan
- `npm test` passes (25/25, including a new `llms-no-h1` fixture that asserts the missing-H1 error finding)
- `good-site` fixture still scores the same (already has `# ExampleDocs` as its H1)