
fix: add discoverability headers #131

Merged
VVoruganti merged 1 commit into v4 from vineeth/seo-headers on Apr 20, 2026

Conversation


VVoruganti commented on Apr 20, 2026

Summary by CodeRabbit

New Features

  • Implemented standardized discovery protocols for search engines and AI crawlers to improve content visibility
  • Added headers advertising sitemap and RSS feeds to enhance content indexing and discoverability
  • Introduced granular per-crawler policies enabling fine-grained control over search indexing and AI model training permissions


coderabbitai Bot commented Apr 20, 2026

Walkthrough

Two configuration files were added to enhance SEO and crawler discoverability: HTTP Link headers advertising sitemap, RSS feed, llms.txt, and robots.txt resources for agent discovery (RFC 8288/9727); plus a comprehensive robots.txt file declaring the sitemap and defining per-crawler policies with Content-Signal directives.
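A minimal sketch of what such Link headers might look like in a static/_headers file (Cloudflare Pages / Netlify syntax). The target paths and rel values below are assumptions for illustration, not the PR's actual contents:

```
# Hypothetical static/_headers sketch; paths and rel values are illustrative.
/
  Link: </sitemap.xml>; rel="sitemap"; type="application/xml"
  Link: </index.xml>; rel="alternate"; type="application/rss+xml"
  Link: </llms.txt>; rel="llms-txt"; type="text/plain"
  Link: </robots.txt>; rel="robots"; type="text/plain"

/index.html
  Link: </sitemap.xml>; rel="sitemap"; type="application/xml"
  Link: </index.xml>; rel="alternate"; type="application/rss+xml"
```

Repeated Link lines for the same path are typically joined by the host into a single comma-separated header value, which matches the RFC 8288 serialization.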

Changes

Cohort: SEO and Crawler Configuration
File(s): static/_headers, static/robots.txt
Summary: Added HTTP Link headers for root (/) and /index.html advertising sitemap.xml, RSS feed (index.xml), llms.txt resources, and robots.txt. Created comprehensive robots.txt with sitemap declaration, default allow-all policy (except /static/), and per-crawler User-agent blocks with Content-Signal directives controlling search, ai-input, and ai-train permissions for multiple AI and web crawlers (OpenAI, Anthropic, Google, Meta, Perplexity, etc.).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rabbit hops through headers bright,
With links that guide the crawlers' flight!
Robots.txt declares the way,
Content signals save the day—
Discovery blooms in plain sight! ✨

🚥 Pre-merge checks (2 passed, 1 failed)

❌ Failed checks (1 warning)

  • Description check ⚠️ Warning: No pull request description was provided by the author, violating the repository's template requirement to describe changes in detail and verify human effort. Resolution: add a detailed description explaining the purpose of the headers, why these specific discovery mechanisms were chosen, and any implications for the site's SEO and crawler access policies.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title accurately describes the main change: adding HTTP Link headers and related discoverability mechanisms (headers file, robots.txt) for SEO and agent discovery.
  • Docstring Coverage ✅ Passed: No functions were found in the changed files to evaluate, so the docstring coverage check was skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Warning

⚠️ This pull request might be slop. It has been flagged by CodeRabbit slop detection and should be reviewed carefully.


coderabbitai Bot left a comment


🧹 Nitpick comments (2)
static/robots.txt (2)

5-5: Avoid hardcoding the sitemap host.

Line 5 hardcodes the production domain, which can drift from cfg.baseUrl across preview/staging/custom-domain deployments. Prefer generating this from config (or injecting it at build time) to keep one source of truth.
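One way to do that, as a sketch: render robots.txt from a template during the build and substitute the configured base URL. The template path, placeholder name, and config shape below are assumptions for illustration (the real site config exposing cfg.baseUrl may differ):

```typescript
// Hypothetical build step: render robots.txt from a template so the sitemap
// host tracks the configured base URL instead of a hardcoded production domain.
import { readFileSync, writeFileSync } from "node:fs";

// Assumed config shape; in practice baseUrl would come from the site config
// (the cfg.baseUrl value this comment refers to) or a deploy-time env var.
const cfg = { baseUrl: process.env.BASE_URL ?? "blog.plasticlabs.ai" };

const template = readFileSync("static/robots.txt.tpl", "utf8");
const rendered = template.replace(/\{baseUrl\}/g, `https://${cfg.baseUrl}`);
writeFileSync("public/robots.txt", rendered);
```

With this in place, the template line would read "Sitemap: {baseUrl}/sitemap.xml" and preview, staging, and custom-domain deployments would all emit the correct host.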

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@static/robots.txt` at line 5, The robots.txt currently hardcodes the sitemap
host; instead, update the robots.txt generation to use the canonical base URL
from your configuration (e.g., cfg.baseUrl) or inject it at build time so the
line becomes "Sitemap: {cfg.baseUrl}/sitemap.xml"; locate where robots.txt is
produced (static/robots.txt template or the build/deploy script) and replace the
fixed "https://blog.plasticlabs.ai" with a template variable or injected value
so preview/staging/custom domains use the correct host.

10-10: Recheck whether Disallow: /static/ is intentional.

Line 10 may not affect crawl behavior if your emitted public paths don’t actually live under /static/. If this is a policy guard, consider pointing it at real emitted paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@static/robots.txt` at line 10, Confirm whether the "Disallow: /static/"
directive is intentional; if not, remove it or replace it with the actual public
asset path(s) your build emits (update the "Disallow: /static/" entry to the
correct emitted path(s) or delete it entirely). Locate the line containing the
literal "Disallow: /static/" in robots.txt, verify which public paths are served
by your app, and update the directive to match those real emitted paths (or
remove the directive if no blocking is desired).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c1a2889b-553b-4efe-8709-0587c817a268

📥 Commits

Reviewing files that changed from the base of the PR and between f885c48 and 20ffb95.

📒 Files selected for processing (2)
  • static/_headers
  • static/robots.txt

VVoruganti merged commit 4e6e35c into v4 on Apr 20, 2026
8 checks passed
VVoruganti deleted the vineeth/seo-headers branch on April 20, 2026 at 03:28