
feat: dynamic sitemap with page router, add daily cron refresh and google console submission #355

Closed
amaan-bhati wants to merge 37 commits into main from dynamic-sitemap-update

Conversation

amaan-bhati (Member) commented Mar 31, 2026

feat: Dynamic Sitemap Generation with Cron Refresh, Snapshot Fallback, and Google Search Console Auto-Submission

The blog previously served a static, manually maintained sitemap.xml; new posts, authors, and tags were never reflected unless someone updated the file by hand. This PR replaces it with a fully automated sitemap pipeline at https://keploy.io/blog/sitemap.xml that crawls WordPress GraphQL, refreshes daily via Vercel cron, maintains a three-tier snapshot fallback so crawlers always receive valid XML during outages, and auto-submits to Google Search Console after every successful refresh.

Iteration History

Multiple rounds of GitHub Copilot code-review feedback were addressed across iterations: cursor-pagination infinite-loop guard, CSP regex exclusion for the sitemap route, CRON_SECRET misconfiguration handling (500 vs 401 distinction), no-store on 503, /tmp snapshot edge cases, TypeScript strict null guards, empty tag-slug guard, and maxDuration tuning.

File and Folder Structure

blog-website/
├── lib/
│   ├── sitemap.ts                            # Core: paginator, retry, entry builders, fallback, serialization
│   └── google-search-console.ts             # GSC OAuth 2.0 JWT flow + sitemap submission (no Google SDK)
├── pages/
│   ├── sitemap.xml.ts                        # SSR route serving /blog/sitemap.xml (maxDuration: 60)
│   └── api/cron/
│       └── refresh-sitemap.ts               # Cron endpoint: auth guard, refresh, GSC submit (maxDuration: 300)
├── scripts/
│   └── submit-sitemap-to-search-console.mjs # Standalone script to verify GSC credentials locally
├── tests/e2e/
│   ├── Sitemap.spec.ts                       # E2E: HTTP 200, Content-Type, Cache-Control, XML structure
│   └── RefreshSitemapCron.spec.ts            # E2E: 401/405/200 auth + method guards, response shape
└── vercel.json                               # Cron schedule 0 0 * * *, cache headers, CSP exclusions

WordPress GraphQL Paginator

fetchAllPosts() paginates with first: 50, after: cursor ordered by modified DESC.

  • 25s timeout per request (AbortSignal.timeout)
  • 6 retry attempts, linear backoff: 2000ms × attempt (2s → 12s, max 42s total)
  • 250ms settle delay between pages to reduce WPGraphQL pressure
  • Retryable: 408, 429, 500–504, AbortError, TypeError, network failures, and GraphQL-level errors on 200 OK (WPGraphQL returns these during plugin reload / DB lock)
  • Non-retryable: other 4xx, missing data field
  • Cursor guard: if WordPress returns hasNextPage: true with no endCursor, throws immediately to prevent infinite page-1 re-fetch
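The paginator loop above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: `fetchAllNodes` and the injected `fetchPage` stand in for `fetchAllPosts()`/`fetchGraphQL()` (whose exact signatures aren't shown in this description), but the linear backoff and the `hasNextPage`-without-`endCursor` guard mirror the behavior described.

```typescript
// Sketch of the cursor-pagination loop with the infinite-loop guard and
// linear-backoff retry. `fetchPage` is injected so the control flow can be
// shown without a live WPGraphQL endpoint (an assumption of this sketch).
type Page<T> = { nodes: T[]; hasNextPage: boolean; endCursor: string | null };

const RETRIES = 6;
const BACKOFF_MS = 2000; // delay = 2000ms * attempt (2s, 4s, ... 12s; 42s max total)

async function fetchWithRetry<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
  cursor: string | null,
): Promise<Page<T>> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= RETRIES; attempt++) {
    try {
      return await fetchPage(cursor);
    } catch (err) {
      lastError = err;
      if (attempt < RETRIES) {
        await new Promise((r) => setTimeout(r, BACKOFF_MS * attempt));
      }
    }
  }
  throw lastError;
}

async function fetchAllNodes<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  for (;;) {
    const page = await fetchWithRetry(fetchPage, cursor);
    all.push(...page.nodes);
    if (!page.hasNextPage) break;
    // Guard: hasNextPage=true with no cursor would re-fetch page 1 forever.
    if (!page.endCursor) {
      throw new Error("WPGraphQL returned hasNextPage=true without endCursor");
    }
    cursor = page.endCursor;
  }
  return all;
}
```

The real implementation additionally classifies errors as retryable vs non-retryable and adds a 250ms settle delay between pages; those details are omitted here for brevity.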

Category-to-Route Mapping

mapCategoriesToRoutes() maps WP categories to "technology" or "community" by matching both slug and name (lowercased) to handle editorial inconsistencies. Posts with no matching category are excluded.
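A minimal sketch of that mapping, under the assumption (from the description above) that the only two routes are "technology" and "community" and that matching is case-insensitive on both slug and name:

```typescript
// Match on both slug and lowercased name so editorial variants
// ("Technology" vs "technology") land on the same route. Posts matching
// neither route return null and are excluded from the sitemap.
type Category = { slug: string; name: string };

const ROUTES = ["technology", "community"] as const;
type Route = (typeof ROUTES)[number];

function mapCategoriesToRoute(categories: Category[]): Route | null {
  for (const route of ROUTES) {
    const match = categories.some(
      (c) => c.slug.toLowerCase() === route || c.name.toLowerCase() === route,
    );
    if (match) return route;
  }
  return null; // excluded from the sitemap
}
```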

Entry Builders

Posts: priority: 0.8 if modified in last 30 days, 0.5 otherwise; changefreq: weekly; slug is encodeURIComponent-encoded.

Authors: lastmod = newest post by that author; priority: 0.7.

Tags: one entry per unique tag from included posts; tag display names sanitized to URL slugs via sanitizeStringForURL(); empty/whitespace tags skipped; priority: 0.7.

Static routes: 7 hardcoded entries; lastmod set to the newest post modified time so listing pages reflect the freshest content.
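The recency rule for post priority and the newest-post lastmod for static routes can be sketched like this (helper names here are illustrative, not the PR's actual identifiers):

```typescript
// Recency-based priority as described above: 0.8 if modified within the
// last 30 days, 0.5 otherwise. `now` is injectable so the rule is testable.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function postPriority(modifiedISO: string, now: Date = new Date()): number {
  const age = now.getTime() - new Date(modifiedISO).getTime();
  return age <= THIRTY_DAYS_MS ? 0.8 : 0.5;
}

// Static routes borrow the newest post's modified time as lastmod.
// ISO-8601 strings in a uniform format compare correctly lexicographically.
function newestModified(modifiedISOs: string[]): string | null {
  if (modifiedISOs.length === 0) return null;
  return modifiedISOs.reduce((a, b) => (a > b ? a : b));
}
```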

XML Serialization

Manual generation: no external library. All values passed through escapeXml() (&, ", ', <, >). Priority formatted via .toFixed(1). dedupeEntries() uses a Map keyed by URL to eliminate any overlapping entries before serialization.
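A condensed sketch of those three serialization pieces — the five-character escape, Map-keyed dedupe, and `.toFixed(1)` priority formatting. The entry shape is assumed from the sitemap protocol; the real builders carry more fields:

```typescript
// Minimal versions of the serialization helpers described above.
type Entry = { loc: string; lastmod: string; changefreq: string; priority: number };

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function dedupeEntries(entries: Entry[]): Entry[] {
  // Map keyed by URL: duplicates collapse to one entry, last write wins.
  return [...new Map(entries.map((e) => [e.loc, e])).values()];
}

function serialize(entries: Entry[]): string {
  const urls = dedupeEntries(entries)
    .map(
      (e) =>
        `  <url><loc>${escapeXml(e.loc)}</loc><lastmod>${escapeXml(e.lastmod)}</lastmod>` +
        `<changefreq>${e.changefreq}</changefreq><priority>${e.priority.toFixed(1)}</priority></url>`,
    )
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}
```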

Three-Tier Snapshot Fallback

When fresh generation fails:

  • Tier 1: In-memory (lastSuccessfulSitemapXml): updated after every successful refresh; instant, no I/O; survives within the same Lambda instance.
  • Tier 2: /tmp file (/tmp/keploy-blog-sitemap.xml): written after every successful refresh; validated by isValidSitemapXml() (checks XML declaration, <urlset> namespace, closing tag) before use.
  • Tier 3: Static-only fallback (getStaticFallbackXml()): 7 hardcoded routes returned with HTTP 503 + Cache-Control: no-store — never cached by edge, enabling immediate recovery once WordPress is back.

How /tmp works: Vercel exposes a writable /tmp directory (up to 500 MB) per serverless function instance, scoped to that instance's lifetime. It is not a shared filesystem — different instances have independent /tmp directories. Source: Vercel Functions: Runtimes.
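The Tier-2 write/read path can be sketched as below. The snapshot filename comes from the PR; the exact validation checks are assumed from the description (XML declaration, urlset namespace, closing tag):

```typescript
import { writeFileSync, readFileSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Persist to the instance-local /tmp after a successful refresh, and
// validate before trusting the file on the read path.
const SNAPSHOT_PATH = join(tmpdir(), "keploy-blog-sitemap.xml");

function isValidSitemapXml(xml: string): boolean {
  return (
    xml.startsWith("<?xml") &&
    xml.includes('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"') &&
    xml.trimEnd().endsWith("</urlset>")
  );
}

function persistSnapshot(xml: string): void {
  writeFileSync(SNAPSHOT_PATH, xml, "utf8");
}

function readSnapshot(): string | null {
  if (!existsSync(SNAPSHOT_PATH)) return null;
  const xml = readFileSync(SNAPSHOT_PATH, "utf8");
  // A truncated write or corrupted file falls through to the static tier.
  return isValidSitemapXml(xml) ? xml : null;
}
```

Because each serverless instance has its own /tmp, a snapshot written by one instance is invisible to its siblings; this tier only helps warm instances that have already refreshed at least once.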

How in-memory state works: Node.js module-level variables (like lastSuccessfulSitemapXml) persist across multiple requests handled by the same warm instance. When Vercel reuses an existing Lambda container for a new invocation (warm start), the module is not re-evaluated; variable state is retained. On a cold start or a new instance, the module is re-evaluated and the variable resets to null. Source: Vercel Functions: Concepts, Vercel: Improving Cold Start Performance.

Concurrency Guard

refreshSitemapPromise is a module-level deduplication guard. Concurrent callers share one in-flight crawl rather than each triggering an independent WordPress fetch. Cleared in .finally() so the next call after resolution starts fresh.

Cron Endpoint Security

Auth is checked before the method check — this prevents leaking valid HTTP methods to unauthenticated callers.

  • GET only: 405 Method Not Allowed with an Allow: GET header for anything else
  • Missing CRON_SECRET → 500 (misconfiguration), not 401 (wrong token): distinguishes a deployment error from an auth failure
  • Google submission is non-blocking: a GSC failure still returns 200 ok: true; the sitemap refresh is never coupled to Google's availability

Vercel automatically injects Authorization: Bearer <CRON_SECRET> on every cron invocation when CRON_SECRET is set in project settings. Source: Vercel Cron Jobs: Quickstart, Vercel: Managing Cron Jobs.
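The guard ordering can be shown framework-agnostically (this is a sketch of the logic described above, not the actual Next.js handler):

```typescript
// Guard order: 1) missing CRON_SECRET -> 500 (deployment error),
// 2) bad or absent bearer token -> 401, 3) non-GET -> 405 with Allow: GET.
// Auth runs before the method check, so unauthenticated callers never
// learn which methods are valid.
type GuardResult = { status: number; allowHeader?: string } | null;

function cronGuard(
  method: string,
  authHeader: string | undefined,
  cronSecret: string | undefined,
): GuardResult {
  if (!cronSecret) return { status: 500 }; // misconfiguration, not auth failure
  if (authHeader !== `Bearer ${cronSecret}`) return { status: 401 };
  if (method !== "GET") return { status: 405, allowHeader: "GET" };
  return null; // proceed with refresh + GSC submission
}
```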

Google Search Console Integration

Full OAuth 2.0 service account flow, no third-party Google SDK:

  1. RS256-signed JWT constructed from GOOGLE_SERVICE_ACCOUNT_EMAIL + GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY using Node.js crypto.createSign('RSA-SHA256'): per RFC 7518 (JWA) and RFC 7519 (JWT)
  2. JWT exchanged for OAuth access token via grant_type: urn:ietf:params:oauth:grant-type:jwt-bearer at oauth2.googleapis.com/token: per RFC 7523 and Google OAuth 2.0 Service Account docs
  3. PUT to googleapis.com/webmasters/v3/sites/{siteUrl}/sitemaps/{sitemapUrl}: Search Console Sitemaps API
  4. Private key \\n sequences replaced with real newlines before signing: Vercel stores multi-line env vars with literal \n; crypto.createSign requires a valid PEM with real line breaks. Source: Node.js crypto docs

Entirely optional: isSearchConsoleSubmissionConfigured() checks all three required env vars; if any one is missing, the step is skipped with skipped: true in the response.
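Step 1 of the flow (the RS256 JWT) can be sketched with Node's crypto alone. The claim names follow RFC 7519 and Google's service-account docs; the function name and structure here are illustrative, and the `\n` fix-up mirrors how Vercel stores multi-line env vars:

```typescript
import { createSign } from "node:crypto";

// Build an RS256 service-account JWT without a Google SDK.
function base64url(input: string): string {
  return Buffer.from(input).toString("base64url");
}

function createServiceAccountJwt(
  email: string,
  privateKeyPem: string,
  scope = "https://www.googleapis.com/auth/webmasters",
): string {
  // Vercel stores multi-line env vars with literal \n; PEM needs real newlines.
  const pem = privateKeyPem.replace(/\\n/g, "\n");
  const now = Math.floor(Date.now() / 1000);
  const header = base64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const payload = base64url(
    JSON.stringify({
      iss: email,
      scope,
      aud: "https://oauth2.googleapis.com/token",
      iat: now,
      exp: now + 3600,
    }),
  );
  const signer = createSign("RSA-SHA256");
  signer.update(`${header}.${payload}`);
  const signature = signer.sign(pem).toString("base64url");
  return `${header}.${payload}.${signature}`;
}
```

The resulting JWT is then POSTed to oauth2.googleapis.com/token with the jwt-bearer grant type to obtain the access token used for the Sitemaps API PUT (steps 2 and 3 above).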

Cache-Control Strategy

| Response | Cache-Control |
| --- | --- |
| 200 sitemap | public, max-age=0, s-maxage=86400, stale-while-revalidate=86400 |
| 503 static fallback | no-store |
| /blog/api/(.*) | no-store |
| /_next/static/(.*) | public, max-age=31536000, immutable |
| All other /blog/ pages | public, max-age=3600, s-maxage=86400, stale-while-revalidate=604800 |

The sitemap is excluded from CSP headers in both vercel.json and next.config.js via regex; XML responses have no use for CSP.

Build-Time Validation

next.config.js calls URL.canParse(process.env.WORDPRESS_API_URL) at build time: the build fails with a clear error if the variable is missing or invalid, preventing a silently misconfigured deploy (as suggested and improved by Copilot across multiple review rounds).
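A minimal version of that guard (the function name is illustrative; the real check lives inline in next.config.js). URL.canParse requires Node 18.17+:

```typescript
// Fail fast at build time if WORDPRESS_API_URL is missing or unparseable,
// instead of deploying a sitemap that can never crawl.
function assertWordpressUrl(value: string | undefined): string {
  if (!value || !URL.canParse(value)) {
    throw new Error(
      "WORDPRESS_API_URL is missing or not a valid URL. " +
        "Set it in the Vercel project environment variables before building.",
    );
  }
  return value;
}
```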

Edge Cases Covered

| Edge case | How it is handled |
| --- | --- |
| WordPress completely unreachable | Three-tier fallback → in-memory → /tmp → static 503 |
| WordPress returns partial data | assertFullSitemap() throws if < 5 posts per category |
| hasNextPage: true but no cursor | Throws immediately; prevents infinite page-1 re-fetch |
| GraphQL errors on 200 OK | Treated as retryable; up to 6 attempts |
| Non-retryable 4xx | Fails fast without exhausting the retry budget |
| Concurrent refresh requests | Shared refreshSitemapPromise — one crawl, all callers wait |
| Corrupted /tmp snapshot | isValidSitemapXml() rejects it; falls back to the static fallback |
| CRON_SECRET not set | Returns 500 (misconfiguration), not 401 |
| GSC credentials wrong / Google down | Caught; cron still returns 200 ok: true |
| Post with no matching category | Silently excluded from the sitemap |
| Author name normalizes to empty slug | Skipped; no empty /authors/ URL emitted |
| Tag name empty or normalizes to empty slug | Skipped via flatMap + early-return guard |
| XML-sensitive chars in URLs/dates | escapeXml() escapes &, ", ', <, > |
| Duplicate URLs across generation paths | dedupeEntries() Map-based deduplication before serialization |
| 503 cached by edge | no-store on 503; edge retries origin on next request |
| Private key stored with literal \n | .replace(/\\n/g, "\n") before JWT signing |

Testing

E2E Tests (Playwright)

tests/e2e/Sitemap.spec.ts: GET /sitemap.xml → HTTP 200, Content-Type: application/xml, Cache-Control has s-maxage=86400 + max-age=0 + stale-while-revalidate=86400, valid XML declaration, correct <urlset> namespace, core <loc> entries present.

tests/e2e/RefreshSitemapCron.spec.ts: no auth → 401; wrong token → 401; POST with valid token → 405 + Allow: GET; GET with valid token → 200, ok: true, entryCount > 0, generatedAt as string, searchConsole.submitted as boolean.

Live Verification Results

| Check | Result |
| --- | --- |
| HTTP status | 200 |
| XML parse (Python xml.etree) | PASS — well-formed |
| Total URLs | 1,561 (0 duplicates) |
| All entries have <lastmod> | 1,561 / 1,561 |
| Static / Technology / Community / Authors / Tags | 7 / 37 / 457 / 86 / 974 |
| /tmp snapshot written | YES — 267,306 bytes |
| Cron (valid auth) | 200 {ok: true, entryCount: 1561, searchConsole.submitted: true} |
| Cron (wrong secret / no auth / POST) | 401 / 401 / 405 |
| GSC submission | SUBMITTED to sc-domain:keploy.io |
| End-to-end crawl time | ~13 seconds |

Priority distribution: 1.0 × 1, 0.9 × 2, 0.8 × 84, 0.7 × 1060, 0.6 × 2, 0.5 × 412.

Local GSC Verification Script

(This script was generated with Claude for testing; the verification steps completed successfully.)

node scripts/submit-sitemap-to-search-console.mjs

Mirrors the production JWT + submission flow; loads from .env.local; exits code 1 with structured JSON error on failure. Use this to verify credentials before deploy without running the full app.

…d failure

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot AI review requested due to automatic review settings March 31, 2026 14:04
kilo-code-bot Bot commented Mar 31, 2026

Code Review Summary

Status: 4 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 4 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| pages/sitemap.xml.ts | 9 | Missing error handling for generateSitemapXml() — if all fallbacks fail, unhandled errors will cause 500 responses without actionable context |
| lib/sitemap.ts | 443 | Error log in persistSitemapSnapshot() lacks actionable next steps per project guidelines |
| pages/api/cron/refresh-sitemap.ts | 71 | Error log for sitemap refresh failure lacks actionable next steps per project guidelines |
| pages/api/cron/refresh-sitemap.ts | 47 | NEW: Error log for Google Search Console submission failure lacks actionable next steps per project guidelines |
Improvements Since Last Review
  • Google Search Console integration added: New lib/google-search-console.ts implements sitemap submission to Google Search Console using service account JWT authentication
  • Graceful degradation: Search Console submission failure doesn't block the cron response - errors are captured and returned in the response body
  • Configuration check: isSearchConsoleSubmissionConfigured() allows optional Search Console submission based on environment variables
  • Robust retry logic: fetchGraphQL() includes exponential backoff retry with configurable limits (6 retries, 2s delay multiplier, 25s timeout)
  • Fallback mechanism: generateSitemapXml() caches successful sitemaps in memory and persists to /tmp for resilience
  • Validation added: assertFullSitemap() ensures both technology and community posts are present before serving
  • Promise coalescing: refreshSitemapSnapshot() deduplicates concurrent refresh requests using a shared promise
  • XML validation: isValidSitemapXml() validates persisted snapshot before serving stale data
Incremental Review (aff778d..4c2eeb1)

Changes reviewed:

  • lib/google-search-console.ts - NEW Google Search Console sitemap submission module with JWT auth
  • pages/api/cron/refresh-sitemap.ts - Added Search Console submission after sitemap refresh (1 new issue)

New issue introduced: Error log for Search Console submission failure lacks actionable next steps.

Positive observations in new code:

  • Proper service account JWT creation with RS256 signing
  • Good error handling with informative error messages including response body
  • Configuration check before attempting submission prevents unnecessary API calls
  • Failure doesn't break the cron job - gracefully reports submission status in response
  • URL encoding for siteUrl and sitemapUrl in the API call prevents injection issues
Additional Notes

Positive observations:

  • Excellent fallback strategy: in-memory cache → persisted snapshot → throw
  • Good use of proper XML escaping in escapeXml() to prevent injection attacks
  • Appropriate deduplication of sitemap entries
  • Good caching strategy with s-maxage=86400, stale-while-revalidate=86400
  • Sequential fetching with settle delay reduces burst pressure on WPGraphQL
  • Clean separation of concerns with dedicated builder functions
  • Retryable status codes properly identified (408, 429, 500, 502, 503, 504)
  • Promise coalescing prevents duplicate concurrent sitemap generations
  • XML validation ensures corrupted snapshots aren't served
  • Cron authentication properly requires CRON_SECRET environment variable
Files Reviewed (9 files)
  • lib/google-search-console.ts - no issues (new file, well-structured)
  • lib/sitemap.ts - 1 issue (error log lacks actionable next steps)
  • pages/sitemap.xml.ts - 1 issue (missing error handling)
  • pages/api/cron/refresh-sitemap.ts - 2 issues (error logs lack actionable next steps)
  • vercel.json - no issues (added cron config, fixed JSON structure)
  • pages/_document.tsx - no issues (previous review: import reorder + EOF formatting)
  • public/robots.txt - no issues
  • public/sitemap.xml - no issues (deleted static file)
  • package-lock.json - no issues



Reviewed by claude-4.5-opus-20251124 · 205,846 tokens


Copilot AI left a comment


Pull request overview

Implements a dynamic sitemap endpoint for the Next.js pages router (replacing the previously committed static public/sitemap.xml) and updates SEO-related plumbing to reference the new sitemap location.

Changes:

  • Removed the large static public/sitemap.xml and added a server-rendered /sitemap.xml route.
  • Added sitemap generation utilities (lib/sitemap.ts) that aggregate posts/tags/authors into sitemap entries.
  • Updated robots.txt sitemap URL and adjusted Node engine range in package-lock.json to match package.json.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| public/sitemap.xml | Removed committed static sitemap in favor of a dynamic endpoint. |
| public/robots.txt | Points crawlers to the new sitemap URL. |
| pages/sitemap.xml.ts | Adds SSR route that emits generated sitemap XML with caching headers. |
| lib/sitemap.ts | Implements sitemap entry collection + XML serialization. |
| pages/_document.tsx | Adds missing next/script import (enables Script usage in Document). |
| package-lock.json | Updates Node engine constraint to >=18. |


@amaan-bhati amaan-bhati requested a review from Copilot April 1, 2026 09:31

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.



Copilot AI review requested due to automatic review settings April 1, 2026 16:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 9 comments.



@amaan-bhati amaan-bhati changed the title feat: implement dynamic sitemap with page router and updated wrt build failure feat: implement dynamic sitemap with page router, add daily cron referesh and google console submission Apr 2, 2026
@amaan-bhati amaan-bhati changed the title feat: implement dynamic sitemap with page router, add daily cron referesh and google console submission feat: dynamic sitemap with page router, add daily cron referesh and google console submission Apr 2, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

vercel.json:59

  • This route sets Cache-Control in getServerSideProps, but this global /blog/(.*) headers rule also sets Cache-Control for /blog/sitemap.xml. Having two different caching directives can result in the sitemap being cached differently than intended (depending on precedence). Consider excluding sitemap.xml from this global rule or defining its caching behavior in one place.
    {
      "source": "/blog/(.*)",
      "headers": [
        {
          "key": "Content-Security-Policy",
          "value": "connect-src 'self' https://px.ads.linkedin.com https://www.google-analytics.com https://analytics.google.com https://region1.google-analytics.com https://stats.g.doubleclick.net https://rp.liadm.com https://idx.liadm.com https://pagead2.googlesyndication.com https://*.clarity.ms https://news.google.com https://assets.apollo.io https://wp.keploy.io https://cdn.hashnode.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://*.googlevideo.com https://googleads.g.doubleclick.net https://marketplace.visualstudio.com https://api.github.com https://pro.ip-api.com https://api.vector.co https://aplo-evnt.com https://ep1.adtrafficquality.google https://ppptg.com https://telemetry.keploy.io; frame-src 'self' https://www.googletagmanager.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://news.google.com https://googleads.g.doubleclick.net https://*.google.com https://ppptg.com; img-src 'self' https://c.bing.com https://ppptg.com https://pbs.twimg.com https://secure.gravatar.com https://wp.keploy.io https://keploy.io data:;"
        },
        {
          "key": "Cache-Control",
          "value": "public, max-age=3600, s-maxage=86400, stale-while-revalidate=604800"
        }


@amaan-bhati amaan-bhati requested a review from Copilot April 6, 2026 15:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.




Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated no new comments.




amaan-bhati (Member, Author) commented:

  • Addressed all Copilot review comments until none remained.

  • After that, gave Claude a detailed context prompt to test the entire implementation for potential bugs. This was the third such pass; the earlier passes, run once Copilot had no remaining comments, surfaced 4–5 bugs that were then fixed. This time only one tradeoff remained:

    • Edge cache vs cron timing: s-maxage=86400 is based on when the edge first cached the response, not when the cron last ran. Worst case: ~24–48h stale window (TTL + stale-while-revalidate). Acceptable for a blog sitemap; GSC submission ensures Google discovers new posts independently.
  • Tested the rest manually: visited the sitemap locally; the updated sitemap loads instantly (screenshot attached).

  • Ran npm run build; nothing breaks (screenshot attached).

Important precaution

Things we'd need to ensure before deploying to prod:

  • Vercel runs next build; next.config.js calls URL.canParse(process.env.WORDPRESS_API_URL) at build time. If WORDPRESS_API_URL is missing or invalid in the Vercel project env vars, the build will fail here before anything deploys. Check the Vercel dashboard build logs first.

    1. Confirm all env vars are set in Vercel
    • WORDPRESS_API_URL: required, build fails without it
    • CRON_SECRET: required, cron returns 500 without it
    • GOOGLE_SERVICE_ACCOUNT_EMAIL
    • GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY
    • GOOGLE_SEARCH_CONSOLE_SITE_URL: should be sc-domain:keploy.io
    • SITEMAP_PUBLIC_URL: should be https://keploy.io/blog/sitemap.xml

slayerjain (Member) commented:

Review: Simpler architecture available — ISR replaces most of this code

The WordPress crawling logic, fallback handling, and edge cases are well thought out — good work on the robustness. But the core architecture can be dramatically simplified because Vercel ISR already does what the cron + in-memory cache + /tmp snapshot is trying to do.

The core issue: ~900 lines reimplements ISR

The three-tier fallback (in-memory → /tmp → static) is fundamentally unreliable on Vercel serverless:

  • In-memory (lastSuccessfulSitemapXml) — lost on every cold start, which happens frequently
  • /tmp snapshot — not shared across instances, wiped on every deploy
  • So 2 of 3 fallback tiers fail exactly when you need them most (after deploys or during traffic spikes that spawn new instances)

Vercel's ISR solves all of this natively:

  • Generates at build time → always has a valid cached version
  • Revalidates in background on a timer → no cron needed
  • If WordPress is down during regen → serves stale (valid) version automatically
  • Edge-cached → no serverless function invocation for most requests

Suggested approach: getStaticProps + revalidate

pages/sitemap.xml.ts (~80 lines):

import { GetStaticProps } from "next";
import { getAllPosts, getAllTags, getAllAuthors } from "../lib/api"; // already exists!

export const getStaticProps: GetStaticProps = async () => {
  const [posts, tags, authors] = await Promise.all([
    getAllPosts(),
    getAllTags(),
    getAllAuthors(),
  ]);
  
  const xml = buildSitemapXml(posts, tags, authors);
  return { props: { xml }, revalidate: 3600 }; // regenerate hourly in background
};

pages/api/cron/submit-gsc.ts (~30 lines) — cron ONLY pings GSC, no sitemap regen

What this eliminates

| Current PR | With ISR |
| --- | --- |
| lib/sitemap.ts (733 lines) — custom paginator, retry, fallback tiers | Reuse existing lib/api.ts (already has getAllPosts, getAllTags, getAllAuthors with pagination) |
| getServerSideProps — function runs on every request | getStaticProps + revalidate — static, edge-cached |
| 3-tier fallback (memory → /tmp → static) | Vercel ISR cache is the fallback (stale-while-revalidate built in) |
| Concurrency guard (refreshSitemapPromise) | ISR deduplicates revalidation natively |
| maxDuration: 300 on cron | ISR revalidation runs in background, no long-running function |
| Cron refreshes sitemap + submits GSC | Cron only submits to GSC (~5 lines) |
| ~900 lines across 4 new files | ~110 lines across 2 files |

What to keep

  • The GSC OAuth implementation (lib/google-search-console.ts) is solid — keep it (or simplify with google-auth-library)
  • The XML escaping and dedup logic is fine, just move it into the ISR page
  • The E2E tests are good — adapt them for the simpler route
  • The CSP exclusion for sitemap.xml is correct

The lib/api.ts duplication problem

The blog already has fully paginated queries in lib/api.ts:

  • getAllPosts() — paginated, returns slug, categories, author, date
  • getAllTags() — all tags
  • getAllAuthors() — all authors from posts

The PR rewrites all of these from scratch in lib/sitemap.ts with a parallel fetchGraphQL function, parallel retry logic, and parallel pagination. This means two independent WordPress data layers to maintain. If someone fixes a bug in lib/api.ts, the sitemap won't pick it up and vice versa.

TL;DR

The defensive programming is genuinely good, but it's solving problems that Vercel's platform already solves. Simplify to ISR + reuse lib/api.ts, and the PR drops from ~900 lines to ~110 while being more reliable (edge-cached, platform-managed fallback, no ephemeral /tmp dependency).

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot AI review requested due to automatic review settings April 10, 2026 11:47

Copilot AI left a comment


Pull request overview

Copilot reviewed 23 out of 25 changed files in this pull request and generated 5 comments.



Comment thread tests/mock-server.js
return technologyPosts;
}

if (query.includes('query SitemapPosts')) {

Copilot AI Apr 10, 2026


handleGraphQL never returns sitemapPostsResponse for the sitemap crawl: the sitemap route calls getAllPosts() which issues query AllPosts(...), but the mock server only checks for query SitemapPosts. As a result the sitemap request gets technologyPosts only (via the final if (query.includes('AllPosts'))), causing assertFullSitemap to fail (community count = 0) and the sitemap endpoint to fall back to 503 in e2e runs. Update the mock routing to return sitemapPostsResponse for the actual sitemap query shape (e.g., detect the AllPosts query without tagName, or detect the presence of sitemap-specific fields like modified/pageInfo).

Suggested change:

- if (query.includes('query SitemapPosts')) {
+ if (
+   query.includes('query SitemapPosts') ||
+   (
+     query.includes('AllPosts') &&
+     !query.includes('AllPostsForCategory') &&
+     !query.includes('tagName') &&
+     (query.includes('modified') || query.includes('pageInfo'))
+   )
+ ) {

Comment thread app/sitemap.xml/route.ts
Comment on lines +31 to +72
export const revalidate = 3600;

export async function GET(): Promise<Response> {
  try {
    // reuse the existing getAllPosts() paginator from lib/api.ts.
    // as of the pagination fix, this fetches ALL posts (not just the first 50).
    const allPostsResult = await getAllPosts();

    // convert getAllPosts() return shape into SitemapPost[] for the entry builders.
    const posts = adaptPostsForSitemap(allPostsResult);

    // reject partial wordpress responses before they replace a good cached version.
    // throws if fewer than 5 posts per category; ISR will not cache a thrown error,
    // so Vercel keeps serving the previous good cached version automatically.
    assertFullSitemap(posts);

    // static routes get lastmod = newest post modification time,
    // so listing pages reflect when the freshest underlying content changed.
    const latestModified = getLatestModified(posts) ?? new Date().toISOString();
    const staticEntries = STATIC_ROUTES.map((r) => ({
      ...r,
      lastModified: latestModified,
    }));

    const entries = dedupeEntries([
      ...staticEntries,
      ...buildPostEntries(posts),
      ...buildAuthorEntries(posts),
      ...buildTagEntries(posts),
    ]);

    const xml = serializeSitemap(entries);

    return new Response(xml, {
      status: 200,
      headers: {
        "Content-Type": "application/xml",
        // s-maxage instructs Vercel's CDN to cache for 1h (matches revalidate above).
        // stale-while-revalidate lets the CDN serve stale while regenerating in background.
        // max-age=0 ensures browsers always revalidate with the CDN rather than caching locally.
        "Cache-Control": "public, max-age=0, s-maxage=3600, stale-while-revalidate=3600",
      },

Copilot AI Apr 10, 2026


The sitemap route sets Cache-Control to s-maxage=3600 / stale-while-revalidate=3600, but this PR’s documented cache strategy and the new Playwright sitemap test assert s-maxage=86400 / stale-while-revalidate=86400. Align the implementation with the intended caching (either change revalidate + Cache-Control to 86400, or update the tests/docs to expect 3600) so CI and behavior are consistent.

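If the documented daily policy is the intended one, one way to keep the header and the tests aligned is to build the `Cache-Control` value from a single TTL constant. A minimal sketch under that assumption (the constant name is illustrative; note Next.js requires `revalidate` itself to be a statically analyzable literal, so it must stay duplicated as `export const revalidate = 86400;` rather than derived from the constant):

```typescript
// Single source of truth for the CDN TTL. The route's
// `export const revalidate = 86400;` must remain a literal; keeping it equal
// to this constant is a convention, not enforced by Next.js.
const SITEMAP_TTL_SECONDS = 86_400; // daily, matching the cron cadence

const cacheControl =
  `public, max-age=0, s-maxage=${SITEMAP_TTL_SECONDS}, ` +
  `stale-while-revalidate=${SITEMAP_TTL_SECONDS}`;
```

The Playwright assertions could interpolate the same constant, so a future TTL change only touches one place.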
Comment thread tests/e2e/Sitemap.spec.ts
Comment on lines +9 to +11
expect(response.headers()['cache-control']).toContain('s-maxage=86400');
expect(response.headers()['cache-control']).toContain('max-age=0');
expect(response.headers()['cache-control']).toContain('stale-while-revalidate=86400');

Copilot AI Apr 10, 2026


This test asserts cache-control contains s-maxage=86400 and stale-while-revalidate=86400, but app/sitemap.xml/route.ts currently returns s-maxage=3600 / stale-while-revalidate=3600 and there is no vercel.json override for /blog/sitemap.xml. Update the expected values (or adjust the route’s headers/revalidate) so the test matches actual sitemap caching behavior.

Suggested change
- expect(response.headers()['cache-control']).toContain('s-maxage=86400');
- expect(response.headers()['cache-control']).toContain('max-age=0');
- expect(response.headers()['cache-control']).toContain('stale-while-revalidate=86400');
+ expect(response.headers()['cache-control']).toContain('s-maxage=3600');
+ expect(response.headers()['cache-control']).toContain('max-age=0');
+ expect(response.headers()['cache-control']).toContain('stale-while-revalidate=3600');

Comment on lines +7 to +60
// GSC submission is fast — no WordPress crawl happens here anymore.
// Sitemap generation is handled by ISR in app/sitemap.xml/route.ts.
export const config = { maxDuration: 30 };

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const expectedSecret = process.env.CRON_SECRET;

  // distinguish a deployment misconfiguration (500) from a wrong token (401).
  if (!expectedSecret) {
    console.error(
      "CRON_SECRET is not configured. Set it in Vercel environment variables and redeploy."
    );
    return res.status(500).json({
      ok: false,
      message: "Server misconfiguration — CRON_SECRET is not configured",
    });
  }

  // auth is checked before method to avoid leaking valid HTTP methods to
  // unauthenticated callers. vercel cron automatically injects this header.
  if (req.headers.authorization !== `Bearer ${expectedSecret}`) {
    return res.status(401).json({ ok: false, message: "Unauthorized" });
  }

  if (req.method !== "GET") {
    res.setHeader("Allow", "GET");
    return res.status(405).json({ ok: false, message: "Method not allowed" });
  }

  // skip silently if google search console env vars are not all configured.
  if (!isSearchConsoleSubmissionConfigured()) {
    return res.status(200).json({
      ok: true,
      message: "Google Search Console submission is not configured — skipped",
    });
  }

  try {
    // notify google that the sitemap has been updated so it re-crawls it.
    // the sitemap itself is generated and cached by ISR — no crawl needed here.
    const result = await submitSitemapToSearchConsole();
    return res.status(200).json({ ok: true, ...result });
  } catch (error) {
    console.error(
      "Google Search Console sitemap submission failed. " +
        "Verify GOOGLE_SERVICE_ACCOUNT_EMAIL, GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY, " +
        "GOOGLE_SEARCH_CONSOLE_SITE_URL, and Search Console property access for the service account.",
      error
    );
    return res.status(500).json({
      ok: false,
      message:
        error instanceof Error ? error.message : "Google Search Console submission failed",
    });

Copilot AI Apr 10, 2026


The cron handler no longer performs a sitemap refresh/warm (it only submits to Google) and its success payload is { ok: true, siteUrl, sitemapUrl, submittedAt }, but the PR description + added e2e test expect refresh metadata like entryCount, generatedAt, and searchConsole.submitted, and also state that Google failures should be non-blocking. Consider (1) triggering a fetch of /sitemap.xml (using the incoming request host/proto) to ensure the sitemap is regenerated/warmed before submission, (2) returning the documented metadata in the JSON response, and (3) on GSC failure returning 200 ok: true with a searchConsole error payload instead of a 500 so cron refresh isn’t coupled to Google availability.

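Point (3) of the review — decoupling cron success from Google availability — could be sketched as a small wrapper that folds the GSC outcome into the response payload instead of failing the request. This is illustrative, not the PR's actual code; `submitSitemapToSearchConsole` is the PR's helper, and the wrapper name and payload shape are assumptions:

```typescript
// Wrap the GSC call so a Google outage degrades to a reported failure
// in the JSON payload rather than a 500 for the whole cron run.
type SearchConsoleOutcome =
  | { submitted: true; submittedAt: string }
  | { submitted: false; error: string };

async function submitNonBlocking(
  submit: () => Promise<{ submittedAt: string }>
): Promise<SearchConsoleOutcome> {
  try {
    const { submittedAt } = await submit();
    return { submitted: true, submittedAt };
  } catch (error) {
    // Google being down should not mark the cron run itself as failed.
    return {
      submitted: false,
      error: error instanceof Error ? error.message : "GSC submission failed",
    };
  }
}

// In the handler (sketch):
// const searchConsole = await submitNonBlocking(submitSitemapToSearchConsole);
// return res.status(200).json({ ok: true, generatedAt: new Date().toISOString(), searchConsole });
```

This shape would also satisfy the e2e test's `searchConsole.submitted` boolean assertion in both the success and failure paths.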
Comment thread tests/e2e/RefreshSitemapCron.spec.ts
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

vercel.json:66

  • /blog/sitemap.xml is excluded from the /blog/* catch-all headers rule, but there’s no dedicated headers rule for the sitemap in vercel.json. If the intended Cache-Control policy is the documented s-maxage=86400 (daily edge cache), add an explicit /blog/sitemap.xml headers entry (or update the documentation/tests to match the route handler’s 1h policy).
    {
      "source": "/blog/((?!(?:sitemap\\.xml$|api/|_next/static/)).*)",
      "headers": [
        {
          "key": "Content-Security-Policy",
          "value": "connect-src 'self' https://px.ads.linkedin.com https://www.google-analytics.com https://analytics.google.com https://region1.google-analytics.com https://stats.g.doubleclick.net https://rp.liadm.com https://idx.liadm.com https://pagead2.googlesyndication.com https://*.clarity.ms https://news.google.com https://assets.apollo.io https://wp.keploy.io https://cdn.hashnode.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://*.googlevideo.com https://googleads.g.doubleclick.net https://marketplace.visualstudio.com https://api.github.com https://pro.ip-api.com https://api.vector.co https://aplo-evnt.com https://ep1.adtrafficquality.google https://ppptg.com https://telemetry.keploy.io; frame-src 'self' https://www.googletagmanager.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://news.google.com https://googleads.g.doubleclick.net https://*.google.com https://ppptg.com; img-src 'self' https://c.bing.com https://ppptg.com https://pbs.twimg.com https://secure.gravatar.com https://wp.keploy.io https://keploy.io data:;"
        },
        {
          "key": "Cache-Control",
          "value": "public, max-age=3600, s-maxage=86400, stale-while-revalidate=604800"
        }
      ]
    }


Comment thread app/sitemap.xml/route.ts Outdated
Comment on lines +1 to +3
import https from "node:https";
import {
adaptPostsForSitemap,

Copilot AI Apr 10, 2026


fetchGraphQL uses node:https unconditionally, so a WORDPRESS_API_URL like http://localhost:4000/graphql (used by Playwright’s mock server) will fail the TLS handshake and force the sitemap route into the 503 fallback. Parse new URL(apiUrl).protocol and use node:http for http: (or switch to a protocol-agnostic client) so local/dev/test URLs work.

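The protocol-dispatch fix the reviewer suggests could look like the sketch below. The helper names (`clientFor`, `portFor`) are illustrative, not the PR's actual API; the point is only that the transport module and default port are derived from `new URL(apiUrl).protocol`:

```typescript
import http from "node:http";
import https from "node:https";

// Choose the transport from the URL's protocol so http://localhost mock
// servers (Playwright) work alongside production HTTPS WordPress.
function clientFor(url: URL): typeof http | typeof https {
  return url.protocol === "http:" ? http : https;
}

// An explicit port wins; otherwise fall back to the protocol default.
function portFor(url: URL): number {
  if (url.port) return Number(url.port);
  return url.protocol === "http:" ? 80 : 443;
}

const mock = new URL("http://localhost:4000/graphql");
const prod = new URL("https://wp.keploy.io/graphql");
// clientFor(mock) is node:http; clientFor(prod) is node:https.
```

The existing `https.request(...)` call site would then become `clientFor(url).request(...)` with `port: portFor(url)`.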
Comment thread app/sitemap.xml/route.ts Outdated
Comment on lines +119 to +123
  const edges = data?.posts?.edges ?? [];
  allEdges = [...allEdges, ...edges];
  hasNextPage = data?.posts?.pageInfo?.hasNextPage ?? false;
  endCursor = data?.posts?.pageInfo?.endCursor ?? null;
}

Copilot AI Apr 10, 2026


The pagination loop can become infinite if WPGraphQL returns hasNextPage: true with a missing/null endCursor (the request will keep sending after: null and re-fetch page 1). Add a guard to throw/fail fast when hasNextPage is true but endCursor is falsy (or when the cursor doesn’t advance).

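The guard the reviewer asks for could be sketched as below. `paginateAll` and `fetchPage` are illustrative stand-ins for the route's loop and its GraphQL call, not the PR's actual names; the essential part is the fail-fast check when `hasNextPage` is true but the cursor is missing or has not advanced:

```typescript
type Page<T> = {
  edges: T[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
};

async function paginateAll<T>(
  fetchPage: (after: string | null) => Promise<Page<T>>
): Promise<T[]> {
  let allEdges: T[] = [];
  let endCursor: string | null = null;
  let hasNextPage = true;

  while (hasNextPage) {
    const { edges, pageInfo } = await fetchPage(endCursor);
    allEdges = allEdges.concat(edges);
    hasNextPage = pageInfo.hasNextPage;

    // Guard: hasNextPage=true with a falsy or stalled cursor would resend
    // `after: null` (or the same cursor) and re-fetch page 1 forever.
    // Throwing here lets the route fall through to its fallback path.
    if (hasNextPage && (!pageInfo.endCursor || pageInfo.endCursor === endCursor)) {
      throw new Error(
        "Sitemap pagination stalled: hasNextPage=true without an advancing endCursor"
      );
    }
    endCursor = pageInfo.endCursor;
  }
  return allEdges;
}
```

With this shape, a well-behaved WPGraphQL response paginates to completion, while a malformed `pageInfo` fails in one round trip instead of looping until `maxDuration` kills the function.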
Comment thread app/sitemap.xml/route.ts Outdated
Comment on lines +42 to +55
return new Promise((resolve, reject) => {
  const req = https.request(
    {
      hostname: url.hostname,
      port: url.port || 443,
      path: url.pathname + url.search,
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
        "User-Agent": "keploy-blog-sitemap/1.0",
      },
    },
    (res) => {

Copilot AI Apr 10, 2026


https.request() here has no overall timeout/abort. If the upstream stalls (socket hang, never-ending response), the route handler can hang until the platform kills it, which prevents clean 503 fallback behavior and ties up concurrency. Add a request/response timeout (and ensure the request is destroyed/aborted) so failures reliably hit the catch block.

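A timeout along the lines the reviewer suggests could be added with `ClientRequest.setTimeout` plus `destroy`, so a stalled socket rejects the promise and hits the route's catch block. This is a hedged sketch, not the PR's code: `postJsonWithTimeout` and `timeoutMs` are illustrative names, and the transport is chosen per protocol so the sketch is self-testable against a local server:

```typescript
import http from "node:http";
import https from "node:https";

function postJsonWithTimeout(
  apiUrl: string,
  body: string,
  timeoutMs = 10_000
): Promise<string> {
  const url = new URL(apiUrl);
  const transport = url.protocol === "http:" ? http : https;

  return new Promise((resolve, reject) => {
    const req = transport.request(
      {
        hostname: url.hostname,
        port: url.port || (url.protocol === "http:" ? 80 : 443),
        path: url.pathname + url.search,
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Content-Length": Buffer.byteLength(body),
        },
      },
      (res) => {
        let data = "";
        res.on("data", (chunk) => (data += chunk));
        res.on("end", () => resolve(data));
      }
    );

    // Fires when the socket is idle for timeoutMs (covers a stalled connect
    // or a never-ending response). destroy(err) emits 'error', which rejects.
    req.setTimeout(timeoutMs, () => {
      req.destroy(new Error(`Request to ${url.host} timed out after ${timeoutMs}ms`));
    });
    req.on("error", reject);
    req.write(body);
    req.end();
  });
}
```

Note that `setTimeout` here is an idle-socket timeout, not a hard wall-clock cap; if a slow-drip response must also be bounded, an additional `setTimeout`/`AbortController` around the whole promise would be needed.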
Comment thread app/sitemap.xml/route.ts Outdated
Comment on lines +19 to +21
// If WordPress is down during regen: Vercel keeps serving previous good version automatically.
// Cold-start / first request: mitigated by build-time pre-generation (see scripts/prewarm-sitemap.mjs)
// and post-deploy warming triggered by the Vercel deployment hook in GitHub Actions.

Copilot AI Apr 10, 2026


The header comment mentions build-time pre-generation via scripts/prewarm-sitemap.mjs and a post-deploy warming hook, but there’s no such script in the repo. Update the comment to match the actual warming strategy (or add the referenced script) so future maintainers aren’t misled.

Suggested change
- // If WordPress is down during regen: Vercel keeps serving previous good version automatically.
- // Cold-start / first request: mitigated by build-time pre-generation (see scripts/prewarm-sitemap.mjs)
- // and post-deploy warming triggered by the Vercel deployment hook in GitHub Actions.
+ // If WordPress is down during regen: Vercel keeps serving the previous good version automatically.
+ // The first request after deploy or cache expiry triggers generation; there is no
+ // repository-managed build-time pre-generation or post-deploy warming script here.

Comment on lines +45 to +50
const body = await response.json();
expect(body.ok).toBe(true);
expect(body.entryCount).toBeGreaterThan(0);
expect(typeof body.generatedAt).toBe('string');
expect(typeof body.searchConsole?.submitted).toBe('boolean');
});

Copilot AI Apr 10, 2026


This test expects entryCount, generatedAt, and searchConsole.submitted, but pages/api/cron/refresh-sitemap.ts returns either { ok, message } (skipped) or { ok, siteUrl, sitemapUrl, submittedAt } (submission). Update the assertions (or change the endpoint response shape) so the test matches the real API contract.

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
@amaan-bhati
Member Author

Closing this PR since we found a cheaper, more optimised, and faster approach in #374.
