feat: dynamic sitemap with pages router, add daily cron refresh and Google Search Console submission #355
amaan-bhati wants to merge 37 commits into main from dynamic-sitemap-update
Conversation
…d failure Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Code Review Summary — Status: 4 Issues Found | Recommendation: Address before merge
Overview
Issue Details
WARNING
Improvements Since Last Review
Incremental Review (aff778d..4c2eeb1) — changes reviewed:
New issue introduced: the error log for Search Console submission failure lacks actionable next steps. Positive observations in new code:
Additional Notes — positive observations:
Files Reviewed (9 files)
Reviewed by claude-4.5-opus-20251124 · 205,846 tokens
Pull request overview
Implements a dynamic sitemap endpoint for the Next.js pages router (replacing the previously committed static public/sitemap.xml) and updates SEO-related plumbing to reference the new sitemap location.
Changes:
- Removed the large static `public/sitemap.xml` and added a server-rendered `/sitemap.xml` route.
- Added sitemap generation utilities (`lib/sitemap.ts`) that aggregate posts/tags/authors into sitemap entries.
- Updated the `robots.txt` sitemap URL and adjusted the Node engine range in `package-lock.json` to match `package.json`.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| public/sitemap.xml | Removed committed static sitemap in favor of a dynamic endpoint. |
| public/robots.txt | Points crawlers to the new sitemap URL. |
| pages/sitemap.xml.ts | Adds SSR route that emits generated sitemap XML with caching headers. |
| lib/sitemap.ts | Implements sitemap entry collection + XML serialization. |
| pages/_document.tsx | Adds missing next/script import (enables Script usage in Document). |
| package-lock.json | Updates Node engine constraint to >=18. |
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
…og-website into dynamic-sitemap-update
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 9 comments.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
vercel.json:59
- This route sets `Cache-Control` in `getServerSideProps`, but this global `/blog/(.*)` headers rule also sets `Cache-Control` for `/blog/sitemap.xml`. Having two different caching directives can result in the sitemap being cached differently than intended (depending on precedence). Consider excluding `sitemap.xml` from this global rule or defining its caching behavior in one place.
```json
{
  "source": "/blog/(.*)",
  "headers": [
    {
      "key": "Content-Security-Policy",
      "value": "connect-src 'self' https://px.ads.linkedin.com https://www.google-analytics.com https://analytics.google.com https://region1.google-analytics.com https://stats.g.doubleclick.net https://rp.liadm.com https://idx.liadm.com https://pagead2.googlesyndication.com https://*.clarity.ms https://news.google.com https://assets.apollo.io https://wp.keploy.io https://cdn.hashnode.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://*.googlevideo.com https://googleads.g.doubleclick.net https://marketplace.visualstudio.com https://api.github.com https://pro.ip-api.com https://api.vector.co https://aplo-evnt.com https://ep1.adtrafficquality.google https://ppptg.com https://telemetry.keploy.io; frame-src 'self' https://www.googletagmanager.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://news.google.com https://googleads.g.doubleclick.net https://*.google.com https://ppptg.com; img-src 'self' https://c.bing.com https://ppptg.com https://pbs.twimg.com https://secure.gravatar.com https://wp.keploy.io https://keploy.io data:;"
    },
    {
      "key": "Cache-Control",
      "value": "public, max-age=3600, s-maxage=86400, stale-while-revalidate=604800"
    }
```
…okahead Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated no new comments.
Important precaution
Things we'd need to ensure before deploying to prod:
Review: Simpler architecture available — ISR replaces most of this code

The WordPress crawling logic, fallback handling, and edge cases are well thought out — good work on the robustness. But the core architecture can be dramatically simplified, because Vercel ISR already does what the cron + in-memory cache + /tmp snapshot is trying to do.

The core issue: ~900 lines reimplement ISR. The three-tier fallback (in-memory → /tmp snapshot → static) hand-rolls caching and recovery behavior that Vercel's ISR provides natively.

Suggested approach:
| Current PR | With ISR |
|---|---|
| `lib/sitemap.ts` (733 lines) — custom paginator, retry, fallback tiers | Reuse existing `lib/api.ts` (already has `getAllPosts`, `getAllTags`, `getAllAuthors` with pagination) |
| `getServerSideProps` — function runs on every request | `getStaticProps` + `revalidate` — static, edge-cached |
| 3-tier fallback (memory → /tmp → static) | Vercel ISR cache is the fallback (stale-while-revalidate built in) |
| Concurrency guard (`refreshSitemapPromise`) | ISR deduplicates revalidation natively |
| `maxDuration: 300` on cron | ISR revalidation runs in background, no long-running function |
| Cron refreshes sitemap + submits GSC | Cron only submits to GSC (~5 lines) |
| ~900 lines across 4 new files | ~110 lines across 2 files |
What to keep

- The GSC OAuth implementation (`lib/google-search-console.ts`) is solid — keep it (or simplify with `google-auth-library`)
- The XML escaping and dedup logic is fine; just move it into the ISR page
- The E2E tests are good — adapt them for the simpler route
- The CSP exclusion for sitemap.xml is correct
The lib/api.ts duplication problem

The blog already has fully paginated queries in `lib/api.ts`:

- `getAllPosts()` — paginated, returns slug, categories, author, date
- `getAllTags()` — all tags
- `getAllAuthors()` — all authors from posts

The PR rewrites all of these from scratch in `lib/sitemap.ts` with a parallel `fetchGraphQL` function, parallel retry logic, and parallel pagination. This means two independent WordPress data layers to maintain. If someone fixes a bug in `lib/api.ts`, the sitemap won't pick it up, and vice versa.
TL;DR
The defensive programming is genuinely good, but it's solving problems that Vercel's platform already solves. Simplify to ISR + reuse lib/api.ts, and the PR drops from ~900 lines to ~110 while being more reliable (edge-cached, platform-managed fallback, no ephemeral /tmp dependency).
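To make the "~5 lines" claim concrete: a hedged sketch of the GSC-only cron the reviewer suggests. `buildSubmissionUrl` is an invented helper, and the endpoint shape follows the Search Console Sitemaps API mentioned elsewhere in this thread — this is not code from the PR.

```typescript
// Hypothetical sketch of the "cron only submits to GSC" shape the reviewer proposes.
// buildSubmissionUrl is illustrative; the path follows the Search Console Sitemaps API:
// PUT /webmasters/v3/sites/{siteUrl}/sitemaps/{feedpath}, both path segments URL-encoded.
function buildSubmissionUrl(siteUrl: string, sitemapUrl: string): string {
  return (
    "https://www.googleapis.com/webmasters/v3/sites/" +
    encodeURIComponent(siteUrl) +
    "/sitemaps/" +
    encodeURIComponent(sitemapUrl)
  );
}

// With an OAuth token in hand, the whole cron body then reduces to roughly:
//   await fetch(buildSubmissionUrl(site, sitemap), {
//     method: "PUT",
//     headers: { Authorization: `Bearer ${token}` },
//   });
```

ISR handles regeneration, so the cron no longer needs to crawl WordPress at all.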
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Pull request overview
Copilot reviewed 23 out of 25 changed files in this pull request and generated 5 comments.
```typescript
  return technologyPosts;
}

if (query.includes('query SitemapPosts')) {
```
handleGraphQL never returns sitemapPostsResponse for the sitemap crawl: the sitemap route calls getAllPosts() which issues query AllPosts(...), but the mock server only checks for query SitemapPosts. As a result the sitemap request gets technologyPosts only (via the final if (query.includes('AllPosts'))), causing assertFullSitemap to fail (community count = 0) and the sitemap endpoint to fall back to 503 in e2e runs. Update the mock routing to return sitemapPostsResponse for the actual sitemap query shape (e.g., detect the AllPosts query without tagName, or detect the presence of sitemap-specific fields like modified/pageInfo).
Suggested change:

```typescript
if (
  query.includes('query SitemapPosts') ||
  (
    query.includes('AllPosts') &&
    !query.includes('AllPostsForCategory') &&
    !query.includes('tagName') &&
    (query.includes('modified') || query.includes('pageInfo'))
  )
) {
```
```typescript
export const revalidate = 3600;

export async function GET(): Promise<Response> {
  try {
    // reuse the existing getAllPosts() paginator from lib/api.ts.
    // as of the pagination fix, this fetches ALL posts (not just the first 50).
    const allPostsResult = await getAllPosts();

    // convert getAllPosts() return shape into SitemapPost[] for the entry builders.
    const posts = adaptPostsForSitemap(allPostsResult);

    // reject partial wordpress responses before they replace a good cached version.
    // throws if fewer than 5 posts per category; ISR will not cache a thrown error,
    // so Vercel keeps serving the previous good cached version automatically.
    assertFullSitemap(posts);

    // static routes get lastmod = newest post modification time,
    // so listing pages reflect when the freshest underlying content changed.
    const latestModified = getLatestModified(posts) ?? new Date().toISOString();
    const staticEntries = STATIC_ROUTES.map((r) => ({
      ...r,
      lastModified: latestModified,
    }));

    const entries = dedupeEntries([
      ...staticEntries,
      ...buildPostEntries(posts),
      ...buildAuthorEntries(posts),
      ...buildTagEntries(posts),
    ]);

    const xml = serializeSitemap(entries);

    return new Response(xml, {
      status: 200,
      headers: {
        "Content-Type": "application/xml",
        // s-maxage instructs Vercel's CDN to cache for 1h (matches revalidate above).
        // stale-while-revalidate lets the CDN serve stale while regenerating in background.
        // max-age=0 ensures browsers always revalidate with the CDN rather than caching locally.
        "Cache-Control": "public, max-age=0, s-maxage=3600, stale-while-revalidate=3600",
      },
```
The sitemap route sets Cache-Control to s-maxage=3600 / stale-while-revalidate=3600, but this PR’s documented cache strategy and the new Playwright sitemap test assert s-maxage=86400 / stale-while-revalidate=86400. Align the implementation with the intended caching (either change revalidate + Cache-Control to 86400, or update the tests/docs to expect 3600) so CI and behavior are consistent.
```typescript
expect(response.headers()['cache-control']).toContain('s-maxage=86400');
expect(response.headers()['cache-control']).toContain('max-age=0');
expect(response.headers()['cache-control']).toContain('stale-while-revalidate=86400');
```
This test asserts cache-control contains s-maxage=86400 and stale-while-revalidate=86400, but app/sitemap.xml/route.ts currently returns s-maxage=3600 / stale-while-revalidate=3600 and there is no vercel.json override for /blog/sitemap.xml. Update the expected values (or adjust the route’s headers/revalidate) so the test matches actual sitemap caching behavior.
Suggested change:

```typescript
expect(response.headers()['cache-control']).toContain('s-maxage=3600');
expect(response.headers()['cache-control']).toContain('max-age=0');
expect(response.headers()['cache-control']).toContain('stale-while-revalidate=3600');
```
```typescript
// GSC submission is fast — no WordPress crawl happens here anymore.
// Sitemap generation is handled by ISR in app/sitemap.xml/route.ts.
export const config = { maxDuration: 30 };

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const expectedSecret = process.env.CRON_SECRET;

  // distinguish a deployment misconfiguration (500) from a wrong token (401).
  if (!expectedSecret) {
    console.error(
      "CRON_SECRET is not configured. Set it in Vercel environment variables and redeploy."
    );
    return res.status(500).json({
      ok: false,
      message: "Server misconfiguration — CRON_SECRET is not configured",
    });
  }

  // auth is checked before method to avoid leaking valid HTTP methods to
  // unauthenticated callers. vercel cron automatically injects this header.
  if (req.headers.authorization !== `Bearer ${expectedSecret}`) {
    return res.status(401).json({ ok: false, message: "Unauthorized" });
  }

  if (req.method !== "GET") {
    res.setHeader("Allow", "GET");
    return res.status(405).json({ ok: false, message: "Method not allowed" });
  }

  // skip silently if google search console env vars are not all configured.
  if (!isSearchConsoleSubmissionConfigured()) {
    return res.status(200).json({
      ok: true,
      message: "Google Search Console submission is not configured — skipped",
    });
  }

  try {
    // notify google that the sitemap has been updated so it re-crawls it.
    // the sitemap itself is generated and cached by ISR — no crawl needed here.
    const result = await submitSitemapToSearchConsole();
    return res.status(200).json({ ok: true, ...result });
  } catch (error) {
    console.error(
      "Google Search Console sitemap submission failed. " +
        "Verify GOOGLE_SERVICE_ACCOUNT_EMAIL, GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY, " +
        "GOOGLE_SEARCH_CONSOLE_SITE_URL, and Search Console property access for the service account.",
      error
    );
    return res.status(500).json({
      ok: false,
      message:
        error instanceof Error ? error.message : "Google Search Console submission failed",
    });
```
The cron handler no longer performs a sitemap refresh/warm (it only submits to Google) and its success payload is { ok: true, siteUrl, sitemapUrl, submittedAt }, but the PR description + added e2e test expect refresh metadata like entryCount, generatedAt, and searchConsole.submitted, and also state that Google failures should be non-blocking. Consider (1) triggering a fetch of /sitemap.xml (using the incoming request host/proto) to ensure the sitemap is regenerated/warmed before submission, (2) returning the documented metadata in the JSON response, and (3) on GSC failure returning 200 ok: true with a searchConsole error payload instead of a 500 so cron refresh isn’t coupled to Google availability.
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Pull request overview
Copilot reviewed 24 out of 26 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
vercel.json:66
`/blog/sitemap.xml` is excluded from the `/blog/*` catch-all headers rule, but there's no dedicated headers rule for the sitemap in `vercel.json`. If the intended Cache-Control policy is the documented `s-maxage=86400` (daily edge cache), add an explicit `/blog/sitemap.xml` headers entry (or update the documentation/tests to match the route handler's 1h policy).
```json
{
  "source": "/blog/((?!(?:sitemap\\.xml$|api/|_next/static/)).*)",
  "headers": [
    {
      "key": "Content-Security-Policy",
      "value": "connect-src 'self' https://px.ads.linkedin.com https://www.google-analytics.com https://analytics.google.com https://region1.google-analytics.com https://stats.g.doubleclick.net https://rp.liadm.com https://idx.liadm.com https://pagead2.googlesyndication.com https://*.clarity.ms https://news.google.com https://assets.apollo.io https://wp.keploy.io https://cdn.hashnode.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://*.googlevideo.com https://googleads.g.doubleclick.net https://marketplace.visualstudio.com https://api.github.com https://pro.ip-api.com https://api.vector.co https://aplo-evnt.com https://ep1.adtrafficquality.google https://ppptg.com https://telemetry.keploy.io; frame-src 'self' https://www.googletagmanager.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://news.google.com https://googleads.g.doubleclick.net https://*.google.com https://ppptg.com; img-src 'self' https://c.bing.com https://ppptg.com https://pbs.twimg.com https://secure.gravatar.com https://wp.keploy.io https://keploy.io data:;"
    },
    {
      "key": "Cache-Control",
      "value": "public, max-age=3600, s-maxage=86400, stale-while-revalidate=604800"
    }
  ]
}
```
```typescript
import https from "node:https";
import {
  adaptPostsForSitemap,
```
fetchGraphQL uses node:https unconditionally, so a WORDPRESS_API_URL like http://localhost:4000/graphql (used by Playwright’s mock server) will fail the TLS handshake and force the sitemap route into the 503 fallback. Parse new URL(apiUrl).protocol and use node:http for http: (or switch to a protocol-agnostic client) so local/dev/test URLs work.
```typescript
  const edges = data?.posts?.edges ?? [];
  allEdges = [...allEdges, ...edges];
  hasNextPage = data?.posts?.pageInfo?.hasNextPage ?? false;
  endCursor = data?.posts?.pageInfo?.endCursor ?? null;
}
```
The pagination loop can become infinite if WPGraphQL returns hasNextPage: true with a missing/null endCursor (the request will keep sending after: null and re-fetch page 1). Add a guard to throw/fail fast when hasNextPage is true but endCursor is falsy (or when the cursor doesn’t advance).
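The suggested guard can be sketched as a generic pagination loop; `collectAll`, `fetchPage`, and the `PageInfo` shape are illustrative stand-ins mirroring WPGraphQL, not the PR's actual API:

```typescript
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}

// Collect every page, but fail fast if the server claims there is a next page
// while the cursor is missing or hasn't advanced — otherwise `after: null`
// would re-fetch page 1 forever.
function collectAll<T>(
  fetchPage: (after: string | null) => { items: T[]; pageInfo: PageInfo }
): T[] {
  const all: T[] = [];
  let after: string | null = null;
  for (;;) {
    const { items, pageInfo } = fetchPage(after);
    all.push(...items);
    if (!pageInfo.hasNextPage) return all;
    if (!pageInfo.endCursor || pageInfo.endCursor === after) {
      throw new Error("hasNextPage is true but endCursor did not advance");
    }
    after = pageInfo.endCursor;
  }
}
```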
```typescript
return new Promise((resolve, reject) => {
  const req = https.request(
    {
      hostname: url.hostname,
      port: url.port || 443,
      path: url.pathname + url.search,
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
        "User-Agent": "keploy-blog-sitemap/1.0",
      },
    },
    (res) => {
```
https.request() here has no overall timeout/abort. If the upstream stalls (socket hang, never-ending response), the route handler can hang until the platform kills it, which prevents clean 503 fallback behavior and ties up concurrency. Add a request/response timeout (and ensure the request is destroyed/aborted) so failures reliably hit the catch block.
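One way to add the missing bound, as a hedged sketch — the helper name and the 10-second default are illustrative, not the PR's code:

```typescript
import https from "node:https";

// Bound the whole request with a socket timeout and destroy the request on
// expiry so a stalled upstream reliably surfaces as an error (and hits the
// caller's catch block) instead of hanging until the platform kills it.
function postWithTimeout(url: URL, body: string, timeoutMs = 10_000): Promise<string> {
  return new Promise((resolve, reject) => {
    const req = https.request(
      { hostname: url.hostname, path: url.pathname + url.search, method: "POST" },
      (res) => {
        let data = "";
        res.on("data", (chunk) => (data += chunk));
        res.on("end", () => resolve(data));
      }
    );
    // Fires on connect stalls and idle sockets; destroy() forces the 'error' event.
    req.setTimeout(timeoutMs, () => req.destroy(new Error("request timed out")));
    req.on("error", reject);
    req.end(body);
  });
}
```

An `AbortSignal.timeout`-based `fetch` would achieve the same bound with less ceremony.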
```typescript
// If WordPress is down during regen: Vercel keeps serving previous good version automatically.
// Cold-start / first request: mitigated by build-time pre-generation (see scripts/prewarm-sitemap.mjs)
// and post-deploy warming triggered by the Vercel deployment hook in GitHub Actions.
```
The header comment mentions build-time pre-generation via scripts/prewarm-sitemap.mjs and a post-deploy warming hook, but there’s no such script in the repo. Update the comment to match the actual warming strategy (or add the referenced script) so future maintainers aren’t misled.
Suggested change:

```typescript
// If WordPress is down during regen: Vercel keeps serving the previous good version automatically.
// The first request after deploy or cache expiry triggers generation; there is no
// repository-managed build-time pre-generation or post-deploy warming script here.
```
```typescript
  const body = await response.json();
  expect(body.ok).toBe(true);
  expect(body.entryCount).toBeGreaterThan(0);
  expect(typeof body.generatedAt).toBe('string');
  expect(typeof body.searchConsole?.submitted).toBe('boolean');
});
```
This test expects entryCount, generatedAt, and searchConsole.submitted, but pages/api/cron/refresh-sitemap.ts returns either { ok, message } (skipped) or { ok, siteUrl, sitemapUrl, submittedAt } (submission). Update the assertions (or change the endpoint response shape) so the test matches the real API contract.
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Closing this PR since we found a cheaper, more optimised, and faster approach in #374.


feat: Dynamic Sitemap Generation with Cron Refresh, Snapshot Fallback, and Google Search Console Auto-Submission
The blog previously served a static, manually maintained `sitemap.xml`; new posts, authors, and tags were never reflected unless someone updated it by hand. This PR replaces it with a fully automated sitemap pipeline at `https://keploy.io/blog/sitemap.xml` that crawls WordPress GraphQL, refreshes daily via Vercel cron, maintains a three-tier snapshot fallback so crawlers always receive valid XML during outages, and auto-submits to Google Search Console after every successful refresh.

Iteration History

Multiple rounds of GitHub Copilot code review issues were addressed across iterations: cursor pagination infinite-loop guard, CSP regex exclusion for the sitemap route, `CRON_SECRET` misconfiguration (500 vs 401 distinction), `no-store` on 503, `/tmp` snapshot edge cases, TypeScript strict null guards, empty tag slug guard, and `maxDuration` tuning.

File and Folder Structure
WordPress GraphQL Paginator
- `fetchAllPosts()` paginates with `first: 50, after: cursor`, ordered by `modified DESC`.
- Per-request timeout (`AbortSignal.timeout`); retry backoff of `2000ms × attempt` (2s → 12s, max 42s total).
- Retries on 408, 429, 500–504, `AbortError`, `TypeError`, network failures, and GraphQL-level errors on 200 OK (WPGraphQL returns these during plugin reload / DB lock), as well as responses missing the `data` field.
- If a page reports `hasNextPage: true` with no `endCursor`, throws immediately to prevent infinite page-1 re-fetch.

Category-to-Route Mapping

`mapCategoriesToRoutes()` maps WP categories to `"technology"` or `"community"` by matching both slug and name (lowercased) to handle editorial inconsistencies. Posts with no matching category are excluded.
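An illustrative reconstruction of that mapping — the matching details are inferred from this description, not the repo's exact code:

```typescript
type WpCategory = { slug: string; name: string };

// Match both slug and name (lowercased) against the two route buckets so that
// editorial inconsistencies (e.g. slug "misc" but name "Community") still map.
// Posts with no matching category return null and are excluded from the sitemap.
function mapCategoriesToRoutes(categories: WpCategory[]): "technology" | "community" | null {
  for (const c of categories) {
    const keys = [c.slug.toLowerCase(), c.name.toLowerCase()];
    if (keys.includes("technology")) return "technology";
    if (keys.includes("community")) return "community";
  }
  return null;
}
```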
Entry Builders

- Posts: `priority: 0.8` if modified in the last 30 days, `0.5` otherwise; `changefreq: weekly`; slug is `encodeURIComponent`-encoded.
- Authors: `lastmod` = newest post by that author; `priority: 0.7`.
- Tags: one entry per unique tag from included posts; tag display names sanitized to URL slugs via `sanitizeStringForURL()`; empty/whitespace tags skipped; `priority: 0.7`.
- Static routes: 7 hardcoded entries; `lastmod` set to the newest post modified time so listing pages reflect the freshest content.

XML Serialization

Manual generation, no external library. All values passed through `escapeXml()` (`&`, `"`, `'`, `<`, `>`). Priority formatted via `.toFixed(1)`. `dedupeEntries()` uses a `Map` keyed by URL to eliminate any overlapping entries before serialization.
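The serialization helpers described above can be sketched as follows; the names mirror this description but the bodies are reconstructions, not the repo's exact code:

```typescript
// Escape the five XML-significant characters. '&' must be replaced first so
// the entities produced by the later replacements aren't double-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

type Entry = { url: string; priority: number };

// Map keyed by URL: later duplicates overwrite earlier ones, so the output
// contains exactly one entry per URL before serialization.
function dedupeEntries(entries: Entry[]): Entry[] {
  return [...new Map(entries.map((e) => [e.url, e])).values()];
}
```

Priority values then serialize as `priority.toFixed(1)`, e.g. `0.8` → `"0.8"`.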
Three-Tier Snapshot Fallback

When fresh generation fails:

1. In-memory snapshot (`lastSuccessfulSitemapXml`): updated after every successful refresh; instant, no I/O; survives within the same Lambda instance.
2. `/tmp` file (`/tmp/keploy-blog-sitemap.xml`): written after every successful refresh; validated by `isValidSitemapXml()` (checks XML declaration, `<urlset>` namespace, closing tag) before use.
3. Static fallback (`getStaticFallbackXml()`): 7 hardcoded routes returned with HTTP 503 + `Cache-Control: no-store` — never cached by edge, enabling immediate recovery once WordPress is back.

Concurrency Guard

`refreshSitemapPromise` is a module-level deduplication guard. Concurrent callers share one in-flight crawl rather than each triggering an independent WordPress fetch. Cleared in `.finally()` so the next call after resolution starts fresh.
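A minimal sketch of this deduplication pattern; `crawlWordPress` is a stand-in for the real crawl:

```typescript
// Module-level guard: all callers that arrive while a crawl is in flight get
// the same promise instead of starting their own WordPress crawl.
let refreshSitemapPromise: Promise<string> | null = null;

async function crawlWordPress(): Promise<string> {
  return "<urlset/>"; // stand-in for the real WordPress crawl
}

function refreshSitemap(): Promise<string> {
  if (refreshSitemapPromise) return refreshSitemapPromise; // join the in-flight crawl
  refreshSitemapPromise = crawlWordPress().finally(() => {
    // Clear the guard whether the crawl resolved or rejected, so the next
    // call after settlement starts a fresh crawl.
    refreshSitemapPromise = null;
  });
  return refreshSitemapPromise;
}
```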
Cron Endpoint Security

- Auth checked before method: prevents leaking valid HTTP methods to unauthenticated callers.
- `GET` only: `405 Method Not Allowed` with an `Allow: GET` header for anything else.
- Missing `CRON_SECRET` → `500` (misconfiguration), not `401` (wrong token): distinguishes a deployment error from an auth failure.
- Google Search Console failure → `200 ok: true`; sitemap refresh is never coupled to Google's availability.
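That check ordering distills to a small pure function; `cronGuard` is an illustrative name, not the PR's code:

```typescript
// Ordering from the list above: misconfiguration (500) → auth (401) →
// method (405) → proceed (200). Auth runs before the method check so an
// unauthenticated caller never learns which HTTP methods are valid.
function cronGuard(
  secret: string | undefined,
  authHeader: string | undefined,
  method: string
): number {
  if (!secret) return 500; // deployment misconfiguration, not the caller's fault
  if (authHeader !== `Bearer ${secret}`) return 401;
  if (method !== "GET") return 405;
  return 200;
}
```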
Full OAuth 2.0 service account flow, no third-party Google SDK:
- JWT signed with `GOOGLE_SERVICE_ACCOUNT_EMAIL` + `GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY` using Node.js `crypto.createSign('RSA-SHA256')`, per RFC 7518 (JWA) and RFC 7519 (JWT).
- Token exchange with `grant_type: urn:ietf:params:oauth:grant-type:jwt-bearer` at `oauth2.googleapis.com/token`, per RFC 7523 and the Google OAuth 2.0 Service Account docs.
- `PUT` to `googleapis.com/webmasters/v3/sites/{siteUrl}/sitemaps/{sitemapUrl}` (Search Console Sitemaps API).
- `\n` sequences replaced with real newlines before signing: Vercel stores multi-line env vars with literal `\n`, and `crypto.createSign` requires a valid PEM with real line breaks (source: Node.js crypto docs).

Entirely optional: `isSearchConsoleSubmissionConfigured()` checks all three required env vars; missing any one skips the step with `skipped: true` in the response.
Cache-Control Strategy

| Route | Cache-Control |
|---|---|
| Sitemap | `public, max-age=0, s-maxage=86400, stale-while-revalidate=86400` |
| 503 fallback | `no-store` |
| `/blog/api/(.*)` | `no-store` |
| `/_next/static/(.*)` | `public, max-age=31536000, immutable` |
| `/blog` pages | `public, max-age=3600, s-maxage=86400, stale-while-revalidate=604800` |

The sitemap is excluded from CSP headers in both `vercel.json` and `next.config.js` via regex: XML responses have no use for CSP.

Build-Time Validation

`next.config.js` calls `URL.canParse(process.env.WORDPRESS_API_URL)` at build time: the build fails with a clear error if the variable is missing or invalid, preventing a silently misconfigured deploy (as suggested and improved by Copilot multiple times).

Edge Cases Covered

- WordPress down → in-memory snapshot → `/tmp` → static 503.
- Partial WordPress response → `assertFullSitemap()` throws if < 5 posts per category.
- `hasNextPage: true` but no cursor → throws immediately.
- Concurrent requests → `refreshSitemapPromise` — one crawl, all callers wait.
- Corrupted `/tmp` snapshot → `isValidSitemapXml()` rejects it; falls to static fallback.
- `CRON_SECRET` not set → `500` (misconfiguration), not `401`.
- Google Search Console failure → `200 ok: true`.
- Author with no posts → no `/authors/` URL emitted (`flatMap` + early return guard).
- Special characters → `escapeXml()` escapes `&`, `"`, `'`, `<`, `>`.
- Duplicate URLs → `dedupeEntries()` `Map`-deduplication before serialization.
- 503 responses → `no-store`; edge retries origin on next request.
- Private key `\n` → `.replace(/\\n/g, "\n")` before JWT signing.

Testing
E2E Tests (Playwright)
- `tests/e2e/Sitemap.spec.ts`: `GET /sitemap.xml` → HTTP 200, `Content-Type: application/xml`, `Cache-Control` has `s-maxage=86400` + `max-age=0` + `stale-while-revalidate=86400`, valid XML declaration, correct `<urlset>` namespace, core `<loc>` entries present.
- `tests/e2e/RefreshSitemapCron.spec.ts`: no auth → 401; wrong token → 401; POST with valid token → 405 + `Allow: GET`; GET with valid token → 200, `ok: true`, `entryCount > 0`, `generatedAt` as string, `searchConsole.submitted` as boolean.

Live Verification Results
- Sitemap XML validated (parsed with Python's `xml.etree`); `<lastmod>` present; `/tmp` snapshot written.
- Cron endpoint returned `200 {ok: true, entryCount: 1561, searchConsole.submitted: true}` against the `sc-domain:keploy.io` property.
- Priority distribution: `1.0` × 1, `0.9` × 2, `0.8` × 84, `0.7` × 1060, `0.6` × 2, `0.5` × 412.

Local GSC Verification Script

This is a script I generated using Claude for testing; the steps completed successfully. It mirrors the production JWT + submission flow, loads from `.env.local`, and exits with code 1 and a structured JSON error on failure. Use it to verify credentials before deploy without running the full app.