feat: migrate sitemap to App Router ISR with automation #367
amaan-bhati wants to merge 29 commits into main
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Adds a GitHub Actions workflow triggered by a Vercel deploy hook that hits /blog/sitemap.xml immediately after deployment, warming the ISR cache so the first real user or crawler never hits a cold Lambda. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… env
- lib/google-search-console.ts: OAuth2 JWT flow for GSC submission, required by pages/api/cron/refresh-sitemap.ts
- scripts/submit-sitemap-to-search-console.mjs: local dev script to manually submit sitemap to GSC without deploying
- vercel.json: add cron schedule (daily midnight), add missing redirect, fix CSP header source regex to exclude sitemap.xml and /api/ paths
- next.config.js: exclude sitemap.xml from Next.js CSP headers to keep both layers consistent with vercel.json
- playwright.config.ts: inject CRON_SECRET=test-secret so e2e cron tests can authenticate against the local dev server

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Migrates sitemap generation from the Pages Router implementation to a Next.js App Router ISR route handler, relying on Vercel CDN caching and simplifying the cron job to only notify Google Search Console.
Changes:
- Add `app/sitemap.xml/route.ts` ISR route handler and new `lib/api-server.ts` (`node:https`) for server-side WPGraphQL access.
- Refactor `lib/sitemap.ts` into pure sitemap adaptation/serialization utilities and remove the Pages Router sitemap endpoint.
- Update cron handler and apply TypeScript strict-null-check fixes across several components/pages triggered by adding `app/`.
Reviewed changes
Copilot reviewed 20 out of 22 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tsconfig.json | Enables Next TS plugin + strictNullChecks, updates include paths for App Router types. |
| pages/technology/[slug].tsx | Fixes useRef initialization and types a previously implicit array. |
| pages/community/[slug].tsx | Fixes useRef initialization and types a previously implicit array. |
| pages/authors/[slug].tsx | Types a previously implicit array to satisfy strict null checks. |
| pages/api/cron/refresh-sitemap.ts | Simplifies cron endpoint to auth + optional GSC submission only. |
| lib/sitemap.ts | Introduces pure sitemap utilities (adapt/build/serialize) and fallback XML generator. |
| lib/api.ts | Tightens API URL config handling; expands getAllPosts query to include modified, categories/tags, and pageInfo pagination. |
| lib/api-server.ts | Adds server-only WPGraphQL client via node:https and a sitemap-focused pagination fetch. |
| components/TableContents.tsx | Fixes timeout ref typing and replaceState signature for strict null checks. |
| components/post-body.tsx | Adds explicit state typings for strict null checks. |
| components/NotFoundPage.tsx | Avoids optional-chaining pitfalls under strict null checks when slicing edges. |
| components/more-stories.tsx | Types the error state as `string |
| components/AuthorMapping.tsx | Adds explicit array typings for strict null checks. |
| app/sitemap.xml/route.ts | New ISR sitemap endpoint at /sitemap.xml (under basePath) with fallback behavior. |
| app/not-found.tsx | Adds App Router not-found boundary redirecting to Pages Router 404. |
| app/layout.tsx | Adds required App Router root layout with <html>/<body>. |
| .github/workflows/prewarm-sitemap.yml | Adds deploy-triggered workflow to warm the ISR sitemap cache. |
| pages/sitemap.xml.ts (deleted) | Removes Pages Router sitemap to avoid path conflict with App Router route. |
Cover the full ISR sitemap flow:
- Sitemap.spec.ts: status 200, Content-Type xml, correct s-maxage=3600 ISR cache headers, valid urlset structure, static routes presence, dynamic post count, lastmod dates, changefreq, CSP exclusion, and URL deduplication
- RefreshSitemapCron.spec.ts: 401 without auth, 401 with wrong secret, 405 for non-GET (method not leaked to unauthenticated callers), 200 skipped response when GSC env vars are absent in test env, and no-store cache-control check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
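The status-code ordering these cron tests assert (401 before 405, so an unauthenticated caller never learns which methods are accepted) can be sketched as a pure decision function. This is only an illustration: `decideCronResponse`, the `Bearer` header scheme, and the response bodies are assumptions, not code taken from `pages/api/cron/refresh-sitemap.ts`.

```typescript
// Hypothetical helper: the name, the Bearer scheme, and the bodies are assumptions.
type CronDecision = { status: 200 | 401 | 405; body: string };

function decideCronResponse(
  method: string,
  authorization: string | undefined,
  cronSecret: string | undefined,
  gscConfigured: boolean,
): CronDecision {
  // Authenticate first so unauthenticated callers always see 401,
  // whatever HTTP method they used.
  if (!cronSecret || authorization !== `Bearer ${cronSecret}`) {
    return { status: 401, body: "unauthorized" };
  }
  // Only authenticated callers are told that the method is wrong.
  if (method !== "GET") {
    return { status: 405, body: "method not allowed" };
  }
  // Without GSC credentials the handler reports a skipped submission.
  return { status: 200, body: gscConfigured ? "submitted" : "skipped" };
}
```

Keeping the decision separate from the request and response objects makes the ordering easy to unit-test without starting a server.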
Next.js raises a build error when a static file in public/ and a route handler exist at the same path. The ISR route handler in app/sitemap.xml/route.ts supersedes this stale 2024 static file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot reviewed 23 out of 25 changed files in this pull request and generated 4 comments.
- tests/mock-server.js: add AllPostsForSitemap handler before generic AllPosts guard, returning combined tech+community edges so assertFullSitemap sees posts in both categories (previously communityCount=0 caused 503 in every sitemap e2e test)
- pages/api/cron/refresh-sitemap.ts: set Cache-Control: no-store explicitly in the handler so the test assertion is framework-independent (vercel.json headers are a Vercel platform layer, not applied by the local Next.js server)
- lib/api-server.ts: replace allEdges=[...allEdges,...edges] in pagination loop with allEdges.push(...edges) to avoid O(n²) array allocation across pages
- lib/sitemap.ts: guard parseInt result with Number.isNaN so an invalid SITEMAP_MIN_POSTS_PER_CATEGORY env var falls back to 5 instead of silently making the assertion unreachable (NaN < 5 === false)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
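The last two fixes in this commit are small enough to show in isolation. The sketch below uses hypothetical names (`parseMinPosts`, `collectAllEdges`, `Page`, `Edge`) and a generic cursor-based page fetcher; only the `Number.isNaN` guard and the in-place `push` come from the commit message.

```typescript
// Hypothetical names throughout; only the two techniques come from the commit message.
const DEFAULT_MIN_POSTS = 5;

function parseMinPosts(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  // NaN < 5 is false, so an unguarded NaN would make a "too few posts" check never fire.
  return Number.isNaN(parsed) ? DEFAULT_MIN_POSTS : parsed;
}

type Edge = { slug: string };
type Page = { edges: Edge[]; hasNextPage: boolean; endCursor: string | null };

async function collectAllEdges(
  fetchPage: (after: string | null) => Promise<Page>,
): Promise<Edge[]> {
  const allEdges: Edge[] = [];
  let cursor: string | null = null;
  for (;;) {
    const page: Page = await fetchPage(cursor);
    // push appends in place; [...allEdges, ...page.edges] would re-copy
    // every previously collected edge on each page.
    allEdges.push(...page.edges);
    if (!page.hasNextPage) break;
    cursor = page.endCursor;
  }
  return allEdges;
}
```

Spreading into `push` is fine for page-sized batches; for very large arrays a plain loop avoids the engine's argument-count limit.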
Copilot reviewed 25 out of 28 changed files in this pull request and generated 4 comments.
Copilot reviewed 25 out of 28 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
vercel.json:60
- The `/blog/((?!(?:sitemap\.xml$|api/|_next/static/)).*)` headers rule still matches `/blog/_next/image` and will apply the CSP and the generic Cache-Control to Next’s image optimizer responses, potentially overriding Next’s own long-lived image caching (notably you’ve set `images.minimumCacheTTL` to 1 year in next.config.js). Consider excluding `_next/image` (and any other non-HTML asset paths you rely on) from this rule so image requests keep their intended cache headers.
{
"source": "/blog/((?!(?:sitemap\\.xml$|api/|_next/static/)).*)",
"headers": [
{
"key": "Content-Security-Policy",
"value": "connect-src 'self' https://px.ads.linkedin.com https://www.google-analytics.com https://analytics.google.com https://region1.google-analytics.com https://stats.g.doubleclick.net https://rp.liadm.com https://idx.liadm.com https://pagead2.googlesyndication.com https://*.clarity.ms https://news.google.com https://assets.apollo.io https://wp.keploy.io https://cdn.hashnode.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://*.googlevideo.com https://googleads.g.doubleclick.net https://marketplace.visualstudio.com https://api.github.com https://pro.ip-api.com https://api.vector.co https://aplo-evnt.com https://ep1.adtrafficquality.google https://ppptg.com https://telemetry.keploy.io; frame-src 'self' https://www.googletagmanager.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://news.google.com https://googleads.g.doubleclick.net https://*.google.com https://ppptg.com; img-src 'self' https://c.bing.com https://ppptg.com https://pbs.twimg.com https://secure.gravatar.com https://wp.keploy.io https://keploy.io data:;"
},
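The reviewer's point about `_next/image` can be checked with an ordinary JavaScript regular expression. This is an approximation: Vercel compiles `source` patterns with its own path matcher, so the anchored `RegExp` objects below only illustrate the lookahead logic, and the widened pattern is one possible fix rather than the change the PR made.

```typescript
// Approximation of the vercel.json source pattern as an anchored JS RegExp.
const currentRule = /^\/blog\/((?!(?:sitemap\.xml$|api\/|_next\/static\/)).*)$/;

// Possible fix (an assumption, not the PR's change): also exclude _next/image.
const widenedRule = /^\/blog\/((?!(?:sitemap\.xml$|api\/|_next\/static\/|_next\/image)).*)$/;

const paths = [
  "/blog/some-post",
  "/blog/sitemap.xml",
  "/blog/api/cron/refresh-sitemap",
  "/blog/_next/static/chunk.js",
  "/blog/_next/image",
];

for (const path of paths) {
  // Prints whether each rule would attach the CSP/Cache-Control headers to the path.
  console.log(path, { current: currentRule.test(path), widened: widenedRule.test(path) });
}
```

With the current pattern only the image path slips through the exclusions; the widened alternation treats it like the other non-HTML asset paths.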
Copilot reviewed 25 out of 28 changed files in this pull request and generated 1 comment.
Copilot reviewed 28 out of 31 changed files in this pull request and generated 2 comments.
Copilot reviewed 27 out of 30 changed files in this pull request and generated 2 comments.
Copilot reviewed 27 out of 30 changed files in this pull request and generated no new comments.
Copilot reviewed 26 out of 29 changed files in this pull request and generated 1 comment.
Copilot reviewed 26 out of 29 changed files in this pull request and generated 4 comments.
…y guards Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot reviewed 26 out of 29 changed files in this pull request and generated 1 comment.
Copilot reviewed 26 out of 29 changed files in this pull request and generated 1 comment.
Copilot reviewed 26 out of 29 changed files in this pull request and generated no new comments.
Closing this PR since we found a cheaper, faster, and more optimised approach in this PR: #374
Follow-up to the Pages Router dynamic sitemap PR: #355
Summary
Replaces the Pages Router sitemap (`pages/sitemap.xml.ts`) and its custom crawler/retry/fallback machinery with a Next.js App Router ISR Route Handler that delegates caching entirely to Vercel's CDN.
- `app/sitemap.xml/route.ts`: ISR Route Handler (`revalidate = 3600`). Returns `application/xml` directly via `new Response()`. After first generation, every request is served from Vercel CDN edge (<10ms, no Lambda invoked). If WordPress is down during a background regeneration cycle, Vercel automatically keeps serving the previous good cached version.
- `lib/api-server.ts`: Server-only module using `node:https` directly instead of `fetch()`. Required because Next.js App Router wraps `fetch()` with RSC instrumentation that causes Cloudflare (in front of `wp.keploy.io`) to return 502 HTML. `node:https` bypasses this entirely, identical to what `curl` sends.
- `lib/sitemap.ts`: Serializer utilities only (`buildPostEntries`, `buildAuthorEntries`, `buildTagEntries`, `serializeSitemap`, `adaptPostsForSitemap`, `assertFullSitemap`). No fetching, no retry logic.
- `pages/sitemap.xml.ts`: Deleted. Conflicts with the App Router route at the same URL path.
- `pages/api/cron/refresh-sitemap.ts`: Rewritten to GSC-submit only (~50 lines, `maxDuration: 30`). Sitemap generation is now ISR's job; the cron only tells Google the sitemap has been updated.
- `app/layout.tsx`: Required by Next.js 14 when `app/` directory is present. Must include `<html><body>` or the App Router runtime fails.
- `app/not-found.tsx`: Redirects unmatched App Router paths to the Pages Router 404 page.
- `.github/workflows/prewarm-sitemap.yml`: Triggered by Vercel deploy hook. Warms the ISR cache immediately after deploy so the first crawler never hits a cold Lambda.

What the review asked for → what we did
- `lib/api.ts` instead of custom crawler → `lib/api-server.ts`: same pagination logic, `node:https` transport to fix RSC 502
- `getStaticProps + revalidate` instead of `getServerSideProps` → `revalidate = 3600`
- `maxDuration: 30` on cron

TypeScript fixes (side effect of adding `app/` directory)
Next.js 14 automatically enables `strictNullChecks: true` in `tsconfig.json` when an `app/` directory is created. This surfaced pre-existing implicit typing issues across several components. All fixes are type-annotation-only; no runtime behaviour changed.

Vercel Pricing: how sitemap ISR usage is counted
Vercel ISR (our `GET /blog/sitemap.xml`) is billed on these meters:
- Function invocations + compute
  - … `revalidate`), not on every request.
- ISR durable cache writes (8KB units)
- ISR durable cache reads (8KB units)
- Cron
What our implementation costs (expected)
Sitemap config
- `revalidate = 3600` (hourly)
- 450 community + ~28 tech posts (478 post URLs) + static routes + derived author/tag URLs
- 1 post every 2 days (15 posts/month)

Expected monthly usage attributable to the sitemap
- … 15/month to 22/month (aligned with publishing cadence).
- … 720/month if the sitemap is requested around/after every TTL expiry, but most will be identical output and won’t incur ISR write units.

Sources
[S1] Vercel Docs — ISR Usage and Pricing (8KB units, reads/writes, durable vs CDN, identical output note)
https://vercel.com/docs/pricing/incremental-static-regeneration
[S2] Vercel Docs — Fluid compute pricing (Active CPU billed only during execution; pauses during I/O)
https://vercel.com/docs/functions/usage-and-pricing/
[S3] Vercel Docs — Cron Jobs: Usage & Pricing (cron jobs invoke functions; cron included; function pricing applies)
https://vercel.com/docs/cron-jobs/usage-and-pricing
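Two of the figures quoted in the pricing section follow directly from the stated configuration: the 720/month ceiling from the hourly TTL, and the 15 posts/month from the publishing cadence. A small worked calculation, assuming a 30-day month (that assumption and the function names are mine, not the PR's):

```typescript
const REVALIDATE_SECONDS = 3600; // from the PR: revalidate = 3600
const DAYS_PER_MONTH = 30; // assumption: a 30-day month

// Upper bound on regenerations: at most one per TTL window.
function maxRegenerationsPerMonth(revalidateSeconds: number, days: number): number {
  return Math.floor((days * 24 * 60 * 60) / revalidateSeconds);
}

// Publishing cadence: one post every `daysBetweenPosts` days.
function postsPerMonth(daysBetweenPosts: number, days: number): number {
  return days / daysBetweenPosts;
}

console.log(maxRegenerationsPerMonth(REVALIDATE_SECONDS, DAYS_PER_MONTH)); // 720
console.log(postsPerMonth(2, DAYS_PER_MONTH)); // 15
```

The gap between the two numbers is the point of the "identical output" note: the route may regenerate up to 720 times, but the output only changes when a post is published.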
Summary table
- `getServerSideProps` / per-request generation
- `app/sitemap.xml/route.ts`
- `pages/sitemap.xml.ts`
- … `/tmp` → static)
- `503` fallback only when no good cache exists
- `revalidate = 3600` (ISR); regen on access after TTL
- `revalidate = 3600` (ISR)
- `lib/api-server.ts` (minimal fields)
- `lib/api.ts` helpers
- … `/tmp` reset on deploy)
- … `lib/api.ts`)

A few things we did not implement exactly as suggested:
- … “`getStaticProps` + reuse `lib/api.ts`” sketch because generating the sitemap in the App Router context via the global `fetch()` path (used by `lib/api.ts`) was returning 502 HTML from Cloudflare/WP in production. I also tried this locally and it threw an error in the terminal.
- … `lib/api-server.ts` (raw `node:http/https`), which bypasses Next App Router fetch instrumentation.

Why `lib/sitemap.ts` still exists
`lib/sitemap.ts` is not a parallel WordPress crawler anymore. It is the sitemap builder module:
- … `assertFullSitemap`) to avoid caching incomplete crawls.
- … `/tmp` caches, or run cron-based regeneration. WordPress crawling/pagination for sitemap happens in `lib/api-server.ts` only.
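The split described above (a pure builder module plus an ISR Route Handler that returns XML through `new Response()`) could look roughly like the sketch below. Every name here (`SitemapEntry`, `serializeEntries`, `assertEnoughEntries`, `handleSitemapRequest`) is illustrative, and the 503 with `no-store` on failure is an assumption about the fallback; the PR's real signatures are not shown in this thread.

```typescript
// Illustrative names only; not the PR's actual lib/sitemap.ts API.
type SitemapEntry = { loc: string; lastmod?: string };

const escapeXml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&apos;");

// Pure serializer: no fetching, no retries.
function serializeEntries(entries: SitemapEntry[]): string {
  const urls = entries.map((e) => {
    const lastmod = e.lastmod ? `<lastmod>${escapeXml(e.lastmod)}</lastmod>` : "";
    return `<url><loc>${escapeXml(e.loc)}</loc>${lastmod}</url>`;
  });
  return `<?xml version="1.0" encoding="UTF-8"?>` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls.join("")}</urlset>`;
}

// Guard against caching an incomplete crawl.
function assertEnoughEntries(entries: SitemapEntry[], minEntries: number): void {
  if (entries.length < minEntries) {
    throw new Error(`incomplete crawl: ${entries.length} < ${minEntries}`);
  }
}

// In app/sitemap.xml/route.ts this constant and a GET function would be exported.
const revalidate = 3600;

async function handleSitemapRequest(
  load: () => Promise<SitemapEntry[]>,
  minEntries: number,
): Promise<Response> {
  try {
    const entries = await load();
    assertEnoughEntries(entries, minEntries);
    return new Response(serializeEntries(entries), {
      status: 200,
      headers: { "Content-Type": "application/xml" },
    });
  } catch {
    // Assumed fallback: refuse to cache a bad result.
    return new Response("sitemap temporarily unavailable", {
      status: 503,
      headers: { "Cache-Control": "no-store" },
    });
  }
}
```

Passing the loader in as a parameter keeps the handler testable without WordPress; in the real route it would presumably be the pagination fetch from `lib/api-server.ts`.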