feat: added dynamic sitemap generation using sitemap.ts in nextjs - 🏗️#353
feat: added dynamic sitemap generation using sitemap.ts in nextjs - 🏗️#353amaan-bhati wants to merge 7 commits intomainfrom
Conversation
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Code Review SummaryStatus: 4 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)CRITICAL
WARNING
Resolved Issues (fixed in commit a9e70f6)
Other Observations (not blocking)
Files Reviewed (2 files)
Fix these issues in Kilo Cloud Reviewed by claude-4.5-opus-20251124 · 214,435 tokens |
There was a problem hiding this comment.
Pull request overview
Adds a Next.js App Router sitemap endpoint and related TypeScript config updates to support dynamic sitemap generation from the WordPress GraphQL backend.
Changes:
- Added
app/sitemap.tsimplementingMetadataRoute.Sitemapwith static + WP-derived dynamic entries. - Updated
tsconfig.json(formatting + Next TS plugin +.next/typesinclude + enabledstrictNullChecks). - Updated
public/sitemap.xmlcontents (but this now conflicts with the dynamic sitemap route).
Reviewed changes
Copilot reviewed 1 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tsconfig.json | Adjusts TS config for Next.js tooling and enables strictNullChecks. |
| public/sitemap.xml | Updates the static sitemap file, but introduces invalid content and a route conflict with the new dynamic sitemap. |
| app/sitemap.ts | Implements dynamic sitemap generation by paginating WordPress posts via GraphQL and emitting MetadataRoute.Sitemap. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…d authors from wpgraphql Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
… bad gateway edge case Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: Amaan Bhati <94218318+amaan-bhati@users.noreply.github.com>
| ` | ||
| const data = await fetchGraphQL<any>(query, { after: after || null }) | ||
| if (!data?.[type]) { | ||
| console.log(`DEBUG: fetchGraphQL returned missing data for ${type}`) |
There was a problem hiding this comment.
[WARNING]: Debug logs use console.log instead of console.debug.
Per the custom instruction "If debug log statements have functions that can be expensive in non-debug mode (function still executing even if debug mode is not enabled), ensure to gate them" — these console.log calls with template literal interpolation execute in production, incurring unnecessary overhead.
Consider changing to console.debug(...) or gating with a debug flag:
| console.log(`DEBUG: fetchGraphQL returned missing data for ${type}`) | |
| if (process.env.NODE_ENV === 'development') console.debug(`fetchGraphQL returned missing data for ${type}`) |
| break | ||
| } | ||
|
|
||
| console.log(`DEBUG: Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`) |
There was a problem hiding this comment.
[WARNING]: Debug log executes in production.
This console.log labeled as DEBUG should use console.debug to avoid production log noise and unnecessary string interpolation overhead. Per custom instruction #3: "Info logs should be really relevant info to the user; if not, they should be debug."
| console.log(`DEBUG: Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`) | |
| if (process.env.NODE_ENV === 'development') console.debug(`Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`) |
|
|
||
| // Failsafe to prevent excessive polling | ||
| if (allNodes.length > 5000) { | ||
| console.log(`DEBUG: Failsafe triggered for ${type}`) |
There was a problem hiding this comment.
[WARNING]: Debug log should be gated.
Same issue as above — this DEBUG-prefixed log uses console.log and executes in production.
| console.log(`DEBUG: Failsafe triggered for ${type}`) | |
| if (process.env.NODE_ENV === 'development') console.debug(`Failsafe triggered for ${type}`) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| export default async function sitemap(): Promise<MetadataRoute.Sitemap> { | ||
| const baseUrl = 'https://keploy.io/blog' | ||
|
|
||
| // Sequential fetching to deeply respect WP Engine GraphQL burst limits | ||
| const posts = await fetchAllPosts() |
There was a problem hiding this comment.
The new sitemap endpoint is SEO-critical but isn’t covered by existing e2e tests. Consider adding a Playwright test that requests /blog/sitemap.xml, asserts a 200 + application/xml content-type, and verifies it contains a few expected URLs.
| if (!res.ok) { | ||
| console.error("fetchGraphQL res not ok:", res.status, res.statusText) | ||
| return null | ||
| } | ||
| const json = await res.json() | ||
| return json.data as T | ||
| } catch (error) { | ||
| console.error('WPGraphQL fetch error:', error) | ||
| return null |
There was a problem hiding this comment.
These error logs don’t provide a clear next step, and they’ll appear in production if WPGraphQL is transiently unavailable (e.g., rate limiting). Consider improving the message to include actionable guidance (e.g., check WPGraphQL endpoint/env vars, retry later) and/or handling retries/backoff; also consider avoiding logging full errors repeatedly during ISR to reduce log noise.
| console.error("fetchGraphQL res not ok:", res.status, res.statusText) | ||
| return null | ||
| } | ||
| const json = await res.json() | ||
| return json.data as T | ||
| } catch (error) { | ||
| console.error('WPGraphQL fetch error:', error) |
There was a problem hiding this comment.
fetchGraphQL ignores GraphQL-level errors (json.errors) and will return json.data even when the response contains errors or partial data. That can silently generate an incomplete/incorrect sitemap; consider checking json.errors (similar to lib/api.ts:25-28) and failing fast or returning null with an actionable error message.
| console.error("fetchGraphQL res not ok:", res.status, res.statusText) | |
| return null | |
| } | |
| const json = await res.json() | |
| return json.data as T | |
| } catch (error) { | |
| console.error('WPGraphQL fetch error:', error) | |
| console.error( | |
| 'fetchGraphQL HTTP error:', | |
| res.status, | |
| res.statusText, | |
| '- please verify the WPGraphQL endpoint URL and check server logs for more details.' | |
| ) | |
| return null | |
| } | |
| const json = await res.json() | |
| if (json && Array.isArray(json.errors) && json.errors.length > 0) { | |
| console.error( | |
| 'fetchGraphQL GraphQL error(s) returned from WPGraphQL. Inspect the query and server logs for details:', | |
| json.errors | |
| ) | |
| return null | |
| } | |
| return json.data as T | |
| } catch (error) { | |
| console.error( | |
| 'WPGraphQL fetch error. Please check network connectivity, the WPGraphQL endpoint configuration, and server logs:', | |
| error | |
| ) |
| if (!data?.[type]) { | ||
| console.log(`DEBUG: fetchGraphQL returned missing data for ${type}`) | ||
| break | ||
| } | ||
|
|
||
| console.log(`DEBUG: Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`) | ||
|
|
||
| // Failsafe to prevent excessive polling | ||
| if (allNodes.length > 5000) { | ||
| console.log(`DEBUG: Failsafe triggered for ${type}`) | ||
| break | ||
| } |
There was a problem hiding this comment.
There are unconditional console.log “DEBUG:” statements in the sitemap generator. This will produce noisy logs during builds/ISR and doesn’t follow the logging guideline here (debug logs should be gated, and error logs should include a clear next step). Please remove these logs or gate them behind an explicit debug flag (e.g., an env var) and keep production output minimal/actionable.
| tags.forEach(tag => { | ||
| sitemapData.push({ | ||
| url: `${baseUrl}/tag/${encodeURIComponent(tag.slug)}`, | ||
| lastModified: tag.lastModified, | ||
| changeFrequency: 'weekly', | ||
| priority: 0.64, | ||
| }) | ||
| }) |
There was a problem hiding this comment.
Sitemap tag URLs are built from tag.slug, but the site’s tag routes/links appear to use the tag name (e.g. components/tag.tsx links to /tag/${name}, and the previous sitemap contained URLs like /tag/Feature%20Flags). Using slug here may create URLs that are inconsistent with internal linking (and can cause duplicate/incorrect indexing). Consider generating tag URLs using the same value the route expects (or update routing to consistently use slugs).
| // Process Authors (verified exact path: /blog/authors/[slug]) | ||
| authors.forEach(author => { | ||
| sitemapData.push({ | ||
| url: `${baseUrl}/authors/${encodeURIComponent(author.slug)}`, | ||
| lastModified: author.lastModified, | ||
| changeFrequency: 'weekly', | ||
| priority: 0.64, | ||
| }) | ||
| }) |
There was a problem hiding this comment.
Author URLs in the sitemap are built from WP user slug, but the site’s author routing uses sanitizeAuthorSlug(ppmaAuthorName) (see components/AuthorMapping.tsx / pages/authors/[slug].tsx). This mismatch can yield non-canonical or even 404 author URLs in the sitemap. Consider deriving author slugs using sanitizeAuthorSlug (e.g., by aggregating author names from posts) or by querying name and sanitizing it instead of using WP slug directly.
…pilot review Updated debug logging to use console.debug in development mode. Signed-off-by: Amaan Bhati <94218318+amaan-bhati@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Process Authors (verified exact path: /blog/authors/[slug]) | ||
| authors.forEach(author => { | ||
| sitemapData.push({ | ||
| url: `${baseUrl}/authors/${encodeURIComponent(author.slug)}`, | ||
| lastModified: author.lastModified, | ||
| changeFrequency: 'weekly', | ||
| priority: 0.64, | ||
| }) |
There was a problem hiding this comment.
Author sitemap entries are built from WPGraphQL users.node.slug (/authors/${slug}), but the existing author route (pages/authors/[slug].tsx) generates/looks up slugs via sanitizeAuthorSlug(ppmaAuthorName) derived from posts. These formats likely won’t match, leading to sitemap URLs that 404. Consider deriving author slugs the same way the route does (from ppmaAuthorName) or reusing sanitizeAuthorSlug when building author URLs.
| slug | ||
| posts(first: 1) { | ||
| nodes { | ||
| modified | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
fetchAllTaxonomies uses posts(first: 1) without an explicit orderby. This makes lastModified potentially incorrect (often returns most recent by DATE, not by MODIFIED), which undermines the goal of accurate <lastmod> values. Add an explicit order (e.g., order by MODIFIED DESC) to ensure the returned modified is actually the latest for that taxonomy/author.
| // Process Categories | ||
| categories.forEach(cat => { | ||
| sitemapData.push({ | ||
| url: `${baseUrl}/${encodeURIComponent(cat.slug)}`, | ||
| lastModified: cat.lastModified, | ||
| changeFrequency: 'weekly', | ||
| priority: 0.80, | ||
| }) |
There was a problem hiding this comment.
The sitemap currently emits URLs for all WP categories (${baseUrl}/${cat.slug}), but the Next.js routes in this repo only support /technology and /community (no generic category route). If WP ever contains other categories with posts, this will publish invalid URLs to crawlers. Consider filtering categories (and post primaryCategory) to the set of routes the frontend actually serves, or mapping WP categories to supported route segments.
| export default async function sitemap(): Promise<MetadataRoute.Sitemap> { | ||
| const baseUrl = 'https://keploy.io/blog' | ||
|
|
||
| // Sequential fetching to deeply respect WP Engine GraphQL burst limits | ||
| const posts = await fetchAllPosts() | ||
| const tags = await fetchAllTaxonomies('tags') | ||
| const categories = await fetchAllTaxonomies('categories') | ||
| const authors = await fetchAllTaxonomies('users') | ||
|
|
There was a problem hiding this comment.
There are Playwright E2E tests for SEO, but none cover /blog/sitemap.xml. Adding a basic test that requests the sitemap and asserts it returns XML with a few expected URLs (and no obvious 404 targets) would help prevent regressions in sitemap generation and routing alignment.
|
clsoing this pr since we found a cheaper and more optimised + faster approach in this pr: #374 |
Related Tickets & Documents
Fixes: #[issue-number] none, didnt create an issue for the same since id be picking this up
Description
Replaced the legacy, manually-maintained
public/sitemap.xmlwith a fully automated, dynamic sitemap generator located atapp/sitemap.ts. It leverages the Next.js App Router Metadata API and WPGraphQL to ensure that every new post, category, tag, and author published on WordPress is instantly and accurately indexed for SEO without manual intervention.sitemap.tsfor automated sitemap generation: (alongwith the code reference on github of some open source repos):reference and research sheet
Key Features and Automation
The script programmatically generates a sitemap at
keploy.io/blog/sitemap.xmlby fetching and mapping the following entities:/technology,/community)./tag/automation)./authors/[slug]).Technical Implementation & Best Practices
sitemap.tsconvention, which automatically handles XML serialization and Content-Type headers.tags { nodes { posts(first: 1) { nodes { modified } } } }When does it run?
The timing depends on your environment:
Performance & Edge Cases Covered
Promise.allto fetch Posts, Tags, Authors, and Categories concurrently. This triggered WPEngine's burst-limit protection, causing intermittent 502 errors.Important edge case (multiple ai agents fell short here lol)
/technologyotherwise the page would return 404 because there is no ui for uncategorised pages in our code.uncategorisedmean, these are when a blog's data comes from a route other than /technology and /community which is technically not possible because behind the scenes our blogs data comes from wordpress and hashnode, on hashnode we only have two categories, one for tech blogs and other for community blogs, data stored is at these two repositries (https://github.com/keploy/tech-blog) (https://github.com/keploy/community-blog), there is no possible case of uncategorised herecurl -s -X POST -H "Content-Type: application/json" \ -d '{"query": "{ categories(first: 100) { edges { node { slug count } } } }"}' \ https://wp.keploy.io/graphql | jq '.data.categories.edges[].node'which confirmed the following:
References and source of truths:
sitemap.tsfor automated sitemap generation: (alongwith the code reference on github of some open source repos):https://docs.google.com/spreadsheets/d/1BQOjPYPA7GUlWbioShqgF1SEGiOoOMskymxpl0ew8Lw/edit?usp=sharing
Type of Change
Environment and Dependencies
Checklist