@harlan-zw
Contributor

🔗 Linked issue

#233

❓ Type of change

  • 📖 Documentation (updates to the documentation or readme)
  • 🐞 Bug fix (a non-breaking change that fixes an issue)
  • 👌 Enhancement (improving an existing functionality)
  • ✨ New feature (a non-breaking change that adds functionality)
  • 🧹 Chore (updates to the build process or auxiliary tools and libraries)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)

📚 Description

Two related bugs affecting robots.txt configuration:

  1. .includes() bug in normalizeGroup() - Used .includes() with a callback instead of .some(), causing _indexable to always be true even when disallow: ['/'] should make it false

  2. Missing normalization after robots:config hook - Groups added via the hook weren't normalized, leaving them without the _indexable property. This caused all URLs to be incorrectly marked as non-indexable, breaking sitemap generation for users relying on this hook.
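
To illustrate the first bug, here is a minimal sketch of why passing a callback to .includes() never matches. The RobotsGroup shape and the two helper functions are hypothetical simplifications, not the module's actual code.

```typescript
// Hypothetical, simplified stand-in for the module's group shape.
interface RobotsGroup {
  userAgent: string[]
  disallow: string[]
  _indexable?: boolean
}

// Buggy check: includes() compares each element against its argument by
// value. A newly created function is never equal to a string in the array,
// so this always returns false.
function blocksEverythingBuggy(disallow: string[]): boolean {
  // The cast exists only so the mistake compiles in this sketch.
  return (disallow as any).includes((rule: string) => rule === '/')
}

// Fixed check: some() actually invokes the callback for each rule.
function blocksEverything(disallow: string[]): boolean {
  return disallow.some(rule => rule === '/')
}

const group: RobotsGroup = { userAgent: ['*'], disallow: ['/'] }

// With the buggy check the group looks indexable even though it blocks "/".
const buggyIndexable = !blocksEverythingBuggy(group.disallow)
// With some() the blanket disallow is detected.
const fixedIndexable = !blocksEverything(group.disallow)

console.log({ buggyIndexable, fixedIndexable })
```

The buggy variant reports the group as indexable regardless of its rules, which matches the "always true" symptom described in item 1.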

@harlan-zw
Contributor Author

harlan-zw commented Oct 4, 2025

@silverbackdan could you please review 🙏

I still need to investigate the double hook call.

@silverbackdan
Collaborator

@harlan-zw thanks for jumping on this so quickly. The fix and tests all look like huge improvements.

Does normalizeGroup need to be called again on a group where _indexable is already set? Skipping those groups might avoid re-normalizing groups that are already normalized. https://github.com/nuxt-modules/robots/blob/main/src/util.ts#L265

I'll check the implementation from this branch in my application using the currently released sitemap module and report back.

@silverbackdan
Collaborator

This fixed branch, together with the currently released sitemap module, is now deployed at https://www.cymrukitchens.com/sitemap_index.xml

The sitemaps on this website are the ones that used to disappear, so we'll see in an hour or so. But given the work you've done and the tests in place, this is sure to fix it.

  ]
  for (const group of groups) {
-   if (!group._indexable) {
+   if (group._indexable === false) {
Collaborator

This alone would have kept my sitemaps populated :) :)
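
A small sketch of why the stricter comparison matters for groups that were never normalized. The RobotsGroup shape and the sample groups are assumptions made for illustration only.

```typescript
// Hypothetical, simplified group shape; _indexable is optional because a
// group pushed from a hook may never have been normalized.
interface RobotsGroup {
  disallow: string[]
  _indexable?: boolean
}

// A raw group added in a hook: _indexable is undefined.
const fromHook: RobotsGroup = { disallow: ['/admin'] }
// A normalized group that really does block everything.
const blocked: RobotsGroup = { disallow: ['/'], _indexable: false }

// Old check: undefined is falsy, so the raw group is treated as blocked.
const isBlockedLoose = (g: RobotsGroup): boolean => !g._indexable
// New check: only an explicit false counts as blocked.
const isBlockedStrict = (g: RobotsGroup): boolean => g._indexable === false

const looseResult = [fromHook, blocked].map(isBlockedLoose)
const strictResult = [fromHook, blocked].map(isBlockedStrict)

console.log({ looseResult, strictResult })
```

Under the loose check both groups count as blocked; under the strict check only the explicitly non-indexable one does.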

  }
  await nitro.hooks.callHook('robots:config', generateRobotsTxtCtx)
+ // Normalize groups after hook to ensure all groups have _indexable property
+ generateRobotsTxtCtx.groups = generateRobotsTxtCtx.groups.map(normalizeGroup)
Collaborator

It does seem a shame to still have to reprocess all the groups, doesn't it? It's only needed on the off chance that the hook is calling a function designed to extend the config. Is it worth a different hook name, or adding a specific robots:config-extend hook just in the module? But I suppose if the groups are passed to any hook, they can always be altered.

Contributor Author

@harlan-zw harlan-zw Oct 4, 2025

So ideally we only normalise once, but yeah, the issue is that we're giving the user normalised data that they can then modify (and un-normalise). We can't switch that around without a breaking change.

There's likely an optimisation around it, but it adds complexity, so this is most likely the safest option for now. I'll review that before I merge.
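
The trade-off discussed in this thread can be sketched as follows. Everything here is a simplified assumption: normalizeGroup only derives _indexable, and userHook stands in for a user's robots:config handler. It is not the module's implementation.

```typescript
interface RobotsGroup {
  userAgent: string[]
  disallow: string[]
  _indexable?: boolean
}

// Hypothetical, simplified normalizer: only derives _indexable.
function normalizeGroup(group: RobotsGroup): RobotsGroup {
  return { ...group, _indexable: !group.disallow.some(rule => rule === '/') }
}

// Stand-in for a user's hook handler receiving already normalized groups.
function userHook(ctx: { groups: RobotsGroup[] }): void {
  // Mutates a normalized group, so its existing _indexable flag goes stale.
  ctx.groups[0].disallow.push('/')
  // Adds a raw group that has no _indexable at all.
  ctx.groups.push({ userAgent: ['bot'], disallow: [] })
}

const ctx = { groups: [normalizeGroup({ userAgent: ['*'], disallow: [] })] }
userHook(ctx)

// Optimisation idea: only normalize groups that lack the flag.
// The first group keeps its stale "indexable" value.
const skipped = ctx.groups.map(g => (g._indexable === undefined ? normalizeGroup(g) : g))

// Approach taken in this PR: re-normalize every group after the hook.
const renormalized = ctx.groups.map(normalizeGroup)

console.log(skipped.map(g => g._indexable), renormalized.map(g => g._indexable))
```

In this sketch the skip-if-present variant leaves the first group marked as indexable even though the hook added a blanket disallow, while re-normalizing everything recomputes the flag for both groups.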

@silverbackdan
Collaborator

Still fixed on my website, so I'm going to say the patch is a success 🥳

@silverbackdan
Collaborator

The other issue I reported in sitemap, where the titles and content of the sitemap didn't update when linking between pages in production, seems to have resolved for now too.

@harlan-zw harlan-zw merged commit 59c4454 into main Oct 5, 2025
4 checks passed
@harlan-zw
Contributor Author

So the thinking is to deprecate this runtime hook and introduce two new hooks for more fine-grained control over the robots with more predictable behavior. I'll attempt this in another PR with less urgency.

@silverbackdan
Collaborator

That sounds good to me. We could possibly even add them as additional hooks while this one is deprecated, so as not to cause breaking changes for a while.

I did wonder whether I was adding the same groups into the hook again, but the logs I made didn't seem to suggest it at the time. I'm still not entirely sure how a lot of the system works: whether this data is simply saved once in the server-side instance, or what specific event was triggering the sitemap to disappear after some time.

But this fix does work, and those questions can be considered with less urgency now that the applications I've implemented this on no longer have bugs. Thank you very much for your work on this over the weekend!

@silverbackdan
Collaborator

PS: if I get a chance, do you want me to go through, like I did on sitemap, and resolve all typecheck issues?

@harlan-zw
Contributor Author

harlan-zw commented Oct 5, 2025

The sitemap module is one of my last outstanding modules where the types aren't checked in the CI, so robots should be good.

I've started working on #235 for the new hooks and any other changes to your initial issue. If you're interested in helping, I gave you write access; you could work on the same branch locally. All good if not though.

I am very open to reworking any code I've done in that PR. I was just playing around to see what would be useful.

@silverbackdan
Collaborator

Thanks very much, I'll write some comments there.
