Is this an existing toggle?
If not, making it a config (plausibly something we might give site owners control over) and giving it a less obscure name would be good.
If we had a config like this, it would be nice if it controlled robots.txt too, I assume.
Nah, it's a new one. How about user > config: disable_site_indexing? Or site_robots_noindex?
Looks like it would need a site API change to check the config when the site is activated and decide whether robots is enabled or not.
Yeah, I see a couple of options:

1. Clean: use the allow_robots field as the source of truth. Update /services/users/v1/sites to return allow_robots as well as the site status, fetch it in Kibble, and put it on the site model. The Core template could then simply use site.AllowRobots. Add a separate config that marks the site as "private" so we don't flip allow_robots when going live.
2. Quick and dirty: add a config such as seo_disable_meta_robots and use it directly from the template. Fix the go-live issue separately.

The clean way is probably also a ton of faff that would require coordinating deploys and version bumps, so quick and dirty seems appealing.
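For illustration only, here is a minimal Go sketch of what the clean option might look like on the Kibble side. The response shape and the names sitesResponse, fetchSite, and the example URL are assumptions; only allow_robots, the site status, and the /services/users/v1/sites path come from the discussion above.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// sitesResponse mirrors a hypothetical /services/users/v1/sites payload that
// now returns allow_robots alongside the site status (assumed shape).
type sitesResponse struct {
	Status      string `json:"status"`
	AllowRobots bool   `json:"allow_robots"`
}

// Site is a stand-in for the site model; templates could then read
// site.AllowRobots directly.
type Site struct {
	Status      string
	AllowRobots bool
}

// fetchSite pulls the site record and copies allow_robots onto the model.
func fetchSite(ctx context.Context, c *http.Client, url string) (Site, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return Site{}, err
	}
	resp, err := c.Do(req)
	if err != nil {
		return Site{}, err
	}
	defer resp.Body.Close()

	var sr sitesResponse
	if err := json.NewDecoder(resp.Body).Decode(&sr); err != nil {
		return Site{}, err
	}
	return Site{Status: sr.Status, AllowRobots: sr.AllowRobots}, nil
}

func main() {
	// Hypothetical endpoint URL, used only to show the call shape.
	site, err := fetchSite(context.Background(), http.DefaultClient,
		"https://example.internal/services/users/v1/sites")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("status=%s allow_robots=%t\n", site.Status, site.AllowRobots)
}
```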
The go-live issue could be mitigated by leaving the allow_robots setting alone in that API and changing it so that having a site lock implies robots.txt is always denied.
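As a rough sketch of that idea (the Site type, field names, and handler wiring are assumptions, not the actual code), the robots.txt handler could treat an active site lock as an unconditional deny while leaving the stored allow_robots value untouched:

```go
package main

import (
	"fmt"
	"net/http"
)

// Site is a stand-in for the real site model; Locked and AllowRobots are
// assumed field names used only for this sketch.
type Site struct {
	Locked      bool
	AllowRobots bool
}

// robotsTxt serves a deny-all robots.txt whenever the site is locked,
// regardless of the stored allow_robots setting, and otherwise falls back
// to that setting.
func robotsTxt(w http.ResponseWriter, site Site) {
	if site.Locked || !site.AllowRobots {
		fmt.Fprint(w, "User-agent: *\nDisallow: /\n")
		return
	}
	fmt.Fprint(w, "User-agent: *\nDisallow:\n")
}

func main() {
	site := Site{Locked: true, AllowRobots: true}
	http.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		robotsTxt(w, site)
	})
	fmt.Println(http.ListenAndServe(":8080", nil))
}
```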
This issue really only affects a few sites that want to go live without a site lock but not be visible to the public (the site is invite-only), so relying on the site lock won't work here, and it's nice to keep robots enabled for all the other sites. I think for a quick solution we can create a config that tells core to write a noindex meta tag on every page; then, if robots is enabled, the crawler will still hit the site but will see that it shouldn't index any pages. It does mean we'd have the site no-robots field and the config for essentially the same thing, which could be confusing, though.
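For illustration, a minimal Go sketch of that quick solution, assuming the hypothetical config name seo_disable_meta_robots floated above and a template data shape that may not match the real core templates:

```go
package main

import (
	"html/template"
	"os"
)

// PageData carries the hypothetical seo_disable_meta_robots config value into
// the page template; the struct and field names are assumptions only.
type PageData struct {
	SEODisableMetaRobots bool
}

// headTmpl emits a blanket noindex tag on every page when the config is set,
// so even with robots.txt enabled, crawlers are told not to index anything.
const headTmpl = `<head>
{{- if .SEODisableMetaRobots }}
  <meta name="robots" content="noindex, nofollow">
{{- end }}
</head>
`

func main() {
	t := template.Must(template.New("head").Parse(headTmpl))
	// With the config enabled, every rendered page carries the noindex tag.
	if err := t.Execute(os.Stdout, PageData{SEODisableMetaRobots: true}); err != nil {
		panic(err)
	}
}
```

With robots.txt left enabled, crawlers still fetch pages but are instructed not to index them, which matches the behaviour described in the comment above.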