Remove rules that redirect URLs made up by AI/bot crawlers by speth · Pull Request #277 · Cantera/cantera-website

speth · 2025-10-22T14:02:15Z

The bandwidth usage from cantera.org has recently surged. Investigating the logs showed that the increased bandwidth was mainly due to a flood of requests for URLs such as:

/documentation/reactors/releasenotes/thermo/thermo/kinetics/python/constants.html
/documentation/python/kinetics/reactors/yaml/reactors/reactors/reactors/releasenotes/releasenotes/v3.1.html
/documentation/dev/doxygen/reference/cxx/thermo/yaml/kinetics/yaml/yaml2ck.html
/documentation/dev/doxygen/reference/releasenotes/examples/reactors/thermo/python/python/lxcat_conversion.html

The /documentation prefix was used in the old (pre-Cantera 3.1) website. The rest of each URL seems to be composed of components of valid URLs on the Cantera website, arranged in a random order that corresponds to no page that has ever existed. All of these requests provide user agents claiming to be a real web browser rather than identifying themselves as bots, provide no referrer URL, and are distributed across thousands of IPs. I can only assume this is some AI or bot trying to scrape content for training.
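For anyone wanting to spot this pattern in their own logs, here is a minimal sketch of one possible heuristic. It is not what was used for this investigation. It assumes a fabricated path can be recognized by the legacy prefix plus at least one repeated directory component, which happens to hold for the four sample URLs above but could misclassify a genuine old link. The function name `looks_fabricated` is made up for illustration.

```python
from collections import Counter

# Prefix used by the pre-3.1 website, per the description above.
LEGACY_PREFIX = "/documentation/"


def looks_fabricated(path: str) -> bool:
    """Illustrative heuristic: legacy prefix plus a repeated directory name."""
    if not path.startswith(LEGACY_PREFIX):
        return False
    # Drop the final component (the file name) and ignore empty pieces
    # produced by the leading slash.
    directories = [part for part in path.split("/")[:-1] if part]
    counts = Counter(directories)
    # Assumption: a real docs tree rarely repeats a directory name in one path.
    return any(n > 1 for n in counts.values())


samples = [
    "/documentation/reactors/releasenotes/thermo/thermo/kinetics/python/constants.html",
    "/documentation/dev/doxygen/reference/cxx/thermo/yaml/kinetics/yaml/yaml2ck.html",
]
for sample in samples:
    print(looks_fabricated(sample), sample)
```

A check like this only looks at the path. The other signals mentioned above (missing referrer, browser-like user agent, wide IP spread) would have to come from the log entries themselves.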

The high bandwidth usage was mainly due to the fact that we were redirecting any URL starting with /documentation to the root of the reference documentation, on the basis that this would be better than giving a 404 to an old deep link into the docs. However, as a result, these crawlers then fetched not only this page but also the full set of resources (.css and .js files) needed to render it, consuming quite a bit of bandwidth.

By removing this redirect and returning 404 for these URLs, the bandwidth usage has dropped back down to more manageable levels. I'm hoping this botnet will give up at some point. I've also dropped another rule that could have a similar effect if a bot started to explore that space.
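The change in behavior can be sketched as a small routing function. This is only a model of the before and after, not the actual server configuration changed in this PR. `REFERENCE_ROOT` is a hypothetical placeholder for the redirect target, the 301 status is an assumption (the redirect type is not stated here), and `route` is an illustrative name.

```python
from typing import Optional, Tuple

LEGACY_PREFIX = "/documentation"
# Hypothetical placeholder; the real redirect target is not given above.
REFERENCE_ROOT = "/reference/"


def route(path: str, redirect_legacy: bool) -> Tuple[int, Optional[str]]:
    """Return (status, redirect location) for a request path."""
    if path.startswith(LEGACY_PREFIX):
        if redirect_legacy:
            # Old behavior: every legacy-prefixed URL lands on a full page,
            # so a client that follows it also pulls the page's .css/.js.
            return 301, REFERENCE_ROOT
        # New behavior: a small 404 response with nothing further to fetch.
        return 404, None
    # Simplification: every other path is treated as an existing page.
    return 200, None


bogus = "/documentation/dev/doxygen/reference/cxx/thermo/yaml/kinetics/yaml/yaml2ck.html"
print(route(bogus, redirect_legacy=True))
print(route(bogus, redirect_legacy=False))
```

The trade-off is the one described above: genuine old deep links now get a 404 as well, in exchange for made-up URLs no longer triggering a full page load.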

bryanwweber

Thanks Ray! I'd been wondering about those emails from Linode and I can confirm I haven't gotten one recently.

Remove rules that redirect URLs made up by AI/bot crawlers

6883dbf

bryanwweber approved these changes Oct 23, 2025


speth merged commit 25cf8f9 into Cantera:main Oct 23, 2025
1 check passed

speth deleted the manage-crawlers branch October 23, 2025 12:18
