Avoid unnecessary CMS DB hits for pages that will 404#15628
Avoid unnecessary CMS DB hits for pages that will 404#15628stevejalim wants to merge 5 commits intomainfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15628 +/- ##
==========================================
+ Coverage 79.29% 79.30% +0.01%
==========================================
Files 159 161 +2
Lines 8347 8443 +96
==========================================
+ Hits 6619 6696 +77
- Misses 1728 1747 +19 ☔ View full report in Codecov by Sentry. |
dc54a7d to
dbd3748
Compare
8be921d to
c7873f7
Compare
|
Talking with Pmac, we're gonna try using postgres as a DB-backed networked cache, possibly/probably with a locmem cache on each pod. |
dbd3748 to
b42a955
Compare
845f97e to
df5b15d
Compare
|
So the whole db-based cache thing didn't work out, because getting from the cache also triggers the invalidation check, which then can result in an error when called on a readonly postgres DB (which is the situation for the web pods). Instead, new-new plan:
|
2af4211 to
eabc17c
Compare
Saves 11 SQL queries on the releasnotes page by cacheing the country code lookup for an hour. Tested on /en-US/firefox/132.0.1/releasenotes/ Cold cache: 14 queries / 2066ms Warm cache: 3 queries / 222ms
…CMS for a page ...because if the page isn't in the lookahead, we can avoid touching the DB just to ultimately return a 404
…cache In order to balance the need for a distributed cache with the speed of a local-memory cache, we've come up with a couple of helper functions that wrap the following behaviour: * If it's in the local-memory cache, return that immediately. * If it's not, fall back to the DB cache, and if the key exists there, return that, cacheing it in local memory again on the way through * If the local memory cache and DB cache both miss, just return the default value for the helper function * Set the value in the local memory cache and DB cache at (almost) the same time * If the DB cache is not reachable (eg the DB is a read-only replica), log this loudly, as it's a sign the helper has not been used appropriately, but still set the local-memory version for now, to prevent total failure. IMPORTANT: before this can be used in production, we need to create the cache table in the database with ./manage.py createcachetable AFTER this code has been deployed. This sounds a bit chicken-and-egg but we hopefully can do it via direct DB connection in the worst case.
eabc17c to
edf49d4
Compare
maribedran
left a comment
There was a problem hiding this comment.
@stevejalim This makes sense to me. The only situation I can think of right now that could break this is if we ever decide to use the RoutablePageMixin: https://docs.wagtail.org/en/stable/reference/contrib/routablepage.html#routable-page-mixin
Maybe we could check if that app is installed and, if a page inherits from that class, we never raise a 404 for paths that start with that page's URL.
Yeah, I remember discussing this with someone else and we agreed routable pages would be a problem with this approach - but because we (currently) don't use them, it wasn't a big concern.
I'm thinking along similar lines - there must be a way to introspect the page and get all the routes that defined via the |
This changeset adds a lookahead check before we call
wagtail.views.serveto know whether it's worth asking the CMS to serve a page.The idea is that if the page isn't in the lookahead's data source, we can avoid touching the DB just to ultimately return a 404 earlier.
This will save us DB load, particularly when we get drive-by scans that pepper the site with irrelevant URLs.
Significant changes and points to review
Please be sceptical about this, particularly around cache invalidation/cache updating - e.g. when a page is published, unpublished or moved in the page tree.
Important: this change will need to go hand in hand with an infra update that does give us a networked cache (Redis or Memcached) - if we stick with LocMemCache, then while the pods can and will build their own lookahead in local cache, that cache will not be invalidated when a page is published, unpublished or moved.Issue / Bugzilla link
#15505 #14742
Testing
Details to come