
Add helpers to support a 'hybrid cache' option that uses locmem + DB cache #15859

Merged: stevejalim merged 6 commits into main from 15505--add-hybrid-cache, Jan 16, 2025
Conversation

@stevejalim (Contributor) commented Jan 14, 2025

In order to balance the need for a distributed cache with the speed of a local-memory cache (which we'll use in #15628), we've come up with a couple of helper functions that wrap the following behaviour:

Getting values:

  • If the key is in the local-memory cache, return that value immediately.
  • If it's not, fall back to the DB cache; if the key exists there, return that value, caching it in local memory again on the way through.
  • If the local-memory cache and DB cache both miss, just return the default value for the helper function.

Setting values:

  • Set the value in the local-memory cache and DB cache at (almost) the same time.
  • If the DB cache is not reachable (e.g. the DB is a read-only replica), log this loudly, as it's a sign the helper has not been used appropriately, but still set the local-memory version for now, to prevent total failure.
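The get/set behaviour above can be sketched in plain Python. This is an illustrative sketch only: ordinary dicts stand in for Django's locmem and database cache backends, and the bodies are inferred from the description in this PR, not copied from `bedrock.base.cache`.

```python
# Illustrative sketch of the hybrid-cache helpers described above.
# Plain dicts stand in for the locmem and database cache backends;
# the real helpers wrap Django cache objects instead.
import logging

logger = logging.getLogger(__name__)

_MISS = object()  # sentinel so that None can itself be a cached value

local_cache = {}  # fast, per-process (locmem)
db_cache = {}     # shared, distributed (database table)


def get_from_hybrid_cache(key, default=None):
    value = local_cache.get(key, _MISS)
    if value is not _MISS:
        return value  # local hit: return immediately
    value = db_cache.get(key, _MISS)
    if value is not _MISS:
        local_cache[key] = value  # re-prime local memory on the way through
        return value
    return default  # both caches missed


def set_in_hybrid_cache(key, value, db_cache_writable=True):
    local_cache[key] = value
    if not db_cache_writable:
        # e.g. the DB is a read-only replica: log loudly, but keep the
        # local copy so we don't fail outright
        logger.error("DB cache unavailable; only set %r in local memory", key)
        return
    db_cache[key] = value
```

Note the `_MISS` sentinel: using `cache.get(key, default)` directly would make a legitimately cached `None` indistinguishable from a miss.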

IMPORTANT: before this can be used in production, we need to create the cache table in the database with ./manage.py createcachetable AFTER this code has been deployed. This sounds a bit chicken-and-egg, but we can hopefully do it via a direct DB connection in the worst case.

We also ensure that the sqlite export has the cache table in existence (and that it's empty), so that demos and other things that need sqlite to run don't blow up on boot once this code is live. We'll need to run fresh exports of all three environments' DBs as soon as this is out in the wild, but that's fine.

  • I used AI to write some of this code (ChatGPT for ideation, Copilot for some tests), but I reviewed and tweaked all code before using it.

Significant changes and points to review

  • What do you think about this approach?

  • Bearing in mind that we won't have a writeable DB available everywhere, are these helpers clear enough about where it's OK to use them? Would you like any other protections?

  • Should we add metrics to the usage of the DB cache like we do with the in-memory cache?

  • How do you feel about us manually connecting to the DB to bootstrap the DB cache table? Other options? (A k8s upgrade Job seems a bit heavy for something that only needs to run once, ever.)

Issue / Bugzilla link

Related to #15505

Testing

Reviewing the unit tests should be enough for now, but feel free to play with it in your shell.

@codecov codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.34%. Comparing base (b3110e2) to head (b00eec2).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15859      +/-   ##
==========================================
+ Coverage   79.28%   79.34%   +0.05%     
==========================================
  Files         159      159              
  Lines        8343     8365      +22     
==========================================
+ Hits         6615     6637      +22     
  Misses       1728     1728              


@stevejalim force-pushed the 15505--add-hybrid-cache branch from ed7e609 to fd6026c on January 14, 2025 14:49
…le, to keep demos and other sqlite-powered things happy
@stevejalim marked this pull request as ready for review January 14, 2025 14:56
@stevejalim requested a review from a team as a code owner January 14, 2025 14:56
@robhudson (Contributor) left a comment:
Simple and straightforward code.

My only concern is that caching adds complication, and this is like "nested caching". But the short cache time on the local cache lessens that concern.

# See bedrock.base.cache.get_from_hybrid_cache and set_in_hybrid_cache
"LOCATION": "hybrid_cache_db_table", # name of DB table to be used
"BACKEND": "django.core.cache.backends.db.DatabaseCache",
"TIMEOUT": None, # cached items will not expire
Contributor:
Do we want this to be forever? Have we considered a long timeout so they eventually do get purged? No expiry ever seems like this table would potentially bloat over time.

@stevejalim (Contributor Author):

This is the default timeout for the cache, so that we can support the idea of never-expiring things, but set_in_hybrid_cache does allow a timeout to be set if we want to. If we had a default here, my understanding is that if we passed timeout=None in a call to set something in the cache, it would result in the default being used instead of None.

Related to this, though, I've just thought about providing more flexibility in specifying cache times for DB and locmem caching when using set_in_hybrid_cache, so I'll re-request a review after pushing that.

Contributor:

Looking at the cache backends in Django, they use django.core.cache.backends.base.DEFAULT_TIMEOUT to differentiate between None and the default.

The backends also define a get_backend_timeout that could be useful?

My thought is we should use a very long expiry as the default, even if it's like 1 year or something, just so old unused cache entries do get purged, and require explicitly setting None to have no timeout at all, because that feels like it should be done on purpose.
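The sentinel Django uses for this (`django.core.cache.backends.base.DEFAULT_TIMEOUT`) lets a backend tell "caller passed nothing" apart from "caller explicitly passed None". A minimal self-contained illustration of the same pattern, not Django's actual code (the class and names here are hypothetical):

```python
# Minimal illustration of the sentinel-default pattern used by Django's
# cache backends. A unique object distinguishes "no timeout argument given"
# from an explicit timeout=None ("never expire").
DEFAULT_TIMEOUT = object()  # only identity matters, never its value


class SketchCache:
    def __init__(self, default_timeout=300):
        self.default_timeout = default_timeout

    def get_backend_timeout(self, timeout=DEFAULT_TIMEOUT):
        if timeout is DEFAULT_TIMEOUT:
            timeout = self.default_timeout  # nothing passed: use the default
        # Past this point, timeout=None explicitly means "never expire".
        return timeout


# A long default (roughly one year), per the suggestion above, so stale
# entries get purged unless the caller opts out on purpose with None.
cache = SketchCache(default_timeout=60 * 60 * 24 * 365)
```

With this shape, `cache.get_backend_timeout()` returns the long default, while `cache.get_backend_timeout(None)` deliberately disables expiry.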

@stevejalim (Contributor Author):

Ah, great. If it is possible to override the default with None, then that works well. Will make the change now.

@stevejalim requested a review from robhudson January 15, 2025 15:11

@stevejalim (Contributor Author) commented:
@robhudson Do you think we should add some metrics to the use of the DB cache?

@stevejalim merged commit c352036 into main Jan 16, 2025
5 checks passed
@stevejalim deleted the 15505--add-hybrid-cache branch January 16, 2025 12:14