
Add helpers to support a 'hybrid cache' option that uses locmem + DB cache #15859

Merged: stevejalim merged 6 commits into main from 15505--add-hybrid-cache, Jan 16, 2025
Conversation

@stevejalim (Contributor) commented Jan 14, 2025

In order to balance the need for a distributed cache with the speed of a local-memory cache (which we'll use in #15628), we've come up with a couple of helper functions that wrap the following behaviour:

Getting values:

  • If the key is in the local-memory cache, return that value immediately.
  • If it's not, fall back to the DB cache; if the key exists there, return that value, caching it in local memory again on the way through.
  • If the local-memory cache and DB cache both miss, just return the default value for the helper function.

Setting values:

  • Set the value in the local-memory cache and DB cache at (almost) the same time.
  • If the DB cache is not reachable (e.g. the DB is a read-only replica), log this loudly, as it's a sign the helper has not been used appropriately, but still set the local-memory version for now, to prevent total failure.
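The get/set behaviour above can be sketched in plain Python. This is an illustrative sketch only: ordinary dicts stand in for Django's locmem and database cache backends, and the bodies are inferred from the description in this PR, not copied from `bedrock.base.cache`.

```python
# Illustrative sketch of the hybrid-cache helpers described above.
# Plain dicts stand in for the locmem and database cache backends;
# the real helpers wrap Django cache objects instead.
import logging

logger = logging.getLogger(__name__)

_MISS = object()  # sentinel so that None can itself be a cached value

local_cache = {}  # fast, per-process (locmem)
db_cache = {}     # shared, distributed (database table)


def get_from_hybrid_cache(key, default=None):
    value = local_cache.get(key, _MISS)
    if value is not _MISS:
        return value  # local hit: return immediately
    value = db_cache.get(key, _MISS)
    if value is not _MISS:
        local_cache[key] = value  # re-prime local memory on the way through
        return value
    return default  # both caches missed


def set_in_hybrid_cache(key, value, db_cache_writable=True):
    local_cache[key] = value
    if not db_cache_writable:
        # e.g. the DB is a read-only replica: log loudly, but keep the
        # local copy so we don't fail outright
        logger.error("DB cache unavailable; only set %r in local memory", key)
        return
    db_cache[key] = value
```

Note the `_MISS` sentinel: using `cache.get(key, default)` directly would make a legitimately cached `None` indistinguishable from a miss.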

IMPORTANT: before this can be used in production, we need to create the cache table in the database with ./manage.py createcachetable AFTER this code has been deployed. This sounds a bit chicken-and-egg, but we can hopefully do it via a direct DB connection in the worst case.

We also ensure that the sqlite export has the cache table in existence (and that it's empty), so that demos and other things that need sqlite to run don't blow up on boot once this code is live. We'll need to run fresh exports of all three environments' DBs as soon as this is out in the wild, but that's fine.

  • I used AI to write some of this code (ChatGPT for ideation, Copilot for some tests), but I reviewed and tweaked all code before using it.

Significant changes and points to review

  • What do you think about this approach?

  • Bearing in mind that we won't have a writeable DB available everywhere, are these helpers clear enough about where it's OK to use them? Would you like any other protections?

  • Should we add metrics to the usage of the DB cache like we do with the in-memory cache?

  • How do you feel about us manually connecting to the DB to bootstrap the DB cache table? Other options? (A k8s upgrade Job seems a bit heavy for something that only needs to run once, ever.)

Issue / Bugzilla link

Related to #15505

Testing

Reviewing the unit tests should be enough for now, but feel free to play with it in your shell.

@codecov codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.34%. Comparing base (b3110e2) to head (b00eec2).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15859      +/-   ##
==========================================
+ Coverage   79.28%   79.34%   +0.05%     
==========================================
  Files         159      159              
  Lines        8343     8365      +22     
==========================================
+ Hits         6615     6637      +22     
  Misses       1728     1728              


@stevejalim force-pushed the 15505--add-hybrid-cache branch from ed7e609 to fd6026c on January 14, 2025 14:49
…le, to keep demos and other sqlite-powered things happy
@stevejalim marked this pull request as ready for review January 14, 2025 14:56
@stevejalim requested a review from a team as a code owner January 14, 2025 14:56
@robhudson (Contributor) left a comment:
Simple and straightforward code.

My only concern is that caching adds complication, and this is like "nested caching". But the short cache time on the local cache lessens that concern.

# See bedrock.base.cache.get_from_hybrid_cache and set_in_hybrid_cache
"LOCATION": "hybrid_cache_db_table", # name of DB table to be used
"BACKEND": "django.core.cache.backends.db.DatabaseCache",
"TIMEOUT": None, # cached items will not expire
Contributor:
Do we want this to be forever? Have we considered a long timeout so they eventually do get purged? No expiry ever seems like this table would potentially bloat over time.

@stevejalim (Contributor Author):

This is the default timeout for the cache, so that we can support the idea of never-expiring things, but set_in_hybrid_cache does allow a timeout to be set if we want to. If we had a default here, my understanding is that if we passed timeout=None in a call to set something in the cache, it would result in the default being used instead of None.

Related to this, though, I've just thought about providing more flexibility in specifying cache times for DB and locmem caching when using set_in_hybrid_cache, so I'll re-request a review after pushing that.

Contributor:

Looking at the cache backends in Django, they use django.core.cache.backends.base.DEFAULT_TIMEOUT to differentiate between None and the default.

The backends also define a get_backend_timeout that could be useful?

My thought is we should use a very long expiry as the default, even if it's like 1 year or something, just so old unused cache entries do get purged, and require explicitly setting None to have no timeout at all, because that feels like it should be done on purpose.
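The sentinel Django uses for this (`django.core.cache.backends.base.DEFAULT_TIMEOUT`) lets a backend tell "caller passed nothing" apart from "caller explicitly passed None". A minimal self-contained illustration of the same pattern, not Django's actual code (the class and names here are hypothetical):

```python
# Minimal illustration of the sentinel-default pattern used by Django's
# cache backends. A unique object distinguishes "no timeout argument given"
# from an explicit timeout=None ("never expire").
DEFAULT_TIMEOUT = object()  # only identity matters, never its value


class SketchCache:
    def __init__(self, default_timeout=300):
        self.default_timeout = default_timeout

    def get_backend_timeout(self, timeout=DEFAULT_TIMEOUT):
        if timeout is DEFAULT_TIMEOUT:
            timeout = self.default_timeout  # nothing passed: use the default
        # Past this point, timeout=None explicitly means "never expire".
        return timeout


# A long default (roughly one year), per the suggestion above, so stale
# entries get purged unless the caller opts out on purpose with None.
cache = SketchCache(default_timeout=60 * 60 * 24 * 365)
```

With this shape, `cache.get_backend_timeout()` returns the long default, while `cache.get_backend_timeout(None)` deliberately disables expiry.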

@stevejalim (Contributor Author):

Ah, great. If it is possible to override the default with None, then that works well. Will make the change now.

@stevejalim requested a review from robhudson January 15, 2025 15:11

@stevejalim (Contributor Author) commented:
@robhudson Do you think we should add some metrics to the use of the DB cache?

@stevejalim merged commit c352036 into main Jan 16, 2025
5 checks passed
@stevejalim deleted the 15505--add-hybrid-cache branch January 16, 2025 12:14