In the example projects, we have this handy piece of info:
# Note that named_caches and lmdb_store falls back to partial restore keys which
# may give a useful partial result that will save time over completely clean state,
# but will cause the cache entry to grow without bound over time.
# See https://www.pantsbuild.org/2.21/docs/using-pants/using-pants-in-ci for tips on how to periodically clean it up.
# Alternatively you change gha-cache-key to ignore old caches.
And then we have the suggestion to use this action, and instructions about manual usage and a cache nuke function: https://www.pantsbuild.org/2.21/docs/using-pants/using-pants-in-ci#directories-to-cache
The problem is that the partial restore key is so lenient, and the cache key strict enough, that using the nuke function from the docs won't work most of the time.
To reduce the monotonically increasing cache usage, a user will need to explicitly and manually change the cache key, or run a nuke function in a run that ALSO changes the cache key (e.g. lockfiles change, pants.toml change, etc.), because otherwise the trimmed cache is never saved.
I used https://github.com/sureshjoshi/pants-plugins as a cache testing example:

With the second-last entry, in spite of removing almost all dependencies in that commit, we’re still pulling 220MB of cache - and that never gets cleared out. We have to explicitly bust the cache with a new cache key, and run everything from scratch to get the benefit.
Here is another example where I nuke the cache, but since the cache key doesn't change, this gives the "Cache hit occurred ... not saving cache" message:

I had the idea to use the gh CLI to prematurely delete/expire caches, but since this would happen after the cache is downloaded, it would require special treatment.
I think the most reasonable, practical answer is to add some more documentation to this Action (and probably pantsbuild.org), as well as having some sort of automatic nuke-check on cache saving.
This might require using the separate restore/save cache actions, if the cache action itself offers no hook for knowing whether a new cache key is about to be saved.
Essentially:
- Run the action as normal
- During post-action hooks, ask if it's a new cache key (e.g. was pants.toml or named-caches-hash modified?)
  - If not, do nothing
  - If so, run nuke_if_too_big $named_cache_dir $named_cache_limit_mb
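
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the `nuke_if_too_big` here is a simplified stand-in written for this example (not the exact function from the Pants docs), `prune_before_save` is a hypothetical helper name, and the `cache_hit` argument is assumed to carry the `cache-hit` output of the cache restore step, which is "true" only on an exact primary-key match.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and arguments are illustrative assumptions.

# Simplified stand-in: delete a directory if it is larger than limit_mb.
nuke_if_too_big() {
  local path="$1" limit_mb="$2" size_mb
  # Nothing to do if the directory does not exist.
  [ -d "$path" ] || return 0
  # du -sm prints "<size in MB><TAB><path>"; keep only the size.
  size_mb=$(du -sm "$path" | cut -f1)
  if [ "$size_mb" -gt "$limit_mb" ]; then
    echo "${path} is too large (${size_mb}MB > ${limit_mb}MB), nuking it."
    rm -rf "$path"
  fi
}

# Hypothetical post-step: prune only when a new cache entry will be saved.
# On an exact hit nothing is saved, so nuking would only discard useful state.
prune_before_save() {
  local cache_hit="$1" path="$2" limit_mb="$3"
  if [ "$cache_hit" = "true" ]; then
    echo "Exact cache hit; no new entry will be saved, skipping prune."
    return 0
  fi
  nuke_if_too_big "$path" "$limit_mb"
}
```

In a workflow this would run after the Pants steps and before the cache is saved, e.g. `prune_before_save "$CACHE_HIT" "$named_cache_dir" "$named_cache_limit_mb"`, where `CACHE_HIT` is an assumed environment variable populated from the restore step's output.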