Add caching to instrument and live-data endpoints #622

Draft
Dagonite wants to merge 2 commits into main from endpoint-caching

Dagonite commented Mar 10, 2026

Closes #

Description

Adds Valkey caching to four endpoints that previously hit the database on every request despite returning data that rarely changes. This follows the same cache-aside pattern already established for the job list/count endpoints in jobs.py.

Cached endpoints

| Endpoint | TTL | Cache key | Invalidated by |
|---|---|---|---|
| GET /live-data/instruments | 120s | fia_api:live_data:instruments | TTL expiry only |
| GET /live-data/{instrument}/script | 60s | fia_api:live_data:script:{INSTRUMENT} | PUT /live-data/{instrument}/script |
| GET /instrument/{name}/specification | 120s | fia_api:instrument:spec:{NAME} | PUT /instrument/{name}/specification |
| GET /instrument/{instrument}/latest-run | 15s | fia_api:instrument:latest_run:{INSTRUMENT} | PUT /{instrument}/latest-run |

How it works

Each GET endpoint checks Valkey for a cached response before querying PostgreSQL. On a cache miss, the DB result is stored in Valkey with a TTL. When the corresponding PUT endpoint is called (e.g. a staff member updates a live data script), the cache entry is immediately invalidated by writing None with a 1-second TTL, so the next GET fetches fresh data from the database.
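
The read and invalidation paths described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `InMemoryCache`, `get_latest_run`, `put_latest_run`, and the injected `fetch_from_db` / `write_to_db` callables are hypothetical stand-ins for the Valkey helpers and DB services, and upper-casing the instrument name in the key is an assumption based on the cache key table.

```python
import json
import time
from typing import Any, Callable


class InMemoryCache:
    """Hypothetical stand-in for Valkey so the sketch runs without a server."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[str, float]] = {}

    def get_json(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return None
        payload, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # entry has outlived its TTL
            return None
        return json.loads(payload)

    def set_json(self, key: str, value: Any, ttl_seconds: int) -> None:
        self._store[key] = (json.dumps(value), time.monotonic() + ttl_seconds)


def latest_run_key(instrument: str) -> str:
    # Assumption: the key uses the upper-cased instrument name.
    return f"fia_api:instrument:latest_run:{instrument.upper()}"


def get_latest_run(
    cache: InMemoryCache,
    instrument: str,
    fetch_from_db: Callable[[str], Any],
    ttl_seconds: int = 15,
) -> Any:
    """Cache-aside read: try the cache first, fall back to the DB on a miss."""
    cached = cache.get_json(latest_run_key(instrument))
    if cached is not None:
        return cached
    value = fetch_from_db(instrument)
    cache.set_json(latest_run_key(instrument), value, ttl_seconds)
    return value


def put_latest_run(
    cache: InMemoryCache,
    instrument: str,
    new_value: Any,
    write_to_db: Callable[[str, Any], None],
) -> None:
    """Write to the DB, then invalidate by storing None with a 1-second TTL."""
    write_to_db(instrument, new_value)
    cache.set_json(latest_run_key(instrument), None, 1)
```

Because a stored `None` decodes back to `None`, the next GET treats the key as a miss and reads the database again.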

For endpoints where the value can legitimately be None (live data script, instrument specification), the cached value is wrapped in a dict (e.g. {"script": value}) so that a cache miss (None from Valkey) can be distinguished from a cached None value.

Why this is useful

  • Live data instruments — The list of instruments with live data support only changes when an admin enables/disables an instrument. Without caching, every page load of the live data view queries the instruments table.
  • Live data scripts — Scripts are updated infrequently by staff. The live data viewer polls this endpoint, meaning the same script is fetched from the database repeatedly for no reason.
  • Instrument specifications — Configuration data that only changes via explicit staff PUT calls. Cached with a longer TTL since changes are rare.
  • Instrument latest run — Changes more frequently (updated when new data arrives from the beamline), so it has a shorter 15s TTL. This endpoint is polled by the frontend instrument status view.

Configuration

All TTLs are configurable via environment variables. The defaults are:

LIVE_DATA_INSTRUMENTS_CACHE_TTL_SECONDS=120
LIVE_DATA_SCRIPT_CACHE_TTL_SECONDS=60
INSTRUMENT_SPEC_CACHE_TTL_SECONDS=120
INSTRUMENT_LATEST_RUN_CACHE_TTL_SECONDS=15

Setting any of these to 0 disables caching for that endpoint (matching the existing pattern for JOB_LIST_CACHE_TTL_SECONDS).
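
One way the TTL lookup and the "0 disables caching" rule could be expressed, as a sketch only. `cache_ttl_seconds` and `caching_enabled` are hypothetical helpers, not functions from this PR, and treating negative values as disabled is an assumption.

```python
import os


def cache_ttl_seconds(env_var: str, default: int) -> int:
    """Read a TTL from the environment, falling back to the given default."""
    raw = os.environ.get(env_var)
    return default if raw is None else int(raw)


def caching_enabled(ttl_seconds: int) -> bool:
    # Per the description, 0 disables caching.
    # Assumption: negative values are treated the same way.
    return ttl_seconds > 0
```

A handler would then skip both the cache read and the cache write when `caching_enabled(ttl)` is false.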

Testing locally

# Run the new cache-hit tests
pytest test/e2e/test_endpoint_cache.py -v

The new tests mock cache_get_json to return a cached payload and assert that the underlying DB service function is never called, confirming the cache-hit path works correctly. There is also a test for the edge case where a None script is cached.
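
The shape of such a cache-hit test might look like the sketch below. How the PR's tests patch `cache_get_json` is not shown here, so this version injects the dependencies instead to stay self-contained. `get_live_data_instruments` and `list_from_db` are hypothetical, and the instrument names are placeholder data.

```python
from typing import Any, Callable
from unittest.mock import MagicMock


def get_live_data_instruments(
    cache_get_json: Callable[[str], Any],
    list_from_db: Callable[[], list],
) -> list:
    """Hypothetical handler with its dependencies injected for easy mocking."""
    cached = cache_get_json("fia_api:live_data:instruments")
    if cached is not None:
        return cached
    return list_from_db()


def test_cache_hit_skips_db() -> None:
    cache_get_json = MagicMock(return_value=["MARI", "LET"])  # placeholder data
    list_from_db = MagicMock()

    result = get_live_data_instruments(cache_get_json, list_from_db)

    assert result == ["MARI", "LET"]
    cache_get_json.assert_called_once_with("fia_api:live_data:instruments")
    # The cache-hit path must never reach the DB service.
    list_from_db.assert_not_called()
```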

Manual verification with a local Valkey instance

If you want to test the caching behaviour against a real Valkey server rather than relying on the mocked unit tests:

# Start a Valkey container from WSL
docker run -d --name valkey -p 6379:6379 valkey/valkey:latest

# Set the env var so the API connects to it
export VALKEY_URL=redis://localhost:6379/0

# Start the API in dev mode
DEV_MODE=1 uvicorn fia_api.fia_api:app --reload

You can then watch cache keys being set and expiring:

# In another terminal, connect to the Valkey CLI
docker exec -it valkey valkey-cli

# Monitor all commands in real time
127.0.0.1:6379> MONITOR

# Or inspect specific keys
127.0.0.1:6379> KEYS fia_api:*
127.0.0.1:6379> TTL fia_api:live_data:instruments
127.0.0.1:6379> GET fia_api:live_data:instruments

Calling GET /live-data/instruments twice should show a SETEX on the first call and a GET hit on the second. Calling the corresponding PUT endpoint should show the key being invalidated.

Test plan

  • New cache-hit tests pass (test/e2e/test_endpoint_cache.py — 5 tests)
  • Existing unit tests unaffected (158 passed)
  • E2E tests pass with live DB and Valkey
  • Verify each PUT endpoint invalidates the corresponding cache (next GET returns fresh data)
  • Verify setting a TTL env var to 0 disables caching for that endpoint

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.53%. Comparing base (dc11fea) to head (b20993b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fia_api/routers/live_data.py | 20.83% | 19 Missing ⚠️ |
| fia_api/routers/instrument.py | 28.57% | 10 Missing ⚠️ |
| fia_api/routers/instrument_specs.py | 28.57% | 10 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (dc11fea) and HEAD (b20993b).

HEAD has 5 fewer uploads than BASE.

| Flag | BASE (dc11fea) | HEAD (b20993b) |
|---|---|---|
| | 8 | 3 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #622       +/-   ##
===========================================
- Coverage   96.37%   79.53%   -16.85%     
===========================================
  Files          48       48               
  Lines        1961     2008       +47     
===========================================
- Hits         1890     1597      -293     
- Misses         71      411      +340     

