Skip to content

audit-fixup: wire juniper_data_datasets_cached gauge to cache layer#92

Merged
pcalnon merged 2 commits intomainfrom
audit-fixup/wire-datasets-cached-gauge
May 6, 2026
Merged

audit-fixup: wire juniper_data_datasets_cached gauge to cache layer#92
pcalnon merged 2 commits intomainfrom
audit-fixup/wire-datasets-cached-gauge

Conversation

@pcalnon
Copy link
Copy Markdown
Owner

@pcalnon pcalnon commented May 6, 2026

Summary

Wires the previously-dead juniper_data_datasets_cached Prometheus Gauge to the CachedDatasetStore cache layer. The metric and its set_datasets_cached(n) helper have lived in juniper_data/api/observability.py since R3, with helper-level test coverage, but had zero production callers -- so the gauge always read 0 in scrape output.

This is the only test-vs-production gap surfaced by the post-METRICS-MON state report (juniper-ml#223 §15) -- a parallel PR is adding it to the post-METRICS-MON tracker (juniper-ml notes/code-review/POST_METRICS_MON_TRACKER_2026-05-05.md).

What changed

A new private CachedDatasetStore._emit_cached_count() helper probes the cache backend's list_datasets() (probe limit 10_000, matching warm_cache()) and publishes the count via set_datasets_cached(). Failures are caught and logged at DEBUG so the observability path can never break the storage path (mirrors the contextlib.suppress(Exception) discipline already used throughout the class).

Cache mutation sites wired:

Site File:Line
save() (write-through path) juniper_data/storage/cached.py:88
get_artifact_bytes() (read-through cache populate) juniper_data/storage/cached.py:131-132
delete() juniper_data/storage/cached.py:162-163
invalidate_cache() juniper_data/storage/cached.py:216
warm_cache() (post-batch) juniper_data/storage/cached.py:248-249

update_meta() is intentionally not wired -- it mutates an existing entry, never changes cardinality.

Gauge semantics: the metric's HELP string is "Number of datasets currently cached in storage", so this is Option A (count, not bytes). set_datasets_cached(int) lines up.

Test plan

  • New TestDatasetsCachedGauge class in juniper_data/tests/unit/test_cached_store.py (6 tests):
    • test_save_emits_cache_count -- gauge tracks insert count up to 3
    • test_delete_emits_decremented_cache_count -- gauge decrements on evict
    • test_invalidate_cache_emits_decremented_cache_count -- cache-only evict, primary untouched
    • test_warm_cache_emits_populated_count -- batch populate emits final count
    • test_get_artifact_bytes_read_through_emits_cache_count -- read-through populate emits
    • test_save_without_write_through_does_not_emit -- baseline: cache untouched, gauge untouched
  • Pattern: _value.get() on the registered Gauge plus a registry-reset fixture matching test_observability.py's autouse cleanup, to guard against Duplicated timeseries in CollectorRegistry.
  • pytest juniper_data/tests/unit/ -x -- 813 passed (pre-existing 807 + 6 new)
  • pre-commit run --files juniper_data/storage/cached.py juniper_data/tests/unit/test_cached_store.py -- all hooks pass (ruff lint, ruff format, mypy prod, mypy tests, bandit)
  • CI green

Refs

  • juniper-ml#223 (post-METRICS-MON state report §15: production-caller gap)
  • juniper-ml notes/code-review/POST_METRICS_MON_TRACKER_2026-05-05.md (parallel PR adds this as a tracked item)
  • juniper-data#65 (R4.5: original metric definition)

Closes the only test-vs-production gap from the state report.

The juniper_data_datasets_cached Gauge was defined in
juniper_data/api/observability.py with a tested set_datasets_cached()
helper, but had zero production callers -- the gauge would always
read 0 in scrape output. juniper-ml#223 (post-METRICS-MON state
report §15) flagged this as the only remaining test-vs-production
gap.

Wires the gauge into CachedDatasetStore (the canonical cache layer)
via a private _emit_cached_count() helper that probes the cache
backend for its current population and publishes via
set_datasets_cached(). Failures are swallowed so observability
never breaks the storage path -- mirrors the contextlib.suppress
discipline already used everywhere else in the class.

Cache mutation sites wired:

  - save() (write_through path)            cached.py:88
  - get_artifact_bytes() (read-through)    cached.py:131-132
  - delete()                               cached.py:162-163
  - invalidate_cache()                     cached.py:216
  - warm_cache() (post-batch)              cached.py:248-249

update_meta() is intentionally NOT wired -- it mutates an existing
entry, never changes cardinality.

Tests: six new tests in test_cached_store.py cover insert / evict /
invalidate / warm / read-through populate / no-write-through. Uses
the _value.get() pattern + a registry-reset fixture matching
test_observability.py.

Local results:
  pytest juniper_data/tests/unit/ -x   813 passed
  pre-commit run --files cached.py test_cached_store.py   all pass

Refs: juniper-ml#223 §15
Tracker: juniper-ml notes/code-review/POST_METRICS_MON_TRACKER_2026-05-05.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread juniper_data/tests/unit/test_cached_store.py Fixed
@pcalnon pcalnon self-assigned this May 6, 2026
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Owner Author

@pcalnon pcalnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved

@pcalnon pcalnon merged commit 3618d40 into main May 6, 2026
18 checks passed
pcalnon added a commit that referenced this pull request May 6, 2026
…_reuse (Phase 2a) (#95)

* refactor(observability): migrate to juniper-observability register_or_reuse (Phase 2a)

Phase 2a of the migration plan in
``juniper-ml/notes/observability/REGISTER_OR_REUSE_HELPER_DESIGN_2026-05-05.md``.
Drops the inline ``_get_or_create`` helper added in PR #87 and calls
the canonical ``juniper_observability.register_or_reuse`` shipped in
juniper-observability ``0.2.0``.

### What changes

- ``juniper_data/api/observability.py``: ``_ensure_dataset_metrics``
  now imports ``register_or_reuse`` from ``juniper_observability``
  (lazy, inside the ``if _dataset_metrics is None`` branch) and uses
  it for all four collectors (Counter ×2 + Histogram + Gauge). The
  inline ``_get_or_create`` closure and the manual
  ``REGISTRY._names_to_collectors`` lookup are gone.
- ``pyproject.toml``: ``juniper-observability>=0.1.1`` →
  ``juniper-observability>=0.2.0``. The new helper is the reason for
  the bump; existing 0.1.1 callers' behaviour is unchanged.

### Drive-by fix: storage circular import

PR #92 (``audit-fixup: wire juniper_data_datasets_cached gauge to
cache layer``) added ``from juniper_data.api.observability import
set_datasets_cached`` at the top of
``juniper_data/storage/cached.py``. That triggers the import chain:

    juniper_data.storage.__init__
      → cached.py
      → juniper_data.api.observability
      → juniper_data.api.__init__
      → app.py
      → juniper_data.storage  (← still initialising!)
      → ImportError: cannot import name 'LocalFSDatasetStore'

…which broke ``pytest --collect-only`` on origin/main as of
2026-05-06: 31 collection errors covering every test that imports
``juniper_data.storage`` directly or transitively. ``--collect-only``
amplifies the breakage, but a normal full pytest run also hits the
same path during fixture collection.

Fix: defer the ``set_datasets_cached`` import to inside
``CachedDatasetStore._emit_cached_count`` so the cycle never fires
at module-import time. The function is best-effort with a
``except Exception`` swallowing failures, so a deferred import that
raises ``ImportError`` (e.g. during a ``CachedDatasetStore``
instantiation in a non-API context) gets logged at DEBUG and skipped
the same way any other observability failure does.

### Verification

Full juniper-data suite under JuniperData env: **950 passed**
(was ``31 errors during collection`` on origin/main pre-fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ruff linter fix

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants