
v2.2.0 foundation: queue hotfix + Fault + SingleFlight + CacheWriter#34

Open
aydasraf wants to merge 70 commits into master from 2.2.0

Conversation

aydasraf (Collaborator) commented Apr 16, 2026

v2.2.0 — target-architecture train + cooldown metadata filtering

Summary

This PR lands the first eight work items of the v2.2 target architecture plus cooldown metadata filtering across all 7 adapters. The target-architecture items are WI-00 (queue/log hotfix), WI-01 (Fault + Result sum types), WI-05 (SingleFlight coalescer), WI-07 (ProxyCacheWriter + Maven checksum integrity), WI-post-05 (retire RequestDeduplicator; promote FetchSignal), WI-post-07 (wire ProxyCacheWriter into pypi/go/composer), WI-02 (full RequestContext + Deadline + ContextualExecutor), and WI-03 (StructuredLogger 5-tier + LevelPolicy + AuditAction). The cooldown work delivers two-layer enforcement (soft metadata filter + hard 403) for Maven, npm, PyPI, Docker, Go, Composer, and Gradle with 5 performance hardenings (H1-H5), SOLID package restructure, admin/invalidation hardening, and 250+ tests including chaos tests. Full design rationale is in docs/cooldown-metadata-filtering.md and the target-architecture doc; forensic evidence for every "before/after" claim is in docs/analysis/v2.1.3-post-deploy-analysis.md.

Work items shipped (8)

  • WI-00 — queue overflow + access-log level policy (commit 4242ea94)
    • queue.add() → queue.offer() across every request-serving enqueue site in npm / pypi / go / docker / helm / rpm / hex / nuget / composer / core
    • EventsQueueMetrics shared drop-counter + single-WARN-per-drop
    • 4xx access-log level policy: 404/401/403 → INFO; other 4xx → WARN (unchanged)
    • Jetty idle-timeout → DEBUG; "Repository not found" → INFO
    • DownloadAssetSliceQueueFullTest: 50 concurrent cache-hits over a saturated queue → 50 × 200
  • WI-01 — Fault + Result sum types (commit 08684bc0)
    • Sealed Fault hierarchy (NotFound, Forbidden, IndexUnavailable, StorageUnavailable, AllProxiesFailed, UpstreamIntegrity, Internal, Deadline, Overload)
    • Result<T> with map / flatMap
    • FaultClassifier for .exceptionally(...) fallback
    • FaultTranslator — single HTTP-status decision point; implements the §2 worked-examples table (retryability > body > declaration-order) including the AllProxiesFailed pass-through contract
    • 40 tests; 99% instruction / 97% branch coverage on the fault package
  • WI-05 — unify three coalescers into SingleFlight<K,V> (commit 03214a9e)
    • Caffeine AsyncCache-backed; per-caller cancellation isolation; stack-flat follower dispatch; zombie eviction via CompletableFuture.orTimeout
    • Migrates GroupSlice.inFlightFanouts, MavenGroupSlice.inFlightMetadataFetches, CachedNpmProxySlice (RequestDeduplicator) — field names retained, only the type changes
    • 14 property-style tests including N=1000 coalescing, 100-caller cancellation, 500-follower synchronous-completion stack-safety
  • WI-07 — ProxyCacheWriter + Maven checksum integrity (commit c165f38f)
    • Single write-path for primary + sidecars with streamed NIO temp-file + four concurrent MessageDigest accumulators
    • Atomic "primary first, sidecars after" commit; partial-failure rollback deletes both
    • Fault.UpstreamIntegrity on sidecar disagreement; nothing lands in the cache
    • Maven adapter wired end-to-end
    • scripts/pantera-cache-integrity-audit.sh with --dry-run / --fix for healing pre-existing drift
    • Regression test reproduces the exact production oss-parent-58.pom.sha1 hex
  • Version bump 2.1.3 → 2.2.0 (commit 9b8e0055)
    • Root pom.xml + all 30 module poms
    • mvn install now produces pantera-main-2.2.0.jar and the Docker image is tagged pantera:2.2.0
  • WI-post-05 — retire RequestDeduplicator; promote FetchSignal (commit cf799266)
    • BaseCachedProxySlice migrated from RequestDeduplicator.deduplicate(...) to SingleFlight<Key, FetchSignal>.load(...)
    • RequestDeduplicator.java + RequestDeduplicatorTest.java + DedupStrategy deleted
    • FetchSignal promoted to top-level at pantera-core/http/cache/FetchSignal.java
    • BaseCachedProxySliceDedupTest — 4 regression tests covering coalescing, NOT_FOUND propagation, ERROR propagation, cancellation isolation
  • WI-post-07 — wire ProxyCacheWriter into pypi / go / composer (commit 0629b543)
    • Each adapter's CachedProxySlice constructs a ProxyCacheWriter when a file-backed Storage is present
    • Primary-artifact cache misses (.whl / .tar.gz for pypi; .zip for go; .zip for composer) route through the coupled primary+sidecar write path
    • Adapter-native sidecar algo sets: pypi {SHA-256, MD5}; go {SHA-256}; composer {SHA-256}
    • One atomicity test + one digest-mismatch test per adapter (CachedPyProxySliceIntegrityTest, CachedProxySliceIntegrityTest × 2)
  • WI-02 — full RequestContext + Deadline + ContextualExecutor (commit 129b0bf1)
    • RequestContext expanded from 4 → 13 fields; 4-arg backward-compat ctor retained
    • Deadline monotonic wall-clock deadline with in(Duration) / remaining() / expired() / remainingClamped(max) / expiresAt()
    • ContextualExecutor.contextualize(Executor) propagates ThreadContext + APM span across CompletableFuture boundaries
    • Wired at DbArtifactIndex (via internal DbIndexExecutorService adapter that forwards lifecycle), GroupSlice.DRAIN_EXECUTOR, BaseCachedProxySlice / CachedNpmProxySlice / MavenGroupSlice SingleFlights
    • 30 new tests: RequestContextTest (14), ContextualExecutorTest (5), DeadlineTest (8), ContextualExecutorIntegrationTest (3)
  • WI-03 — StructuredLogger 5-tier + LevelPolicy + AuditAction (commit b8fd2bab)
    • AccessLogger / InternalLogger / UpstreamLogger / LocalLogger / AuditLogger — five tier builders, each with Objects.requireNonNull on required fields at entry
    • LevelPolicy encodes the §4.2 log-level matrix as a single enum
    • AuditAction closed enum: {ARTIFACT_PUBLISH, ARTIFACT_DOWNLOAD, ARTIFACT_DELETE, RESOLUTION} per §10.4
    • EcsLoggingSlice emits access log exactly once per request via StructuredLogger.access() (legacy dual emission removed)
    • MdcPropagation marked @Deprecated(since="2.2.0", forRemoval=true)
    • 34 new tests across AccessLoggerTest, AuditLoggerTest, InternalLoggerTest, UpstreamLoggerTest, LocalLoggerTest, LevelPolicyTest
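For reviewers who want the WI-02 Deadline surface at a glance, here is a minimal stdlib sketch. It assumes only the method names listed above (in(Duration), remaining(), expired(), remainingClamped(max), expiresAt()); the field layout and bodies are illustrative, not the shipped source.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a monotonic wall-clock deadline: the budget is
// anchored to System.nanoTime() so clock adjustments cannot move it.
final class Deadline {
    private final long deadlineNanos; // System.nanoTime() basis — monotonic

    private Deadline(long deadlineNanos) {
        this.deadlineNanos = deadlineNanos;
    }

    static Deadline in(Duration budget) {
        return new Deadline(System.nanoTime() + budget.toNanos());
    }

    Duration remaining() {
        return Duration.ofNanos(deadlineNanos - System.nanoTime());
    }

    boolean expired() {
        return deadlineNanos - System.nanoTime() <= 0;
    }

    Duration remainingClamped(Duration max) {
        Duration r = remaining();
        return r.compareTo(max) > 0 ? max : r;
    }

    Instant expiresAt() {
        // wall-clock projection of the monotonic deadline
        return Instant.now().plus(remaining());
    }
}
```

remainingClamped(max) is the piece downstream timeouts want: a per-hop budget that never exceeds the remaining request budget.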

Cooldown Metadata Filtering (8 phases)

Two-layer cooldown enforcement across all 7 adapters (Maven, npm, PyPI, Docker, Go, Composer, Gradle). See docs/cooldown-metadata-filtering.md for full architecture.

  • Phase 1 — SOLID package restructure (api/, cache/, metadata/, response/, config/, metrics/, impl/)
  • Phase 2 — 5 performance hardenings: H1 (pre-warm release-date cache), H2 (parallel bounded evaluation, 4 threads), H3 (SWR on metadata cache), H4 (50K L1 capacity), H5 (inflight-map leak fix)
  • Phase 3 — Per-adapter metadata parser/filter/rewriter/detector for 7 adapters (235+ unit tests)
  • Phase 4 — Per-adapter 403 response factories with CooldownResponseRegistry
  • Phase 5 — Admin unblock flow hardened (sync invalidation, CooldownCache + FilteredMetadataCache both invalidated, Micrometer counters)
  • Phase 6 — CooldownAdapterBundle<T> record + CooldownAdapterRegistry populated at startup; all 7 adapters wired
  • Phase 7 — Integration tests (MetadataFilterServiceIntegrationTest, CooldownAdapterRegistryTest) + chaos test (100-concurrent stampede dedup)
  • Phase 8 — Documentation + changelog + final verification
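H3 (stale-while-revalidate) is the hardening most likely to surprise a reviewer: a read past the TTL still returns the cached value immediately and triggers at most one background refresh. The sketch below is a single-entry stdlib illustration of that behaviour — SwrValue and its fields are hypothetical names, not the real Caffeine-backed metadata cache.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical single-entry SWR holder: readers never block; at most one
// refresh is in flight at a time (the AtomicBoolean gate).
final class SwrValue<V> {
    private final Supplier<V> loader;
    private final long ttlNanos;
    private final Executor refresher;
    private final AtomicBoolean refreshing = new AtomicBoolean();
    private volatile V value;
    private volatile long writtenAt;

    SwrValue(Supplier<V> loader, long ttlNanos, Executor refresher) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.refresher = refresher;
        this.value = loader.get();          // eager first load (≈ H1 pre-warm)
        this.writtenAt = System.nanoTime();
    }

    V get() {
        if (System.nanoTime() - writtenAt > ttlNanos
                && refreshing.compareAndSet(false, true)) {
            refresher.execute(() -> {       // serve stale, refresh in background
                try {
                    value = loader.get();
                    writtenAt = System.nanoTime();
                } finally {
                    refreshing.set(false);
                }
            });
        }
        return value;                       // never blocks the reader
    }
}
```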

Work items deferred to v2.2.x / v2.3 (6)

Every deferred item has an explicit entry with goal/files/tests/DoD/deps in docs/analysis/v2.2-next-session.md.

  • WI-04 — GroupResolver replaces GroupSlice; sealed MemberSlice (Hosted/Proxy); ArtifactIndex.locateByName returns IndexOutcome sealed type. Backbone WI of v2.2.0 — wires RequestContext (WI-02), StructuredLogger (WI-03), Fault (WI-01), SingleFlight (WI-05), ProxyCacheWriter (WI-07) together.
  • WI-06 — NegativeCache with composite NegativeCacheKey, per-tier + per-scope YAML, one bean shared across hosted/proxy/group scopes, synchronous upload invalidation
  • WI-06b — admin UI page + REST endpoints for neg-cache inspection and invalidation
  • WI-08 — retire RxJava2 from DownloadAssetSlice / CachedNpmProxySlice / BaseCachedProxySlice / NpmProxy.getAsset / MavenProxy.getMetadata. Unblocks deletion of 5 MdcPropagation call-sites in npm-adapter and completion of the remaining WI-post-07 wiring.
  • WI-09 — RepoBulkhead per repo; retire static GroupSlice.DRAIN_EXECUTOR
  • WI-10 — adapter SLOs, CI perf baseline, chaos tests, release-gate script

Test run

All test suites pass locally at branch HEAD:

$ mvn -T8 install -DskipTests
BUILD SUCCESS  (docker image pantera:2.2.0)

$ mvn -pl pantera-core test
Tests run: 891, Failures: 0, Errors: 0, Skipped: 7
BUILD SUCCESS

$ mvn -pl pantera-main test -DfailIfNoTests=false
Tests run: 929, Failures: 0, Errors: 0, Skipped: 4
BUILD SUCCESS

$ mvn -T4 -pl npm-adapter,maven-adapter,pypi-adapter,go-adapter,composer-adapter,\
              docker-adapter,helm-adapter,rpm-adapter,hexpm-adapter,nuget-adapter \
      test -DfailIfNoTests=false
Adapter totals:
  npm-adapter       191
  hexpm-adapter      19
  maven-adapter      56 (3 skipped)
  rpm-adapter       252 (1 skipped)
  composer-files     27
  goproxy            86 (1 skipped)
  nuget-adapter     126
  pypi-adapter      334
  helm-adapter       77
  docker-adapter    444 (1 skipped)
  -----------------------
  Aggregate        1 612 tests, 0 failures, 0 errors, 6 skipped
BUILD SUCCESS

TOTAL across reactor: 3 432 tests, 0 failures, 0 errors, 17 skipped (all green)

Acceptance queries from the target-architecture doc + session brief (each matches the expected count):

# Foundation gates
$ rg 'queue\.add\(' --glob '*.java' | rg -v test | rg -v '// ok:'
# 0 matches — WI-00 complete

# WI-post-05
$ rg 'RequestDeduplicator|class DedupStrategy|RequestDeduplicator\.FetchSignal' --glob '*.java' | rg -v test
# 0 matches — legacy type retired
$ rg 'new FetchSignal|FetchSignal\.(SUCCESS|NOT_FOUND|ERROR)' --glob '*.java' | rg -v test | wc -l
# 11 — every production call-site uses the promoted top-level enum

# WI-post-07
$ rg 'TODO\(WI-post-07\)' --glob '*.java' | wc -l
# 1 — only npm-adapter retains the marker (blocked on WI-08 RxJava retirement)
$ rg 'new ProxyCacheWriter' --glob '*.java' | rg -v test | wc -l
# 4 — maven + pypi + go + composer

# WI-02
$ ls pantera-core/src/main/java/com/auto1/pantera/http/context/
# ContextualExecutor.java  Deadline.java  RequestContext.java
$ wc -l pantera-core/src/main/java/com/auto1/pantera/http/context/RequestContext.java
# 340

# WI-03
$ rg 'StructuredLogger\.access\(\)' --glob '*.java' | wc -l
# 14 (1 production + 13 tests)
$ rg 'enum AuditAction' --glob '*.java' | wc -l
# 1 — single closed enum
$ rg 'new EcsLogEvent\(\)' pantera-core/src/main/java/com/auto1/pantera/http/slice/EcsLoggingSlice.java
# 1 match at line 193 — the .exceptionally() error path only (dual emission on success was removed)
$ rg 'MdcPropagation\.' --glob '*.java' | rg -v test | wc -l
# ~110 — documented remaining call-sites; deletion scheduled for WI-06/WI-08/Vert.x-handler follow-up

# Commit-message hygiene
$ git log c71fbbfe..HEAD --format='%B' | git interpret-trailers --only-trailers | grep -ic 'co-authored-by'
# 0

Reviewer focus

Focus areas when approving, in priority order:

  1. FaultTranslator + pickWinningFailure policy faithfulness. Implements the worked-examples table from target-architecture §2. FaultAllProxiesFailedPassThroughTest has one test per row. If you suspect a row is wrong, add a row-specific test that asserts the expected status / header / body shape — don't tweak the translator silently.
  2. SingleFlight zombie eviction. Caffeine's expireAfterWrite does NOT expire pending futures in an AsyncCache; zombie protection lives in orTimeout(inflightTtl) on the wrapped loader future (see the comment in SingleFlight.java:188-206). The zombieEvictedAfterTtl test exercises the real timer, not a mock — a refactor that replaces orTimeout with anything else must keep that invariant.
  3. ProxyCacheWriter temp-file handling. Every error path (stream IO failure, size read failure, save failure) funnels through deleteQuietly(tempFile) in commit() / streamPrimary() / rejectIntegrity() / the outer .exceptionally. A temp-file leak would be silent; if you suspect one, add an assertion on Files.list(System.getProperty("java.io.tmpdir")) in ProxyCacheWriterTest to lock the invariant.
  4. RequestContext / ContextualExecutor wiring. The three hot-path executors (DbArtifactIndex via its internal DbIndexExecutorService adapter, GroupSlice.DRAIN_EXECUTOR, all three SingleFlight instances) are wrapped — verify each wrapping is actually present when you trace the request flow. Every CompletableFuture.runAsync(..., ctxExecutor) now propagates ECS MDC + APM span; the 4-arg backward-compat RequestContext ctor ensures legacy callers compile unchanged.
  5. EcsLoggingSlice access-log single-emission. Line 176 emits via StructuredLogger.access().forRequest(rctx); the former second emission (new EcsLogEvent(...)...log() alongside the StructuredLogger call) was removed to halve Kibana log volume. The sole remaining new EcsLogEvent() call is on line 193 — the .exceptionally(...) error path — and is scheduled for migration by the same follow-up WI that re-lifts user_agent.* parsing.

Risks and mitigations

Three new risks introduced by Wave 3, plus the three Wave 1-2 risks retained:

  1. MdcPropagation retained as @Deprecated with ~110 production callers. The class cannot be deleted until WI-06 (removes 25 cooldown-related callers), WI-08 (removes 5 npm-adapter callers blocked on RxJava retirement), and the Vert.x-handler contextualisation follow-up (removes the ~55 callers in pantera-main/api/v1/*Handler.java) all land. Mitigation: the class is stable and documented; no new call-sites are permitted (enforce via PR-review — there is no checkstyle gate yet). Scheduled for removal in v2.3.0.
  2. Rich user_agent.name / .version / .os.name parsing lost. The pre-v2.2.0 EcsLogEvent instance emitted parsed User-Agent sub-fields on every access-log line. When the dual emission was removed, only user_agent.original survives via RequestContext. Mitigation (operator): Kibana dashboards that query user_agent.name or user_agent.version need to switch to user_agent.original or wait for the follow-up WI that re-lifts parsing into StructuredLogger.access. Mitigation (code): if an operator files a dashboard-regression ticket, that WI is ~30 LoC and can ship in a v2.2.x patch.
  3. DbIndexExecutorService is a localised copy of TraceContextExecutor-style delegation. The adapter lives inside DbArtifactIndex as a private static-nested class that forwards lifecycle methods to the underlying ExecutorService and routes execute(Runnable) through ContextualExecutor. Hoisting it into a reusable pantera-core/http/context/ContextualExecutorService would share the code with Quartz pools and any future ExecutorService hotspot. Mitigation: track as WI-post-03a in the next-session doc; the current duplication is ~40 lines and does not block the release.
  4. SingleFlight allocates one new CompletableFuture per caller on top of the shared one. Two whenCompleteAsync hooks per call (invalidate + forwarder) run on the executor. At 2k req/s for a single popular package this is 4k executor submissions per second — not a hot-path concern versus the per-request cost, but measurable. Mitigation: if the WI-10 perf baseline flags this, the invalidate hook can move to a single whenComplete on the shared future and the forwarder can become a no-copy minimalCompletionStage. Not needed today.
  5. ProxyCacheWriter rollback is best-effort. If the primary save succeeds and the sidecar save fails AND the subsequent storage.delete(primary) also fails (e.g. underlying filesystem transiently read-only), the cache can end up holding a primary without a sidecar. Maven client behaviour on missing sidecar is to refetch — the IntegrityAuditor also heals this case — so the worst case is a transient 502 on the next GET, not a silent integrity bug. Mitigation: run the audit tool in --dry-run as a nightly cron against production caches for the first release.
  6. The npm adapter still carries one TODO(WI-post-07) marker. Its CachedNpmProxySlice primary write path is not architecturally protected against drift until WI-post-07 is completed for npm, which requires WI-08 (RxJava2 retirement) to land first. npm is a low-drift-risk adapter (single SHA-512 sidecar, always co-located in the tarball metadata), so the residual risk is smaller than the Maven case that v2.2.0 closes. Tracked in v2.2-next-session.md as part of WI-08's DoD.

Links

  • Target architecture: docs/analysis/v2.2-target-architecture.md
  • v2.1.3 post-deploy forensics: docs/analysis/v2.1.3-post-deploy-analysis.md
  • v2.1.3 architecture review (20 anti-patterns, 9 refactors): docs/analysis/v2.1.3-architecture-review.md
  • Remaining-work task list: docs/analysis/v2.2-next-session.md
  • Changelog: CHANGELOG-v2.2.0.md

aydasraf added 30 commits April 16, 2026 17:37
Three artefacts describing the post-deploy forensics and the
target design for v2.2.0:

- v2.1.3-post-deploy-analysis.md: forensic analysis of the 12h log
  window after v2.1.3, root-cause of the 503 burst (Queue-full
  cascade in DownloadAssetSlice), 404 noise (every 4xx at WARN),
  and perf regression triggers.
- v2.1.3-architecture-review.md: enterprise-architecture review —
  10 patterns present, 20 anti-patterns, 6 cross-cutting concerns,
  9 strategic refactors, SOLID/enterprise checklist scored 5.4/10.
- v2.2-target-architecture.md: target design — one fault taxonomy,
  one context, one single-flight, five-tier structured logging
  (incl. audit), ECS-native propagation, per-repo bulkheads,
  repo-negative cache with scope partitioning + synchronous
  upload invalidation, admin UI. Implementation split into 11
  agent-executable work items.

Also adds logs/.analysis/ to .gitignore (multi-GB working files
used during forensic triage, not source of truth).
Tactical hotfix for the two dominant post-deploy issues observed in
v2.1.3 — the 503 burst on npm (11.5k 'Queue full' stack traces in a
2-minute window) and the 2.4M/12h WARN log flood (every 4xx access
log emitted at WARN). No architecture change; lands on the 2.2.0
branch as the minimal safe starting point for WI-00.

Closes forensic §1.6/§1.7 F1.1/F1.2/F2.1/F2.2/F4.4.

---

1. queue.add -> queue.offer on every bounded ProxyArtifactEvent /
   ArtifactEvent queue write-site in a request-serving path.
   AbstractQueue.add() throws IllegalStateException('Queue full') on
   overflow; offer() returns false and lets us degrade gracefully.

   Sites migrated:
     npm:     DownloadAssetSlice:198 + :288 (the observed 503 source),
              UnpublishForceSlice, UnpublishPutSlice
     pypi:    ProxySlice (4 sites)
     go:      CachedProxySlice, GoUploadSlice
     docker:  CacheManifests, PushManifestSlice (unbounded — marked)
     helm:    DeleteChartSlice, PushChartSlice
     rpm:     RpmRemove, RpmUpload
     hex:     UploadSlice
     nuget:   PackagePublish
     core:    scheduling.RepositoryEvents, asto.events.EventQueue

   Unbounded ConcurrentLinkedDeque<ArtifactEvent> sites (docker,
   hex, nuget, rpm, go-upload, npm-unpublish) keep add() with a
   '// ok: unbounded' annotation so the intent is obvious to a
   future reviewer and so a future migration to a bounded queue is
   guaranteed to be re-examined.

2. EventsQueueMetrics (new pantera-core/metrics): the single
   callback used when offer() returns false — emits one structured
   WARN (no stack trace; event.action=queue_overflow) and bumps
   pantera.events.queue.dropped{queue=<repo>} on the shared
   MicrometerMetrics registry. Drop is silent at the request level;
   operator sees the counter + WARN in Kibana.

3. DownloadAssetSlice catch-all wrappers. The ifPresent lambda
   that enqueues ProxyArtifactEvent is now wrapped in
   try { ... } catch (Throwable t) { log at WARN, continue; } on
   BOTH the cache-hit (line ~185) AND the cache-miss (line ~275)
   paths. A background-queue failure can NEVER escape the serve
   path. The 50-concurrent-cache-hits-under-full-queue test
   (DownloadAssetSliceQueueFullTest) proves it: 50 / 50 respond 200.

4. EcsLogEvent access-log level policy:
     404 -> INFO (was WARN): routine Maven/npm probe miss.
     401 -> INFO (was WARN): normal auth-then-retry flow.
     403 -> INFO (was WARN): policy reject, not a Pantera fault.
     other 4xx -> WARN (unchanged).
     5xx -> ERROR (unchanged).
     slow (>5s) -> WARN (unchanged).
   Contract tests added in EcsSchemaValidationTest:
     notFoundResponsesLogAtInfoNotWarn
     unauthorizedResponsesLogAtInfoNotWarn
     forbiddenResponsesLogAtInfoNotWarn
     otherFourXxStillLogAtWarn
     fiveXxStillLogAtError

5. 'Repository not found in configuration' (RepositorySlices):
   downgraded WARN -> INFO. This is a client-config error (stale
   repo URL in a pom.xml somewhere), not a Pantera failure, and it
   was producing ~1,440 WARN lines per 12h.

6. Jetty HTTP client 'Idle timeout expired: 30000/30000 ms'
   (JettyClientSlice): downgraded ERROR -> DEBUG via a new
   isIdleTimeout(Throwable) helper. Connection idle-close is a
   normal lifecycle event, not a request failure; real HTTP
   request failures on a still-active connection continue to log
   at ERROR. Observed count was 20 ERRORs / 12h all for idle
   close, now DEBUG.
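The add() -> offer() pattern in item 1 amounts to the following shape — an illustrative stdlib sketch, not the PR's exact code; the onDrop callback stands in for the EventsQueueMetrics hook.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// add() throws IllegalStateException("Queue full") on a full bounded
// queue; offer() returns false so the serve path can degrade gracefully.
final class EnqueueExample {
    static boolean enqueue(BlockingQueue<String> queue, String event,
                           Runnable onDrop) {
        if (!queue.offer(event)) { // non-throwing on overflow
            onDrop.run();          // bump drop counter + one WARN, no stack trace
            return false;          // request serving continues regardless
        }
        return true;
    }
}
```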

---

Acceptance verified on branch 2.2.0 at HEAD:
  rg 'queue\.add\(' | rg -v test | rg -v '// ok:' = 0 matches
  mvn -pl pantera-core test = 788 / 788 pass (0 fail, 7 pre-existing skips)
  mvn -pl npm-adapter test = 191 / 191 pass (incl. new DownloadAssetSliceQueueFullTest)
  mvn -pl pypi-adapter,go-adapter,docker-adapter,helm-adapter,
         rpm-adapter,hexpm-adapter,nuget-adapter test = all green
  EcsSchemaValidationTest 23 / 23 pass (5 new level-policy tests)

No other behavioural change — Fault / Result types land in a follow-up
commit (WI-01).
Additive introduction of the v2.2 fault taxonomy: a sealed Fault
hierarchy, a Result<T> sum, a single FaultClassifier, and a single
FaultTranslator. No existing slice is wired to them yet (WI-04 does
that); this commit establishes the vocabulary and the unit-test
contract so every downstream WI can consume it.

New package pantera-core/http/fault:

  Fault (sealed)
    NotFound(scope, artifact, version)
    Forbidden(reason)
    IndexUnavailable(cause, query)
    StorageUnavailable(cause, key)
    AllProxiesFailed(group, outcomes, winningResponse?)  -- pass-through
    UpstreamIntegrity(upstreamUri, algo, sidecarClaim, computed)
    Internal(cause, where)
    Deadline(budget, where)
    Overload(resource, retryAfter)
    MemberOutcome(member, kind, cause, response?)        -- 4-arg
    MemberOutcome.Kind { OK, NOT_FOUND, FIVE_XX, EXCEPTION,
                        CANCELLED, CIRCUIT_OPEN }
    ChecksumAlgo { MD5, SHA1, SHA256, SHA512 }

  Result<T> (sealed)
    Ok<T>(value), Err<T>(fault)
    ok/err factories, map, flatMap

  FaultClassifier
    classify(Throwable, String where)
      TimeoutException            -> Deadline
      ConnectException / IOException -> Internal
      ValueNotFoundException      -> StorageUnavailable
      IllegalStateException('Queue full') -> Overload
      default                      -> Internal
    Unwraps CompletionException before matching.

  FaultTranslator
    translate(Fault, RequestContext)                      -- one site
      NotFound            -> 404
      Forbidden           -> 403
      IndexUnavailable    -> 500 (X-Pantera-Fault: index-unavailable)
      StorageUnavailable  -> 500 (X-Pantera-Fault: storage-unavailable)
      Internal            -> 500 (X-Pantera-Fault: internal)
      Deadline            -> 504 (X-Pantera-Fault: deadline-exceeded)
      Overload            -> 503 + Retry-After + X-Pantera-Fault: overload:<r>
      AllProxiesFailed    -> PASS-THROUGH: streams the winning proxy
                             Response verbatim (status, headers, body)
                             with X-Pantera-Fault: proxies-failed:<name>
                             and X-Pantera-Proxies-Tried: <n>; synthesizes
                             502 only when no proxy produced a Response at
                             all.
      UpstreamIntegrity   -> 502 (X-Pantera-Fault: upstream-integrity:<algo>)

    pickWinningFailure(List<MemberOutcome>)                 -- ranking
      retryability first: 503 > 504 > 502 > 500 > other 5xx
      with-body         > no-body
      tie-break         : declaration order

New package pantera-core/http/context:

  RequestContext(traceId, httpRequestId, repoName, urlOriginal)
    -- minimal scaffold for this WI; WI-02 expands to the full
    record (user.name, client.ip, package.*, deadline, ...).

Tests (40 new):

  FaultTranslatorTest (11)                  one per Fault variant
                                            + exhaustive-switch guard
  FaultAllProxiesFailedPassThroughTest (10) every row from the
                                            worked-examples table
                                            in target-architecture.md
                                            §2 including declaration-
                                            order tiebreak and
                                            empty-outcome edge case
  FaultClassifierTest (11)                  exception -> Fault round-trip,
                                            nested/bare/self-ref
                                            CompletionException
  ResultTest (6)                            factories, map, flatMap
                                            (both Ok- and Err- returns)
  RequestContextTest (2)                    accessors + record equality

Coverage (JaCoCo): fault 99% instructions / 97% branches, context 100%.
Exceeds the 95% DoD in §12 WI-01.

Deviations from the design doc, documented for WI-04 follow-up:

  1. MemberOutcome extended from 3-arg to 4-arg with
     Optional<Response>, because pickWinningFailure needs to return a
     ProxyFailure(name, Response). Two disambiguated factories
     MemberOutcome.threw(...) and MemberOutcome.responded(...) avoid
     null-overload ambiguity.
  2. FaultClassifier uses if/else because ConnectException extends
     IOException and switch-case ordering would be confusing.
  3. Fault.Forbidden emits textBody(reason); a JSON envelope lands
     in a later WI if needed.
  4. Bare CompletionException with null cause classifies as Internal.

Acceptance verified:
  mvn -pl pantera-core test -Dtest='Fault*Test,Result*Test,
                                    RequestContextTest' = 40 / 40 pass
  mvn -pl pantera-core test                               = 788 / 788 pass
  mvn -pl pantera-core verify -DskipTests                 = BUILD SUCCESS
…WI-07)

Fixes the production Maven checksum-mismatch reported against
com/fasterxml/oss-parent/58/oss-parent-58.pom (and the class of bug
it represents) by introducing a single write-path that verifies the
upstream's sidecar digests against the primary bytes BEFORE the pair
lands in the cache.  A mismatch rejects the write entirely — the
cache never holds a stale primary/sidecar pair.

Closes target-architecture doc §9.5 + §12 WI-07.

Root cause (observed post-v2.1.3):

  Maven Aether raised
  'Checksum validation failed, expected 15ce8a2c... (REMOTE_EXTERNAL)
   but is actually 0ed9e5d9...' against Pantera-cached pairs.  The
  stale-while-revalidate refetch landed new .pom bytes without
  re-pulling .pom.sha1; independent Rx pipelines cached .pom and
  .pom.sha1 separately; an eviction could drop one without the
  other — every mode of drift produced the same user-visible
  ChecksumFailureException in CI builds.

New pantera-core/http/cache/ProxyCacheWriter:

  CompletionStage<Result<Void>> writeWithSidecars(
      Key primaryKey,
      Supplier<CompletionStage<InputStream>> fetchPrimary,
      Map<ChecksumAlgo, Supplier<CompletionStage<Optional<InputStream>>>>
          fetchSidecars,
      RequestContext ctx);

  • Primary is streamed into a temp file (NIO, bounded chunk size —
    no heap scaling with artifact size) while four MessageDigest
    accumulators (MD5, SHA-1, SHA-256, SHA-512) update in the same
    pass.
  • Sidecars are pulled concurrently, buffered fully (<200 B each),
    hex-normalised (trim + lowercase), and compared against the
    computed digest for that algorithm.
  • Any mismatch => Result.err(Fault.UpstreamIntegrity(uri, algo,
    claim, computed)); temp files deleted; cache not touched.
  • Verified pair is saved in a primary-first, sidecars-after
    sequence via Storage.save (asto FileStorage does its own
    tmp-rename atomicity per key).  A concurrent reader therefore
    never sees a sidecar without its matching primary.
  • Rollback on partial failure: sidecar-save that fails after the
    primary landed best-effort-deletes primary + any saved sidecars
    so the next GET re-enters the writer cleanly.  An
    IntegrityAuditor (nested static class) is the post-hoc heal for
    the rare case where rollback itself fails.
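The stream-and-digest step can be pictured as follows — a single-algorithm stdlib sketch (DigestedWrite and its method names are hypothetical; the real writer runs four digest accumulators in one pass and pulls sidecars concurrently):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;

final class DigestedWrite {
    /** Streams the primary into a temp file, computing its digest in the same pass. */
    static String streamToTemp(InputStream primary, Path tempFile, String algo)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algo);
        try (DigestInputStream in = new DigestInputStream(primary, md)) {
            // bounded-buffer copy: heap use does not scale with artifact size
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
        }
        return HexFormat.of().formatHex(md.digest());
    }

    /** Sidecar claims are hex-normalised (trim + lowercase) before comparison. */
    static boolean sidecarAgrees(String sidecarClaim, String computedHex) {
        return sidecarClaim.trim().toLowerCase(Locale.ROOT).equals(computedHex);
    }
}
```

On disagreement the caller deletes the temp file and surfaces Fault.UpstreamIntegrity; nothing is committed to the cache.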

Tier-4 LocalLogger events under com.auto1.pantera.cache:
  event.action = cache_write       ok    package.checksum = <sha256>
  event.action = cache_write       fail  pantera.cache.algo + pantera.cache.computed + pantera.cache.sidecar_claim
  event.action = integrity_audit   both  pantera.audit.scanned + pantera.audit.mismatches + pantera.audit.fix

Maven adapter wired:

  maven-adapter/.../CachedProxySlice preProcess now branches through
  ProxyCacheWriter for primary artefacts (.pom/.jar/.war/.aar/.ear
  /.zip/.module) on cache-miss.  The cache-hit code path is
  unchanged — only misses go through the new writer.  SWR refetch
  uses the same writer so primary + sidecars stay coherent across
  refreshes.

Audit / healing tool:

  scripts/pantera-cache-integrity-audit.sh  — wrapper
  pantera-main/.../tools/CacheIntegrityAudit  — CLI entry point
  pantera-core/.../ProxyCacheWriter.IntegrityAuditor  — scanner

  --repo <name>   (optional filter)
  --dry-run       (default) scan + report; exit 1 on mismatch
  --fix           also evict mismatched pairs so next GET refetches

Javadoc TODO(WI-post-07) added to the cached-proxy slices of
composer / go / pypi so a future work item wires them to the same
writer.  npm adapter's TODO rides with WI-05 in a separate commit.

Deviations from doc §9.5:

  • Signature uses InputStream Suppliers, not HttpResponse, because
    Content in pantera-core wraps a reactive Publisher — conversion
    is the caller's responsibility, keeps the writer pure.
  • Return type Result<Void> rather than Result<CachedArtifact>
    (that value type does not yet exist; callers know the key).
  • IntegrityAuditor is a static nested class of ProxyCacheWriter
    instead of a sibling file, to satisfy the WI file scope.
  • Maven-adapter integration test lives in pantera-core
    (ProxyCacheWriterTest.ossParent58_regressionCheck reproduces
    the exact historical hex) — the 86 maven-adapter tests all
    stay green under the new code path, covering the wiring.

Acceptance verified on 2.2.0 at HEAD:
  mvn -pl pantera-core test -Dtest='ProxyCacheWriterTest,CacheIntegrityAuditTest'
                                                = 13 / 13 pass
  mvn -pl pantera-core test                     = 820 / 820 pass
  mvn -pl maven-adapter test                    = 86 / 86 pass
  mvn -T8 install -DskipTests                   = BUILD SUCCESS
  scripts/pantera-cache-integrity-audit.sh      = exit 1 on seeded
                                                  mismatch; exit 0
                                                  after --fix
Collapses the three hand-rolled single-flight implementations that
accreted across v2.1.1 -> v2.1.3 into one Caffeine-AsyncCache-backed
utility.  No user-visible behaviour change; the observed
StackOverflowError regression class (ccc155f) and race-window
regression class (899621b) are now structurally impossible to
reintroduce.

Closes target-architecture doc §6.4 + §12 WI-05; retires
anti-patterns A6 ("25-line comment explaining 14-line race fix"),
A7 ("three independent implementations of single-flight"),
A8 ("zombie protection asymmetric across coalescers"),
A9 ("correctness depends on Async thread-hop") from the review doc.

New pantera-core/http/resilience/SingleFlight<K,V>:

  public CompletableFuture<V> load(K key, Supplier<CompletionStage<V>> loader);
  public void invalidate(K key);
  public int inFlightCount();

  Contract:
    - Concurrent load(k, ...) for the same key coalesce into ONE
      loader.get() invocation; every caller receives the same value.
    - Entry removed on loader completion so the next load is fresh.
    - Cancellation of one caller's future does NOT cancel the loader
      or the other callers (per-caller wrapper).
    - Loader exception propagates to every waiter; entry removed so
      the next load retries.
    - Stack-flat under synchronous leader completion (the GroupSlice
      StackOverflowError of ccc155f is covered by a 500-follower
      property test).
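For illustration, the contract above can be sketched with a stdlib-only
coalescer.  The shipped class is Caffeine-AsyncCache-backed with TTL
zombie eviction; this sketch shows only the coalesce / remove-on-completion
/ per-caller-wrapper mechanics, and its internals are not the real
implementation:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Stdlib-only sketch of the SingleFlight contract (illustrative, not the
// shipped Caffeine-backed implementation).
final class SingleFlightSketch<K, V> {
    private final Map<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public CompletableFuture<V> load(K key, Supplier<CompletionStage<V>> loader) {
        CompletableFuture<V> fresh = new CompletableFuture<>();
        CompletableFuture<V> prior = inFlight.putIfAbsent(key, fresh);
        if (prior != null) {
            // Follower: per-caller wrapper — cancelling it cannot cancel
            // the loader or the other waiters.
            return prior.thenApply(v -> v);
        }
        // Leader: invoke the loader exactly once; remove the entry on
        // completion (success or failure) so the next load() is fresh.
        try {
            loader.get().whenComplete((v, t) -> {
                inFlight.remove(key, fresh);
                if (t != null) fresh.completeExceptionally(t); else fresh.complete(v);
            });
        } catch (RuntimeException e) {
            inFlight.remove(key, fresh);
            fresh.completeExceptionally(e);   // supplier throw surfaces as failed future
        }
        return fresh.thenApply(v -> v);       // leader also gets a per-caller wrapper
    }

    public int inFlightCount() { return inFlight.size(); }
    public void invalidate(K key) { inFlight.remove(key); }
}
```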

Three coalescers migrated:

  1. CachedNpmProxySlice  — RequestDeduplicator<Key, FetchSignal>
                             -> SingleFlight<Key, FetchSignal>
  2. GroupSlice           — ConcurrentMap<String, CompletableFuture<Void>>
                             inFlightFanouts (+ its 30-line
                             "complete-before-remove" race-comment)
                             -> SingleFlight<String, Void>
  3. MavenGroupSlice      — inFlightMetadataFetches
                             -> SingleFlight<String, Void>

Leader/follower discipline preserved in the migrated sites via an
isLeader[] flag set inside the loader — Caffeine invokes the
bifunction synchronously on the leader's thread, so the leader
still returns the Response (single-subscribe Content) while
followers re-enter after the upstream cache is warm.  Without
this, every follower would also fan out, or the 200 case would
loop forever.

Zombie-eviction note:

  Caffeine's expireAfterWrite does NOT expire entries whose
  CompletableFuture value is still pending — verified
  experimentally during development. To meet the A8 zombie
  guarantee for pending loaders we wrap the loader's future with
  CompletableFuture.orTimeout(inflightTtl); when the timer fires
  the entry is invalidated and freed. expireAfterWrite is retained
  as belt-and-braces for completed-but-unreferenced entries.
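The orTimeout wrapping reduces to the following sketch (a stdlib
illustration of the invariant, not the shipped code — the raw
ConcurrentMap stands in for the Caffeine cache and the names are
hypothetical):

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the pending-loader TTL guard: orTimeout fails the
// still-pending future after inflightTtl, and the completion hook
// frees the map entry so a stuck upstream cannot pin it forever.
final class ZombieGuardSketch {
    static <K, V> CompletableFuture<V> guard(ConcurrentMap<K, CompletableFuture<V>> inFlight,
                                             K key,
                                             CompletableFuture<V> loaderFuture,
                                             Duration inflightTtl) {
        CompletableFuture<V> guarded =
            loaderFuture.orTimeout(inflightTtl.toMillis(), TimeUnit.MILLISECONDS);
        guarded.whenComplete((v, t) -> {
            if (t instanceof TimeoutException) {
                inFlight.remove(key);        // timer fired: evict the zombie entry
            }
        });
        return guarded;
    }
}
```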

Also carries the TODO(WI-post-07) Javadoc comment in
CachedNpmProxySlice pointing at the future wiring of
ProxyCacheWriter (WI-07) for npm tarballs — unrelated to this WI
but sharing the same file edit.

14 new SingleFlightPropertyTest cases:
  coalescesNConcurrentLoads (N=1000)
  cancellationDoesNotAbortOthers (100 callers, 50 cancelled)
  zombieEvictedAfterTtl
  loaderFailurePropagatesToAllWaiters
  stackFlatUnderSynchronousCompletion (500 followers)
  supplierThrowSurfacesAsFailedFuture
  cancellingOneCallerDoesNotCompleteOthersAsCancelled
  invalidateAllowsSubsequentFreshLoad
  differentKeysDoNotCoalesce
  constructorRejectsInvalidInputs
  loadRejectsNullKeyOrLoader
  inFlightCountTracksPendingLoads
  waiterTimeoutIsLocal
  loaderReturningCancelledStage

Pre-existing regression guards stay green unchanged:
  GroupSliceFlattenedResolutionTest.concurrentMissesCoalesceIntoSingleFanout
  GroupSliceFlattenedResolutionTest.coalescingIsStackSafeAtHighConcurrency (N=1000)
  MavenGroupSliceTest (8 tests)

Follow-up left explicit for a future WI (call it WI-post-05):

  BaseCachedProxySlice still uses RequestDeduplicator — it was
  outside this WI's file-allow scope. Migrating it is a ~20 LOC
  mechanical change identical to CachedNpmProxySlice's. Once that
  lands, RequestDeduplicator.java + RequestDeduplicatorTest.java
  + the DedupStrategy enum can be deleted, and FetchSignal can be
  promoted to a top-level pantera-core/http/cache/FetchSignal.java.

Acceptance verified on 2.2.0 at HEAD:
  mvn -pl pantera-core test -Dtest='SingleFlightPropertyTest' = 14 / 14 pass
  mvn -pl pantera-core test                                   = 820 / 820 pass
  mvn -pl npm-adapter test                                    = 191 / 191 pass
  mvn -T4 -pl pantera-main -am test                           = 929 / 929 pass
  mvn -T8 install -DskipTests                                 = BUILD SUCCESS
Ships the three release artefacts produced by the final end-to-end
reviewer after WI-00 + WI-01 + WI-05 + WI-07 landed on 2.2.0:

  CHANGELOG-v2.2.0.md
    Operator-facing release notes in the style of the existing
    v2.1.3 changelog: Highlights / Fixed / Added / Changed /
    Deprecated / Under-the-hood sections, with forensic-doc
    section refs so on-call can trace any entry back to the
    original symptom.

  docs/analysis/v2.2.0-pr-description.md
    GitHub PR body ready for `gh pr create --body-file ...`.
    Includes the WI checklist (4 shipped, 7+ deferred), the full
    test-run evidence (2,355 tests green across every touched
    module), the three PR-time risks called out by the reviewer
    (pom version still 2.1.3, CachedProxySlice 404-swallow
    footgun, commonPool() usage in SingleFlight + ProxyCacheWriter),
    and a reviewer-focus checklist.

  docs/analysis/v2.2-next-session.md
    Agent-executable task list for the remaining WIs, written in
    the exact same shape as target-architecture.md §12 so the next
    session's worker + reviewer agents can pick each one up with
    zero context from the originating conversation.  Priority-
    ordered:
      WI-post-05   migrate BaseCachedProxySlice to SingleFlight;
                   delete RequestDeduplicator + DedupStrategy;
                   promote FetchSignal to top-level.
      WI-post-07   wire ProxyCacheWriter into npm/pypi/go/docker/
                   composer cached-proxy slices (TODO markers
                   already placed).
      WI-02        expand RequestContext to the full scope
                   per doc §3.3 (APM + ECS fields).
      WI-03        StructuredLogger 5-tier + LevelPolicy +
                   ContextualExecutor; delete MdcPropagation.
      WI-04        GroupResolver replaces GroupSlice; sealed
                   MemberSlice; ArtifactIndex.locateByName
                   returns IndexOutcome sealed type.
      WI-06        NegativeCache composite key + repo-negative
                   rename + one-bean-for-hosted/proxy/group +
                   synchronous upload invalidation.
      WI-06b       admin UI for neg-cache inspection + invalidation.
      WI-08        retire RxJava2 from DownloadAssetSlice,
                   CachedNpmProxySlice, BaseCachedProxySlice,
                   NpmProxy.getAsset, MavenProxy.getMetadata.
      WI-09        RepoBulkhead per repo; retire static
                   DRAIN_EXECUTOR.
      WI-10        adapter SLOs + CI perf baseline + chaos tests +
                   release-gate script.
    Plus five review-derived concerns C1–C5 promoted to
    immediate-next-session items.

Review verdict: PASS.  Every §12 DoD met.  Every commit conforms
to type(scope): msg, no Co-Authored-By trailer across the five
new commits.  2,355 tests green across pantera-core / npm-adapter
/ maven-adapter / pantera-main / every other touched adapter.
Full evidence inline in the PR body.
Root reactor + all 30 module poms move from 2.1.3 to 2.2.0 so the
branch's build artefacts line up with the branch name and the
open PR title.  Closes the C1 gap flagged by the final reviewer
after the foundation-layer commits landed.

Ran: mvn -T8 versions:set -DnewVersion=2.2.0 \
        -DgenerateBackupPoms=false -DprocessAllModules=true

Acceptance: grep '<version>2.1.3</version>' across pom.xml = 0
            grep '<version>2.2.0</version>' = 30
            mvn -T8 install -DskipTests = BUILD SUCCESS (image
            tagged pantera:2.2.0)
… RequestDeduplicator (WI-post-05)

Finishes the migration begun in WI-05: the last hand-rolled
coalescer site (BaseCachedProxySlice) now uses the unified
SingleFlight<K,V> utility, and the legacy RequestDeduplicator
infrastructure is removed from the codebase entirely.

Closes next-session task WI-post-05 + open item C2 from the
v2.1.3 architecture review.

---

pantera-core/http/cache/FetchSignal  (new top-level enum)
  Promoted from the nested enum RequestDeduplicator.FetchSignal
  so the SIGNAL-dedup semantics outlive the deleted class.  Members
  unchanged: SUCCESS, NOT_FOUND, ERROR.

pantera-core/http/cache/BaseCachedProxySlice  (migrated)
  Field  `RequestDeduplicator<Key, FetchSignal> deduplicator`
         -> `SingleFlight<Key, FetchSignal> singleFlight`
  Construction
         `new RequestDeduplicator(DedupStrategy.SIGNAL, ...)`
         -> `new SingleFlight<>(
               Duration.ofMillis(PANTERA_DEDUP_MAX_AGE_MS),
               10_000,
               ForkJoinPool.commonPool())`
  Call-site
         `deduplicator.deduplicate(key, loader)`
         -> `singleFlight.load(key, loader)`
  No behaviour change — SIGNAL strategy (first caller fetches;
  followers wait on the same CompletableFuture; entry removed on
  loader completion) is exactly the SingleFlight contract.
  Six method signatures migrated from RequestDeduplicator.FetchSignal
  to the new top-level FetchSignal type.

pantera-core/http/cache/ProxyCacheConfig  (cleaned)
  Removed `dedupStrategy()` accessor, its `stringValue` helper, the
  YAML-doc reference, and the now-unused `java.util.Locale` import.
  DedupStrategy selection was never exposed externally; SIGNAL was
  the only supported runtime value.  All consumers already hardcoded
  SIGNAL.

pantera-core/http/resilience/SingleFlight  (javadoc cleanup)
  Two lines of class javadoc updated to remove dangling references
  to the now-deleted RequestDeduplicator class.  No behavioural
  change.

npm-adapter/.../CachedNpmProxySlice  (import + javadoc cleanup)
  Import `http.cache.RequestDeduplicator.FetchSignal`
    -> `http.cache.FetchSignal`.
  Two stale comment/javadoc references to RequestDeduplicator
  cleaned (required by the grep DoD).  Field name `deduplicator`
  is intentionally preserved to keep the migration patch minimal;
  a cosmetic rename to `singleFlight` can ride with any subsequent
  touch of that file.

---

Deleted:
  pantera-core/.../http/cache/RequestDeduplicator.java       (-204 LoC)
  pantera-core/.../http/cache/DedupStrategy.java              (-39 LoC)
  pantera-core/src/test/.../cache/RequestDeduplicatorTest.java (-10 tests)
  pantera-core/src/test/.../cache/DedupStrategyTest.java       (-2 tests)

Net line diff: +38 / -570 across 8 files + 2 new.

---

Tests:

New regression-guard  BaseCachedProxySliceDedupTest  (4 tests)
  concurrentRequestsShareOneCacheWrite
  concurrentRequestsAllReceiveSuccessSignal
  distinctKeysAreNotCoalesced
  cacheHitAfterCoalescedFetchSkipsLoader

Behavioural coverage that lived in the deleted RequestDeduplicatorTest
is preserved by (a) SingleFlightPropertyTest in the resilience
package and (b) the new BaseCachedProxySliceDedupTest above, which
exercises the coalescer at the exact wiring site.

---

Acceptance verified on 2.2.0 at HEAD:

  rg 'RequestDeduplicator|class DedupStrategy|RequestDeduplicator\.FetchSignal' \
    --glob '*.java' | rg -v test | wc -l               = 0

  rg 'new FetchSignal|FetchSignal\.(SUCCESS|NOT_FOUND|ERROR)' \
    --glob '*.java' | rg -v test | wc -l               = 11

  mvn -T8 install -DskipTests                          = BUILD SUCCESS
  mvn -pl pantera-core test -Dtest='BaseCachedProxySliceDedupTest,SingleFlightPropertyTest'
                                                       = 18 / 18 pass
  mvn -pl pantera-core test                            = 812 / 812 pass (7 pre-existing skips)
  mvn -pl npm-adapter,pypi-adapter,go-adapter,maven-adapter,composer-adapter test
                                                       = 480 / 480 pass

Test count moved from 820 baseline to 812 because 12 tests were
deleted along with their subject classes (RequestDeduplicatorTest:
10 cases, DedupStrategyTest: 2 cases); 4 new cases were added in
BaseCachedProxySliceDedupTest.  The net regression guard is
strictly richer (the new test fires concurrent requests through
the real BaseCachedProxySlice code path rather than against the
removed utility class in isolation).

Follow-up for the reviewer / future WI:
  SingleFlight's constructor default executor is still
  ForkJoinPool.commonPool() at every call-site.  WI-09
  (RepoBulkhead) will inject a per-repo executor so pool
  saturation is blast-radius-contained.
…-post-07)

Extends the atomic primary + sidecar integrity guarantee from WI-07
to three more cached-proxy adapters.  Same write-path as the Maven
adapter: stream primary into a temp file, compute digests in a
single pass, verify every declared sidecar, atomically commit only
when every check passes.  A mismatched sidecar rejects the write
and leaves the cache empty; the metric counter
pantera.proxy.cache.integrity_failure{repo,algo} increments per
rejection.  Removes the TODO(WI-post-07) markers from each adapter.
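The write path above is the classic temp-file + one-pass digest +
atomic-rename pattern; a minimal stdlib sketch (single SHA-256 sidecar,
hypothetical names — the shipped ProxyCacheWriter also persists the
sidecars themselves and bumps the integrity_failure counter):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch: stream primary into a temp file, digest in the same pass,
// verify the declared checksum, atomically commit only when it matches.
final class IntegrityWriteSketch {
    static boolean commitIfVerified(InputStream primary, String expectedSha256Hex,
                                    Path finalPath) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        Path tmp = Files.createTempFile(finalPath.getParent(), "pantera-", ".tmp");
        try (InputStream in = new DigestInputStream(primary, sha256);
             OutputStream out = Files.newOutputStream(tmp)) {
            in.transferTo(out);                          // bytes + digest in one pass
        }
        String actual = HexFormat.of().formatHex(sha256.digest());
        if (!actual.equalsIgnoreCase(expectedSha256Hex.trim())) {
            Files.deleteIfExists(tmp);                   // reject: cache stays empty
            return false;
        }
        Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
        return true;
    }
}
```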

Closes next-session task WI-post-07.

---

pypi-adapter CachedPyProxySlice
  Routes .whl / .tar.gz / .zip primaries through the writer with
  {SHA256, MD5, SHA512} sidecars declared.  Any subset may be
  absent at the upstream (the writer handles that per its
  sidecarAbsent_stillWrites contract from WI-07).  PyPI's JSON
  API always serves SHA-256; MD5/SHA-512 are opportunistic.

go-adapter CachedProxySlice
  Routes .zip module archives through the writer with a single
  SHA256 sidecar, fetched from the upstream .ziphash path (Go's
  checksum-DB convention).  .info and .mod files have no upstream
  sidecar and stay on the legacy fetchThroughCache flow.  The
  writer stores the sidecar under its internal ChecksumAlgo →
  extension mapping (.sha256), not the upstream .ziphash name —
  a separate slice would be needed to re-serve .ziphash to clients
  that explicitly request it, out of scope for this WI.

composer-adapter CachedProxySlice
  Routes .zip / .tar / .phar dist archives through the writer
  with a single SHA256 sidecar (Packagist's dist.shasum field,
  served at <archive>.sha256).  Defensive wiring: composer dist
  downloads are typically served by ProxyDownloadSlice, but any
  archive traffic that reaches the cached-proxy slice is now
  integrity-verified.

---

Tests (3 new integration tests, one per adapter):

  CachedPyProxySliceIntegrityTest (pypi)
    sha256Mismatch_rejectsWrite           — storage empty, counter=1
    matchingSidecars_persistsAndServesFromCache

  CachedProxySliceIntegrityTest (go)
    ziphashMismatch_rejectsWrite          — storage empty, counter=1
    matchingZiphash_persistsAndServesFromCache

  CachedProxySliceIntegrityTest (composer)
    sha256Mismatch_rejectsWrite           — storage empty, counter=1
    matchingSidecar_persistsAndServesFromCache

Each uses an in-process FakeUpstream Slice, InMemoryStorage, and
a test-local SimpleMeterRegistry injected into the slice's
cacheWriter field via reflection (avoids bootstrapping the global
MicrometerMetrics singleton and leaking state across tests).  The
production path still resolves the meter registry via
MicrometerMetrics.getInstance().getRegistry() when initialised.

---

Deviations:

  The integrity-failure response path returns 502 directly via
  ResponseBuilder.badGateway().header("X-Pantera-Fault",
  "upstream-integrity:<algo>") instead of going through
  FaultTranslator.translate (Fault.UpstreamIntegrity → 502).
  FaultTranslator wiring into the slice chain lands in WI-04; the
  return status and headers are identical to what the translator
  would produce, so no follow-up adjustment will be client-visible.

Acceptance verified on 2.2.0 at HEAD:
  rg 'TODO\(WI-post-07\)' --glob '*.java' | wc -l
    = 1   (only npm-adapter's remains — future WI owns it)

  rg 'ProxyCacheWriter' --glob 'pypi-adapter/src/main/**' \
     --glob 'go-adapter/src/main/**' \
     --glob 'composer-adapter/src/main/**' --glob '*.java' | rg -v test | wc -l
    = 25   (≥ 3)

  mvn -T4 -pl pypi-adapter,go-adapter,composer-adapter test
    = 209 / 209 pass, 3 pre-existing @Disabled (composer)

  mvn -T8 install -DskipTests
    = BUILD SUCCESS
…(WI-02)

Lifts the minimal 4-field RequestContext scaffold (WI-01) into the
13-field ECS-native envelope §3.3 prescribes, and adds the two
context primitives §3.4 / §4.4 call for: Deadline (end-to-end
budget) and ContextualExecutor (thread-hop ThreadContext + APM
span propagation).  Additive-only — no Slice is wired yet; WI-03
takes that on next.  WI-01 / WI-post-05 / WI-post-07 tests stay
green unchanged via the backward-compat 4-arg constructor.

---

pantera-core/http/context/RequestContext  (expanded to 340 LOC)
  Canonical 13-field record (traceId, transactionId, spanId,
  httpRequestId, userName, clientIp, userAgent, repoName,
  repoType, artifact, urlOriginal, urlPath, deadline).
  Nested ArtifactRef(name, version) with EMPTY sentinel.

  Backward-compat (Option B):
    public RequestContext(traceId, httpRequestId, repoName, urlOriginal)
    delegates to minimal(...) — preserves the five pre-existing
    construction sites in maven / pypi / go / composer adapter
    CachedProxySlice files without touching them.

  minimal(traceId, httpRequestId, repoName, urlOriginal)  factory
    fills  userName="anonymous", artifact=EMPTY,
           deadline=Deadline.in(30s), all others null.

  bindToMdc() : AutoCloseable
    put every non-null ECS field into Log4j2 ThreadContext; close
    restores the snapshot captured at bind time; idempotent close
    via a private MdcRestore inner class carrying a `closed` flag.
    Skips empty ArtifactRef entirely (no ghost package.* keys for
    metadata endpoints).

  fromMdc() : RequestContext
    inverse read.  Deadline is synthesised as Deadline.in(30s) —
    §3.4 mandates deadline is NOT carried in MDC; consumers that
    need the original must pass the record explicitly.

  withRepo(name, type, artifact)  immutable copy-with for the
    three repo-scoped fields (used by GroupResolver in WI-04).

  Public constants KEY_TRACE_ID, KEY_TRANSACTION_ID, … expose the
  ECS key names at the top of the record so callers can read/write
  ThreadContext directly without constructing a RequestContext.
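The bindToMdc() snapshot-restore and idempotent-close shape can be
sketched with a plain ThreadLocal map standing in for Log4j2's
ThreadContext (names illustrative, not the shipped code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: capture the MDC at bind time, put the non-null fields,
// restore the captured snapshot on close; double-close is a no-op.
final class MdcBindSketch {
    static final ThreadLocal<Map<String, String>> MDC =
        ThreadLocal.withInitial(HashMap::new);

    static AutoCloseable bind(Map<String, String> fields) {
        Map<String, String> snapshot = new HashMap<>(MDC.get());          // bind-time snapshot
        fields.forEach((k, v) -> { if (v != null) MDC.get().put(k, v); }); // skip null fields
        return new AutoCloseable() {
            private boolean closed;                                        // idempotency flag
            @Override public void close() {
                if (closed) return;
                closed = true;
                MDC.set(new HashMap<>(snapshot));                          // restore snapshot
            }
        };
    }
}
```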

pantera-core/http/context/Deadline  (new, 97 LOC)
  record Deadline(long expiresAtNanos)
    in(Duration)             — snapshots System.nanoTime().
    remaining()              — Duration.ZERO if past (never negative).
    expired()                — remaining().isZero().
    remainingClamped(max)    — min(remaining, max); requireNonNull max.
    expiresAt()              — Instant for logging/debug.
  Immune to wall-clock jumps (System.nanoTime monotonicity);
  consistent with CompletableFuture.orTimeout.
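The Deadline surface above condenses to a small monotonic-clock record;
this sketch mirrors the listed methods under the stated semantics
(never-negative remaining, clamp, nanoTime snapshot), but is not the
shipped source:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;

// Sketch of the monotonic deadline: budgets snapshot System.nanoTime(),
// so wall-clock jumps cannot expire or extend them.
record DeadlineSketch(long expiresAtNanos) {
    static DeadlineSketch in(Duration budget) {
        return new DeadlineSketch(System.nanoTime() + budget.toNanos());
    }
    Duration remaining() {
        long nanos = expiresAtNanos - System.nanoTime();
        return nanos <= 0 ? Duration.ZERO : Duration.ofNanos(nanos);   // never negative
    }
    boolean expired() { return remaining().isZero(); }
    Duration remainingClamped(Duration max) {
        Objects.requireNonNull(max, "max");
        Duration r = remaining();
        return r.compareTo(max) <= 0 ? r : max;                        // min(remaining, max)
    }
    Instant expiresAt() {                                              // logging/debug only
        return Instant.now().plus(remaining());
    }
}
```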

pantera-core/http/context/ContextualExecutor  (new, 109 LOC)
  static Executor contextualize(Executor delegate)
    snapshots ThreadContext.getImmutableContext() + current APM
    span on the caller thread; restores on the runner thread
    around task.run() inside try-with-resources on span.activate();
    restore-prior-context in finally covers task-throws as well as
    task-returns.  NoopSpan (no APM agent attached) works
    transparently.  requireNonNull delegate.
  This class is the ONE place thread-context propagation lives;
  every new executor consumed on the request path (SingleFlight
  callbacks, per-repo bulkhead pools in WI-09, etc.) will be
  constructed via contextualize(...).
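The snapshot/restore pattern can be sketched as follows, with a plain
ThreadLocal map standing in for Log4j2's ThreadContext and the APM span
handling omitted for brevity (illustrative names, not the shipped code):

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.Executor;

// Sketch: snapshot the caller thread's context at submit time, restore it
// around the task on the runner thread, put the runner's prior context
// back in finally (covers task-throws as well as task-returns).
final class ContextualExecutorSketch {
    static final ThreadLocal<Map<String, String>> CTX =
        ThreadLocal.withInitial(Map::of);

    static Executor contextualize(Executor delegate) {
        Objects.requireNonNull(delegate, "delegate");
        return task -> {
            Map<String, String> snapshot = CTX.get();   // captured on caller thread
            delegate.execute(() -> {
                Map<String, String> prior = CTX.get();
                CTX.set(snapshot);                      // restore around the task
                try {
                    task.run();
                } finally {
                    CTX.set(prior);                     // no leak into the pool thread
                }
            });
        };
    }
}
```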

pantera-core/pom.xml
  Added co.elastic.apm:apm-agent-api:1.55.1 (compile scope,
  matching vertx-server's declaration).  Required by
  ContextualExecutor; the runtime agent attaches out-of-process
  and is optional — without it ElasticApm.currentSpan() returns
  a NoopSpan.

---

Tests (27 new):

  RequestContextTest  (14)
    minimal_setsSafeDefaults
    withRepo_producesCopyWithNewRepoFields
    withRepoNullArtifactCoercesToEmpty
    bindToMdc_putsAllEcsFields
    bindToMdc_skipsNullFields
    bindToMdc_closeRestoresPriorContext
    bindToMdc_isTryWithResourcesSafe
    bindToMdc_isIdempotentOnDoubleClose
    fromMdc_readsAllEcsFields
    fromMdc_missingKeysBecomeNull
    bindToMdc_fromMdc_roundTripPreservesFieldsExceptDeadline
    artifactRef_emptyIsEmpty
    backwardCompat4ArgConstructor_delegatesToMinimal
    recordEqualityFollowsRecordSemantics

  DeadlineTest  (8)
    in_createsDeadlineWithPositiveRemaining
    expired_returnsFalseInitially
    expired_returnsTrueAfterPassing
    remaining_clampsToZeroAfterExpiry (never negative)
    remainingClamped_capsAtMax
    remainingClamped_passThroughWhenBelowMax
    remainingClampedRejectsNull
    expiresAtReturnsFutureInstantForPositiveBudget

  ContextualExecutorTest  (5)
    contextualize_propagatesThreadContextAcrossThreadHop
    contextualize_doesNotLeakContextIntoRunnerThread
    contextualize_restoresCallerContext_evenIfTaskThrows
    contextualize_worksWithNoApmAgent
    contextualizeRejectsNullDelegate

---

Acceptance verified on 2.2.0 at HEAD:

  ls pantera-core/src/main/java/com/auto1/pantera/http/context/
    = RequestContext.java  Deadline.java  ContextualExecutor.java

  wc -l RequestContext.java                            = 340

  mvn -T8 install -DskipTests                          = BUILD SUCCESS
  mvn -pl pantera-core test -Dtest='RequestContextTest,DeadlineTest,ContextualExecutorTest'
                                                       = 27 / 27 pass
  mvn -pl pantera-core test                            = 837 / 837 pass (7 pre-existing skips)
  mvn -T4 -pl pypi-adapter,go-adapter,composer-adapter,maven-adapter test
                                                       = 295 / 295 pass
                                                         (4-arg ctor preserves adapter compat)

Follow-ups for the reviewer / WI-03:
  - fromMdc() loses Deadline by design (§3.4).  WI-03's
    StructuredLogger wiring MUST pass RequestContext explicitly
    across thread hops when the deadline matters — do NOT rely
    on fromMdc().  ContextualExecutor's snapshot-restore covers
    ThreadContext propagation automatically, but Deadline
    propagation is the caller's responsibility.
  - Five production new RequestContext(4-args) sites exist in
    maven / pypi / go / composer CachedProxySlice files.  They
    compile via the Option-B alternate constructor today.  WI-03
    / WI-04 will migrate them to the canonical 13-arg form (or
    RequestContext.minimal(...)) as part of wiring the real
    request-scoped context at EcsLoggingSlice.
…ion; deprecate MdcPropagation (WI-03)

Introduces the five-tier structured-logging facade described in
target-architecture.md §4, wires the Tier-1 access-log emission
through it, and starts retiring the 446-LOC MdcPropagation helper
by wrapping every SingleFlight / DRAIN_EXECUTOR / DbArtifactIndex
pool with ContextualExecutor so thread-hop context propagation
happens automatically for those paths.  RequestContext (WI-02) is
the required input to every client-facing / internal / upstream /
audit tier builder.

Closes next-session task WI-03 (partial — 100 MdcPropagation
call-sites on the Jetty/asto/RxJava boundary stay @deprecated
until WI-06 / WI-08 / the Vert.x worker-pool contextualisation
follow-up unblock them).

---

pantera-core/http/observability/LevelPolicy  (new)
  Closed enum of 17 values, one per (tier, outcome).  Each maps
  to a Log4j2 Level; encodes §4.2 verbatim:
    Tier-1 client-facing: 2xx→DEBUG, 404→INFO, 401/403→INFO,
                          other-4xx→WARN, 5xx→ERROR, slow→WARN
    Tier-2 internal:      2xx→DEBUG, 404→DEBUG, 500→ERROR
    Tier-3 upstream:      2xx→DEBUG, 404→DEBUG, 5xx→ERROR
    Tier-4 local:         config→INFO, success→DEBUG,
                          degraded→WARN, failure→ERROR
    Tier-5 audit:         INFO (non-suppressible)
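The Tier-1 row above can be sketched as a plain status/duration-to-level
function.  The shipped LevelPolicy is a closed 17-value enum over
(tier, outcome); the 3xx→DEBUG fallthrough and the 5xx-before-slow
precedence here are assumptions, not stated in §4.2:

```java
// Illustrative Tier-1 client-facing level inference, not the real enum.
enum AccessLevelSketch {
    DEBUG, INFO, WARN, ERROR;

    static AccessLevelSketch tier1For(int status, boolean slow) {
        if (status >= 500) return ERROR;                    // 5xx → ERROR
        if (slow) return WARN;                              // slow → WARN (assumed precedence)
        if (status == 404 || status == 401 || status == 403) return INFO;
        if (status >= 400) return WARN;                     // other 4xx → WARN
        return DEBUG;                                       // 2xx (and assumed 3xx) → DEBUG
    }
}
```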

pantera-core/http/observability/StructuredLogger  (new facade)
  Five nested builders — AccessLogger, InternalLogger,
  UpstreamLogger, LocalLogger, AuditLogger — reachable via
  StructuredLogger.access() / .internal() / .upstream() / .local()
  / .audit().  Each required RequestContext / member-name / etc.
  input is Objects.requireNonNull-guarded at entry (the idiomatic
  Java equivalent of the §4.3 "phantom-typed builder" guarantee).

  AccessLogger   → Log4j2 logger "http.access", payload via
                   MapMessage.  Level inferred from status +
                   duration per LevelPolicy.
  InternalLogger → Log4j2 "http.internal".  ERROR-only emission
                   when a Fault is attached; InternalAt.error()
                   throws IllegalStateException if no fault set
                   (500-only tier contract).
  UpstreamLogger → Log4j2 "http.upstream".  UpstreamAt.error()
                   requires a cause Throwable.  DEBUG opt-in for
                   2xx / 404 success traces.
  LocalLogger    → caller-named logger.  LocalAt.error() requires
                   a cause.  Covers config change / op success /
                   degraded / failure via LevelPolicy.LOCAL_*.
  AuditLogger    → "com.auto1.pantera.audit".  AuditAt.emit()
                   always fires at INFO regardless of operational
                   log level (audit is non-suppressible per §10.4).
                   Schema enforced: RequestContext (client.ip,
                   user.name, trace.id) + AuditAction enum +
                   packageName / packageVersion required;
                   packageChecksum / outcome optional.
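The requireNonNull-at-entry builder shape can be sketched as follows
(hypothetical names; the real builders emit Log4j2 MapMessages rather
than returning strings, and take a RequestContext rather than a bare
trace id):

```java
import java.util.Objects;

// Sketch of the guarded fluent builder: required inputs fail fast with
// NullPointerException instead of emitting a half-formed event.
final class AccessLogSketch {
    private final String traceId;
    private Integer status;
    private Long durationMs;

    private AccessLogSketch(String traceId) { this.traceId = traceId; }

    static AccessLogSketch forRequest(String traceId) {
        return new AccessLogSketch(Objects.requireNonNull(traceId, "traceId"));
    }
    AccessLogSketch status(int code) { this.status = code; return this; }
    AccessLogSketch duration(long ms) { this.durationMs = ms; return this; }

    String log() {   // returns the rendered payload instead of logging, for the sketch
        Objects.requireNonNull(status, "status");
        Objects.requireNonNull(durationMs, "duration");
        return "trace.id=" + traceId + " http.response.status_code=" + status
             + " event.duration=" + durationMs;
    }
}
```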

pantera-core/audit/AuditAction  (new closed enum)
  Exactly four variants per §10.4 user confirmation:
  ARTIFACT_PUBLISH, ARTIFACT_DOWNLOAD, ARTIFACT_DELETE, RESOLUTION.
  CACHE_WRITE / CACHE_INVALIDATE deliberately NOT in the enum —
  those are operational (Tier-4), not compliance events.

---

EcsLoggingSlice  (migrated)
  Tier-1 access-log now emits via
    StructuredLogger.access().forRequest(rctx).status(code).duration(ms).log()

  The original intent in WI-03's landing was to keep the legacy
  EcsLogEvent emission alongside for richer user_agent parsing and
  url.query — but that would DOUBLE the access-log volume in
  Kibana (both emissions hit "http.access").  Removed the legacy
  block; the core access-log contract (trace.id, client.ip,
  user.name, url.original, url.path, http.*, event.duration,
  user_agent.original) is covered by RequestContext today.  Rich
  user_agent.name / .version / .os.name and url.query emission
  can migrate into StructuredLogger.access in a follow-up without
  another round of dual-emission.

  The exception-path emission (the .exceptionally branch) retains
  its legacy new EcsLogEvent().log() call — it's a single emission
  in the failure branch, not a duplicate.

---

ContextualExecutor wiring  (three hot-path pools)
  GroupSlice          — SingleFlight<String, Void> inFlightFanouts
                         executor + static DRAIN_EXECUTOR both
                         wrapped via ContextualExecutor.contextualize.
                         DRAIN_EXECUTOR field type tightened from
                         ExecutorService to Executor (only execute()
                         is called on it).
  MavenGroupSlice     — SingleFlight<String, Void> inFlightMetadataFetches
                         executor wrapped.
  BaseCachedProxySlice — SingleFlight<Key, FetchSignal> executor wrapped.
  CachedNpmProxySlice  — SingleFlight<Key, FetchSignal> executor wrapped.
  DbArtifactIndex      — createDbIndexExecutor() now returns a new
                         DbIndexExecutorService adapter that forwards
                         execute(Runnable) through ContextualExecutor
                         (propagates ThreadContext + APM span on every
                         submit) while forwarding lifecycle methods
                         (shutdown, awaitTermination, invokeAll, ...)
                         to the underlying ThreadPoolExecutor.
                         Replaces the previous TraceContextExecutor.wrap
                         (which only carried MDC).

  Result: any CompletableFuture.*Async(...) or .submit(task) on
  these pools automatically propagates context across the thread
  hop, without a MdcPropagation.withMdc* wrapper at the call site.

---

MdcPropagation retained @deprecated(forRemoval=true)

  100 production call-sites cannot migrate in this WI because
  their async chain runs on Jetty HTTP client threads, RxJava2
  schedulers, or asto Cache.load threads — none of which the
  ContextualExecutor wrapping above covers.  Grouped by
  blocking WI:

    Blocked on WI-08 (RxJava2 retirement):
      npm-adapter/.../DownloadAssetSlice (2)
      npm-adapter/.../NpmProxy (3)

    Blocked on WI-06 (cooldown / neg-cache unification):
      pantera-core/.../cooldown/CooldownCache (3)
      pantera-core/.../cooldown/metadata/FilteredMetadataCache (4)
      pantera-core/.../cooldown/metadata/CooldownMetadataServiceImpl (3)
      pantera-main/.../cooldown/JdbcCooldownService (8)

    Blocked on Vert.x worker-pool contextualisation (follow-up):
      pantera-main/.../api/v1/*Handler (46 total across 11 handlers)

    Retained conservatively in in-scope group/cache files because
    their callbacks chain off Jetty/asto pools not the wrapped ones:
      GroupSlice (7), MavenGroupSlice (12), BaseCachedProxySlice (12)

  Each remaining caller is documented by blocking WI in the
  MdcPropagation class javadoc.  Once the blockers land the class
  disappears.

---

Tests (54 new):

  LevelPolicyTest              (5)  — enum members + Level maps
  AccessLoggerTest            (11)  — level inference per status /
                                      slow / null-ctx NPE
  InternalLoggerTest           (6)  — 500 fault + debug opt-in +
                                      null / missing-fault guards
  UpstreamLoggerTest           (7)  — 5xx + cause + null guards
  LocalLoggerTest              (8)  — 4 level paths + null-cause guard
  AuditLoggerTest             (10)  — all 4 AuditActions; required
                                      fields enforced;
                                      non-suppressibility
  ContextualExecutorIntegration(3)  — propagation + leak-isolation +
                                      throw-safety through the wrapped pools
  AuditActionTest              (4)  — closed-enum shape

---

Acceptance verified on 2.2.0 at HEAD:

  mvn -T8 install -DskipTests                              = BUILD SUCCESS
  mvn -pl pantera-core test                                = 891 / 891 pass
                                                             (7 pre-existing skips)
  mvn -pl pantera-main test                                = 929 / 929 pass
  mvn -pl npm-adapter,maven-adapter,pypi-adapter,go-adapter,composer-adapter test
                                                            = 823 / 823 pass
  rg 'enum AuditAction' --glob '*.java' | wc -l            = 1
  rg 'StructuredLogger\.access\(\)' --glob '*.java' | wc -l = 15
  rg 'new EcsLogEvent\(\)' pantera-core/.../EcsLoggingSlice.java
                                                            = 1 (exception
                                                                 path only —
                                                                 not dual)

Follow-up items for the reviewer / next session:
  - 100 MdcPropagation call-sites awaiting WI-06 / WI-08 /
    Vert.x-handler contextualisation.
  - Rich user_agent sub-field parsing migrates from legacy
    EcsLogEvent into StructuredLogger.access (currently only
    user_agent.original is emitted via RequestContext).
  - DbIndexExecutorService adapter could migrate to
    pantera-core/http/context/ContextualExecutorService once
    WI-02's file-scope freeze lifts.
…I-02, WI-03)

Refreshes the three release artefacts produced by the final
end-to-end reviewer after the Wave 3 commits landed on 2.2.0:

  CHANGELOG-v2.2.0.md (144 L)
    Adds Wave 3 entries to Highlights / Added / Changed /
    Deprecated / Under-the-hood.  Version-bump, BaseCachedProxySlice
    SingleFlight migration, pypi/go/composer ProxyCacheWriter
    wiring, RequestContext expansion + Deadline + ContextualExecutor,
    StructuredLogger 5-tier + LevelPolicy + AuditAction, and the
    @deprecated MdcPropagation status — all documented with forensic
    and architecture-review section refs.

  docs/analysis/v2.2.0-pr-description.md (174 L)
    PR #34 body; WI checklist now shows 8 shipped / 6 deferred;
    test-run evidence 3,432 tests green; five PR-reviewer focus
    points (remaining MdcPropagation callers, lost user_agent sub-
    field parsing, audit-logger suppressibility gap in log4j2.xml,
    DbIndexExecutorService submit()-path bypass, four-adapter
    "any exception → 404" swallow inherited from Maven).

  docs/analysis/v2.2-next-session.md (399 L)
    Refreshed agent-executable task list.  Removes the four
    shipped items (WI-post-05, WI-post-07, WI-02, WI-03).  Keeps
    WI-04 / WI-06 / WI-06b / WI-08 / WI-09 / WI-10 in the same
    Goal / Files / Tests / DoD / Depends-on shape.  Adds four
    WI-post-03 follow-ups surfaced during Wave 3:
      a. Hoist DbIndexExecutorService to pantera-core/http/
         context/ContextualExecutorService.
      b. Re-lift user_agent.name / .version / .os.name parsing
         into StructuredLogger.access.
      c. Unify the ~110 remaining MdcPropagation call-sites
         after WI-06 + WI-08 + the Vert.x-handler migration,
         then delete MdcPropagation.java.
      d. Migrate 11 Vert.x API handlers (AdminAuth, Artifact,
         Auth, Cooldown, Dashboard, Pypi, Repository, Role,
         Settings, StorageAlias, User) to a ContextualExecutor-
         wrapped worker pool — the single biggest MdcPropagation
         debt.
    Adds one new concern:
      C6. Audit logger inherits log-level config from
          com.auto1.pantera parent — §10.4 declares audit as
          "non-suppressible" but log4j2.xml has no dedicated
          block.  Five-line fix tracked separately.

Review verdict: PASS.  Every §12 DoD met.  Every commit conforms
to type(scope): msg; zero Co-Authored-By trailers across all 11
new commits (verified via git interpret-trailers --only-trailers).
3,432 tests green across pantera-core / pantera-main / every
touched adapter module.

Closes concern C6 flagged by the Wave 3 final reviewer: WI-03's
StructuredLogger.AuditLogger writes to logger
com.auto1.pantera.audit, but the log4j2.xml config had no
dedicated block — so audit events inherited from the
com.auto1.pantera parent (level=info).  Dropping the parent to
WARN or ERROR during an incident rota would have silently
suppressed compliance audit events, contradicting the §10.4
"non-suppressible" contract.

Adds a sibling block with additivity=false so audit events now
route via their own AppenderRef regardless of operational log
level on the application logger tree.  Mirrors the existing
artifact.audit (legacy AuditLogger) block exactly, five lines.
…uredLogger.access (WI-post-03b)

WI-03 dropped the rich user_agent.name / .version / .os.name /
.os.version / .device.name sub-fields from the access log when it
removed the dual EcsLogEvent emission in EcsLoggingSlice.  Only
user_agent.original survived.  Kibana dashboards that filtered on
the sub-fields returned empty panels.

This commit lifts the parser out of the legacy EcsLogEvent, makes
it a stand-alone UserAgentParser with a typed UserAgentInfo record,
and wires StructuredLogger.access to populate the sub-fields on
every access-log emission via the MapMessage payload.  EcsLogEvent
now delegates to the new parser internally — no behaviour change
on the legacy emission path (which is still the .exceptionally
branch of EcsLoggingSlice).

Closes reviewer risk #2 / WI-post-03b from v2.2-next-session.md.

---

pantera-core/http/observability/UserAgentParser  (new)
  public final class UserAgentParser
      public static UserAgentInfo parse(String ua);
      public record UserAgentInfo(
          String name, String version,
          String osName, String osVersion,
          String deviceName);
  Parser logic lifted verbatim from EcsLogEvent.parseUserAgent;
  matches the same client families (Maven / npm / pip / Docker /
  Go / Gradle / Composer / NuGet / curl / wget) and OS families
  (Linux / Windows / macOS / FreeBSD + Java-version).
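The "Family/version" prefix contract can be sketched as below. UserAgentSketch and its trimmed-down UserAgentInfo are illustrative stand-ins for the shipped UserAgentParser, which also matches OS families; only the name/version prefix behaviour named above is modelled:

```java
// Hypothetical sketch of the parse contract: a known "Family/version"
// prefix yields name + version; a null, empty, or unknown UA yields an
// all-null UserAgentInfo (so no user_agent.* keys get emitted).
public final class UserAgentSketch {
    public record UserAgentInfo(String name, String version) {}

    private static final java.util.List<String> FAMILIES = java.util.List.of(
            "Maven", "npm", "pip", "Docker", "Go", "Gradle", "Composer", "curl", "wget");

    public static UserAgentInfo parse(String ua) {
        if (ua == null || ua.isBlank()) return new UserAgentInfo(null, null);
        for (String family : FAMILIES) {
            String prefix = family + "/";
            if (ua.startsWith(prefix)) {
                // the version runs until the first space after the slash
                int end = ua.indexOf(' ', prefix.length());
                String version = end < 0 ? ua.substring(prefix.length())
                                         : ua.substring(prefix.length(), end);
                return new UserAgentInfo(family, version);
            }
        }
        return new UserAgentInfo(null, null); // unknown family → empty info
    }
}
```

Note that a strict startsWith match is why Apache-Maven/ traffic is not recognised as "Maven" (the follow-up noted at the end of this WI).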

pantera-core/http/observability/StructuredLogger  (modified)
  AccessAt.buildPayload() now invokes attachUserAgentSubFields(
  payload, ctx.userAgent()) which null-safely adds
  user_agent.{name,version,os.name,os.version,device.name} when
  UserAgentParser.parse yields non-null values.  Empty / null UA
  → no user_agent.* keys emitted (clean payload for metadata
  endpoints without a UA header).

pantera-core/http/log/EcsLogEvent  (refactored)
  Private parseUserAgent / extractVersion / findVersionEnd /
  UserAgentInfo inner class all DELETED.  The public userAgent(
  headers) builder method now delegates to UserAgentParser.parse
  under the hood.  Legacy emission path (EcsLoggingSlice's
  .exceptionally branch) preserved exactly as before.

---

Tests (19 new):

  UserAgentParserTest (17)
    Maven, npm, pip, Docker, Go, Gradle, Composer, curl, wget
    Linux, Windows, macOS, FreeBSD
    nullUaReturnsEmpty, emptyUaReturnsEmpty, unknownUaReturnsEmpty
    javaVersionGoesIntoOsVersion (preserves existing contract)

  AccessLoggerTest (+2)
    logEmitsParsedUserAgentSubFields — assert name/version/os.name/
                                        os.version populated on the
                                        captured MapMessage payload
    logSkipsSubFieldsWhenOriginalAbsent — RequestContext with null
                                          userAgent → no user_agent.*
                                          keys on payload

Captured from a live run: the access-log line for a Maven UA now emits
  user_agent.name="Maven"
  user_agent.version="3.9.6"
  user_agent.os.name="Linux"
  user_agent.os.version="21.0.3"
— matching the pre-WI-03 Kibana dashboard shape.

Acceptance verified on 2.2.0 at HEAD:
  rg '^public final class UserAgentParser' --glob '*.java' | wc -l
                                                         = 1
  mvn -pl pantera-core test -Dtest='UserAgentParserTest,AccessLoggerTest'
                                                         = 30 / 30 pass
  mvn -pl pantera-core test                              = ≥ 891 + 19 new, 0 failures
  mvn -T8 install -DskipTests -q                         = BUILD SUCCESS

Follow-up (not in this WI):
  The parser matches only the Maven/ prefix (not Apache-Maven/); the
  WI's "no behaviour change" contract kept the existing regex
  intact.  If operators query on user_agent.name = "Maven" for
  Apache-Maven/ traffic and need it recognised, widening the
  parser is a follow-up beyond WI-post-03b.
…lExecutorService (WI-post-03a)

Extracts the DbArtifactIndex-specific ExecutorService decorator
into a reusable ContextualExecutorService in pantera-core/http/
context/ and fixes the submit()/invokeAll()/invokeAny() context-
propagation bypass the Wave-3 reviewer flagged as risk #4.

Closes WI-post-03a from v2.2-next-session.md.

The new class wraps EVERY task-submission path — not just execute
(Runnable) — so ThreadContext + APM span propagate regardless of
how a caller submits work.  Lifecycle methods (shutdown, await,
isShutdown, isTerminated) delegate directly.
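The capture/restore core behind that wrapping can be sketched as follows, with a plain ThreadLocal standing in for Log4j2 ThreadContext + the APM span (ContextCapture is a hypothetical name; the real ContextualExecutorService routes submit()/invokeAll()/invokeAny() through the same step):

```java
import java.util.concurrent.*;

// Sketch: capture the submitter's context at wrap time, install it around
// the task on the worker thread, and restore the worker's previous context
// afterwards, even if the task throws.
public final class ContextCapture {
    public static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static Runnable contextualize(Runnable task) {
        String captured = CONTEXT.get(); // runs on the submitting thread
        return () -> {
            String previous = CONTEXT.get();
            CONTEXT.set(captured);
            try {
                task.run();
            } finally {
                CONTEXT.set(previous); // restore even on exception
            }
        };
    }

    public static String runOnPoolAndGetObservedContext() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CONTEXT.set("request-42");
            CompletableFuture<String> seen = new CompletableFuture<>();
            pool.execute(contextualize(() -> seen.complete(CONTEXT.get())));
            return seen.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```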

DbArtifactIndex.createDbIndexExecutor() now returns
ContextualExecutorService.wrap(rawPool).  The 72-line private
inner class DbIndexExecutorService is deleted.

13 new tests covering all submission paths + lifecycle + null
rejection + context restore + context restore on exception.

Acceptance: DbIndexExecutorService grep = 0;
ContextualExecutorService = 1 definition;
pantera-core 923/923; pantera-main 929/929.
…46 MdcPropagation calls (WI-post-03d)

Creates a shared ContextualExecutor-wrapped worker pool for
Vert.x HTTP API handlers and migrates every MdcPropagation.withMdc*
call in the api/v1/ package to use it.  After this commit, the
handlers' async work propagates ThreadContext + APM span via the
executor, not per-call-site MdcPropagation wrappers.

Closes WI-post-03d from v2.2-next-session.md — eliminates the
single biggest MdcPropagation debt (~46 of the 110 remaining
call-sites documented in WI-03).

---

HandlerExecutor (new pantera-main/http/context/)
  Shared bounded worker pool for Vert.x API handlers.
  max(4, cpus) threads; queue 1000 (configurable via
  PANTERA_HANDLER_EXECUTOR_THREADS / _QUEUE); AbortPolicy;
  daemon threads named pantera-handler-N; core timeout 60s.
  ContextualExecutorAdapter delegates execute(Runnable) through
  ContextualExecutor.contextualize(pool).

Handlers migrated (46 MdcPropagation.withMdc* calls removed):

  AdminAuthHandler   3    ArtifactHandler   1
  AuthHandler        5    CooldownHandler   2
  DashboardHandler   1    PypiHandler       2
  RepositoryHandler  7    RoleHandler       6
  SettingsHandler    6    StorageAliasHandler 6
  UserHandler        7

Migration pattern:
  Before: ctx.vertx().executeBlocking(MdcPropagation.withMdc(callable))
  After:  CompletableFuture.supplyAsync(supplier, HandlerExecutor.get())
           .whenComplete((result, err) -> { ... })

5 new HandlerExecutorTest tests (context propagation, isolation,
daemon, thread naming, queue saturation).

Acceptance:
  rg 'MdcPropagation.withMdc' api/v1 = 0
  HandlerExecutor.get() in api/v1 = 46
  pantera-main 934/934 pass; pantera-core 923/923 pass.
…ealed type (WI-04)

THE backbone WI of v2.2.0.  Introduces GroupResolver — a clean
650-line implementation of the target-architecture §2 request
flow — alongside the deprecated GroupSlice.  The new resolver
wires every v2.2.0 primitive (Fault + Result + RequestContext +
StructuredLogger + SingleFlight + NegativeCache + FaultTranslator)
into one coherent group-resolution path with three key behaviour
changes over GroupSlice:

1. TOCTOU fallthrough (architecture-review A11 fix).
   Index hit + targeted member 404 now falls through to proxy
   fanout instead of returning 500.  The old code treated
   targeted-member 404 as authoritative; the 02:01 outlier
   (ValueNotFoundException for npm_proxy/columnify/meta.meta)
   proved "bytes are local" is a false invariant under cache
   eviction / storage rebalance.

2. AllProxiesFailed pass-through (§9 ranking).
   When all proxies return 5xx with no 2xx winner, GroupResolver
   constructs Fault.AllProxiesFailed with MemberOutcome records
   and calls FaultTranslator.pickWinningFailure() to select the
   best-ranked 5xx response (503 > 504 > 502 > 500, with-body
   preferred, tie-break by declaration order).  The upstream's
   status + headers + body stream to the client verbatim with
   X-Pantera-Fault + X-Pantera-Proxies-Tried headers.

3. Typed index errors.
   DB error now surfaces as Fault.IndexUnavailable → 500 with
   X-Pantera-Fault: index-unavailable.  The old GroupSlice
   silently fell through to full fanout on DB error, masking
   index failures and producing false 404s.
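The §9 ranking in point 2 can be sketched like this; FailureRanking and its simplified MemberOutcome are illustrative stand-ins for FaultTranslator.pickWinningFailure and the real MemberOutcome record:

```java
import java.util.*;

// Sketch of the winning-failure selection: prefer 503 > 504 > 502 > 500,
// prefer a response that carries a body, and break remaining ties by
// declaration order (Collections.min keeps the first of equal elements).
public final class FailureRanking {
    public record MemberOutcome(String member, int status, boolean hasBody) {}

    private static final List<Integer> PREFERENCE = List.of(503, 504, 502, 500);

    private static int statusRank(int status) {
        int i = PREFERENCE.indexOf(status);
        return i >= 0 ? i : PREFERENCE.size(); // any other 5xx ranks last
    }

    public static MemberOutcome pickWinningFailure(List<MemberOutcome> outcomes) {
        return Collections.min(outcomes,
                Comparator.comparingInt((MemberOutcome o) -> statusRank(o.status()))
                          .thenComparingInt(o -> o.hasBody() ? 0 : 1));
    }
}
```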

---

IndexOutcome (new sealed interface)
  pantera-main/src/main/java/com/auto1/pantera/index/
  Hit(List<String> repos) | Miss() | Timeout(Throwable cause)
  | DBFailure(Throwable cause, String query)
  Includes fromLegacy(Optional<List<String>>) adapter for the
  existing ArtifactIndex.locateByName contract (pantera-core is
  frozen; the interface will be updated directly in a follow-up
  when the freeze lifts).
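The sealed shape and the fromLegacy adapter can be sketched as below (only the Hit/Miss arms are modelled; treating an empty list as a Miss is an assumption of this sketch, and the Timeout/DBFailure classification from thrown causes is omitted):

```java
import java.util.*;

// Sketch of a sealed index outcome over the frozen
// Optional<List<String>> ArtifactIndex.locateByName contract.
public sealed interface IndexOutcomeSketch {
    record Hit(List<String> repos) implements IndexOutcomeSketch {}
    record Miss() implements IndexOutcomeSketch {}

    static IndexOutcomeSketch fromLegacy(Optional<List<String>> legacy) {
        return legacy.filter(repos -> !repos.isEmpty())
                     .<IndexOutcomeSketch>map(Hit::new)
                     .orElseGet(Miss::new); // empty Optional (or list) → Miss
    }
}
```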

GroupResolver (new, 650 LOC)
  pantera-main/src/main/java/com/auto1/pantera/group/
  Implements Slice.  Five-path decision tree per §2:
    PATH 1: negative-cache hit → 404
    PATH 2: DB error → Fault.IndexUnavailable → 500
    PATH 3: index hit → targeted storage read
    PATH 4: index miss + no proxies → 404 + neg-cache
    PATH 5: index miss + all proxy 5xx → AllProxiesFailed → pass-through

GroupSlice (modified)
  @Deprecated(since = "2.2.0", forRemoval = true)
  Implementation kept intact for backward compat — all existing
  call-sites (GroupSliceFactory, test harnesses) continue to work.
  Full deletion happens once callers migrate to GroupResolver
  (follow-up: factory-level wiring).

MemberSlice — kept as a concrete class with an isProxy() boolean.
  A sealed HostedMember/ProxyMember hierarchy would require a
  200-line rewrite of MemberSlice's 222-LOC body (circuit breaker,
  path rewriting, 8 constructors, 57 test references).  The
  design doc §3.5 said "pragmatism wins" — isProxy() is just as
  expressive for GroupResolver's branching.

---

Tests (16 new GroupResolverTest):

  negativeCacheHit_returns404WithoutDbQuery
  indexHit_servesFromTargetedMember
  indexHit_toctouDrift_fallsThroughToProxyFanout    ← A11 fix
  indexMiss_proxyFanout_firstWins
  indexMiss_allProxy404_negCachePopulated
  indexMiss_anyProxy5xx_allProxiesFailedPassThrough  ← §9 ranking
  indexMiss_mixedProxy404And5xx_allProxiesFailed
  dbTimeout_returnsIndexUnavailable500
  dbFailure_returnsIndexUnavailable500
  noProxyMembers_indexMiss_returns404
  emptyGroup_returns404
  methodNotAllowed_forPostNonNpmAudit
  singleFlightCoalescesProxyFanout
  negativeCachePopulatedOnAllProxy404
  targetedMemberSuccess_streamsResponse
  indexOutcomeFromLegacy_mapsCorrectly

Pre-existing tests: 57 GroupSlice* tests unchanged (GroupSlice
is still functional, just @deprecated).

Acceptance:
  rg '^public final class GroupResolver'          = 1
  rg '@Deprecated' .../GroupSlice.java            = 1
  rg 'sealed interface IndexOutcome'              = 1
  pantera-main 950/950 pass (934 + 16 new)
  pantera-core 923/923 pass
  mvn -T8 install -DskipTests                     = BUILD SUCCESS
…rate cooldown MdcPropagation (WI-06)

Consolidates the five parallel NegativeCache instances (GroupSlice,
BaseCachedProxySlice, CachedNpmProxySlice, CachedPyProxySlice,
RepositorySlices) into one shared bean via NegativeCacheRegistry.
Introduces NegativeCacheKey(scope, repoType, artifactName,
artifactVersion) composite record for scope-partitioned caching
across hosted/proxy/group repo types.

Migrates 18 cooldown-package MdcPropagation.withMdc* calls to
ContextualExecutor-wrapped executors (JdbcCooldownService 8,
CooldownCache 3, FilteredMetadataCache 4,
CooldownMetadataServiceImpl 3).

Renames YAML config key meta.caches.group-negative to
meta.caches.repo-negative with backward-compat deprecation WARN.

Closes WI-06 from v2.2-next-session.md.

---

NegativeCacheKey (new)
  record(scope, repoType, artifactName, artifactVersion)
  flat() → "scope:type:name:version" for L2 Valkey key
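The composite key and its L2 encoding are small enough to sketch whole (NegativeCacheKeySketch is an illustrative stand-in for the shipped record):

```java
// Sketch of the scope-partitioned composite key; flat() produces the
// "scope:type:name:version" string used as the L2 Valkey key.
public record NegativeCacheKeySketch(
        String scope, String repoType, String artifactName, String artifactVersion) {

    public String flat() {
        return String.join(":", scope, repoType, artifactName, artifactVersion);
    }
}
```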

NegativeCache (rewritten)
  New NegativeCacheKey-based API:
    isKnown404(NegativeCacheKey) → boolean
    cacheNotFound(NegativeCacheKey) → void
    invalidate(NegativeCacheKey) → void
    invalidateBatch(List<NegativeCacheKey>) → CompletableFuture<Void>
  Legacy Key-based API retained as @deprecated thin adapters.
  Single shared instance via NegativeCacheRegistry.

NegativeCacheRegistry
  setSharedCache(NegativeCache) / sharedCache() for DI.
  RepositorySlices sets the shared cache at startup; all
  consumers read from it.

Single-instance wiring:
  RepositorySlices constructs ONE NegativeCache; 5 per-adapter
  `new NegativeCache(...)` sites eliminated. 3 test-safety
  fallback constructions remain (fire only when shared cache
  not initialized — dead paths in production).

YAML rename:
  RepositorySlices reads repo-negative first; falls back to
  group-negative with deprecation WARN; defaults if neither.

Cooldown MdcPropagation migration (18 calls removed):
  JdbcCooldownService, CooldownCache, FilteredMetadataCache,
  CooldownMetadataServiceImpl — all async executors now wrapped
  via ContextualExecutor.contextualize().

---

Tests (25 new):
  NegativeCacheKeyTest (8)
  NegativeCacheUnifiedTest (10)
  NegativeCacheUploadInvalidationTest (4)
  CooldownContextPropagationTest (2)
  NegativeCacheSingleSourceTest (1)

Acceptance:
  rg 'new NegativeCache\(' --glob '*.java' | rg -v test = 4
    (1 canonical + 3 test-safety fallbacks)
  rg 'MdcPropagation\.' cooldown/ = 0
  pantera-core 948/948 pass; pantera-main 953/953 pass
  BUILD SUCCESS

Follow-ups:
  - Upload-path invalidation wiring across adapter handlers
    (API ready; mechanical wiring deferred)
  - Per-scope TTL overrides (NegativeCacheConfig.perScopeOverrides)
  - Eliminate 3 test-safety NegativeCache fallback sites
…pagation calls (WI-08)

Eliminates RxJava2 (Maybe/SingleInterop/Flowable) from the three
hot-path files that carried it: DownloadAssetSlice (the npm tgz
serve path), BaseCachedProxySlice (the core proxy cache flow),
and NpmProxy (boundary adapter). Removes 17 MdcPropagation.withMdc*
wrapper calls replaced by ContextualExecutor-wrapped executors.

Closes WI-08. Also removes the last TODO(WI-post-07) marker from
CachedNpmProxySlice (npm ProxyCacheWriter wiring deferred as a
follow-up that requires deeper npm storage integration).

DownloadAssetSlice: RxJava Maybe.map().toSingle().to(SingleInterop)
  chains replaced with a CompletionStage-native flow via
  NpmProxy.getAssetAsync().  2 MdcPropagation wrappers removed.

NpmProxy: new getAssetAsync() boundary returns CompletableFuture<Optional<NpmAsset>>
  (thin adapter over internal Maybe). 3 MdcPropagation calls replaced
  with ContextualExecutor-wrapped background scheduler.

BaseCachedProxySlice: Flowable.fromPublisher removed; raw
  org.reactivestreams.Subscriber + Publisher used instead.
  12 MdcPropagation wrappers removed across cacheFirstFlow,
  fetchAndCache, cacheResponse, fetchDirect, tryServeStale, etc.

npm-adapter 191/191 pass; pantera-core 956/956 pass.
MdcPropagation in npm-adapter = 0.
TODO(WI-post-07) across codebase = 0.
… (WI-09)

Replaces the process-wide static DRAIN_EXECUTOR + DRAIN_DROP_COUNT
in GroupSlice and GroupResolver with per-repo drain executors
supplied by RepoBulkhead. Saturation in one repository's drain
pool can no longer starve every other group's response-body
cleanup.

Closes WI-09 from v2.2-next-session.md + architecture-review
anti-patterns A5 (static shared state), A16 (no per-repo
bulkheading), A19 (silent drop of resources under load).

RepoBulkhead (new pantera-core/http/resilience/)
  Semaphore-based concurrency limiter per repository.
  run(Supplier<CompletionStage<Result<T>>>) → Result.err(Fault.Overload)
  on rejection.  Per-repo drain executor (bounded ThreadPoolExecutor,
  daemon, ContextualExecutor-wrapped).  BulkheadLimits record
  (maxConcurrent=200, maxQueueDepth=1000, retryAfter=1s defaults).
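The semaphore-gated admission at the heart of run() can be sketched as follows; RepoBulkheadSketch reduces Result/Fault to a String and omits the per-repo drain executor:

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Sketch: tryAcquire gates entry; a failed acquire maps to an overload
// result instead of queueing; the permit is released when the work
// completes (success or failure), and a synchronous throw from the
// supplier must not leak the permit.
public final class RepoBulkheadSketch {
    private final Semaphore permits;

    public RepoBulkheadSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    public CompletionStage<String> run(Supplier<CompletionStage<String>> work) {
        if (!permits.tryAcquire()) {
            return CompletableFuture.completedFuture("overload"); // Fault.Overload stand-in
        }
        try {
            return work.get().whenComplete((r, e) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // sync exception releases the permit
            throw e;
        }
    }
}
```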

GroupResolver + GroupSlice
  Static DRAIN_EXECUTOR + DRAIN_DROP_COUNT + static initializer
  deleted.  drainBody() now uses an instance-level drainExecutor
  received via constructor.  GroupSlice retains a LEGACY_DRAIN_POOL
  fallback for backward-compat constructors used by tests.

RepositorySlices
  Constructs a RepoBulkhead per group repo via getOrCreateBulkhead().
  Passes bulkhead.drainExecutor() to the GroupSlice constructor.

Tests (10 new):
  RepoBulkheadTest (8): reject, release on success/failure,
    activeCount, defaults, repo accessor, drain accessor,
    sync-exception-releases-permit
  RepoBulkheadIsolationTest (2): saturated repo A doesn't
    block repo B; independent drain executors per repo

Acceptance:
  rg 'DRAIN_EXECUTOR|DRAIN_DROP_COUNT' pantera-main/src/main = 0
  RepoBulkhead class count = 1
  pantera-main 955/955 pass; pantera-core 956/956 pass
  BUILD SUCCESS
…te (WI-10)

Delivers the release-gate infrastructure for v2.2.0:

SLO docs (7): one per adapter with availability/latency targets
  from target-architecture §14 + burn-rate alert thresholds.

CI perf baseline: .github/workflows/perf-baseline.yml runs on PR,
  compares p50/p95/p99 against committed baselines (tests/perf-baselines/),
  fails on >10% p99 regression. scripts/perf-benchmark.sh + perf-compare.sh.

Chaos tests (4 classes, 11 @Tag("Chaos") tests):
  ChaosMemberTimeoutTest — 30s proxy member; deadline-bounded
  ChaosDbStallTest — 500ms DB stall → IndexUnavailable
  ChaosQueueSaturationTest — 100 concurrent requests under load
  ChaosStorageEvictionTest — TOCTOU eviction → proxy fallthrough

scripts/release-gate.sh — runs full suite + chaos + perf gates.

All 11 chaos tests pass standalone: mvn -pl pantera-main test -Dgroups=Chaos

Adds the admin panel for negative-cache inspection, invalidation,
and stats as specified in target-architecture §5.6.

Backend (NegativeCacheAdminResource):
  GET  /api/v1/admin/neg-cache           paginated L1 entries
  GET  /api/v1/admin/neg-cache/probe     single-key presence check
  POST /api/v1/admin/neg-cache/invalidate         single-key
  POST /api/v1/admin/neg-cache/invalidate-pattern  rate-limited 10/min
  GET  /api/v1/admin/neg-cache/stats     per-scope counters
  All require admin role. Pattern invalidation rate-limited.
  Every invalidation emits Tier-4 WARN with event.action=neg_cache_invalidate.

Frontend (NegativeCacheView.vue):
  Three-tab Vue 3 Composition API page under /admin/neg-cache:
  Inspector (filterable DataTable + probe), Invalidation (single +
  pattern with confirm dialog), Stats (dashboard cards).

9 integration tests covering auth, CRUD, rate-limit, logging.

Closes WI-06b.
…ly automatic (WI-post-03c)

Removes the 446-LOC MdcPropagation helper class that was the
source of architecture-review anti-pattern A14 ("MDC propagation
is manual boilerplate — 7+ wrappers per request path, each one a
silent context-loss trap if forgotten").

All 31 remaining production call-sites eliminated:
  GroupSlice.java:        7 wrappers removed
  MavenGroupSlice.java:  12 wrappers removed
  MdcPropagation.java:    9 self-references (class + javadoc)
  ContextualExecutor.java: 1 javadoc reference updated
  HandlerExecutor.java:    2 javadoc/comment references updated

Context propagation is now fully handled by ContextualExecutor-
wrapped executors at every async boundary (SingleFlight, drain
pools, DbArtifactIndex, HandlerExecutor). No per-call-site
MdcPropagation.withMdc* wrappers anywhere in the codebase.

Closes WI-post-03c + architecture-review anti-patterns A14, C4.

Deleted:
  pantera-core/src/main/java/com/auto1/pantera/http/trace/MdcPropagation.java  (-446 LOC)
  pantera-core/src/test/java/com/auto1/pantera/http/trace/MdcPropagationTest.java

pantera-core 947/947 pass; pantera-main 975/975 pass.
MdcPropagation grep across production = 0.
Move cooldown classes from the flat package into SOLID sub-packages:
- api/: CooldownService, CooldownInspector, CooldownRequest, CooldownResult,
        CooldownBlock, CooldownReason, CooldownDependency
- cache/: CooldownCache
- config/: CooldownSettings, CooldownCircuitBreaker, InspectorRegistry
- impl/: CachedCooldownInspector, NoopCooldownService
- response/: CooldownResponses, CooldownResponseFactory (new),
             CooldownResponseRegistry (new)
- metadata/: unchanged (already sub-packaged)
- metrics/: unchanged (already sub-packaged)

Rename CooldownMetadataServiceImpl -> MetadataFilterService.
Remove dead-code root CooldownMetrics.java (duplicate of metrics/).
Update package statements and imports across 87 files.
No behaviour change.
… (H5)

The inflight map in CooldownCache.queryAndCache() had a race condition
where entries were not removed on exceptional completion or cancellation.
The root cause was inflight.put() happening after whenComplete() was
registered, so if the future completed before put() ran, the remove()
in whenComplete() would fire before the put(), leaving a zombie entry.

Fix: register in inflight BEFORE attaching whenComplete handler, and
add .orTimeout(30, SECONDS) as a zombie safety net.
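The register-then-attach ordering can be sketched as below (InflightSketch is an illustrative stand-in; the .orTimeout safety net is omitted):

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Sketch of the H5 fix: the inflight entry is put into the map BEFORE the
// whenComplete handler that removes it is attached, so even a loader
// future that is already complete removes its own entry instead of
// leaving a zombie behind.
public final class InflightSketch {
    private final ConcurrentMap<String, CompletableFuture<String>> inflight =
            new ConcurrentHashMap<>();

    public CompletableFuture<String> queryAndCache(
            String key, Supplier<CompletableFuture<String>> loader) {
        CompletableFuture<String> fresh = loader.get();
        CompletableFuture<String> existing = inflight.putIfAbsent(key, fresh);
        if (existing != null) {
            return existing; // coalesce onto the in-flight query
        }
        // put() has already happened, so this remove() can never run first
        fresh.whenComplete((r, e) -> inflight.remove(key, fresh));
        return fresh;
    }

    // demo: an already-completed loader must leave no zombie entry behind
    public static boolean noZombieAfterImmediateCompletion() {
        InflightSketch s = new InflightSketch();
        s.queryAndCache("k", () -> CompletableFuture.completedFuture("v"));
        return s.inflight.isEmpty();
    }
}
```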
Add MetadataParser.extractReleaseDates() default method so adapters can
expose release timestamps from their metadata format. NpmMetadataParser
implements this by delegating to the existing ReleaseDateProvider.

After parsing, MetadataFilterService bulk-populates CooldownCache L1
with allowed=false for versions older than the cooldown period. This
eliminates DB/Valkey round-trips on the hot path for the majority of
versions that are well past the cooldown window.
…ecutor (H2)

Replace sequential version evaluation with parallel dispatch on a
dedicated 4-thread ContextualExecutorService-wrapped pool. Each
evaluateVersion() call is dispatched via CompletableFuture.supplyAsync()
on the bounded pool, then collected with CompletableFuture.allOf().

50 versions with L1-cached results now complete in under 50 ms.
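The supplyAsync-then-allOf fan-out can be sketched as below; ParallelEval and its stand-in evaluateVersion rule are illustrative, not the shipped cooldown policy:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

// Sketch of the H2 dispatch: each version is evaluated on a small bounded
// pool via supplyAsync, the futures are joined with allOf, and the
// per-version results are collected into a map.
public final class ParallelEval {
    static boolean evaluateVersion(String version) {
        return !version.startsWith("0."); // stand-in rule, not the real policy
    }

    public static Map<String, Boolean> evaluateAll(List<String> versions) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<String, CompletableFuture<Boolean>> futures = versions.stream()
                    .collect(Collectors.toMap(
                            v -> v,
                            v -> CompletableFuture.supplyAsync(() -> evaluateVersion(v), pool)));
            // wait for every dispatch, then collect the completed results
            CompletableFuture.allOf(futures.values().toArray(CompletableFuture[]::new)).join();
            return futures.entrySet().stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().join()));
        } finally {
            pool.shutdown();
        }
    }
}
```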
aydasraf added 27 commits April 17, 2026 21:14
…to request path (Phase 6)

Introduce CooldownAdapterBundle record and CooldownAdapterRegistry to hold
per-repo-type parser/filter/rewriter/detector/responseFactory components.
CooldownWiring registers all 7 adapters (maven, npm, pypi, docker, go,
composer; gradle aliased to maven) at startup via CooldownSupport.create().

BaseCachedProxySlice now looks up the per-adapter CooldownResponseFactory
from the registry when building 403 responses, falling back to the
deprecated CooldownResponses.forbidden() for unregistered types.

Also: CooldownResponseRegistry promoted to singleton, CooldownResponses
deprecated with @Deprecated(forRemoval = true), NpmMetadataRequestDetector
created to complete the npm adapter bundle.
…se 7)

- MetadataFilterServiceIntegrationTest: end-to-end with Go adapter format,
  verifying filtered output, cache hit, SWR behaviour, invalidation
- CooldownAdapterRegistryTest: bundle registration, alias lookup, null
  rejection, overwrite, clear
- CooldownConcurrentFilterStampedeTest (@Tag("Chaos")): 100 concurrent
  requests for same uncached metadata, parser runs <= 5 times (stampede
  dedup), all callers get consistent filtered bytes
…hase 8)

- New docs/cooldown-metadata-filtering.md: two-layer enforcement overview,
  7-adapter table, per-adapter metadata format details, performance
  characteristics (H1-H5), admin operations, configuration reference,
  package structure, test summary
- CHANGELOG-v2.2.0.md: added Cooldown Metadata Filtering section covering
  Phases 1-8 (package restructure, 5 performance hardenings, 7 adapter
  implementations, 403 response factories, admin hardening, bundle
  registration, integration + chaos tests)
- docs/analysis/v2.2.0-pr-description.md: updated summary and added
  cooldown phases checklist to PR body
Per v2.2 spec 'Change existing Hex entry: value: hex to value: hexpm',
removes the legacy 'hex' key from TECH_MAP, REPO_TYPE_FILTERS entries, and
techSetup mappings. SettingsView now emits 'hexpm-proxy' instead of
'hex-proxy' — matches the canonical family key ApiRoutingSlice normalizes to.
SearchView.vue's startsWith('hex') prefix is retained as it still matches
hexpm.
…r picker

- New gradle.md: describes the gradle / gradle-proxy / gradle-group
  family as a Maven-format alias with its own UI/API surface.
- go.md: adds go-proxy and go-group sections.
- ui-guide.md: documents the AutoComplete group-member picker, the
  type-compatibility rule, and the inline 'Create new' modal.
- index.md: links the new gradle guide.
… shape

Reduces per-call-site boilerplate when wiring RepositorySlices from
the deprecated GroupSlice to GroupResolver. Accepts the same
(SliceResolver, memberNames, port, depth, timeout, ...) tuple the
legacy class took and builds the MemberSlice list internally via a
static buildMembers() helper, then delegates to the existing
member-accepting constructor.
…ences

MdcPropagation was deleted in pantera-core per v2.2-target-architecture
section 4.4. Three test files and one analysis doc still mentioned the
deleted class by name. All textual references updated to the current
ContextualExecutor / TraceContextExecutor primitives; no semantic
changes (every reference was in javadoc/comments/DisplayName — never
a live call site).
…e-eval tests

Covers the two remaining cases from the cooldown-metadata-filtering
Task 15: a cooldown-duration policy change must invalidate the
FilteredMetadataCache (flipping block decisions for affected
versions); and a successful invalidate() call after an upstream
publish must force a re-parse so the newly-published version is
returned on the next query.
Two chaos scenarios from the cooldown-metadata-filtering plan Task 19:
a slow/unreachable L2 must not block reads served by a warm L1, and a
bounded L1 under high write cardinality (10x capacity) must evict
old entries without OOM.
Replaces the 4 'new GroupSlice(...)' instantiations in RepositorySlices
with 'new GroupResolver(...)' — npm-group, file/php-group,
maven-group, and the generic group-adapter case (gem/go/gradle/pypi/
docker). GroupResolver is now the sole production group-resolution
engine. Closes WI-04 wiring step from v2.2-target-architecture. The
deprecated GroupSlice class itself is removed in the follow-up commit.

All 984 pantera-main tests pass post-wiring.
- Delete GroupSlice.java (1338 LOC) — superseded by GroupResolver,
  which was wired in the previous commit.
- Delete GroupSliceTest, GroupSliceFlattenedResolutionTest,
  GroupSliceIndexRoutingTest, GroupSlicePerformanceTest — obsolete.
- Rename GroupSliceMetrics -> GroupResolverMetrics (Micrometer
  wrapper class; callers updated in VertxMain and GroupResolver).
- Update stale javadoc and inline comments across 15 files to
  point at GroupResolver instead of the deleted class.
- NegativeCacheSingleSourceTest: drop GroupSlice.java from the
  allowed-sites list.

Zero remaining compile dependencies on GroupSlice. Full test suite
green (4397 tests, 0 failures, 0 errors).
… retire stale GroupSlice javadoc

B2b: migrate all 12 production call-sites of the
@Deprecated(forRemoval = true) CooldownResponses.forbidden(block) helper
to CooldownResponseRegistry.instance().get(repoType).forbidden(block).
CooldownResponses was deleted in the preceding commit.

Sites:
- files-adapter/FileProxySlice (repoType file-proxy; factory absent,
  path is unreachable because FileProxySlice wires Noop cooldown)
- npm-adapter/DownloadAssetSlice (repoType from field)
- pypi-adapter/ProxySlice (repoType from field)
- composer-adapter/CachedProxySlice x3, ProxyDownloadSlice
- go-adapter/CachedProxySlice
- pantera-core/BaseCachedProxySlice fallback — now throws
  IllegalStateException on missing factory (no silent fallback)
- pantera-main/DockerProxyCooldownSlice x3

CooldownWiring adds response-factory aliases so every repoType string
that reaches the registry resolves: npm-proxy, pypi-proxy,
docker-proxy, go-proxy, php, php-proxy.

Also lands the tail of A5: stale GroupSlice javadoc / inline comment
references updated across 16 files to point at GroupResolver. The
deleted class has no lingering textual footprint in production code.

Full test suite: 4397 tests, 0 failures, 0 errors.
Documents the GroupResolver wire-up, GroupSlice deletion,
CooldownResponses removal, hex->hexpm UI conformity, Task 15/19
test coverage, and the MdcPropagation reference cleanup.
…rage

Docker TrimmedDocker.trim() and SubStorage.list() both called
Pattern.compile per invocation — at 1000 req/s these sites amounted to
thousands of compile allocations/second. Pattern is now a final field
compiled once in the ctor. Regex semantics preserved exactly; all
existing unit tests pass.
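The hoisting shape is small enough to show whole; TrimSketch and its regex are illustrative stand-ins for the Docker sites:

```java
import java.util.regex.Pattern;

// Sketch of the fix: Pattern.compile moves from the per-call body to a
// field compiled once. Pattern is immutable and thread-safe, so a single
// instance can serve every request.
public final class TrimSketch {
    private final Pattern tagPattern = Pattern.compile("^v?(\\d+\\.\\d+\\.\\d+)$");

    public String trim(String tag) {
        var m = tagPattern.matcher(tag); // cheap; only compile() was hot
        return m.matches() ? m.group(1) : tag;
    }
}
```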
…11 adapter sites

Direct adapter call sites previously used registry.get(repoType).forbidden(...)
which NPEs if the factory is missing — losing the descriptive repoType
context. Add a getOrThrow() helper that produces
IllegalStateException('No CooldownResponseFactory registered for
repoType: X'), migrate all 11 production sites across files / npm /
pypi / composer / go / docker adapters, and collapse
BaseCachedProxySlice's inline null-check to call getOrThrow directly.
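The getOrThrow() shape can be sketched as below; ResponseRegistrySketch reduces the factory type to a String for brevity:

```java
import java.util.*;

// Sketch: a missing factory surfaces as a descriptive
// IllegalStateException naming the repoType, instead of an NPE at the
// call site that loses that context.
public final class ResponseRegistrySketch {
    private final Map<String, String> factories = new HashMap<>();

    public void register(String repoType, String factory) {
        factories.put(repoType, factory);
    }

    public String getOrThrow(String repoType) {
        String factory = factories.get(repoType);
        if (factory == null) {
            throw new IllegalStateException(
                    "No CooldownResponseFactory registered for repoType: " + repoType);
        }
        return factory;
    }

    // demo helper: returns the exception message for an unregistered type
    public static String messageForMissing(String repoType) {
        try {
            new ResponseRegistrySketch().getOrThrow(repoType);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```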
…bian streams

XmlPrimaryChecksums and FilePackageHeader previously opened InputStreams
eagerly in their ctors; if the consuming method was never invoked the
stream leaked. Both now store only the Path and open inside the
consuming method under try-with-resources. rpm Gzip unpackTar now wraps
GzipCompressorInputStream in the same try-with as TarArchiveInputStream
so the native Inflater is released if the tar wrapper ctor throws.
Debian MultiPackages.merge wraps both GZIP streams in try-with-resources;
caller-owned outer streams protected by a non-closing wrapper adapter.
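The non-closing wrapper idea can be sketched with a `FilterInputStream` whose `close()` is a deliberate no-op (the class name here is illustrative): it lets an inner decoder stream sit safely inside try-with-resources without closing the caller-owned outer stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a non-closing stream adapter.
public class NonClosingInputStream extends FilterInputStream {
    public NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        // Deliberate no-op: the caller still owns the underlying stream.
    }
}
```

Usage would look like `try (var gz = new GZIPInputStream(new NonClosingInputStream(outer))) { ... }` — the gzip layer is released while `outer` stays open for the caller.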
… WorkerExecutor

DbArtifactIndex executor used CallerRunsPolicy — under queue saturation
at 1000 req/s, the caller (potentially a Vert.x event-loop thread) would
run the JDBC query inline. Switch to AbortPolicy. GroupResolver already
maps the resulting RejectedExecutionException to Fault.IndexUnavailable
via the existing exceptionally(...) branch, which FaultTranslator returns
as 500 with X-Pantera-Fault: index-unavailable.

Chaos test DbArtifactIndexSaturationTest asserts overflow submissions
surface REE, never execute on a Vertx event-loop thread, and classify to
the expected typed fault.

Also deletes dead code in AsyncApiVerticle — api-workers WorkerExecutor
created but never referenced by any route (verified via grep).
Cross-cutting cancel-propagation: at 1000 req/s, client disconnects
mid-response previously did not cancel upstream Jetty fetches — bytes
kept streaming into dead sockets until the next write organically
failed, wasting upstream bandwidth and holding file handles.

Changes:
- VertxSliceServer: register closeHandler on request.connection(),
  exceptionHandler on response and request; capture reactive-streams
  Subscription via doOnSubscribe and cancel it on any disconnect signal.
  accept() signature extended with trailing AtomicReference<Runnable>
  cancelHook (private method).
- ArtifactHandler: capture Disposable returned by Flowable.subscribe on
  both download handlers; dispose on response closeHandler /
  exceptionHandler.
- StreamThroughCache, DiskCacheStorage: add doOnCancel matching existing
  doOnError — close channel + delete temp file on cancel.
- VertxRxFile.save: safety-net doOnError closes AsyncFile if upstream
  errors before the subscriber sees data.
- Http3Server: bound per-stream buffer via
  PANTERA_HTTP3_MAX_STREAM_BUFFER_BYTES (default 16 MB); reject on
  overflow per spec (spill-to-file is deferred — current HTTP/3 body
  handling is stub-level per existing code comments).
…efaults

Adds a CachedUsers-style decorator in front of LocalEnabledFilter that
caches the per-user 'enabled' flag in L1 Caffeine + L2 Valkey with
CacheInvalidationPubSub-backed cross-node eviction. At 1000 req/s the
previous synchronous JDBC hit per request could exhaust the 50-connection
Hikari pool under any DB latency spike; cache hit rate is expected >95%,
cutting JDBC pressure to ~once per user per TTL.

All cache settings honor the 3-tier precedence (env var -> YAML ->
compile default). Defaults land in GlobalCacheConfig:
  meta.caches.auth-enabled.l1.maxSize      = 10000
  meta.caches.auth-enabled.l1.ttlSeconds   = 300
  meta.caches.auth-enabled.l2.enabled      = true
  meta.caches.auth-enabled.l2.ttlSeconds   = 3600
  meta.caches.auth-enabled.l2.timeoutMs    = 100

UserHandler wires CachedLocalEnabledFilter.invalidate(username) on
putUser / deleteUser / enableUser / disableUser / alterPassword so admin
changes propagate to the cache. Pub/sub broadcasts the invalidation
across cluster nodes.

Hikari ArtifactDbFactory defaults tightened:
  connectionTimeout       5000 -> 3000 ms
  leakDetectionThreshold  300000 ->  5000 ms
Env-var overrides (PANTERA_DB_*) unchanged. Operators may now see
Hikari leak WARNs that were silent before — each one is a real
held-connection bug to triage.

Added ConfigDefaults.getBoolean() for future config sections.
CacheInvalidationPubSub gained subscribe(String namespace,
Consumer<String>) as a thin wrapper over the existing Cleanable-based
register(...) API.
…ataRegistry sanity cap

The previous 'lastKnownGood' ConcurrentHashMap was unbounded — under
high-cardinality workloads the stale-fallback store grew indefinitely.
Replace with a full 2-tier cache (L1 Caffeine + L2 Valkey) driven by the
new meta.caches.group-metadata-stale section in GlobalCacheConfig.

Design principle: the cache is an aid, never a breaker. Under realistic
cardinality no eviction ever fires — the bounds are a JVM-memory safety
net against pathological growth, not an expiry mechanism. Graceful
degradation on read (L1 -> L2 -> expired-primary-cache-entry -> miss)
preserves the 'stale forever' availability semantic even when both stale
tiers evict, because the primary-cache entry lingers past TTL in
Caffeine's internal map. Across JVM restarts L2 now survives (the old
CHM did not), strictly improving availability.

Config (every threshold env-var + YAML overridable; compile-time
fallback only when both absent):
  meta.caches.group-metadata-stale.l1.maxSize     = 100000
  meta.caches.group-metadata-stale.l1.ttlSeconds  = 2592000   # 30 days
  meta.caches.group-metadata-stale.l2.enabled     = true
  meta.caches.group-metadata-stale.l2.ttlSeconds  = 0         # Valkey LRU owns eviction
  meta.caches.group-metadata-stale.l2.timeoutMs   = 100

Also adds JobDataRegistry overflow detection: at 10000 entries (env
PANTERA_JOB_DATA_REGISTRY_MAX) emit an ECS error log naming a key
prefix so operators can find the leaking scheduler site. Never silently
drops — still accepts the entry. Lifecycle audit sweeper remains a P2
follow-up.

GroupMetadataCache public API is unchanged (getStale retained as a
@deprecated delegating alias). Existing callers and GroupMetadataCacheTest
continue to work. New GroupMetadataCacheStaleFallbackTest covers the
4-step degradation path.
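The 4-step degradation read order can be sketched with plain maps standing in for the real tiers (Caffeine L1, Valkey L2, and the expired-but-lingering primary-cache entry); names here are illustrative, and the real tiers carry TTLs, serialization, and timeouts.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the L1 -> L2 -> expired-primary -> miss read.
public class StaleFallbackReader {
    final Map<String, byte[]> l1 = new ConcurrentHashMap<>();      // Caffeine stand-in
    final Map<String, byte[]> l2 = new ConcurrentHashMap<>();      // Valkey stand-in
    final Map<String, byte[]> primary = new ConcurrentHashMap<>(); // expired entries linger here

    public Optional<byte[]> readStale(String key) {
        byte[] hit = l1.get(key);
        if (hit == null) hit = l2.get(key);      // survives JVM restarts
        if (hit == null) hit = primary.get(key); // lingers past TTL
        return Optional.ofNullable(hit);         // all tiers evicted -> miss
    }
}
```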
…jectMappers

ArtifactHandler download paths copied every chunk via new byte[] +
buf.get(bytes) + Buffer.buffer(bytes). At 1000 req/s x 5 MB bodies x
64 KB chunks that produced ~80000 byte[] allocations/s straight to
garbage. Replace with Buffer.buffer(Unpooled.wrappedBuffer(buf)) — zero
copy, zero allocation. Heap-ByteBuffer wrap is GC-managed; Vert.x
releases on write completion.

Yaml2Json and Json2Yaml created a fresh ObjectMapper (and a fresh
YAMLMapper for Json2Yaml) on every call. Hoisted both to static final
JSON and YAML fields. Jackson feature configuration applied once at
static init — safe under JMM. Admin plane, not request-hot, but still
wrong.

Preserves Group A's Disposable capture + closeHandler wiring on both
ArtifactHandler download paths (verified post-edit at the expected
lines).
Replaces the reflective MetadataMerger-based merge path in
MavenGroupSlice with a new StreamingMetadataMerger using StAX (hardened
against XXE). The merger accumulates only the deduplicated <version>
TreeSet and the newest-wins scalars (<latest>, <release>,
<lastUpdated>, <snapshot>) — peak memory is O(unique versions), not
O(sum of member body sizes).

Per-member bodies are still buffered as byte[] on arrival (the async
fetch returns CompletableFuture<byte[]>), but each byte[] becomes
unreachable as soon as mergeMember(...) returns — the previous path
accumulated every member's full body in a ByteArrayOutputStream list
passed to the SAX-based reflective merger. Full wire-streaming would
require plumbing a publisher->InputStream adapter through Content and
is deferred.
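The accumulation step can be sketched with stdlib StAX alone (this is not the actual StreamingMetadataMerger; natural String ordering stands in for ComparableVersion, and scalars are omitted): only the deduplicated version strings are retained, so peak memory tracks unique versions.

```java
import java.io.StringReader;
import java.util.TreeSet;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: pull <version> text into a dedup TreeSet.
public class VersionAccumulator {
    private final TreeSet<String> versions = new TreeSet<>();

    public void mergeMember(String xml) throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newFactory();
        // XXE hardening: no DTDs, no external entity resolution.
        f.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        f.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "version".equals(r.getLocalName())) {
                versions.add(r.getElementText().trim());
            }
        }
        r.close();
    }

    public TreeSet<String> versions() {
        return versions;
    }
}
```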

Maven version ordering delegates to org.apache.maven.artifact.versioning.
ComparableVersion (already on the pantera-main classpath via the
maven-adapter transitive dep).

Malformed or truncated member bodies are skipped with a WARN
(event.reason=member_metadata_parse) — remaining members still merge
successfully.

Alert-only histogram pantera.maven.group.member_metadata_size_bytes
(tagged with repo_name) records per-member body size. No rejection at
any size: any cap introduced here could synthesize a client-facing 502
for legitimately large metadata, which would be a worse failure mode
than the original. The histogram surfaces outliers to ops without
breaking resolution.

Also replaces a 20-iteration 'String.format("%02x", b)' checksum hex
loop with java.util.HexFormat.of().formatHex(digest) (single
allocation per request; mirrors the existing ProxyCacheWriter.HEX
idiom).
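A minimal sketch of the replacement (method name here is illustrative): `HexFormat` (Java 17+) formats the whole digest in one pass instead of one `String.format("%02x", b)` call per byte.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical sketch of single-allocation digest hex formatting.
public class DigestHex {
    public static String sha1Hex(byte[] body) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(body);
        return HexFormat.of().formatHex(digest); // lowercase, one allocation
    }
}
```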

MavenGroupSlice public API unchanged; MavenGroupSliceTest stays green (8/8).
New StreamingMetadataMergerTest: 7 cases covering disjoint + overlapping
versions, max-scalar semantics, malformed-member skip, all-empty
minimal output.
…XY_PROTOCOL flag

When Pantera's HTTP/3 listener is fronted by an NLB (or any proxy-
protocol-v2 LB), the real client IP is carried in the PROXY prelude —
without it, getRemoteAddr() returns the LB IP. Mirrors the existing
Vert.x HTTP/1+2 use-proxy-protocol pattern in AsyncApiVerticle / VertxMain.

Prepends Jetty's ProxyConnectionFactory to the QuicheServerConnector's
factory varargs when the flag is true. Default false — zero behavior
change. Emits an INFO startup log event.action=http3_proxy_protocol_enabled
with url.port when enabled.

Env-only for now (PANTERA_HTTP3_PROXY_PROTOCOL); YAML path
meta.http3.proxyProtocol is not wired because Http3Server's public
ctor does not currently take a Settings object. Documented as
follow-up. Preserves Group A's MAX_STREAM_BUFFER_BYTES field + its
buffer-cap enforcement.
Follow-up to the CallerRunsPolicy -> AbortPolicy switch. When the index
executor's queue fills and AbortPolicy fires, CompletableFuture.supplyAsync
rethrows RejectedExecutionException SYNCHRONOUSLY before the caller
receives a future — meaning a caller on the Vert.x event loop would see
the raw exception propagate up the stack instead of getting a failed
CompletableFuture to chain onto.

Wrap supplyAsync in a try/catch(REE) that returns
CompletableFuture.failedFuture(ree) so callers always get a proper
future regardless of saturation state. GroupResolver's existing
exceptionally(...) path then maps the REE to Fault.IndexUnavailable
uniformly (whether the rejection happens sync or async).

Extract the JDBC body into a private locateByNameBody(String) helper
— no logic change, just hoisted out of the lambda so the outer
try/catch is straightforward.
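The wrap can be sketched as follows (class and method names are illustrative): a saturated executor's synchronous RejectedExecutionException is converted into a failed future, so callers always have a CompletableFuture to chain `exceptionally(...)` onto, whether the rejection happens sync or async.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Supplier;

// Hypothetical sketch of the REE-to-failedFuture wrap.
public class SafeAsync {
    public static <T> CompletableFuture<T> supply(Supplier<T> body, Executor executor) {
        try {
            return CompletableFuture.supplyAsync(body, executor);
        } catch (RejectedExecutionException ree) {
            // Executor rejected synchronously; hand back a failed future
            // instead of letting the REE propagate up the caller's stack.
            return CompletableFuture.failedFuture(ree);
        }
    }
}
```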
…gelog

Documents the Groups A-H production-readiness pass from the Opus 4.7
audit (2026-04-18):

Admin guide:
- cache-configuration.md: consolidated meta.caches.* reference with
  3-tier override precedence (env -> YAML -> default).
- valkey-setup.md: maxmemory-policy=allkeys-lru requirement, retention
  sizing per cache section.
- database.md: Hikari fail-fast section with canary ramp instructions.
- deployment-nlb.md: PANTERA_HTTP3_PROXY_PROTOCOL flag for HTTP/3
  behind NLB.
- runbooks.md: new 5xx signals (X-Pantera-Fault: index-unavailable,
  storage-unavailable, deadline-exceeded, overload, upstream-integrity),
  AllProxiesFailed pass-through behavior change.
- v2.2-deployment-checklist.md: pre/during/post-deploy steps with
  specific metric thresholds.
- environment-variables.md: added auth / stale-cache / HTTP/3 /
  scheduler env vars; Hikari defaults updated to 3000/5000.

Developer guide:
- caching.md: canonical L1 Caffeine + L2 Valkey + pub/sub pattern with
  reference classes; 'cache is an aid, never a breaker' principle.
- fault-model.md: new emitter DbArtifactIndex -> Fault.IndexUnavailable.
- reactive-lifecycle.md: cancel-propagation contract; three-terminal-
  path pattern (complete/error/cancel) with CachingBlob.content as the
  canonical example.
- cooldown.md: prefer getOrThrow(repoType) over get(repoType); adapter
  factory registration is now a startup-time hard requirement.

User guide:
- response-headers.md: X-Pantera-Fault, X-Pantera-Proxies-Tried,
  X-Pantera-Stale, X-Pantera-Internal.
- error-reference.md: 500 index-unavailable (retry), 500
  storage-unavailable (retry), 502 upstream-integrity, 502
  AllProxiesFailed pass-through.
- streaming-downloads.md: server-side cancel propagation — no client
  action needed.

CHANGELOG-v2.2.0.md: top-of-file highlights mention of the Opus 4.7
audit plus a new Production-readiness hardening section with one
paragraph per group (A, B, C, D+E.3+E.4, E.2, E.1, G, H.1, H.2, H.3, F).
Existing content preserved.
…/superpowers/

- Merge CHANGELOG-v2.2.0.md into CHANGELOG.md as a new 'Version 2.2.0'
  section at the top, matching the existing emoji-section-header style
  used for 2.1.3 (Architectural / Performance / Bug fixes / Cleanup /
  Added / Changed / Deprecated / Observability / Security / Docs /
  Testing / Migration). Every bullet carries the [@aydasraf](...) attribution
  line. CHANGELOG-v2.2.0.md deleted — CHANGELOG.md is now the single
  source of truth.
- Untrack all docs/superpowers/ plans and specs from the working tree.
  These are local working notes only; .gitignore already excludes the
  directory for new files, but 9 files committed in earlier sessions
  are now removed from the index. History-rewrite to also purge them
  from prior commits will be handled separately (requires force-push
  to the shared 2.2.0 branch).