Conversation
Three artefacts describing the post-deploy forensics and the target
design for v2.2.0:
- v2.1.3-post-deploy-analysis.md: forensic analysis of the 12h log
  window after v2.1.3 — root cause of the 503 burst (queue-full
  cascade in DownloadAssetSlice), the 404 noise (every 4xx at WARN),
  and the perf-regression triggers.
- v2.1.3-architecture-review.md: enterprise-architecture review —
  10 patterns present, 20 anti-patterns, 6 cross-cutting concerns,
  9 strategic refactors, SOLID/enterprise checklist scored 5.4/10.
- v2.2-target-architecture.md: target design — one fault taxonomy,
  one context, one single-flight, five-tier structured logging
  (incl. audit), ECS-native propagation, per-repo bulkheads,
  repo-negative cache with scope partitioning + synchronous upload
  invalidation, admin UI. Implementation split into 11
  agent-executable work items.
Also adds logs/.analysis/ to .gitignore (multi-GB working files used
during forensic triage, not source of truth).
Tactical hotfix for the two dominant post-deploy issues observed in
v2.1.3 — the 503 burst on npm (11.5k 'Queue full' stack traces in a
2-minute window) and the 2.4M/12h WARN log flood (every 4xx access
log emitted at WARN). No architecture change; lands on the 2.2.0
branch as the minimal safe starting point for WI-00.
Closes forensic §1.6/§1.7 F1.1/F1.2/F2.1/F2.2/F4.4.
---
1. queue.add -> queue.offer on every bounded ProxyArtifactEvent /
ArtifactEvent queue write-site in a request-serving path.
AbstractQueue.add() throws IllegalStateException('Queue full') on
overflow; offer() returns false and lets us degrade gracefully.
Sites migrated:
npm: DownloadAssetSlice:198 + :288 (the observed 503 source),
UnpublishForceSlice, UnpublishPutSlice
pypi: ProxySlice (4 sites)
go: CachedProxySlice, GoUploadSlice
docker: CacheManifests, PushManifestSlice (unbounded — marked)
helm: DeleteChartSlice, PushChartSlice
rpm: RpmRemove, RpmUpload
hex: UploadSlice
nuget: PackagePublish
core: scheduling.RepositoryEvents, asto.events.EventQueue
Unbounded ConcurrentLinkedDeque<ArtifactEvent> sites (docker,
hex, nuget, rpm, go-upload, npm-unpublish) keep add() with a
'// ok: unbounded' annotation so the intent is obvious to a
future reviewer and so a future migration to a bounded queue is
guaranteed to be re-examined.
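The offer()-with-callback shape the migration standardises on, as a minimal sketch — class and callback names here are illustrative, not the actual Pantera call-sites:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Consumer;

// Illustrative sketch: AbstractQueue.add() throws
// IllegalStateException("Queue full") on a full bounded queue;
// offer() returns false, so the request path can degrade gracefully
// instead of surfacing a 503.
final class BoundedEventQueue<E> {
    private final ArrayBlockingQueue<E> queue;
    private final Consumer<E> onOverflow; // e.g. the metrics callback

    BoundedEventQueue(int capacity, Consumer<E> onOverflow) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.onOverflow = onOverflow;
    }

    /** Returns true if enqueued; invokes the overflow callback and
     *  drops the event otherwise. No exception ever escapes. */
    boolean publish(E event) {
        if (queue.offer(event)) {
            return true;
        }
        onOverflow.accept(event); // one WARN + dropped-counter bump
        return false;
    }

    E poll() {
        return queue.poll();
    }
}
```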
2. EventsQueueMetrics (new pantera-core/metrics): the single
callback used when offer() returns false — emits one structured
WARN (no stack trace; event.action=queue_overflow) and bumps
pantera.events.queue.dropped{queue=<repo>} on the shared
MicrometerMetrics registry. Drop is silent at the request level;
operator sees the counter + WARN in Kibana.
3. DownloadAssetSlice catch-all wrappers. The ifPresent lambda
that enqueues ProxyArtifactEvent is now wrapped in
try { ... } catch (Throwable t) { log at WARN, continue; } on
BOTH the cache-hit (line ~185) AND the cache-miss (line ~275)
paths. A background-queue failure can NEVER escape the serve
path. The 50-concurrent-cache-hits-under-full-queue test
(DownloadAssetSliceQueueFullTest) proves it: 50 / 50 respond 200.
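The guard in item 3 reduces to a pattern like the following hedged sketch (SafeEnqueue is a hypothetical name; the real code inlines the try/catch around the ifPresent lambda):

```java
// Hedged sketch of the serve-path guard: a background enqueue failure
// must never reach the client.
final class SafeEnqueue {
    /** Runs the background enqueue action; swallows ANY Throwable so
     *  the client response is unaffected by background-queue state. */
    static boolean tryEnqueue(Runnable enqueue) {
        try {
            enqueue.run();
            return true;
        } catch (Throwable t) {
            // Real code emits one structured WARN here and continues.
            return false;
        }
    }
}
```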
4. EcsLogEvent access-log level policy:
404 -> INFO (was WARN): routine Maven/npm probe miss.
401 -> INFO (was WARN): normal auth-then-retry flow.
403 -> INFO (was WARN): policy reject, not a Pantera fault.
other 4xx -> WARN (unchanged).
5xx -> ERROR (unchanged).
slow (>5s) -> WARN (unchanged).
Contract tests added in EcsSchemaValidationTest:
notFoundResponsesLogAtInfoNotWarn
unauthorizedResponsesLogAtInfoNotWarn
forbiddenResponsesLogAtInfoNotWarn
otherFourXxStillLogAtWarn
fiveXxStillLogAtError
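The resulting level table can be sketched as a standalone function. The class name is hypothetical (the real policy lives in EcsLogEvent), and evaluating status severity before the slow-request check is an assumption about precedence:

```java
// Standalone sketch of the item-4 access-log level policy.
final class AccessLogLevel {
    static String levelFor(int status, long durationMillis) {
        if (status >= 500) return "ERROR";          // server fault
        if (durationMillis > 5_000) return "WARN";  // slow request
        if (status == 404 || status == 401 || status == 403) {
            return "INFO";                          // routine client noise
        }
        if (status >= 400) return "WARN";           // other 4xx
        return "DEBUG";                             // 2xx / 3xx
    }
}
```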
5. 'Repository not found in configuration' (RepositorySlices):
downgraded WARN -> INFO. This is a client-config error (stale
repo URL in a pom.xml somewhere), not a Pantera failure, and it
was producing ~1,440 WARN lines per 12h.
6. Jetty HTTP client 'Idle timeout expired: 30000/30000 ms'
(JettyClientSlice): downgraded ERROR -> DEBUG via a new
isIdleTimeout(Throwable) helper. Connection idle-close is a
normal lifecycle event, not a request failure; real HTTP
request failures on a still-active connection continue to log
at ERROR. Observed count was 20 ERRORs / 12h all for idle
close, now DEBUG.
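A hedged sketch of what an isIdleTimeout(Throwable) helper can look like. The matching rule (a TimeoutException carrying Jetty's idle-timeout text, anywhere in the cause chain) is an assumption about the real JettyClientSlice implementation:

```java
import java.util.concurrent.TimeoutException;

// Hedged sketch of an idle-timeout detector in the spirit of item 6.
final class IdleTimeouts {
    static boolean isIdleTimeout(Throwable t) {
        // Walk the cause chain; Throwable.getCause() returns null for
        // an uninitialised cause, so this loop terminates.
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof TimeoutException
                && cur.getMessage() != null
                && cur.getMessage().contains("Idle timeout expired")) {
                return true;
            }
        }
        return false;
    }
}
```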
---
Acceptance verified on branch 2.2.0 at HEAD:
rg 'queue\.add\(' | rg -v test | rg -v '// ok:' = 0 matches
mvn -pl pantera-core test = 788 / 788 pass (0 fail, 7 pre-existing skips)
mvn -pl npm-adapter test = 191 / 191 pass (incl. new DownloadAssetSliceQueueFullTest)
mvn -pl pypi-adapter,go-adapter,docker-adapter,helm-adapter,
rpm-adapter,hexpm-adapter,nuget-adapter test = all green
EcsSchemaValidationTest 23 / 23 pass (5 new level-policy tests)
No other behavioural change — Fault / Result types land in a follow-up
commit (WI-01).
Additive introduction of the v2.2 fault taxonomy: a sealed Fault
hierarchy, a Result<T> sum, a single FaultClassifier, and a single
FaultTranslator. No existing slice is wired to them yet (WI-04 does
that); this commit establishes the vocabulary and the unit-test
contract so every downstream WI can consume it.
New package pantera-core/http/fault:
Fault (sealed)
NotFound(scope, artifact, version)
Forbidden(reason)
IndexUnavailable(cause, query)
StorageUnavailable(cause, key)
AllProxiesFailed(group, outcomes, winningResponse?) -- pass-through
UpstreamIntegrity(upstreamUri, algo, sidecarClaim, computed)
Internal(cause, where)
Deadline(budget, where)
Overload(resource, retryAfter)
MemberOutcome(member, kind, cause, response?) -- 4-arg
MemberOutcome.Kind { OK, NOT_FOUND, FIVE_XX, EXCEPTION,
CANCELLED, CIRCUIT_OPEN }
ChecksumAlgo { MD5, SHA1, SHA256, SHA512 }
Result<T> (sealed)
Ok<T>(value), Err<T>(fault)
ok/err factories, map, flatMap
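A minimal sketch of the Result<T> shape under Java 21 sealed interfaces — Fault is reduced to a String here so the example stays self-contained:

```java
import java.util.function.Function;

// Minimal sketch of the Ok/Err sum with map and flatMap.
sealed interface Result<T> permits Result.Ok, Result.Err {
    record Ok<T>(T value) implements Result<T> { }
    record Err<T>(String fault) implements Result<T> { }

    static <T> Result<T> ok(T value) { return new Ok<>(value); }
    static <T> Result<T> err(String fault) { return new Err<>(fault); }

    /** Transforms the Ok value; an Err passes through untouched. */
    default <U> Result<U> map(Function<T, U> f) {
        return switch (this) {
            case Ok<T> ok -> new Ok<>(f.apply(ok.value()));
            case Err<T> e -> new Err<>(e.fault());
        };
    }

    /** Chains a Result-returning step; short-circuits on Err. */
    default <U> Result<U> flatMap(Function<T, Result<U>> f) {
        return switch (this) {
            case Ok<T> ok -> f.apply(ok.value());
            case Err<T> e -> new Err<>(e.fault());
        };
    }
}
```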
FaultClassifier
classify(Throwable, String where)
TimeoutException -> Deadline
ConnectException / IOException -> Internal
ValueNotFoundException -> StorageUnavailable
IllegalStateException('Queue full') -> Overload
default -> Internal
Unwraps CompletionException before matching.
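The classification table above, reduced to a self-contained sketch: Fault variants are rendered as strings, and the asto ValueNotFoundException branch (-> StorageUnavailable) is elided to keep the example dependency-free. Per the deviation note below, if/else is used because ConnectException extends IOException:

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

// Sketch of classify(Throwable, where) with string-rendered Faults.
final class FaultClassifierSketch {
    static String classify(Throwable t, String where) {
        // Unwrap CompletionException before matching; a bare one with
        // a null cause deliberately falls through to Internal.
        if (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        if (t instanceof TimeoutException) {
            return "Deadline(" + where + ")";
        }
        if (t instanceof IllegalStateException
            && "Queue full".equals(t.getMessage())) {
            return "Overload(" + where + ")";
        }
        if (t instanceof IOException) {
            return "Internal(" + where + ")"; // ConnectException lands here too
        }
        return "Internal(" + where + ")";     // default
    }
}
```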
FaultTranslator
translate(Fault, RequestContext) -- one site
NotFound -> 404
Forbidden -> 403
IndexUnavailable -> 500 (X-Pantera-Fault: index-unavailable)
StorageUnavailable -> 500 (X-Pantera-Fault: storage-unavailable)
Internal -> 500 (X-Pantera-Fault: internal)
Deadline -> 504 (X-Pantera-Fault: deadline-exceeded)
Overload -> 503 + Retry-After + X-Pantera-Fault: overload:<r>
AllProxiesFailed -> PASS-THROUGH: streams the winning proxy
Response verbatim (status, headers, body)
with X-Pantera-Fault: proxies-failed:<name>
and X-Pantera-Proxies-Tried: <n>; synthesizes
502 only when no proxy produced a Response at
all.
UpstreamIntegrity -> 502 (X-Pantera-Fault: upstream-integrity:<algo>)
pickWinningFailure(List<MemberOutcome>) -- ranking
retryability first: 503 > 504 > 502 > 500 > other 5xx
with-body > no-body
tie-break : declaration order
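The ranking can be sketched as a comparator plus a declaration-order-stable scan, with MemberOutcome simplified to a three-field record:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of pickWinningFailure: retryability first
// (503 > 504 > 502 > 500 > other 5xx), then with-body over no-body,
// ties broken by declaration (list) order.
final class WinningFailure {
    record Outcome(String member, int status, boolean hasBody) { }

    private static int retryRank(int status) {
        return switch (status) {
            case 503 -> 0;
            case 504 -> 1;
            case 502 -> 2;
            case 500 -> 3;
            default -> 4; // other 5xx
        };
    }

    private static final Comparator<Outcome> RANKING = Comparator
        .comparingInt((Outcome o) -> retryRank(o.status()))
        .thenComparing(o -> !o.hasBody()); // false (has body) sorts first

    static Optional<Outcome> pick(List<Outcome> outcomes) {
        Outcome best = null;
        for (Outcome o : outcomes) { // iteration order = declaration order
            if (best == null || RANKING.compare(o, best) < 0) {
                best = o;            // strict '<' keeps the earlier tie
            }
        }
        return Optional.ofNullable(best);
    }
}
```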
New package pantera-core/http/context:
RequestContext(traceId, httpRequestId, repoName, urlOriginal)
-- minimal scaffold for this WI; WI-02 expands to the full
record (user.name, client.ip, package.*, deadline, ...).
Tests (40 new):
FaultTranslatorTest (11) one per Fault variant
+ exhaustive-switch guard
FaultAllProxiesFailedPassThroughTest (10) every row from the
worked-examples table
in target-architecture.md
§2 including declaration-
order tiebreak and
empty-outcome edge case
FaultClassifierTest (11) exception -> Fault round-trip,
nested/bare/self-ref
CompletionException
ResultTest (6) factories, map, flatMap
(both Ok- and Err- returns)
RequestContextTest (2) accessors + record equality
Coverage (JaCoCo): fault 99% instructions / 97% branches, context 100%.
Exceeds the 95% DoD in §12 WI-01.
Deviations from the design doc, documented for WI-04 follow-up:
1. MemberOutcome extended from 3-arg to 4-arg with
Optional<Response>, because pickWinningFailure needs to return a
ProxyFailure(name, Response). Two disambiguated factories
MemberOutcome.threw(...) and MemberOutcome.responded(...) avoid
null-overload ambiguity.
2. FaultClassifier uses if/else because ConnectException extends
IOException and switch-case ordering would be confusing.
3. Fault.Forbidden emits textBody(reason); a JSON envelope lands
in a later WI if needed.
4. Bare CompletionException with null cause classifies as Internal.
Acceptance verified:
mvn -pl pantera-core test -Dtest='Fault*Test,Result*Test,
RequestContextTest' = 40 / 40 pass
mvn -pl pantera-core test = 788 / 788 pass
mvn -pl pantera-core verify -DskipTests = BUILD SUCCESS
…WI-07)
Fixes the production Maven checksum-mismatch reported against
com/fasterxml/oss-parent/58/oss-parent-58.pom (and the class of bug
it represents) by introducing a single write-path that verifies the
upstream's sidecar digests against the primary bytes BEFORE the pair
lands in the cache. A mismatch rejects the write entirely — the
cache never holds a stale primary/sidecar pair.
Closes target-architecture doc §9.5 + §12 WI-07.
Root cause (observed post-v2.1.3):
Maven Aether raised
'Checksum validation failed, expected 15ce8a2c... (REMOTE_EXTERNAL)
but is actually 0ed9e5d9...' against Pantera-cached pairs. The
stale-while-revalidate refetch landed new .pom bytes without
re-pulling .pom.sha1; independent Rx pipelines cached .pom and
.pom.sha1 separately; an eviction could drop one without the
other — every mode of drift produced the same user-visible
ChecksumFailureException in CI builds.
New pantera-core/http/cache/ProxyCacheWriter:
CompletionStage<Result<Void>> writeWithSidecars(
Key primaryKey,
Supplier<CompletionStage<InputStream>> fetchPrimary,
Map<ChecksumAlgo, Supplier<CompletionStage<Optional<InputStream>>>>
fetchSidecars,
RequestContext ctx);
• Primary is streamed into a temp file (NIO, bounded chunk size —
no heap scaling with artifact size) while four MessageDigest
accumulators (MD5, SHA-1, SHA-256, SHA-512) update in the same
pass.
• Sidecars are pulled concurrently, buffered fully (<200 B each),
hex-normalised (trim + lowercase), and compared against the
computed digest for that algorithm.
• Any mismatch => Result.err(Fault.UpstreamIntegrity(uri, algo,
claim, computed)); temp files deleted; cache not touched.
• Verified pair is saved in a primary-first, sidecars-after
sequence via Storage.save (asto FileStorage does its own
tmp-rename atomicity per key). A concurrent reader therefore
never sees a sidecar without its matching primary.
• Rollback on partial failure: sidecar-save that fails after the
primary landed best-effort-deletes primary + any saved sidecars
so the next GET re-enters the writer cleanly. An
IntegrityAuditor (nested static class) is the post-hoc heal for
the rare case where rollback itself fails.
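The single-pass digest accumulation in the first bullet can be sketched as follows; the chunk size and hex helper are illustrative, and the temp-file spooling the real writer does in the same pass is elided:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the primary stream is consumed once while all four
// MessageDigest accumulators update, so heap use stays bounded
// regardless of artifact size.
final class MultiDigest {
    static Map<String, String> digestAll(InputStream in) {
        try {
            MessageDigest[] digests = {
                MessageDigest.getInstance("MD5"),
                MessageDigest.getInstance("SHA-1"),
                MessageDigest.getInstance("SHA-256"),
                MessageDigest.getInstance("SHA-512"),
            };
            byte[] chunk = new byte[8192]; // bounded chunk size
            for (int n; (n = in.read(chunk)) != -1; ) {
                for (MessageDigest d : digests) {
                    d.update(chunk, 0, n);
                }
            }
            Map<String, String> out = new LinkedHashMap<>();
            for (MessageDigest d : digests) {
                out.put(d.getAlgorithm(), hex(d.digest()));
            }
            return out;
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hex-normalised lowercase digest, the form compared to sidecars. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```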
Tier-4 LocalLogger events under com.auto1.pantera.cache:
    event.action=cache_write    (ok)    package.checksum = <sha256>
    event.action=cache_write    (fail)  pantera.cache.algo +
                                        pantera.cache.computed +
                                        pantera.cache.sidecar_claim
    event.action=integrity_audit (both) pantera.audit.scanned +
                                        pantera.audit.mismatches +
                                        pantera.audit.fix
Maven adapter wired:
maven-adapter/.../CachedProxySlice preProcess now branches through
ProxyCacheWriter for primary artefacts (.pom/.jar/.war/.aar/.ear
/.zip/.module) on cache-miss. The cache-hit code path is
unchanged — only misses go through the new writer. SWR refetch
uses the same writer so primary + sidecars stay coherent across
refreshes.
Audit / healing tool:
scripts/pantera-cache-integrity-audit.sh — wrapper
pantera-main/.../tools/CacheIntegrityAudit — CLI entry point
pantera-core/.../ProxyCacheWriter.IntegrityAuditor — scanner
--repo <name> (optional filter)
--dry-run (default) scan + report; exit 1 on mismatch
--fix also evict mismatched pairs so next GET refetches
Javadoc TODO(WI-post-07) added to the cached-proxy slices of
composer / go / pypi so a future work item wires them to the same
writer. npm adapter's TODO rides with WI-05 in a separate commit.
Deviations from doc §9.5:
• Signature uses InputStream Suppliers, not HttpResponse, because
Content in pantera-core wraps a reactive Publisher — conversion
is the caller's responsibility, keeps the writer pure.
• Return type Result<Void> rather than Result<CachedArtifact>
(that value type does not yet exist; callers know the key).
• IntegrityAuditor is a static nested class of ProxyCacheWriter
instead of a sibling file, to satisfy the WI file scope.
• Maven-adapter integration test lives in pantera-core
(ProxyCacheWriterTest.ossParent58_regressionCheck reproduces
the exact historical hex) — the 86 maven-adapter tests all
stay green under the new code path, covering the wiring.
Acceptance verified on 2.2.0 at HEAD:
mvn -pl pantera-core test -Dtest='ProxyCacheWriterTest,CacheIntegrityAuditTest'
= 13 / 13 pass
mvn -pl pantera-core test = 820 / 820 pass
mvn -pl maven-adapter test = 86 / 86 pass
mvn -T8 install -DskipTests = BUILD SUCCESS
scripts/pantera-cache-integrity-audit.sh
    = exit 1 on seeded mismatch; exit 0 after --fix
Collapses the three hand-rolled single-flight implementations that
accreted across v2.1.1 -> v2.1.3 into one Caffeine-AsyncCache-backed
utility. No user-visible behaviour change; the observed
StackOverflowError class (ccc155f) and race-window class (899621b)
regressions are now impossible to re-introduce.
Closes target-architecture doc §6.4 + §12 WI-05; retires
anti-patterns A6 ("25-line comment explaining 14-line race fix"),
A7 ("three independent implementations of single-flight"),
A8 ("zombie protection asymmetric across coalescers"),
A9 ("correctness depends on Async thread-hop") from the review doc.
New pantera-core/http/resilience/SingleFlight<K,V>:
    public CompletableFuture<V> load(K key,
        Supplier<CompletionStage<V>> loader);
    public void invalidate(K key);
    public int inFlightCount();
Contract:
    - Concurrent load(k, ...) calls for the same key coalesce into
      ONE loader.get() invocation; every caller receives the same
      value.
    - Entry removed on loader completion so the next load is fresh.
    - Cancellation of one caller's future does NOT cancel the loader
      or the other callers (per-caller wrapper).
    - Loader exception propagates to every waiter; entry removed so
      the next load retries.
    - Stack-flat under synchronous leader completion (the GroupSlice
      StackOverflowError of ccc155f is covered by a 500-follower
      property test).
Three coalescers migrated:
    1. CachedNpmProxySlice — RequestDeduplicator<Key, FetchSignal>
       -> SingleFlight<Key, FetchSignal>
    2. GroupSlice — ConcurrentMap<String, CompletableFuture<Void>>
       inFlightFanouts (+ its 30-line "complete-before-remove"
       race-comment) -> SingleFlight<String, Void>
    3. MavenGroupSlice — inFlightMetadataFetches
       -> SingleFlight<String, Void>
Leader/follower discipline is preserved in the migrated sites via an
isLeader[] flag set inside the loader — Caffeine invokes the
bifunction synchronously on the leader's thread, so the leader still
returns the Response (single-subscribe Content) while followers
re-enter after the upstream cache is warm. Without this, every
follower would also fan out, or the 200 case would infinite-loop.
Zombie-eviction note: Caffeine's expireAfterWrite does NOT expire
entries whose CompletableFuture value is still pending — verified
experimentally during development. To meet the A8 zombie guarantee
for pending loaders we wrap the loader's future with
CompletableFuture.orTimeout(inflightTtl); when the timer fires the
entry is invalidated and freed. expireAfterWrite is retained as
belt-and-braces for completed-but-unreferenced entries.
Also carries the TODO(WI-post-07) Javadoc comment in
CachedNpmProxySlice pointing at the future wiring of ProxyCacheWriter
(WI-07) for npm tarballs — unrelated to this WI but sharing the same
file edit.
14 new SingleFlightPropertyTest cases:
    coalescesNConcurrentLoads (N=1000)
    cancellationDoesNotAbortOthers (100 callers, 50 cancelled)
    zombieEvictedAfterTtl
    loaderFailurePropagatesToAllWaiters
    stackFlatUnderSynchronousCompletion (500 followers)
    supplierThrowSurfacesAsFailedFuture
    cancellingOneCallerDoesNotCompleteOthersAsCancelled
    invalidateAllowsSubsequentFreshLoad
    differentKeysDoNotCoalesce
    constructorRejectsInvalidInputs
    loadRejectsNullKeyOrLoader
    inFlightCountTracksPendingLoads
    waiterTimeoutIsLocal
    loaderReturningCancelledStage
Pre-existing regression guards stay green unchanged:
    GroupSliceFlattenedResolutionTest.concurrentMissesCoalesceIntoSingleFanout
    GroupSliceFlattenedResolutionTest.coalescingIsStackSafeAtHighConcurrency (N=1000)
    MavenGroupSliceTest (8 tests)
Follow-up left explicit for a future WI (call it WI-post-05):
    BaseCachedProxySlice still uses RequestDeduplicator — it was
    outside this WI's file-allow scope. Migrating it is a ~20 LOC
    mechanical change identical to CachedNpmProxySlice's. Once that
    lands, RequestDeduplicator.java + RequestDeduplicatorTest.java +
    the DedupStrategy enum can be deleted, and FetchSignal can be
    promoted to a top-level pantera-core/http/cache/FetchSignal.java.
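The coalescing contract itself, stripped to a minimal sketch — a ConcurrentHashMap stands in for the production Caffeine AsyncCache, and TTL/zombie eviction is omitted, so this shows the semantics only, not the shipped class:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Minimal single-flight: concurrent loads for a key share one loader
// invocation; the entry is removed on completion so the next load is
// fresh; each caller gets its own wrapper future.
final class MiniSingleFlight<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight =
        new ConcurrentHashMap<>();

    CompletableFuture<V> load(K key, Supplier<CompletableFuture<V>> loader) {
        boolean[] isLeader = { false };
        CompletableFuture<V> shared = inFlight.computeIfAbsent(key, k -> {
            isLeader[0] = true;
            return loader.get();
        });
        if (isLeader[0]) {
            // Remove on completion (outside computeIfAbsent, to avoid a
            // reentrant modification of the same map key).
            shared.whenComplete((v, t) -> inFlight.remove(key, shared));
        }
        // Per-caller wrapper: cancelling this future does not cancel the
        // shared loader or the other callers' futures.
        return shared.thenApply(v -> v);
    }

    int inFlightCount() {
        return inFlight.size();
    }
}
```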
Acceptance verified on 2.2.0 at HEAD:
    mvn -pl pantera-core test -Dtest='SingleFlightPropertyTest' = 14 / 14 pass
    mvn -pl pantera-core test = 820 / 820 pass
    mvn -pl npm-adapter test = 191 / 191 pass
    mvn -T4 -pl pantera-main -am test = 929 / 929 pass
    mvn -T8 install -DskipTests = BUILD SUCCESS
Ships the three release artefacts produced by the final end-to-end
reviewer after WI-00 + WI-01 + WI-05 + WI-07 landed on 2.2.0:
CHANGELOG-v2.2.0.md
Operator-facing release notes in the style of the existing
v2.1.3 changelog: Highlights / Fixed / Added / Changed /
Deprecated / Under-the-hood sections, with forensic-doc
section refs so on-call can trace any entry back to the
original symptom.
docs/analysis/v2.2.0-pr-description.md
GitHub PR body ready for `gh pr create --body-file ...`.
Includes the WI checklist (4 shipped, 7+ deferred), the full
test-run evidence (2,355 tests green across every touched
module), the three PR-time risks called out by the reviewer
(pom version still 2.1.3, CachedProxySlice 404-swallow
footgun, commonPool() usage in SingleFlight + ProxyCacheWriter),
and a reviewer-focus checklist.
docs/analysis/v2.2-next-session.md
Agent-executable task list for the remaining WIs, written in
the exact same shape as target-architecture.md §12 so the next
session's worker + reviewer agents can pick each one up with
zero context from the originating conversation. Priority-
ordered:
WI-post-05 migrate BaseCachedProxySlice to SingleFlight;
delete RequestDeduplicator + DedupStrategy;
promote FetchSignal to top-level.
WI-post-07 wire ProxyCacheWriter into npm/pypi/go/docker/
composer cached-proxy slices (TODO markers
already placed).
WI-02 expand RequestContext to the full scope
per doc §3.3 (APM + ECS fields).
WI-03 StructuredLogger 5-tier + LevelPolicy +
ContextualExecutor; delete MdcPropagation.
WI-04 GroupResolver replaces GroupSlice; sealed
MemberSlice; ArtifactIndex.locateByName
returns IndexOutcome sealed type.
WI-06 NegativeCache composite key + repo-negative
rename + one-bean-for-hosted/proxy/group +
synchronous upload invalidation.
WI-06b admin UI for neg-cache inspection + invalidation.
WI-08 retire RxJava2 from DownloadAssetSlice,
CachedNpmProxySlice, BaseCachedProxySlice,
NpmProxy.getAsset, MavenProxy.getMetadata.
WI-09 RepoBulkhead per repo; retire static
DRAIN_EXECUTOR.
WI-10 adapter SLOs + CI perf baseline + chaos tests +
release-gate script.
Plus five review-derived concerns C1–C5 promoted to
immediate-next-session items.
Review verdict: PASS. Every §12 DoD met. Every commit conforms
to type(scope): msg, no Co-Authored-By trailer across the five
new commits. 2,355 tests green across pantera-core / npm-adapter
/ maven-adapter / pantera-main / every other touched adapter.
Full evidence inline in the PR body.
Root reactor + all 30 module poms move from 2.1.3 to 2.2.0 so the
branch's build artefacts line up with the branch name and the
open PR title. Closes the C1 gap flagged by the final reviewer
after the foundation-layer commits landed.
Ran: mvn -T8 versions:set -DnewVersion=2.2.0 \
-DgenerateBackupPoms=false -DprocessAllModules=true
Acceptance: grep '<version>2.1.3</version>' across pom.xml = 0
grep '<version>2.2.0</version>' = 30
mvn -T8 install -DskipTests = BUILD SUCCESS (image
tagged pantera:2.2.0)
… RequestDeduplicator (WI-post-05)
Finishes the migration begun in WI-05: the last hand-rolled
coalescer site (BaseCachedProxySlice) now uses the unified
SingleFlight<K,V> utility, and the legacy RequestDeduplicator
infrastructure is removed from the codebase entirely.
Closes next-session task WI-post-05 + open item C2 from the
v2.1.3 architecture review.
---
pantera-core/http/cache/FetchSignal (new top-level enum)
Promoted from the nested enum RequestDeduplicator.FetchSignal
so the SIGNAL-dedup semantics outlive the deleted class. Members
unchanged: SUCCESS, NOT_FOUND, ERROR.
pantera-core/http/cache/BaseCachedProxySlice (migrated)
Field `RequestDeduplicator<Key, FetchSignal> deduplicator`
-> `SingleFlight<Key, FetchSignal> singleFlight`
Construction
`new RequestDeduplicator(DedupStrategy.SIGNAL, ...)`
-> `new SingleFlight<>(
Duration.ofMillis(PANTERA_DEDUP_MAX_AGE_MS),
10_000,
ForkJoinPool.commonPool())`
Call-site
`deduplicator.deduplicate(key, loader)`
-> `singleFlight.load(key, loader)`
No behaviour change — SIGNAL strategy (first caller fetches;
followers wait on the same CompletableFuture; entry removed on
loader completion) is exactly the SingleFlight contract.
Six method signatures migrated from RequestDeduplicator.FetchSignal
to the new top-level FetchSignal type.
pantera-core/http/cache/ProxyCacheConfig (cleaned)
Removed `dedupStrategy()` accessor, its `stringValue` helper, the
YAML-doc reference, and the now-unused `java.util.Locale` import.
DedupStrategy selection was never exposed externally; SIGNAL was
the only supported runtime value. All consumers already hardcoded
SIGNAL.
pantera-core/http/resilience/SingleFlight (javadoc cleanup)
Two lines of class javadoc updated to remove dangling references
to the now-deleted RequestDeduplicator class. No behavioural
change.
npm-adapter/.../CachedNpmProxySlice (import + javadoc cleanup)
Import `http.cache.RequestDeduplicator.FetchSignal`
-> `http.cache.FetchSignal`.
Two stale comment/javadoc references to RequestDeduplicator
cleaned (required by the grep DoD). Field name `deduplicator`
is intentionally preserved to keep the migration patch minimal;
a cosmetic rename to `singleFlight` can ride with any subsequent
touch of that file.
---
Deleted:
pantera-core/.../http/cache/RequestDeduplicator.java (-204 LoC)
pantera-core/.../http/cache/DedupStrategy.java (-39 LoC)
pantera-core/src/test/.../cache/RequestDeduplicatorTest.java (-10 tests)
pantera-core/src/test/.../cache/DedupStrategyTest.java (-2 tests)
Net line diff: +38 / -570 across 8 files + 2 new.
---
Tests:
New regression-guard BaseCachedProxySliceDedupTest (4 tests)
concurrentRequestsShareOneCacheWrite
concurrentRequestsAllReceiveSuccessSignal
distinctKeysAreNotCoalesced
cacheHitAfterCoalescedFetchSkipsLoader
Behavioural coverage that lived in the deleted RequestDeduplicatorTest
is preserved by (a) SingleFlightPropertyTest in the resilience
package and (b) the new BaseCachedProxySliceDedupTest above, which
exercises the coalescer at the exact wiring site.
---
Acceptance verified on 2.2.0 at HEAD:
rg 'RequestDeduplicator|class DedupStrategy|RequestDeduplicator\.FetchSignal' \
--glob '*.java' | rg -v test | wc -l = 0
rg 'new FetchSignal|FetchSignal\.(SUCCESS|NOT_FOUND|ERROR)' \
--glob '*.java' | rg -v test | wc -l = 11
mvn -T8 install -DskipTests = BUILD SUCCESS
mvn -pl pantera-core test -Dtest='BaseCachedProxySliceDedupTest,SingleFlightPropertyTest'
= 18 / 18 pass
mvn -pl pantera-core test = 812 / 812 pass (7 pre-existing skips)
mvn -pl npm-adapter,pypi-adapter,go-adapter,maven-adapter,composer-adapter test
= 480 / 480 pass
Test count moved from 820 baseline to 812 because 12 tests were
deleted along with their subject classes (RequestDeduplicatorTest:
10 cases, DedupStrategyTest: 2 cases); 4 new cases were added in
BaseCachedProxySliceDedupTest. The net regression guard is
strictly richer (the new test fires concurrent requests through
the real BaseCachedProxySlice code path rather than against the
removed utility class in isolation).
Follow-up for the reviewer / future WI:
SingleFlight's constructor default executor is still
ForkJoinPool.commonPool() at every call-site. WI-09
(RepoBulkhead) will inject a per-repo executor so pool
saturation is blast-radius-contained.
…-post-07)
Extends the atomic primary + sidecar integrity guarantee from WI-07
to three more cached-proxy adapters. Same write-path as the Maven
adapter: stream primary into a temp file, compute digests in a
single pass, verify every declared sidecar, atomically commit only
when every check passes. A mismatched sidecar rejects the write
and leaves the cache empty; the metric counter
pantera.proxy.cache.integrity_failure{repo,algo} increments per
rejection. Removes the TODO(WI-post-07) markers from each adapter.
Closes next-session task WI-post-07.
---
pypi-adapter CachedPyProxySlice
Routes .whl / .tar.gz / .zip primaries through the writer with
{SHA256, MD5, SHA512} sidecars declared. Any subset may be
absent at the upstream (the writer handles that per its
sidecarAbsent_stillWrites contract from WI-07). PyPI's JSON
API always serves SHA-256; MD5/SHA-512 are opportunistic.
go-adapter CachedProxySlice
Routes .zip module archives through the writer with a single
SHA256 sidecar, fetched from the upstream .ziphash path (Go's
checksum-DB convention). .info and .mod files have no upstream
sidecar and stay on the legacy fetchThroughCache flow. The
writer stores the sidecar under its internal ChecksumAlgo →
extension mapping (.sha256), not the upstream .ziphash name —
a separate slice would be needed to re-serve .ziphash to clients
that explicitly request it, out of scope for this WI.
composer-adapter CachedProxySlice
Routes .zip / .tar / .phar dist archives through the writer
with a single SHA256 sidecar (Packagist's dist.shasum field,
served at <archive>.sha256). Defensive wiring: composer dist
downloads are typically served by ProxyDownloadSlice, but any
archive traffic that reaches the cached-proxy slice is now
integrity-verified.
---
Tests (3 new integration tests, one per adapter):
CachedPyProxySliceIntegrityTest (pypi)
sha256Mismatch_rejectsWrite — storage empty, counter=1
matchingSidecars_persistsAndServesFromCache
CachedProxySliceIntegrityTest (go)
ziphashMismatch_rejectsWrite — storage empty, counter=1
matchingZiphash_persistsAndServesFromCache
CachedProxySliceIntegrityTest (composer)
sha256Mismatch_rejectsWrite — storage empty, counter=1
matchingSidecar_persistsAndServesFromCache
Each uses an in-process FakeUpstream Slice, InMemoryStorage, and
a test-local SimpleMeterRegistry injected into the slice's
cacheWriter field via reflection (avoids bootstrapping the global
MicrometerMetrics singleton and leaking state across tests). The
production path still resolves the meter registry via
MicrometerMetrics.getInstance().getRegistry() when initialised.
---
Deviations:
The integrity-failure response path returns 502 directly via
ResponseBuilder.badGateway().header("X-Pantera-Fault",
"upstream-integrity:<algo>") instead of going through
FaultTranslator.translate (Fault.UpstreamIntegrity → 502).
FaultTranslator wiring into the slice chain lands in WI-04; the
return status and headers are identical to what the translator
would produce, so no follow-up adjustment will be client-visible.
Acceptance verified on 2.2.0 at HEAD:
rg 'TODO\(WI-post-07\)' --glob '*.java' | wc -l
= 1 (only npm-adapter's remains — future WI owns it)
rg 'ProxyCacheWriter' --glob 'pypi-adapter/src/main/**' \
--glob 'go-adapter/src/main/**' \
--glob 'composer-adapter/src/main/**' --glob '*.java' | rg -v test | wc -l
= 25 (≥ 3)
mvn -T4 -pl pypi-adapter,go-adapter,composer-adapter test
= 209 / 209 pass, 3 pre-existing @Disabled (composer)
mvn -T8 install -DskipTests
= BUILD SUCCESS
…(WI-02)
Lifts the minimal 4-field RequestContext scaffold (WI-01) into the
13-field ECS-native envelope §3.3 prescribes, and adds the two
context primitives §3.4 / §4.4 call for: Deadline (end-to-end
budget) and ContextualExecutor (thread-hop ThreadContext + APM
span propagation). Additive-only — no Slice is wired yet; WI-03
takes that on next. WI-01 / WI-post-05 / WI-post-07 tests stay
green unchanged via the backward-compat 4-arg constructor.
---
pantera-core/http/context/RequestContext (expanded to 340 LOC)
Canonical 13-field record (traceId, transactionId, spanId,
httpRequestId, userName, clientIp, userAgent, repoName,
repoType, artifact, urlOriginal, urlPath, deadline).
Nested ArtifactRef(name, version) with EMPTY sentinel.
Backward-compat (Option B):
public RequestContext(traceId, httpRequestId, repoName, urlOriginal)
delegates to minimal(...) — preserves the five pre-existing
construction sites in maven / pypi / go / composer adapter
CachedProxySlice files without touching them.
minimal(traceId, httpRequestId, repoName, urlOriginal) factory
fills userName="anonymous", artifact=EMPTY,
deadline=Deadline.in(30s), all others null.
bindToMdc() : AutoCloseable
put every non-null ECS field into Log4j2 ThreadContext; close
restores the snapshot captured at bind time; idempotent close
via a private MdcRestore inner class carrying a `closed` flag.
Skips empty ArtifactRef entirely (no ghost package.* keys for
metadata endpoints).
fromMdc() : RequestContext
inverse read. Deadline is synthesised as Deadline.in(30s) —
§3.4 mandates deadline is NOT carried in MDC; consumers that
need the original must pass the record explicitly.
withRepo(name, type, artifact) immutable copy-with for the
three repo-scoped fields (used by GroupResolver in WI-04).
Public constants KEY_TRACE_ID, KEY_TRANSACTION_ID, … expose the
ECS key names at the top of the record so callers can read/write
ThreadContext directly without constructing a RequestContext.
pantera-core/http/context/Deadline (new, 97 LOC)
record Deadline(long expiresAtNanos)
in(Duration) — snapshots System.nanoTime().
remaining() — Duration.ZERO if past (never negative).
expired() — remaining().isZero().
remainingClamped(max) — min(remaining, max); requireNonNull max.
expiresAt() — Instant for logging/debug.
Immune to wall-clock jumps (System.nanoTime monotonicity);
consistent with CompletableFuture.orTimeout.
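The Deadline record is small enough to sketch in full; field and method names follow the list above, but treat this as an approximation of the shipped source, not a copy:

```java
import java.time.Duration;

// Sketch of the nanoTime-based budget primitive.
record Deadline(long expiresAtNanos) {
    /** Snapshots System.nanoTime(), so wall-clock jumps cannot move
     *  the deadline. */
    static Deadline in(Duration budget) {
        return new Deadline(System.nanoTime() + budget.toNanos());
    }

    /** Remaining budget, clamped at zero: never negative. */
    Duration remaining() {
        long left = expiresAtNanos - System.nanoTime();
        return left <= 0 ? Duration.ZERO : Duration.ofNanos(left);
    }

    boolean expired() {
        return remaining().isZero();
    }

    /** min(remaining, max): callers cap a per-hop timeout with this. */
    Duration remainingClamped(Duration max) {
        Duration r = remaining();
        return r.compareTo(max) < 0 ? r : max;
    }
}
```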
pantera-core/http/context/ContextualExecutor (new, 109 LOC)
static Executor contextualize(Executor delegate)
snapshots ThreadContext.getImmutableContext() + current APM
span on the caller thread; restores on the runner thread
around task.run() inside try-with-resources on span.activate();
restore-prior-context in finally covers task-throws as well as
task-returns. NoopSpan (no APM agent attached) works
transparently. requireNonNull delegate.
This class is the ONE place thread-context propagation lives;
every new executor consumed on the request path (SingleFlight
callbacks, per-repo bulkhead pools in WI-09, etc.) will be
constructed via contextualize(...).
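The snapshot/restore discipline can be sketched with a plain ThreadLocal map standing in for Log4j2's ThreadContext; the APM span activation described above is elided, and all names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.Executor;

// Sketch: snapshot the caller thread's context, restore it around the
// task on the runner thread, and put the runner's prior context back
// in a finally block.
final class ContextPropagation {
    static final ThreadLocal<Map<String, String>> CTX =
        ThreadLocal.withInitial(Map::of);

    static Executor contextualize(Executor delegate) {
        return task -> {
            Map<String, String> snapshot = CTX.get(); // caller thread
            delegate.execute(() -> {
                Map<String, String> prior = CTX.get(); // runner thread
                CTX.set(snapshot);
                try {
                    task.run();
                } finally {
                    CTX.set(prior); // restore even if the task throws
                }
            });
        };
    }
}
```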
pantera-core/pom.xml
Added co.elastic.apm:apm-agent-api:1.55.1 (compile scope,
matching vertx-server's declaration). Required by
ContextualExecutor; the runtime agent attaches out-of-process
and is optional — without it ElasticApm.currentSpan() returns
a NoopSpan.
---
Tests (27 new):
RequestContextTest (14)
minimal_setsSafeDefaults
withRepo_producesCopyWithNewRepoFields
withRepoNullArtifactCoercesToEmpty
bindToMdc_putsAllEcsFields
bindToMdc_skipsNullFields
bindToMdc_closeRestoresPriorContext
bindToMdc_isTryWithResourcesSafe
bindToMdc_isIdempotentOnDoubleClose
fromMdc_readsAllEcsFields
fromMdc_missingKeysBecomeNull
bindToMdc_fromMdc_roundTripPreservesFieldsExceptDeadline
artifactRef_emptyIsEmpty
backwardCompat4ArgConstructor_delegatesToMinimal
recordEqualityFollowsRecordSemantics
DeadlineTest (8)
in_createsDeadlineWithPositiveRemaining
expired_returnsFalseInitially
expired_returnsTrueAfterPassing
remaining_clampsToZeroAfterExpiry (never negative)
remainingClamped_capsAtMax
remainingClamped_passThroughWhenBelowMax
remainingClampedRejectsNull
expiresAtReturnsFutureInstantForPositiveBudget
ContextualExecutorTest (5)
contextualize_propagatesThreadContextAcrossThreadHop
contextualize_doesNotLeakContextIntoRunnerThread
contextualize_restoresCallerContext_evenIfTaskThrows
contextualize_worksWithNoApmAgent
contextualizeRejectsNullDelegate
---
Acceptance verified on 2.2.0 at HEAD:
ls pantera-core/src/main/java/com/auto1/pantera/http/context/
= RequestContext.java Deadline.java ContextualExecutor.java
wc -l RequestContext.java = 340
mvn -T8 install -DskipTests = BUILD SUCCESS
mvn -pl pantera-core test -Dtest='RequestContextTest,DeadlineTest,ContextualExecutorTest'
= 27 / 27 pass
mvn -pl pantera-core test = 837 / 837 pass (7 pre-existing skips)
mvn -T4 -pl pypi-adapter,go-adapter,composer-adapter,maven-adapter test
= 295 / 295 pass
(4-arg ctor preserves adapter compat)
Follow-ups for the reviewer / WI-03:
- fromMdc() loses Deadline by design (§3.4). WI-03's
StructuredLogger wiring MUST pass RequestContext explicitly
across thread hops when the deadline matters — do NOT rely
on fromMdc(). ContextualExecutor's snapshot-restore covers
ThreadContext propagation automatically, but Deadline
propagation is the caller's responsibility.
- Five production call-sites of the 4-arg new RequestContext(...)
constructor exist in the maven / pypi / go / composer
CachedProxySlice files. They compile via the Option-B alternate
constructor today. WI-03 / WI-04 will migrate them to the
canonical 13-arg form (or RequestContext.minimal(...)) as part
of wiring the real request-scoped context at EcsLoggingSlice.
…ion; deprecate MdcPropagation (WI-03)
Introduces the five-tier structured-logging facade described in
target-architecture.md §4, wires the Tier-1 access-log emission
through it, and starts retiring the 446-LOC MdcPropagation helper
by wrapping every SingleFlight / DRAIN_EXECUTOR / DbArtifactIndex
pool with ContextualExecutor so thread-hop context propagation
happens automatically for those paths. RequestContext (WI-02) is
the required input to every client-facing / internal / upstream /
audit tier builder.
Closes next-session task WI-03 (partial — 100 MdcPropagation
call-sites on the Jetty/asto/RxJava boundary stay @deprecated
until WI-06 / WI-08 / the Vert.x worker-pool contextualisation
follow-up unblock them).
---
pantera-core/http/observability/LevelPolicy (new)
Closed enum of 17 values, one per (tier, outcome). Each maps to
a Log4j2 Level; encodes §4.2 verbatim:
Tier-1 client-facing: 2xx→DEBUG, 404→INFO, 401/403→INFO,
other-4xx→WARN, 5xx→ERROR, slow→WARN
Tier-2 internal: 2xx→DEBUG, 404→DEBUG, 500→ERROR
Tier-3 upstream: 2xx→DEBUG, 404→DEBUG, 5xx→ERROR
Tier-4 local: config→INFO, success→DEBUG, degraded→WARN,
failure→ERROR
Tier-5 audit: INFO (non-suppressible)
pantera-core/http/observability/StructuredLogger (new facade)
Five nested builders — AccessLogger, InternalLogger,
UpstreamLogger, LocalLogger, AuditLogger — reachable via
StructuredLogger.access() / .internal() / .upstream() / .local()
/ .audit(). Each required RequestContext / member-name / etc.
input is Objects.requireNonNull-guarded at entry (the idiomatic
Java equivalent of the §4.3 "phantom-typed builder" guarantee).
AccessLogger → Log4j2 logger "http.access", payload via
MapMessage. Level inferred from status + duration per LevelPolicy.
InternalLogger → Log4j2 "http.internal". ERROR-only emission
when a Fault is attached; InternalAt.error() throws
IllegalStateException if no fault set (500-only tier contract).
UpstreamLogger → Log4j2 "http.upstream". UpstreamAt.error()
requires a cause Throwable. DEBUG opt-in for 2xx / 404 success
traces.
LocalLogger → caller-named logger. LocalAt.error() requires a
cause. Covers config change / op success / degraded / failure
via LevelPolicy.LOCAL_*.
AuditLogger → "com.auto1.pantera.audit". AuditAt.emit() always
fires at INFO regardless of operational log level (audit is
non-suppressible per §10.4). Schema enforced: RequestContext
(client.ip, user.name, trace.id) + AuditAction enum +
packageName / packageVersion required; packageChecksum /
outcome optional.
pantera-core/audit/AuditAction (new closed enum)
Exactly four variants per §10.4 user confirmation:
ARTIFACT_PUBLISH, ARTIFACT_DOWNLOAD, ARTIFACT_DELETE, RESOLUTION.
CACHE_WRITE / CACHE_INVALIDATE deliberately NOT in the enum —
those are operational (Tier-4), not compliance events.
---
EcsLoggingSlice (migrated)
Tier-1 access-log now emits via
StructuredLogger.access().forRequest(rctx).status(code).duration(ms).log()
The original intent in WI-03's landing was to keep the legacy
EcsLogEvent emission alongside for richer user_agent parsing and
url.query — but that would DOUBLE the access-log volume in
Kibana (both emissions hit "http.access"). Removed the legacy
block; the core access-log contract (trace.id, client.ip,
user.name, url.original, url.path, http.*, event.duration,
user_agent.original) is covered by RequestContext today. Rich
user_agent.name / .version / .os.name and url.query emission can
migrate into StructuredLogger.access in a follow-up without
another round of dual-emission.
The exception-path emission (the .exceptionally branch) retains
its legacy new EcsLogEvent().log() call — it's a single emission
in the failure branch, not a duplicate.
---
ContextualExecutor wiring (three hot-path pools)
GroupSlice — SingleFlight<String, Void> inFlightFanouts executor
+ static DRAIN_EXECUTOR both wrapped via
ContextualExecutor.contextualize. DRAIN_EXECUTOR field type
tightened from ExecutorService to Executor (only execute() is
called on it).
MavenGroupSlice — SingleFlight<String, Void>
inFlightMetadataFetches executor wrapped.
BaseCachedProxySlice — SingleFlight<Key, FetchSignal> executor
wrapped.
CachedNpmProxySlice — SingleFlight<Key, FetchSignal> executor
wrapped.
DbArtifactIndex — createDbIndexExecutor() now returns a new
DbIndexExecutorService adapter that forwards execute(Runnable)
through ContextualExecutor (propagates ThreadContext + APM span
on every submit) while forwarding lifecycle methods (shutdown,
awaitTermination, invokeAll, ...) to the underlying
ThreadPoolExecutor. Replaces the previous
TraceContextExecutor.wrap (which only carried MDC).
Result: any CompletableFuture.*Async(...) or .submit(task) on
these pools automatically propagates context across the thread
hop, without a MdcPropagation.withMdc* wrapper at the call site.
---
MdcPropagation retained @deprecated(forRemoval=true)
100 production call-sites cannot migrate in this WI because
their async chains run on Jetty HTTP client threads, RxJava2
schedulers, or asto Cache.load threads — none of which the
ContextualExecutor wrapping above covers. Grouped by blocking WI:
Blocked on WI-08 (RxJava2 retirement):
npm-adapter/.../DownloadAssetSlice (2)
npm-adapter/.../NpmProxy (3)
Blocked on WI-06 (cooldown / neg-cache unification):
pantera-core/.../cooldown/CooldownCache (3)
pantera-core/.../cooldown/metadata/FilteredMetadataCache (4)
pantera-core/.../cooldown/metadata/CooldownMetadataServiceImpl (3)
pantera-main/.../cooldown/JdbcCooldownService (8)
Blocked on Vert.x worker-pool contextualisation (follow-up):
pantera-main/.../api/v1/*Handler (46 total across 11 handlers)
Retained conservatively in in-scope group/cache files because
their callbacks chain off Jetty/asto pools, not the wrapped ones:
GroupSlice (7), MavenGroupSlice (12), BaseCachedProxySlice (12)
Each remaining caller is documented by blocking WI in the
MdcPropagation class javadoc. Once the blockers land, the class
disappears.
---
Tests (54 new):
LevelPolicyTest (5) — enum members + Level maps
AccessLoggerTest (11) — level inference per status / slow /
null-ctx NPE
InternalLoggerTest (6) — 500 fault + debug opt-in + null /
missing-fault guards
UpstreamLoggerTest (7) — 5xx + cause + null guards
LocalLoggerTest (8) — 4 level paths + null-cause guard
AuditLoggerTest (10) — all 4 AuditActions; required fields
enforced; non-suppressibility
ContextualExecutorIntegration (3) — propagation + leak-isolation
+ throw-safety through the wrapped pools
AuditActionTest (4) — closed-enum shape
---
Acceptance verified on 2.2.0 at HEAD:
mvn -T8 install -DskipTests = BUILD SUCCESS
mvn -pl pantera-core test = 891 / 891 pass (7 pre-existing skips)
mvn -pl pantera-main test = 929 / 929 pass
mvn -pl npm-adapter,maven-adapter,pypi-adapter,go-adapter,composer-adapter test
= 823 / 823 pass
rg 'enum AuditAction' --glob '*.java' | wc -l = 1
rg 'StructuredLogger\.access\(\)' --glob '*.java' | wc -l = 15
rg 'new EcsLogEvent\(\)' pantera-core/.../EcsLoggingSlice.java
= 1 (exception path only — not dual)
Follow-up items for the reviewer / next session:
- 100 MdcPropagation call-sites awaiting WI-06 / WI-08 /
Vert.x-handler contextualisation.
- Rich user_agent sub-field parsing migrates from legacy
EcsLogEvent into StructuredLogger.access (currently only
user_agent.original is emitted via RequestContext).
- DbIndexExecutorService adapter could migrate to
pantera-core/http/context/ContextualExecutorService once
WI-02's file-scope freeze lifts.
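A condensed, dependency-free sketch of the Tier-1 row of the §4.2 table above — a local Level enum stands in for Log4j2's, and the precedence of the slow-request check over the status checks is an assumption:

```java
// Sketch of the Tier-1 (client-facing) level inference that LevelPolicy
// encodes. 404 at INFO rather than WARN is the v2.1.3 WARN-flood fix.
final class Tier1Levels {
    enum Level { DEBUG, INFO, WARN, ERROR }

    static Level accessLevel(int status, long durationMs, long slowThresholdMs) {
        if (status >= 500) return Level.ERROR;
        if (durationMs >= slowThresholdMs) return Level.WARN; // slow → WARN (assumed precedence)
        if (status == 404) return Level.INFO;
        if (status == 401 || status == 403) return Level.INFO;
        if (status >= 400) return Level.WARN;                 // other 4xx
        return Level.DEBUG;                                   // 2xx / 3xx
    }
}
```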
…I-02, WI-03)
Refreshes the three release artefacts produced by the final
end-to-end reviewer after the Wave 3 commits landed on 2.2.0:
CHANGELOG-v2.2.0.md (144 L)
Adds Wave 3 entries to Highlights / Added / Changed /
Deprecated / Under-the-hood. Version-bump, BaseCachedProxySlice
SingleFlight migration, pypi/go/composer ProxyCacheWriter
wiring, RequestContext expansion + Deadline + ContextualExecutor,
StructuredLogger 5-tier + LevelPolicy + AuditAction, and the
@deprecated MdcPropagation status — all documented with forensic
and architecture-review section refs.
docs/analysis/v2.2.0-pr-description.md (174 L)
PR #34 body; WI checklist now shows 8 shipped / 6 deferred;
test-run evidence 3,432 tests green; five PR-reviewer focus
points (remaining MdcPropagation callers, lost user_agent sub-
field parsing, audit-logger suppressibility gap in log4j2.xml,
DbIndexExecutorService submit()-path bypass, four-adapter
"any exception → 404" swallow inherited from Maven).
docs/analysis/v2.2-next-session.md (399 L)
Refreshed agent-executable task list. Removes the four
shipped items (WI-post-05, WI-post-07, WI-02, WI-03). Keeps
WI-04 / WI-06 / WI-06b / WI-08 / WI-09 / WI-10 in the same
Goal / Files / Tests / DoD / Depends-on shape. Adds four
WI-post-03 follow-ups surfaced during Wave 3:
a. Hoist DbIndexExecutorService to pantera-core/http/
context/ContextualExecutorService.
b. Re-lift user_agent.name / .version / .os.name parsing
into StructuredLogger.access.
c. Unify the ~110 remaining MdcPropagation call-sites
after WI-06 + WI-08 + the Vert.x-handler migration,
then delete MdcPropagation.java.
d. Migrate 11 Vert.x API handlers (AdminAuth, Artifact,
Auth, Cooldown, Dashboard, Pypi, Repository, Role,
Settings, StorageAlias, User) to a ContextualExecutor-
wrapped worker pool — the single biggest MdcPropagation
debt.
Adds one new concern:
C6. Audit logger inherits log-level config from
com.auto1.pantera parent — §10.4 declares audit as
"non-suppressible" but log4j2.xml has no dedicated
block. Five-line fix tracked separately.
Review verdict: PASS. Every §12 DoD met. Every commit conforms
to type(scope): msg, zero Co-Authored-By trailers across all 11
new commits (verified via git interpret-trailers --only-trailers).
3,432 tests green across pantera-core / pantera-main / every
touched adapter module.
Closes concern C6 flagged by the Wave 3 final reviewer: WI-03's
StructuredLogger.AuditLogger writes to logger
com.auto1.pantera.audit, but the log4j2.xml config had no
dedicated block — so audit events inherited from the
com.auto1.pantera parent (level=info). Dropping the parent to
WARN or ERROR during an incident rota would have silently
suppressed compliance audit events, contradicting the §10.4
"non-suppressible" contract.
Adds a sibling block with additivity=false so audit events now
route via their own AppenderRef regardless of operational log
level on the application logger tree. Mirrors the existing
artifact.audit (legacy AuditLogger) block exactly, five lines.
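A sketch of the kind of dedicated block described — the AppenderRef name is assumed, and the real block mirrors the existing artifact.audit entry:

```xml
<!-- Sketch only: "AuditAppender" is an assumed appender name. -->
<!-- additivity="false" detaches the audit logger from the
     com.auto1.pantera parent, so lowering the parent's level
     during an incident cannot suppress audit events. -->
<Logger name="com.auto1.pantera.audit" level="info" additivity="false">
    <AppenderRef ref="AuditAppender"/>
</Logger>
```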
…uredLogger.access (WI-post-03b)
WI-03 dropped the rich user_agent.name / .version / .os.name /
.os.version / .device.name sub-fields from the access log when
it removed the dual EcsLogEvent emission in EcsLoggingSlice.
Only user_agent.original survived. Kibana dashboards that
filtered on the sub-fields returned empty panels.
This commit lifts the parser out of the legacy EcsLogEvent,
makes it a stand-alone UserAgentParser with a typed
UserAgentInfo record, and wires StructuredLogger.access to
populate the sub-fields on every access-log emission via the
MapMessage payload. EcsLogEvent now delegates to the new parser
internally — no behaviour change on the legacy emission path
(which is still the .exceptionally branch of EcsLoggingSlice).
Closes reviewer risk #2 / WI-post-03b from v2.2-next-session.md.
---
pantera-core/http/observability/UserAgentParser (new)
public final class UserAgentParser
public static UserAgentInfo parse(String ua);
public record UserAgentInfo(
String name, String version, String osName,
String osVersion, String deviceName);
Parser logic lifted verbatim from EcsLogEvent.parseUserAgent;
matches the same client families (Maven / npm / pip / Docker /
Go / Gradle / Composer / NuGet / curl / wget) and OS families
(Linux / Windows / macOS / FreeBSD + Java-version).
pantera-core/http/observability/StructuredLogger (modified)
AccessAt.buildPayload() now invokes
attachUserAgentSubFields(payload, ctx.userAgent()) which
null-safely adds
user_agent.{name,version,os.name,os.version,device.name} when
UserAgentParser.parse yields non-null values. Empty / null UA →
no user_agent.* keys emitted (clean payload for metadata
endpoints without a UA header).
pantera-core/http/log/EcsLogEvent (refactored)
Private parseUserAgent / extractVersion / findVersionEnd /
UserAgentInfo inner class all DELETED. The public
userAgent(headers) builder method now delegates to
UserAgentParser.parse under the hood.
Legacy emission path (EcsLoggingSlice's .exceptionally branch)
preserved exactly as before.
---
Tests (19 new):
UserAgentParserTest (17)
Maven, npm, pip, Docker, Go, Gradle, Composer, curl, wget
Linux, Windows, macOS, FreeBSD
nullUaReturnsEmpty, emptyUaReturnsEmpty, unknownUaReturnsEmpty
javaVersionGoesIntoOsVersion (preserves existing contract)
AccessLoggerTest (+2)
logEmitsParsedUserAgentSubFields — assert name/version/os.name/
os.version populated on the captured MapMessage payload
logSkipsSubFieldsWhenOriginalAbsent — RequestContext with null
userAgent → no user_agent.* keys on payload
Captured from a run: the access-log line for a Maven UA now emits
user_agent.name="Maven" user_agent.version="3.9.6"
user_agent.os.name="Linux" user_agent.os.version="21.0.3" —
matching the pre-WI-03 Kibana dashboard shape.
Acceptance verified on 2.2.0 at HEAD:
rg '^public final class UserAgentParser' --glob '*.java' | wc -l = 1
mvn -pl pantera-core test -Dtest='UserAgentParserTest,AccessLoggerTest'
= 30 / 30 pass
mvn -pl pantera-core test = ≥ 891 + 19 new, 0 failures
mvn -T8 install -DskipTests -q = BUILD SUCCESS
Follow-up (not in this WI):
The parser only matches the Maven/ prefix (not Apache-Maven/);
the WI's "no behaviour change" contract kept the existing regex
intact. If operators query on user_agent.name = "Maven" for
Apache-Maven/ traffic and need it recognised, that is a
parser-widening follow-up beyond WI-post-03b.
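An illustrative cut-down of the parser shape (record + static parse). The regex here covers only a few client families and returns Optional rather than the nullable-field record — both simplifications, not the real implementation. Note it reproduces the documented limitation: an anchored `Maven/` prefix does not match `Apache-Maven/` traffic.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Cut-down sketch of UserAgentParser: extract client name + version
// from a User-Agent string with a prefix-anchored regex.
final class UserAgentSketch {
    record UserAgentInfo(String name, String version) {}

    private static final Pattern CLIENT =
        Pattern.compile("^(Maven|npm|pip|Go-http-client|curl|wget)/([\\w.+-]+)");

    static Optional<UserAgentInfo> parse(String ua) {
        if (ua == null || ua.isBlank()) return Optional.empty();
        Matcher m = CLIENT.matcher(ua);
        return m.find()
            ? Optional.of(new UserAgentInfo(m.group(1), m.group(2)))
            : Optional.empty();
    }
}
```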
…lExecutorService (WI-post-03a)
Extracts the DbArtifactIndex-specific ExecutorService decorator
into a reusable ContextualExecutorService in
pantera-core/http/context/ and fixes the
submit()/invokeAll()/invokeAny() context-propagation bypass the
Wave-3 reviewer flagged as risk #4. Closes WI-post-03a from
v2.2-next-session.md.
The new class wraps EVERY task-submission path — not just
execute(Runnable) — so ThreadContext + APM span propagate
regardless of how a caller submits work. Lifecycle methods
(shutdown, await, isShutdown, isTerminated) delegate directly.
DbArtifactIndex.createDbIndexExecutor() now returns
ContextualExecutorService.wrap(rawPool). The 72-line private
inner class DbIndexExecutorService is deleted.
13 new tests covering all submission paths + lifecycle + null
rejection + context restore + context restore on exception.
Acceptance: DbIndexExecutorService grep = 0;
ContextualExecutorService = 1 definition; pantera-core 923/923;
pantera-main 929/929.
…46 MdcPropagation calls (WI-post-03d)
Creates a shared ContextualExecutor-wrapped worker pool for
Vert.x HTTP API handlers and migrates every MdcPropagation.withMdc*
call in the api/v1/ package to use it. After this commit, the
handlers' async work propagates ThreadContext + APM span via the
executor, not per-call-site MdcPropagation wrappers.
Closes WI-post-03d from v2.2-next-session.md — eliminates the
single biggest MdcPropagation debt (~46 of the 110 remaining
call-sites documented in WI-03).
---
HandlerExecutor (new pantera-main/http/context/)
Shared bounded worker pool for Vert.x API handlers.
max(4, cpus) threads; queue 1000 (configurable via
PANTERA_HANDLER_EXECUTOR_THREADS / _QUEUE); AbortPolicy;
daemon threads named pantera-handler-N; core timeout 60s.
ContextualExecutorAdapter delegates execute(Runnable) through
ContextualExecutor.contextualize(pool).
Handlers migrated (46 MdcPropagation.withMdc* calls removed):
AdminAuthHandler 3 ArtifactHandler 1
AuthHandler 5 CooldownHandler 2
DashboardHandler 1 PypiHandler 2
RepositoryHandler 7 RoleHandler 6
SettingsHandler 6 StorageAliasHandler 6
UserHandler 7
Migration pattern:
Before: ctx.vertx().executeBlocking(MdcPropagation.withMdc(callable))
After: CompletableFuture.supplyAsync(supplier, HandlerExecutor.get())
.whenComplete((result, err) -> { ... })
5 new HandlerExecutorTest tests (context propagation, isolation,
daemon, thread naming, queue saturation).
Acceptance:
rg 'MdcPropagation.withMdc' api/v1 = 0
HandlerExecutor.get() in api/v1 = 46
pantera-main 934/934 pass; pantera-core 923/923 pass.
…ealed type (WI-04)
THE backbone WI of v2.2.0. Introduces GroupResolver — a clean
650-line implementation of the target-architecture §2 request
flow — alongside the deprecated GroupSlice. The new resolver
wires every v2.2.0 primitive (Fault + Result + RequestContext +
StructuredLogger + SingleFlight + NegativeCache + FaultTranslator)
into one coherent group-resolution path with three key behaviour
changes over GroupSlice:
1. TOCTOU fallthrough (architecture-review A11 fix).
Index hit + targeted member 404 now falls through to proxy
fanout instead of returning 500. The old code treated
targeted-member 404 as authoritative; the 02:01 outlier
(ValueNotFoundException for npm_proxy/columnify/meta.meta)
proved "bytes are local" is a false invariant under cache
eviction / storage rebalance.
2. AllProxiesFailed pass-through (§9 ranking).
When all proxies return 5xx with no 2xx winner, GroupResolver
constructs Fault.AllProxiesFailed with MemberOutcome records
and calls FaultTranslator.pickWinningFailure() to select the
best-ranked 5xx response (503 > 504 > 502 > 500, with-body
preferred, tie-break by declaration order). The upstream's
status + headers + body stream to the client verbatim with
X-Pantera-Fault + X-Pantera-Proxies-Tried headers.
3. Typed index errors.
DB error now surfaces as Fault.IndexUnavailable → 500 with
X-Pantera-Fault: index-unavailable. The old GroupSlice
silently fell through to full fanout on DB error, masking
index failures and producing false 404s.
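The §9 ranking in change 2 can be sketched as a comparator over MemberOutcome records — field names here are assumed from the text, and the "any other 5xx ranks last" rule is an assumption:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of pickWinningFailure: prefer 503, then 504, 502, 500;
// prefer responses that carry a body; tie-break by declaration order.
final class FailureRanking {
    record MemberOutcome(int declarationOrder, int status, boolean hasBody) {}

    private static int statusRank(int status) {
        return switch (status) {
            case 503 -> 0;
            case 504 -> 1;
            case 502 -> 2;
            case 500 -> 3;
            default  -> 4; // any other 5xx ranks last (assumption)
        };
    }

    static MemberOutcome pickWinningFailure(List<MemberOutcome> outcomes) {
        return outcomes.stream()
            .min(Comparator
                .comparingInt((MemberOutcome o) -> statusRank(o.status()))
                .thenComparing(o -> !o.hasBody())               // with-body first
                .thenComparingInt(MemberOutcome::declarationOrder))
            .orElseThrow(); // caller guarantees at least one 5xx outcome
    }
}
```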
---
IndexOutcome (new sealed interface)
pantera-main/src/main/java/com/auto1/pantera/index/
Hit(List<String> repos) | Miss() | Timeout(Throwable cause)
| DBFailure(Throwable cause, String query)
Includes fromLegacy(Optional<List<String>>) adapter for the
existing ArtifactIndex.locateByName contract (pantera-core is
frozen; the interface will be updated directly in a follow-up
when the freeze lifts).
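A sketch of the sealed hierarchy plus the fromLegacy adapter over the Optional-based ArtifactIndex contract. Treating an empty list the same as an empty Optional (both → Miss) is an assumption:

```java
import java.util.List;
import java.util.Optional;

// Sealed outcome type: exhaustive switch at the call site replaces
// the old "Optional.empty() means either miss OR failure" ambiguity.
sealed interface IndexOutcome {
    record Hit(List<String> repos) implements IndexOutcome {}
    record Miss() implements IndexOutcome {}
    record Timeout(Throwable cause) implements IndexOutcome {}
    record DBFailure(Throwable cause, String query) implements IndexOutcome {}

    // Adapter for the legacy Optional-based locateByName contract.
    // Empty Optional and empty list both map to Miss (assumption).
    static IndexOutcome fromLegacy(Optional<List<String>> legacy) {
        return legacy.filter(repos -> !repos.isEmpty())
                     .<IndexOutcome>map(Hit::new)
                     .orElseGet(Miss::new);
    }
}
```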
GroupResolver (new, 650 LOC)
pantera-main/src/main/java/com/auto1/pantera/group/
Implements Slice. Five-path decision tree per §2:
PATH 1: negative-cache hit → 404
PATH 2: DB error → Fault.IndexUnavailable → 500
PATH 3: index hit → targeted storage read
PATH 4: index miss + no proxies → 404 + neg-cache
PATH 5: index miss + all proxy 5xx → AllProxiesFailed → pass-through
GroupSlice (modified)
@deprecated(since = "2.2.0", forRemoval = true)
Implementation kept intact for backward compat — all existing
call-sites (GroupSliceFactory, test harnesses) continue to work.
Full deletion happens once callers migrate to GroupResolver
(follow-up: factory-level wiring).
MemberSlice — kept as concrete class with isProxy() boolean.
A sealed HostedMember/ProxyMember hierarchy would require a
200-line rewrite of MemberSlice's 222-LOC body (circuit breaker,
path rewriting, 8 constructors, 57 test references). The
design doc §3.5 said "pragmatism wins" — isProxy() is just as
expressive for GroupResolver's branching.
---
Tests (16 new GroupResolverTest):
negativeCacheHit_returns404WithoutDbQuery
indexHit_servesFromTargetedMember
indexHit_toctouDrift_fallsThroughToProxyFanout ← A11 fix
indexMiss_proxyFanout_firstWins
indexMiss_allProxy404_negCachePopulated
indexMiss_anyProxy5xx_allProxiesFailedPassThrough ← §9 ranking
indexMiss_mixedProxy404And5xx_allProxiesFailed
dbTimeout_returnsIndexUnavailable500
dbFailure_returnsIndexUnavailable500
noProxyMembers_indexMiss_returns404
emptyGroup_returns404
methodNotAllowed_forPostNonNpmAudit
singleFlightCoalescesProxyFanout
negativeCachePopulatedOnAllProxy404
targetedMemberSuccess_streamsResponse
indexOutcomeFromLegacy_mapsCorrectly
Pre-existing tests: 57 GroupSlice* tests unchanged (GroupSlice
is still functional, just @deprecated).
Acceptance:
rg '^public final class GroupResolver' = 1
rg '@deprecated' .../GroupSlice.java = 1
rg 'sealed interface IndexOutcome' = 1
pantera-main 950/950 pass (934 + 16 new)
pantera-core 923/923 pass
mvn -T8 install -DskipTests = BUILD SUCCESS
…rate cooldown MdcPropagation (WI-06)
Consolidates the five parallel NegativeCache instances (GroupSlice,
BaseCachedProxySlice, CachedNpmProxySlice, CachedPyProxySlice,
RepositorySlices) into one shared bean via NegativeCacheRegistry.
Introduces NegativeCacheKey(scope, repoType, artifactName,
artifactVersion) composite record for scope-partitioned caching
across hosted/proxy/group repo types.
Migrates 18 cooldown-package MdcPropagation.withMdc* calls to
ContextualExecutor-wrapped executors (JdbcCooldownService 8,
CooldownCache 3, FilteredMetadataCache 4,
CooldownMetadataServiceImpl 3).
Renames YAML config key meta.caches.group-negative to
meta.caches.repo-negative with backward-compat deprecation WARN.
Closes WI-06 from v2.2-next-session.md.
---
NegativeCacheKey (new)
record(scope, repoType, artifactName, artifactVersion)
flat() → "scope:type:name:version" for L2 Valkey key
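A sketch of the composite key record — the ":" delimiter comes from the flat() form quoted above; null-version handling is an assumption:

```java
import java.util.Objects;

// Scope-partitioned negative-cache key: the same artifact can be a known-404
// in one scope/repo-type without poisoning lookups in another.
record NegativeCacheKey(String scope, String repoType,
                        String artifactName, String artifactVersion) {
    NegativeCacheKey {
        Objects.requireNonNull(scope, "scope");
        Objects.requireNonNull(repoType, "repoType");
        Objects.requireNonNull(artifactName, "artifactName");
    }

    // "scope:type:name:version" — the flat L2 Valkey key form.
    // Null version collapses to an empty segment (assumption).
    String flat() {
        return scope + ":" + repoType + ":" + artifactName + ":"
             + (artifactVersion == null ? "" : artifactVersion);
    }
}
```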
NegativeCache (rewritten)
New NegativeCacheKey-based API:
isKnown404(NegativeCacheKey) → boolean
cacheNotFound(NegativeCacheKey) → void
invalidate(NegativeCacheKey) → void
invalidateBatch(List<NegativeCacheKey>) → CompletableFuture<Void>
Legacy Key-based API retained as @deprecated thin adapters.
Single shared instance via NegativeCacheRegistry.
NegativeCacheRegistry
setSharedCache(NegativeCache) / sharedCache() for DI.
RepositorySlices sets the shared cache at startup; all
consumers read from it.
Single-instance wiring:
RepositorySlices constructs ONE NegativeCache; 5 per-adapter
`new NegativeCache(...)` sites eliminated. 3 test-safety
fallback constructions remain (fire only when shared cache
not initialized — dead paths in production).
YAML rename:
RepositorySlices reads repo-negative first; falls back to
group-negative with deprecation WARN; defaults if neither.
Cooldown MdcPropagation migration (18 calls removed):
JdbcCooldownService, CooldownCache, FilteredMetadataCache,
CooldownMetadataServiceImpl — all async executors now wrapped
via ContextualExecutor.contextualize().
---
Tests (25 new):
NegativeCacheKeyTest (8)
NegativeCacheUnifiedTest (10)
NegativeCacheUploadInvalidationTest (4)
CooldownContextPropagationTest (2)
NegativeCacheSingleSourceTest (1)
Acceptance:
rg 'new NegativeCache\(' --glob '*.java' | rg -v test = 4
(1 canonical + 3 test-safety fallbacks)
rg 'MdcPropagation\.' cooldown/ = 0
pantera-core 948/948 pass; pantera-main 953/953 pass
BUILD SUCCESS
Follow-ups:
- Upload-path invalidation wiring across adapter handlers
(API ready; mechanical wiring deferred)
- Per-scope TTL overrides (NegativeCacheConfig.perScopeOverrides)
- Eliminate 3 test-safety NegativeCache fallback sites
…pagation calls (WI-08)
Eliminates RxJava2 (Maybe/SingleInterop/Flowable) from the three
hot-path files that carried it: DownloadAssetSlice (the npm tgz
serve path), BaseCachedProxySlice (the core proxy cache flow),
and NpmProxy (boundary adapter). Removes 17
MdcPropagation.withMdc* wrapper calls, replacing them with
ContextualExecutor-wrapped executors. Closes WI-08.
Also removes the last TODO(WI-post-07) marker from
CachedNpmProxySlice (npm ProxyCacheWriter wiring deferred as a
follow-up that requires deeper npm storage integration).
DownloadAssetSlice: RxJava Maybe.map().toSingle().to(SingleInterop)
chains replaced with CompletionStage-native calls via
NpmProxy.getAssetAsync(). 2 MdcPropagation wrappers removed.
NpmProxy: new getAssetAsync() boundary returns
CompletableFuture<Optional<NpmAsset>> (thin adapter over the
internal Maybe). 3 MdcPropagation calls replaced with a
ContextualExecutor-wrapped background scheduler.
BaseCachedProxySlice: Flowable.fromPublisher removed; raw
org.reactivestreams.Subscriber + Publisher used instead. 12
MdcPropagation wrappers removed across cacheFirstFlow,
fetchAndCache, cacheResponse, fetchDirect, tryServeStale, etc.
npm-adapter 191/191 pass; pantera-core 956/956 pass.
MdcPropagation in npm-adapter = 0. TODO(WI-post-07) across
codebase = 0.
… (WI-09)
Replaces the process-wide static DRAIN_EXECUTOR + DRAIN_DROP_COUNT
in GroupSlice and GroupResolver with per-repo drain executors
supplied by RepoBulkhead. Saturation in one repository's drain
pool can no longer starve every other group's response-body
cleanup.
Closes WI-09 from v2.2-next-session.md + architecture-review
anti-patterns A5 (static shared state), A16 (no per-repo
bulkheading), A19 (silent drop of resources under load).
RepoBulkhead (new pantera-core/http/resilience/)
Semaphore-based concurrency limiter per repository.
run(Supplier<CompletionStage<Result<T>>>) → Result.err(Fault.Overload)
on rejection. Per-repo drain executor (bounded ThreadPoolExecutor,
daemon, ContextualExecutor-wrapped). BulkheadLimits record
(maxConcurrent=200, maxQueueDepth=1000, retryAfter=1s defaults).
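A dependency-free sketch of the semaphore shape described above. Fault/Result are stand-ins (the real class returns Result.err(Fault.Overload)); the key invariant shown is that the permit is released on asynchronous completion, asynchronous failure, and synchronous throw alike:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Per-repo bulkhead sketch: tryAcquire rejects immediately when the
// repo is saturated instead of queueing unboundedly.
final class RepoBulkheadSketch<T> {
    private final Semaphore permits;

    RepoBulkheadSketch(int maxConcurrent) { this.permits = new Semaphore(maxConcurrent); }

    CompletionStage<T> run(Supplier<CompletionStage<T>> work) {
        if (!permits.tryAcquire()) {
            // Stand-in for Result.err(Fault.Overload) → 503 + Retry-After.
            return CompletableFuture.failedFuture(
                new IllegalStateException("overload: bulkhead saturated"));
        }
        try {
            return work.get().whenComplete((result, err) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // a synchronous throw must not leak the permit
            throw e;
        }
    }

    int availablePermits() { return permits.availablePermits(); }
}
```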
GroupResolver + GroupSlice
Static DRAIN_EXECUTOR + DRAIN_DROP_COUNT + static initializer
deleted. drainBody() now uses an instance-level drainExecutor
received via constructor. GroupSlice retains a LEGACY_DRAIN_POOL
fallback for backward-compat constructors used by tests.
RepositorySlices
Constructs a RepoBulkhead per group repo via getOrCreateBulkhead().
Passes bulkhead.drainExecutor() to the GroupSlice constructor.
Tests (10 new):
RepoBulkheadTest (8): reject, release on success/failure,
activeCount, defaults, repo accessor, drain accessor,
sync-exception-releases-permit
RepoBulkheadIsolationTest (2): saturated repo A doesn't
block repo B; independent drain executors per repo
Acceptance:
rg 'DRAIN_EXECUTOR|DRAIN_DROP_COUNT' pantera-main/src/main = 0
RepoBulkhead class count = 1
pantera-main 955/955 pass; pantera-core 956/956 pass
BUILD SUCCESS
…te (WI-10)
Delivers the release-gate infrastructure for v2.2.0:
SLO docs (7): one per adapter with availability/latency targets
from target-architecture §14 + burn-rate alert thresholds.
CI perf baseline: .github/workflows/perf-baseline.yml runs on
PR, compares p50/p95/p99 against committed baselines
(tests/perf-baselines/), fails on >10% p99 regression.
scripts/perf-benchmark.sh + perf-compare.sh.
Chaos tests (4 classes, 11 @tag("Chaos") tests):
ChaosMemberTimeoutTest — 30s proxy member; deadline-bounded
ChaosDbStallTest — 500ms DB stall → IndexUnavailable
ChaosQueueSaturationTest — 100 concurrent requests under load
ChaosStorageEvictionTest — TOCTOU eviction → proxy fallthrough
scripts/release-gate.sh — runs full suite + chaos + perf gates.
All 11 chaos tests pass standalone:
mvn -pl pantera-main test -Dgroups=Chaos
Adds the admin panel for negative-cache inspection, invalidation,
and stats as specified in target-architecture §5.6.
Backend (NegativeCacheAdminResource):
GET /api/v1/admin/neg-cache = paginated L1 entries
GET /api/v1/admin/neg-cache/probe = single-key presence check
POST /api/v1/admin/neg-cache/invalidate = single-key
POST /api/v1/admin/neg-cache/invalidate-pattern = rate-limited 10/min
GET /api/v1/admin/neg-cache/stats = per-scope counters
All require admin role. Pattern invalidation rate-limited. Every
invalidation emits Tier-4 WARN with
event.action=neg_cache_invalidate.
Frontend (NegativeCacheView.vue):
Three-tab Vue 3 Composition API page under /admin/neg-cache:
Inspector (filterable DataTable + probe), Invalidation (single +
pattern with confirm dialog), Stats (dashboard cards).
9 integration tests covering auth, CRUD, rate-limit, logging.
Closes WI-06b.
…ly automatic (WI-post-03c)
Removes the 446-LOC MdcPropagation helper class that was the
source of architecture-review anti-pattern A14 ("MDC propagation
is manual boilerplate — 7+ wrappers per request path, each one a
silent context-loss trap if forgotten").
All 31 remaining production call-sites eliminated:
GroupSlice.java: 7 wrappers removed
MavenGroupSlice.java: 12 wrappers removed
MdcPropagation.java: 9 self-references (class + javadoc)
ContextualExecutor.java: 1 javadoc reference updated
HandlerExecutor.java: 2 javadoc/comment references updated
Context propagation is now fully handled by ContextualExecutor-
wrapped executors at every async boundary (SingleFlight, drain
pools, DbArtifactIndex, HandlerExecutor). No per-call-site
MdcPropagation.withMdc* wrappers anywhere in the codebase.
Closes WI-post-03c + architecture-review anti-patterns A14, C4.
Deleted:
pantera-core/src/main/java/com/auto1/pantera/http/trace/MdcPropagation.java (-446 LOC)
pantera-core/src/test/java/com/auto1/pantera/http/trace/MdcPropagationTest.java
pantera-core 947/947 pass; pantera-main 975/975 pass.
MdcPropagation grep across production = 0.
…up, fix hex→hexpm
Move cooldown classes from flat package into SOLID sub-packages:
- api/: CooldownService, CooldownInspector, CooldownRequest, CooldownResult,
CooldownBlock, CooldownReason, CooldownDependency
- cache/: CooldownCache
- config/: CooldownSettings, CooldownCircuitBreaker, InspectorRegistry
- impl/: CachedCooldownInspector, NoopCooldownService
- response/: CooldownResponses, CooldownResponseFactory (new),
CooldownResponseRegistry (new)
- metadata/: unchanged (already sub-packaged)
- metrics/: unchanged (already sub-packaged)
Rename CooldownMetadataServiceImpl -> MetadataFilterService.
Remove dead-code root CooldownMetrics.java (duplicate of metrics/).
Update package statements and imports across 87 files.
No behaviour change.
… (H5)
The inflight map in CooldownCache.queryAndCache() had a race
condition where entries were not removed on exceptional
completion or cancellation. The root cause was inflight.put()
happening after whenComplete() was registered, so if the future
completed before put() ran, the remove() in whenComplete() would
fire before the put(), leaving a zombie entry.
Fix: register in inflight BEFORE attaching the whenComplete
handler, and add .orTimeout(30, SECONDS) as a zombie safety net.
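The corrected ordering can be sketched as follows — names are illustrative, not the real CooldownCache signature; the point is that the entry is registered before the removal handler is attached, so cleanup can never race ahead of registration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Sketch of the H5 fix: (1) register the inflight entry first,
// (2) only then attach the whenComplete cleanup,
// (3) orTimeout as a zombie safety net if the loader never completes.
final class InflightMap<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inflight =
        new ConcurrentHashMap<>();

    CompletableFuture<V> queryAndCache(K key, Function<K, CompletableFuture<V>> loader) {
        CompletableFuture<V> fresh = new CompletableFuture<>();
        CompletableFuture<V> existing = inflight.putIfAbsent(key, fresh); // (1) register first
        if (existing != null) return existing;                            // coalesce callers
        fresh.whenComplete((v, err) -> inflight.remove(key, fresh));      // (2) then cleanup
        loader.apply(key)
              .orTimeout(30, TimeUnit.SECONDS)                            // (3) safety net
              .whenComplete((v, err) -> {
                  if (err != null) fresh.completeExceptionally(err);
                  else fresh.complete(v);
              });
        return fresh;
    }

    int inflightCount() { return inflight.size(); }
}
```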
Add MetadataParser.extractReleaseDates() default method so
adapters can expose release timestamps from their metadata
format. NpmMetadataParser implements this by delegating to the
existing ReleaseDateProvider. After parsing,
MetadataFilterService bulk-populates CooldownCache L1 with
allowed=false for versions older than the cooldown period. This
eliminates DB/Valkey round-trips on the hot path for the
majority of versions that are well past the cooldown window.
…ecutor (H2)
Replace sequential version evaluation with parallel dispatch on
a dedicated 4-thread ContextualExecutorService-wrapped pool.
Each evaluateVersion() call is dispatched via
CompletableFuture.supplyAsync() on the bounded pool, then
collected with CompletableFuture.allOf(). 50 versions with
L1-cached results now complete under 50 ms.
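The H2 fan-out shape can be sketched as below — the real pool is 4 threads and ContextualExecutorService-wrapped; the names here are illustrative:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Sketch of parallel per-version dispatch: supplyAsync each evaluation on
// a bounded pool, gate on allOf, then collect results in input order.
final class ParallelEvaluate {
    static <V, R> List<R> evaluateAll(List<V> versions,
                                      Function<V, R> evaluate,
                                      ExecutorService pool) {
        List<CompletableFuture<R>> futures = versions.stream()
            .map(v -> CompletableFuture.supplyAsync(() -> evaluate.apply(v), pool))
            .toList();
        // allOf completes once every dispatch has; the joins below then
        // return immediately instead of blocking one by one.
        CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();
        return futures.stream().map(CompletableFuture::join).toList();
    }
}
```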
…to request path (Phase 6) Introduce a CooldownAdapterBundle record and CooldownAdapterRegistry to hold per-repo-type parser/filter/rewriter/detector/responseFactory components. CooldownWiring registers all 7 adapters (maven, npm, pypi, docker, go, composer; gradle aliased to maven) at startup via CooldownSupport.create(). BaseCachedProxySlice now looks up the per-adapter CooldownResponseFactory from the registry when building 403 responses, falling back to the deprecated CooldownResponses.forbidden() for unregistered types. Also: CooldownResponseRegistry promoted to singleton, CooldownResponses deprecated with @Deprecated(forRemoval = true), NpmMetadataRequestDetector created to complete the npm adapter bundle.
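A minimal sketch of the bundle/registry shape, assuming reduced components (the real bundle carries parser/filter/rewriter/detector/responseFactory):

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical skeleton of the per-repo-type bundle wiring; only the
// response-factory component is kept to show the record + registry + alias
// pattern (gradle aliased to maven).
public class AdapterRegistrySketch {
    interface CooldownResponseFactory { String forbidden(String block); }

    record CooldownAdapterBundle(String repoType, CooldownResponseFactory responseFactory) {}

    static final Map<String, CooldownAdapterBundle> registry = new ConcurrentHashMap<>();

    static void register(CooldownAdapterBundle bundle) {
        registry.put(Objects.requireNonNull(bundle.repoType()), bundle);
    }

    // Alias shares the canonical adapter's components under a second key.
    static void alias(String aliasType, String canonicalType) {
        register(new CooldownAdapterBundle(aliasType,
                registry.get(canonicalType).responseFactory()));
    }
}
```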
…se 7)
- MetadataFilterServiceIntegrationTest: end-to-end with the Go adapter format, verifying filtered output, cache hit, SWR behaviour, invalidation
- CooldownAdapterRegistryTest: bundle registration, alias lookup, null rejection, overwrite, clear
- CooldownConcurrentFilterStampedeTest (@Tag("Chaos")): 100 concurrent requests for the same uncached metadata, parser runs <= 5 times (stampede dedup), all callers get consistent filtered bytes
…hase 8)
- New docs/cooldown-metadata-filtering.md: two-layer enforcement overview, 7-adapter table, per-adapter metadata format details, performance characteristics (H1-H5), admin operations, configuration reference, package structure, test summary
- CHANGELOG-v2.2.0.md: added a Cooldown Metadata Filtering section covering Phases 1-8 (package restructure, 5 performance hardenings, 7 adapter implementations, 403 response factories, admin hardening, bundle registration, integration + chaos tests)
- docs/analysis/v2.2.0-pr-description.md: updated summary and added a cooldown phases checklist to the PR body
Per v2.2 spec 'Change existing Hex entry: value: hex to value: hexpm',
removes the legacy 'hex' key from TECH_MAP, REPO_TYPE_FILTERS entries, and
techSetup mappings. SettingsView now emits 'hexpm-proxy' instead of
'hex-proxy' — matches the canonical family key ApiRoutingSlice normalizes to.
SearchView.vue's startsWith('hex') prefix is retained as it still matches
hexpm.
…r picker
- New gradle.md: describes the gradle / gradle-proxy / gradle-group family as a Maven-format alias with its own UI/API surface.
- go.md: adds go-proxy and go-group sections.
- ui-guide.md: documents the AutoComplete group-member picker, the type-compatibility rule, and the inline 'Create new' modal.
- index.md: links the new gradle guide.
… shape Reduces per-call-site boilerplate when wiring RepositorySlices from the deprecated GroupSlice to GroupResolver. Accepts the same (SliceResolver, memberNames, port, depth, timeout, ...) tuple the legacy class took and builds the MemberSlice list internally via a static buildMembers() helper, then delegates to the existing member-accepting constructor.
…ences MdcPropagation was deleted in pantera-core per v2.2-target-architecture section 4.4. Three test files and one analysis doc still mentioned the deleted class by name. All textual references updated to the current ContextualExecutor / TraceContextExecutor primitives; no semantic changes (every reference was in javadoc/comments/DisplayName — never a live call site).
…e-eval tests Covers the two remaining cases from the cooldown-metadata-filtering Task 15: a cooldown-duration policy change must invalidate the FilteredMetadataCache (flipping block decisions for affected versions); and a successful invalidate() call after an upstream publish must force a re-parse so the newly-published version is returned on the next query.
Two chaos scenarios from the cooldown-metadata-filtering plan Task 19: a slow/unreachable L2 must not block reads served by a warm L1, and a bounded L1 under high write cardinality (10x capacity) must evict old entries without OOM.
Replaces the 4 'new GroupSlice(...)' instantiations in RepositorySlices with 'new GroupResolver(...)' — npm-group, file/php-group, maven-group, and the generic group-adapter case (gem/go/gradle/pypi/docker). GroupResolver is now the sole production group-resolution engine. Closes the WI-04 wiring step from v2.2-target-architecture. The deprecated GroupSlice class itself is removed in the follow-up commit. All 984 pantera-main tests pass post-wiring.
- Delete GroupSlice.java (1338 LOC) — superseded by GroupResolver, which was wired in the previous commit.
- Delete GroupSliceTest, GroupSliceFlattenedResolutionTest, GroupSliceIndexRoutingTest, GroupSlicePerformanceTest — obsolete.
- Rename GroupSliceMetrics -> GroupResolverMetrics (Micrometer wrapper class; callers updated in VertxMain and GroupResolver).
- Update stale javadoc and inline comments across 15 files to point at GroupResolver instead of the deleted class.
- NegativeCacheSingleSourceTest: drop GroupSlice.java from the allowed-sites list.
Zero remaining compile dependencies on GroupSlice. Full test suite green (4397 tests, 0 failures, 0 errors).
… retire stale GroupSlice javadoc
B2b: migrate all 12 production call-sites of the @Deprecated(forRemoval=true) CooldownResponses.forbidden(block) helper to CooldownResponseRegistry.instance().get(repoType).forbidden(block). CooldownResponses was deleted in the preceding commit. Sites:
- files-adapter/FileProxySlice (repoType file-proxy; factory absent, path is unreachable because FileProxySlice wires Noop cooldown)
- npm-adapter/DownloadAssetSlice (repoType from field)
- pypi-adapter/ProxySlice (repoType from field)
- composer-adapter/CachedProxySlice x3, ProxyDownloadSlice
- go-adapter/CachedProxySlice
- pantera-core/BaseCachedProxySlice fallback — now throws IllegalStateException on missing factory (no silent fallback)
- pantera-main/DockerProxyCooldownSlice x3
CooldownWiring adds response-factory aliases so every repoType string that reaches the registry resolves: npm-proxy, pypi-proxy, docker-proxy, go-proxy, php, php-proxy.
Also lands the tail of A5: stale GroupSlice javadoc / inline comment references updated across 16 files to point at GroupResolver. The deleted class has no lingering textual footprint in production code.
Full test suite: 4397 tests, 0 failures, 0 errors.
Documents the GroupResolver wire-up, GroupSlice deletion, CooldownResponses removal, hex->hexpm UI conformity, Task 15/19 test coverage, and the MdcPropagation reference cleanup.
…rage Docker TrimmedDocker.trim() and SubStorage.list() both called Pattern.compile per invocation — at 1000 req/s these sites amounted to thousands of compile allocations/second. Pattern is now a final field compiled once in the ctor. Regex semantics preserved exactly; all existing unit tests pass.
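The hoisting pattern in isolation (illustrative class and regex, not the real TrimmedDocker/SubStorage code):

```java
import java.util.regex.Pattern;

// Before: Pattern.compile(...) ran on every call. After: compiled once in the
// ctor and reused. Pattern is immutable and thread-safe; Matcher is not, so a
// fresh Matcher per call is still correct.
public class PrefixTrimmer {
    private final Pattern prefix; // final field, compiled exactly once

    public PrefixTrimmer(String repo) {
        this.prefix = Pattern.compile("^/?" + Pattern.quote(repo) + "/");
    }

    public String trim(String path) {
        return prefix.matcher(path).replaceFirst("");
    }
}
```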
…11 adapter sites
Direct adapter call sites previously used registry.get(repoType).forbidden(...)
which NPEs if the factory is missing — losing the descriptive repoType
context. Add a getOrThrow() helper that produces
IllegalStateException('No CooldownResponseFactory registered for
repoType: X'), migrate all 11 production sites across files / npm /
pypi / composer / go / docker adapters, and collapse
BaseCachedProxySlice's inline null-check to call getOrThrow directly.
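The getOrThrow() shape, sketched against a plain map (the real registry is CooldownResponseRegistry; the exception message matches the commit text):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A missing factory now fails with a message naming the repoType, instead of
// a bare NPE at the call site when chaining registry.get(...).forbidden(...).
public class ResponseFactoryLookup {
    interface CooldownResponseFactory { }

    private final Map<String, CooldownResponseFactory> factories = new ConcurrentHashMap<>();

    public void register(String repoType, CooldownResponseFactory f) {
        factories.put(repoType, f);
    }

    public CooldownResponseFactory getOrThrow(String repoType) {
        CooldownResponseFactory f = factories.get(repoType);
        if (f == null) {
            throw new IllegalStateException(
                "No CooldownResponseFactory registered for repoType: " + repoType);
        }
        return f;
    }
}
```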
…bian streams
XmlPrimaryChecksums and FilePackageHeader previously opened InputStreams eagerly in their ctors; if the consuming method was never invoked, the stream leaked. Both now store only the Path and open inside the consuming method under try-with-resources.
rpm: Gzip unpackTar now wraps GzipCompressorInputStream in the same try-with as TarArchiveInputStream, so the native Inflater is released if the tar wrapper ctor throws.
Debian: MultiPackages.merge wraps both GZIP streams in try-with-resources; caller-owned outer streams are protected by a non-closing wrapper adapter.
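The leak-safety pattern, shown with stdlib streams only (the real sites use commons-compress GzipCompressorInputStream/TarArchiveInputStream): both resources are declared in the same try-with-resources, so the already-open inner stream is closed even if the wrapper's constructor throws.

```java
import java.io.*;
import java.util.zip.*;

// Sketch of the pattern: the compressor stream and its wrapper sit in the
// SAME try-with-resources. Wrapping an un-managed inner stream would leak its
// native resources (the Inflater, in the gzip case) if the outer ctor threw.
public class NestedStreamSafety {
    static String readGzipped(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] gzip(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(s.getBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```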
… WorkerExecutor
DbArtifactIndex's executor used CallerRunsPolicy — under queue saturation at 1000 req/s, the caller (potentially a Vert.x event-loop thread) would run the JDBC query inline. Switch to AbortPolicy. GroupResolver already maps the resulting RejectedExecutionException to Fault.IndexUnavailable via the existing exceptionally(...) branch, which FaultTranslator returns as 500 with X-Pantera-Fault: index-unavailable.
Chaos test DbArtifactIndexSaturationTest asserts that overflow submissions surface REE, never execute on a Vert.x event-loop thread, and classify to the expected typed fault.
Also deletes dead code in AsyncApiVerticle — an api-workers WorkerExecutor was created but never referenced by any route (verified via grep).
Cross-cutting cancel-propagation: at 1000 req/s, client disconnects mid-response previously did not cancel upstream Jetty fetches — bytes kept streaming into dead sockets until the next write organically failed, wasting upstream bandwidth and holding file handles. Changes:
- VertxSliceServer: register closeHandler on request.connection(), exceptionHandler on response and request; capture the reactive-streams Subscription via doOnSubscribe and cancel it on any disconnect signal. accept() signature extended with a trailing AtomicReference<Runnable> cancelHook (private method).
- ArtifactHandler: capture the Disposable returned by Flowable.subscribe on both download handlers; dispose on response closeHandler / exceptionHandler.
- StreamThroughCache, DiskCacheStorage: add doOnCancel matching the existing doOnError — close channel + delete temp file on cancel.
- VertxRxFile.save: safety-net doOnError closes the AsyncFile if upstream errors before the subscriber sees data.
- Http3Server: bound per-stream buffer via PANTERA_HTTP3_MAX_STREAM_BUFFER_BYTES (default 16 MB); reject on overflow per spec (spill-to-file is deferred — current HTTP/3 body handling is stub-level per existing code comments).
…efaults
Adds a CachedUsers-style decorator in front of LocalEnabledFilter that caches the per-user 'enabled' flag in L1 Caffeine + L2 Valkey with CacheInvalidationPubSub-backed cross-node eviction. At 1000 req/s the previous synchronous JDBC hit per request could exhaust the 50-connection Hikari pool under any DB latency spike; the expected cache hit rate is >95%, cutting JDBC pressure to roughly once per user per TTL.
All cache settings honor the 3-tier precedence (env var -> YAML -> compile default). Defaults land in GlobalCacheConfig:
meta.caches.auth-enabled.l1.maxSize = 10000
meta.caches.auth-enabled.l1.ttlSeconds = 300
meta.caches.auth-enabled.l2.enabled = true
meta.caches.auth-enabled.l2.ttlSeconds = 3600
meta.caches.auth-enabled.l2.timeoutMs = 100
UserHandler wires CachedLocalEnabledFilter.invalidate(username) on putUser / deleteUser / enableUser / disableUser / alterPassword so admin changes propagate to the cache. Pub/sub broadcasts the invalidation across cluster nodes.
Hikari ArtifactDbFactory defaults tightened:
connectionTimeout 5000 -> 3000 ms
leakDetectionThreshold 300000 -> 5000 ms
Env-var overrides (PANTERA_DB_*) unchanged. Operators may now see Hikari leak WARNs that were silent before — each one is a real held-connection bug to triage.
Added ConfigDefaults.getBoolean() for future config sections. CacheInvalidationPubSub gained subscribe(String namespace, Consumer<String>) as a thin wrapper over the existing Cleanable-based register(...) API.
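The decorator shape, reduced to a stdlib map (the real L1/L2 tiers are Caffeine and Valkey with TTLs, and invalidation also fans out over pub/sub):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch: cache-in-front-of-delegate decorator for the per-user enabled flag.
// A hit never touches the delegate (JDBC in the real code); invalidate() is
// what the admin mutation paths call.
public class CachedEnabledFilter implements Predicate<String> {
    private final Predicate<String> delegate;            // JDBC-backed filter
    private final Map<String, Boolean> l1 = new ConcurrentHashMap<>();

    public CachedEnabledFilter(Predicate<String> delegate) {
        this.delegate = delegate;
    }

    @Override public boolean test(String user) {
        return l1.computeIfAbsent(user, delegate::test); // DB hit only on miss
    }

    // Called from putUser/deleteUser/enableUser/disableUser/alterPassword; in
    // the real code a pub/sub message triggers the same eviction on peers.
    public void invalidate(String user) {
        l1.remove(user);
    }
}
```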
…ataRegistry sanity cap
The previous 'lastKnownGood' ConcurrentHashMap was unbounded — under high-cardinality workloads the stale-fallback store grew indefinitely. Replace with a full 2-tier cache (L1 Caffeine + L2 Valkey) driven by the new meta.caches.group-metadata-stale section in GlobalCacheConfig.
Design principle: the cache is an aid, never a breaker. Under realistic cardinality no eviction ever fires — the bounds are a JVM-memory safety net against pathological growth, not an expiry mechanism. Graceful degradation on read (L1 -> L2 -> expired-primary-cache-entry -> miss) preserves the 'stale forever' availability semantic even when both stale tiers evict, because the primary-cache entry lingers past TTL in Caffeine's internal map. Across JVM restarts L2 now survives (the old CHM did not), strictly improving availability.
Config (every threshold env-var + YAML overridable; compile-time fallback only when both are absent):
meta.caches.group-metadata-stale.l1.maxSize = 100000
meta.caches.group-metadata-stale.l1.ttlSeconds = 2592000  # 30 days
meta.caches.group-metadata-stale.l2.enabled = true
meta.caches.group-metadata-stale.l2.ttlSeconds = 0  # Valkey LRU owns eviction
meta.caches.group-metadata-stale.l2.timeoutMs = 100
Also adds JobDataRegistry overflow detection: at 10000 entries (env PANTERA_JOB_DATA_REGISTRY_MAX) emit an ECS error log naming a key prefix so operators can find the leaking scheduler site. Never silently drops — the entry is still accepted. The lifecycle audit sweeper remains a P2 follow-up.
GroupMetadataCache's public API is unchanged (getStale retained as a @Deprecated delegating alias). Existing callers and GroupMetadataCacheTest continue to work. New GroupMetadataCacheStaleFallbackTest covers the 4-step degradation path.
…jectMappers
ArtifactHandler download paths copied every chunk via new byte[] + buf.get(bytes) + Buffer.buffer(bytes). At 1000 req/s x 5 MB bodies x 64 KB chunks that produced ~80000 byte[] allocations/s straight to garbage. Replaced with Buffer.buffer(Unpooled.wrappedBuffer(buf)) — zero copy, zero allocation. The heap-ByteBuffer wrap is GC-managed; Vert.x releases it on write completion.
Yaml2Json and Json2Yaml created a fresh ObjectMapper (and a fresh YAMLMapper for Json2Yaml) on every call. Hoisted both to static final JSON and YAML fields; Jackson feature configuration is applied once at static init — safe under the JMM. Admin plane, not request-hot, but still wrong.
Preserves Group A's Disposable capture + closeHandler wiring on both ArtifactHandler download paths (verified post-edit at the expected lines).
Replaces the reflective MetadataMerger-based merge path in
MavenGroupSlice with a new StreamingMetadataMerger using StAX (hardened
against XXE). The merger accumulates only the deduplicated <version>
TreeSet and the newest-wins scalars (<latest>, <release>,
<lastUpdated>, <snapshot>) — peak memory is O(unique versions), not
O(sum of member body sizes).
Per-member bodies are still buffered as byte[] on arrival (the async
fetch returns CompletableFuture<byte[]>), but each byte[] becomes
unreachable as soon as mergeMember(...) returns — the previous path
accumulated every member's full body in a ByteArrayOutputStream list
passed to the SAX-based reflective merger. Full wire-streaming would
require plumbing a publisher->InputStream adapter through Content and
is deferred.
Maven version ordering delegates to org.apache.maven.artifact.versioning.
ComparableVersion (already on the pantera-main classpath via the
maven-adapter transitive dep).
Malformed or truncated member bodies are skipped with a WARN
(event.reason=member_metadata_parse) — remaining members still merge
successfully.
Alert-only histogram pantera.maven.group.member_metadata_size_bytes
(tagged with repo_name) records per-member body size. No rejection at
any size: any cap introduced here could synthesize a client-facing 502
for legitimately large metadata, which would be a worse failure mode
than the original. The histogram surfaces outliers to ops without
breaking resolution.
Also replaces a 20-iteration 'String.format("%02x", b)' checksum hex
loop with java.util.HexFormat.of().formatHex(digest) (single
allocation per request; mirrors the existing ProxyCacheWriter.HEX
idiom).
MavenGroupSlice public API unchanged; MavenGroupSliceTest (8/8 green).
New StreamingMetadataMergerTest: 7 cases covering disjoint + overlapping
versions, max-scalar semantics, malformed-member skip, all-empty
minimal output.
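A reduced stdlib sketch of the streaming merge (natural String ordering stands in for ComparableVersion, and scalar tracking is omitted), including the XXE hardening and the skip-on-malformed behaviour:

```java
import java.io.StringReader;
import java.util.TreeSet;
import javax.xml.stream.*;

// Sketch: XXE-hardened StAX reader feeding a deduplicating TreeSet of
// <version> texts — peak memory is O(unique versions). A malformed member is
// skipped wholesale (real code logs WARN event.reason=member_metadata_parse).
public class StreamingMergeSketch {
    private final TreeSet<String> versions = new TreeSet<>();
    private static final XMLInputFactory FACTORY = XMLInputFactory.newFactory();
    static {
        // XXE hardening: no external entities, no DTDs.
        FACTORY.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        FACTORY.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    }

    public boolean mergeMember(String memberXml) {
        TreeSet<String> member = new TreeSet<>();
        try {
            XMLStreamReader r = FACTORY.createXMLStreamReader(new StringReader(memberXml));
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && "version".equals(r.getLocalName())) {
                        member.add(r.getElementText().trim());
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            return false; // member skipped; previously-merged versions intact
        }
        versions.addAll(member); // only reached for well-formed members
        return true;
    }

    public TreeSet<String> versions() { return versions; }
}
```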
…XY_PROTOCOL flag
When Pantera's HTTP/3 listener is fronted by an NLB (or any proxy-protocol-v2 LB), the real client IP is carried in the PROXY prelude — without it, getRemoteAddr() returns the LB IP. Mirrors the existing Vert.x HTTP/1+2 use-proxy-protocol pattern in AsyncApiVerticle / VertxMain.
Prepends Jetty's ProxyConnectionFactory to the QuicheServerConnector's factory varargs when the flag is true. Default false — zero behavior change. Emits an INFO startup log event.action=http3_proxy_protocol_enabled with url.port when enabled.
Env-only for now (PANTERA_HTTP3_PROXY_PROTOCOL); the YAML path meta.http3.proxyProtocol is not wired because Http3Server's public ctor does not currently take a Settings object. Documented as a follow-up. Preserves Group A's MAX_STREAM_BUFFER_BYTES field + its buffer-cap enforcement.
Follow-up to the CallerRunsPolicy -> AbortPolicy switch. When the index executor's queue fills and AbortPolicy fires, CompletableFuture.supplyAsync rethrows RejectedExecutionException SYNCHRONOUSLY, before the caller receives a future — meaning a caller on the Vert.x event loop would see the raw exception propagate up the stack instead of getting a failed CompletableFuture to chain onto.
Wrap supplyAsync in a try/catch(REE) that returns CompletableFuture.failedFuture(ree) so callers always get a proper future regardless of saturation state. GroupResolver's existing exceptionally(...) path then maps the REE to Fault.IndexUnavailable uniformly, whether the rejection happens sync or async.
Extract the JDBC body into a private locateByNameBody(String) helper — no logic change, just hoisted out of the lambda so the outer try/catch is straightforward.
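The wrap itself is small; a stdlib sketch (locateByNameBody reduced to a stand-in supplier):

```java
import java.util.concurrent.*;

// A saturated AbortPolicy executor makes supplyAsync throw
// RejectedExecutionException synchronously; converting it to a failed future
// gives callers one uniform, async failure path to chain exceptionally() on.
public class SafeSupplyAsync {
    static CompletableFuture<String> locateByName(String name, ExecutorService pool) {
        try {
            return CompletableFuture.supplyAsync(() -> "row:" + name, pool);
        } catch (RejectedExecutionException ree) {
            return CompletableFuture.failedFuture(ree); // never leaks a sync throw
        }
    }
}
```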
…gelog
Documents the Groups A-H production-readiness pass from the Opus 4.7 audit (2026-04-18):
Admin guide:
- cache-configuration.md: consolidated meta.caches.* reference with 3-tier override precedence (env -> YAML -> default).
- valkey-setup.md: maxmemory-policy=allkeys-lru requirement, retention sizing per cache section.
- database.md: Hikari fail-fast section with canary ramp instructions.
- deployment-nlb.md: PANTERA_HTTP3_PROXY_PROTOCOL flag for HTTP/3 behind NLB.
- runbooks.md: new 5xx signals (X-Pantera-Fault: index-unavailable, storage-unavailable, deadline-exceeded, overload, upstream-integrity), AllProxiesFailed pass-through behavior change.
- v2.2-deployment-checklist.md: pre/during/post-deploy steps with specific metric thresholds.
- environment-variables.md: added auth / stale-cache / HTTP/3 / scheduler env vars; Hikari defaults updated to 3000/5000.
Developer guide:
- caching.md: canonical L1 Caffeine + L2 Valkey + pub/sub pattern with reference classes; 'cache is an aid, never a breaker' principle.
- fault-model.md: new emitter DbArtifactIndex -> Fault.IndexUnavailable.
- reactive-lifecycle.md: cancel-propagation contract; three-terminal-path pattern (complete/error/cancel) with CachingBlob.content as the canonical example.
- cooldown.md: prefer getOrThrow(repoType) over get(repoType); adapter factory registration is now a startup-time hard requirement.
User guide:
- response-headers.md: X-Pantera-Fault, X-Pantera-Proxies-Tried, X-Pantera-Stale, X-Pantera-Internal.
- error-reference.md: 500 index-unavailable (retry), 500 storage-unavailable (retry), 502 upstream-integrity, 502 AllProxiesFailed pass-through.
- streaming-downloads.md: server-side cancel propagation — no client action needed.
CHANGELOG-v2.2.0.md: top-of-file highlights mention the Opus 4.7 audit, plus a new Production-readiness hardening section with one paragraph per group (A, B, C, D+E.3+E.4, E.2, E.1, G, H.1, H.2, H.3, F). Existing content preserved.
…/superpowers/
- Merge CHANGELOG-v2.2.0.md into CHANGELOG.md as a new 'Version 2.2.0' section at the top, matching the existing emoji-section-header style used for 2.1.3 (Architectural / Performance / Bug fixes / Cleanup / Added / Changed / Deprecated / Observability / Security / Docs / Testing / Migration). Every bullet carries the [@aydasraf](...) attribution line. CHANGELOG-v2.2.0.md is deleted — CHANGELOG.md is now the single source of truth.
- Untrack all docs/superpowers/ plans and specs from the working tree. These are local working notes only; .gitignore already excludes the directory for new files, but 9 files committed in earlier sessions are now removed from the index. A history rewrite to also purge them from prior commits will be handled separately (it requires a force-push to the shared 2.2.0 branch).
v2.2.0 — target-architecture train + cooldown metadata filtering
Summary
This PR lands the first eight work items of the v2.2 target architecture plus cooldown metadata filtering across all 7 adapters. The target-architecture items are WI-00 (queue/log hotfix), WI-01 (Fault + Result sum types), WI-05 (SingleFlight coalescer), WI-07 (ProxyCacheWriter + Maven checksum integrity), WI-post-05 (retire RequestDeduplicator; promote FetchSignal), WI-post-07 (wire ProxyCacheWriter into pypi/go/composer), WI-02 (full RequestContext + Deadline + ContextualExecutor), and WI-03 (StructuredLogger 5-tier + LevelPolicy + AuditAction). The cooldown work delivers two-layer enforcement (soft metadata filter + hard 403) for Maven, npm, PyPI, Docker, Go, Composer, and Gradle with 5 performance hardenings (H1-H5), a SOLID package restructure, admin/invalidation hardening, and 250+ tests including chaos tests. Full design rationale is in docs/cooldown-metadata-filtering.md and the target-architecture doc; forensic evidence for every "before/after" claim is in docs/analysis/v2.1.3-post-deploy-analysis.md.
Work items shipped (8)
- WI-00 — queue/log hotfix (commit 4242ea94)
  - queue.add() → queue.offer() across every request-serving enqueue site in npm / pypi / go / docker / helm / rpm / hex / nuget / composer / core
  - EventsQueueMetrics shared drop-counter + single-WARN-per-drop
  - DownloadAssetSliceQueueFullTest: 50 concurrent cache-hits over a saturated queue → 50 × 200
- WI-01 — Fault + Result sum types (commit 08684bc0)
  - Fault hierarchy (NotFound, Forbidden, IndexUnavailable, StorageUnavailable, AllProxiesFailed, UpstreamIntegrity, Internal, Deadline, Overload)
  - Result<T> with map/flatMap
  - FaultClassifier for .exceptionally(...) fallback
  - FaultTranslator — single HTTP-status decision point; implements the §2 worked-examples table (retryability > body > declaration-order) including the AllProxiesFailed pass-through contract
  - fault package
- WI-05 — SingleFlight<K,V> (commit 03214a9e)
  - AsyncCache-backed; per-caller cancellation isolation; stack-flat follower dispatch; zombie eviction via CompletableFuture.orTimeout
  - GroupSlice.inFlightFanouts, MavenGroupSlice.inFlightMetadataFetches, CachedNpmProxySlice (RequestDeduplicator) — field names retained, only the type changes
- WI-07 — ProxyCacheWriter + Maven checksum integrity (commit c165f38f)
  - primary + sidecars with streamed NIO temp-file + four concurrent MessageDigest accumulators
  - Fault.UpstreamIntegrity on sidecar disagreement; nothing lands in the cache
  - scripts/pantera-cache-integrity-audit.sh with --dry-run/--fix for healing pre-existing drift: oss-parent-58.pom.sha1 hex
- Version bump (commit 9b8e0055)
  - pom.xml + all 30 module poms
  - mvn install now produces pantera-main-2.2.0.jar and the image tags pantera:2.2.0
- WI-post-05 — retire RequestDeduplicator; promote FetchSignal (commit cf799266)
  - BaseCachedProxySlice migrated from RequestDeduplicator.deduplicate(...) to SingleFlight<Key, FetchSignal>.load(...)
  - RequestDeduplicator.java + RequestDeduplicatorTest.java + DedupStrategy deleted
  - FetchSignal promoted to top-level at pantera-core/http/cache/FetchSignal.java
  - BaseCachedProxySliceDedupTest — 4 regression tests covering coalescing, NOT_FOUND propagation, ERROR propagation, cancellation isolation
- WI-post-07 — wire ProxyCacheWriter into pypi / go / composer (commit 0629b543)
  - CachedProxySlice constructs a ProxyCacheWriter when a file-backed Storage is present
  - (.whl/.tar.gz for pypi; .zip for go; .zip for composer) route through the coupled primary+sidecar write path
  - CachedPyProxySliceIntegrityTest, CachedProxySliceIntegrityTest × 2
- WI-02 — RequestContext + Deadline + ContextualExecutor (commit 129b0bf1)
  - RequestContext expanded from 4 → 13 fields; 4-arg backward-compat ctor retained
  - Deadline monotonic wall-clock deadline: within(Duration) / remaining() / expired() / remainingClamped(max) / expiresAt()
  - ContextualExecutor.contextualize(Executor) propagates ThreadContext + APM span across CompletableFuture boundaries
  - wrapped executors: DbArtifactIndex (via internal DbIndexExecutorService adapter that forwards lifecycle), GroupSlice.DRAIN_EXECUTOR, BaseCachedProxySlice / CachedNpmProxySlice / MavenGroupSlice SingleFlights
  - RequestContextTest (14), ContextualExecutorTest (5), DeadlineTest (8), ContextualExecutorIntegrationTest (3)
- WI-03 — StructuredLogger 5-tier + LevelPolicy + AuditAction (commit b8fd2bab)
  - AccessLogger / InternalLogger / UpstreamLogger / LocalLogger / AuditLogger — five tier builders, each with Objects.requireNonNull on required fields at entry
  - LevelPolicy encodes the §4.2 log-level matrix as a single enum
  - AuditAction closed enum: {ARTIFACT_PUBLISH, ARTIFACT_DOWNLOAD, ARTIFACT_DELETE, RESOLUTION} per §10.4
  - EcsLoggingSlice emits the access log exactly once per request via StructuredLogger.access() (legacy dual emission removed)
  - MdcPropagation marked @Deprecated(since = "2.2.0", forRemoval = true)
  - AccessLoggerTest, AuditLoggerTest, InternalLoggerTest, UpstreamLoggerTest, LocalLoggerTest, LevelPolicyTest
Cooldown Metadata Filtering (8 phases)
Two-layer cooldown enforcement across all 7 adapters (Maven, npm, PyPI, Docker, Go, Composer, Gradle). See docs/cooldown-metadata-filtering.md for full architecture.
- SOLID package restructure (api/, cache/, metadata/, response/, config/, metrics/, impl/)
- CooldownResponseRegistry
- CooldownAdapterBundle<T> record + CooldownAdapterRegistry populated at startup; all 7 adapters wired
Work items deferred to v2.2.x / v2.3 (6)
Every deferred item has an explicit entry with goal/files/tests/DoD/deps in docs/analysis/v2.2-next-session.md.
- GroupResolver replaces GroupSlice; sealed MemberSlice (Hosted/Proxy); ArtifactIndex.locateByName returns an IndexOutcome sealed type. Backbone WI of v2.2.0 — wires RequestContext (WI-02), StructuredLogger (WI-03), Fault (WI-01), SingleFlight (WI-05), ProxyCacheWriter (WI-07) together.
- NegativeCache with composite NegativeCacheKey, per-tier + per-scope YAML, one bean shared across hosted/proxy/group scopes, synchronous upload invalidation
- DownloadAssetSlice / CachedNpmProxySlice / BaseCachedProxySlice / NpmProxy.getAsset / MavenProxy.getMetadata. Unblocks deletion of 5 MdcPropagation call-sites in npm-adapter and completion of the remaining WI-post-07 wiring.
- RepoBulkhead per repo; retire static GroupSlice.DRAIN_EXECUTOR
Test run
All test suites pass locally at branch HEAD:
Acceptance queries from the target-architecture doc + session brief (each matches the expected count):
Reviewer focus
Focus areas when approving, in priority order:
- FaultTranslator + pickWinningFailure policy faithfulness. Implements the worked-examples table from target-architecture §2. FaultAllProxiesFailedPassThroughTest has one test per row. If you suspect a row is wrong, add a row-specific test that asserts the expected status / header / body shape — don't tweak the translator silently.
- SingleFlight zombie semantics: expireAfterWrite does NOT expire pending futures in an AsyncCache; zombie protection lives in orTimeout(inflightTtl) on the wrapped loader future (see the comment in SingleFlight.java:188-206). The zombieEvictedAfterTtl test exercises the real timer, not a mock — a refactor that replaces orTimeout with anything else must keep that invariant.
- ProxyCacheWriter temp-file hygiene: deleteQuietly(tempFile) in commit() / streamPrimary() / rejectIntegrity() / the outer .exceptionally. A temp-file leak would be silent; if you suspect one, add an assertion on Files.list(System.getProperty("java.io.tmpdir")) in ProxyCacheWriterTest to lock the invariant.
- RequestContext / ContextualExecutor wiring. The three hot-path executors (DbArtifactIndex via its internal DbIndexExecutorService adapter, GroupSlice.DRAIN_EXECUTOR, all three SingleFlight instances) are wrapped — verify the wrapping is present in every reviewer's mental model of the request flow. Every CompletableFuture.runAsync(..., ctxExecutor) now propagates ECS MDC + APM span; the 4-arg backward-compat RequestContext ctor ensures legacy callers compile unchanged.
- EcsLoggingSlice access-log single-emission. Line 176 emits via StructuredLogger.access().forRequest(rctx); the former second emission (new EcsLogEvent(...)...log() alongside the StructuredLogger call) was removed to halve Kibana log volume. The sole remaining new EcsLogEvent() call is on line 193 — the .exceptionally(...) error path — and is scheduled for migration by the same follow-up WI that re-lifts user_agent.* parsing.
Risks and mitigations
Three new risks introduced by Wave 3, plus the three Wave 1-2 risks retained:
- MdcPropagation retained as @Deprecated with ~110 production callers. The class cannot be deleted until WI-06 (removes 25 cooldown-related callers), WI-08 (removes 5 npm-adapter callers blocked on RxJava retirement), and the Vert.x-handler contextualisation follow-up (removes the ~55 callers in pantera-main/api/v1/*Handler.java) all land. Mitigation: the class is stable and documented; no new call-sites are permitted (enforce via PR review — there is no checkstyle gate yet). Scheduled for removal in v2.3.0.
- user_agent.name / .version / .os.name parsing lost. The pre-v2.2.0 EcsLogEvent instance emitted parsed User-Agent sub-fields on every access-log line. When the dual emission was removed, only user_agent.original survives via RequestContext. Mitigation (operator): Kibana dashboards that query user_agent.name or user_agent.version need to switch to user_agent.original or wait for the follow-up WI that re-lifts parsing into StructuredLogger.access. Mitigation (code): if an operator files a dashboard-regression ticket, that WI is ~30 LoC and can ship in a v2.2.x patch.
- DbIndexExecutorService is a localised copy of TraceContextExecutor-style delegation. The adapter lives inside DbArtifactIndex as a private static-nested class that forwards lifecycle methods to the underlying ExecutorService and routes execute(Runnable) through ContextualExecutor. Hoisting it into a reusable pantera-core/http/context/ContextualExecutorService would share the code with Quartz pools and any future ExecutorService hotspot. Mitigation: track as WI-post-03a in the next-session doc; the current duplication is ~40 lines and does not block the release.
- A CompletableFuture per caller on top of the shared one. Two whenCompleteAsync hooks per call (invalidate + forwarder) run on the executor. At 2k req/s for a single popular package this is 4k executor submissions per second — not a hot-path concern versus the per-request cost, but measurable. Mitigation: if the WI-10 perf baseline flags this, the invalidate hook can move to a single whenComplete on the shared future and the forwarder can become a no-copy minimalCompletionStage. Not needed today.
- If storage.delete(primary) also fails (e.g. underlying filesystem transiently read-only), the cache can end up holding a primary without a sidecar. Maven client behaviour on a missing sidecar is to refetch — the IntegrityAuditor also heals this case — so the worst case is a transient 502 on the next GET, not a silent integrity bug. Mitigation: run the audit tool in --dry-run as a nightly cron against production caches for the first release.
- npm retains a TODO(WI-post-07) marker. Its CachedNpmProxySlice primary write path is not architecturally protected against drift until WI-post-07 is completed for npm, which requires WI-08 (RxJava2 retirement) to land first. npm is a low-drift-risk adapter (single SHA-512 sidecar, always co-located in the tarball metadata), so the residual risk is smaller than the Maven case that v2.2.0 closes. Tracked in v2.2-next-session.md as part of WI-08's DoD.
Links
- docs/analysis/v2.2-target-architecture.md
- docs/analysis/v2.1.3-post-deploy-analysis.md
- docs/analysis/v2.1.3-architecture-review.md
- docs/analysis/v2.2-next-session.md
- CHANGELOG-v2.2.0.md