Stabilize git-upload-pack cache key by normalizing volatile fields#113
Merged
Pull request overview
This PR addresses CI flakiness in the proxy disk cache for git-based dependency fetching (notably Nix flakes) by stabilizing cache keys for POST .../git-upload-pack requests whose bodies vary between runs.
Changes:
- Add git-upload-pack-specific body normalization for cache-key generation by hashing only extracted `want` OIDs.
- Add unit tests asserting cache-key stability across differing `have` lines and agent strings, and scoping normalization to `POST .../git-upload-pack`.
Show a summary per file
| File | Description |
|---|---|
| internal/cache/handlers.go | Normalizes cache key body hashing for git-upload-pack POSTs using extracted/sorted want OIDs. |
| internal/cache/handlers_test.go | Adds tests covering normalized vs non-normalized cache-key behavior for git-upload-pack requests. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2
Copilot's findings
Comments suppressed due to low confidence (1)
internal/cache/handlers.go:90
- `normalizeGitBody` currently hashes only the `want` object IDs and drops all negotiated capabilities (beyond just the variable `agent=`). In git-upload-pack, capabilities (e.g., side-band/side-band-64k, thin-pack, ofs-delta, filter=blob:none, protocol v2 features) can change the response encoding/content; collapsing different capability sets onto the same cache key can serve an incompatible cached response to a client. Consider including a normalized capability set in the hash input (e.g., keep the first want line's capabilities but strip only `agent=` and ignore `have`/`done`), so cache entries remain correct across clients/versions.
// normalizeGitBody extracts only the "want" object IDs from a git-upload-pack
// POST body, producing a stable hash input. The "have" lines and capability/agent
// strings vary between runs (different git versions, different local state) and
// are excluded so that the cache key remains consistent across CI runs.
func normalizeGitBody(data []byte) []byte {
	matches := gitUploadPackWantRegex.FindAllSubmatch(data, -1)
	if len(matches) == 0 {
		return nil
	}
	wants := make([]string, len(matches))
	for i, match := range matches {
		wants[i] = string(match[1])
	}
	sort.Strings(wants)
	return []byte(strings.Join(wants, "\n"))
}
Copilot's findings
Comments suppressed due to low confidence (1)
internal/cache/handlers.go:115
`normalizeGitBody` uses `bytes.TrimSpace`, which can drop trailing/leading newlines and even completely erase an all-flush body after length stripping, increasing the chance of cache-key collisions. Since this output is only fed to SHA-256, it's safer to avoid trimming and hash the exact normalized bytes after removing the intended fields.
func normalizeGitBody(data []byte) []byte {
	normalized := gitHaveLineRegex.ReplaceAll(data, nil)
	normalized = gitAgentRegex.ReplaceAll(normalized, nil)
	normalized = pktLineLengthRegex.ReplaceAll(normalized, nil)
	return bytes.TrimSpace(normalized)
}
9258466 to 88a0085
- Introduced `pktline.go` and `pktline_test.go` for handling pkt-line format.
- Added functions to parse and encode pkt-lines, supporting special packets like flush, delim, and response-end.
- Created tests to validate parsing of various pkt-line scenarios including empty input, malformed data, and real-world examples from git repositories.
- Moved pkt-line functionality into a dedicated package to streamline the gitproto implementation.
- Implemented `upload_pack.go` and `upload_pack_test.go` to manage upload-pack requests and normalize request bodies, ensuring consistent cache keys by stripping volatile fields.
- Removed legacy `pktline` package to consolidate functionality within `gitproto`.
…lization and validation
…ion and improve performance
…t to handle case-insensitivity and whitespace in Content-Type
…alization and clarity
… function to check payload prefixes
JamieMagee
approved these changes
Apr 27, 2026
What
Stabilize the disk cache key for `POST .../git-upload-pack` so that runs differing only in per-process metadata produce a cache hit instead of a fresh upstream call.
Why
Nix smoke tests are flaky because nix resolves flake dependencies via the git smart-HTTP protocol rather than a registry. Each fetch POST carries `agent=git/x.y.z` and (Git 2.36+) `session-id=<uuid>`, fields that drift across runs and across worker images. The cache key is a SHA-256 of the request body, so any drift forces a real upstream call → rate limit → flake. The fix needs to be narrow enough that it can't return a stale or incomplete pack.
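To make the drift concrete, here is a minimal sketch of how a one-character change in the agent string changes a body-derived key. The helper names `pktLine` and `rawKey`, the placeholder OID, and the Git version strings are illustrative assumptions, not code or values from this PR.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// pktLine frames a payload as a pkt-line: a 4-digit hex length (which counts
// the 4 prefix bytes themselves) followed by the payload.
func pktLine(payload string) string {
	return fmt.Sprintf("%04x%s", len(payload)+4, payload)
}

// rawKey mimics a cache key derived from the unmodified request body.
// (Hypothetical helper; the real key() may hash more than the body.)
func rawKey(body string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(body)))
}

func main() {
	const oid = "0123456789abcdef0123456789abcdef01234567" // placeholder OID

	// Two otherwise identical v1 requests from different Git versions.
	a := pktLine("want "+oid+" side-band-64k agent=git/2.43.0\n") + "0000" + pktLine("done\n")
	b := pktLine("want "+oid+" side-band-64k agent=git/2.44.0\n") + "0000" + pktLine("done\n")

	fmt.Println(rawKey(a) == rawKey(b)) // false: the agent token alone changes the key
}
```

An agent string of a different length would also change the pkt-line length prefix, so stripping the token without re-framing would still leave the bodies unequal.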
How
A new `internal/gitproto` package exposes two helpers:
- `IsUploadPackRequest`: gate that requires POST + path suffix `/git-upload-pack` + `Content-Type: application/x-git-upload-pack-request` (parsed via `mime.ParseMediaType` so RFC 7231 case/whitespace variants match).
- `NormalizeUploadPackBody`: parses the pkt-line body, drops volatile capability tokens, re-encodes (recomputing length prefixes so different agent string lengths still hash equal), and on parse failure returns the input unchanged so callers fall back to opaque hashing.
`internal/cache/handlers.go` adds a single conditional in `key()` that swaps the hash input (never the outbound request body) when the gate matches.
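The gate described above could look roughly like the following. This is a sketch under assumptions: the lower-case names `isUploadPackRequest` and `matchesUploadPack` and the example URL are hypothetical, and the real `IsUploadPackRequest` signature is not shown in this PR description. Only `mime.ParseMediaType` is taken from the text.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"strings"
)

const uploadPackMediaType = "application/x-git-upload-pack-request"

// matchesUploadPack holds the gate logic on plain strings so it is easy to test.
func matchesUploadPack(method, path, contentType string) bool {
	if method != http.MethodPost || !strings.HasSuffix(path, "/git-upload-pack") {
		return false
	}
	// ParseMediaType lower-cases the media type and trims surrounding
	// whitespace, so header variants compare equal to the constant.
	mediaType, _, err := mime.ParseMediaType(contentType)
	return err == nil && mediaType == uploadPackMediaType
}

// isUploadPackRequest is a hypothetical stand-in for IsUploadPackRequest.
func isUploadPackRequest(r *http.Request) bool {
	return matchesUploadPack(r.Method, r.URL.Path, r.Header.Get("Content-Type"))
}

func main() {
	req, err := http.NewRequest(http.MethodPost,
		"https://example.test/owner/repo.git/git-upload-pack", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "Application/X-Git-Upload-Pack-Request; charset=utf-8")
	fmt.Println(isUploadPackRequest(req)) // true
}
```

Requiring all three conditions keeps lookalike POSTs (same path suffix but a different content type, or the reverse) on the opaque-hash path.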
What is normalized:
- `agent=` and `session-id=` pkt-lines (v2 capability section)
- `agent=…` and `session-id=…` tokens on v1 want lines
What is preserved (anything that can change the upstream response):
- `want`, `have`, capabilities, `command=`, `deepen`/`shallow`/`filter`, `ref-prefix`, `object-format`, all framing packets
`have` lines are deliberately preserved: they drive object negotiation, so collapsing them across requests with different client object sets could serve an incomplete pack.
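The normalization rules above can be sketched as a single pass over the pkt-lines. This is an illustrative approximation, not the PR's `NormalizeUploadPackBody`: the names `normalizeUploadPackBody`, `volatile`, `pkt`, and `key`, the placeholder OID, and the Git version strings are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"strconv"
	"strings"
)

// volatile reports whether a token is per-process metadata. It matches on the
// prefix only, so a token such as "x-agent=..." is left alone.
func volatile(token string) bool {
	return strings.HasPrefix(token, "agent=") || strings.HasPrefix(token, "session-id=")
}

// pkt frames a payload as a pkt-line; the 4-digit hex length includes itself.
func pkt(payload string) string {
	return fmt.Sprintf("%04x%s", len(payload)+4, payload)
}

// normalizeUploadPackBody (hypothetical) drops volatile tokens and re-frames
// the body. It returns the input unchanged if the body is not valid pkt-lines.
func normalizeUploadPackBody(body []byte) []byte {
	var out bytes.Buffer
	for rest := body; len(rest) > 0; {
		if len(rest) < 4 {
			return body // truncated length prefix
		}
		n, err := strconv.ParseUint(string(rest[:4]), 16, 16)
		if err != nil || n == 3 || int(n) > len(rest) {
			return body // non-hex, impossible, or overlong length
		}
		if n < 3 { // 0000 flush, 0001 delim, 0002 response-end: keep framing
			out.Write(rest[:4])
			rest = rest[4:]
			continue
		}
		line, nl := string(rest[4:n]), ""
		rest = rest[n:]
		if strings.HasSuffix(line, "\n") {
			line, nl = line[:len(line)-1], "\n"
		}
		if volatile(line) {
			continue // v2 capability pkt-line: drop the whole packet
		}
		if strings.HasPrefix(line, "want ") {
			var kept []string
			for _, tok := range strings.Split(line, " ") {
				if !volatile(tok) {
					kept = append(kept, tok)
				}
			}
			line = strings.Join(kept, " ")
		}
		out.WriteString(pkt(line + nl)) // length prefix is recomputed here
	}
	return out.Bytes()
}

// key hashes the normalized bytes; the outbound body is never modified.
func key(body []byte) string {
	return fmt.Sprintf("%x", sha256.Sum256(normalizeUploadPackBody(body)))
}

func main() {
	const oid = "0123456789abcdef0123456789abcdef01234567" // placeholder OID
	a := pkt("want "+oid+" side-band-64k agent=git/2.43.0\n") + "0000" + pkt("done\n")
	b := pkt("want "+oid+" side-band-64k agent=git/2.44.10\n") + "0000" + pkt("done\n")
	c := pkt("want "+oid+" side-band-64k agent=git/2.43.0\n") + "0000" + pkt("have "+oid+"\n") + pkt("done\n")

	fmt.Println(key([]byte(a)) == key([]byte(b))) // true: only the agent differs
	fmt.Println(key([]byte(a)) == key([]byte(c))) // false: the have line is kept
}
```

Returning the original bytes on any parse error means a malformed or non-git body still gets a key, just an opaque one, which matches the fallback behavior described for the real helper.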
Tests
- `internal/gitproto`: pkt-line parser/encoder including malformed inputs and a round-trip; v1 and v2 normalization including agent length drift, session-id drift, ref-prefix preservation, prefix-vs-substring discrimination.
- `internal/cache`: integration tests that the gate fires for real upload-pack POSTs, that haves do not collapse, that malformed bodies fall back to raw hashing, and that lookalike non-git POSTs are not normalized.
Verified locally with `go test -race -shuffle=on -count=2 ./...`.