feat: restore git mirrors from S3 snapshots on cold start #170
Allows clients to request shallow snapshots at a specific git depth
(e.g., /git/{repo}/snapshot.tar.zst?depth=100). This produces much
smaller snapshots for large repositories — a depth-100 snapshot of a
multi-GB repo is typically under 1GB compressed, versus 13GB+ for a
full snapshot.
The depth parameter controls the git clone depth used when generating
the snapshot. Full snapshots (no depth parameter) continue to work as
before. Each requested depth gets its own cache key and periodic refresh
job, so snapshots stay fresh without rebuilding depths that aren't used.
On the first request for a given depth, the snapshot is generated on demand
and a periodic refresh job is scheduled. Subsequent requests are served from
cache.
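The per-depth caching described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the key layout, `snapshotKey`, `parseDepth`, and `snapshotCache` are hypothetical names, not the project's actual code, and the "refresh job" is reduced to a flag.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"sync"
)

// snapshotKey builds a cache key that is distinct per repository and depth.
// Depth 0 stands for a full snapshot. The key layout is hypothetical.
func snapshotKey(repo string, depth int) string {
	if depth == 0 {
		return fmt.Sprintf("snapshot/%s/full", repo)
	}
	return fmt.Sprintf("snapshot/%s/depth-%d", repo, depth)
}

// parseDepth reads ?depth=N from a raw query string.
// A missing parameter means a full snapshot (depth 0).
func parseDepth(rawQuery string) (int, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return 0, err
	}
	v := q.Get("depth")
	if v == "" {
		return 0, nil
	}
	d, err := strconv.Atoi(v)
	if err != nil || d < 0 {
		return 0, fmt.Errorf("invalid depth %q", v)
	}
	return d, nil
}

// snapshotCache generates a snapshot on the first request for a key and
// records that a refresh job was scheduled for it (hypothetical type).
type snapshotCache struct {
	mu        sync.Mutex
	entries   map[string][]byte
	scheduled map[string]bool
	generate  func(repo string, depth int) []byte
}

func newSnapshotCache(gen func(string, int) []byte) *snapshotCache {
	return &snapshotCache{
		entries:   map[string][]byte{},
		scheduled: map[string]bool{},
		generate:  gen,
	}
}

// get serves from cache when possible; otherwise it generates on demand.
// Holding the lock during generation is a simplification for this sketch.
func (c *snapshotCache) get(repo string, depth int) []byte {
	key := snapshotKey(repo, depth)
	c.mu.Lock()
	defer c.mu.Unlock()
	if data, ok := c.entries[key]; ok {
		return data
	}
	data := c.generate(repo, depth)
	c.entries[key] = data
	c.scheduled[key] = true // a real service would register a periodic job here
	return data
}

func main() {
	calls := 0
	c := newSnapshotCache(func(repo string, depth int) []byte {
		calls++
		return []byte(fmt.Sprintf("%s@%d", repo, depth))
	})
	depth, _ := parseDepth("depth=100")
	c.get("org/repo", depth) // generated on demand
	c.get("org/repo", depth) // served from cache
	c.get("org/repo", 0)     // full snapshot has its own key
	fmt.Println("generations:", calls, "refresh jobs:", len(c.scheduled))
}
```

Because the key includes the depth, only depths that were actually requested ever get an entry or a refresh job.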
ad8db59 to e224296
inez left a comment
Workstations currently seems to clone with depth=100, but here, if I understand correctly, the restored snapshot (if one exists) would be depth=0?
depth 0 is full depth, which is what workstations request from cachew too. before caching we were using depth-limited cloning, but we get full clones now. that said, the depth stuff was a vestige of a prior approach we decided not to move forward with (supporting depth-limited snapshots), so i just removed it entirely.
e224296 to 71364e3
// tryRestoreSnapshot attempts to restore a mirror from an S3 snapshot. Returns
// true if the restore succeeded and the repo is ready to serve.
func (s *Strategy) tryRestoreSnapshot(ctx context.Context, repo *gitclone.Repository) bool {
	if s.cache == nil {
no, this was a guard amp added to get tests to succeed. it's being updated to use a test cache instead
71364e3 to a068c87
Cold-starting pods (new/restarted/scaled) previously had to run a full git clone --mirror while proxying all requests to GitHub, which takes minutes for large repos. S3 snapshots already exist (created periodically by warm pods) but were never used during cold start.
startClone() now attempts snapshot.Restore() from the tiered cache before falling back to git clone --mirror. On success, a catch-up fetch is scheduled via the job scheduler to cover any staleness.
Changes:
- Add Repository.MarkRestored() to transition StateEmpty -> StateReady after an external restore, applying configureMirror and registerMaintenance (matching Clone's behavior).
- Add Strategy.tryRestoreSnapshot() which downloads and extracts the depth-0 snapshot, then calls MarkRestored. On any failure, cleans up and returns false so startClone falls through to the existing clone path.
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019cd9c4-9869-75b1-bc83-50484949b25b
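The restore-then-fallback flow in this commit message can be sketched in isolation. This is a minimal model, not the PR's implementation: `repository`, `steps`, `markRestored`, `tryRestore`, and this `startClone` are simplified stand-ins, the real locking protocol is omitted, and the restore, configure, cleanup, and clone steps are plain function values supplied by the caller.

```go
package main

import (
	"errors"
	"fmt"
)

type state int

const (
	stateEmpty state = iota
	stateReady
)

// repository is a stand-in that tracks only the mirror state.
type repository struct{ state state }

// steps bundles the side effects so the control flow can be shown alone.
// All four fields are hypothetical hooks, not the project's API.
type steps struct {
	restore   func() error // download and extract the snapshot
	configure func() error // post-restore setup (mirror config, maintenance)
	cleanup   func()       // remove a partially restored path
	clone     func() error // the existing full clone path
}

// markRestored models the Empty -> Ready transition after an external restore.
// On a configuration failure the repository stays empty.
func (r *repository) markRestored(configure func() error) error {
	if r.state != stateEmpty {
		return errors.New("repository is not empty")
	}
	if err := configure(); err != nil {
		r.state = stateEmpty
		return err
	}
	r.state = stateReady
	return nil
}

// tryRestore returns true only if the snapshot was restored and the
// repository is ready. On any failure it cleans up and returns false.
func tryRestore(r *repository, s steps) bool {
	if err := s.restore(); err != nil {
		s.cleanup()
		return false
	}
	if err := r.markRestored(s.configure); err != nil {
		s.cleanup()
		return false
	}
	return true
}

// startClone tries the snapshot first and falls through to a full clone.
// It reports which path produced the ready mirror.
func startClone(r *repository, s steps) (string, error) {
	if tryRestore(r, s) {
		return "snapshot", nil // a catch-up fetch would be scheduled here
	}
	if err := s.clone(); err != nil {
		return "", err
	}
	r.state = stateReady
	return "clone", nil
}

func main() {
	ok := func() error { return nil }
	missing := func() error { return errors.New("no snapshot in cache") }

	warm := &repository{}
	path, _ := startClone(warm, steps{restore: ok, configure: ok, cleanup: func() {}, clone: ok})
	fmt.Println("snapshot available:", path)

	cold := &repository{}
	path, _ = startClone(cold, steps{restore: missing, configure: ok, cleanup: func() {}, clone: ok})
	fmt.Println("snapshot missing:", path)
}
```

Returning a plain bool from the restore attempt keeps the caller simple: any failure, whatever the cause, leads to the same existing clone path.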
a068c87 to f9b5bf4
Problem
Cold-starting pods (new/restarted/scaled) have no local git mirror. They proxy all requests to GitHub while running a background git clone --mirror, which takes minutes for large repos. S3 snapshots already exist (created periodically by warm pods) but are never used during cold start.
Solution
startClone() now attempts snapshot.Restore() from the tiered cache before falling back to git clone --mirror. On success, a catch-up fetch is scheduled via the job scheduler to cover any staleness from the snapshot interval.
Changes
internal/gitclone/manager.go
- Repository.MarkRestored(ctx): transitions StateEmpty → StateReady after an external restore. Applies configureMirror (bitmap/MIDX/commit-graph/pack tuning) and registerMaintenance, matching Clone()'s behavior and locking protocol. Reverts to StateEmpty on failure.
internal/strategy/git/git.go
- Strategy.tryRestoreSnapshot(): downloads and extracts the depth-0 snapshot via snapshot.Restore, then calls MarkRestored. On any failure, cleans up the path and returns false so startClone falls through to the existing clone path.
- startClone(): tries snapshot restore first. On success: cleans up spools, schedules a catch-up fetch, and schedules periodic snapshot/repack jobs. On failure: falls through to the existing git clone --mirror path unchanged.
internal/strategy/git/snapshot.go
- nolint:gosec annotations for pre-existing os.RemoveAll/os.MkdirAll calls on controlled paths (surfaced by the new snapshot import).
What stays the same
serveWithSpool still handles in-flight requests during restore