
feat: restore git mirrors from S3 snapshots on cold start#170

Merged
worstell merged 2 commits into main from worstell/snapshot-restore-cold-start
Mar 10, 2026

@worstell

Problem

Cold-starting pods (new/restarted/scaled) have no local git mirror. They proxy all requests to GitHub while running a background git clone --mirror, which takes minutes for large repos. S3 snapshots already exist (created periodically by warm pods) but are never used during cold start.

Solution

startClone() now attempts snapshot.Restore() from the tiered cache before falling back to git clone --mirror. On success, a catch-up fetch is scheduled via the job scheduler to cover any staleness from the snapshot interval.
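The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `restoreFunc`, `cloneFunc`, and the returned step names are hypothetical stand-ins, since the real signatures of `snapshot.Restore` and the clone path are not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSnapshot is a hypothetical error used only to drive the demo.
var errNoSnapshot = errors.New("no snapshot available")

// restoreFunc and cloneFunc are assumed stand-ins for snapshot.Restore and
// git clone --mirror; the real functions take different arguments.
type restoreFunc func(repo string) error
type cloneFunc func(repo string) error

// startClone tries the snapshot first and only clones when the restore fails.
// It returns the steps taken so the ordering is easy to inspect.
func startClone(repo string, restore restoreFunc, clone cloneFunc) ([]string, error) {
	if err := restore(repo); err == nil {
		// The snapshot can be stale by up to one snapshot interval,
		// so a catch-up fetch is scheduled after a successful restore.
		return []string{"restore", "schedule-catch-up-fetch"}, nil
	}
	steps := []string{"restore-failed"}
	if err := clone(repo); err != nil {
		return steps, fmt.Errorf("clone %s: %w", repo, err)
	}
	return append(steps, "clone"), nil
}

func main() {
	ok := func(string) error { return nil }
	noSnapshot := func(string) error { return errNoSnapshot }

	steps, err := startClone("org/repo", ok, ok)
	fmt.Println(steps, err)
	steps, err = startClone("org/repo", noSnapshot, ok)
	fmt.Println(steps, err)
}
```

The point of the sketch is that a failed restore is never fatal: it only costs the time spent attempting it before the existing clone path runs.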

Changes

internal/gitclone/manager.go

  • Add Repository.MarkRestored(ctx) — transitions StateEmpty → StateReady after an external restore. Applies configureMirror (bitmap/MIDX/commit-graph/pack tuning) and registerMaintenance, matching Clone()'s behavior and locking protocol. Reverts to StateEmpty on failure.
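A rough sketch of the state transition, under stated assumptions: the `setup` callback stands in for `configureMirror` plus `registerMaintenance`, and `StateRestoring` is a hypothetical intermediate state that the PR does not name. The real locking protocol may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type State int

const (
	StateEmpty State = iota
	StateRestoring // hypothetical intermediate state, not named in the PR
	StateReady
)

var errNotEmpty = errors.New("repository is not empty")

// Repository is a simplified stand-in for gitclone.Repository.
type Repository struct {
	mu    sync.Mutex
	state State
}

// State returns the current state under the lock.
func (r *Repository) State() State {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.state
}

// MarkRestored moves an empty repository to ready after running setup.
// If setup fails, the state reverts to empty so a clone can still run.
func (r *Repository) MarkRestored(setup func() error) error {
	r.mu.Lock()
	if r.state != StateEmpty {
		r.mu.Unlock()
		return errNotEmpty
	}
	r.state = StateRestoring
	r.mu.Unlock()

	err := setup() // assumed to cover mirror tuning and maintenance registration

	r.mu.Lock()
	defer r.mu.Unlock()
	if err != nil {
		r.state = StateEmpty
		return fmt.Errorf("post-restore setup: %w", err)
	}
	r.state = StateReady
	return nil
}

func main() {
	repo := &Repository{}
	err := repo.MarkRestored(func() error { return errors.New("setup failed") })
	fmt.Println(err != nil, repo.State() == StateEmpty)
	err = repo.MarkRestored(func() error { return nil })
	fmt.Println(err == nil, repo.State() == StateReady)
}
```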

internal/strategy/git/git.go

  • Add Strategy.tryRestoreSnapshot() — downloads and extracts the depth-0 snapshot via snapshot.Restore, then calls MarkRestored. On any failure, cleans up the path and returns false so startClone falls through to the existing clone path.
  • Update startClone() — tries snapshot restore first. On success: cleans up spools, schedules a catch-up fetch, and schedules periodic snapshot/repack jobs. On failure: falls through to the existing git clone --mirror path unchanged.
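The clean-up-on-failure behavior of the restore step might look something like the sketch below. `tryRestore`, `extractFake`, and `pathExists` are illustrative names only; the real `tryRestoreSnapshot` takes a context and a repository and calls `snapshot.Restore`, whose signature is not shown in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errMarkFailed is a hypothetical error used only to drive the demo.
var errMarkFailed = errors.New("mark restored failed")

// extractFake stands in for downloading and extracting a snapshot:
// it creates the mirror directory with a single file in it.
func extractFake(path string) error {
	if err := os.MkdirAll(path, 0o750); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(path, "HEAD"), []byte("ref: refs/heads/main\n"), 0o600)
}

// tryRestore runs extract and then markRestored. If either step fails it
// removes path, so a later clone starts from a clean directory, and
// returns false.
func tryRestore(path string, extract, markRestored func(string) error) bool {
	for _, step := range []func(string) error{extract, markRestored} {
		if err := step(path); err != nil {
			_ = os.RemoveAll(path)
			return false
		}
	}
	return true
}

// pathExists reports whether path is present on disk.
func pathExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	root, err := os.MkdirTemp("", "restore-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	mirror := filepath.Join(root, "repo.git")

	failMark := func(string) error { return errMarkFailed }
	fmt.Println(tryRestore(mirror, extractFake, failMark), pathExists(mirror))

	okMark := func(string) error { return nil }
	fmt.Println(tryRestore(mirror, extractFake, okMark), pathExists(mirror))
}
```

Removing the partially restored directory matters because the fallback clone would otherwise run against a half-extracted mirror.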

internal/strategy/git/snapshot.go

  • Add nolint:gosec annotations for pre-existing os.RemoveAll/os.MkdirAll calls on controlled paths (surfaced by the new snapshot import).

What stays the same

  • serveWithSpool still handles in-flight requests during restore
  • Scheduler queue serialization prevents restore+clone races
  • No changes to snapshot creation, tiered cache, or warm-pod behavior
  • tar.zst format preserved (exact disk layout with bitmaps/MIDX)

Allows clients to request shallow snapshots at a specific git depth
(e.g., /git/{repo}/snapshot.tar.zst?depth=100). This produces much
smaller snapshots for large repositories — a depth-100 snapshot of a
multi-GB repo is typically under 1GB compressed, versus 13GB+ for a
full snapshot.

The depth parameter controls the git clone depth used when generating
the snapshot. Full snapshots (no depth parameter) continue to work as
before. Each requested depth gets its own cache key and periodic refresh
job, so snapshots stay fresh without rebuilding depths that aren't used.

On first request for a given depth, the snapshot is generated on-demand
and a periodic refresh job is scheduled. Subsequent requests serve from
cache.
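As an illustration of how a per-depth cache key could be derived from the `depth` query parameter, here is a minimal sketch. The key format and the function name `snapshotCacheKey` are assumptions made for this example, not the scheme used in the codebase; treating `depth=0` the same as an absent parameter follows the discussion in this thread that depth 0 means full depth.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// snapshotCacheKey maps a repo and a raw query string such as "depth=100"
// to a cache key. The key layout here is hypothetical.
func snapshotCacheKey(repo, rawQuery string) (string, error) {
	query, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", fmt.Errorf("parse query: %w", err)
	}
	raw := query.Get("depth")
	if raw == "" {
		// No depth parameter: full snapshot.
		return repo + "/snapshot.tar.zst", nil
	}
	depth, err := strconv.Atoi(raw)
	if err != nil || depth < 0 {
		return "", fmt.Errorf("invalid depth %q", raw)
	}
	if depth == 0 {
		// Depth 0 is treated as full depth, sharing the full-snapshot key.
		return repo + "/snapshot.tar.zst", nil
	}
	return fmt.Sprintf("%s/snapshot.depth-%d.tar.zst", repo, depth), nil
}

func main() {
	for _, q := range []string{"", "depth=0", "depth=100", "depth=abc"} {
		key, err := snapshotCacheKey("org/repo", q)
		fmt.Printf("%q -> %q (err: %v)\n", q, key, err)
	}
}
```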
@worstell worstell requested a review from a team as a code owner March 10, 2026 22:34
@worstell worstell requested review from jrobotham-square and removed request for a team March 10, 2026 22:34
@worstell worstell force-pushed the worstell/snapshot-restore-cold-start branch from ad8db59 to e224296 Compare March 10, 2026 22:41
@inez inez left a comment
Workstations currently seems to clone with depth=100, but here, if I understand correctly, the restored snapshot (if one exists) would be depth=0?

@worstell

worstell commented Mar 10, 2026

> Workstations currently seems to clone with depth=100, but here, if I understand correctly, the restored snapshot (if one exists) would be depth=0?

Depth 0 is full depth, which is what workstations request from cachew too. Before caching we used depth-limited cloning, but we get full clones now.

That said, the depth stuff was a vestige of a prior approach (supporting depth-limited snapshots) that we decided not to move forward with, so I just removed it entirely.

@worstell worstell requested a review from inez March 10, 2026 22:55
@worstell worstell force-pushed the worstell/snapshot-restore-cold-start branch from e224296 to 71364e3 Compare March 10, 2026 23:01
Comment thread internal/strategy/git/git.go Outdated
// tryRestoreSnapshot attempts to restore a mirror from an S3 snapshot. Returns
// true if the restore succeeded and the repo is ready to serve.
func (s *Strategy) tryRestoreSnapshot(ctx context.Context, repo *gitclone.Repository) bool {
	if s.cache == nil {
Collaborator

Can this ever be nil?

Contributor Author

No, this was a guard Amp added to get the tests to pass. It's being updated to use a test cache instead.

Comment thread internal/strategy/git/git.go
@worstell worstell force-pushed the worstell/snapshot-restore-cold-start branch from 71364e3 to a068c87 Compare March 10, 2026 23:08
Cold-starting pods (new/restarted/scaled) previously had to run a full
git clone --mirror while proxying all requests to GitHub, which takes
minutes for large repos. S3 snapshots already exist (created periodically
by warm pods) but were never used during cold start.

startClone() now attempts snapshot.Restore() from the tiered cache before
falling back to git clone --mirror. On success, a catch-up fetch is
scheduled via the job scheduler to cover any staleness.

Changes:
- Add Repository.MarkRestored() to transition StateEmpty -> StateReady
  after an external restore, applying configureMirror and
  registerMaintenance (matching Clone's behavior).
- Add Strategy.tryRestoreSnapshot() which downloads and extracts the
  depth-0 snapshot, then calls MarkRestored. On any failure, cleans up
  and returns false so startClone falls through to the existing clone
  path.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019cd9c4-9869-75b1-bc83-50484949b25b
@worstell worstell force-pushed the worstell/snapshot-restore-cold-start branch from a068c87 to f9b5bf4 Compare March 10, 2026 23:09
@worstell worstell enabled auto-merge (squash) March 10, 2026 23:10
@worstell worstell merged commit c805810 into main Mar 10, 2026
5 checks passed
@worstell worstell deleted the worstell/snapshot-restore-cold-start branch March 10, 2026 23:11
3 participants