fix: resolve git fetch/pull returning stale refs #192

Merged
worstell merged 3 commits into main from fix-git-fetch-stale-refs
Mar 16, 2026

@worstell worstell commented Mar 16, 2026

Problem

Workstations running git fetch / git pull through cachew either return stale data or hang.

Root cause 1 (stale refs not forwarded): When ls-remote detected that the mirror was behind upstream, the request was still served from the local mirror with stale refs. The background fetch meant to refresh the mirror called backgroundFetch, which re-checks NeedsFetch(15m) and silently skips the fetch if one completed recently (e.g. post-restore on pod startup). As a result, the mirror never caught up.
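
To illustrate how a time guard can swallow a fetch that is already known to be necessary, here is a minimal sketch. Only the NeedsFetch name and the 15-minute window come from the description above; the `mirror` type, its fields, the extra `now` parameter, and the `guardedFetch`, `forcedFetch`, and `demo` helpers are hypothetical stand-ins, not cachew's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// mirror is a hypothetical stand-in for the local git mirror.
type mirror struct {
	lastFetch time.Time
	fetches   int
}

// NeedsFetch reports whether the last fetch is at least maxAge old.
// (Assumption: the real method takes only a duration; now is passed in
// here to keep the sketch deterministic.)
func (m *mirror) NeedsFetch(now time.Time, maxAge time.Duration) bool {
	return now.Sub(m.lastFetch) >= maxAge
}

// fetch records a completed fetch.
func (m *mirror) fetch(now time.Time) {
	m.lastFetch = now
	m.fetches++
}

// guardedFetch mimics the old path: it re-checks the time guard and
// silently skips, even if the caller already knows the refs are stale.
func (m *mirror) guardedFetch(now time.Time) bool {
	if !m.NeedsFetch(now, 15*time.Minute) {
		return false
	}
	m.fetch(now)
	return true
}

// forcedFetch mimics the fixed path: staleness was confirmed by the
// caller, so the fetch runs unconditionally.
func (m *mirror) forcedFetch(now time.Time) bool {
	m.fetch(now)
	return true
}

// demo simulates a request arriving two minutes after a post-restore fetch.
func demo() (guarded, forced bool) {
	start := time.Date(2026, time.March, 16, 12, 0, 0, 0, time.UTC)
	m := &mirror{lastFetch: start}
	now := start.Add(2 * time.Minute)
	guarded = m.guardedFetch(now)
	forced = m.forcedFetch(now)
	return guarded, forced
}

func main() {
	guarded, forced := demo()
	fmt.Println("guarded fetch ran:", guarded)
	fmt.Println("forced fetch ran:", forced)
}
```

In this sketch the guarded call is skipped because only two minutes have passed since the last fetch, while the forced call runs regardless of the guard.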

Root cause 2 (repack blocking): git repack -adb recompresses ALL objects into a single pack. On large repositories this runs for 15+ minutes, during which git http-backend (upload-pack) hangs because pack files are being rewritten underneath it.

Fix

  1. Forward info/refs to upstream when refs are stale. When ls-remote confirms the mirror is behind, the info/refs request is forwarded directly to upstream GitHub so the client gets fresh data immediately. A background fetch is scheduled to freshen the mirror for subsequent requests. The fetch bypasses the NeedsFetch(15m) time guard since ls-remote already confirmed staleness. If the subsequent upload-pack hits the mirror before the fetch completes, the existing "not our ref" fallback forwards to upstream.

  2. Switch to geometric repacking (--geometric=2 -d) instead of full repack (-adb). Geometric repacking only merges packs when small packs accumulate — near-instant in steady state vs 15+ minutes for full repack. A 10-minute timeout with process group kill is retained as a safety net.

  3. Keep synchronous fetch for snapshot requests. Workstation creation needs fresh data to generate correct snapshots, so the fetch remains synchronous on that path.
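
The routing described in the three items above can be sketched as a small decision function. This is an illustration under assumptions, not the PR's code: the `action` type and the `decide` helper are hypothetical, and it assumes the snapshot path serves from the mirror once its synchronous fetch has finished.

```go
package main

import "fmt"

// action describes, for this sketch only, where a request is served from
// and what kind of fetch is triggered.
type action struct {
	serveFrom string // "mirror" or "upstream"
	fetch     string // "none", "background", or "sync"
}

// decide is a hypothetical helper that mirrors the behaviour described in
// the Fix section.
func decide(stale, snapshot bool) action {
	switch {
	case !stale:
		// Mirror is current: serve locally, nothing to fetch.
		return action{serveFrom: "mirror", fetch: "none"}
	case snapshot:
		// Snapshot generation needs fresh data, so wait for the fetch.
		return action{serveFrom: "mirror", fetch: "sync"}
	default:
		// Regular info/refs: forward upstream now, refresh the mirror
		// in the background for later requests.
		return action{serveFrom: "upstream", fetch: "background"}
	}
}

func main() {
	fmt.Printf("%+v\n", decide(false, false))
	fmt.Printf("%+v\n", decide(true, false))
	fmt.Printf("%+v\n", decide(true, true))
}
```

Keeping the decision in one place makes it easy to see that only the stale, non-snapshot case leaves the mirror behind while the client is being served.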

@worstell worstell requested a review from a team as a code owner March 16, 2026 21:21
@worstell worstell requested review from joshfriend and removed request for a team March 16, 2026 21:21
Comment on lines -698 to -699
r.mu.RLock()
defer r.mu.RUnlock()
Contributor

Why was the locking removed?

Contributor Author

git handles its own file locking during repack, so this wasn't needed.

cmd := exec.CommandContext(repackCtx, "git", "-C", r.path, "repack", "-d", "--geometric=2", "--write-midx", "--write-bitmap-index")
// Run git in its own process group so child processes can be killed with it.
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
cmd.Cancel = func() error {
	// A negative PID signals the whole process group.
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}
Contributor

It seems this will always be called at the end (via defer), but the process might already be gone by then.

Contributor Author

Cancel only gets invoked while the process is still running, so this shouldn't be an issue.
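
For context on this thread, the timeout plus process-group kill pattern can be sketched as a standalone helper. This is a sketch under assumptions, not the PR's implementation: `repackArgs` and `runInGroup` are hypothetical names, the repack flags are copied from the diff above, `cmd.Cancel` requires Go 1.20 or later, and `Setpgid` with `syscall.Kill` is Unix-only.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// repackArgs returns the git arguments for a geometric repack of the
// repository at path (flags taken from the diff under review).
func repackArgs(path string) []string {
	return []string{"-C", path, "repack", "-d", "--geometric=2", "--write-midx", "--write-bitmap-index"}
}

// runInGroup runs name with args in its own process group and kills the
// whole group if the timeout elapses before the command finishes.
func runInGroup(parent context.Context, timeout time.Duration, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // releases the timer; does not signal a finished process

	cmd := exec.CommandContext(ctx, name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	cmd.Cancel = func() error {
		// os/exec calls Cancel when ctx is done while the command is
		// still being waited on; the negative PID targets the group.
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("%s failed: %w: %s", name, err, out)
	}
	return nil
}

func main() {
	// Print the command line instead of running it, so the sketch has no
	// side effects; a real caller would pass these to runInGroup with
	// "git" and a 10-minute timeout.
	fmt.Println("git", repackArgs("/tmp/mirror.git"))
}
```

The deferred `cancel` here only releases the context's resources. The kill in `cmd.Cancel` is reached through the context expiring while the command is still running, which matches the author's reply.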

Two fixes for git fetch/pull not working from workstations:

1. ensureRefsUpToDate now fetches directly when ls-remote confirms stale
   refs, bypassing the NeedsFetch time guard that was silently skipping
   the fetch even after staleness was detected.

2. Switch from full repack (-adb) to geometric repacking (--geometric=2).
   Full repack recompresses ALL objects and ran 15+ minutes on large
   repositories, blocking git http-backend. Geometric repacking only
   merges when small packs accumulate, completing near-instantly in steady
   state. A 10-minute timeout with process group kill is added as a safety
   net.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019cf844-5ed1-74ae-bc57-27d068380f2a
@worstell worstell requested a review from inez March 16, 2026 21:42
@worstell worstell force-pushed the fix-git-fetch-stale-refs branch from 1e6e75a to eb8cf2a Compare March 16, 2026 21:43
submitFetch is called when ls-remote has already confirmed the mirror
is stale. It was calling backgroundFetch which re-checks NeedsFetch(15m)
and silently drops the fetch if one happened recently (e.g. post-restore).
This caused the mirror to never catch up, making every request hit the
upstream fallback path.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019cf8a0-d586-7109-9e69-9cb7d9b27f6c
Comment thread internal/strategy/git/backend.go Outdated
if err := s.backgroundFetch(ctx, repo); err != nil {
	logger.WarnContext(ctx, "Synchronous fetch failed", "upstream", repo.UpstreamURL(), "error", err)
}
logging.FromContext(ctx).WarnContext(ctx, "Failed to check upstream refs", "error", err)
Contributor

Are you intentionally swallowing the error here?

Contributor Author

This should be returned; updated.
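
The difference between logging an error and returning it can be sketched as follows. The names `checkUpstreamRefs`, `refreshRefs`, and `errUpstream` are hypothetical; the updated code in backend.go is not shown in this thread, so this only illustrates the general pattern of wrapping and returning the error.

```go
package main

import (
	"errors"
	"fmt"
)

// errUpstream is a hypothetical failure from the upstream ref check.
var errUpstream = errors.New("ls-remote failed")

// checkUpstreamRefs is a hypothetical stand-in for the upstream check.
func checkUpstreamRefs(fail bool) error {
	if fail {
		return errUpstream
	}
	return nil
}

// refreshRefs returns the wrapped error to its caller instead of only
// logging it, so the caller can decide how to respond.
func refreshRefs(fail bool) error {
	if err := checkUpstreamRefs(fail); err != nil {
		return fmt.Errorf("check upstream refs: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(refreshRefs(false))
	fmt.Println(refreshRefs(true))
}
```

Wrapping with `%w` keeps the original error available to `errors.Is` and `errors.As` further up the call chain.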

@worstell worstell enabled auto-merge (squash) March 16, 2026 22:58
@worstell worstell merged commit 1d1a5ed into main Mar 16, 2026
6 checks passed
@worstell worstell deleted the fix-git-fetch-stale-refs branch March 16, 2026 22:59
3 participants