fix: stale refs after snapshot restore and server restart #180
Two codepaths could serve stale git refs:

1. After snapshot restore: MarkRestored() set the repo to StateReady immediately, but the catch-up fetch was only scheduled asynchronously. First clients would get stale refs from the snapshot.
2. After server restart with existing repos on disk: DiscoverExisting() set repos to StateReady with no fetch at all. Repos sat stale until the first request triggered an async maybeBackgroundFetch.

Fix both by running a synchronous fetch before the repo starts serving:

- In startClone: fetch synchronously after snapshot restore instead of scheduling an async backgroundFetch
- In strategy init: fetch each discovered repo at startup before accepting requests

Added diagnostic logging of ref SHAs before and after each fetch to aid debugging.

Co-authored-by: Claude Code <noreply@anthropic.com>
Ai-assisted: true
		slog.String("error", err.Error()))
}
for _, repo := range existing {
	logger.InfoContext(ctx, "Running startup fetch for existing repo",
any idea what impact this'll have on startup times for cachew?
(they already seem to be really long from what I can see)
Frankly, I don't know, but I think it's critical to have this functionality (open to a different implementation), as currently we would be (or actually are) just serving stale content.
		slog.String("error", err.Error()),
		slog.Duration("duration", time.Since(start)))
} else {
	logger.InfoContext(ctx, "Startup fetch completed for existing repo",
I'm wondering if the Info logging will be a bit too verbose for the happy case paths.
Having it was actually very helpful in discovering that there are two different code paths here. Given that this is not used much, and that this logging was (and may again be) very helpful, I'm inclined to keep it until it becomes too annoying (once cachew is stable and widely adopted).
		slog.String("upstream", repo.UpstreamURL()),
		slog.String("error", err.Error()))
} else {
	logger.InfoContext(ctx, "Post-fetch refs for existing repo",
This definitely shouldn't be at info