fix: stale refs after snapshot restore and server restart#180

Merged: inez merged 1 commit into main from inez.sync.fetch on Mar 12, 2026

@inez inez commented Mar 12, 2026

Two codepaths could serve stale git refs:

  1. After snapshot restore: MarkRestored() set the repo to StateReady immediately, but the catch-up fetch was only scheduled asynchronously. First clients would get stale refs from the snapshot.

  2. After server restart with existing repos on disk: DiscoverExisting() set repos to StateReady with no fetch at all. Repos sat stale until the first request triggered an async maybeBackgroundFetch.

Fix both by running a synchronous fetch before the repo starts serving:

  • In startClone: fetch synchronously after snapshot restore instead of scheduling an async backgroundFetch
  • In strategy init: fetch each discovered repo at startup before accepting requests
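The ordering described in the two bullets above can be sketched as follows. This is a minimal illustration, not the code from this PR: `Repo`, `State`, `FetchFunc`, `finishRestore`, `startupFetch`, and `demo` are hypothetical names, and marking a repo ready even when its fetch fails is an assumption about the failure policy.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// State is a hypothetical lifecycle state for a cached repo.
type State int

const (
	StateRestoring State = iota // on disk, but refs may be stale
	StateReady                  // safe to serve clients
)

// Repo is a stand-in for the real repository type; all fields are illustrative.
type Repo struct {
	Name  string
	State State
	Fresh bool // set once a fetch from upstream has succeeded
}

// FetchFunc abstracts the upstream fetch so the sketch stays runnable.
type FetchFunc func(ctx context.Context, r *Repo) error

// finishRestore fetches synchronously and only then marks the repo ready.
// Assumption: on fetch failure the repo is still marked ready, so possibly
// stale data is served rather than nothing. The real policy may differ.
func finishRestore(ctx context.Context, r *Repo, fetch FetchFunc) error {
	err := fetch(ctx, r)
	r.State = StateReady
	return err
}

// startupFetch fetches every discovered repo before the server accepts
// requests and returns the names of repos whose fetch failed.
func startupFetch(ctx context.Context, repos []*Repo, fetch FetchFunc) []string {
	var failed []string
	for _, r := range repos {
		if err := finishRestore(ctx, r, fetch); err != nil {
			failed = append(failed, r.Name)
		}
	}
	return failed
}

// demo runs the startup path against a fake fetch that fails for one repo.
func demo() ([]*Repo, []string) {
	fetch := func(ctx context.Context, r *Repo) error {
		if r.Name == "broken" {
			return errors.New("upstream unreachable")
		}
		r.Fresh = true
		return nil
	}
	repos := []*Repo{{Name: "a"}, {Name: "broken"}}
	failed := startupFetch(context.Background(), repos, fetch)
	return repos, failed
}

func main() {
	_, failed := demo()
	fmt.Println("failed:", failed)
}
```

Because the fetch runs inline, no repo reaches `StateReady` before its fetch has been attempted. The cost is that startup time grows with the number of repos and the latency of each fetch.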

Added diagnostic logging of ref SHAs before and after each fetch to aid debugging.
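One way to picture the before/after ref logging is the sketch below. It is an illustration under stated assumptions rather than the logging added in this PR: `changedRefs`, `fetchWithRefLog`, `demo`, and the `listRefs` and `fetch` hooks are hypothetical, and the log level is left as a parameter.

```go
package main

import (
	"context"
	"log/slog"
	"os"
	"sort"
)

// changedRefs returns the sorted names of refs whose SHA differs between
// before and after, including refs that were added or removed.
func changedRefs(before, after map[string]string) []string {
	var changed []string
	for name, sha := range after {
		if before[name] != sha {
			changed = append(changed, name)
		}
	}
	for name := range before {
		if _, ok := after[name]; !ok {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

// fetchWithRefLog logs the ref map before and after calling fetch.
// listRefs and fetch are hypothetical hooks standing in for real git calls.
func fetchWithRefLog(ctx context.Context, logger *slog.Logger, level slog.Level,
	listRefs func() map[string]string, fetch func() error) ([]string, error) {
	before := listRefs()
	logger.Log(ctx, level, "Pre-fetch refs", slog.Any("refs", before))
	if err := fetch(); err != nil {
		logger.WarnContext(ctx, "Fetch failed", slog.String("error", err.Error()))
		return nil, err
	}
	after := listRefs()
	changed := changedRefs(before, after)
	logger.Log(ctx, level, "Post-fetch refs",
		slog.Any("refs", after), slog.Any("changed", changed))
	return changed, nil
}

// demo simulates a fetch that moves one ref and adds another.
func demo() []string {
	refs := map[string]string{"refs/heads/main": "aaa111"}
	listRefs := func() map[string]string {
		copied := make(map[string]string, len(refs))
		for k, v := range refs {
			copied[k] = v
		}
		return copied
	}
	fetch := func() error {
		refs["refs/heads/main"] = "bbb222"
		refs["refs/heads/feature"] = "ccc333"
		return nil
	}
	opts := &slog.HandlerOptions{Level: slog.LevelDebug}
	logger := slog.New(slog.NewTextHandler(os.Stderr, opts))
	changed, _ := fetchWithRefLog(context.Background(), logger, slog.LevelDebug, listRefs, fetch)
	return changed
}

func main() {
	demo()
}
```

Passing the level in keeps the choice between Info and Debug a call-site decision, so the happy path can be quieted later without touching the fetch logic.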

Co-authored-by: Claude Code <noreply@anthropic.com>
Ai-assisted: true
@inez inez requested a review from a team as a code owner March 12, 2026 00:15
@inez inez requested review from worstell and removed request for a team March 12, 2026 00:15
		slog.String("error", err.Error()))
}
for _, repo := range existing {
	logger.InfoContext(ctx, "Running startup fetch for existing repo",
Contributor

Any idea what impact this will have on startup times for cachew?
(They already seem to be really long from what I can see.)

Contributor Author

Frankly, I don't know, but I think it's critical to have this functionality (I'm open to a different implementation), as currently we would be (or actually are) just serving stale content.

		slog.String("error", err.Error()),
		slog.Duration("duration", time.Since(start)))
} else {
	logger.InfoContext(ctx, "Startup fetch completed for existing repo",
Contributor

I'm wondering if the Info logging will be a bit too verbose for the happy path.

Contributor Author

Having it was actually very helpful in discovering that there are two different code paths here. Given that this is not used much, and that this logging was, and might again be, very helpful, I'm inclined to keep it until it becomes too annoying (once cachew is stable and widely adopted).

@alecthomas alecthomas changed the title from "Fix stale refs after snapshot restore and server restart" to "fix: stale refs after snapshot restore and server restart" Mar 12, 2026
@inez inez merged commit 9cba62d into main Mar 12, 2026
6 checks passed
@inez inez deleted the inez.sync.fetch branch March 12, 2026 00:33
		slog.String("upstream", repo.UpstreamURL()),
		slog.String("error", err.Error()))
} else {
	logger.InfoContext(ctx, "Post-fetch refs for existing repo",
Collaborator

This definitely shouldn't be at info

worstell pushed a commit that referenced this pull request Mar 12, 2026