Skip to content

Conversation

@tyrielv
Copy link
Contributor

@tyrielv tyrielv commented Dec 9, 2025

When prefetching the commit graph for the first time with trust-pack-indexes=false, the first pack file is the largest and takes the longest to index. In my test case it takes ~2 minutes to fetch all the pack files, ~13 minutes to index the first pack file, and less than 1 minute to index the rest.

A previous pull request parallelized the pack file indexing. This pull request makes it so as soon as the smaller, more recent pack files are ready they are moved into the live pack folder and can be accessed by the user's operations.

A marker file is created when a pack file is moved into the live pack folder ahead of an older pack file still indexing. If the prefetch is interrupted, the next prefetch will ignore pack files which have the marker when determining the timestamp for the prefetch, which ensures that any missing pack files from the interrupted prefetch are redownloaded. The marker file is deleted once all the previous pack files are ready.

After a completed prefetch, any markers (and the pack and index files they mark) left over from previous incomplete prefetches are deleted.

When prefetching the commit graph for the first time with
`trust-pack-indexes=false`, the first pack file is the largest and takes
the longest to index. In my test case, it takes ~2 minutes to fetch all
the pack files, ~13 minutes to index the first pack file, and less than
1 minute to index the rest.

A previous pull request parallelized the pack file indexing. This pull
request makes it so as soon as the smaller, more recent pack files are
ready they are moved into the live pack folder and can be accessed by
the user's operations.

A marker file is created when a pack file is moved into the live pack
folder ahead of an older pack file still indexing. If the prefetch is
interrupted, the next prefetch will ignore pack files which have the
marker when determining the timestamp for the prefetch, which ensures
that any missing pack files from the interrupted prefetch are
redownloaded. The marker file is deleted once all the previous pack
files are ready.

After a completed prefetch, any markers (and the pack and index files
they mark) left over from previous incomplete prefetches are deleted.

this.fileSystem.TryDeleteFile(idxPath, metadataKey: nameof(idxPath), metadata: metadata);
this.fileSystem.TryDeleteFile(packPath, metadataKey: nameof(packPath), metadata: metadata);
this.fileSystem.TryDeleteFile(markerPath, metadataKey: nameof(markerPath), metadata: metadata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the return value of TryDeleteFile for the pack and index file before deleting the marker file?

If the delete of the pack & index were to fail, but the delete of the marker succeeded, I think we might be at risk of TryGetMaxGoodPrefetchTimestamp finding the wrong timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to check the deletion of the pack before deleting the index and marker, since deleting index or marker but not pack will have logic consequences but deleting the pack but not the others will just keep taking some file space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated again to delete index first per Johannes' advice

this.fileSystem.TryDeleteFile(markerFilePath);
throw ex;
}
await previousPackTask;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm my understanding, the await previousPackTask; will ensure that we always delete the marker files from oldest to newest? In other words, we'd never have the situation where we delete the marker file for a new pack file while the oldest is still be processed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

Comment on lines 519 to 558
this.fileSystem.MoveAndOverwriteFile(packTempPath, finalPackPath);
this.fileSystem.MoveAndOverwriteFile(idxTempPath, finalIdxPath);
this.fileSystem.MoveAndOverwriteFile(sourcePackPath, targetPackPath);
this.fileSystem.MoveAndOverwriteFile(sourceIdxPath, targetIdxPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly safer way would be to delete the target .idx file first, then overwrite the target .pack file, then move the .idx file.

This would plug a (probably tiny) race condition: Git assumes that the presence of an .idx file guarantees the presence of a corresponding .pack file. If both exist, but the .pack file does not match the .idx file, I don't think that Git will realize that, and will try to read bogus slices from the .pack file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 570 to 571
this.fileSystem.TryDeleteFile(sourceIdxPath, metadataKey: nameof(sourceIdxPath), metadata: metadata);
this.fileSystem.TryDeleteFile(targetIdxPath, metadataKey: nameof(targetIdxPath), metadata: metadata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of reducing race conditions, maybe delete the target .idx file first? That will prevent Git from trying to access the associated packfile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@tyrielv tyrielv force-pushed the tyrielv/partialprefetchprogress branch from 7050895 to 40f5b49 Compare December 10, 2025 16:52
Copy link
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my feedback!

@tyrielv tyrielv merged commit f877d10 into microsoft:master Dec 10, 2025
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants