-
Notifications
You must be signed in to change notification settings - Fork 461
Make ready pack files available during prefetch #1888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make ready pack files available during prefetch #1888
Conversation
When prefetching the commit graph for the first time with `trust-pack-indexes=false`, the first pack file is the largest and takes the longest to index. In my test case, it takes ~2 minutes to fetch all the pack files, ~13 minutes to index the first pack file, and less than 1 minute to index the rest. A previous pull request parallelized the pack file indexing. This pull request makes it so as soon as the smaller, more recent pack files are ready they are moved into the live pack folder and can be accessed by the user's operations. A marker file is created when a pack file is moved into the live pack folder ahead of an older pack file still indexing. If the prefetch is interrupted, the next prefetch will ignore pack files which have the marker when determining the timestamp for the prefetch, which ensures that any missing pack files from the interrupted prefetch are redownloaded. The marker file is deleted once all the previous pack files are ready. After a completed prefetch, any markers (and the pack and index files they mark) left over from previous incomplete prefetches are deleted.
GVFS/GVFS.Common/Git/GitObjects.cs
Outdated
|
|
||
| this.fileSystem.TryDeleteFile(idxPath, metadataKey: nameof(idxPath), metadata: metadata); | ||
| this.fileSystem.TryDeleteFile(packPath, metadataKey: nameof(packPath), metadata: metadata); | ||
| this.fileSystem.TryDeleteFile(markerPath, metadataKey: nameof(markerPath), metadata: metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the return value of TryDeleteFile for the pack and index file before deleting the marker file?
If the delete of the pack & index were to fail, but the delete of the marker succeeded, I think we might be at risk of TryGetMaxGoodPrefetchTimestamp finding the wrong timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to check the deletion of the pack before deleting the index and marker, since deleting index or marker but not pack will have logic consequences but deleting the pack but not the others will just keep taking some file space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated again to delete index first per Johannes' advice
| this.fileSystem.TryDeleteFile(markerFilePath); | ||
| throw ex; | ||
| } | ||
| await previousPackTask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm my understanding, the await previousPackTask; will ensure that we always delete the marker files from oldest to newest? In other words, we'd never have the situation where we delete the marker file for a new pack file while the oldest is still be processed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
| this.fileSystem.MoveAndOverwriteFile(packTempPath, finalPackPath); | ||
| this.fileSystem.MoveAndOverwriteFile(idxTempPath, finalIdxPath); | ||
| this.fileSystem.MoveAndOverwriteFile(sourcePackPath, targetPackPath); | ||
| this.fileSystem.MoveAndOverwriteFile(sourceIdxPath, targetIdxPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A slightly safer way would be to delete the target .idx file first, then overwrite the target .pack file, then move the .idx file.
This would plug a (probably tiny) race condition: Git assumes that the presence of an .idx file guarantees the presence of a corresponding .pack file. If both exist, but the .pack file does not match the .idx file, I don't think that Git will realize that, and will try to read bogus slices from the .pack file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
GVFS/GVFS.Common/Git/GitObjects.cs
Outdated
| this.fileSystem.TryDeleteFile(sourceIdxPath, metadataKey: nameof(sourceIdxPath), metadata: metadata); | ||
| this.fileSystem.TryDeleteFile(targetIdxPath, metadataKey: nameof(targetIdxPath), metadata: metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of reducing race conditions, maybe delete the target .idx file first? That will prevent Git from trying to access the associated packfile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
7050895 to
40f5b49
Compare
dscho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my feedback!
When prefetching the commit graph for the first time with
trust-pack-indexes=false, the first pack file is the largest and takes the longest to index. In my test case it takes ~2 minutes to fetch all the pack files, ~13 minutes to index the first pack file, and less than 1 minute to index the rest.A previous pull request parallelized the pack file indexing. This pull request makes it so as soon as the smaller, more recent pack files are ready they are moved into the live pack folder and can be accessed by the user's operations.
A marker file is created when a pack file is moved into the live pack folder ahead of an older pack file still indexing. If the prefetch is interrupted, the next prefetch will ignore pack files which have the marker when determining the timestamp for the prefetch, which ensures that any missing pack files from the interrupted prefetch are redownloaded. The marker file is deleted once all the previous pack files are ready.
After a completed prefetch, any markers (and the pack and index files they mark) left over from previous incomplete prefetches are deleted.