Previously, staging files created by advanced writes persisted on disk indefinitely, even after being successfully flushed to remote. This could fill the disk in long-running mounts (e.g. HuggingFace Spaces).

Add staging file GC at three points:
- flush_batch post-commit: clean inodes get staging files removed
- flush_batch filter: stale re-enqueued inodes (fsync case) cleaned
- release(): last handle close on a clean inode triggers cleanup

Safety mechanisms to prevent races:
- has_open_handles check skips GC for NFS long-lived handles
- staging_lock serialization prevents races with open_advanced_write
- dirty re-check after lock acquisition catches concurrent writers
- open_readonly fallback handles GC racing with read-only opens

Also adds StagingDir::try_remove() to deduplicate the remove-file ignore-NotFound pattern across 4 call sites.
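The ordering of the safety checks above (open-handle check, lock acquisition, dirty re-check, then removal) could be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: `InodeState`, its fields, `gc_staging_file`, and the free function `try_remove` are hypothetical stand-ins, and the real `StagingDir::try_remove()` may have a different signature.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical per-inode state; the field names are assumptions for this sketch.
pub struct InodeState {
    pub dirty: AtomicBool,
    pub open_handles: AtomicUsize,
    pub staging_lock: Mutex<()>,
}

/// Hypothetical stand-in for the remove-file/ignore-NotFound pattern.
/// Returns Ok(true) if a file was removed, Ok(false) if it was already gone.
pub fn try_remove(path: &Path) -> io::Result<bool> {
    match fs::remove_file(path) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}

/// Attempts to garbage-collect one staging file.
/// Returns Ok(true) only if the file was actually removed.
pub fn gc_staging_file(inode: &InodeState, staging_path: &Path) -> io::Result<bool> {
    // Skip GC while any handle is open (e.g. long-lived NFS handles).
    if inode.open_handles.load(Ordering::Acquire) > 0 {
        return Ok(false);
    }
    // Serialize against a writer that may be opening the staging file.
    let _guard = inode.staging_lock.lock().unwrap();
    // Re-check dirtiness under the lock to catch a concurrent writer.
    if inode.dirty.load(Ordering::Acquire) {
        return Ok(false);
    }
    try_remove(staging_path)
}
```

Re-checking the dirty flag after taking the lock matters because a writer may have marked the inode dirty between the caller's first look and the lock acquisition; without the re-check, GC could delete data that has not been flushed yet.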
341a164 to 038f46b
When set, staging files are garbage-collected after flush only when disk usage exceeds the limit. When under the limit (or 0/unlimited), staging files persist as a local read-after-write cache. This preserves fast read-after-write performance for general use while letting Spaces operators cap disk usage (e.g. --max-staging-size 50G).

- Add --max-staging-size CLI flag (default: 0 = unlimited = no GC)
- Track staging bytes via AtomicU64 on StagingDir (add on write/open, subtract on remove, stat before delete for accuracy)
- GC in flush_batch and release() gated on is_over_limit()
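The byte accounting described above might look something like the sketch below. It assumes a standalone `StagingUsage` type with hypothetical `add` and `sub` methods; in the PR the counter lives on `StagingDir`, and only the `AtomicU64` counter, the "0 means unlimited" convention, and the `is_over_limit()` name come from the description.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical usage tracker; in the PR this state lives on StagingDir.
pub struct StagingUsage {
    bytes: AtomicU64,
    /// Limit in bytes; 0 means unlimited, so GC never triggers.
    max_bytes: u64,
}

impl StagingUsage {
    pub fn new(max_bytes: u64) -> Self {
        Self { bytes: AtomicU64::new(0), max_bytes }
    }

    /// Record bytes added by a write or by opening an existing staging file.
    pub fn add(&self, n: u64) {
        self.bytes.fetch_add(n, Ordering::Relaxed);
    }

    /// Record bytes freed by a removal, saturating at zero so an
    /// inaccurate size can never make the counter wrap around.
    pub fn sub(&self, n: u64) {
        let _ = self
            .bytes
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                Some(cur.saturating_sub(n))
            });
    }

    /// True only when a limit is set and tracked usage exceeds it.
    pub fn is_over_limit(&self) -> bool {
        self.max_bytes != 0 && self.bytes.load(Ordering::Relaxed) > self.max_bytes
    }
}
```

With this shape, a caller would stat the staging file just before deleting it and pass the observed size to `sub`, which is one way to read the "stat before delete for accuracy" note.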
Summary
- Adds StagingDir::try_remove() to deduplicate the remove-file/ignore-NotFound pattern