feat: sparse writes with range_upload, zero-download write path #41
Open
Sparse staging: open for write creates a sparse file (set_len) instead of downloading the original CAS content. Dirty byte ranges are tracked in SparseWriteState, and only modified regions are uploaded via range_upload.

Key changes:
- Sparse staging file on open (no CAS download)
- SparseWriteState tracks dirty ranges with O(log n) merge
- fill_sparse_holes reads CAS data on demand for read-after-write
- flush_generation counter prevents a stale flush from clearing dirty state
- Rename re-enqueues dirty files for flush at the new path
- setattr truncate/grow handled via clip_to_size + gap tracking
- A write past original_size automatically tracks the gap as dirty
- file.metadata() guard against a concurrent truncate vs. write race

New xet-core API (DirtyInput with AsyncRead per range):
- range_upload builds a DirtyInput per dirty range from the staging file
- upload_ranges handles the truncation boundary from CAS directly
- No download needed for any write/truncate path

Testing:
- 245 unit tests (47 new for sparse writes, flush races, edge cases)
- fsx: 50k random ops (staging) + 100 paranoid CAS round-trip ops
- xfstests: generic/quick suite with FUSE patches (167 pass)
- pjdfstest: 8789 POSIX syscall tests
- Integration smoke tests: mid-file edit, append, truncate, multi-write, large file (512KB) CAS round-trip
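The flush_generation idea can be sketched as follows. This is a minimal illustration, assuming a per-inode mutex guarding a (generation, dirty) pair; FlushState, begin_flush and finish_flush are hypothetical names, not the PR's actual types. A flush snapshots the generation before uploading and clears the dirty flag only if the generation is unchanged when the upload completes, so a write or rename that lands mid-flush keeps the file dirty.

```rust
use std::sync::Mutex;

/// Hypothetical per-inode flush bookkeeping (not the PR's actual types).
#[derive(Debug, Default)]
struct Inner {
    /// Bumped by every write, truncate, or rename (the role flush_generation plays).
    generation: u64,
    dirty: bool,
}

#[derive(Debug, Default)]
pub struct FlushState {
    inner: Mutex<Inner>,
}

impl FlushState {
    /// Any mutation marks the inode dirty and starts a new generation.
    pub fn mark_dirty(&self) {
        let mut g = self.inner.lock().unwrap();
        g.generation += 1;
        g.dirty = true;
    }

    /// Snapshot the generation that a flush is about to upload.
    pub fn begin_flush(&self) -> u64 {
        self.inner.lock().unwrap().generation
    }

    /// Called after the upload succeeds. Clears the dirty flag only if no
    /// mutation happened since `begin_flush`; returns whether it was cleared.
    pub fn finish_flush(&self, snapshot: u64) -> bool {
        let mut g = self.inner.lock().unwrap();
        if g.generation == snapshot {
            g.dirty = false;
            true
        } else {
            // Stale flush: newer data exists and must be flushed again.
            false
        }
    }

    pub fn is_dirty(&self) -> bool {
        self.inner.lock().unwrap().dirty
    }
}
```

Doing the comparison and the clear under one lock is the point of the design: with two independent atomics, a write could bump the generation between the check and the clear, and its data would be silently marked clean.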
The NFS handle pool opens handles read-only for reads. When a WRITE RPC arrives for an existing CAS file, the pooled handle has no staging file, so the VFS rejects the write with EBADF. Fix: try the write with the existing handle; on EBADF, evict it and reopen it writable (which creates the sparse staging file).
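The evict-and-retry path can be sketched in isolation. Everything here is hypothetical (HandlePool, Handle, Mode, nfs_write): a read-only handle simply returns EBADF to stand in for the VFS rejection, and the writable reopen is where the real code would create the sparse staging file. The sketch assumes errno 9 for EBADF, which holds on Linux and macOS.

```rust
use std::collections::HashMap;
use std::io;

/// errno value for EBADF on Linux and macOS; hardcoded to avoid a libc dependency.
const EBADF: i32 = 9;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    ReadOnly,
    Writable,
}

/// Hypothetical pooled handle. A read-only handle has no staging file, so
/// writes fail with EBADF, mirroring the VFS rejection.
pub struct Handle {
    pub mode: Mode,
}

impl Handle {
    fn write(&self, _offset: u64, data: &[u8]) -> io::Result<usize> {
        match self.mode {
            Mode::ReadOnly => Err(io::Error::from_raw_os_error(EBADF)),
            Mode::Writable => Ok(data.len()), // pretend the pwrite succeeded
        }
    }
}

/// Hypothetical NFS handle pool keyed by inode number.
#[derive(Default)]
pub struct HandlePool {
    handles: HashMap<u64, Handle>,
    pub reopens: usize,
}

impl HandlePool {
    /// Reads populate the pool with read-only handles.
    pub fn get_or_open_read(&mut self, ino: u64) -> &Handle {
        self.handles.entry(ino).or_insert(Handle { mode: Mode::ReadOnly })
    }

    /// Evict the cached handle and open a writable one; in the real system
    /// this is where the sparse staging file would be created.
    fn reopen_writable(&mut self, ino: u64) -> &Handle {
        self.handles.remove(&ino);
        self.reopens += 1;
        self.handles.entry(ino).or_insert(Handle { mode: Mode::Writable })
    }

    /// WRITE RPC: try the pooled handle first; on EBADF, evict, reopen, retry once.
    pub fn nfs_write(&mut self, ino: u64, offset: u64, data: &[u8]) -> io::Result<usize> {
        let first = self.get_or_open_read(ino).write(offset, data);
        match first {
            Err(e) if e.raw_os_error() == Some(EBADF) => {
                self.reopen_writable(ino).write(offset, data)
            }
            other => other,
        }
    }
}
```

Only EBADF triggers the retry, and only once; any other error, or a failure of the retried write, is returned unchanged.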
Summary
Replaces the full-download write path with sparse staging files and dirty range tracking. When opening an existing file for write, we create a sparse staging file (set_len()) instead of downloading CAS content. Only the dirty byte ranges are tracked and uploaded.
How it works
- pwrite() to the staging file; SparseWriteState::track_write() records dirty ranges (O(log n) merge with binary search)
- pread() from staging; fill_sparse_holes() downloads CAS data on demand for non-dirty regions
- clip_to_size() trims dirty ranges on shrink; track_write() marks the extension as dirty on grow
- range_upload composes a new CAS file from the stable prefix/suffix + re-chunked dirty regions via upload_ranges() in xet-core

For a 200MB file with a 1KB mid-file edit, this downloads 0 bytes on open and uploads only the dirty chunks (not 200MB).
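The dirty-range bookkeeping in these steps (track_write merging, clip_to_size on shrink, gap tracking on grow, and finding the holes that fill_sparse_holes must fetch) can be sketched over a sorted Vec of half-open ranges. DirtyRanges, track_write_at and holes are hypothetical names; the sketch assumes ranges are kept sorted, non-overlapping, and non-adjacent. Binary search finds the merge window in O(log n), but Vec::splice still shifts the tail, so a single insert is O(n) in the worst case.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the dirty-range part of SparseWriteState.
/// Invariant: `ranges` is sorted, non-overlapping, and non-adjacent.
#[derive(Debug, Default)]
pub struct DirtyRanges {
    ranges: Vec<Range<u64>>,
}

impl DirtyRanges {
    pub fn ranges(&self) -> &[Range<u64>] {
        &self.ranges
    }

    /// Record a write of `len` bytes at `offset`, merging with every range
    /// it overlaps or touches.
    pub fn track_write(&mut self, offset: u64, len: u64) {
        if len == 0 {
            return;
        }
        let (mut start, mut end) = (offset, offset + len);
        // Two binary searches: [lo, hi) is the window of mergeable ranges.
        let lo = self.ranges.partition_point(|r| r.end < start);
        let hi = self.ranges.partition_point(|r| r.start <= end);
        if lo < hi {
            start = start.min(self.ranges[lo].start);
            end = end.max(self.ranges[hi - 1].end);
        }
        // Replace the window with one merged range (shifts the tail).
        self.ranges.splice(lo..hi, std::iter::once(start..end));
    }

    /// A write starting past the current size also dirties the gap before
    /// it, since those bytes have no CAS backing (they read as zeros).
    pub fn track_write_at(&mut self, cur_size: u64, offset: u64, len: u64) {
        if offset > cur_size {
            self.track_write(cur_size, offset - cur_size);
        }
        self.track_write(offset, len);
    }

    /// Truncate: drop ranges at or past `new_size`, trim the one crossing it.
    pub fn clip_to_size(&mut self, new_size: u64) {
        self.ranges.retain(|r| r.start < new_size);
        if let Some(last) = self.ranges.last_mut() {
            last.end = last.end.min(new_size);
        }
    }

    /// Sub-ranges of `read` that are neither dirty nor past `original_size`:
    /// the holes that must be fetched from CAS before serving the read.
    pub fn holes(&self, read: Range<u64>, original_size: u64) -> Vec<Range<u64>> {
        let end = read.end.min(original_size);
        let mut cursor = read.start;
        let mut holes = Vec::new();
        let first = self.ranges.partition_point(|r| r.end <= cursor);
        for r in &self.ranges[first..] {
            if cursor >= end || r.start >= end {
                break;
            }
            if r.start > cursor {
                holes.push(cursor..r.start);
            }
            cursor = cursor.max(r.end);
        }
        if cursor < end {
            holes.push(cursor..end);
        }
        holes
    }
}
```

In the 200MB example, the single 1KB edit leaves exactly one dirty range, so reading the whole file back yields at most two holes to fetch from CAS (before and after the edit).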
Key components
- SparseWriteState (inode.rs): tracks original_hash, original_size, and a sorted dirty_ranges vec
- fill_sparse_holes (mod.rs): downloads CAS data for sparse holes in the read buffer
- range_upload (xet.rs): builds a DirtyInput vec from the dirty ranges, each with its own async reader
- FlushEntry/FlushSuccess structs (flush.rs): replace raw tuples in the flush pipeline
- flush_generation counter: prevents a stale flush from clearing dirty state after concurrent writes/renames
Tests
Dependencies
adrien/combined-hf-mount (PR #717), which adds the upload_ranges() API for composing files from CAS prefix/suffix + dirty regions
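How a new file might be composed from the original CAS content plus dirty regions can be sketched as a plan of segments. Segment and plan_segments are hypothetical and are not xet-core's upload_ranges or DirtyInput API; the sketch assumes sorted, non-overlapping dirty ranges and ignores chunk-boundary alignment, which a real re-chunking upload would have to handle. Any byte outside the dirty ranges must come from the original file, so a non-dirty byte past original_size is reported as an error (the case that gap tracking on grow is meant to prevent).

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
pub enum Segment {
    /// Reuse these bytes of the original CAS file (no download, no re-upload).
    Original(Range<u64>),
    /// Read these bytes from the staging file and re-chunk/upload them.
    Dirty(Range<u64>),
}

/// Hypothetical planner: covers [0, new_size) with Original and Dirty segments.
pub fn plan_segments(
    dirty: &[Range<u64>],
    original_size: u64,
    new_size: u64,
) -> Result<Vec<Segment>, String> {
    let mut out = Vec::new();
    let mut cursor = 0u64;
    for r in dirty {
        // Clip to the (possibly truncated) new size.
        let (start, end) = (r.start.min(new_size), r.end.min(new_size));
        if start >= end {
            continue;
        }
        if start > cursor {
            push_original(&mut out, cursor..start, original_size)?;
        }
        out.push(Segment::Dirty(start..end));
        cursor = end;
    }
    if cursor < new_size {
        push_original(&mut out, cursor..new_size, original_size)?;
    }
    Ok(out)
}

fn push_original(out: &mut Vec<Segment>, r: Range<u64>, original_size: u64) -> Result<(), String> {
    if r.end > original_size {
        return Err(format!(
            "bytes {}..{} are neither dirty nor in the original file",
            original_size.max(r.start),
            r.end
        ));
    }
    out.push(Segment::Original(r));
    Ok(())
}
```

A mid-file edit becomes prefix, dirty region, suffix; a pure truncate becomes a single shortened Original segment, with nothing to download.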