feat: garbage-collect staging files after flush #82

Draft
XciD wants to merge 2 commits into main from feat/staging-gc

XciD commented Mar 27, 2026

Summary

  • Staging files created by advanced writes are now cleaned up after a successful flush, which keeps the disk from filling up in long-running mounts (e.g. HuggingFace Spaces)
  • GC happens at three points: post-commit in flush_batch, in the stale-signal filter in flush_batch, and on last handle close in release()
  • Safety: a has_open_handles check skips GC while NFS long-lived handles are open, staging_lock serializes GC with open_advanced_write, and a dirty re-check after acquiring the lock catches concurrent writers
  • Adds StagingDir::try_remove() to deduplicate the remove-file/ignore-NotFound pattern
  • open_readonly tolerates GC racing with read-only opens by falling back to remote CAS

Previously, staging files created by advanced writes persisted on disk
indefinitely, even after being successfully flushed to remote. This
could fill the disk in long-running mounts (e.g. HuggingFace Spaces).

Add staging file GC at three points:
- flush_batch post-commit: clean inodes get staging files removed
- flush_batch filter: stale re-enqueued inodes (fsync case) cleaned
- release(): last handle close on a clean inode triggers cleanup

Safety mechanisms to prevent races:
- has_open_handles check skips GC for NFS long-lived handles
- staging_lock serialization prevents races with open_advanced_write
- dirty re-check after lock acquisition catches concurrent writers
- open_readonly fallback handles GC racing with read-only opens
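
The ordering of these checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `InodeState`, `GcOutcome`, and `gc_if_clean` are hypothetical names, and the real has_open_handles and staging_lock may be structured differently.

```rust
use std::sync::Mutex;

/// Hypothetical per-inode state; the field names are illustrative only.
#[derive(Debug, Default)]
pub struct InodeState {
    pub dirty: bool,
    pub open_handles: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GcOutcome {
    Removed,
    SkippedOpenHandles,
    SkippedDirty,
}

/// Runs `remove` only if the inode is clean and has no open handles.
/// `staging_lock` stands in for the lock shared with open_advanced_write.
pub fn gc_if_clean(
    staging_lock: &Mutex<()>,
    inode: &Mutex<InodeState>,
    remove: impl FnOnce(),
) -> GcOutcome {
    // Cheap pre-check without the staging lock: a long-lived handle
    // (the NFS case) means the staging file may still be in use.
    let has_open_handles = inode.lock().unwrap().open_handles > 0;
    if has_open_handles {
        return GcOutcome::SkippedOpenHandles;
    }

    // Serialize with writers that are about to open the staging file.
    let _staging_guard = staging_lock.lock().unwrap();

    // Re-check under the lock: a writer may have opened or dirtied the
    // inode between the pre-check and the lock acquisition.
    let state = inode.lock().unwrap();
    if state.open_handles > 0 {
        return GcOutcome::SkippedOpenHandles;
    }
    if state.dirty {
        return GcOutcome::SkippedDirty;
    }

    // The removal happens while the staging lock is still held.
    remove();
    GcOutcome::Removed
}
```

The point of the second check is that the first one is only an optimization; the decision that counts is the one made while holding the lock.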

Also adds StagingDir::try_remove() to deduplicate the remove-file /
ignore-NotFound pattern across 4 call sites.
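
For readers unfamiliar with the pattern, a helper of this kind might look like the sketch below. The struct layout, the `new` constructor, and the `io::Result<bool>` return type are assumptions made for illustration; only the name StagingDir::try_remove() comes from this PR.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for the staging directory type.
pub struct StagingDir {
    root: PathBuf,
}

impl StagingDir {
    pub fn new(root: PathBuf) -> Self {
        Self { root }
    }

    /// Removes a staging file by name.
    /// Returns Ok(true) if the file was removed and Ok(false) if it was
    /// already gone; any other I/O error is passed through.
    pub fn try_remove(&self, name: &str) -> io::Result<bool> {
        match fs::remove_file(self.root.join(name)) {
            Ok(()) => Ok(true),
            // Another GC path may have removed the file first; not an error.
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
            Err(e) => Err(e),
        }
    }
}
```

Treating NotFound as success makes the call idempotent, which matters when several GC points can race to delete the same file.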
@github-actions

POSIX Compliance (pjdfstest)

============================================================
  pjdfstest POSIX Compliance Results
------------------------------------------------------------
  Files: 130/130 passed    Tests: 832 total (0 subtests failed)
  Result: PASS
------------------------------------------------------------
  Category               Passed    Total   Status
  -------------------- -------- -------- --------
  chflags                     5        5       OK
  chmod                       8        8       OK
  chown                       6        6       OK
  ftruncate                  13       13       OK
  granular                    5        5       OK
  mkdir                       9        9       OK
  open                       19       19       OK
  posix_fallocate             1        1       OK
  rename                     10       10       OK
  rmdir                      11       11       OK
  symlink                    10       10       OK
  truncate                   13       13       OK
  unlink                     11       11       OK
  utimensat                   9        9       OK
============================================================

github-actions bot commented Mar 27, 2026

Benchmark Results

============================================================
  Benchmark — 50MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                    267.3 MB/s     256.2 MB/s
  Sequential re-read                1444.5 MB/s    2319.3 MB/s
  Range read (1MB@25MB)               31.3 ms         0.2 ms
  Random reads (100x4KB avg)          35.1 ms         0.0 ms
  Sequential write (FUSE)           1357.7 MB/s
  Close latency (CAS+Hub)            0.090 s
  Write end-to-end                   393.7 MB/s
  Dedup write                       1633.9 MB/s
  Dedup close latency                0.261 s
  Dedup end-to-end                   171.7 MB/s
============================================================
============================================================
  Benchmark — 200MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1009.2 MB/s    1018.1 MB/s
  Sequential re-read                1706.0 MB/s    2369.4 MB/s
  Range read (1MB@25MB)               35.1 ms         0.2 ms
  Random reads (100x4KB avg)          34.2 ms         0.0 ms
  Sequential write (FUSE)           1526.0 MB/s
  Close latency (CAS+Hub)            0.713 s
  Write end-to-end                   237.1 MB/s
  Dedup write                       1594.6 MB/s
  Dedup close latency                0.086 s
  Dedup end-to-end                   946.0 MB/s
============================================================
============================================================
  Benchmark — 500MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1579.8 MB/s    1452.7 MB/s
  Sequential re-read                1723.2 MB/s    2489.4 MB/s
  Range read (1MB@25MB)               29.0 ms         0.2 ms
  Random reads (100x4KB avg)          33.1 ms         0.0 ms
  Sequential write (FUSE)           1374.2 MB/s
  Close latency (CAS+Hub)            0.171 s
  Write end-to-end                   935.5 MB/s
  Dedup write                       1392.7 MB/s
  Dedup close latency                2.396 s
  Dedup end-to-end                   181.5 MB/s
============================================================
============================================================
  fio Benchmark Results
------------------------------------------------------------
  Job                        FUSE MB/s   NFS MB/s  FUSE IOPS   NFS IOPS
  ------------------------- ---------- ---------- ---------- ----------
  seq-read-100M                  392.2      353.4                      
  seq-reread-100M               2272.7        7.0                      
  rand-read-4k-100M                0.1        0.1         19         15
  seq-read-5x10M                 769.2      714.3                      
  rand-read-10x1M                  0.1        0.1         31         37
------------------------------------------------------------
  Random Read Latency           FUSE avg      NFS avg
  ------------------------- ------------ ------------
  rand-read-4k-100M           53612.2 us   66981.3 us
  rand-read-10x1M             32513.5 us   27232.6 us
============================================================

XciD force-pushed the feat/staging-gc branch 2 times, most recently from 341a164 to 038f46b, March 27, 2026 19:56

When --max-staging-size is set, staging files are garbage-collected
after flush only when staging disk usage exceeds the limit. When usage
is under the limit (or the limit is 0, meaning unlimited), staging
files persist as a local read-after-write cache.

This preserves fast read-after-write performance for general use while
letting Spaces operators cap disk usage (e.g. --max-staging-size 50G).

- Add --max-staging-size CLI flag (default: 0 = unlimited = no GC)
- Track staging bytes via AtomicU64 on StagingDir (add on write/open,
  subtract on remove, stat before delete for accuracy)
- GC in flush_batch and release() gated on is_over_limit()
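
The accounting described above can be sketched roughly as follows. `StagingUsage` and its methods other than is_over_limit() are hypothetical names; in the PR the counter lives on StagingDir, and the exact memory orderings and update sites may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical byte counter for staging files.
pub struct StagingUsage {
    bytes: AtomicU64,
    /// 0 means unlimited, i.e. GC never triggers.
    max_bytes: u64,
}

impl StagingUsage {
    pub fn new(max_bytes: u64) -> Self {
        Self { bytes: AtomicU64::new(0), max_bytes }
    }

    /// Called when bytes are written to, or an existing file is opened in, staging.
    pub fn add(&self, n: u64) {
        self.bytes.fetch_add(n, Ordering::Relaxed);
    }

    /// Called after a staging file is removed, with its size from stat.
    /// Saturates at zero so accounting drift cannot wrap the counter.
    pub fn sub(&self, n: u64) {
        let _ = self
            .bytes
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                Some(cur.saturating_sub(n))
            });
    }

    pub fn used(&self) -> u64 {
        self.bytes.load(Ordering::Relaxed)
    }

    /// GC is gated on this: true only when a limit is set and exceeded.
    pub fn is_over_limit(&self) -> bool {
        self.max_bytes != 0 && self.used() > self.max_bytes
    }
}
```

With a limit of 0 the check is always false, which matches the default of keeping staging files as a read-after-write cache.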