feat: add --direct-io CLI flag for FUSE page cache bypass #28

Merged
XciD merged 10 commits into main from feat/direct-io-flag on Mar 9, 2026

XciD (Member) commented Mar 9, 2026

Summary

  • Add --direct-io CLI flag (default: off) that sets FOPEN_DIRECT_IO on file open/create
  • When enabled, every read goes through the FUSE handler, bypassing the kernel page cache
  • When enabled, the prefetch buffer becomes forward-only: consumed bytes are drained after each read is served, so a re-read of an already-consumed offset misses the buffer and must refetch from CAS
  • Without the flag (default), the kernel page cache handles cross-call caching and the prefetch buffer retains data for re-reads
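
As a rough illustration of the first bullet, the sketch below maps a parsed `--direct-io` flag to the flags returned from open/create. `parse_direct_io` and `open_reply_flags` are hypothetical helpers written for this example, not the PR's code; the only value taken from outside is `FOPEN_DIRECT_IO`, which is bit 0 in the Linux FUSE headers.

```rust
/// FOPEN_DIRECT_IO as defined in the Linux FUSE uapi header (bit 0).
const FOPEN_DIRECT_IO: u32 = 1 << 0;

/// Hypothetical helper: true when `--direct-io` appears among the CLI arguments.
/// The flag is off by default, so an absent flag yields `false`.
fn parse_direct_io<I, S>(args: I) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    args.into_iter().any(|a| a.as_ref() == "--direct-io")
}

/// Hypothetical helper: flags handed back to the kernel from open()/create().
/// With the flag set, the kernel bypasses the page cache for this handle.
fn open_reply_flags(direct_io: bool) -> u32 {
    if direct_io {
        FOPEN_DIRECT_IO
    } else {
        0
    }
}
```

A real CLI would more likely use an argument-parsing crate; the manual scan here only keeps the sketch dependency-free.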

Useful for benchmarking real CAS throughput without inflation from the page cache or the prefetch buffer. Not recommended for production: it disables the efficient mmap caching that safetensors workloads rely on.

The bench script (PR #25) will use HF_DIRECT_IO=1 to enable this during fio runs.

Add opt-in FOPEN_DIRECT_IO support via --direct-io flag. When enabled,
every read/write goes through the FUSE handler instead of being served
from the kernel page cache. Useful for benchmarking real CAS throughput.

Not recommended for production: disables efficient mmap caching, which
is critical for safetensors/transformers workloads where tensors are
memory-mapped and re-read across forward passes.

github-actions bot commented Mar 9, 2026

POSIX Compliance (pjdfstest)

============================================================
  pjdfstest POSIX Compliance Results
------------------------------------------------------------
  Files: 130/130 passed    Tests: 832 total (0 subtests failed)
  Result: PASS
------------------------------------------------------------
  Category               Passed    Total   Status
  -------------------- -------- -------- --------
  chflags                     5        5       OK
  chmod                       8        8       OK
  chown                       6        6       OK
  ftruncate                  13       13       OK
  granular                    5        5       OK
  mkdir                       9        9       OK
  open                       19       19       OK
  posix_fallocate             1        1       OK
  rename                     10       10       OK
  rmdir                      11       11       OK
  symlink                    10       10       OK
  truncate                   13       13       OK
  unlink                     11       11       OK
  utimensat                   9        9       OK
============================================================

github-actions bot commented Mar 9, 2026

Benchmark Results

============================================================
  Benchmark — 50MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                    203.4 MB/s     216.8 MB/s
  Sequential re-read                1357.7 MB/s    2475.1 MB/s
  Range read (1MB@25MB)                8.1 ms         0.2 ms
  Random reads (100x4KB avg)           8.0 ms         0.0 ms
  Sequential write (FUSE)           1188.5 MB/s
  Close latency (CAS+Hub)            0.095 s
  Write end-to-end                   365.8 MB/s
  Dedup write                       1476.8 MB/s
  Dedup close latency                0.070 s
  Dedup end-to-end                   481.1 MB/s
============================================================
============================================================
  Benchmark — 200MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1284.7 MB/s    1107.4 MB/s
  Sequential re-read                1641.3 MB/s    2426.1 MB/s
  Range read (1MB@25MB)                8.7 ms         0.2 ms
  Random reads (100x4KB avg)           8.5 ms         0.0 ms
  Sequential write (FUSE)           1332.9 MB/s
  Close latency (CAS+Hub)            0.127 s
  Write end-to-end                   721.5 MB/s
  Dedup write                       1369.5 MB/s
  Dedup close latency                0.075 s
  Dedup end-to-end                   902.8 MB/s
============================================================
============================================================
  Benchmark — 500MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1705.7 MB/s    1524.7 MB/s
  Sequential re-read                1696.1 MB/s    2459.1 MB/s
  Range read (1MB@25MB)                8.9 ms         0.2 ms
  Random reads (100x4KB avg)           8.5 ms         0.0 ms
  Sequential write (FUSE)           1280.5 MB/s
  Close latency (CAS+Hub)            0.106 s
  Write end-to-end                  1007.2 MB/s
  Dedup write                       1239.6 MB/s
  Dedup close latency                0.083 s
  Dedup end-to-end                  1028.0 MB/s
============================================================
============================================================
  fio Benchmark Results
------------------------------------------------------------
  Job                        FUSE MB/s   NFS MB/s  FUSE IOPS   NFS IOPS
  ------------------------- ---------- ---------- ---------- ----------
  seq-read-100M                  440.5      324.7                      
  seq-reread-100M               2083.3     1219.5                      
  rand-read-4k-100M                0.4        0.4        109        109
  seq-read-5x10M                 943.4      531.9                      
  rand-read-10x1M                181.6       62.3      46487      15947
  Random Read Latency           FUSE avg      NFS avg
  ------------------------- ------------ ------------
  rand-read-4k-100M            9168.4 us    9144.9 us
  rand-read-10x1M                20.8 us      61.7 us
============================================================

XciD added 9 commits March 9, 2026 15:50
When direct_io is enabled, the prefetch buffer drains consumed bytes
after serving reads. Re-reads at already-consumed offsets must refetch
from CAS, preventing the buffer from acting as a read cache.

Without direct_io (default), the buffer retains data for re-reads,
which works well with the kernel page cache handling cross-call caching.
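
The forward-only behaviour described above can be sketched as follows. This is a minimal model and not the PR's implementation: `PrefetchBuffer`, its fields, and the `read` signature are assumptions made for illustration. A `None` result stands for "miss, refetch from CAS".

```rust
use std::collections::VecDeque;

/// Hypothetical prefetch buffer covering file offsets [start, start + data.len()).
struct PrefetchBuffer {
    start: u64,
    data: VecDeque<u8>,
    forward_only: bool,
}

impl PrefetchBuffer {
    fn new(start: u64, bytes: &[u8], forward_only: bool) -> Self {
        Self {
            start,
            data: bytes.iter().copied().collect(),
            forward_only,
        }
    }

    /// Serve up to `len` bytes at `offset`.
    /// `None` means the range is not buffered and the caller must refetch.
    fn read(&mut self, offset: u64, len: usize) -> Option<Vec<u8>> {
        let end = self.start + self.data.len() as u64;
        if offset < self.start || offset >= end {
            return None;
        }
        let skip = (offset - self.start) as usize;
        let to_read = len.min(self.data.len() - skip);
        let out: Vec<u8> = self.data.iter().skip(skip).take(to_read).copied().collect();

        // Forward-only mode: discard everything up to the end of this read so a
        // later read at the same offset misses. Zero-length reads drain nothing.
        if self.forward_only && to_read > 0 {
            let consumed = skip + to_read;
            drop(self.data.drain(..consumed));
            self.start += consumed as u64;
        }
        Some(out)
    }
}
```

With `forward_only` set, reading offset 0 twice returns data the first time and `None` the second; with it unset, both reads are served from the buffer.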
- Pass args.direct_io to VirtualFs::new (missing arg caused build failure)
- Disable seek window in forward-only mode (try_serve_seek returns None,
  drain_to_seek skips seek population) so re-reads truly refetch
- Guard drain on to_read > 0 so zero-length reads are no-ops
- open()/create(): only set FOPEN_DIRECT_IO on read-only opens to avoid
  surfacing EBADF on read-after-write with streaming write handles
- setup(): pass direct_io=false when is_nfs, since NFS has no equivalent
  and it would misleadingly enable forward-only prefetch
- open()/create(): only skip DIRECT_IO for simple streaming write handles;
  advanced-write O_RDWR handles are local files and support reads
- setup(): log warning when --direct-io is used with NFS (no equivalent)
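
The NFS handling in the setup() bullets above amounts to a small gate. The sketch below is an assumption about its shape: `effective_direct_io` is a hypothetical name and the warning text is illustrative, not the message the project logs.

```rust
/// Hypothetical helper: decide whether direct I/O (and with it forward-only
/// prefetch) is actually enabled, and whether a warning should be logged.
/// NFS has no equivalent of FOPEN_DIRECT_IO, so the request is dropped there.
fn effective_direct_io(requested: bool, is_nfs: bool) -> (bool, Option<&'static str>) {
    if requested && is_nfs {
        // Illustrative wording only.
        (false, Some("--direct-io has no NFS equivalent; ignoring"))
    } else {
        (requested, None)
    }
}
```

Returning the warning instead of logging it keeps the decision testable without a logging dependency.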
Only O_RDWR opens in simple (non-advanced) mode need DIRECT_IO skipped
(streaming handles can't read). O_WRONLY and create() can safely use it.

Mirror the open() guard: create() with O_RDWR in simple mode produces a
streaming write-only handle, so DIRECT_IO would surface EBADF on reads.
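
The final open()/create() rule from the commits above can be written as one predicate. This is a sketch under assumptions: the `Access` enum, `open_flags`, and the `advanced_writes` parameter are hypothetical names, and the real code presumably inspects the raw open flags instead of an enum.

```rust
/// FOPEN_DIRECT_IO as defined in the Linux FUSE uapi header (bit 0).
const FOPEN_DIRECT_IO: u32 = 1 << 0;

/// Hypothetical stand-in for the access mode in the open flags.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Access {
    ReadOnly,  // O_RDONLY
    WriteOnly, // O_WRONLY
    ReadWrite, // O_RDWR
}

/// Hypothetical helper shared by open() and create().
fn open_flags(direct_io: bool, access: Access, advanced_writes: bool) -> u32 {
    // In simple (non-advanced) mode an O_RDWR open yields a streaming,
    // write-only handle. Reads on it cannot be served, and with DIRECT_IO
    // they would reach the handler and surface as EBADF, so skip the flag.
    let streaming_rdwr = access == Access::ReadWrite && !advanced_writes;
    if direct_io && !streaming_rdwr {
        FOPEN_DIRECT_IO
    } else {
        0
    }
}
```

Keeping the guard in a single function is one way to stop open() and create() from drifting apart, which is the mismatch the last of these commits fixed.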
Request FUSE_DIRECT_IO_ALLOW_MMAP in init() when --direct-io is active,
so mmap() works on files with FOPEN_DIRECT_IO (Linux 6.6+). Without
this, safetensors and other mmap-based readers fail with EINVAL.

Log a warning instead of silently proceeding when the kernel doesn't
support FUSE_DIRECT_IO_ALLOW_MMAP, so users know mmap may fail.
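
The capability negotiation in the last two commits can be sketched like this. `negotiate_mmap` is a hypothetical helper and the warning text is illustrative. The bit position for `FUSE_DIRECT_IO_ALLOW_MMAP` is taken from the Linux FUSE uapi header as best I recall and should be checked against the headers you build with.

```rust
/// FUSE_DIRECT_IO_ALLOW_MMAP init flag (Linux 6.6+). Assumed to be bit 36,
/// per the kernel's fuse.h; verify against your own headers.
const FUSE_DIRECT_IO_ALLOW_MMAP: u64 = 1 << 36;

/// Hypothetical init() step: given the capability bits the kernel advertises,
/// return the bits to request plus an optional warning to log.
fn negotiate_mmap(direct_io: bool, kernel_caps: u64) -> (u64, Option<&'static str>) {
    if !direct_io {
        // Nothing to request when --direct-io is off.
        return (0, None);
    }
    if kernel_caps & FUSE_DIRECT_IO_ALLOW_MMAP != 0 {
        (FUSE_DIRECT_IO_ALLOW_MMAP, None)
    } else {
        // Older kernels: proceed, but tell the user mmap() may fail.
        (0, Some("kernel lacks FUSE_DIRECT_IO_ALLOW_MMAP; mmap on direct-io files may fail"))
    }
}
```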
@XciD XciD marked this pull request as ready for review March 9, 2026 19:42
@XciD XciD merged commit 80fbce1 into main Mar 9, 2026
4 checks passed
@XciD XciD deleted the feat/direct-io-flag branch March 23, 2026 15:47