
feat: fio benchmark infrastructure mirroring mountpoint-s3 #25

Draft

XciD wants to merge 19 commits into main from feat/fio-bench-infra

Conversation

@XciD (Member) commented Mar 8, 2026

Summary

  • fio benchmark infrastructure mirroring mountpoint-s3's scripts/fs_bench.sh
  • All 29 fio jobs from mountpoint-s3: read (17), write (2), mix (3), read_latency (2), write_latency (1), create (4)
  • Unified scripts/fs_bench.sh supporting multiple mount modes via env vars:
    • HF_MOUNT_BACKEND=fuse|nfs
    • HF_NO_DISK_CACHE=1 (comparable to mountpoint-s3 without --cache)
    • HF_ADVANCED_WRITES=1
    • HF_JOB_NAME_FILTER=small for quick CI runs
    • HF_CATEGORIES=read,write to run specific categories
  • Results stored per mode in results/<mode>/output.json
  • Removed old bench tests (tests/bench.rs, tests/fio_bench.rs) superseded by fio
  • Added NFS variant of pjdfstest (test_pjdfstest_nfs)
  • Removed outdated doc/BENCHMARKING.md
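
As a rough sketch of how these mode switches could map to per-mode result paths (the PR specifies only `results/<mode>/output.json`; the mode names and the derivation below are assumptions for illustration, not the script's actual logic):

```shell
#!/bin/sh
# Hypothetical sketch: derive the per-mode results directory from the env
# vars listed above. Variable names mirror the PR's flags; the mode-naming
# scheme itself is an assumption.
backend="${HF_MOUNT_BACKEND:-fuse}"                  # fuse (default) or nfs
mode="$backend"
[ "${HF_NO_DISK_CACHE:-0}" = "1" ] && mode="${mode}-nocache"
[ "${HF_ADVANCED_WRITES:-0}" = "1" ] && mode="${mode}-advwrites"
mkdir -p "results/$mode"
echo "results/$mode/output.json"
```

With no env vars set this yields `results/fuse/output.json`; with `HF_MOUNT_BACKEND=nfs HF_NO_DISK_CACHE=1` it would yield `results/nfs-nocache/output.json`.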

github-actions bot commented Mar 8, 2026

Benchmark Results (fio small, FUSE)

Job                                                  Throughput
-------------------------------------------------    -----------
random_read_four_threads_direct_io_small_file        10538 MiB/s
random_read_four_threads_small_file                   5028 MiB/s
random_read_direct_io_small_file                      1978 MiB/s
random_read_small_file                                 747 MiB/s
sequential_read_four_threads_direct_io_small_file    12182 MiB/s
sequential_read_four_threads_small_file               9163 MiB/s
sequential_read_direct_io_small_file                  1837 MiB/s
sequential_read_small_file                            2067 MiB/s
time_to_first_byte_read_small_file                      15 ms

github-actions bot commented Mar 8, 2026

POSIX Compliance (pjdfstest)

============================================================
  pjdfstest POSIX Compliance Results
------------------------------------------------------------
  Files: 130/130 passed    Tests: 832 total (0 subtests failed)
  Result: PASS
------------------------------------------------------------
  Category               Passed    Total   Status
  -------------------- -------- -------- --------
  chflags                     5        5       OK
  chmod                       8        8       OK
  chown                       6        6       OK
  ftruncate                  13       13       OK
  granular                    5        5       OK
  mkdir                       9        9       OK
  open                       19       19       OK
  posix_fallocate             1        1       OK
  rename                     10       10       OK
  rmdir                      11       11       OK
  symlink                    10       10       OK
  truncate                   13       13       OK
  unlink                     11       11       OK
  utimensat                   9        9       OK
============================================================

XciD force-pushed the feat/fio-bench-infra branch from f64a647 to 52ebe69 on March 8, 2026 at 19:57
XciD added a commit that referenced this pull request Mar 9, 2026
## Summary

- Add `--direct-io` CLI flag (default: off) that sets `FOPEN_DIRECT_IO`
on file open/create
- When enabled, every read goes through the FUSE handler, bypassing the
kernel page cache
- When enabled, the prefetch buffer becomes forward-only: consumed bytes
are drained after serving, preventing re-reads from hitting the buffer
(must refetch from CAS)
- Without the flag (default), the kernel page cache handles cross-call
caching and the prefetch buffer retains data for re-reads

Useful for benchmarking real CAS throughput without page cache or buffer
inflation. Not recommended for production (disables efficient mmap
caching for safetensors workloads).

The bench script (PR #25) will use `HF_DIRECT_IO=1` to enable this
during fio runs.
XciD force-pushed the feat/fio-bench-infra branch 3 times, most recently from 8a6ab43 to e81e8fd on March 9, 2026 at 21:21
XciD added 11 commits March 12, 2026 15:11
Add the same fio workloads and benchmark scripts as mountpoint-s3/scripts/,
enabling direct throughput and latency comparisons.

fio job files (scripts/fio/) are copied verbatim from mountpoint-s3:
- read/: seq_read, rand_read, 4-thread, direct-IO, small-file variants
- write/: seq_write, seq_write_direct
- read_latency/: ttfb, ttfb_small

Scripts:
- scripts/fs_bench.sh: throughput benchmark (mirrors fs_bench.sh from mountpoint-s3).
  Creates a temp bucket, writes files write-through via hf-mount, then runs fio read
  and write jobs. Set HF_BENCH_BUCKET to reuse a pre-existing bucket.
- scripts/fs_latency_bench.sh: TTFB latency benchmark (mirrors fs_latency_bench.sh).

CI (.github/workflows/bench_perf.yml):
- Triggers on push to main (publishes to gh-pages) and PRs labeled 'performance'.
- Uses benchmark-action/github-action-benchmark for historical throughput and latency charts.
- CI uses HF_JOB_NAME_FILTER=small and iterations=3 for fast runs; remove for full 100G comparison.

doc/BENCHMARKING.md documents the workloads, how to run, and the comparison methodology.

Allows benchmarking without the on-disk xorb cache, comparable to
mountpoint-s3 without --cache. Reads go directly to CAS on each FUSE
miss (OS page cache still applies).

The FileDownloadSession API already supported None for chunk_cache;
this wires it up via a CLI flag and a HF_NO_DISK_CACHE=1 env var
in the benchmark scripts.

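
The wiring described in this commit could look roughly like the following (a sketch only: the commit does not name the CLI flag, so `--no-disk-cache` is a placeholder):

```shell
#!/bin/sh
# Illustrative only: translate HF_NO_DISK_CACHE=1 into a mount CLI flag.
# "--no-disk-cache" is a placeholder name; the commit message only says
# the env var is wired to "a CLI flag".
no_cache_flag() {
  if [ "${HF_NO_DISK_CACHE:-0}" = "1" ]; then
    printf '%s' "--no-disk-cache"
  fi
}

HF_NO_DISK_CACHE=1
flag="$(no_cache_flag)"
echo "hf-mount $flag ..."
```

When the env var is unset or `0`, the function emits nothing and the mount runs with the on-disk xorb cache enabled as usual.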
Copy mix_1r4w, mix_2r2w, mix_4r1w from mountpoint-s3 and add
run_benchmarks mix to fs_bench.sh.

Remove tests/bench.rs, tests/fio_bench.rs, and tests/common/bench.rs
(superseded by scripts/fs_bench.sh fio benchmarks). Add
scripts/posix_test.sh as a standalone pjdfstest runner with the same
exclusion logic and regression baselines as tests/pjdfstest.rs.

Remove scripts/posix_test.sh (keeping cargo test). Rename
test_pjdfstest to test_pjdfstest_fuse and add test_pjdfstest_nfs
that runs the same POSIX conformance suite over NFS mount.

Add missing fio jobs from mountpoint-s3: create (100/1k/10k/100k files)
and write_latency (1B file TTFB). Merge fs_latency_bench.sh into
fs_bench.sh. Support FUSE/NFS backend, cache/no-cache, and
advanced-writes via env vars (HF_MOUNT_BACKEND, HF_NO_DISK_CACHE,
HF_ADVANCED_WRITES). Results are stored per mode in results/<mode>/.

- ci.yml bench job: use fs_bench.sh instead of removed cargo bench tests
- bench_perf.yml: use unified script for both throughput and latency,
  fix results path (results/fuse/output.json)

Read benchmarks now write actual data through FUSE before measuring read
throughput, instead of using fio --create_only=1 which only creates
empty/sparse files. This ensures reads hit CAS and measure real download
performance.

- New run_read_category(): writes data in phase 1, reads on fresh mount in phase 2
- make_write_job(): generates a write job from a read job (strips time_based, direct, swaps rw)
- read_latency also uses real data instead of create_only

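
The `make_write_job()` transformation described above could be sketched like this (not the script's actual implementation; the sample job file and the exact option handling are illustrative):

```shell
#!/bin/sh
# Sketch of make_write_job: derive a populate/write job from a read job by
# swapping the rw mode and dropping time_based/direct, per the commit
# message. Deliberately minimal; the PR's real script may handle more cases.
make_write_job() {
  sed -e 's/^rw=read$/rw=write/' \
      -e 's/^rw=randread$/rw=randwrite/' \
      -e '/^time_based/d' \
      -e '/^direct=/d' "$1"
}

# Illustrative sample read job (not copied from mountpoint-s3's job files).
cat > /tmp/seq_read_sample.fio <<'EOF'
[sequential_read_small_file]
rw=read
direct=1
time_based
runtime=30
bs=256k
EOF

make_write_job /tmp/seq_read_sample.fio
```

The derived job writes real data through the mount in phase 1, so that the phase-2 read jobs hit CAS instead of sparse files created by `--create_only=1`.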
Every read now goes through our FUSE handler instead of being served
from the kernel page cache. This gives accurate benchmark numbers for
CAS throughput and matches mountpoint-s3's behavior.

Also fix fs_bench.sh: add --uid/--gid for sudo mounts, use
create_on_open=1 for populate, and fail on mount errors.
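
The sudo-mount fix could be sketched as follows (only the `--uid`/`--gid` flag names come from the commit message; the command shape is illustrative):

```shell
#!/bin/sh
# Pass the invoking user's uid/gid through to the mount so files remain
# accessible when the script runs under sudo, and fail fast on errors.
# Only --uid/--gid are taken from the commit; the rest is a sketch.
set -e
uid="$(id -u)"
gid="$(id -g)"
echo "would run: sudo hf-mount --uid $uid --gid $gid <bucket> <mountpoint>"
```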
XciD force-pushed the feat/fio-bench-infra branch from a199dd5 to 7b97a88 on March 12, 2026 at 14:11
XciD added the `performance` (Trigger performance benchmarks) label on Mar 12, 2026
@github-actions

Throughput Benchmark Results

Job                                                  Throughput
-------------------------------------------------    -----------
random_read_four_threads_direct_io_small_file        11787 MiB/s
random_read_four_threads_small_file                   5288 MiB/s
random_read_direct_io_small_file                      1425 MiB/s
random_read_small_file                                 726 MiB/s
sequential_read_four_threads_direct_io_small_file    13421 MiB/s
sequential_read_four_threads_small_file               9646 MiB/s
sequential_read_direct_io_small_file                  1886 MiB/s
sequential_read_small_file                            1910 MiB/s

Labels

performance Trigger performance benchmarks