
fix: prevent deadlock when shutdown is called concurrently#89

Merged
XciD merged 2 commits into main from fix/signal-race-data-loss
Apr 1, 2026

Conversation

Member

@XciD XciD commented Apr 1, 2026

Summary

Fix a deadlock where destroy() and the signal handler both call FlushManager::shutdown() concurrently, causing the process to hang forever and dirty files to never be uploaded.

Root cause

When a CSI-managed pod terminates, the CSI driver calls fuseUnmount(source) (triggering destroy() -> flush) and then Delete(pod) (sending SIGTERM) nearly simultaneously. The signal handler then calls shutdown(), which deadlocks on self.handle.lock(): destroy()'s shutdown() is still holding the MutexGuard across runtime.block_on(handle), because the temporary guard created in the if let scrutinee stays alive for the whole if let body.

Production logs show 53ms between destroy starting flush and SIGTERM arriving. Process hung indefinitely with 2 dirty files never uploaded.

Fix

  1. Single-shot shutdown via AtomicBool: the first caller runs the flush; concurrent callers return immediately
  2. Drop the MutexGuard before blocking: take the handle out of the lock and drop the guard before block_on (let handle = lock.take(); drop(lock); block_on(handle)), instead of if let Some(h) = lock.take() { block_on(h) }, which holds the guard across the blocking wait
  3. The signal handler returns instead of calling process::exit(1) when unmount fails, letting the normal shutdown path complete
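Fixes 1 and 2 can be sketched as follows. This is a minimal model, not the actual implementation: a std::thread::JoinHandle stands in for the real tokio flush task, and FlushManager's field names here are assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::thread::JoinHandle;

struct FlushManager {
    shutting_down: AtomicBool,
    handle: Mutex<Option<JoinHandle<()>>>,
}

impl FlushManager {
    fn shutdown(&self) {
        // Fix 1: single-shot. swap() returns the previous value, so only
        // the first caller observes `false` and runs the flush; concurrent
        // callers return immediately.
        if self.shutting_down.swap(true, Ordering::SeqCst) {
            return;
        }

        // Fix 2: take the handle out and release the lock *before*
        // blocking, so no caller ever waits on a guard that is held
        // across a long-running join.
        let handle = {
            let mut lock = self.handle.lock().unwrap();
            lock.take()
        }; // MutexGuard dropped here

        if let Some(h) = handle {
            h.join().unwrap(); // stands in for runtime.block_on(handle)
        }
    }
}
```

Note that with a plain AtomicBool the losing caller returns before the flush finishes; the Once-based variant in the final commit makes it wait instead.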

Tests

215 unit tests pass. The deadlock was timing-dependent (requires SIGTERM during in-flight flush) and was observed in CI but not reproducible on EC2.

@XciD XciD force-pushed the fix/signal-race-data-loss branch from e060168 to 2a3a1c6 on April 1, 2026 16:42
@XciD XciD force-pushed the fix/signal-race-data-loss branch from 2a3a1c6 to b826b87 on April 1, 2026 17:41
@XciD XciD changed the title from "fix: prevent data loss when signal races with destroy() flush" to "fix: prevent deadlock when shutdown is called concurrently" on Apr 1, 2026
@XciD XciD force-pushed the fix/signal-race-data-loss branch 3 times, most recently from e01c73f to f917625 on April 1, 2026 17:57
@github-actions

github-actions bot commented Apr 1, 2026

POSIX Compliance (pjdfstest)

============================================================
  pjdfstest POSIX Compliance Results
------------------------------------------------------------
  Files: 130/130 passed    Tests: 832 total (0 subtests failed)
  Result: PASS
------------------------------------------------------------
  Category               Passed    Total   Status
  -------------------- -------- -------- --------
  chflags                     5        5       OK
  chmod                       8        8       OK
  chown                       6        6       OK
  ftruncate                  13       13       OK
  granular                    5        5       OK
  mkdir                       9        9       OK
  open                       19       19       OK
  posix_fallocate             1        1       OK
  rename                     10       10       OK
  rmdir                      11       11       OK
  symlink                    10       10       OK
  truncate                   13       13       OK
  unlink                     11       11       OK
  utimensat                   9        9       OK
============================================================

Two fixes for a deadlock where destroy() and the signal handler both
call FlushManager::shutdown() concurrently:

1. Make shutdown() single-shot with std::sync::Once. The first caller
   runs the full flush. Concurrent callers block until it completes,
   then return without re-running.

2. In the signal handler, return instead of process::exit(1) when
   unmount_fuse() fails, since this means destroy() is already
   handling the flush via the normal shutdown path.

Root cause: the CSI driver does fuseUnmount (triggering destroy/flush)
then Delete(pod) (sending SIGTERM) nearly simultaneously. The signal
handler's shutdown() call deadlocked on the handle mutex held by
destroy()'s in-flight flush (MutexGuard stayed alive across block_on
inside the if-let body). With Once, the second caller simply waits for
the first to finish.
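The Once-based behavior described above can be sketched like this (the flush body is replaced by a counter so the single-shot behavior is observable; the statics and names are illustrative, not the real code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static FLUSH_RUNS: AtomicUsize = AtomicUsize::new(0);
static SHUTDOWN: Once = Once::new();

fn shutdown() {
    // call_once: the first caller runs the closure; a concurrent caller
    // blocks until it completes, then returns without re-running it.
    SHUTDOWN.call_once(|| {
        // the full flush would run here
        FLUSH_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}
```

No matter how many threads race into shutdown(), the flush body executes exactly once, and every caller returns only after it has completed.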
@XciD XciD force-pushed the fix/signal-race-data-loss branch from f917625 to 6c2ad07 on April 1, 2026 18:01
@github-actions

github-actions bot commented Apr 1, 2026

Benchmark Results

============================================================
  Benchmark — 50MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                    197.8 MB/s     259.2 MB/s
  Sequential re-read                1529.3 MB/s    2349.1 MB/s
  Range read (1MB@25MB)               32.1 ms         0.2 ms
  Random reads (100x4KB avg)          34.5 ms         0.0 ms
  Sequential write (FUSE)           1487.2 MB/s
  Close latency (CAS+Hub)            0.139 s
  Write end-to-end                   289.7 MB/s
  Dedup write                       1745.2 MB/s
  Dedup close latency                0.077 s
  Dedup end-to-end                   472.7 MB/s
============================================================
============================================================
  Benchmark — 200MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1064.4 MB/s     927.6 MB/s
  Sequential re-read                1665.9 MB/s    2345.3 MB/s
  Range read (1MB@25MB)               33.6 ms         0.2 ms
  Random reads (100x4KB avg)          34.1 ms         0.0 ms
  Sequential write (FUSE)           1268.8 MB/s
  Close latency (CAS+Hub)            0.108 s
  Write end-to-end                   751.6 MB/s
  Dedup write                       1590.7 MB/s
  Dedup close latency                0.110 s
  Dedup end-to-end                   849.9 MB/s
============================================================
============================================================
  Benchmark — 500MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1350.2 MB/s    1455.8 MB/s
  Sequential re-read                1758.4 MB/s    2348.0 MB/s
  Range read (1MB@25MB)               31.5 ms         0.2 ms
  Random reads (100x4KB avg)          34.3 ms         0.0 ms
  Sequential write (FUSE)           1472.3 MB/s
  Close latency (CAS+Hub)            0.119 s
  Write end-to-end                  1090.2 MB/s
  Dedup write                       1474.9 MB/s
  Dedup close latency                0.124 s
  Dedup end-to-end                  1080.3 MB/s
============================================================
============================================================
  fio Benchmark Results
------------------------------------------------------------
  Job                        FUSE MB/s   NFS MB/s  FUSE IOPS   NFS IOPS
  ------------------------- ---------- ---------- ---------- ----------
  seq-read-100M                  456.6      473.9                      
  seq-reread-100M               2272.7       11.1                      
  rand-read-4k-100M                0.1        0.1         19         19
  seq-read-5x10M                 769.2      892.9                      
  rand-read-10x1M                  0.1        0.1         35         36
  Random Read Latency           FUSE avg      NFS avg
  ------------------------- ------------ ------------
  rand-read-4k-100M           53690.7 us   53240.2 us
  rand-read-10x1M             28698.0 us   27700.8 us
============================================================

Spawns two threads calling VirtualFs::shutdown() simultaneously
(simulating destroy() + signal handler race). Asserts both complete
within 10s. Without the Once fix, this would deadlock on the
FlushManager handle mutex.
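A regression test of this shape can be sketched as follows (a sketch, not the actual test: VirtualFs and FlushManager are not shown, and a Once-guarded shutdown() stands in for VirtualFs::shutdown()):

```rust
use std::sync::mpsc;
use std::sync::Once;
use std::thread;
use std::time::Duration;

static SHUTDOWN: Once = Once::new();

// Stand-in for VirtualFs::shutdown(): the first caller "flushes";
// a concurrent caller blocks inside call_once until it is done.
fn shutdown() {
    SHUTDOWN.call_once(|| thread::sleep(Duration::from_millis(50)));
}

// Spawns two concurrent shutdown() calls (destroy() + signal handler race)
// and returns true if both finish within the budget.
fn both_shutdowns_complete(budget: Duration) -> bool {
    let (tx, rx) = mpsc::channel();
    for _ in 0..2 {
        let tx = tx.clone();
        thread::spawn(move || {
            shutdown();
            tx.send(()).unwrap();
        });
    }
    // A deadlock on the handle mutex would make recv_timeout time out here.
    (0..2).all(|_| rx.recv_timeout(budget).is_ok())
}
```

Using a channel with recv_timeout rather than joining the threads keeps the test itself from hanging if the deadlock regresses.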
@XciD XciD marked this pull request as ready for review April 1, 2026 18:14
@XciD XciD merged commit 8af7656 into main Apr 1, 2026
1 check passed
@XciD XciD deleted the fix/signal-race-data-loss branch April 1, 2026 18:14