
fix: prevent deadlock when shutdown is called concurrently#89

Merged
XciD merged 2 commits into main from fix/signal-race-data-loss
Apr 1, 2026

Conversation

Member

@XciD XciD commented Apr 1, 2026

Summary

Fix a deadlock where destroy() and the signal handler both call FlushManager::shutdown() concurrently, causing the process to hang forever and dirty files to never be uploaded.

Root cause

When a CSI-managed pod terminates, the CSI driver calls fuseUnmount(source) (triggering destroy() -> flush) and then Delete(pod) (sending SIGTERM) nearly simultaneously. The signal handler then calls shutdown(), which deadlocks on self.handle.lock(): destroy()'s shutdown() is still holding the MutexGuard across runtime.block_on(handle), because the temporary guard created in the if let scrutinee stays alive for the whole if let body.

Production logs show 53ms between destroy starting flush and SIGTERM arriving. Process hung indefinitely with 2 dirty files never uploaded.

Fix

  1. Single-shot shutdown via AtomicBool: the first caller runs the flush; concurrent callers return immediately
  2. Drop the MutexGuard before blocking: take the handle out of the lock and drop the guard before block_on (let handle = lock.take(); drop(lock); block_on(handle)), instead of if let Some(h) = lock.take() { block_on(h) }, which holds the guard across the blocking wait
  3. The signal handler returns instead of calling process::exit(1) when unmount fails, letting the normal shutdown path complete
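Fixes 1 and 2 can be sketched as follows. This is a minimal model, not the actual implementation: a std::thread::JoinHandle stands in for the real tokio flush task, and FlushManager's field names here are assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::thread::JoinHandle;

struct FlushManager {
    shutting_down: AtomicBool,
    handle: Mutex<Option<JoinHandle<()>>>,
}

impl FlushManager {
    fn shutdown(&self) {
        // Fix 1: single-shot. swap() returns the previous value, so only
        // the first caller observes `false` and runs the flush; concurrent
        // callers return immediately.
        if self.shutting_down.swap(true, Ordering::SeqCst) {
            return;
        }

        // Fix 2: take the handle out and release the lock *before*
        // blocking, so no caller ever waits on a guard that is held
        // across a long-running join.
        let handle = {
            let mut lock = self.handle.lock().unwrap();
            lock.take()
        }; // MutexGuard dropped here

        if let Some(h) = handle {
            h.join().unwrap(); // stands in for runtime.block_on(handle)
        }
    }
}
```

Note that with a plain AtomicBool the losing caller returns before the flush finishes; the Once-based variant in the final commit makes it wait instead.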

Tests

215 unit tests pass. The deadlock was timing-dependent (requires SIGTERM during in-flight flush) and was observed in CI but not reproducible on EC2.

@XciD XciD force-pushed the fix/signal-race-data-loss branch from e060168 to 2a3a1c6 on April 1, 2026 16:42
@XciD XciD force-pushed the fix/signal-race-data-loss branch from 2a3a1c6 to b826b87 on April 1, 2026 17:41
@XciD XciD changed the title from "fix: prevent data loss when signal races with destroy() flush" to "fix: prevent deadlock when shutdown is called concurrently" on Apr 1, 2026
@XciD XciD force-pushed the fix/signal-race-data-loss branch 3 times, most recently from e01c73f to f917625 on April 1, 2026 17:57
@github-actions

github-actions bot commented Apr 1, 2026

POSIX Compliance (pjdfstest)

============================================================
  pjdfstest POSIX Compliance Results
------------------------------------------------------------
  Files: 130/130 passed    Tests: 832 total (0 subtests failed)
  Result: PASS
------------------------------------------------------------
  Category               Passed    Total   Status
  -------------------- -------- -------- --------
  chflags                     5        5       OK
  chmod                       8        8       OK
  chown                       6        6       OK
  ftruncate                  13       13       OK
  granular                    5        5       OK
  mkdir                       9        9       OK
  open                       19       19       OK
  posix_fallocate             1        1       OK
  rename                     10       10       OK
  rmdir                      11       11       OK
  symlink                    10       10       OK
  truncate                   13       13       OK
  unlink                     11       11       OK
  utimensat                   9        9       OK
============================================================

Two fixes for a deadlock where destroy() and the signal handler both
call FlushManager::shutdown() concurrently:

1. Make shutdown() single-shot with std::sync::Once. The first caller
   runs the full flush. Concurrent callers block until it completes,
   then return without re-running.

2. In the signal handler, return instead of process::exit(1) when
   unmount_fuse() fails, since this means destroy() is already
   handling the flush via the normal shutdown path.

Root cause: the CSI driver does fuseUnmount (triggering destroy/flush)
then Delete(pod) (sending SIGTERM) nearly simultaneously. The signal
handler's shutdown() call deadlocked on the handle mutex held by
destroy()'s in-flight flush (MutexGuard stayed alive across block_on
inside the if-let body). With Once, the second caller simply waits for
the first to finish.
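The Once-based behavior described above can be sketched like this (the flush body is replaced by a counter so the single-shot behavior is observable; the statics and names are illustrative, not the real code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static FLUSH_RUNS: AtomicUsize = AtomicUsize::new(0);
static SHUTDOWN: Once = Once::new();

fn shutdown() {
    // call_once: the first caller runs the closure; a concurrent caller
    // blocks until it completes, then returns without re-running it.
    SHUTDOWN.call_once(|| {
        // the full flush would run here
        FLUSH_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}
```

No matter how many threads race into shutdown(), the flush body executes exactly once, and every caller returns only after it has completed.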
@XciD XciD force-pushed the fix/signal-race-data-loss branch from f917625 to 6c2ad07 on April 1, 2026 18:01
@github-actions

github-actions bot commented Apr 1, 2026

Benchmark Results

============================================================
  Benchmark — 50MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                    197.8 MB/s     259.2 MB/s
  Sequential re-read                1529.3 MB/s    2349.1 MB/s
  Range read (1MB@25MB)               32.1 ms         0.2 ms
  Random reads (100x4KB avg)          34.5 ms         0.0 ms
  Sequential write (FUSE)           1487.2 MB/s
  Close latency (CAS+Hub)            0.139 s
  Write end-to-end                   289.7 MB/s
  Dedup write                       1745.2 MB/s
  Dedup close latency                0.077 s
  Dedup end-to-end                   472.7 MB/s
============================================================
============================================================
  Benchmark — 200MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1064.4 MB/s     927.6 MB/s
  Sequential re-read                1665.9 MB/s    2345.3 MB/s
  Range read (1MB@25MB)               33.6 ms         0.2 ms
  Random reads (100x4KB avg)          34.1 ms         0.0 ms
  Sequential write (FUSE)           1268.8 MB/s
  Close latency (CAS+Hub)            0.108 s
  Write end-to-end                   751.6 MB/s
  Dedup write                       1590.7 MB/s
  Dedup close latency                0.110 s
  Dedup end-to-end                   849.9 MB/s
============================================================
============================================================
  Benchmark — 500MB
------------------------------------------------------------
  Metric                                 FUSE          NFS
  ------------------------------ ------------ ------------
  Sequential read                   1350.2 MB/s    1455.8 MB/s
  Sequential re-read                1758.4 MB/s    2348.0 MB/s
  Range read (1MB@25MB)               31.5 ms         0.2 ms
  Random reads (100x4KB avg)          34.3 ms         0.0 ms
  Sequential write (FUSE)           1472.3 MB/s
  Close latency (CAS+Hub)            0.119 s
  Write end-to-end                  1090.2 MB/s
  Dedup write                       1474.9 MB/s
  Dedup close latency                0.124 s
  Dedup end-to-end                  1080.3 MB/s
============================================================
============================================================
  fio Benchmark Results
------------------------------------------------------------
  Job                        FUSE MB/s   NFS MB/s  FUSE IOPS   NFS IOPS
  ------------------------- ---------- ---------- ---------- ----------
  seq-read-100M                  456.6      473.9                      
  seq-reread-100M               2272.7       11.1                      
  rand-read-4k-100M                0.1        0.1         19         19
  seq-read-5x10M                 769.2      892.9                      
  rand-read-10x1M                  0.1        0.1         35         36
  Random Read Latency           FUSE avg      NFS avg
  ------------------------- ------------ ------------
  rand-read-4k-100M           53690.7 us   53240.2 us
  rand-read-10x1M             28698.0 us   27700.8 us
============================================================

Spawns two threads calling VirtualFs::shutdown() simultaneously
(simulating destroy() + signal handler race). Asserts both complete
within 10s. Without the Once fix, this would deadlock on the
FlushManager handle mutex.
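A regression test of this shape can be sketched as follows (a sketch, not the actual test: VirtualFs and FlushManager are not shown, and a Once-guarded shutdown() stands in for VirtualFs::shutdown()):

```rust
use std::sync::mpsc;
use std::sync::Once;
use std::thread;
use std::time::Duration;

static SHUTDOWN: Once = Once::new();

// Stand-in for VirtualFs::shutdown(): the first caller "flushes";
// a concurrent caller blocks inside call_once until it is done.
fn shutdown() {
    SHUTDOWN.call_once(|| thread::sleep(Duration::from_millis(50)));
}

// Spawns two concurrent shutdown() calls (destroy() + signal handler race)
// and returns true if both finish within the budget.
fn both_shutdowns_complete(budget: Duration) -> bool {
    let (tx, rx) = mpsc::channel();
    for _ in 0..2 {
        let tx = tx.clone();
        thread::spawn(move || {
            shutdown();
            tx.send(()).unwrap();
        });
    }
    // A deadlock on the handle mutex would make recv_timeout time out here.
    (0..2).all(|_| rx.recv_timeout(budget).is_ok())
}
```

Using a channel with recv_timeout rather than joining the threads keeps the test itself from hanging if the deadlock regresses.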
@XciD XciD marked this pull request as ready for review April 1, 2026 18:14
@XciD XciD merged commit 8af7656 into main Apr 1, 2026
1 check passed
@XciD XciD deleted the fix/signal-race-data-loss branch April 1, 2026 18:14