feat(vmm): replace ProcessMonitor polling with pidfd/kqueue event-driven detection#407
feat(vmm): replace ProcessMonitor polling with pidfd/kqueue event-driven detection#407lilongen wants to merge 2 commits intoboxlite-ai:mainfrom
Conversation
The FD cleanup in pre_exec used hardcoded upper bounds (1024 on Linux, 4096 on macOS). On systems with raised ulimit -n, FDs above these limits leaked into jailed processes, potentially exposing credentials, database connections, or network sockets. Replace with a 3-strategy cascade (Linux): 1. close_range(first_fd, ~0U, 0) — O(1), Linux 5.9+ 2. /proc/self/fd enumeration via raw getdents64 — no heap allocation 3. Brute-force close with dynamic limit from getrlimit(RLIMIT_NOFILE) macOS uses brute-force with dynamic getrlimit limit (replaces hardcoded 4096). All operations remain async-signal-safe for the pre_exec context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ven detection ProcessMonitor::wait_for_exit() used a 500ms sleep-based polling loop (tokio::time::sleep + try_wait), violating the project's Rule boxlite-ai#15: "No Sleep for Events." This added up to 500ms latency to VM crash detection during startup (used in guest_connect.rs select! race). Replace with platform-native event-driven mechanisms: - Linux: pidfd_open() (kernel 5.3+) + tokio AsyncFd - macOS: kqueue + EVFILT_PROC + NOTE_EXIT + tokio AsyncFd - Fallback: 100ms polling for older kernels (< 5.3) Key design decisions: - OwnedFd wraps raw FDs immediately after creation (leak-free by construction) - fcntl O_NONBLOCK checked; falls back to polling on failure - Block scope for kevent struct (contains *mut c_void, not Send) - Best-effort race guard via is_alive() after FD setup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR removes the sleep-based polling latency from ProcessMonitor::wait_for_exit() by using event-driven process-exit notifications on Linux/macOS, while retaining a polling fallback path. It also expands jailer FD-cleanup logic to use more efficient platform mechanisms before falling back to brute-force closing.
Changes:
- Implement event-driven process exit detection via Linux
pidfd_open()and macOSkqueue/EVFILT_PROC, with a 100ms polling fallback. - Add async tests covering exit-code capture, delayed exit detection, and non-existent PID behavior.
- Update jailer FD closing strategy to prefer
close_range, then/proc/self/fdenumeration viagetdents64, then a dynamic brute-force bound viagetrlimit.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| boxlite/src/util/process.rs | Replaces polling wait_for_exit() with pidfd/kqueue event-driven waiting plus fallback and adds tests. |
| boxlite/src/jailer/common/fd.rs | Reworks inherited-FD cleanup to prefer close_range and /proc enumeration, with an RLIMIT-based brute-force fallback and new tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
| }; | ||
|
|
||
| if nread <= 0 { |
There was a problem hiding this comment.
getdents64 errors are currently treated the same as EOF (if nread <= 0 { break; }), and the function still returns true. If nread < 0 (e.g., EINTR), this will stop enumeration early and skip the brute-force fallback, potentially leaving inherited FDs open. Handle nread < 0 as a failure (close dir_fd and return false).
| if nread <= 0 { | |
| if nread < 0 { | |
| // Error while enumerating /proc/self/fd: treat as failure so that | |
| // callers can fall back to a brute-force close strategy. | |
| // SAFETY: close() is async-signal-safe. dir_fd was opened by us above. | |
| unsafe { libc::close(dir_fd) }; | |
| return false; | |
| } | |
| if nread == 0 { | |
| // EOF reached: enumeration completed successfully. |
| fn brute_force_close_fds(first_fd: i32) { | ||
| let max_fd = get_max_fd(); | ||
| for fd in first_fd..max_fd { | ||
| // SAFETY: close() is async-signal-safe. Closing a non-open FD returns EBADF (ignored). | ||
| unsafe { libc::close(fd) }; |
There was a problem hiding this comment.
brute_force_close_fds() relies on get_max_fd() (soft RLIMIT_NOFILE) as an upper bound. If the process opened high-numbered FDs and then lowered its soft limit before fork, rlim_cur can be below the highest open FD, so this loop may miss closing some inherited FDs in the last-resort path. Consider using a more conservative bound (e.g., max(rlim_cur, rlim_max) and/or a minimum floor matching the old hardcoded limits) to avoid FD leaks when falling back to brute force.
| //! ## Strategy (in order of preference) | ||
| //! | ||
| //! 1. **Linux 5.9+**: `close_range(first_fd, ~0U, 0)` — O(1), kernel closes all. | ||
| //! 2. **Linux < 5.9**: Enumerate `/proc/self/fd` via raw `getdents64` syscall | ||
| //! (no memory allocation) and close only open FDs. Falls back to brute-force | ||
| //! with a dynamic upper bound from `getrlimit(RLIMIT_NOFILE)`. |
There was a problem hiding this comment.
The PR description focuses on ProcessMonitor::wait_for_exit() (pidfd/kqueue), but this file introduces a substantial new FD-closing strategy (close_range + /proc getdents64 + rlimit). If this is intentional, it should be called out in the PR description (or split into a separate PR) so reviewers understand the added scope and security impact.
| if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 { | ||
| return None; // OwnedFd closes the FD on drop | ||
| } |
There was a problem hiding this comment.
fcntl(F_SETFL, O_NONBLOCK) overwrites the FD’s existing status flags. Even if pidfds typically start with 0 flags, it’s safer to preserve existing flags by reading with F_GETFL and OR-ing O_NONBLOCK; otherwise this could unexpectedly clear other status flags if they exist in future kernels/libc behavior.
| if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 { | |
| return None; // OwnedFd closes the FD on drop | |
| } | |
| let mut flags = libc::fcntl(owned.as_raw_fd(), libc::F_GETFL); | |
| if flags < 0 { | |
| return None; // OwnedFd closes the FD on drop | |
| } | |
| flags |= libc::O_NONBLOCK; | |
| if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, flags) < 0 { | |
| return None; // OwnedFd closes the FD on drop | |
| } |
| if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 { | ||
| return None; // OwnedFd closes the kqueue FD on drop | ||
| } |
There was a problem hiding this comment.
Same issue here: fcntl(F_SETFL, O_NONBLOCK) replaces the current file status flags rather than adding non-blocking mode. Prefer F_GETFL + F_SETFL(old | O_NONBLOCK) to avoid clearing any existing flags on the kqueue FD.
| if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 { | |
| return None; // OwnedFd closes the kqueue FD on drop | |
| } | |
| let flags = libc::fcntl(owned.as_raw_fd(), libc::F_GETFL); | |
| if flags < 0 { | |
| return None; // OwnedFd closes the kqueue FD on drop | |
| } | |
| if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, flags | libc::O_NONBLOCK) < 0 { | |
| return None; // OwnedFd closes the kqueue FD on drop | |
| } |
Summary
ProcessMonitor::wait_for_exit()500ms sleep-based polling loop with platform-native event-driven process exit detectionpidfd_open()(kernel 5.3+) wrapped intokio::io::unix::AsyncFdfor near-zero latencykqueue+EVFILT_PROC+NOTE_EXITwrapped inAsyncFdMotivation
ProcessMonitor::wait_for_exit()is used in production atguest_connect.rsinside atokio::select!race to detect VM subprocess death during startup. The 500ms polling loop added up to 500ms latency to crash detection. Withpidfd/kqueue, detection is near-instantaneous.Design
Key safety decisions:
OwnedFdwraps raw FDs immediately after creation — all error paths are leak-free by constructionfcntl(O_NONBLOCK)return checked; falls back to polling on failure (required byAsyncFd)keventstruct (contains*mut c_void, notSend) — dropped before.awaitis_alive()after FD setup, before.awaitlibc+tokio(already inCargo.toml)Test plan
test_wait_for_exit_captures_exit_code— verifies exit code through event-driven pathtest_wait_for_exit_detects_delayed_exit— spawns child with 200ms delay, verifies prompt detectiontest_wait_for_exit_already_dead_pid— non-existent PID returnsUnknownimmediatelycargo fmt --checkclean🤖 Generated with Claude Code