Skip to content

feat(vmm): replace ProcessMonitor polling with pidfd/kqueue event-driven detection#407

Open
lilongen wants to merge 2 commits intoboxlite-ai:mainfrom
lilongen:feat/pidfd-kqueue-process-monitor
Open

feat(vmm): replace ProcessMonitor polling with pidfd/kqueue event-driven detection#407
lilongen wants to merge 2 commits intoboxlite-ai:mainfrom
lilongen:feat/pidfd-kqueue-process-monitor

Conversation

@lilongen
Copy link
Copy Markdown

Summary

  • Replace ProcessMonitor::wait_for_exit() 500ms sleep-based polling loop with platform-native event-driven process exit detection
  • Linux: pidfd_open() (kernel 5.3+) wrapped in tokio::io::unix::AsyncFd for near-zero latency
  • macOS: kqueue + EVFILT_PROC + NOTE_EXIT wrapped in AsyncFd
  • Fallback: 100ms polling for older kernels (Linux < 5.3) when event-driven setup fails
  • Fixes violation of project Rule Fix silent initialization failures #15: "No Sleep for Events"

Motivation

ProcessMonitor::wait_for_exit() is used in production at guest_connect.rs inside a tokio::select! race to detect VM subprocess death during startup. The 500ms polling loop added up to 500ms latency to crash detection. With pidfd/kqueue, detection is near-instantaneous.

Design

wait_for_exit()
  ├── Fast path: try_wait() → already dead? return immediately
  ├── Linux: pidfd_open() + AsyncFd::readable().await
  ├── macOS: kqueue(EVFILT_PROC, NOTE_EXIT) + AsyncFd::readable().await
  └── Fallback: 100ms polling loop (for kernel < 5.3)

Key safety decisions:

  • OwnedFd wraps raw FDs immediately after creation — all error paths are leak-free by construction
  • fcntl(O_NONBLOCK) return checked; falls back to polling on failure (required by AsyncFd)
  • Block scope for kevent struct (contains *mut c_void, not Send) — dropped before .await
  • Best-effort race guard via is_alive() after FD setup, before .await
  • No new crate dependencies — uses existing libc + tokio (already in Cargo.toml)

Test plan

  • New async test: test_wait_for_exit_captures_exit_code — verifies exit code through event-driven path
  • New async test: test_wait_for_exit_detects_delayed_exit — spawns child with 200ms delay, verifies prompt detection
  • New async test: test_wait_for_exit_already_dead_pid — non-existent PID returns Unknown immediately
  • All 598 existing tests pass on macOS ARM64
  • All process tests (31/31) pass on Linux ARM64 (via Lima VM)
  • Clippy clean (zero warnings) on both platforms
  • cargo fmt --check clean

🤖 Generated with Claude Code

lile and others added 2 commits March 26, 2026 10:55
The FD cleanup in pre_exec used hardcoded upper bounds (1024 on Linux,
4096 on macOS). On systems with raised ulimit -n, FDs above these limits
leaked into jailed processes, potentially exposing credentials, database
connections, or network sockets.

Replace with a 3-strategy cascade (Linux):
1. close_range(first_fd, ~0U, 0) — O(1), Linux 5.9+
2. /proc/self/fd enumeration via raw getdents64 — no heap allocation
3. Brute-force close with dynamic limit from getrlimit(RLIMIT_NOFILE)

macOS uses brute-force with dynamic getrlimit limit (replaces hardcoded 4096).

All operations remain async-signal-safe for the pre_exec context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ven detection

ProcessMonitor::wait_for_exit() used a 500ms sleep-based polling loop
(tokio::time::sleep + try_wait), violating the project's Rule boxlite-ai#15:
"No Sleep for Events." This added up to 500ms latency to VM crash
detection during startup (used in guest_connect.rs select! race).

Replace with platform-native event-driven mechanisms:
- Linux: pidfd_open() (kernel 5.3+) + tokio AsyncFd
- macOS: kqueue + EVFILT_PROC + NOTE_EXIT + tokio AsyncFd
- Fallback: 100ms polling for older kernels (< 5.3)

Key design decisions:
- OwnedFd wraps raw FDs immediately after creation (leak-free by construction)
- fcntl O_NONBLOCK checked; falls back to polling on failure
- Block scope for kevent struct (contains *mut c_void, not Send)
- Best-effort race guard via is_alive() after FD setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 06:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the sleep-based polling latency from ProcessMonitor::wait_for_exit() by using event-driven process-exit notifications on Linux/macOS, while retaining a polling fallback path. It also expands jailer FD-cleanup logic to use more efficient platform mechanisms before falling back to brute-force closing.

Changes:

  • Implement event-driven process exit detection via Linux pidfd_open() and macOS kqueue/EVFILT_PROC, with a 100ms polling fallback.
  • Add async tests covering exit-code capture, delayed exit detection, and non-existent PID behavior.
  • Update jailer FD closing strategy to prefer close_range, then /proc/self/fd enumeration via getdents64, then a dynamic brute-force bound via getrlimit.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
boxlite/src/util/process.rs Replaces polling wait_for_exit() with pidfd/kqueue event-driven waiting plus fallback and adds tests.
boxlite/src/jailer/common/fd.rs Reworks inherited-FD cleanup to prefer close_range and /proc enumeration, with an RLIMIT-based brute-force fallback and new tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

)
};

if nread <= 0 {
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getdents64 errors are currently treated the same as EOF (if nread <= 0 { break; }), and the function still returns true. If nread < 0 (e.g., EINTR), this will stop enumeration early and skip the brute-force fallback, potentially leaving inherited FDs open. Handle nread < 0 as a failure (close dir_fd and return false).

Suggested change
if nread <= 0 {
if nread < 0 {
// Error while enumerating /proc/self/fd: treat as failure so that
// callers can fall back to a brute-force close strategy.
// SAFETY: close() is async-signal-safe. dir_fd was opened by us above.
unsafe { libc::close(dir_fd) };
return false;
}
if nread == 0 {
// EOF reached: enumeration completed successfully.

Copilot uses AI. Check for mistakes.
Comment on lines +156 to +160
fn brute_force_close_fds(first_fd: i32) {
let max_fd = get_max_fd();
for fd in first_fd..max_fd {
// SAFETY: close() is async-signal-safe. Closing a non-open FD returns EBADF (ignored).
unsafe { libc::close(fd) };
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brute_force_close_fds() relies on get_max_fd() (soft RLIMIT_NOFILE) as an upper bound. If the process opened high-numbered FDs and then lowered its soft limit before fork, rlim_cur can be below the highest open FD, so this loop may miss closing some inherited FDs in the last-resort path. Consider using a more conservative bound (e.g., max(rlim_cur, rlim_max) and/or a minimum floor matching the old hardcoded limits) to avoid FD leaks when falling back to brute force.

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +15
//! ## Strategy (in order of preference)
//!
//! 1. **Linux 5.9+**: `close_range(first_fd, ~0U, 0)` — O(1), kernel closes all.
//! 2. **Linux < 5.9**: Enumerate `/proc/self/fd` via raw `getdents64` syscall
//! (no memory allocation) and close only open FDs. Falls back to brute-force
//! with a dynamic upper bound from `getrlimit(RLIMIT_NOFILE)`.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description focuses on ProcessMonitor::wait_for_exit() (pidfd/kqueue), but this file introduces a substantial new FD-closing strategy (close_range + /proc getdents64 + rlimit). If this is intentional, it should be called out in the PR description (or split into a separate PR) so reviewers understand the added scope and security impact.

Copilot uses AI. Check for mistakes.
Comment on lines +163 to +165
if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 {
return None; // OwnedFd closes the FD on drop
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fcntl(F_SETFL, O_NONBLOCK) overwrites the FD’s existing status flags. Even if pidfds typically start with 0 flags, it’s safer to preserve existing flags by reading with F_GETFL and OR-ing O_NONBLOCK; otherwise this could unexpectedly clear other status flags if they exist in future kernels/libc behavior.

Suggested change
if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 {
return None; // OwnedFd closes the FD on drop
}
let mut flags = libc::fcntl(owned.as_raw_fd(), libc::F_GETFL);
if flags < 0 {
return None; // OwnedFd closes the FD on drop
}
flags |= libc::O_NONBLOCK;
if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, flags) < 0 {
return None; // OwnedFd closes the FD on drop
}

Copilot uses AI. Check for mistakes.
Comment on lines +237 to +239
if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 {
return None; // OwnedFd closes the kqueue FD on drop
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: fcntl(F_SETFL, O_NONBLOCK) replaces the current file status flags rather than adding non-blocking mode. Prefer F_GETFL + F_SETFL(old | O_NONBLOCK) to avoid clearing any existing flags on the kqueue FD.

Suggested change
if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, libc::O_NONBLOCK) < 0 {
return None; // OwnedFd closes the kqueue FD on drop
}
let flags = libc::fcntl(owned.as_raw_fd(), libc::F_GETFL);
if flags < 0 {
return None; // OwnedFd closes the kqueue FD on drop
}
if libc::fcntl(owned.as_raw_fd(), libc::F_SETFL, flags | libc::O_NONBLOCK) < 0 {
return None; // OwnedFd closes the kqueue FD on drop
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants