fix(jailer): replace hardcoded FD limits with dynamic closure strategies#406
fix(jailer): replace hardcoded FD limits with dynamic closure strategies#406lilongen wants to merge 1 commit intoboxlite-ai:mainfrom
Conversation
The FD cleanup in pre_exec used hardcoded upper bounds (1024 on Linux, 4096 on macOS). On systems with raised ulimit -n, FDs above these limits leaked into jailed processes, potentially exposing credentials, database connections, or network sockets. Replace with a 3-strategy cascade (Linux): 1. close_range(first_fd, ~0U, 0) — O(1), Linux 5.9+ 2. /proc/self/fd enumeration via raw getdents64 — no heap allocation 3. Brute-force close with dynamic limit from getrlimit(RLIMIT_NOFILE) macOS uses brute-force with dynamic getrlimit limit (replaces hardcoded 4096). All operations remain async-signal-safe for the pre_exec context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Updates the jailer’s pre-exec FD cleanup to avoid leaking inherited file descriptors on hosts with high ulimit -n, while maintaining the async-signal-safe constraint required by CommandExt::pre_exec.
Changes:
- Replace hardcoded FD upper bounds with a dynamic limit from
getrlimit(RLIMIT_NOFILE). - Add a Linux-only middle strategy that enumerates
/proc/self/fdvia rawgetdents64to close only actually-open FDs. - Add tests covering high-numbered FD closure and Linux FD-name parsing.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| loop { | ||
| let nread = unsafe { | ||
| libc::syscall( | ||
| libc::SYS_getdents64, | ||
| dir_fd, | ||
| buf.as_mut_ptr(), | ||
| buf.len() as libc::c_uint, | ||
| ) | ||
| }; | ||
|
|
||
| if nread <= 0 { | ||
| break; | ||
| } | ||
|
|
||
| let mut offset = 0usize; | ||
| while offset < nread as usize { | ||
| // SAFETY: getdents64 returns packed linux_dirent64 structs. | ||
| // d_reclen is at byte offset 16. Use read_unaligned because the | ||
| // buffer is a byte array and the u16 field may not be 2-byte aligned. | ||
| let d_reclen = | ||
| unsafe { buf.as_ptr().add(offset + 16).cast::<u16>().read_unaligned() } as usize; | ||
|
|
||
| if d_reclen == 0 || offset + d_reclen > nread as usize { | ||
| break; | ||
| } |
There was a problem hiding this comment.
close_fds_via_proc currently returns true even if getdents64 fails (e.g., syscall returns -1 for EINTR/EBADF). That would skip the brute-force fallback and can leave FDs open. Consider treating nread < 0 (and also malformed d_reclen conditions) as a failure: close dir_fd and return false so brute_force_close_fds() runs.
| if result == 0 && rlim.rlim_cur > 0 { | ||
| if rlim.rlim_cur < i32::MAX as u64 { | ||
| rlim.rlim_cur as i32 | ||
| } else { | ||
| // rlim_cur is RLIM_INFINITY or very large; cap to a safe maximum. | ||
| // 1048576 (2^20) matches Linux's default /proc/sys/fs/nr_open. | ||
| 1_048_576 | ||
| } |
There was a problem hiding this comment.
get_max_fd() uses if rlim.rlim_cur < i32::MAX as u64 which incorrectly treats an exact i32::MAX soft limit as “too large” and caps it to 1,048,576. This looks like an off-by-one comparison; use <= if the intent is “fits in i32”.
| // getrlimit failed or returned 0; safe default per POSIX OPEN_MAX | ||
| 1024 |
There was a problem hiding this comment.
If getrlimit(RLIMIT_NOFILE) fails (or returns 0), get_max_fd() falls back to 1024, which can reintroduce the original FD-leak problem on hosts with a higher limit (since brute-force will stop at 1024). Consider a fallback that still errs on the side of closing more FDs (e.g., a higher cap like the existing 1,048,576) to preserve the security invariant even when getrlimit is blocked or fails.
| // getrlimit failed or returned 0; safe default per POSIX OPEN_MAX | |
| 1024 | |
| // getrlimit failed or returned 0; use the same conservative cap to | |
| // ensure we still attempt to close a wide range of FDs. | |
| 1_048_576 |
Summary
getrlimit(RLIMIT_NOFILE), fixing FD leakage on systems with raisedulimit -n/proc/self/fdenumeration via rawgetdents64syscall as an efficient middle strategy on Linux (zero heap allocation, async-signal-safe)close_fds_from()into a 3-strategy cascade:close_range→/proc/self/fd→ brute-force with dynamic limitProblem
The FD cleanup in the
pre_exechook used hardcoded upper bounds:for fd in first_fd..1024for fd in first_fd..4096On production systems with
ulimit -n 65536or higher, any FDs above these limits leaked into the jailed process, potentially exposing credentials, database connections, or network sockets inherited from the parent.Solution
Linux — 3-strategy cascade
close_range/proc/self/fd/procmountedgetdents64, close only open FDsfirst_fd..getrlimit(RLIMIT_NOFILE)macOS
Brute-force close with dynamic limit from
getrlimit(RLIMIT_NOFILE)(replaces hardcoded 4096).Key constraints
All operations are async-signal-safe (required for
pre_execcontext):getdents64, byte literals for pathsio::Error,String,Vec,CString— raw syscalls onlygetrlimitis POSIX async-signal-safeRLIM_INFINITYhandled by capping to 1,048,576 (Linuxnr_opendefault)Test plan
test_close_high_numbered_fd— creates FD 2000 (above old limits), verifies closuretest_get_max_fd_returns_positive— validates dynamic limit querytest_parse_fd_from_name(Linux) — covers valid FDs,./.., empty, i32 overflowcargo clippy -D warnings— zero warnings on both platformscargo fmt --check— clean🤖 Generated with Claude Code