Skip to content

fix(jailer): replace hardcoded FD limits with dynamic closure strategies#406

Open
lilongen wants to merge 1 commit intoboxlite-ai:mainfrom
lilongen:fix/jailer-dynamic-fd-closure
Open

fix(jailer): replace hardcoded FD limits with dynamic closure strategies#406
lilongen wants to merge 1 commit intoboxlite-ai:mainfrom
lilongen:fix/jailer-dynamic-fd-closure

Conversation

@lilongen
Copy link
Copy Markdown

Summary

  • Replace hardcoded FD upper bounds (1024 on Linux, 4096 on macOS) with dynamic limits from getrlimit(RLIMIT_NOFILE), fixing FD leakage on systems with raised ulimit -n
  • Add /proc/self/fd enumeration via raw getdents64 syscall as an efficient middle strategy on Linux (zero heap allocation, async-signal-safe)
  • Restructure close_fds_from() into a 3-strategy cascade: close_range/proc/self/fd → brute-force with dynamic limit

Problem

The FD cleanup in the pre_exec hook used hardcoded upper bounds:

  • Linux: for fd in first_fd..1024
  • macOS: for fd in first_fd..4096

On production systems with ulimit -n 65536 or higher, any FDs above these limits leaked into the jailed process, potentially exposing credentials, database connections, or network sockets inherited from the parent.

Solution

Linux — 3-strategy cascade

Strategy Condition Complexity Description
1. close_range Linux 5.9+ O(1) Kernel closes all FDs in range (already existed)
2. /proc/self/fd /proc mounted O(open FDs) Enumerate via raw getdents64, close only open FDs
3. Brute-force Fallback O(ulimit) Close first_fd..getrlimit(RLIMIT_NOFILE)

macOS

Brute-force close with dynamic limit from getrlimit(RLIMIT_NOFILE) (replaces hardcoded 4096).

Key constraints

All operations are async-signal-safe (required for pre_exec context):

  • No heap allocation — stack buffer for getdents64, byte literals for paths
  • No io::Error, String, Vec, CString — raw syscalls only
  • getrlimit is POSIX async-signal-safe
  • RLIM_INFINITY handled by capping to 1,048,576 (Linux nr_open default)

Test plan

  • test_close_high_numbered_fd — creates FD 2000 (above old limits), verifies closure
  • test_get_max_fd_returns_positive — validates dynamic limit query
  • test_parse_fd_from_name (Linux) — covers valid FDs, ./.., empty, i32 overflow
  • All 4 existing tests continue to pass
  • cargo clippy -D warnings — zero warnings on both platforms
  • cargo fmt --check — clean
  • Tested on macOS ARM64 (6/6 passed) and Linux ARM64 via Lima (7/7 passed)

🤖 Generated with Claude Code

The FD cleanup in pre_exec used hardcoded upper bounds (1024 on Linux,
4096 on macOS). On systems with raised ulimit -n, FDs above these limits
leaked into jailed processes, potentially exposing credentials, database
connections, or network sockets.

Replace with a 3-strategy cascade (Linux):
1. close_range(first_fd, ~0U, 0) — O(1), Linux 5.9+
2. /proc/self/fd enumeration via raw getdents64 — no heap allocation
3. Brute-force close with dynamic limit from getrlimit(RLIMIT_NOFILE)

macOS uses brute-force with dynamic getrlimit limit (replaces hardcoded 4096).

All operations remain async-signal-safe for the pre_exec context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 03:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the jailer’s pre-exec FD cleanup to avoid leaking inherited file descriptors on hosts with high ulimit -n, while maintaining the async-signal-safe constraint required by CommandExt::pre_exec.

Changes:

  • Replace hardcoded FD upper bounds with a dynamic limit from getrlimit(RLIMIT_NOFILE).
  • Add a Linux-only middle strategy that enumerates /proc/self/fd via raw getdents64 to close only actually-open FDs.
  • Add tests covering high-numbered FD closure and Linux FD-name parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +71 to +95
loop {
let nread = unsafe {
libc::syscall(
libc::SYS_getdents64,
dir_fd,
buf.as_mut_ptr(),
buf.len() as libc::c_uint,
)
};

if nread <= 0 {
break;
}

let mut offset = 0usize;
while offset < nread as usize {
// SAFETY: getdents64 returns packed linux_dirent64 structs.
// d_reclen is at byte offset 16. Use read_unaligned because the
// buffer is a byte array and the u16 field may not be 2-byte aligned.
let d_reclen =
unsafe { buf.as_ptr().add(offset + 16).cast::<u16>().read_unaligned() } as usize;

if d_reclen == 0 || offset + d_reclen > nread as usize {
break;
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close_fds_via_proc currently returns true even if getdents64 fails (e.g., syscall returns -1 for EINTR/EBADF). That would skip the brute-force fallback and can leave FDs open. Consider treating nread < 0 (and also malformed d_reclen conditions) as a failure: close dir_fd and return false so brute_force_close_fds() runs.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +40
if result == 0 && rlim.rlim_cur > 0 {
if rlim.rlim_cur < i32::MAX as u64 {
rlim.rlim_cur as i32
} else {
// rlim_cur is RLIM_INFINITY or very large; cap to a safe maximum.
// 1048576 (2^20) matches Linux's default /proc/sys/fs/nr_open.
1_048_576
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_max_fd() uses if rlim.rlim_cur < i32::MAX as u64 which incorrectly treats an exact i32::MAX soft limit as “too large” and caps it to 1,048,576. This looks like an off-by-one comparison; use <= if the intent is “fits in i32”.

Copilot uses AI. Check for mistakes.
Comment on lines +42 to +43
// getrlimit failed or returned 0; safe default per POSIX OPEN_MAX
1024
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If getrlimit(RLIMIT_NOFILE) fails (or returns 0), get_max_fd() falls back to 1024, which can reintroduce the original FD-leak problem on hosts with a higher limit (since brute-force will stop at 1024). Consider a fallback that still errs on the side of closing more FDs (e.g., a higher cap like the existing 1,048,576) to preserve the security invariant even when getrlimit is blocked or fails.

Suggested change
// getrlimit failed or returned 0; safe default per POSIX OPEN_MAX
1024
// getrlimit failed or returned 0; use the same conservative cap to
// ensure we still attempt to close a wide range of FDs.
1_048_576

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants