fix: IPC/ptrace hardening — fix 30s exec hang + related races#215
Merged
Create the socketpair with SOCK_CLOEXEC so that close-on-exec is set atomically on the parent fd, eliminating a TOCTOU race window in multi-threaded Go processes where the fd could leak into a concurrently forking child. Matches the CLI layer pattern (internal/cli/wrap_linux.go:93). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
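For reviewers unfamiliar with the pattern, here is a minimal Linux-only sketch of creating a socketpair with close-on-exec set at creation time. It uses the standard `syscall` package rather than the project's own imports, and the helper names `newCloexecSocketpair` and `isCloexec` are hypothetical, not taken from the PR.

```go
package main

import (
	"fmt"
	"syscall"
)

// newCloexecSocketpair (hypothetical helper) returns a connected AF_UNIX
// stream pair. Passing SOCK_CLOEXEC in the type argument makes the kernel set
// close-on-exec as part of socketpair(2), so there is no window between
// creation and a later fcntl(F_SETFD) in which another thread could fork.
func newCloexecSocketpair() (parent, child int, err error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return -1, -1, err
	}
	return fds[0], fds[1], nil
}

// isCloexec (hypothetical helper) reports whether FD_CLOEXEC is set on fd.
func isCloexec(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.FD_CLOEXEC != 0, nil
}

func main() {
	parent, child, err := newCloexecSocketpair()
	if err != nil {
		fmt.Println("socketpair:", err)
		return
	}
	defer syscall.Close(parent)
	defer syscall.Close(child)

	ok, err := isCloexec(parent)
	fmt.Println("parent fd close-on-exec:", ok, err)
}
```

Note that `SOCK_CLOEXEC` applies to both ends of the pair, so an end that is meant to be inherited by the child has to be handed over explicitly (for example through `exec.Cmd.ExtraFiles`, which duplicates the descriptor).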
Treat ECHILD (already reaped), exited/signaled status, and ESRCH on detach as success with debug logging. Previously these races surfaced as exit_code=127 errors to callers (exec.go, exec_stream.go).
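The classification described above can be sketched roughly as follows. This is not the PR's `resumeTracedProcess`; the type `waitOutcome` and the helpers `tolerantWait` and `tolerantDetach` are hypothetical names, and the debug logging is reduced to `log.Printf` for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"syscall"
)

// waitOutcome is a hypothetical result type for this sketch.
type waitOutcome int

const (
	stillTraced waitOutcome = iota // tracee is stopped and can be detached
	alreadyGone                    // reaped elsewhere, exited, or killed by a signal
)

// tolerantWait treats "the process is already gone" as success, not an error.
func tolerantWait(pid int) (waitOutcome, error) {
	var ws syscall.WaitStatus
	for {
		_, err := syscall.Wait4(pid, &ws, syscall.WALL, nil)
		if errors.Is(err, syscall.EINTR) {
			continue // interrupted by a signal; retry
		}
		if errors.Is(err, syscall.ECHILD) {
			log.Printf("debug: pid %d already reaped (ECHILD)", pid)
			return alreadyGone, nil
		}
		if err != nil {
			return stillTraced, err
		}
		break
	}
	if ws.Exited() || ws.Signaled() {
		log.Printf("debug: pid %d exited or was signaled before resume", pid)
		return alreadyGone, nil
	}
	return stillTraced, nil
}

// tolerantDetach treats ESRCH (no such tracee) as success.
func tolerantDetach(pid int) error {
	err := syscall.PtraceDetach(pid)
	if errors.Is(err, syscall.ESRCH) {
		log.Printf("debug: pid %d vanished before detach (ESRCH)", pid)
		return nil
	}
	return err
}

func main() {
	// A process is never its own child, so waiting on our own pid yields
	// ECHILD, which the helper maps to alreadyGone with a nil error.
	outcome, err := tolerantWait(os.Getpid())
	fmt.Println(outcome == alreadyGone, err)
}
```

The point of returning a distinct outcome instead of an error is that callers such as exec.go no longer have to translate a lost race into exit_code=127.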
SetReadDeadline fails on os.NewFile-wrapped socketpair fds (not registered with Go's netpoll). The notify handler already handles this correctly (notify_linux.go:218-221). Match that pattern so signal monitoring is not silently disabled.
unixmon.RecvFD calls recvmsg on the raw fd, bypassing Go's netpoll, so SetReadDeadline was a no-op — it never actually bounded the receive. Replace with SO_RCVTIMEO, a kernel-level timeout that works with raw blocking recvmsg. Matches the wrapper pattern at cmd/agentsh-unixwrap/main.go:163. Fixes a Medium-severity roborev finding on 0e0a6b6 where the signal handler could block indefinitely on RecvFD if the wrapper stalled. The same issue exists in the notify handler (pre-existing); both are fixed here. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
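To illustrate why `SO_RCVTIMEO` works where `SetReadDeadline` does not, the following sketch sets the socket option on a raw socketpair fd and then calls `recvmsg` directly, without going through Go's netpoll. It uses the standard `syscall` package; `setRecvTimeout`, `recvRaw`, and `recvTimesOut` are hypothetical helpers and do not reproduce `unixmon.RecvFD`.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// setRecvTimeout installs a kernel-level receive timeout on fd.
// A zero duration clears it, so blocking receives wait indefinitely again.
func setRecvTimeout(fd int, d time.Duration) error {
	tv := syscall.NsecToTimeval(d.Nanoseconds())
	return syscall.SetsockoptTimeval(fd, syscall.SOL_SOCKET, syscall.SO_RCVTIMEO, &tv)
}

// recvRaw performs a blocking recvmsg on the raw fd, retrying on EINTR.
// Each retry restarts the timeout, which is acceptable for a sketch.
func recvRaw(fd int, buf []byte) (int, error) {
	for {
		n, _, _, _, err := syscall.Recvmsg(fd, buf, nil, 0)
		if errors.Is(err, syscall.EINTR) {
			continue
		}
		return n, err
	}
}

// recvTimesOut reads from a socketpair whose peer never writes and reports
// whether the receive was ended by the timeout (EAGAIN/EWOULDBLOCK).
func recvTimesOut(timeoutMillis int) (bool, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return false, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	if err := setRecvTimeout(fds[0], time.Duration(timeoutMillis)*time.Millisecond); err != nil {
		return false, err
	}
	_, err = recvRaw(fds[0], make([]byte, 1))
	if errors.Is(err, syscall.EAGAIN) {
		return true, nil
	}
	return false, err
}

func main() {
	start := time.Now()
	timedOut, err := recvTimesOut(50)
	fmt.Println("timed out:", timedOut, "err:", err, "elapsed:", time.Since(start).Round(time.Millisecond))
}
```

Because the timeout is a property of the socket itself, it bounds any blocking receive on that fd, which is exactly what makes the follow-up lifecycle fix necessary.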
The 10s SO_RCVTIMEO set on parentSock for the FD handoff persisted and unexpectedly bounded the later READY byte read, which was intended to have a 30s timeout. The previous SetReadDeadline calls in the READY block were silent no-ops because parentSock wraps a raw socketpair fd that isn't registered with Go's netpoll. Clear SO_RCVTIMEO immediately after RecvFD, apply a fresh 30s SO_RCVTIMEO before the READY read, and clear it again afterward so it doesn't leak to any subsequent reads on the socket. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
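The set/clear lifecycle described above can be sketched with a small scoping helper. Everything here is illustrative: `withRecvTimeout`, `recvByte`, and `handshakeDemo` are hypothetical names, the durations are scaled down from 10s/30s to milliseconds, and a single byte stands in for the FD handoff.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

func setRecvTimeout(fd int, d time.Duration) error {
	tv := syscall.NsecToTimeval(d.Nanoseconds())
	return syscall.SetsockoptTimeval(fd, syscall.SOL_SOCKET, syscall.SO_RCVTIMEO, &tv)
}

// withRecvTimeout bounds fn with SO_RCVTIMEO and always clears the option
// afterwards so the timeout cannot leak into later reads on the same socket.
func withRecvTimeout(fd int, d time.Duration, fn func() error) error {
	if err := setRecvTimeout(fd, d); err != nil {
		return err
	}
	err := fn()
	if clearErr := setRecvTimeout(fd, 0); err == nil {
		err = clearErr
	}
	return err
}

// recvByte does a raw blocking one-byte read, retrying on EINTR.
func recvByte(fd int) error {
	buf := make([]byte, 1)
	for {
		n, err := syscall.Read(fd, buf)
		if errors.Is(err, syscall.EINTR) {
			continue
		}
		if err == nil && n == 0 {
			err = errors.New("peer closed the socket")
		}
		return err
	}
}

// handshakeDemo runs a short-timeout phase, a longer-timeout phase, and an
// unbounded read. Each later read waits longer than the previous timeout,
// so it only succeeds if that timeout was really replaced or cleared.
func handshakeDemo() error {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return err
	}
	parent, child := fds[0], fds[1]
	syscall.Write(child, []byte{'F'}) // handoff stand-in, already available

	done := make(chan struct{})
	go func() { // simulated wrapper process
		defer close(done)
		time.Sleep(50 * time.Millisecond)
		syscall.Write(child, []byte{'R'}) // READY byte
		time.Sleep(400 * time.Millisecond)
		syscall.Write(child, []byte{'X'}) // later, unbounded traffic
	}()
	defer func() { <-done; syscall.Close(parent); syscall.Close(child) }()

	read := func() error { return recvByte(parent) }
	if err := withRecvTimeout(parent, 20*time.Millisecond, read); err != nil {
		return fmt.Errorf("handoff phase: %w", err)
	}
	if err := withRecvTimeout(parent, 300*time.Millisecond, read); err != nil {
		return fmt.Errorf("READY phase: %w", err)
	}
	if err := read(); err != nil { // fails with EAGAIN if a timeout leaked
		return fmt.Errorf("post-handshake read: %w", err)
	}
	return nil
}

func main() {
	fmt.Println("handshake error:", handshakeDemo())
}
```

Scoping the option around each phase keeps the intended timeout visible at the call site, instead of relying on whichever value was last written to the socket.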
unix.SOCK_CLOEXEC is not defined on darwin, so the !windows build tag broke darwin cross-compilation after the previous SOCK_CLOEXEC fix. The seccomp user-notify code path this helper feeds is linux-only anyway (matching the notify_stub.go pattern), so narrow the build tag to linux and add a no-op stub for other unix platforms so core.go still builds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agentsh-unixwrap signal filter installs its own SECCOMP_RET_USER_NOTIF filter on top of the main wrapper filter. When the main filter is already using USER_NOTIF (for unix socket monitoring, file monitoring, metadata interception, or execve interception), stacking a second listener on the same thread breaks notification delivery. On Alpine/musl this surfaces as libreadline EBADF loops inside bash because the signal filter's listener interferes with the main filter's openat notifications, causing TestAlpineEnvInject_BashBuiltinDisabled to hang indefinitely. Previously the wrap.go gate only disabled the signal filter when execveEnabled was true; unix sockets and file monitoring were silently stacked. core.go had no gate at all, so every exec that hit setupSeccompWrapper with signal rules tripped the bug. Extend the wrap.go signalFilterEnabled helper with a new mainFilterUsesUserNotify method that mirrors the feature gates in unixmon.InstallFilterWithConfig, and route core.go's setupSeccompWrapper through the same helper. execveEnabled has to move up above the gate in setupSeccompWrapper because the runtime value (which respects hybrid-ptrace mode) is what the gate must see. signal_handler_linux.go's earlier fix — exit on error instead of hot-looping on ENOENT when the filter is destroyed — is included here as an orthogonal defensive improvement. Fixes TestAlpineEnvInject_BashBuiltinDisabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
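As a rough illustration of the gate, the sketch below models the decision as a pure function. The names `signalFilterEnabled` and `mainFilterUsesUserNotify` come from the commit message, but their signatures, the `filterFeatures` struct, and its field names are assumptions made for this example and may not match wrap.go.

```go
package main

import "fmt"

// filterFeatures is a hypothetical stand-in for the wrapper configuration.
// Each field corresponds to one feature that, per the commit message, makes
// the main seccomp filter use SECCOMP_RET_USER_NOTIF.
type filterFeatures struct {
	UnixSocketMonitoring bool
	FileMonitoring       bool
	MetadataInterception bool
	ExecveInterception   bool // runtime value, after hybrid-ptrace mode is applied
}

// mainFilterUsesUserNotify reports whether the main filter already owns a
// user-notify listener on the thread.
func (f filterFeatures) mainFilterUsesUserNotify() bool {
	return f.UnixSocketMonitoring ||
		f.FileMonitoring ||
		f.MetadataInterception ||
		f.ExecveInterception
}

// signalFilterEnabled allows the separate signal filter only when there are
// signal rules to enforce and no second USER_NOTIF listener would be stacked.
func signalFilterEnabled(f filterFeatures, hasSignalRules bool) bool {
	return hasSignalRules && !f.mainFilterUsesUserNotify()
}

func main() {
	fmt.Println(signalFilterEnabled(filterFeatures{}, true))                     // no main listener
	fmt.Println(signalFilterEnabled(filterFeatures{FileMonitoring: true}, true)) // would stack
}
```

Routing both wrap.go and core.go through one predicate is what keeps the two call sites from drifting apart again.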
Summary
Fixes three IPC/ptrace bugs surfaced by a 30-second hang in `agentsh exec` after rebuilding with `CGO=1`. Root cause was that `SetReadDeadline` is a silent no-op on `os.NewFile`-wrapped socketpair fds (they aren't registered with Go's netpoll), so the FD-handoff and READY-byte handshakes had no effective timeout.

- `SOCK_CLOEXEC` on the parent socketpair fd, matching the pattern in `wrap_linux.go`. Prevents fd leak on accidental fork/exec.
- `resumeTracedProcess` race hardening: handles `ECHILD` from `Wait4`, already-exited/signaled status, and `ESRCH` from `PtraceDetach`. Each path is logged at debug for forensics.
- Replaced `SetReadDeadline` with `SO_RCVTIMEO` (kernel-level timeout that works with raw blocking `recvmsg`) in both `notify_linux.go` and `signal_handler_linux.go`. Added lifecycle management in the notify handler so the 10s FD-handoff timeout is cleared before the 30s READY-byte read.

Design spec and implementation plan live on `main` at `docs/superpowers/specs/2026-04-10-ipc-ptrace-hardening-design.md` and `docs/superpowers/plans/2026-04-10-ipc-ptrace-hardening.md`.

Test plan
- `go build ./...` (linux)
- `GOOS=darwin go build ./...`
- `GOOS=windows go build ./...`
- `go test ./...` (all packages pass)
- `go vet ./internal/api/...` (clean on all platforms)
- `roborev review` on every commit (no findings above Low)
- `agentsh exec` with CGO=1 no longer hangs

🤖 Generated with Claude Code