Two unhandled failure modes in the session↔compositor bootstrap sequence. Both produce the same outcome: the session hangs indefinitely.
Root cause: parent holds comp-side fd
run_compositor (comp.rs:96, socket setup at lines 107-118) creates a Unix socket pair and captures the comp-side OwnedFd inside the spawned tokio task's async move block. After start_process passes the fd number to the child via the COSMIC_SESSION_SOCK env var, the parent process still holds the comp-side fd open.
When the child process crashes or hangs before sending SetEnv:
- session_rx.read_exact() in the IPC loop never returns EOF, because the parent's OwnedFd keeps the comp side of the socket open
- The IPC loop blocks forever, so env_tx is never dropped
- env_rx.await at main.rs:121-124 hangs indefinitely:

      let mut env_vars = env_rx
          .await
          .expect("failed to receive environmental variables")

- The on_exit callback sends SessionRequest::Restart, but start() is stuck at env_rx.await and never reaches the tokio::select! (line 309) that would receive it
- compositor_handle.abort() at line 334 also never fires
No child processes (panel, notifications, applets) are ever started. The user sees a blank screen until systemd's service timeout kills the session.
Triggers: GPU driver crash at startup, OOM during compositor init, GPU driver initialization hang, missing Wayland dependencies.
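The EOF behaviour behind this hang can be reproduced outside the session code. The following is a minimal sketch using only the standard library, not the actual cosmic-session code: `probe`, `demo`, and `ReadOutcome` are hypothetical names, `try_clone` stands in for the child's inherited fd, and a short read timeout stands in for "blocks forever". It assumes a Unix platform.

```rust
use std::io::{ErrorKind, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Outcome of one read attempt on the session side of the pair.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Eof,         // every handle to the peer end is closed: read returned 0
    TimedOut,    // some handle to the peer end is still open: read would block
    Data(usize), // the peer actually sent bytes
}

/// Try to read once from `session`, giving up after `wait`.
fn probe(session: &mut UnixStream, wait: Duration) -> std::io::Result<ReadOutcome> {
    session.set_read_timeout(Some(wait))?;
    let mut buf = [0u8; 8];
    match session.read(&mut buf) {
        Ok(0) => Ok(ReadOutcome::Eof),
        Ok(n) => Ok(ReadOutcome::Data(n)),
        // A timed-out read surfaces as WouldBlock or TimedOut depending on platform.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            Ok(ReadOutcome::TimedOut)
        }
        Err(e) => Err(e),
    }
}

/// Returns (outcome while the parent still holds the comp side,
/// outcome after the parent drops it).
fn demo() -> std::io::Result<(ReadOutcome, ReadOutcome)> {
    let (mut session, comp) = UnixStream::pair()?;
    // Stand-in for the fd the child inherits: a second handle to the same socket.
    let child_copy = comp.try_clone()?;
    // The "child" crashes: its handle is closed.
    drop(child_copy);
    // The parent still holds `comp`, so the session side sees no EOF.
    let before = probe(&mut session, Duration::from_millis(100))?;
    // The parent releases its copy; now no handle to the comp side remains.
    drop(comp);
    let after = probe(&mut session, Duration::from_millis(100))?;
    Ok((before, after))
}

fn main() -> std::io::Result<()> {
    let (before, after) = demo()?;
    println!("parent holds comp side:   {before:?}");
    println!("parent dropped comp side: {after:?}");
    Ok(())
}
```

The first probe is expected to time out and the second to report EOF, which mirrors the report: as long as the parent keeps its comp-side handle, the death of the child is invisible to the reader on the session side.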
Possible fixes
1. Drop the comp-side OwnedFd after start_process returns. The child already has its own reference to the fd, so this would let session_rx detect child exit via EOF, break the IPC loop, drop env_tx, and resolve env_rx.await with Err(RecvError).
2. Add a timeout, e.g. tokio::time::timeout(Duration::from_secs(N), env_rx), to handle the case where the child is alive but stuck (GPU driver hang). On timeout, retry or exit cleanly.
3. Replace .expect() with error handling so the outer restart loop in main() can re-enter start().
Fix 1 addresses the fd leak that prevents crash detection. Fix 2 addresses the hang-while-alive case. Fix 3 is needed regardless so the session can recover.
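How fixes 2 and 3 fit together can be sketched with a synchronous, standard-library analog. This is not tokio code and not the project's implementation: `std::sync::mpsc::Receiver::recv_timeout` stands in for `tokio::time::timeout` wrapped around the oneshot receiver, and `BootstrapError` and `wait_for_env` are hypothetical names. The point is only that "sender dropped" and "deadline passed" become distinct error values the caller can act on, instead of a panic or an endless wait.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical error type for the bootstrap step (not from the project source).
#[derive(Debug, PartialEq)]
enum BootstrapError {
    CompositorExited, // sender dropped before anything was sent
    CompositorStuck,  // sender still alive, but nothing arrived in time
}

type EnvVars = Vec<(String, String)>;

/// Wait for the environment variables, turning both failure modes into an
/// error the caller can handle rather than panicking or blocking forever.
fn wait_for_env(env_rx: mpsc::Receiver<EnvVars>, limit: Duration) -> Result<EnvVars, BootstrapError> {
    match env_rx.recv_timeout(limit) {
        Ok(vars) => Ok(vars),
        Err(RecvTimeoutError::Disconnected) => Err(BootstrapError::CompositorExited),
        Err(RecvTimeoutError::Timeout) => Err(BootstrapError::CompositorStuck),
    }
}

fn main() {
    let limit = Duration::from_millis(100);

    // Crash case: the sending side goes away without sending.
    let (tx, rx) = mpsc::channel::<EnvVars>();
    drop(tx);
    println!("crashed: {:?}", wait_for_env(rx, limit));

    // Hang case: the sender stays alive but never sends.
    let (tx, rx) = mpsc::channel::<EnvVars>();
    println!("stuck:   {:?}", wait_for_env(rx, limit));
    drop(tx);

    // Normal case: the variables arrive before the deadline.
    let (tx, rx) = mpsc::channel::<EnvVars>();
    tx.send(vec![("WAYLAND_DISPLAY".to_string(), "wayland-1".to_string())])
        .expect("receiver is still alive");
    println!("ok:      {:?}", wait_for_env(rx, limit));
}
```

In the async version the same two cases would presumably surface as the timeout's elapsed error and the receiver's Err(RecvError); either way, returning an error from start() is what lets the restart loop in main() try again.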
🤖 Generated with Claude Code