Session crashes or hangs if compositor fails before sending environment variables #191

@modelmiser

Two unhandled failure modes in the session↔compositor bootstrap sequence. Both produce the same outcome: the session hangs indefinitely.

Root cause: parent holds comp-side fd

run_compositor (comp.rs:96, socket setup at lines 107-118) creates a Unix socket pair and captures the comp-side OwnedFd inside the spawned tokio task's async move block. After start_process passes the fd number to the child via the COSMIC_SESSION_SOCK env var, the parent process still holds the comp-side fd open.

When the child process crashes or hangs before sending SetEnv:

  1. session_rx.read_exact() in the IPC loop won't return EOF — the parent's OwnedFd keeps the comp side alive
  2. The IPC loop blocks forever → env_tx is never dropped
  3. env_rx.await at main.rs:121-124 hangs indefinitely:
       let mut env_vars = env_rx
           .await
           .expect("failed to receive environmental variables")
  4. The on_exit callback sends SessionRequest::Restart, but start() is stuck at env_rx.await and never reaches the tokio::select! (line 309) that would receive it
  5. compositor_handle.abort() at line 334 also never fires

No child processes (panel, notifications, applets) are ever started. The user sees a blank screen until systemd's service timeout kills the session.
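
The missing EOF can be reproduced in isolation. The following is a minimal sketch using only std (not the tokio types cosmic-session uses); the names ReadOutcome, probe, and demo are hypothetical, and a short read timeout stands in for the read that would otherwise block forever. It shows that the session-side read cannot see EOF while the same process still holds the comp-side end, and that EOF arrives once that end is dropped.

```rust
use std::io::{ErrorKind, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Outcome of one bounded read on the session side of the pair.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Eof,
    Data(usize),
    NoEofYet,
}

/// Try a single read; a timeout means no EOF has been delivered.
fn probe(session: &mut UnixStream) -> ReadOutcome {
    let mut buf = [0u8; 4];
    match session.read(&mut buf) {
        Ok(0) => ReadOutcome::Eof,
        Ok(n) => ReadOutcome::Data(n),
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            ReadOutcome::NoEofYet
        }
        Err(e) => panic!("unexpected read error: {e}"),
    }
}

fn demo() -> (ReadOutcome, ReadOutcome) {
    let (mut session, comp) = UnixStream::pair().expect("socketpair failed");
    session
        .set_read_timeout(Some(Duration::from_millis(100)))
        .expect("set_read_timeout failed");

    // While `comp` is open in this process, the kernel still sees a live
    // peer, so the read times out instead of returning EOF. A child that
    // inherited a duplicate of this fd could exit without changing that.
    let while_held = probe(&mut session);

    // Closing the last open reference to the comp side delivers EOF.
    drop(comp);
    let after_drop = probe(&mut session);

    (while_held, after_drop)
}

fn main() {
    let (while_held, after_drop) = demo();
    println!("while held: {while_held:?}, after drop: {after_drop:?}");
}
```

In the real code the equivalent of drop(comp) would have to happen only after the child has been spawned and has inherited the fd; otherwise the child would receive an fd number that is already closed.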

Triggers: GPU driver crash at startup, OOM during compositor init, GPU driver initialization hang, missing Wayland dependencies.

Possible fixes

  1. Drop the comp-side OwnedFd after start_process returns — the child already has its own reference to the fd. This would let session_rx detect child exit via EOF, break the IPC loop, drop env_tx, and resolve env_rx.await with Err(RecvError).

  2. Add a timeout — e.g., tokio::time::timeout(Duration::from_secs(N), env_rx) — to handle the case where the child is alive but stuck (GPU driver hang). On timeout, retry or exit cleanly.

  3. Replace .expect() with error handling so the outer restart loop in main() can re-enter start().

Fix 1 addresses the fd leak that prevents crash detection. Fix 2 addresses the hang-while-alive case. Fix 3 is needed regardless so the session can recover.
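
How the three fixes could combine is sketched below, under loud assumptions: std::sync::mpsc and recv_timeout stand in for the tokio oneshot channel and tokio::time::timeout, a thread stands in for the compositor IPC task, and StartError, Behaviour, start, and run_with_restarts are hypothetical names, not cosmic-session APIs. The environment variable value is a placeholder. The point is only the control flow: a dropped sender and a timeout both become an Err that the outer loop can act on, instead of a hang or a panic from .expect().

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

type EnvVars = HashMap<String, String>;

/// Hypothetical error type for a failed bootstrap attempt.
#[derive(Debug, PartialEq)]
enum StartError {
    CompositorExited,   // sender dropped: the IPC loop saw EOF (fix 1)
    CompositorTimedOut, // child alive but silent (fix 2)
}

/// Hypothetical compositor behaviours used to drive the sketch.
#[derive(Clone, Copy)]
enum Behaviour {
    SendsEnv,
    Crashes,
    Hangs,
}

/// Stand-in for start(): returns an error instead of panicking (fix 3).
fn start(behaviour: Behaviour, wait: Duration) -> Result<EnvVars, StartError> {
    let (env_tx, env_rx) = mpsc::channel::<EnvVars>();

    // Stand-in for the compositor task that owns env_tx.
    thread::spawn(move || match behaviour {
        Behaviour::SendsEnv => {
            let mut vars = EnvVars::new();
            vars.insert("WAYLAND_DISPLAY".into(), "placeholder".into());
            let _ = env_tx.send(vars);
        }
        // Crash: the loop ends and env_tx is dropped without sending.
        Behaviour::Crashes => drop(env_tx),
        // Hang: env_tx stays alive well past the deadline.
        Behaviour::Hangs => {
            thread::sleep(wait * 4);
            drop(env_tx);
        }
    });

    match env_rx.recv_timeout(wait) {
        Ok(vars) => Ok(vars),
        Err(RecvTimeoutError::Disconnected) => Err(StartError::CompositorExited),
        Err(RecvTimeoutError::Timeout) => Err(StartError::CompositorTimedOut),
    }
}

/// Stand-in for the outer restart loop in main().
fn run_with_restarts(attempts: &[Behaviour], wait: Duration) -> Option<EnvVars> {
    for (i, &behaviour) in attempts.iter().enumerate() {
        match start(behaviour, wait) {
            Ok(vars) => return Some(vars),
            Err(err) => eprintln!("attempt {} failed: {err:?}; restarting", i + 1),
        }
    }
    None
}

fn main() {
    let wait = Duration::from_millis(200);
    let plan = [Behaviour::Crashes, Behaviour::Hangs, Behaviour::SendsEnv];
    println!("recovered: {}", run_with_restarts(&plan, wait).is_some());
}
```

A real implementation would probably also want to kill a hung child before retrying and cap the number of restarts; neither is shown here.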

🤖 Generated with Claude Code
