@faganihajizada (Contributor) commented Oct 15, 2025

Summary

Add optional SSH access to NodeSet worker pods with persistent host keys. This feature is controlled by a new ssh.enabled field in the NodeSet CRD, making it opt-in with zero overhead when disabled. This implementation follows the same pattern as LoginSet for consistency.
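The opt-in behavior described above can be pictured with a minimal Go sketch. Only the `NodeSetSsh` type and its `Enabled` field come from this PR; `workerExtras`, `sshExtras`, and the volume name `ssh-host-keys` are hypothetical stand-ins for the real pod-spec builders, which use Kubernetes API types.

```go
package main

import "fmt"

// NodeSetSsh mirrors the type named in this PR. Only the Enabled field is
// taken from the description; the real type may carry more fields.
type NodeSetSsh struct {
	Enabled bool
}

// workerExtras is a hypothetical stand-in for the pod-spec pieces that the
// SSH feature contributes (container ports and volume mounts).
type workerExtras struct {
	Ports        []int32
	VolumeMounts []string
}

// sshExtras returns an empty value when SSH is disabled, so the worker pod
// spec is left unchanged. When enabled, it adds port 22 and a host keys mount.
func sshExtras(ssh NodeSetSsh) workerExtras {
	if !ssh.Enabled {
		return workerExtras{}
	}
	return workerExtras{
		Ports:        []int32{22},
		VolumeMounts: []string{"ssh-host-keys"}, // hypothetical volume name
	}
}

func main() {
	fmt.Printf("disabled: %+v\n", sshExtras(NodeSetSsh{Enabled: false}))
	fmt.Printf("enabled:  %+v\n", sshExtras(NodeSetSsh{Enabled: true}))
}
```

Returning an empty value in the disabled case keeps the default pod spec identical to what it was before this change, which is one way to read "zero overhead when disabled".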

Access control is enforced by pam_slurm_adopt (configured in the container images), which restricts SSH access to users with active jobs on the worker node.

Changes:

  • Add NodeSetSsh type with Enabled field to NodeSet CRD spec
  • Add SshHostKeys() method to generate namespaced name for SSH keys secret
  • Implement BuildWorkerSshHostKeys() for SSH key generation (RSA, Ed25519, ECDSA)
  • Conditionally mount SSH host keys volume when ssh.enabled: true and expose SSH port (22)
  • Add worker_secret.go following the same pattern as login_secret.go
  • Helm updates
  • Add unit tests for SSH host keys generation in worker_secret_test.go

Related PR in containers repo: SlinkyProject/containers#6

Breaking Changes

N/A

Testing Notes

Built the operator from this branch together with SlinkyProject/containers#6, then ran the following from a login node:

# 1. Submit a test job in the background
srun -p <partition> -n1 --time=5:00 sleep 300 &

# 2. Find the worker node running the job
squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"

# 3. SSH from login node to worker node (should succeed)
ssh <worker-pod-hostname>

# 4. Verify you're on the worker node
hostname
ps aux | grep sleep

# 5. Cancel the job
scancel <job-id>

# 6. Try to SSH again (should be denied)
ssh <worker-pod-hostname>

Verified:

  • SSH daemon starts alongside slurmd
  • Unique SSH host keys generated per pod
  • Users can SSH to nodes where they have running jobs
  • Users cannot SSH to nodes without running jobs
  • PAM configuration applied correctly

@GridexX commented Nov 18, 2025

Thanks @faganihajizada for this PR!
It will be very useful for our team.

Hope to see it reviewed and merged soon 😊

Add SSH port (22) to worker container specification to enable users to
SSH into worker nodes where they have running jobs. This works with
updated slurmd images that include pam_slurm_adopt for access control.

RSA and ECDSA keys were swapped in the secret data mapping.
This fix aligns key types with their filenames.

Enable SSH access to NodeSet worker pods with a CRD toggle, following
the same pattern as LoginSet. SSH host keys are shared across all pods
in a NodeSet to prevent "host key changed" warnings when pods are
recreated or scaled.

Ref: https://slurm.schedmd.com/pam_slurm_adopt.
@faganihajizada changed the title from "Expose SSH port for pam_slurm_adopt" to "Add optional SSH access to worker pods for pam_slurm_adopt" on Nov 26, 2025