Nomad version
Nomad v1.10.0
BuildDate 2025-04-09T16:40:54Z
Revision e26a2bd
Operating system and Environment details
Rocky 9.3
Issue
When a server node is restarted with systemctl restart nomad, it often never opens port 4646 (although sometimes it does). The only way to fix this is to stop the service, wait 30 seconds or more, and then start the service again.
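For reference, the workaround described above can be sketched as a small script. This is only an illustration of the stop, wait, start sequence: the function name `restart_with_pause` and its parameters are hypothetical, the unit name `nomad` and the 30-second pause come from the description above, and it assumes the script runs with enough privileges to control the service.

```python
import subprocess
import time


def restart_with_pause(unit="nomad", pause_seconds=30,
                       run=subprocess.run, sleep=time.sleep):
    """Stop the unit, wait, then start it again (instead of `systemctl restart`).

    `run` and `sleep` are injectable so the sequence can be exercised
    without touching a real service.
    """
    # Stop the service and fail loudly if systemctl reports an error.
    run(["systemctl", "stop", unit], check=True)
    # Pause before starting again; 30 seconds is the minimum that worked here.
    sleep(pause_seconds)
    # Start the service again.
    run(["systemctl", "start", unit], check=True)


if __name__ == "__main__":
    restart_with_pause()
```

This only automates the manual workaround; it does not explain why a plain restart leaves port 4646 closed.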
Reproduction steps
Restart the Nomad systemd service (systemctl restart nomad).
Expected Result
Port 4646 opens
Actual Result
Port 4646 never opens, although 4648 and 4647 do open.
$ ss -ltnp | grep nomad
LISTEN 0 4096 *:4648 *:* users:(("nomad",pid=216756,fd=9))
LISTEN 0 4096 *:4647 *:* users:(("nomad",pid=216756,fd=6))
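The same check can be scripted when reproducing the problem repeatedly. The sketch below tries a TCP connection to each of the three ports and reports which ones accept it. The helper names `port_is_open` and `check_ports` are hypothetical, the labels assume Nomad's default port assignments (4646 HTTP, 4647 RPC, 4648 Serf), and it assumes the agent is reachable on 127.0.0.1, which matches the wildcard binds shown above.

```python
import socket

# Default Nomad ports; the labels assume the stock port configuration.
NOMAD_PORTS = {"http": 4646, "rpc": 4647, "serf": 4648}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_ports(host="127.0.0.1", ports=None):
    """Map each port label to whether it currently accepts TCP connections."""
    if ports is None:
        ports = NOMAD_PORTS
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_open in check_ports().items():
        print(f"{name} ({NOMAD_PORTS[name]}): {'open' if is_open else 'closed'}")
```

In the failing state described here, this would be expected to report rpc and serf as open and http as closed.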
Nomad Server logs (if appropriate)
Nothing notable appears in the server logs when this occurs:
Mar 27 00:37:18 nomad[216705]: nomad: setting up raft bolt store: no_freelist_sync=false
Mar 27 00:37:18 nomad[216705]: nomad.raft: starting restore from snapshot: id=76-3907612-1774483368628 last-index=3907612 last-term=76 size-in-bytes=6376122982
Mar 27 00:37:29 nomad[216705]: nomad.raft: snapshot restore progress: id=76-3907612-1774483368628 last-index=3907612 last-term=76 size-in-bytes=6376122982 read-bytes=6376122982 percent-complete="100.00%"
Mar 27 00:37:29 nomad[216705]: nomad.raft: restored from snapshot: id=76-3907612-1774483368628 last-index=3907612 last-term=76 size-in-bytes=6376122982
Mar 27 00:37:29 nomad[216705]: nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:c87212c8-5294-bfb3-6dab-701eb4b32dc3 Address:10.0.35.52:4647}]"
Mar 27 00:37:29 nomad[216705]: nomad.raft: entering follower state: follower="Node at 10.0.35.52:4647 [Follower]" leader-address= leader-id=
Mar 27 00:37:29 nomad[216705]: nomad: serf: EventMemberJoin: nomad-server0.global 10.0.35.52
Mar 27 00:37:29 nomad[216705]: nomad: starting scheduling worker(s): num_workers=4 schedulers=["batch", "system", "_core", "service"]
Mar 27 00:37:29 nomad[216705]: nomad: started scheduling worker(s): num_workers=4 schedulers=["batch", "system", "_core", "service"]
Mar 27 00:37:29 nomad[216705]: nomad: serf: Failed to re-join any previously known node
Mar 27 00:37:29 nomad[216705]: nomad: adding server: server="nomad-server0.global (Addr: 10.0.35.52:4647) (DC: aws-us-east-1)"
Mar 27 00:37:31 nomad[216705]: nomad.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
Mar 27 00:37:31 nomad[216705]: nomad.raft: entering candidate state: node="Node at 10.0.35.52:4647 [Candidate]" term=77
Mar 27 00:37:31 nomad[216705]: nomad.raft: pre-vote successful, starting election: term=77 tally=1 refused=0 votesNeeded=1
Mar 27 00:37:31 nomad[216705]: nomad.raft: election won: term=77 tally=1
Mar 27 00:37:31 nomad[216705]: nomad.raft: entering leader state: leader="Node at 10.0.35.52:4647 [Leader]"
Mar 27 00:37:31 nomad[216705]: nomad: cluster leadership acquired
Mar 27 00:37:31 nomad[216705]: nomad: eval broker status modified: paused=false
Mar 27 00:37:31 nomad[216705]: nomad: blocked evals status modified: paused=false