Server Node Doesn't Open Port 4646 After Restart with Systemctl #27766

@msherman13

Nomad version

Nomad v1.10.0
BuildDate 2025-04-09T16:40:54Z
Revision e26a2bd

Operating system and Environment details

Rocky 9.3

Issue

When the server node is restarted with systemctl restart nomad, it often fails to open port 4646, although sometimes it does. The only workaround I have found is to stop the service, wait 30 seconds or more, and then start it again.
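The stop, wait, start workaround can be sketched as a small script. This is only a sketch under assumptions: Python 3 is available on the server, the systemd unit is named nomad, and the HTTP API is reachable on 127.0.0.1:4646. The function names and the polling interval are hypothetical and are not part of Nomad.

```python
import socket
import subprocess
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, deadline_s: float,
                  interval_s: float = 1.0) -> bool:
    """Poll until host:port accepts connections or deadline_s elapses."""
    end = time.monotonic() + deadline_s
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


def stop_wait_start(unit: str = "nomad", pause_s: float = 30.0) -> None:
    """Stop the unit, pause, then start it (needs root privileges)."""
    subprocess.run(["systemctl", "stop", unit], check=True)
    time.sleep(pause_s)  # the pause of 30 seconds or more described above
    subprocess.run(["systemctl", "start", unit], check=True)
```

Run as root, calling stop_wait_start() and then wait_for_port("127.0.0.1", 4646, deadline_s=60.0) mirrors the manual workaround and reports whether the HTTP port came up. The 60-second deadline is an arbitrary choice.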

Reproduction steps

Restart the Nomad service with systemctl restart nomad.

Expected Result

Port 4646 opens

Actual Result

Port 4646 never opens, although 4648 and 4647 do open.

$ ss -ltnp | grep nomad
LISTEN 0      4096               *:4648            *:*    users:(("nomad",pid=216756,fd=9))
LISTEN 0      4096               *:4647            *:*    users:(("nomad",pid=216756,fd=6))
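
The same check can be automated by parsing the ss output and comparing it against Nomad's default ports (4646 for HTTP, 4647 for RPC, 4648 for Serf). The following is a rough sketch that assumes the default ports are in use and Python 3.9 or later; the helper names are hypothetical.

```python
import re

# Default Nomad ports: HTTP API, RPC, and Serf gossip.
EXPECTED_PORTS = {4646: "http", 4647: "rpc", 4648: "serf"}


def listening_ports(ss_output: str) -> set[int]:
    """Extract local ports from `ss -ltnp` lines in the LISTEN state."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Columns: State, Recv-Q, Send-Q, Local Address:Port, Peer, Process
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        match = re.search(r":(\d+)$", fields[3])
        if match:
            ports.add(int(match.group(1)))
    return ports


def missing_ports(ss_output: str) -> dict[int, str]:
    """Return the expected ports that do not appear as listeners."""
    found = listening_ports(ss_output)
    return {p: name for p, name in EXPECTED_PORTS.items() if p not in found}


# The two lines captured above, after filtering with grep nomad.
SAMPLE = """\
LISTEN 0      4096               *:4648            *:*    users:(("nomad",pid=216756,fd=9))
LISTEN 0      4096               *:4647            *:*    users:(("nomad",pid=216756,fd=6))
"""

print(missing_ports(SAMPLE))  # {4646: 'http'}
```

For the output above, only the HTTP port is reported as missing, which matches what I see by hand.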

Nomad Server logs (if appropriate)

Nothing notable appears in the server logs when this occurs:

Mar 27 00:37:18 nomad[216705]:  nomad: setting up raft bolt store: no_freelist_sync=false
Mar 27 00:37:18 nomad[216705]:  nomad.raft: starting restore from snapshot: id=76-3907612-1774483368628 last-index=3907612 last-term=76 size-in-bytes=6376122982
Mar 27 00:37:29 nomad[216705]:  nomad.raft: snapshot restore progress: id=76-3907612-1774483368628 last-index=3907612 last-term=76 size-in-bytes=6376122982 read-bytes=6376122982 percent-complete="100.00%"
Mar 27 00:37:29 nomad[216705]:  nomad.raft: restored from snapshot: id=76-3907612-1774483368628 last-index=3907612 last-term=76 size-in-bytes=6376122982
Mar 27 00:37:29 nomad[216705]:  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:c87212c8-5294-bfb3-6dab-701eb4b32dc3 Address:10.0.35.52:4647}]"
Mar 27 00:37:29 nomad[216705]:  nomad.raft: entering follower state: follower="Node at 10.0.35.52:4647 [Follower]" leader-address= leader-id=
Mar 27 00:37:29 nomad[216705]:  nomad: serf: EventMemberJoin: nomad-server0.global 10.0.35.52
Mar 27 00:37:29 nomad[216705]:  nomad: starting scheduling worker(s): num_workers=4 schedulers=["batch", "system", "_core", "service"]
Mar 27 00:37:29 nomad[216705]:  nomad: started scheduling worker(s): num_workers=4 schedulers=["batch", "system", "_core", "service"]
Mar 27 00:37:29 nomad[216705]:  nomad: serf: Failed to re-join any previously known node
Mar 27 00:37:29 nomad[216705]:  nomad: adding server: server="nomad-server0.global (Addr: 10.0.35.52:4647) (DC: aws-us-east-1)"
Mar 27 00:37:31 nomad[216705]:  nomad.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
Mar 27 00:37:31 nomad[216705]:  nomad.raft: entering candidate state: node="Node at 10.0.35.52:4647 [Candidate]" term=77
Mar 27 00:37:31 nomad[216705]:  nomad.raft: pre-vote successful, starting election: term=77 tally=1 refused=0 votesNeeded=1
Mar 27 00:37:31 nomad[216705]:  nomad.raft: election won: term=77 tally=1
Mar 27 00:37:31 nomad[216705]:  nomad.raft: entering leader state: leader="Node at 10.0.35.52:4647 [Leader]"
Mar 27 00:37:31 nomad[216705]:  nomad: cluster leadership acquired
Mar 27 00:37:31 nomad[216705]:  nomad: eval broker status modified: paused=false
Mar 27 00:37:31 nomad[216705]:  nomad: blocked evals status modified: paused=false
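
To put numbers on the timeline in these logs, the elapsed time between events can be computed from the journal timestamps. This is a minimal sketch; the helper names are hypothetical, the three log lines are trimmed copies of the ones above, and the year is an arbitrary placeholder because the journal lines do not carry one.

```python
from datetime import datetime


def parse_stamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a journal line."""
    stamp = " ".join(line.split()[:3])
    # The log lines carry no year; 2000 is an arbitrary placeholder.
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def seconds_between(lines: list[str], start_marker: str, end_marker: str) -> float:
    """Seconds from the first line containing start_marker to the first
    line containing end_marker."""
    start = next(parse_stamp(l) for l in lines if start_marker in l)
    end = next(parse_stamp(l) for l in lines if end_marker in l)
    return (end - start).total_seconds()


# Trimmed copies of three lines from the log excerpt above.
LOG = [
    "Mar 27 00:37:18 nomad[216705]:  nomad.raft: starting restore from snapshot: id=76-3907612-1774483368628",
    "Mar 27 00:37:29 nomad[216705]:  nomad.raft: restored from snapshot: id=76-3907612-1774483368628",
    "Mar 27 00:37:31 nomad[216705]:  nomad: cluster leadership acquired",
]

restore_s = seconds_between(LOG, "starting restore from snapshot", "restored from snapshot")
leader_s = seconds_between(LOG, "starting restore from snapshot", "cluster leadership acquired")
print(restore_s, leader_s)  # 11.0 13.0
```

By this measure the snapshot restore takes about 11 seconds and leadership is acquired about 13 seconds after the restore begins, so the logged part of startup completes quickly even though port 4646 never opens.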
