5 changes: 4 additions & 1 deletion templates/common/on-prem/files/keepalived.yaml
@@ -87,6 +87,8 @@ contents:
{
if pid=$(pgrep -o keepalived); then
kill -s SIGTERM "$pid"
+# Give keepalived time to shut down
+while pgrep -o keepalived; do sleep 1; done
@rbbratta (Contributor), Nov 17, 2025

killall -o -w -s SIGTERM keepalived maybe?

  -o,--older-than     kill processes older than TIME
  -w,--wait           wait for processes to die

or do we need the extra sleeps?

Contributor

Never mind. killall's -o doesn't seem to match pgrep -o. We could use pkill, though.

Contributor

Or could we use wait $pid, if it's a child process?

Contributor

Ah, we don't want to be fast, we want to sleep.

Member Author

Talked about this with Ross offline, but documenting here for future reference:

The reason I'm looping on pgrep instead of using wait is that the keepalived pid we get back is sometimes not a child of the main script, in which case wait fails. I suspect that may be a bug in itself: we use -o to get the oldest pid, which would presumably be the parent keepalived process started by the main script, but that doesn't always seem to be true. It may be that the oldest pid somehow isn't always the main keepalived process.

In any case, another advantage of using pgrep is that we wait until all of the keepalived processes have exited, so we should know for certain that priority 0 was sent by the time the loop completes. It's a bit inelegant, but it seems to be the safest way to handle this.
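
For future readers, here is a minimal standalone sketch of the difference described above. It is not the code in this PR: `wait_until_gone` is a hypothetical helper, the `sleep` processes are stand-ins for keepalived, and it polls explicit pids with `kill -0` instead of `pgrep` so that it runs without keepalived installed. It assumes bash, where `wait` on a pid that is not a child of the current shell fails with status 127.

```shell
#!/usr/bin/env bash

# Hypothetical helper: block until none of the given pids is alive.
# kill -0 sends no signal; it only checks whether the pid still exists.
wait_until_gone() {
    local pid
    for pid in "$@"; do
        while kill -0 "$pid" 2>/dev/null; do
            sleep 1
        done
    done
}

# wait only accepts children of the current shell. The shell's own parent
# is never one of its children, so it stands in for a foreign pid here.
wait_rc=0
wait "$PPID" 2>/dev/null || wait_rc=$?
echo "wait on a non-child pid returned $wait_rc"

# Two short-lived stand-ins for keepalived processes.
sleep 30 &
pid_a=$!
sleep 30 &
pid_b=$!

# Ask both to terminate, then poll until every one of them is gone.
kill -s TERM "$pid_a" "$pid_b"
wait_until_gone "$pid_a" "$pid_b"
echo "all processes gone"
```

Polling does not depend on who the parent is, which is why it also covers the case where the pid we found does not belong to a child of the main script.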

fi
}

@@ -146,7 +148,8 @@ contents:
fi

rm -f "$keepalived_sock"
-socat UNIX-LISTEN:${keepalived_sock},fork system:'bash -c msg_handler'
+socat UNIX-LISTEN:${keepalived_sock},fork system:'bash -c msg_handler' &
+wait
resources:
requests:
cpu: 100m