check_restart: move feature to task level hook#27750
check_restart: move feature to task level hook#27750mismithhisler wants to merge 6 commits intob-check-restart-refactorfrom
Conversation
If check_restart is specified in a group service block, it is not part of the task's lifecycle, so a task that restarts and relies on check_restart.grace will be restarted unnessecarily. These changes move the check_restart functionality to always be part of the tasks lifecycle.
| var checks []*hookCheck | ||
| for _, s := range taskenv.InterpolateServices(req.TaskEnv, h.tgServices) { | ||
| for _, c := range s.Checks { | ||
| if c.TriggersRestarts() && c.TaskName == h.taskName { |
There was a problem hiding this comment.
c.TaskName == h.taskName means we would only register that for group level checks, only the task specified in the service/check will be restarted. Also if the task is not specified the check won't even be registered.
| } | ||
| } | ||
|
|
||
| func (h *HandlerWrapper) CheckWatcher(provider, key string) serviceregistration.CheckWatcher { |
There was a problem hiding this comment.
In a subsequent PR, I want to remove the CheckWatcher from the serviceregistration.Wrapper. Currently we create a check watcher on the nomad client, and a check watcher for every consul API on the client and server (even though servers don't watch checks.. Arguably we could do a isNomadClient check and not run the watcher on the server, but the consul service client code is difficult enough, so I think pulling it out could help reduce some cognitive complexity).
I'd like to see this done as part of the normal client startup process. But it will take a little refactoring of how we create consul api clients.
Description
Currently, if check_restart is specified in a group service block, it is not part of the task's lifecycle, so a task that restarts and relies on check_restart.grace will be restarted unnessecarily due to failing checks. These changes move the check_restart functionality to always be part of the tasks lifecycle, so when a task is restarted the check watch is restarted.
Testing & Reproduction steps
mysql.hcl
Take the above job, run it, then restart the task. It will get restarted by the Nomad CheckWatcher due to failing healthchecks.
Links
Fixes #9431
Contributor Checklist
changelog entry using the
make clcommand.ensure regressions will be caught.
and job configuration, please update the Nomad product documentation, which is stored in the
web-unified-docsrepo. Refer to theweb-unified-docscontributor guide for docs guidelines.Please also consider whether the change requires notes within the upgrade
guide. If you would like help with the docs, tag the
nomad-docsteam in this PR.Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.