@ardelato ardelato commented Jan 8, 2026

Description

Previously, nginx health probes hit /healthz, which proxies to PHP-FPM/Laravel. This created a dependency chain in which nginx was marked unhealthy and restarted because of database or PHP-FPM issues.

This caused:

  • False nginx failures when the problem was elsewhere
  • Startup race conditions (nginx checked at 45s, PHP-FPM ready at 60s)
  • Cascading failures and unnecessary restarts

Changes:

  • Add dedicated /nginx-health endpoint that returns 200 directly from nginx
  • Update nginx liveness/readiness probes to use /nginx-health
  • Reduce initialDelaySeconds to 10s/5s, since nginx starts faster than PHP-FPM
  • Reduce timeouts to 5s for faster failure detection
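
For reviewers who want a concrete picture, the changes above might look roughly like the sketch below. This is an illustration under assumptions, not the actual diff: the ConfigMap name, the file name health.conf, the container name, and port 80 are all hypothetical, and mapping 10s to liveness and 5s to readiness is a guess based on the order in which the description lists them.

```yaml
# Hypothetical ConfigMap carrying an nginx snippet; it would need to be
# included inside the existing server { } block to take effect.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-health-snippet      # assumed name, not from the PR
data:
  health.conf: |
    # Answered by nginx itself, never proxied to PHP-FPM/Laravel.
    location = /nginx-health {
        access_log off;
        default_type text/plain;
        return 200 "ok\n";
    }
---
# Fragment of a pod spec showing only the nginx container's probes.
containers:
  - name: nginx                   # assumed container name
    livenessProbe:
      httpGet:
        path: /nginx-health
        port: 80                  # assumed port
      initialDelaySeconds: 10     # assumed to be the liveness value
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /nginx-health
        port: 80                  # assumed port
      initialDelaySeconds: 5      # assumed to be the readiness value
      timeoutSeconds: 5
```

With probes shaped like this, a database or PHP-FPM outage would no longer fail the nginx probes, because the location block returns 200 without touching the upstream.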

Laravel /healthz endpoint remains available for application-level health monitoring and external tools.

Fixes nginx container restart issues in EKS.

QA Notes

It's uncertain whether this will actually resolve the constant container restart issue, but we should at least validate that the deployment still works in the test cluster first.

Reference: https://ifixit.slack.com/archives/C09FMSNS1/p1767808200093109


@djmetzle djmetzle left a comment

CR 🧑‍⚕️

ardelato commented Jan 8, 2026

QA 👍

ArgoCD is still reporting healthy containers and pod.

image

NGINX is still working as well.

@ardelato ardelato merged commit 5ac8067 into hermes Jan 8, 2026
@ardelato ardelato deleted the fix--nginx-health-checks branch January 8, 2026 21:41