Description
We’ve recently seen customers implement external monitoring tools (e.g. Route53 health checks, Google Stackdriver Monitoring) in ways that generate expensive, uncached requests to /, /feed, or other dynamic pages. During traffic spikes, these requests can place significant and unnecessary load on the backend and negatively impact site performance.
To avoid this, clients should be encouraged to expose a lightweight, static endpoint (e.g. /health) that reliably indicates site availability without the cost or risk of hitting dynamic pages.
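As a rough illustration of what such an endpoint looks like, here is a minimal sketch in Python's standard library. This is purely hypothetical: an Altis/WordPress site would implement the equivalent in PHP or at the CDN/load-balancer layer, and the `/health` path, `HealthHandler`, and `make_server` names are all assumptions, not part of any existing API.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with a tiny static response; everything else is 404."""

    def do_GET(self):
        if self.path == "/health":
            body = b"OK"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            # Ask intermediaries not to cache, so a cached "OK" can't mask an outage.
            self.send_header("Cache-Control", "no-store")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep high-frequency monitoring probes out of the access logs


def make_server(port: int = 0) -> HTTPServer:
    """Bind the handler; port 0 picks a free port (handy for testing)."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

The key properties are that the response is static (no database or application bootstrap), tiny, and marked non-cacheable; to serve it, call `make_server(8080).serve_forever()`.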
We should create documentation that provides clear guidance and best practices for implementing external monitoring without causing performance or stability issues on the sites being monitored.
This issue is to investigate what kind of guidance and best practices we can give.
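To make the monitoring-side guidance concrete, a probe along these lines is what the documentation could recommend: hit the cheap endpoint with a short timeout rather than `/` or `/feed`. The `site_is_up` helper and the `/health` path are hypothetical names for illustration, not an existing tool.

```python
import urllib.error
import urllib.request


def site_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Probe the site's lightweight health endpoint instead of a dynamic page.

    Returns True only if the /health endpoint answers 200 within the timeout;
    any connection error, HTTP error, or timeout counts as "down".
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

An external monitor (Route53, Stackdriver, cron job) configured this way fails fast on an outage without ever generating an expensive, uncached request against the application.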
Acceptance Criteria
- Identify clear risks and constraints of external monitoring
- Define recommended best-practice guidance
- Document do’s and don’ts for client use
Depending on the above results, and whether any approval/discussion is needed, the following criteria might be moved to another ticket:
- Update the documentation: https://docs.altis-dxp.com/cloud/healthchecks/
- Write the FAQ
- Write a blog post reminding users about our healthcheck API, linking to the documentation and the newly created FAQ
Ready for Work Checklist
Is this ticket ready to be worked on? See the Play Book Definition of Ready
- Is the title clear?
- Is the description clear and detailed enough?
- Are acceptance criteria listed?
- Have any dependencies been identified? (Optional)
- Have any documentation/playbook changes been identified? (Optional)
- Is an estimate or time box assigned?
- Is a priority label assigned?
- Is this ticket added to a milestone?
- Is this ticket added to an epic? (Optional)
Completion Checklist
Is this ticket done? See the Play Book Definition of Done
- Has the acceptance criteria been met?
- Is the documentation updated (including README)?
- Do any code/documentation changes meet project standards?
- Are automatic tests in place to verify the fix or new functionality?
- Or are manual tests documented (at least on this ticket)?
- Are any Playbook/Handbook pages updated?
- Has a new module release (patch/minor) been created/scheduled?
- Have the appropriate backport labels been added to the PR?
- Is there a roll-out (and roll-back) plan if required?