ops: backup failure alerting — healthchecks.io dead-man's switch #148
Merged
GitAddRemote merged 1 commit into `main` on Apr 28, 2026
Adds `infra/docs/backups.md` covering all 7 sections required by #133:

- What is backed up (PostgreSQL, nightly at 3 AM UTC plus pre-deploy)
- Where backups live (B2 bucket path structure)
- How to verify that backups are running (log tail + healthchecks.io)
- What to do when a backup alert fires (5-step checklist)
- Retention policy (180-day B2 lifecycle)
- How to silence a false alarm (healthchecks.io pause/mute)
- How to restore using `restore-db.sh`

Also updates `infra/README.md` to reference #133 and the new doc.

Closes #133
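A dead-man's switch alerts on silence rather than on an error report. If the nightly job crashes, hangs, or is never started by cron, no ping arrives, and the check goes down once its window expires. The sketch below expresses that rule in Python, assuming healthchecks.io's simple schedule (alert once `period + grace` passes with no ping). The 24-hour period follows from the nightly 3 AM UTC schedule. The 1-hour grace and the `is_overdue` helper are hypothetical and do not reflect the project's actual check configuration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical check settings: the nightly 3 AM UTC run implies a 24h period;
# the 1h grace period is an assumption, not the real healthchecks.io config.
PERIOD = timedelta(hours=24)
GRACE = timedelta(hours=1)


def is_overdue(last_ping: datetime, now: datetime,
               period: timedelta = PERIOD, grace: timedelta = GRACE) -> bool:
    """Return True once no ping has arrived within period + grace.

    A crashed or never-started cron job cannot report its own failure,
    so the absence of a ping is itself the alert condition.
    """
    return now - last_ping > period + grace


if __name__ == "__main__":
    last = datetime(2026, 4, 28, 3, 0, tzinfo=timezone.utc)
    # 24h30m after the last ping: still inside period + grace.
    print(is_overdue(last, datetime(2026, 4, 29, 3, 30, tzinfo=timezone.utc)))  # False
    # 25h30m after the last ping: the switch trips.
    print(is_overdue(last, datetime(2026, 4, 29, 4, 30, tzinfo=timezone.utc)))  # True
```

This rule only covers the "nothing heard" case. A run that does reach healthchecks.io with an explicit failure signal (`/fail` or a non-zero exit status) marks the check down immediately, without waiting for the grace period.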
Summary
- `infra/docs/backups.md`: the operational runbook for the backup system, covering all 7 sections from the issue spec: what is backed up, where it lives, how to verify it, the alert-response checklist, the retention policy, silencing false alarms, and how to restore
- `infra/README.md`: updated to reference #133 (backup failure alerting: notify when nightly backup cron fails) and link to the new doc

Context
The backup script (`backup-db.sh`), the healthcheck ping logic, the `BACKUP_HEALTHCHECK_URL` secret injection in `release.yml`, and the cron job setup in `bootstrap-vps.sh` were all already in place from earlier issues (#125, #128). The only remaining code deliverable for #133 was the runbook.

The remaining DoD items are manual operational steps:
- Add `BACKUP_HEALTHCHECK_URL` to the GitHub production environment secrets

Test plan
- Review `infra/docs/backups.md` for accuracy against the live scripts (`backup-db.sh`, `restore-db.sh`, `bootstrap-vps.sh`)
- Check that the documented restore command matches `restore-db.sh` usage (`$0 <b2-path>`)
- Check that the `rclone ls` command uses the correct config flag
- Confirm the `infra/README.md` conflict resolved cleanly (the ISSUE-133 blurb appears before the Redis Persistence section)

Closes #133
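The Context section notes that `backup-db.sh` already contains the healthcheck ping logic, but that code is not part of this PR. As a rough Python equivalent of the common healthchecks.io pattern, the sketch below brackets a backup command with a `/start` ping and a `/<exit-status>` ping. It assumes the check's ping URL is the value of the `BACKUP_HEALTHCHECK_URL` secret. `run_with_healthcheck`, `ping_url`, and `http_ping` are hypothetical names, not functions from the repo.

```python
import subprocess
import sys
import urllib.request
from typing import Callable, Optional, Sequence


def ping_url(base_url: str, suffix: str) -> str:
    """Append a healthchecks.io signal suffix ("start", "0", "1", ...) to the check URL."""
    return f"{base_url.rstrip('/')}/{suffix}"


def http_ping(url: str) -> None:
    """GET the ping URL; errors are logged, never raised, so monitoring can't break the backup."""
    try:
        urllib.request.urlopen(url, timeout=10).close()
    except OSError as exc:  # urllib.error.URLError and socket timeouts subclass OSError
        print(f"healthcheck ping failed: {exc}", file=sys.stderr)


def run_with_healthcheck(cmd: Sequence[str], base_url: Optional[str],
                         send: Callable[[str], None] = http_ping) -> int:
    """Run cmd, signalling /start before it and /<exit-status> after it."""
    if not base_url:
        # Secret not configured (hypothetical policy): still run the backup, just unmonitored.
        return subprocess.run(cmd).returncode
    send(ping_url(base_url, "start"))
    code = subprocess.run(cmd).returncode
    # healthchecks.io accepts exit statuses 0-255 (0 = success). A process killed
    # by a signal has a negative returncode, so report that as a generic failure.
    status = code if 0 <= code <= 255 else 1
    send(ping_url(base_url, str(status)))
    return code
```

Under cron this might be called as `run_with_healthcheck(["/path/to/backup-db.sh"], os.environ.get("BACKUP_HEALTHCHECK_URL"))`, where the path is hypothetical. Swallowing ping errors is deliberate. A monitoring outage should not fail the backup itself, and if the final ping never lands, the missing success signal still trips the dead-man's switch.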