Runbook
Susan Valente edited this page Apr 17, 2026 · 31 revisions
This document describes common operating procedures for Data.gov.
- Keypair Rotation
- TLS/SSL Certificates
- WAF and rate limiting
- Break Glass deployment
- Nessus Agent Upgrade
- BSP or FCS requests
We designate two types of alerts:
- Critical -- drop what you are doing; an outage is happening or action is required to prevent one. Critical alerts go to #datagov-alerts.
- Warning -- indicates a problem but can wait until the next business day. Warnings go to #datagov-alerts as email notifications.
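The two severities can be thought of as a small routing table: where the alert lands and by when someone must pick it up. This is a minimal sketch, not an existing Data.gov tool. The names `Severity`, `Routing`, `route`, and `next_business_day` are hypothetical, business days are assumed to be Monday to Friday, and federal holidays are ignored. It records only the channel and the respond-by day. It does not model how warnings are delivered by email.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"


@dataclass(frozen=True)
class Routing:
    channel: str
    respond_by: date  # latest day someone should pick the alert up


def next_business_day(day: date) -> date:
    """Return the next Monday-Friday date after `day` (holidays are NOT considered)."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def route(severity: Severity, raised_on: date) -> Routing:
    """Map an alert to its channel and respond-by day per the policy above."""
    if severity is Severity.CRITICAL:
        # Outage in progress or imminent: handle it the same day, immediately.
        return Routing("#datagov-alerts", raised_on)
    # Warnings can wait until the next business day.
    return Routing("#datagov-alerts", next_business_day(raised_on))
```

For example, a warning raised on Friday, April 17, 2026 gets a respond-by date of Monday, April 20, 2026. A critical alert raised that same Friday must be handled that day.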
Triggered when a host has not reported to New Relic for 5 minutes.
- Check New Relic for obvious issues (high memory or CPU load)
- If the app is down, check cloud.gov for application status and recent deploy activity
- Restart the application via cloud.gov if needed
- If the issue cannot be resolved, open a ticket as described in BSP or FCS requests
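The trigger condition and the cloud.gov checks above can be partly scripted. This sketch assumes the Cloud Foundry `cf` CLI that cloud.gov is built on, already logged in with `cf login` and targeted at the right org and space. `cf app`, `cf events`, and `cf restart` are standard `cf` commands. The helper names, the placeholder app name `example-app`, and the assumption that `last_seen` would come from the host's last New Relic report are all illustrative. Fetching that timestamp from New Relic is out of scope here.

```python
import subprocess
from datetime import datetime, timedelta

# Silence window from the alert definition above (assumed inclusive).
NOT_REPORTING_AFTER = timedelta(minutes=5)


def is_not_reporting(last_seen: datetime, now: datetime) -> bool:
    """True when a host has been silent for at least the alert window."""
    return now - last_seen >= NOT_REPORTING_AFTER


def triage_commands(app: str) -> list[list[str]]:
    """cf CLI commands for the cloud.gov checks above, in order."""
    return [
        ["cf", "app", app],     # current status and instance health
        ["cf", "events", app],  # recent deploys, restarts, and crashes
    ]


def restart_command(app: str) -> list[str]:
    """cf CLI command to restart the application if needed."""
    return ["cf", "restart", app]


def run(cmd: list[str]) -> str:
    """Run a cf command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Usage would look like `for cmd in triage_commands("example-app"): print(run(cmd))`, followed by `run(restart_command("example-app"))` only after reviewing the output. Keeping the restart as a separate, explicit step avoids restarting an app in the middle of an unrelated deploy.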
Triggered when 4xx or 5xx error rates exceed their alert thresholds.
- Check New Relic to identify the affected service
- Review cloud.gov logs for the affected application
- Check recent deploys for a likely cause
- Escalate to the contractor team via #datagov-dev if not resolvable
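When reviewing logs for an error-rate alert, it can help to recompute the rate for each error class from a sample of HTTP status codes. The threshold values below are hypothetical placeholders, not the values configured in New Relic. `error_rates` and `breached` are illustrative helper names, not part of any existing tool.

```python
from collections import Counter
from typing import Iterable

# Hypothetical thresholds (fraction of all responses); real values live in New Relic.
THRESHOLDS = {"4xx": 0.05, "5xx": 0.01}


def error_rates(status_codes: Iterable[int]) -> dict[str, float]:
    """Fraction of responses falling in each tracked error class."""
    counts = Counter(f"{code // 100}xx" for code in status_codes)  # 404 -> "4xx"
    total = sum(counts.values())
    if total == 0:
        return {cls: 0.0 for cls in THRESHOLDS}
    return {cls: counts[cls] / total for cls in THRESHOLDS}


def breached(status_codes: Iterable[int]) -> list[str]:
    """Error classes whose rate is strictly above its threshold."""
    rates = error_rates(status_codes)
    return [cls for cls, limit in THRESHOLDS.items() if rates[cls] > limit]
```

For example, 100 responses with eight 404s and two 500s give rates of 0.08 and 0.02, which breach both placeholder thresholds.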
SecOps performs regular scans on our hosts. If the ISSO contacts us regarding IPs that could not be authenticated, see Nessus Agent Upgrade for current procedures.
Note: This runbook is a stub and needs contractor input to document current alert resolution procedures for the cloud.gov-based stack.