docs/releases/status.md
| The severity of an incident is the product of the number of users affected (N, where 100 users gives N = 1), the magnitude of the effect (on a scale of 1-5, from workable to no service), and the duration (in hours). A severity below 1 is LOW, between 1 and 100 is SIGNIFICANT, and above 100 is HIGH. The severity is used to decide how much we invest in preventative measures, detection, mitigation plans, and rehearsals. | ||
| ## 2025 October 20th: AWS Outage in US East (No effects, brief review) |
I don't think it's relevant to mention a non-outage?
Some infra goes down every day somewhere around the world and we don't mention it; this is not different from my perspective.
It's a fair point: where do we draw the line? I can take this out.
docs/releases/status.md
| Handwriting in response areas (but not in the canvas) did not return a preview and could not be submitted. Users received an error in a toast saying that the service would not work. All other services remained operational. | ||
| ### Timeline (UK / BST) |
Style: I don't know why, but this title isn't picked up as Markdown?
Will fix on next push.
| - Monitoring immediately after pushes, and approximately an hour after pushes, should be standard procedure. | ||
| - Integration tests would help, although they are considered outside the scope of this project at the current stage due to the resources required to continually maintain those tests. | ||
| N = 0.2, effect = 2, duration = 5. Severity = 2 (SIGNIFICANT). |
Should that line be there?
(Great to see how you're using maths to pick a severity level!)
I agree it's not a perfect place for them, but I'll leave them for now for transparency.
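For reference, the severity rule quoted from `docs/releases/status.md` can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the function names `severity` and `severity_band` are made up here, and because the doc does not say which band a score of exactly 1 or exactly 100 falls into, the sketch assumes both count as SIGNIFICANT.

```python
def severity(users_affected: float, magnitude: int, duration_hours: float) -> float:
    """Return N * magnitude * duration, where N = users_affected / 100."""
    if not 1 <= magnitude <= 5:
        raise ValueError("magnitude must be on the 1-5 scale")
    n = users_affected / 100  # 100 users gives N = 1
    return n * magnitude * duration_hours


def severity_band(score: float) -> str:
    """Map a severity score to LOW / SIGNIFICANT / HIGH.

    Assumption: scores of exactly 1 and exactly 100 are treated as
    SIGNIFICANT; the doc leaves those boundaries unspecified.
    """
    if score < 1:
        return "LOW"
    if score <= 100:
        return "SIGNIFICANT"
    return "HIGH"


# The incident in the diff: N = 0.2 (20 users), effect = 2, duration = 5 hours.
score = severity(users_affected=20, magnitude=2, duration_hours=5)
print(score, severity_band(score))
```

With the values from the diff, this gives a score of about 2 in the SIGNIFICANT band, which agrees with the line under review.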