Incident Management Best Practices
If you're here, chances are you're worried about how your team keeps the software it builds running. Everyone needs a plan for when things go wrong, and hopefully this guide will help you with yours.
Most of what is in this guide I learned while working at Atlassian, which followed these incident values:
- We know there is a problem before our customers do.
- Escalate, escalate, escalate (and communicate with customers).
- Shit happens, clean it up quickly.
- Always blameless.
- Never have the same incident twice.
Chapters
- On Call
- Incident Response
- Communicating to Users During Incidents
- Writing your first runbooks
- Guidelines for writing better runbooks