Operator recovery
The system is back. The people are not. This is the part most teams skip. The bill for an incident arrives later than the incident, and it arrives in the people who held the line.
The technical incident has a clean ending. The human one does not. Adrenaline carries the team through the response. The bill arrives in the week after, sometimes the month after, sometimes a quiet resignation letter three months later from the person who held the line on the longest call.
Operator recovery is not a nicety. It is the difference between a team that learns and a team that loses people. The same operators who held the line through the incident are the ones you need next time. They are not interchangeable. Treat them as the asset they are.
Most organisations get the first week wrong. The technical work stops, so the operational pressure to be back at full speed begins immediately. Time off is interrupted because someone has a question. Workload returns at full volume because “the incident is over”. The hero narrative quietly takes hold and the people who carried the most weight are the ones who get asked to carry the next round. The bill from this gets paid later, in retention, in attention, in the quality of the next response.
Pause. Adapt. Continue. The middle word is doing real work.
Checklist
The week after
- Responders take real time off. Not "available on Slack" time off. [Critical]
- Non-incident work is paused or reassigned for the responders [Critical]
- One-on-one check-in with each responder, not a group ritual [Critical]
- Sleep and food are the first ask. Nothing else until those. [Critical]
- Acknowledge the people at home who held things together
- Meeting load lightened. Optional means optional, in writing.
The conversation that is not the review
- Asked each responder how they are. Listened to the answer. [Critical]
- Asked what was hardest, separate from what was hard to fix [Critical]
- Watched for pressure signals (sleep, irritability, withdrawal, dark humour, isolation) [Critical]
- Clinical support and EAP details put in their hand, not just mentioned [Critical]
- Peer support contacts named (industry groups, trusted colleagues)
- Each responder's line manager briefed on what they were doing during the incident [Critical]
  So no one returns to "where have you been all week" energy.
Structural recovery
- On-call rotation reviewed for fairness during the incident
- Pager load reset and on-call cover arranged for the next two weeks [Critical]
- Recovery time protected from new high-priority projects [Critical]
- Some of the lessons turned into less work, not just more controls [Critical]
  If every incident adds work, the team breaks before the system does.
- Backfill considered for anyone whose recovery is going to take longer than a week
- Time protected for the team to process and write down what they learned
What actually breaks
- Time off is real. No one is "checking in just briefly" from their week off. [Critical]
- The team is not returned to full workload as soon as the incident is contained [Critical]
- The invisible cost (sleep debt, family strain, lost focus) is named in writing, not just felt
- The "hero" narrative is avoided. Heroes are a sign the system asked too much of one person. [Critical]
- Pressure performance is not rewarded over sustainable performance in the next review cycle [Critical]
- Quiet leavers are watched for. People who hand in notice three months later were often the load-bearing ones. [Critical]
  This is the most common late cost. It will not show up in the post-mortem.
The long arc
- Note the date. Anniversaries can carry weight you did not predict. [Critical]
- Check in again in 30 and 90 days. Different things surface at different times. [Critical]
- Acknowledge any career impact (promotions deferred, projects lost, reputation) [Critical]
- Thanks given specifically, not generally. Names. What they did. [Critical]
- When the incident is talked about later, it is told truthfully. The hard parts are not edited out.
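The dates in the long arc are easy to lose once normal work resumes. One option is to compute them on the day the incident closes and put them straight into a calendar. The sketch below is a minimal illustration, not a prescribed tool: the function name `follow_up_dates` and its parameters are hypothetical, and the 30 and 90 day offsets are taken from the checklist above.

```python
from datetime import date, timedelta


def follow_up_dates(incident_closed: date, offsets_days=(30, 90)):
    """Return the follow-up check-in dates and the first anniversary.

    `incident_closed` is the day the incident ended (an assumption:
    pick whichever date your team treats as the end).
    """
    # One check-in date per offset, counted from the close date.
    check_ins = [incident_closed + timedelta(days=d) for d in offsets_days]

    try:
        anniversary = incident_closed.replace(year=incident_closed.year + 1)
    except ValueError:
        # The close date was 29 February and next year is not a leap year,
        # so fall back to 28 February.
        anniversary = incident_closed.replace(
            year=incident_closed.year + 1, day=28
        )

    return {"check_ins": check_ins, "anniversary": anniversary}


if __name__ == "__main__":
    plan = follow_up_dates(date(2024, 3, 1))
    for when in plan["check_ins"]:
        print("Check in with each responder on", when.isoformat())
    print("Note the anniversary on", plan["anniversary"].isoformat())
```

The dates are only reminders. The check-ins themselves are still one-on-one conversations, as described in the earlier sections.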