2025-11-12 · Haneul Park

Runbook Rhythm for Sleep-Deprived Rotations

Rotations go sideways when runbooks pretend every outage is identical. We coach teams to build rhythm: a predictable order of checks, a place to scribble deviations, and a hard stop when evidence says escalate.

Keep each step practical. Name the command, the expected healthy signal, and the unhealthy signal in the same breath. That pattern keeps scribes aligned during bridges and prevents the loudest voice from skipping half the checklist.
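Here is a minimal Python sketch of that step shape, assuming a team wants to keep steps as structured data rather than free text. The `Step`, `RunLog`, and `walk` names are hypothetical, and the commands, thresholds, and `/healthz` endpoint are illustrative placeholders. Nothing here runs a command: `observe` stands in for the on-call engineer deciding whether the healthy signal showed up. The walk is a fixed order of checks, unhealthy results become logged deviations, and a step flagged for escalation ends the walk.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One runbook step: the command plus its healthy and unhealthy signals."""
    command: str                          # what the on-call engineer runs by hand
    healthy: str                          # what "fine" looks like
    unhealthy: str                        # what "not fine" looks like
    escalate_on_unhealthy: bool = False   # hard stop: escalate instead of continuing


@dataclass
class RunLog:
    checked: List[int] = field(default_factory=list)  # 1-based step numbers actually checked
    notes: List[str] = field(default_factory=list)    # deviations the scribe writes down
    escalated_at_step: Optional[int] = None           # step that triggered the hard stop, if any


def walk(steps: List[Step], observe: Callable[[Step], bool]) -> RunLog:
    """Check steps in a fixed order; `observe` returns True when the healthy signal is seen."""
    log = RunLog()
    for number, step in enumerate(steps, start=1):
        log.checked.append(number)
        if observe(step):
            continue
        log.notes.append(f"step {number} ({step.command}): saw {step.unhealthy!r}")
        if step.escalate_on_unhealthy:
            log.escalated_at_step = number
            break  # evidence says escalate: stop poking and hand off
    return log


# Illustrative steps only; swap in your own stack's commands and thresholds.
steps = [
    Step("systemctl is-active nginx", healthy="active", unhealthy="inactive or failed"),
    Step("df -h /var", healthy="under 80% used", unhealthy="80% or more used"),
    Step("ipmitool sensor", healthy="PSU and fan readings nominal",
         unhealthy="PSU or fan alarm", escalate_on_unhealthy=True),
    Step("curl -fsS http://localhost/healthz", healthy="HTTP 200",
         unhealthy="non-200 or timeout"),
]

# Simulated bridge: the disk check and the hardware check come back unhealthy.
observed = {"df -h /var": False, "ipmitool sensor": False}
log = walk(steps, lambda s: observed.get(s.command, True))
print(log.checked)            # [1, 2, 3]
print(log.escalated_at_step)  # 3
for note in log.notes:
    print(note)
```

The fourth step is never checked: once the hardware alarm appears, the walk stops instead of letting someone keep restarting services.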

Acknowledge the limits, too. Runbooks cannot replace judgment when power or cooling fails. They should say explicitly when to stop poking hosts and open a facility ticket instead. That honesty preserves credibility with leadership.

Finally, treat runbooks like code: small diffs, reviewers, and retirement dates. If nobody has run a section in six months, archive it or rewrite it with someone who still touches that stack weekly.
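One way to make the six-month rule mechanical is a small review check. This Python sketch assumes each runbook section records when it was last actually executed and, optionally, an explicit retirement date; the `Section`, `needs_review`, and `review_queue` names and the section titles are hypothetical, and "six months" is approximated as 182 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: "six months" approximated as 182 days.
STALE_AFTER = timedelta(days=182)


@dataclass
class Section:
    title: str
    last_run: Optional[date]          # last time someone executed this section; None = never recorded
    retire_by: Optional[date] = None  # optional explicit retirement date


def needs_review(section: Section, today: date) -> bool:
    """True if the section passed its retirement date, was never run, or went stale."""
    if section.retire_by is not None and today >= section.retire_by:
        return True
    if section.last_run is None:
        return True
    return today - section.last_run > STALE_AFTER


def review_queue(sections: List[Section], today: date) -> List[str]:
    """Titles to archive or rewrite with someone who still works on that stack."""
    return [s.title for s in sections if needs_review(s, today)]


sections = [
    Section("Restart ingest workers", last_run=date(2025, 10, 30)),
    Section("Fail over primary DB", last_run=date(2025, 3, 1)),
    Section("Rotate TLS certs", last_run=date(2025, 9, 1), retire_by=date(2025, 11, 1)),
    Section("Legacy tape restore", last_run=None),
]

print(review_queue(sections, date(2025, 11, 12)))
# ['Fail over primary DB', 'Rotate TLS certs', 'Legacy tape restore']
```

Treating a missing run date as stale is a deliberate choice: a section nobody can prove they ran is the one most likely to mislead a tired responder.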

Tags: runbooks, on-call, documentation