2024-08-09 · Mira Lee

Drills for Disk Alerts That Cry Wolf

Drills for Disk Alerts That Cry Wolf

Disk alerts rot because growth is slow until it is not. Run drills that include gradual fill and sudden spike scenarios. Measure how long detection to mitigation takes with real humans, not scripts pretending to be humans.

Second, chart growth weekly even when quiet. Linear projections lie on bursty workloads, but they still expose teams that never look until pager noise returns.

Third, document when an alert should wake someone versus wait for business hours. Not every partition is equal; treat caches differently from durable data volumes.

Limitation: cloud object storage classes are mentioned only briefly. This piece focuses on host-local filesystem behavior.

Tags: storage, alerting