MTTR Explained: Mean Time to Recovery as a DORA Metric
Two production outages, same root cause: a bad config push that crashed a payments service. Team A spent 2 hours 14 minutes restoring service. Team B was back in 6 minutes. Team B's MTTR wasn't lower because they had smarter engineers. They had a one-command rollback rehearsed monthly, a runbook pinned in the on-call channel, and write access to production already granted to the responder. That 134-minute gap is what MTTR measures, and what separates the DORA 2023 State of DevOps Report elite cluster from everyone else.
