Engineering

What is Mean Time to Recovery?

Mean Time to Recovery (MTTR) is the average time it takes to restore service after an incident.

How to calculate it

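Calculate Mean Time to Recovery as total downtime divided by the number of incidents in the same period. Measure downtime from the start of each incident, including detection time, until service is restored. Pull the inputs from your connected data and track the trend over time in your dashboard.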
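The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a Metricwise feature: the function name mean_time_to_recovery, the (started, restored) tuple layout, and the four incident timestamps are all hypothetical, chosen so the downtime totals 120 minutes across 4 incidents.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Return MTTR as a timedelta: total downtime / number of incidents.

    `incidents` is a list of (started, restored) datetime pairs.
    This layout is an assumption made for the example.
    """
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    # Sum each incident's downtime, starting from a zero-length timedelta.
    total_downtime = sum(
        (restored - started for started, restored in incidents),
        timedelta(),
    )
    return total_downtime / len(incidents)


# Hypothetical incidents lasting 10, 20, 40 and 50 minutes (120 in total).
example_incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 10)),
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 20)),
    (datetime(2024, 1, 17, 22, 0), datetime(2024, 1, 17, 22, 40)),
    (datetime(2024, 1, 25, 6, 0), datetime(2024, 1, 25, 6, 50)),
]

mttr = mean_time_to_recovery(example_incidents)
print(mttr)  # 0:30:00
```

Raising an error for an empty list is a design choice in this sketch: a period with no incidents has no defined MTTR, and reporting zero would look like instant recovery.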

Examples

Example 1

120 minutes of total downtime across 4 incidents gives an MTTR of 120 / 4 = 30 minutes.

Example 2

120 minutes of total downtime across 4 incidents gives a 30-minute MTTR. That is within the elite range of under one hour, helped by good alerting and runbooks.

Why it matters

Mean time to recovery (MTTR) is the average time to restore service after an incident and is a DORA measure of resilience. Fast recovery limits the customer and revenue impact of inevitable failures. Excluding detection time understates true MTTR and can mask slow alerting.

Benchmark context

Elite teams recover in under one hour; longer recovery times point to gaps in monitoring, runbooks or on-call processes.

Common pitfalls

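Excluding detection time. Starting the clock when the alert fires, rather than when the incident begins, understates true MTTR and can mask slow alerting.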
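To see how much this pitfall can hide, the sketch below measures the same incident two ways. The timeline is hypothetical (a failure at 10:00, an alert at 10:20, recovery at 10:50), and the variable and function names are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical timeline for a single incident.
failure_began = datetime(2024, 1, 1, 10, 0)
alert_fired = datetime(2024, 1, 1, 10, 20)
service_restored = datetime(2024, 1, 1, 10, 50)


def minutes_between(start, end):
    """Return the elapsed time between two datetimes in minutes."""
    return (end - start).total_seconds() / 60


# Recovery time measured from the start of the incident (includes detection).
full_recovery_minutes = minutes_between(failure_began, service_restored)

# Recovery time measured only from the alert (excludes detection).
post_detection_minutes = minutes_between(alert_fired, service_restored)

# The gap is the detection time that the second measurement leaves out.
hidden_detection_minutes = full_recovery_minutes - post_detection_minutes

print(full_recovery_minutes, post_detection_minutes, hidden_detection_minutes)
```

In this example the alert-based figure is 30 minutes while the full figure is 50 minutes, so 20 minutes of slow detection would never appear in the reported MTTR.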

Related KPI guides

Change Failure Rate (Engineering)

Uptime (Engineering)

Bug Escape Rate (Engineering)
