incident
A discrete event during which a service was unavailable or degraded, with a defined start, updates, and resolution.
An incident is a specific, time-bounded event during which a service experiences downtime or degradation. Incidents have a start time, one or more status updates as they progress through investigation and mitigation, and a resolution time.
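The shape described above (a start time, a sequence of status updates, and a resolution time) can be sketched as a small data model. This is a minimal illustration, not the schema of any particular status-page tool: the class names, field names, and phase labels ("investigating", "resolved") are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class StatusUpdate:
    posted_at: datetime
    phase: str    # hypothetical labels, e.g. "investigating", "mitigating", "resolved"
    message: str


@dataclass
class Incident:
    title: str
    started_at: datetime
    updates: List[StatusUpdate] = field(default_factory=list)
    resolved_at: Optional[datetime] = None  # None while the incident is still open

    def add_update(self, posted_at: datetime, phase: str, message: str) -> None:
        """Append a status update; a "resolved" update also closes the incident."""
        self.updates.append(StatusUpdate(posted_at, phase, message))
        if phase == "resolved":
            self.resolved_at = posted_at

    @property
    def duration(self) -> Optional[timedelta]:
        """Elapsed time from start to resolution, or None if still open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


# Example data, invented for illustration.
incident = Incident("Elevated API error rate", datetime(2024, 3, 1, 9, 0))
incident.add_update(datetime(2024, 3, 1, 9, 10), "investigating",
                    "Looking into elevated error rates.")
incident.add_update(datetime(2024, 3, 1, 9, 45), "resolved",
                    "Error rates are back to normal.")
```

Keeping resolved_at optional makes an open incident representable, which matters for a live status page where the resolution time is not yet known.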
On a public status page, incidents are typically grouped by component (“API”, “Dashboard”) and labelled with a severity. Internal incident management tools track additional metadata: the responder, the runbook used, the mitigation applied, and the post-incident review notes.
The number of incidents per quarter and their average duration are the operational equivalents of code-quality metrics: lagging indicators of how well the system is built and how well the team operates it.
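As a rough sketch of how these two numbers could be derived, the snippet below groups incidents by the calendar quarter in which they started and reports the count and mean duration for each quarter. The sample data and the helper names quarter_of and quarterly_summary are hypothetical, and attributing an incident to the quarter of its start time is an assumption; teams may define the boundary differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Each incident is a (started_at, resolved_at) pair; values invented for illustration.
incidents = [
    (datetime(2024, 1, 15, 10, 0), datetime(2024, 1, 15, 10, 30)),  # 30 minutes
    (datetime(2024, 2, 3, 22, 0), datetime(2024, 2, 3, 23, 30)),    # 90 minutes
    (datetime(2024, 5, 20, 8, 0), datetime(2024, 5, 20, 8, 45)),    # 45 minutes
]


def quarter_of(ts: datetime) -> str:
    """Label a timestamp with its calendar quarter, e.g. '2024-Q1'."""
    return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"


def quarterly_summary(incidents):
    """Return {quarter: (incident_count, mean_duration)} keyed by start quarter."""
    durations = defaultdict(list)
    for started_at, resolved_at in incidents:
        durations[quarter_of(started_at)].append(resolved_at - started_at)
    return {
        quarter: (len(ds), sum(ds, timedelta()) / len(ds))
        for quarter, ds in durations.items()
    }


summary = quarterly_summary(incidents)
for quarter, (count, mean_duration) in sorted(summary.items()):
    print(quarter, count, mean_duration)
```

The mean duration computed here is the same quantity the MTTR entry describes, restricted to one quarter.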
downtime
Any period during which a service is unavailable or degraded for end users. The complement of uptime.
MTTR
Mean time to recovery — the average elapsed time from an incident’s start to its resolution. A core reliability KPI.
public status page
A customer-facing page that shows the live availability of a service, plus the history of past incidents.