Glossary

incident

A discrete event during which a service was unavailable or degraded, with a defined start, updates, and resolution.

An incident is a specific, time-bounded event during which a service experiences downtime or degradation. Incidents have a start time, one or more status updates as they progress through investigation and mitigation, and a resolution time.
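As a rough illustration of that shape, an incident could be modelled as a record with a start time, a list of status updates, and an optional resolution time. This is a minimal sketch, not the schema of any particular status-page tool: the class names, field names, and status labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StatusUpdate:
    posted_at: datetime
    status: str   # illustrative labels, e.g. "investigating", "mitigating", "resolved"
    message: str


@dataclass
class Incident:
    title: str
    started_at: datetime
    updates: list[StatusUpdate] = field(default_factory=list)
    resolved_at: Optional[datetime] = None  # None while the incident is still open

    @property
    def duration(self) -> Optional[timedelta]:
        """Time from start to resolution, or None if the incident is still open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


# Hypothetical example: an incident that is opened, updated, then resolved.
incident = Incident(
    title="Elevated API error rate",
    started_at=datetime(2024, 3, 5, 14, 0),
)
incident.updates.append(
    StatusUpdate(datetime(2024, 3, 5, 14, 10), "investigating",
                 "Looking into elevated 5xx responses.")
)
incident.updates.append(
    StatusUpdate(datetime(2024, 3, 5, 14, 45), "resolved",
                 "Rolled back the faulty deploy.")
)
incident.resolved_at = datetime(2024, 3, 5, 14, 45)
```

Keeping `resolved_at` optional lets the same record represent an ongoing incident; its duration is undefined until a resolution time is set.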

On a public status page, incidents are typically grouped by component (“API”, “Dashboard”) and labelled with a severity. Internal incident management tools track additional metadata: the responder, the runbook used, the mitigation applied, and the post-incident review notes.

The number of incidents per quarter and their average duration are the operational equivalents of code-quality metrics. Both are lagging indicators: they show, after the fact, how well the system is built and how well the team operates it.
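Both figures are simple to derive from a list of resolved incidents. The sketch below uses made-up sample data and assumes each incident is counted in the quarter in which it started; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample data: (start, resolution) pairs for resolved incidents.
incidents = [
    (datetime(2024, 1, 12, 9, 0), datetime(2024, 1, 12, 9, 30)),    # 30 minutes
    (datetime(2024, 2, 3, 22, 15), datetime(2024, 2, 3, 23, 45)),   # 90 minutes
    (datetime(2024, 5, 20, 13, 0), datetime(2024, 5, 20, 14, 0)),   # 60 minutes
]


def quarter_of(moment: datetime) -> str:
    """Label a timestamp with its calendar quarter, e.g. '2024-Q1'."""
    return f"{moment.year}-Q{(moment.month - 1) // 3 + 1}"


def quarterly_summary(pairs):
    """Map each quarter to (incident count, average duration).

    Assumption: an incident belongs to the quarter in which it started.
    """
    durations = defaultdict(list)
    for started_at, resolved_at in pairs:
        durations[quarter_of(started_at)].append(resolved_at - started_at)
    return {
        quarter: (len(spans), sum(spans, timedelta()) / len(spans))
        for quarter, spans in durations.items()
    }


summary = quarterly_summary(incidents)
for quarter, (count, average) in sorted(summary.items()):
    print(quarter, count, average)
```

With this sample data, the first quarter has two incidents averaging one hour, and the second quarter has one incident lasting one hour.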