The quiet hours

2m read · Pettabyte Group

Every engineer who has carried a pager knows the quiet hours — the weekend mornings, the holiday evenings, the deep part of the night when the system is supposed to be still. The quiet hours are when your pager is both least welcome and most honest. A buzz at 3am means something.

Or at least, it should.

The noise economy

Most monitoring tools get this exactly backward. They bias toward false positives because, for the vendor, missing a real outage costs more than paging someone for nothing. Miss one outage and the customer leaves. Wake them up twenty times for nothing and they might, grumpily, stay.

This is a terrible deal for the person holding the pager.

Signal is subtractive

A useful alert isn't "something happened." A useful alert is "something happened, and you are the person who can fix it, and it will not fix itself, and the thing that happened is still happening."

Every one of those qualifications is subtractive. PingPane's job isn't to tell you everything. It's to tell you the least — the smallest possible number of notifications that still add up to a full picture.

What we decided

  • Two consecutive failures before we email you. One failure is weather. Two in a row is a signal.
  • One notification per incident. Not per check. You get one when the service goes down and one when it comes back. Nothing in between.
  • No "degraded" alerts. If we don't know whether the service is up or down, we don't wake you up to ask you.
  • Recovery is its own alert. You need to know when it's over. That second email is as important as the first.
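
To make those four rules concrete, here is a minimal sketch of how they might fit together as a small state machine. It is an illustration, not PingPane's actual code: the names `Result`, `IncidentTracker`, `observe`, and `failure_threshold` are hypothetical, and the choice to let an inconclusive check leave the failure streak untouched is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Result(Enum):
    """Outcome of a single check (hypothetical names)."""
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"  # the check itself was inconclusive


@dataclass
class IncidentTracker:
    """Decides whether a check result deserves a notification."""
    failure_threshold: int = 2      # "two consecutive failures"
    consecutive_failures: int = 0
    incident_open: bool = False

    def observe(self, result: Result) -> Optional[str]:
        """Return "down", "recovered", or None (stay quiet)."""
        if result is Result.UNKNOWN:
            # No "degraded" alerts: if we can't tell, we say nothing.
            # Assumption: an unknown result neither extends nor resets
            # the failure streak.
            return None

        if result is Result.DOWN:
            self.consecutive_failures += 1
            if (not self.incident_open
                    and self.consecutive_failures >= self.failure_threshold):
                # First notification of the incident, and the only one
                # until the service comes back.
                self.incident_open = True
                return "down"
            return None

        # Result.UP: the streak is broken.
        self.consecutive_failures = 0
        if self.incident_open:
            # Recovery is its own alert.
            self.incident_open = False
            return "recovered"
        return None


tracker = IncidentTracker()
checks = [
    Result.UP, Result.DOWN, Result.UP,          # a single blip: weather
    Result.DOWN, Result.DOWN, Result.DOWN,      # a real outage
    Result.UNKNOWN, Result.UP,                  # inconclusive, then back
]
sent = [n for n in (tracker.observe(r) for r in checks) if n is not None]
print(sent)  # ['down', 'recovered']
```

Eight checks go in and two notifications come out. The lone failure near the start and the third failure in the outage both produce nothing, which is the point.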

The cost of an unread alert

The real problem with a noisy tool isn't the notification itself. It's what happens over the next six months: you stop reading the notifications. You set up a filter. You mute the app. You ignore the pager at 3am.

And then the real incident arrives, and the alert lands in the same stack as the noise, and you miss it, because your pattern-matching brain long ago learned that PingPane — or Pingdom, or whoever — was probably wrong.

The goal is not that you read every email we send. The goal is that we never send one you wouldn't want to read.

If we get this right, the quiet hours stay quiet.
