Jan 22, 2008

How To Apologize for Server Outages

Server problems and billing errors happen to everybody. I wouldn’t wish them on my worst enemy. Actually, I know how embarrassing and disruptive they are, so yes, I would wish them on my worst enemy. But they happen to everybody.

When there’s a goof-up, any decent business owner wants to apologize to his customers. But there’s a right way and a wrong way to apologize…

Here are three recent "oopsies" that caught my eye. Two were server outages that were handled well, and the other was a billing error that was handled horribly.

Bad: Dreamhost Billing D’oh!

Better: 37Signals Server Outage – What happened this morning

Best: JoelOnSoftware’s "The Five Whys"

First, I should say that we’ve had all of the above problems here at MailChimp. Run an e-commerce site long enough, and they’re inevitable. And when I see one of my competitors post something to their blog about a server problem, I don’t laugh. I cringe, because I know what they’re going through (and because I respect server-karma). Just to demonstrate how imperfect we are:

  • We once had a hardware failure where some switch-over thingy (ironically, it’s designed to prevent outages) failed. It took hours to replace. Then, immediately after replacing it, the hard drive it was pointing to failed. Luckily, there was a tape backup. But tape backups take hours to load. We learned our lesson many times over: have multiple servers, multiple switch-thingies, and multiple copies of absolutely everything.
  • Once, we discovered our automated billing system wasn’t billing anyone at all. After slapping ourselves on the forehead (it was one of those "Oh, we forgot to plug it in" moments), we jump-started it, which resulted in lots of customers getting billed for several months at a time. Nobody was overcharged; they were charged what they owed us anyway. They just got it all at once, with no warning whatsoever.

We’re far from perfect, and we’ve given our fair share of apologies. So I thought I’d share some of the lessons I’ve learned and observations I’ve made over the years.

If you run a web application, I hope you find this somewhat useful…

Update frequently, but keep your updates useful

We use a third-party vendor for our live chat support. We love them, and highly recommend them. But once, they had an outage, and we were left in the dark. We were furious! We were chatting with customers, and it looked like we "hung up" on them. We wanted to know what the heck was going on. Now. To their credit, they learned from that incident.

The next time they had an outage, they emailed updates every hour to keep us informed. At first, it was nice to know that they were on top of things. But then it just got annoying. Their email updates were too apologetic. There’s only so much of "We’re very sorry for the interruption. We know how important this is to you. We’re doing all we can." that a person can take. Just stick to the facts, and don’t get too gushy.

To make it worse, sometimes these guys have outages that don’t seem to affect us. They’re happening to some server for some other group of customers in some other country. But we get all the "I’m sorry" emails anyway.

Eventually, I think they found the right formula. Now, if there’s an outage, we get updates, but they’re technical, and not too emotional.

How do you update customers when your servers are dead?

If your back-end system (shopping cart, application, database) dies, but your front-end website is still alive, you can post a notice wherever people log in to use your service.

During their recent outage, 37Signals posted a nice updates page (here’s a screenshot):

[Screenshot: 37Signals public outage updates page]

I also hear they used Twitter, which is a pretty creative use of that site for business.

You might also consider an externally hosted blog (FogCreek uses TypePad) and an externally hosted email solution (ahem, like MailChimp). If you do set up an external blog, you have to constantly tell your customers about it. Put a link to it in the footer of all your email newsletters or something. Some companies set up a subdomain, like "status.company.com".

If you’ve got a toll-free number, make sure your customer service people are talking with your server people constantly throughout an outage. If those two groups are offsite and working remotely, keep them on the phone with each other.

Give your customer service team the corporate American Express card. Authorize them to buy gifts for key customers who mean a lot to your business. Or, make sure they jot down "the most pissed customers" on a list, so you can email a personal apology to them when the dust settles.

And buy everyone in the office pizza, so that they don’t have to leave until everything is fixed.

No Empty Promises

Don’t just say, "We are doing everything in our power to prevent this from happening again." You can’t prevent it. We all know crap happens. So if you don’t list specifics, we won’t believe you.

Tell us what you’ve changed to keep this from happening again. If you’ve changed nothing, and it was all a fluke, say nothing.

A good response posted by a 37Signals staffer:

"It was no doubt a mistake not to have a second load balancer sitting ready. We’re working with them right now to ensure that we have redundant hardware for all the network pieces ready by the end of the day."

They list what went wrong, what they’re doing to fix it, and by when. No more F.U.D. (fear, uncertainty, doubt).

Or, take the Joel on Software approach, and list nothing but facts. Tons and tons and tons of facts. He explains exactly what happened, teaches us all about servers, switchover thingies that never work, and worthless SLAs. By the end of the post, you’re just nodding your head, and saying "Yep. Amen brother." Then he lists what they’re doing to prevent this from happening again. Notice there are no "I’m sorries" in that post, but we feel their pain, and we know they’re sorry.

Remember the Tylenol scare? People were poisoned and died. Johnson & Johnson pulled Tylenol from the shelves, then brought it back in new tamper-resistant packaging. That’s another nice example of how to say "We actually did something to prevent this from happening again."

When to be funny, and when to put on the "serious" hat

Then there’s the tone of your apology. Should you be funny? Or serious and "corporate"? If you know anything about MailChimp, you know we’ve got some attitude around here. Here’s what I’ve learned about the tone of my apologies:

  • When your mistake makes you (and only you) look stupid, you can be funny.
  • When your mistake makes you look stupid, AND it makes your customers look stupid in front of their customers, do NOT be funny. Wear a tie, comb your hair, tuck your shirt in, and talk like a grownup.

Just read through all the comments under that Dreamhost blog post to see what I mean. People are blind with rage, and here’s what they’re seeing in the apology:

  • Casual, flippant tone
  • Homer Simpson cartoons
  • Funny face photos

Dreamhost later posted an apology for their apology.

Compare all this to one of Rackspace’s recent apologies:

Measures Taken and Future Actions

Things that make Rackspace’s apology better:

  • Serious tone
  • Mea culpa attitude, followed up with
  • What we’re specifically doing to prevent it, by when
  • Refunds – they speak of "millions of dollars" of refunds.
  • Heads rolled. Ouch.

Don’t point fingers

When we had one of our biggest server meltdowns a few years ago, I pointed the finger at our data center (Rackspace). Big mistake. I still remember the customers who wrote back to tell me that was a stupid thing to do (Hi Frank).

37Signals handled their apology almost perfectly, except for their mention of Rackspace. And you can tell by the replies at the bottom of the apology that users thought that was a very uncool thing to do (even if maybe it was Rackspace’s fault). Eventually, 37Signals updated the page and said "the buck stops here, and we take all the blame."

Even when it really, really, really isn’t your fault, don’t point fingers. Fix, refund, and move on. That’s all your customers want to hear.

Does it really have to be public?

When we screw up at MailChimp, we do indeed apologize. But we don’t always do it on our blog for the entire world to see. It’s not because we’re too embarrassed. It’s because if we all apologize on our blogs every single day for every single mistake, nobody will take apologies seriously anymore.

We send an email only to those affected (when we can track that). Or we post an apology inside the product interface for those customers. We just don’t see any sense in telling the entire world about all our mistakes.

The only reason I can see for a public apology is if it can somehow turn into a positive thing, like a big boost in "community" (blog comments, forum replies, etc.) or media attention. But for something like that to be newsworthy and positive, you really need to do something BIG to stand out (think Tylenol big).

The casual, "We’re really, really sorry, but isn’t it cool that we’re apologizing in public on our new blog?" is just getting tiresome. And if that comes off as a little too negative, I’m really sorry.