Last week, we had some hardware failures at our US1 data center that affected about 400,000 users (here’s the blog post with all the related updates). Today I want to announce some upcoming server maintenance related to that outage, plus provide a little follow-up on what happened.
Planned Downtime: January 22, 1am ET
First, we’re doing some server maintenance at our US1 data center on Sunday, January 22nd at 1am ET (see this in your timezone). The maintenance will require downtime, but it should only last a few minutes. During those few minutes, MailChimp will not be available to US1 users at all: their campaign links will not work, and new subscribes will not be tracked. Again, it should only be a few minutes before everything’s back online. This upgrade will help us rebound faster should a similar outage occur again (heaven forbid).
So what exactly happened that day?
To recap, last year we invested in super fast SSD-equipped servers to handle our increasing traffic. They helped us handle a TON of load, and sped things up nicely through the holidays. Then on January 2nd, several of those servers just up and died all at once, for no apparent reason at all. It just didn’t make any sense, and we’d never experienced anything like it before. We admittedly didn’t spend much time investigating the cause, because we were busy taking out those SSDs and replacing them with 15k rpm SAS drives (plus a bunch more RAM).
Then a few days later, we saw this news: 64GB Crucial M4s crashing after 5,000 hours, fix coming