[UPDATE: 9-17-10, 8:40am]
As of 6:47am ET, things appear to be back to normal. The RSS-to-email and autoresponder campaigns that we trickled out overnight have all been sent. Our delivery queues are back down to normal levels. If you have a campaign that did not get sent, please contact support, and we’ll look into its status for you. Thanks for your patience throughout all this.
Summary of what happened yesterday…
[UPDATE: 9-16-10, ~9:30pm]
Yesterday at 10pm ET, we launched MailChimp v5.3. The upgrade process went well. It only took about 30 minutes to finish. But then we started to experience extremely heavy loads the next morning around 10am. We quickly had a massive backlog of emails on about 10% of our IP addresses. This doesn’t sound like much, but it’s frustrating if you’re waiting on a test or a campaign, and it’s stuck in one of those particular queues. The rest of the system performed okay. Instead of roughly 3.5 million deliveries per hour, we were averaging 3 million. But every time we got close to clearing out those queues, another surge of traffic would hit. One surge caused an outage, which made one of our MTAs (a big email delivery server) reboot itself.
When these things reboot, they can usually remember where they left off, and resume their job. But sometimes, they get mixed up, and they resend campaigns. Unfortunately, a handful of our customers’ campaigns sent multiple times. We apologize to our customers, and to your recipients, for that. Contact our support team, and we’ll make it up to you.
Unfortunately, it didn’t end there. Around 8:30pm, we had another surge in volume (this is also when we normally start running some background processes) which caused yet another brief outage. On the one hand, this was downright miserable. But on the other hand, it ran some processes that ultimately helped clear things up. So we were finally able to get a grip on things and spread out the volume to be less “spikey.”
So. Things seem to finally be calming down in terms of the server-gremlin whack-a-mole game we’ve been playing all day. Emails are sending, albeit a little slower, but the load is a lot better now. And we’re watching things closely to make sure no more moles pop up. Again, very sorry for the inconvenience.
These are the unexpected realities of major changes… you do your best to prepare for as much as possible, but inevitably something else goes wrong. For those that read this, don’t measure MailChimp on what little goes wrong but on what normally goes right.
Also, it should be obvious, but I just want to bring attention to the fact that, if you were following these issues, Ben (founder of MailChimp – and presumably Head Chimp) has been personally answering blog complaints and FB complaints throughout the day… in a day and age of corporate excess and lack of ownership from the top on down, it is refreshing to see a Top Dog (chimp) taking ownership and getting his hands dirty when things just don’t go right…
Kudos.
Hey Ben,
I really do appreciate the ‘Chimp’ and the service you provide. We made a big switch leaving CM after many years and coming over and I don’t regret it… and these things happen. HOWEVER I do want to say… you guys picked just the xxxxxxx time to make a big update… really was this the low load time on the servers over the course of the week?… I assume you knew/thought that if you went down this was going to be the least impactful to customers…
Frankly I find it hard to believe you are pushing more traffic Saturday night at 2am… Seems like weekday traffic would crush weekend traffic…. and luckily for us… we had a VERY tiny e-mail impacted (granted it was significant to the client that was impacted… 2 day event… and today was the kick off… you can’t really say that is nothing)
Anyhow, I just expect you are making upgrades at times that are best… I’m still perplexed by selecting a Wed. evening… the risk seems large.
All valid points, Rob. When we have an update that affects the MTA, we wait till weekends. Often, even for updates that don’t affect the MTA, we’ll perform a bunch of background processes and updates on the weekend, to get the underlying code done and out of the way. Then we start flipping switches on the next Monday (where stuff gets revealed in the UI). When the entire office is fully staffed on a weekday, we can provide proper depth of support. Anyway, this was an update that wasn’t supposed to affect the MTA. But boy did it. We’re looking into why. Sorry.
Will this affect timed campaigns at all? I’ve got one that needs to go out at 11am Eastern tomorrow on the dot if at all possible….should I expect a delay?
Happy to report both my scheduled campaigns went out right on time this morning. Kudos to the MailChimp team for getting the problems solved quickly!
is my 4PM message going to resend, or should i set it for midnight eastern and hope it sends in the next hour? thank you!
Hi Jennifer, it should eventually send. We don’t recommend rescheduling.
You guys are only human… and are adding new features pretty darn quickly. So don’t beat yourself up
Really? Are you serious? How disappointing! That broke my heart. I thought they were chimps 8_(
Despite all these problems, I’m somehow happy that it all happened. I’m very confident that it will serve to make MailChimp even better. They will learn from whatever they did wrong this time and will fine-tune their processes for the future.
Hmmm. My scheduled campaign didn’t send when it was supposed to, at 8 p.m. Support told me it was in the queue. Then when it still hadn’t sent a few hours later, I asked them to confirm it — and was told no, it wasn’t in the queue and I should resend. So now I’ve scheduled it for tomorrow morning … but reading Ben’s note above to Jennifer makes me concerned my subscribers will receive it twice … which will of course cause unsubscribes. Which will really upset me, as a new customer who had heard great things.
I’d love to get a definitive answer about what to do, but given that it’s 1:30 a.m., I’m assuming that’s not possible. I’m frustrated.
Hi Alison, if you’ve rescheduled, I’d leave it as-is. Am asking the team to look for your campaign, and make sure it’s not duped. Sorry for the confusion and all the inconvenience.
My RSS campaign has not been sent out either. It’s 2.5 hours late. Is there still a backlog?
As of about 6:40 am this morning, it should be sent. If you don’t see it getting delivered by now, contact support so they can check on its status. Sorry!
I sent my campaign a couple of hours ago – and it has confirmed as sent, but I have registered 0 opens so far … I think this means it must be stuck somewhere and not yet delivered. How long are delays going to continue? I really need it to arrive this morning.
Thanks
Oliver
Hi Oliver, as of about 6:40 am this morning, all campaigns in the queue should be sent. If you don’t see it getting delivered by now, contact support so they can check on its status.
So I read thru this entire blog posting, thinking, why are they telling me this ??? Is this really relevant to me & my operation? I mean, I, & many others, have our own websites to administer.
So who cares that you had these problems. Doesn’t everyone? Of course !
If it’s an FYI, only to summon good customer support, then 1 line would do it. But instead, you turn this into prose that makes it sound like I have nothing better to do, nothing to read, no other blogs to subscribe to. LOL
Next time, save everyone time, post your 1 liner & work on making your system efficient, so you can keep your current customers … right ?
Regards
TDK
I sent my campaign at 4:17pm 9-16, got a first reply from client at 5:25pm, so far so good… but I still haven’t received it at my personal address. Now I wonder if there are “missing” emails. Tried to search activity by email address but was kicked off. Anyone else notice missing pieces?
Ned, I also have a campaign sent on the 16th around that time that I believe has not been sent out entirely. My emails plant addresses have not received them but there have been a few opens in the reports. I have a ticket in with support and I suggest you do the same.
No more bananas for you guys! LOL
I have still not gotten any of the test mailers that I sent out yesterday, and the campaigns I sent out this morning have not been received yet either.
DKN, please contact our support team with your account info, so they can check on your campaign. Sorry for the problems. (I just gave them a heads-up about your account, but they’ll probably ask for a few more details)
Perhaps when there is going to be an update that hits the MTAs (I understand you didn’t know this one would), the scheduler feature of MailChimp should mention it to us when we set the schedule, and give us the option of picking a time outside the maintenance window? Also, perhaps, a note that the servers are having problems, if it’s a “send now” email and there’s something going on in the MailChimp network.
We’ve been very happy with MailChimp, and if this is the worst upgrade process you’ve had, that’s not bad at all. Yes, it was disappointing to not have our emails go out exactly when we wanted them to, but it’s not the end of the world, either.