Jul 25, 2011

Delivery Speed, Part 1

Over the past year, I’ve seen MailChimp grow and grow.  Along with all our new users, it seems like we see bigger and bigger lists every month.  In terms of email volume, our current daily average is now higher than our daily peak was one year ago.  To keep up with this explosion, we’re warming up new IPs, which means more queues and more connections for our users.  Instead of asking everyone to deal with longer wait times, we’re actually trying to speed things up.

That leads to a very interesting question… How are we doing?  After sifting through 500 or so IPs and crunching the numbers on over a quarter of a billion emails, I might have an answer.  There are graphs with colored lines and all kinds of explanations below, but within my data set one thing is true.  75% of the emails we receive are delivered in 5 minutes or less.  That’s pretty cool.

Begin Technical Babble

Before I throw the charts at you, it might help if we went over a few basics of how MailChimp delivery works.  When you send a campaign, your list is divided up and distributed amongst our shared IP pool.  Each IP has a FIFO queue for outbound emails, and the size of these queues directly affects your delivery time.

Clearly, we should divide your list evenly amongst all our IPs.  That would give us the fastest delivery time for sure.  Yep, it sure would.  It’d be really fast.  Are you getting the feeling we don’t do that?  Good, now let me explain why it isn’t the best solution.

At this stage, your campaign looks a lot like a highway.  One of the things we like to do is match your subscribers’ activity ratings with the reputation of our IPs.  We’re putting your best subscribers in the fast lane so they have the best possible chance for a successful delivery.  Of course, this means we can’t divide your list evenly.

If you’re wondering why we don’t send your campaign over all of our IPs, I have a simple answer.  “Hell is other people.”  I’m pretty sure Sartre would approve of me borrowing that phrase.

You see, every now and then we get a bad apple.  Sometimes it’s accidental and sometimes it’s malicious, but these rogue apples can get our IPs blocked.  By optimizing the number of IPs that any one campaign touches, we protect all of our users from the odd mushy apple.
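
For the code-minded, here is a rough Python sketch of the fast-lane idea.  It is an illustration only, not the production allocator: the function name `assign_to_ips`, the `max_ips` cap, and the numeric ratings and reputation scores are all made up for the example.  It picks a limited number of IPs, then puts the highest-rated subscribers in the queue of the best-reputed IP.

```python
from collections import deque

def assign_to_ips(subscribers, ips, max_ips=3):
    """Split a list across a capped number of IPs, best subscribers on the best IP.

    subscribers: list of (email, activity_rating) pairs  (hypothetical scale)
    ips:         list of (address, reputation) pairs     (hypothetical scale)
    Returns {ip_address: deque of emails}, each deque acting as a FIFO queue.
    """
    if not ips or max_ips < 1:
        raise ValueError("need at least one IP to send from")

    # Only touch a limited number of IPs per campaign (assumed knob).
    chosen = sorted(ips, key=lambda ip: ip[1], reverse=True)[:max_ips]
    queues = {address: deque() for address, _ in chosen}

    # Distinct rating levels, best first. Level i rides on the i-th best IP;
    # any leftover lower levels share the last chosen IP.
    levels = sorted({rating for _, rating in subscribers}, reverse=True)
    lane = {rating: min(i, len(chosen) - 1) for i, rating in enumerate(levels)}

    for email, rating in subscribers:
        address = chosen[lane[rating]][0]
        queues[address].append(email)  # append on the right, popleft() to send
    return queues
```

Notice that the split comes out uneven whenever the ratings are uneven, which is exactly the trade-off described above.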

More Complicated

This is where the highway analogy starts to get out of control.  Not only is your list divided among several IPs, but each IP has a separate connection for every receiving domain in its queue.  The receiving domain is the ISP your subscriber uses, like @gmail.com or @hotmail.com.  It’s worth taking a look at your list and noting which domains you send to the most.

The good news is that these connections can all send concurrently.  Of course, the ISPs themselves often throttle incoming email to their own preference.  You’ll notice in the charts below that one ISP in particular throttles heavily.
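
If code is easier to picture than highways, here is a small Python sketch of one IP’s queue being split into per-domain lanes.  The function names and the per-minute rates are assumptions for illustration; real ISP throttling limits are not published in this post.  Because the connections send concurrently, each domain drains on its own clock, and a heavily throttled domain finishes last.

```python
from collections import defaultdict

def split_by_domain(queue):
    """Group one IP's outbound queue into per-domain lanes, keeping FIFO order."""
    lanes = defaultdict(list)
    for email in queue:
        domain = email.rsplit("@", 1)[1].lower()
        lanes[domain].append(email)
    return dict(lanes)

def minutes_to_drain(lanes, per_minute, default_rate=1000):
    """Estimate how long each domain's lane takes to empty.

    per_minute maps a domain to the number of emails the ISP accepts per
    minute (made-up numbers; default_rate is also an assumption).
    """
    return {
        domain: len(emails) / per_minute.get(domain, default_rate)
        for domain, emails in lanes.items()
    }
```

Running `split_by_domain` over your own list is also a quick way to see which domains you send to the most.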

So it’s like there are different highways, but the lanes are kind of… No, it’s like each ISP is a different car manufacturer, and the speed limit is…  Okay, maybe if we all had flying cars and the toll booth was like a filter … Ugh, I honestly can’t think of a good way to picture this.  Feel free to make suggestions.

On top of all that mess, you’re not the only one sending a campaign.  Your emails are being queued along with everyone else who just hit the send button.  We do everything we can to minimize the queues, but there are certain times of day when our volume is off the chain.  It’ll make you think twice about the "schedule delivery" feature.

Actual Data

I took two weeks of data from our pool of shared IPs and added them up by hour.  The first chart shows the volume of emails we sent.  You can see our heaviest hours are between 9am and noon (EST), but from experience, that range shifts back an hour depending on the time of year.

The next few charts measure the time interval between when an email was queued and when it was accepted for delivery by the email provider.  Keep that last part in mind.  It takes two to deliver.  We send the email to the ISP, and they deliver it to your subscriber.  The first part I can measure.  The second part is anyone’s guess.

You can see that 85% of the emails queued between 10:00am and 10:59am were sent within 5 minutes of being queued.  For Yahoo, it took 45 minutes to deliver the next 10% (looking at the 95% graph).  To send the next 4.5% (looking at the 99.5% graph) of Yahoo emails took a whopping 285 minutes.  Yikes!
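
For anyone who wants to run the same kind of numbers on their own logs, here is a minimal Python sketch of the measurement.  It assumes each record is a pair of timestamps (when the email was queued, when the ISP accepted it) and uses a simple nearest-rank percentile; the function name and record layout are hypothetical, and this is not necessarily how the charts were produced.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def minutes_to_send(records, percent):
    """Minutes until `percent` of the emails queued in each hour were accepted.

    records: iterable of (queued_at, accepted_at) datetime pairs.
    Returns {hour_of_day: minutes}, grouped by the hour the email was queued.
    """
    delays = defaultdict(list)
    for queued_at, accepted_at in records:
        minutes = (accepted_at - queued_at).total_seconds() / 60
        delays[queued_at.hour].append(minutes)

    result = {}
    for hour, values in delays.items():
        values.sort()
        # Nearest-rank percentile: the smallest delay that covers `percent`
        # of this hour's emails.
        rank = max(math.ceil(percent * len(values) / 100), 1)
        result[hour] = values[rank - 1]
    return result

# Four emails queued at 10:00am, accepted after 1, 2, 3 and 60 minutes.
base = datetime(2011, 7, 1, 10, 0)
sample = [(base, base + timedelta(minutes=m)) for m in (1, 2, 3, 60)]
print(minutes_to_send(sample, 75))  # {10: 3.0}
```

In the sample, three of the four emails are out within 3 minutes, but asking for 100% returns 60 minutes.  One slow straggler is enough to stretch the top of the chart.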

[Charts: MailChimp Email Volume per Hour; Time to Send 75% of Emails Received (per Hour); Time to Send 85% of Emails Received (per Hour); Time to Send 95% of Emails Received (per Hour); Time to Send 99.5% of Emails Received (per Hour)]

Notes

Why don’t you see a graph for 100%?  Well, it turns out that a lot of lists contain one or two addresses that don’t exist or can’t receive email anymore.  The way we figure it, there’s always a chance.  We’ll try to get the ISP to accept the email for up to three days, and sometimes it actually works.  It makes my charts look horrifying though, so I cut the numbers off at 99.5%.  Problem solved.

You may have noticed that Yahoo loves to throttle email.  If your list has a lot of Yahoo addresses, you’ll want to take this into account.  For those looking at the “All” line, it’s simply the sum of all the emails together.  As the percent completion goes up, the last few thousand emails tend to be heavily weighted with Yahoo addresses.  The “All” line reflects this by inching up as well.

I’m pretty happy with these delivery times. For most ISPs, 95% of the emails in our queues are sent out in 5 minutes or less. It says we’ve done a good job at balancing reputation and speed. In the second part of this series, I’ll go over all the stuff that happens to your campaign before it ever hits the queue. It’ll be a lot like going behind the scenes at Universal Studios. There will be flames shooting up out of nowhere and you’ll see that Freddie is an awesome robot.

Go to Delivery Speed, Part 2