Jul 26, 2011

Delivery Speed, Part 2

In my last post, I talked about how quickly email moves through our IPs.  To get the data, I focused on our shared IP pool and basically ignored all of our Dedicated IP users.  Today is going to be different.  Today we’re looking under the hood to see the cool tweaks and updates the Developers have been working on.  These changes impact every single MailChimp user, but those of you with large lists will want to pay special attention.

Before a campaign can go out, before it even hits the queues, our system has to build it.  Build what?  Doesn’t the user build the campaign?  Well, I guess I’d say the user designs the campaign. It’s our job to write all the headers, define all the merge tags, and package all kinds of related info into a huge data payload so your awesome content can get to its final destination.

The Payload

If your list has a hundred thousand subscribers, building that payload can be a significant task.  To keep things simple, we’ve always built the whole thing at once before sending it on.  That way, we could easily track whether a campaign had gone out or not.  Yes, the payload was built and went out, or no, it didn’t.  Unfortunately, for users with sizable lists, that meant waiting for the entire payload to build before sending the first email.  Booooo!

When users click the "send" button, they kind of expect the campaign to start sending.  After all, they didn’t click the "build my campaign" button.  That’s why we’ve started chunking the payloads.  For those with huge lists, we build multiple payloads in chunks of ten thousand.  Each chunk gets moving immediately, so your campaign can start going out sooner.  The process does involve extra checks and balances on our part, but the results are pretty cool. We’ve supercharged your large campaigns.
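Here’s a rough sketch of the chunking idea in Python.  It is not our actual code: the names chunked, build_payload, and send_campaign are made up for illustration, and the real payload carries far more than a "to" address.  The only number taken from this post is the chunk size of ten thousand.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

CHUNK_SIZE = 10_000  # the chunk size mentioned in the post


def chunked(subscribers: Iterable[str], size: int = CHUNK_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` subscribers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(subscribers)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_payload(chunk: List[str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the real build step (headers, merge tags, etc.)."""
    return [{"to": email} for email in chunk]


def send_campaign(
    subscribers: Iterable[str],
    dispatch: Callable[[List[Dict[str, str]]], None],
    size: int = CHUNK_SIZE,
) -> int:
    """Build and dispatch one chunk at a time; return the number of chunks sent."""
    sent_chunks = 0
    for chunk in chunked(subscribers, size):
        # Each chunk is handed off as soon as it is built,
        # instead of waiting for the whole list.
        dispatch(build_payload(chunk))
        sent_chunks += 1
    return sent_chunks
```

With a list of 25,000 subscribers, the first 10,000 are on their way before the remaining 15,000 have been touched.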

The Validation

Okay, so we’ve built the payload chunk, and everything looks good.  Of course, we’re not done yet.  It’s not enough to create the payload; we have to make sure the payload is appropriate.  I’m talking about really naughty stuff here, like non-Unicode characters.

I believe I mentioned before that the payload is, among other things, a huge list of headers and merge tags.  I’m talking about all the customized unsubscribe links, first names, time zones, and all kinds of stuff.  All this data has to be in the correct format, with the correct character encoding, and blah blah blah.  The number of ways to break a system is endless, and the number of ways to send email without breaking anything is limited.
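To give a flavor of what one of those checks might look like, here’s a tiny Python sketch that flags merge fields whose raw bytes aren’t valid UTF-8.  This is an illustration only: the function name, the field names, and the choice of UTF-8 as the target encoding are all assumptions, and the real checks cover much more than encoding.

```python
from typing import Dict, List


def invalid_merge_fields(fields: Dict[str, bytes]) -> List[str]:
    """Return the names of fields whose bytes do not decode as UTF-8.

    Assumption: UTF-8 is the expected encoding; the post does not say.
    """
    bad = []
    for name, raw in fields.items():
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Hypothetical subscriber record: FNAME is fine, LNAME contains bytes
# that can never appear in valid UTF-8.
record = {"FNAME": "Zoë".encode("utf-8"), "LNAME": b"\xff\xfe"}
problem_fields = invalid_merge_fields(record)
```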

Let’s just say our developers have streamlined the myriad little checks and validations.  I’d go into detail, but the intricacies of PHP extensions are not a journey I want to explore right now.  In short, MailChimp is faster.  A LOT faster!

The Routing

Okay, all the merge tags line up, and all the headers are perfect, but the work’s not done yet.  We still have to get your campaign to the right IPs.  The router does the work of actually dividing the payload and sending all the various parts to the best possible starting point.

The upgrades we made here fall into the "simple but profound" category.  The Router grabs a payload chunk and splits it up.  Each chunk actually contains "hints" at where every subscriber should go.  For dedicated IP users, it’s their dedicated IP.  For shared IP users, it indicates the IP reputation that would best match the subscriber activity.  We call it a "hint" because the Router has the final word here.  If worse comes to worst and your dedicated IP isn’t available or that perfect IP is full, the Router can override the "hint" and still get your campaign moving.
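The "hint, but the Router decides" logic can be sketched in a few lines of Python.  Everything here is hypothetical: the IpState fields, the route function, and especially the fallback rule (pick the available IP with the most room left) are assumptions made for the example, not a description of how the Router actually chooses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IpState:
    available: bool      # is the IP up and accepting mail?
    capacity_left: int   # how many more emails it can take right now


def route(hint: str, pool: Dict[str, IpState]) -> Optional[str]:
    """Return the IP to use for a subscriber, honoring the hint when possible."""
    preferred = pool.get(hint)
    if preferred is not None and preferred.available and preferred.capacity_left > 0:
        return hint  # the hinted IP is usable, so take it

    # Override the hint. Assumed fallback policy: the available IP
    # with the most remaining capacity.
    candidates = [
        (state.capacity_left, ip)
        for ip, state in pool.items()
        if state.available and state.capacity_left > 0
    ]
    if not candidates:
        return None  # nothing can take the mail right now
    return max(candidates)[1]


pool = {
    "192.0.2.1": IpState(available=False, capacity_left=500),  # down
    "192.0.2.2": IpState(available=True, capacity_left=0),     # full
    "192.0.2.3": IpState(available=True, capacity_left=200),
    "192.0.2.4": IpState(available=True, capacity_left=50),
}
```

In this toy pool, a hint for 192.0.2.3 is honored, while a hint for the down or full IPs gets overridden and the mail goes to 192.0.2.3 anyway.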

To do all of that, the Router has to parse the payload.  Well, it turns out one of our Developers came up with a brilliant C extension to speed that up.  Yeah, it was that easy.

The Math

The Router also has to keep track of what’s been sent.  This is important because some ISPs impose hourly maximums.  That means they don’t want to see more than x number of emails per IP per hour.  If we go over those rates, we’re basically throwing your emails away, and no one wants that.

In order to stay under these rates, the Router has to ping a database that keeps track of IP stats.  Even if this ping takes 1 millisecond, it would take almost 3 hours to do it ten million times.  Realistically, we were pinging the database a lot more than that, and even with parallel processing, it wasted time.  One of our developers had a brilliant idea to pull the data in batches.  What used to take almost 3 hours now takes about 10 seconds.  You’ve got to love that.
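For the curious, here’s the arithmetic and the batching idea as a Python sketch.  The 1 millisecond and ten million figures come from above; everything else (the StatsDb class, get_many, plan_sends, the per-IP hourly cap) is a hypothetical stand-in, since the real database and batch sizes aren’t described here.

```python
from typing import Dict, Iterable, List, Tuple

# 10,000,000 lookups at 1 ms each = 10,000 seconds, or about 2.78 hours.
HOURS_ONE_AT_A_TIME = 10_000_000 * 0.001 / 3600


class StatsDb:
    """Hypothetical stand-in for the IP stats database; counts round trips."""

    def __init__(self, sent_this_hour: Dict[str, int]) -> None:
        self._sent = dict(sent_this_hour)
        self.round_trips = 0

    def get_many(self, ips: Iterable[str]) -> Dict[str, int]:
        """Fetch the hourly sent counts for many IPs in a single round trip."""
        self.round_trips += 1
        return {ip: self._sent.get(ip, 0) for ip in ips}


def plan_sends(
    db: StatsDb,
    assignments: List[Tuple[str, str]],  # (subscriber, ip) pairs
    hourly_cap: int,
) -> Tuple[List[str], List[str]]:
    """Split subscribers into (send now, defer) using one batched stats lookup."""
    sent = db.get_many({ip for _, ip in assignments})  # one ping for the whole batch
    to_send: List[str] = []
    deferred: List[str] = []
    for subscriber, ip in assignments:
        if sent[ip] < hourly_cap:
            sent[ip] += 1  # track the count locally instead of pinging again
            to_send.append(subscriber)
        else:
            deferred.append(subscriber)
    return to_send, deferred
```

The point of the sketch is the round-trip count: however many subscribers are in the batch, the database gets pinged once, and the per-IP counts are kept up to date in memory while the batch is processed.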

The truth is, we’re always looking for ways to make MailChimp more lovable. Sometimes that’s with shirts and hats, but more often it’s the peeps behind the scenes pushing for updates to speed, reliability, and whatever else they can think of.