Apr 13, 2010

Speeding up MailChimp with Akamai

MailChimp’s been growing at a really fast rate. In September of 2009, we were at 85k users. By the end of March, we had just surpassed the 300,000 user mark. Last I checked, we’ve been adding an average of 1,000 users a day, and we’re now sending an average of 20 million emails a day.

And the growth is global. Our #1 city, in terms of users, is London (followed by New York City, Sydney, Atlanta and Los Angeles). The US is still our largest country by users, but after that it’s the UK, Canada, Australia, and the Netherlands.

Obviously, one of the challenges of running a SaaS like this (besides fending off the bad guys) is keeping our application nice and speedy, while balancing everything out with rapid innovation. So seeing tweets like this can be really frustrating:

slow-mexico-city

When we see stuff like that, we take it seriously, even though there’s no way to tell from that tweet whether he’s talking about our mailchimp.com website, our email app, or email delivery speed. And there’s no way to tell if the problem is our data center, the intertubes that connect our data center to his ISP, his ISP itself, the internet cafe he might be sitting at (and whether or not they’re warming something in the microwave), his wifi network, an email gateway at his company, a spam filter on his machine, or maybe even a memory leak in his browser that a simple reboot (or three) will help.

Well, there’s no way to tell in 140 characters, at least.

But we still investigate. Especially if the tweets about speed are "clumping." One tweet, and maybe there’s just something wrong with that user’s setup. Two tweets within a minute or two, and Houston-we-may-indeed-have-a-problem. So we investigate.
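To make the "clumping" rule concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not a tool described in this post: the function name `find_clumps`, the two-minute window, and the sample timestamps are all assumptions. It groups complaint timestamps into a clump whenever each one arrives within the window of the one before it.

```python
from datetime import datetime, timedelta


def find_clumps(timestamps, window=timedelta(minutes=2), min_count=2):
    """Group timestamps into clumps of at least `min_count` items.

    Consecutive timestamps belong to the same clump when the gap
    between them is no larger than `window` (hypothetical default:
    two minutes).
    """
    clumps = []
    current = []
    for ts in sorted(timestamps):
        # A gap larger than the window closes the current group.
        if current and ts - current[-1] > window:
            if len(current) >= min_count:
                clumps.append(current)
            current = []
        current.append(ts)
    # Do not forget the final group.
    if len(current) >= min_count:
        clumps.append(current)
    return clumps


# Hypothetical complaint times: two close together, one hours later.
base = datetime(2010, 4, 13, 9, 0)
complaints = [base, base + timedelta(seconds=90), base + timedelta(hours=3)]
suspicious = find_clumps(complaints)
```

With these made-up times, the first two complaints form one clump and the lone complaint three hours later is ignored, which matches the "one tweet is probably the user, two close together may be us" rule of thumb.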

And 9 times out of 10, our investigation ends with, "we need more info from the user." Problem is, the best way to get more info is to have the customer "run a traceroute." I’m not going to go into the details of that. Just trust me. It’s too much of a p.i.t.a. to expect anyone but a techie to do.

So we created a landing page we can direct users to, so that our support team can get more information:

http://us1.admin.mailchimp.com/speedtest/index.php

speedtest

It’s kind of a mini traceroute.

It helped us pinpoint some problems. Instead of getting a tweet like, "Hey @MailChimp, you’re slow" (which really is not helpful to us at all) and then replying with, "Sorry, we’ll look into it" (which is really not helpful to the customer at all), we now had a page that helped people tell us almost exactly where things slowed down. I believe we’ve even changed our CDN as a result of the feedback from this page, and worked with our data center on their backup procedures. But we still seemed to be getting widely varying feedback from people, most of it complaints about our interface loading slowly. Internet cafes are probably the biggest culprit, along with wifi on trains, and satellite internet providers. Sometimes, the US was blazing fast, and the UK slow. Sometimes, it was exactly the other way around. It was extremely frustrating. So it was always in our plans to just start adding more data centers around the planet.

Akamai’s Web Application Accelerator

But first, we wanted to experiment with Akamai. If you’ve ever downloaded any videos or streamed anything from a big website, you’ve probably noticed that they distribute their files through Akamai. Some members on our team had success with Akamai at other companies where they worked, but they were more content-heavy sites — not "SaaS" web apps like MailChimp. We’re not content heavy. We’re process heavy. Then we heard about their service called Web Application Accelerator that claims to "boost the performance and reliability of web apps."

It’s not cheap. We were filled with skepticism at first, but equally filled with concern for a speedy user experience, so we gave it a try.

Here are the results:

app-monitor-before-and-after

In January (the light green line), before using Akamai, we measured how long it took to complete a task in MailChimp (basically, to log in, open up an email, save it, look at some stats, then log out). It took up to 30 seconds to complete that task. Furthermore, it was very "spiky" in nature. Around the 24th, we made some changes to our system that stabilized things at around 20 seconds, which established a new, lower baseline. We were doubtful at this point that Akamai could help any further.

Now the line turns dark green, as we head into February. The baseline stays steady around 20 seconds.

Then on February 24th, we started our experimentation with Akamai. You can see time spiking a little in the days just before Akamai, because we were moving things around to prepare for the switch. The day after the switch, time dropped down closer to 10 seconds. So even after we did our own optimization, Akamai still helped considerably.
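For anyone who wants to track a similar number, the kind of measurement in that graph (time a fixed sequence of steps, over and over) can be sketched in a few lines of Python. This is an assumption-laden illustration, not the monitoring tool behind the graph: `time_task` and the step names are hypothetical, and the steps here are empty placeholders where real requests against your own app would go.

```python
import time


def time_task(steps, clock=time.perf_counter):
    """Run (name, callable) steps in order and time them.

    Returns a dict of seconds spent per step, plus a "total" entry
    covering the whole task. `clock` is injectable so the function
    can be checked without waiting on real time.
    """
    timings = {}
    task_start = clock()
    for name, step in steps:
        step_start = clock()
        step()  # e.g. an HTTP request against your own app
        timings[name] = clock() - step_start
    timings["total"] = clock() - task_start
    return timings


def placeholder():
    """Stand-in for a real step such as logging in (hypothetical)."""


# Hypothetical step list mirroring the task described above.
task = [
    ("log in", placeholder),
    ("open email", placeholder),
    ("save email", placeholder),
    ("view stats", placeholder),
    ("log out", placeholder),
]
timings = time_task(task)
```

Recording the per-step numbers alongside the total makes it easier to see whether a spike comes from one slow step or from everything slowing down at once.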

Now here’s the fun graph that shows load time by city:

load-time-by-city

Before Akamai, the load time for our application varied dramatically, depending on what city you were in. Akamai pulled everyone closer together, plus cut the load time down dramatically. So yeah, Akamai works. If you run a SaaS, and business is growing fast, Akamai might just be a good investment for you too.

We’re Not Done Yet

But that’s not to say we’re done optimizing. Akamai’s a great help, but there’s still a lot we’ll be doing ourselves…

UI Optimization

We’ve hired some talented new nerds to help us keep refining the MailChimp app. Jason Beaird, one of the awesomest, most detail-oriented "web designelopers" we know, joined our team in March, and is already waist-deep in UI changes. You’ll start to see some of his work on our new /Lists dashboard, and all throughout the application, starting in v5.1 (launching soon).

New Functionality

We also suckered Eric Muntz, a developer who created a nice Blackberry app for MailChimp, into joining us full time. He’s working on a bunch of new projects, one of them being an external application for our high-volume senders that runs on your desktop, and can handle a lot of the heavy-duty list processing and segmentation work that would normally be slow in your browser. This is one of the challenges of scaling our business. Our chosen target audience is not "small businesses with small lists." Nor is it "the enterprise," which presumably has ginormous lists. With our audience, list size varies from free users with fewer than 500 subscribers, to small businesses with fewer than 5,000 subscribers, to high-volume senders with 3 million subscribers. It’s not easy building a scalable infrastructure to accommodate them all.

Pre-Loading

Another change that’s planned for v5.1 is pre-loading of the MailChimp application on log in. People don’t realize (nor should they) that putting together an email campaign is heavy-duty stuff. We’ve got image editors, text editors, database management, analytics, and email delivery inside MailChimp — it’s like an entire business suite in one place. So as you move from the campaign creation section in MailChimp to the reports section, you have to load up lots of new code. We’re only talking about a few seconds, but this is all about perception. If you click a tab, you want the next page to just pop. We’re going to try pre-loading some of our code to see if that helps.
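The general idea of pre-loading can be sketched like this: start fetching the sections a user is likely to open as soon as they log in, so that by the time they click a tab the work is already done. The sketch below is in Python purely as a stand-in for whatever a real web app would use, and it is not the v5.1 implementation; `SectionPreloader`, the section names, and the loader functions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class SectionPreloader:
    """Load app sections in the background and cache the results."""

    def __init__(self, loaders, max_workers=4):
        # `loaders` maps a section name to a function that loads it.
        self._loaders = loaders
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def preload(self, names):
        """Kick off background loading for sections not yet requested."""
        for name in names:
            if name not in self._futures:
                self._futures[name] = self._pool.submit(self._loaders[name])

    def get(self, name):
        """Return a section, waiting only if it has not finished loading."""
        self.preload([name])
        return self._futures[name].result()


# Hypothetical loaders; real ones would fetch code or data for each tab.
load_log = []


def load_reports():
    load_log.append("reports")
    return "reports section"


def load_campaigns():
    load_log.append("campaigns")
    return "campaigns section"


preloader = SectionPreloader({"reports": load_reports, "campaigns": load_campaigns})
preloader.preload(["reports", "campaigns"])  # at log in
reports = preloader.get("reports")           # later, when the tab is clicked
reports_again = preloader.get("reports")     # served from the cache
```

The trade-off is the one described above: a slightly longer wait up front in exchange for tabs that feel instant afterwards, since each section is loaded at most once.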

You may have noticed whenever you log in to Gmail, they do some pre-loading:

gmail-loading

And pre-loading is something we’ve all come to expect with image editing web apps like Aviary:

aviary-loading

Soon, you’ll see something similar in MailChimp. It’ll be brief, and totally worth it.

You’re going to start noticing a lot of changes at MailChimp this year. We launched in 2001, and we just let it grow organically, as we ran our other (web-dev) company. It wasn’t until 2007 that we really started working on the MailChimp app. The 2007-2008 years were all about ramping up the innovation, and 2009 was all about growth via innovation. This year, we’ll be focusing a lot on innovative ways to make MailChimp faster and more powerful. Buckle up!