Mar 17, 2009

Want 700,000 HTML email templates?

When we launch MailChimp v4.1 later this month, there will be more email template options to choose from. A lot more. I’m not really sure exactly how many templates there will be, because we’re still counting them.

Basically, we came up with over 700,000 HTML email template options, and we’re narrowing it down using Amazon’s Mechanical Turk.

Here’s how…

As you know, MailChimp has never offered "pre-built" HTML email templates.

Our philosophy has always been to give you nice, modular layouts, plus powerful design tools like our automagic email designer, header designer, and magic color picker to help you build your own beautiful template. Think "Photoshop for email" instead of "Microsoft Word stationery."

But our customers have been asking us for more choices. They don’t necessarily want fully pre-built templates. They just want more choices that can help them get started, which they can then customize (instead of starting from scratch). Good point. But we didn’t want to just hire someone to sit here and "come up with a bunch of craptacular templates that can be re-purposed." We needed something more scalable and automatic.

We have over 100 beautiful header graphics for different occasions (here’s an example for St. Patrick’s Day). We took each one and ran it through a color analyzer (which we developed in the MailChimp Lab) so that we could automatically generate color palettes for your template that complement each header graphic.

It’s what I was talking about in this post: Color Experiments in the MailChimp Lab.

The initial results were pretty good:

[Image: color palettes generated from header graphics]

By the way, this technology will also be live in MailChimp v4.1. Basically, whenever you upload your logo into a MailChimp template, we’ll analyze the colors and suggest a few color combinations for the rest of your template.

Anyway, for each header graphic, we found out we could generate roughly 400 color palette possibilities (or "themes").

We narrowed that down with certain rules, like "Bright #FF0000 red should not be used for titles" and "default body text shouldn’t be bright blue," and "don’t let the colors for backgrounds and fonts be within x% similarity, or there won’t be enough contrast."
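For the curious, here’s roughly what rules like these might look like in code. This is only a sketch, not our actual lab code: the palette field names, the "bright blue" test, and the 80% similarity cutoff (standing in for the "x%" above) are all made-up assumptions for illustration.

```python
MAX_SIMILARITY = 80.0  # hypothetical stand-in for the "x%" contrast rule


def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def similarity(a, b):
    """Percent similarity between two hex colors (100 means identical)."""
    distance = sum(abs(x - y) for x, y in zip(hex_to_rgb(a), hex_to_rgb(b)))
    return 100.0 * (1 - distance / (3 * 255))


def is_bright_blue(color):
    """Assumed definition of 'bright blue': strong blue, little red or green."""
    r, g, b = hex_to_rgb(color)
    return b >= 200 and r <= 80 and g <= 80


def palette_ok(palette):
    """Apply the three example rules to one palette (a dict of hex colors)."""
    if palette["title"].upper() == "#FF0000":
        return False  # bright red should not be used for titles
    if is_bright_blue(palette["body_text"]):
        return False  # default body text shouldn't be bright blue
    for font_key in ("title", "body_text"):
        if similarity(palette["background"], palette[font_key]) > MAX_SIMILARITY:
            return False  # not enough contrast between font and background
    return True


good = {"title": "#336633", "body_text": "#222222", "background": "#FFFFFF"}
red_title = dict(good, title="#FF0000")
low_contrast = dict(good, body_text="#DDDDDD", background="#EEEEEE")
```

Filtering a list of candidate palettes is then just `[p for p in palettes if palette_ok(p)]`.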

That narrowed things down to about 200 possible themes per header graphic.

So we went from roughly 700,000 options to around 25,960.

Then we started to apply them to actual email templates. And we were kinda shocked how good they looked, considering they were automatically generated. Here’s a batch:

[Image: a batch of automatically generated templates]

Here’s another batch:

[Image: another batch of automatically generated templates]

And here’s a batch that didn’t turn out so good (IMHO):

[Image: a batch of generated templates that didn’t turn out so good]

The results were better than we expected, but we still didn’t want to post them all and overwhelm our users with too many choices.

For the record, I actually did want to overwhelm our users with too many choices (just so that I could write a blog article with a ridiculous template count in my title).

Luckily, our product team is smarter and nicer than me, and wanted to narrow things down a little more intelligently so that we’d have a more usable interface. Party poopers.

They wanted some kind of human review. Sure, we learned a lot about programmatic design harmony in all our lab experiments, but color and design are so subjective.

But how could we possibly review so many email themes?

How to review 25,960 designs in 19 hours

So our engineers turned to Amazon’s Mechanical Turk, which is a "global, on-demand, 24×7 workforce." It’s composed of "Turkers" who complete micro tasks really, really fast.

They’ve got an API that allowed us to basically post all our design combinations, and ask their users to judge them. Users get paid (think pennies, not dollars) each time they complete a task.

A typical screen looks something like this:

[Image: a typical Mechanical Turk task screen]

Turkers would each scroll through 3 options, and tell us which one they thought was best. For each task they completed, we paid 2 cents. We can choose not to pay a Turker if we think he’s cheating or gaming the system, which apparently can be an issue.

Reviewing Turk Work

Because of the possibility of cheating, there’s actually a review process where we have to analyze all the results we got back from Turkers, and decide, "this guy gets 2 cents, this guy doesn’t, etc."

The only issue we faced here was sifting through all the results. We ran just shy of 100,000 tasks through the system, so we got back a HUGE .csv file that showed us how much time each Turker spent on each task, and all kinds of other useful data. FYI, running this large job actually prompted a call from Amazon, where a very, very nice person kindly asked us to please talk to them before running something that big again. Heh. Now we know why. There’s so much data, we’d need a bunch of Turkers to review the Turkers!

Anyway, we’re not inclined to be "strict" and deny anyone a couple pennies for their work. Something about that feels dirty and mean.

So all we did was look for heuristics in the data to tell if someone was cheating. For example, if they reviewed a dozen themes in just a few seconds, and they happened to vote for "Theme #1" every single time, they’re probably cheating. Same goes if we can discern a repeatable pattern, like voting for "1,2,3" over and over again.
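Here’s a rough sketch of how those two heuristics could be written. It is not the script we actually ran; the function names, the 12-task minimum, and the 10-second cutoff are assumptions picked just to make the idea concrete.

```python
def repeats_short_pattern(votes, periods=(2, 3)):
    """True if the whole vote list is one short pattern (like 1,2,3) repeated."""
    for period in periods:
        pattern = votes[:period]
        # Skip constant patterns; those are handled by the speed check below.
        if len(set(pattern)) > 1 and all(
            vote == pattern[i % period] for i, vote in enumerate(votes)
        ):
            return True
    return False


def looks_suspicious(votes, total_seconds, min_tasks=12, fast_seconds=10.0):
    """Flag a worker whose votes look automated.

    votes:         the option (1, 2, or 3) chosen on each task, in order
    total_seconds: total time the worker spent across those tasks
    """
    if len(votes) < min_tasks:
        return False  # too little data to judge anyone
    same_every_time = len(set(votes)) == 1
    too_fast = total_seconds <= fast_seconds
    if same_every_time and too_fast:
        return True  # a dozen identical votes in a few seconds
    return repeats_short_pattern(votes)  # e.g. 1,2,3,1,2,3,...
```

Note that voting for the same option every time is not enough on its own to get flagged here. Someone who takes their time and genuinely prefers the first theme gets the benefit of the doubt.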

Otherwise, this is completely subjective work, so we’re not going to be harsh.

It’s just that sifting through this data to run these scripts kept crashing our engineer’s desktop computer. It would run for 15 minutes, then just crash. That’s when we remembered the good old Nvidia Tesla supercomputer that we bought a while back (we talked about it in this newsletter), precisely for crunching massive amounts of data (for a certain "Project Omnivore" which we may or may not deny the existence of).

The supercomputer ran it in under 1 minute. Ha.

The results

In total, we found 27 Turkers who fit the profile of "possibly gaming the system" and we unfortunately had to reject their 8,700 tasks. On the bright side, we approved the 85,000 tasks that the other 503 workers performed.
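If you want to check the math, here’s a quick back-of-the-envelope calculation using the rounded numbers above and the 2 cents per task. It assumes every approved task paid exactly 2 cents and ignores any fees Amazon charges on top, so treat the dollar figures as ballpark only.

```python
CENTS_PER_TASK = 2

rejected_workers, rejected_tasks = 27, 8_700
approved_workers, approved_tasks = 503, 85_000

total_workers = rejected_workers + approved_workers
total_tasks = rejected_tasks + approved_tasks

# Work in whole cents to avoid floating-point rounding.
approved_payout_cents = approved_tasks * CENTS_PER_TASK
withheld_cents = rejected_tasks * CENTS_PER_TASK

print(f"{total_workers} workers, {total_tasks:,} tasks")
print(f"paid out:  ${approved_payout_cents / 100:,.2f}")
print(f"withheld:  ${withheld_cents / 100:,.2f}")
```

That total of 93,700 tasks lines up with the "just shy of 100,000" we ran through the system.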

We’re going to run the remaining templates that were deemed "prettiest" through a "March Madness" style bracket, and we think we’ll end up with around 600 beautiful templates.
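We haven’t built the bracket yet, but the general idea can be sketched in a few lines. Everything here is hypothetical: `pick_winner` stands in for whatever judges a matchup (Turkers, most likely), and head-to-head pairs are just one way to set it up.

```python
def run_bracket(entries, pick_winner, target):
    """Run elimination rounds until no more than `target` entries remain.

    entries:     list of templates (any objects)
    pick_winner: function taking two entries and returning the preferred one
    target:      stop once this many entries (or fewer) are left
    """
    remaining = list(entries)
    while len(remaining) > target and len(remaining) > 1:
        next_round = []
        # Pair up neighbors; the winner of each matchup advances.
        for i in range(0, len(remaining) - 1, 2):
            next_round.append(pick_winner(remaining[i], remaining[i + 1]))
        # With an odd count, the last entry gets a bye into the next round.
        if len(remaining) % 2 == 1:
            next_round.append(remaining[-1])
        remaining = next_round
    return remaining


# Toy run: "templates" are numbers and the bigger number always wins.
finalists = run_bracket(list(range(10)), max, target=3)
```

Since each round roughly halves the field, the final count depends on where you start, which is part of why we can only say "around 600."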

Give or take a few hundred bajillion.