Let’s take a trip back to the heady days of 2009. Balloon Boy, Battlestar, Ms. Boyle. And, of course, MailChimp freemium. Three years ago, we started letting users sign up for free accounts they could keep forever. The effort has been a huge success, and these days we’re up to 2.5 million users. But when we first went free, the prospect really freaked out some of us doomsdayers. Wouldn’t we be opening the doors to the riffraff? How could we keep an eye on millions of customers to make sure no one was sending spam?
If even a few people got in the system and started abusing it, that could mean trouble for our reputation with major ISPs like Gmail, Hotmail, and Yahoo. No, we had to find a way to stop bad actors before they even sent a drop of email.
So we released Omnivore into the wild. Just as Taylor Swift was genetically engineered to handle both pop and country with aplomb, so was Omnivore engineered to predict the good and the bad senders. Omnivore eats all our data and uses it to see into the very soul of users. Not really. It’s just an AI model. But with over 10 years’ worth of MailChimp data generated at a rate of over 4 billion email address interactions a month, Omnivore has gotten pretty stinking intelligent. And fat. Think Churchill or Taft. Or an aging Elvis.
Let’s detail what Omnivore looks like these days and how it’s being used.
Getting Technical for a Moment
Omnivore sits on top of the Email Genome Project infrastructure. In its brain are internal profiles on nearly 2 billion email addresses. Those profiles record when each address has been sent to, when it has bounced, and when it has clicked and opened; whether it has “asdf” in its prefix or uses funky symbols; whether its domain is Gmail or some graphic design firm in Ljubljana; whether the address is stolen, for sale, or public; and so on. We keep all this nutty data in RAM (!), so right now we’re pushing one terabyte of utilized RAM in a key-value store called Redis that sits on top of a massive sharded Postgres implementation. What this means for you is that Omnivore can make judgments about users and their lists ultra-fast.
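To make the “Redis on top of sharded Postgres” idea a little more concrete, here’s a minimal sketch of a read-through cache, assuming addresses are hash-partitioned across shards. Plain Python dicts stand in for Redis and for the Postgres shards, and every name here (`shard_for`, `store_profile`, `get_profile`, the shard count, the profile fields) is hypothetical. It illustrates the general pattern only; it is not MailChimp’s code.

```python
import hashlib
from typing import Optional

# Hypothetical stand-ins: dicts play the role of the in-RAM cache (Redis)
# and of the durable, sharded store (Postgres). No real clients are used.
NUM_SHARDS = 4
postgres_shards: list[dict[str, dict]] = [{} for _ in range(NUM_SHARDS)]
redis_cache: dict[str, dict] = {}


def shard_for(address: str) -> int:
    """Stable hash so the same address always lands on the same shard."""
    digest = hashlib.md5(address.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def store_profile(address: str, profile: dict) -> None:
    """Write to the durable shard first, then refresh the cached copy."""
    key = address.lower()
    postgres_shards[shard_for(key)][key] = profile
    redis_cache[key] = profile


def get_profile(address: str) -> Optional[dict]:
    """Read-through lookup: serve from RAM when possible, else hit the shard."""
    key = address.lower()
    if key in redis_cache:
        return redis_cache[key]
    profile = postgres_shards[shard_for(key)].get(key)
    if profile is not None:
        redis_cache[key] = profile  # warm the cache for the next lookup
    return profile


store_profile("Fan@Example.com", {"hard_bounces": 0, "opens": 12})
redis_cache.clear()  # simulate a cold cache
print(get_profile("fan@example.com"))  # {'hard_bounces': 0, 'opens': 12}
```

Keeping the hot profiles in RAM is what makes per-address lookups cheap enough to run against an entire list at upload time; the sharded store underneath is the durable copy.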
In terms of models, Omnivore currently uses an ensemble of boosted trees and random forest models to score new users and new lists for the application. One of the main predictors for these models is something I’ve termed the “Badness Cumulative Distribution Function,” or BCDF. Sexy, I know.
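The post doesn’t say how the ensemble’s members are combined, so here’s a sketch under one common assumption: soft voting, where each member outputs a probability that the user is bad and the ensemble takes a (optionally weighted) average. The two stub models below are hypothetical toys standing in for a boosted-tree model and a random forest; they are not real trained models.

```python
from statistics import fmean
from typing import Callable, Optional, Sequence

# A "model" is any callable mapping a feature vector to P(bad) in [0, 1].
Model = Callable[[Sequence[float]], float]


def soft_vote(models: Sequence[Model], features: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Combine member predictions by averaging their probabilities."""
    if not models:
        raise ValueError("need at least one model")
    probs = [m(features) for m in models]
    if weights is None:
        return fmean(probs)
    if len(weights) != len(probs):
        raise ValueError("one weight per model")
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


# Hypothetical stand-ins for a boosted-tree model and a random forest.
def boosted_stub(f: Sequence[float]) -> float:
    return 0.875 if f[0] > 0.3 else 0.125


def forest_stub(f: Sequence[float]) -> float:
    return 0.625 if f[1] > 0.5 else 0.25


print(soft_vote([boosted_stub, forest_stub], [0.4, 0.6]))  # 0.75
```

Averaging probabilities tends to smooth out the mistakes any single model makes, which is the usual reason to ensemble different model families in the first place.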
Let me break down what the BCDF is. Say we’ve got a heretofore unknown user. Each email address on that user’s list is a piece of evidence about whether the user is good or bad, so if you import 80,000 addresses into MailChimp, that’s 80,000 pieces of evidence telling us just how trustworthy you are. We score each address on the list from 0 (good) to 1 (evil). There’s some secret sauce in how this works, so I’ll just replace our copyrighted genius with some Star Wars characters so you get the idea.
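Since the real per-address scoring is secret sauce, the sketch below is purely illustrative: it turns a few of the signals mentioned earlier (“asdf” prefixes, funky symbols, freemail versus custom domains, hard bounces, known-compromised addresses) into features and squashes a weighted sum into a 0-to-1 score with a logistic function. The feature names, the freemail set, the weights, and the bias are all made-up placeholders, not MailChimp’s values.

```python
import math
import re

# Hypothetical features and weights for illustration only.
FREEMAIL = {"gmail.com", "hotmail.com", "yahoo.com"}
WEIGHTS = {"keyboard_mash": 2.0, "odd_symbols": 1.0, "freemail": 0.3,
           "hard_bounced": 2.5, "known_compromised": 3.0}
BIAS = -3.0


def address_features(address: str, hard_bounced: bool = False,
                     known_compromised: bool = False) -> dict[str, float]:
    """Extract simple 0/1 signals from one address plus its history flags."""
    local, _, domain = address.lower().partition("@")
    return {
        "keyboard_mash": float("asdf" in local),
        "odd_symbols": float(re.search(r"[^a-z0-9._+-]", local) is not None),
        "freemail": float(domain in FREEMAIL),
        "hard_bounced": float(hard_bounced),
        "known_compromised": float(known_compromised),
    }


def badness_score(features: dict[str, float]) -> float:
    """Logistic squash into (0, 1): near 0 looks good, near 1 looks evil."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


yoda = badness_score(address_features("yoda@dagobah-design.si"))
jar_jar = badness_score(address_features(
    "asdf$$@gmail.com", hard_bounced=True, known_compromised=True))
print(round(yoda, 3), round(jar_jar, 3))
```

A clean custom-domain address lands near 0, while an address that trips several signals lands near 1.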
Once we’ve scored every address on a list, we combine those scores into a single cumulative curve: for each badness level between 0 and 1, the fraction of the list that scores at or below it. This curve is the user’s Badness CDF, and it’s one of many inputs that flow into our AI models. Here are two real-world examples of these curves:
In the graph above, the bad user has all sorts of Jar Jars and Billy Dees on their list. The good user is majority Yoda. That contrast is part of how the model kicks spammers out the airlock.
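Here’s a minimal sketch of turning per-address scores into a cumulative curve: an empirical CDF sampled on an evenly spaced grid over [0, 1], so every list, whatever its size, yields a fixed-length vector a model can consume. The grid size and the function name `badness_cdf` are assumptions for illustration; the post doesn’t say how the curve is actually discretized.

```python
from bisect import bisect_right
from typing import Sequence


def badness_cdf(scores: Sequence[float], grid_points: int = 11) -> list[float]:
    """Empirical CDF of per-address scores on an even grid over [0, 1].

    Entry k is the fraction of addresses whose score is at most
    k / (grid_points - 1).
    """
    if not scores:
        raise ValueError("need at least one scored address")
    if grid_points < 2:
        raise ValueError("need at least two grid points")
    ordered = sorted(scores)
    n = len(ordered)
    return [bisect_right(ordered, k / (grid_points - 1)) / n
            for k in range(grid_points)]


good_user = badness_cdf([0.05] * 9 + [0.9])       # mostly Yodas
bad_user = badness_cdf([0.95] * 8 + [0.1, 0.2])   # mostly Jar Jars
print(good_user[5], bad_user[5])  # share of each list scoring <= 0.5
```

A good user’s curve shoots up near 0 because most of the list scores low; a bad user’s curve hugs the bottom until badness gets close to 1.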
Too Long; Didn’t Read
In short, Omnivore lets us know things like:
- How many email addresses on a list are likely to be dead (hard bounces)? How does a list get oodles of dead addresses on it? If these people legitimately signed up for a newsletter recently, especially through double opt-in, why would their addresses be dead? The list is probably old or was collected with questionable methods.
- How many email addresses on the list have been stolen, sold, or scraped? Good people don’t sign up for a MailChimp account with a list that’s identical to the ones stolen from Sony or Ticketmaster. Just doesn’t happen.
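Both questions boil down to list-level rates. Here’s a minimal sketch, assuming we already have lookup sets of addresses known to have hard-bounced and addresses seen in stolen, sold, or scraped dumps. Those sets and the function name `list_risk_rates` are hypothetical stand-ins for the much richer profile data described above.

```python
from typing import Iterable


def list_risk_rates(addresses: Iterable[str], hard_bounced: set[str],
                    leaked: set[str]) -> dict[str, float]:
    """Share of a list that looks dead and share that looks compromised.

    `hard_bounced` and `leaked` are assumed to hold lowercase addresses.
    """
    normalized = [a.strip().lower() for a in addresses]
    if not normalized:
        return {"dead_rate": 0.0, "compromised_rate": 0.0}
    n = len(normalized)
    dead = sum(a in hard_bounced for a in normalized)
    stolen = sum(a in leaked for a in normalized)
    return {"dead_rate": dead / n, "compromised_rate": stolen / n}


upload = ["Yoda@dagobah.si", "lando@cloud.city", "jarjar@naboo.gov", "han@falcon.net"]
print(list_risk_rates(upload,
                      hard_bounced={"jarjar@naboo.gov"},
                      leaked={"jarjar@naboo.gov", "lando@cloud.city"}))
```

Rates like these are the kind of list-level signal that can sit alongside the BCDF as model inputs.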
How does a user experience Omnivore?
Hopefully, you don’t! At MailChimp, we wish we could just open the doors and let people flow through our system with as little friction as possible. Now that we have Omnivore humming along, we’re getting close to making that a reality.
When you upload a list into the system, Omnivore predicts whether you’re naughty or nice, and the prediction has to be lightning fast. Right now we run a user’s list data through 5 million naive classifiers in just 2 seconds. Just like your cranky grandma, Omnivore’s a quick judge of character. These predictions flow into a live dashboard that alerts us any time an evildoer gets punted. Here’s a snapshot of the dashboard:
If you’re nice, you shouldn’t hit any hurdles as you send newsletters or buy a larger account. Your predictions just bounce along the bottom of that graph, and Omnivore lets you slide through. If Omnivore thinks you’re in a middle ground and may have some problems, you might be advised to clean up your list or be allowed to send on a provisional basis. If Omnivore thinks you’re undoubtedly evil, you get its mechanical foot in your back as you’re shown the door.
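The three outcomes above map naturally onto two cutoffs on a predicted probability of badness. Here’s a minimal sketch; the cutoff values, the `Verdict` names, and `decide` are hypothetical placeholders, since the post doesn’t publish Omnivore’s actual thresholds.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "slide through"
    REVIEW = "clean up the list or send provisionally"
    REJECT = "shown the door"


# Hypothetical cutoffs on P(bad); the real thresholds aren't published.
REVIEW_AT = 0.3
REJECT_AT = 0.8


def decide(p_bad: float) -> Verdict:
    """Map a predicted probability of badness to one of three outcomes."""
    if not 0.0 <= p_bad <= 1.0:
        raise ValueError("p_bad must be a probability")
    if p_bad >= REJECT_AT:
        return Verdict.REJECT
    if p_bad >= REVIEW_AT:
        return Verdict.REVIEW
    return Verdict.PASS


for p in (0.05, 0.5, 0.95):
    print(p, decide(p).value)
```

Keeping a middle band, rather than a single pass/fail line, is what lets borderline senders fix their lists instead of being bounced outright.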
The great thing about this setup is that good folks are no longer slowed down by vetting in the same way they used to be. If we can verify you’re legit, you move through the system like offal on its way to a sausage casing. Yay!
Back to Freemium
Not just any company can open itself up to a freemium plan. MailChimp is great fun for users, but we’re dead serious about keeping the email ecosystem healthy for them. Over the past three years we’ve been refining our models, and in just a couple of months we’ll be releasing an even better version of Omnivore.
The neat part is that as we grow and gain more users and traffic, our models get smarter, which allows us to safely grow even larger. Our growth as a company and our reputation for great delivery are now intertwined, and I can’t wait for the future…when Omnivore becomes sentient and takes over the earth.