In late 2008, MailChimp Labs began Project Omnivore. Our goal was to build a massively scalable tool for our abuse team that could predict bad behavior.
The experiment started with an nVidia Tesla supercomputer, then grew to a cluster of Amazon EC2 servers that ran a genetic optimization program for 2 weeks nonstop and performed over 61 trillion email data comparisons.
This article shares some of the results of our experiment, and where the technology is taking us…
Why Is Omnivore Needed?
You know what the hardest part of running an Email Service Provider (ESP) is? Detecting ignorant spammers. They’re very different from evil spammers. See, it’s pretty easy to detect “evil” spam. You know, the pharmaceutical appendage enhancing stuff, phishing scams, and Nigerian prince (419) junk. Spam filters actually do a really good job of catching the evil stuff nowadays (not perfect, but pretty darn good, all things considered). And most ESPs employ some kind of spam filter (usually a variation of SpamAssassin) to scan outgoing emails in their queue. Either to prevent evil spam from tainting our reputation, or to “grade” the spamminess of a message.
But those spam filters aren’t designed to detect when an ignorant marketer doesn’t realize he’s spamming, and sends a mass email without permission (remember, the definition of spam is “unsolicited bulk email”).
Lack of permission, in an otherwise perfectly legitimate looking business email, is very subtle and much harder to detect.
I’m talking about when a well-meaning small business owner just wants to get the word out about his new store, and “blasts” an unsolicited email to a list he obtained from his local chamber or from a tradeshow. He didn’t mean harm, and he thinks he’s “just doing business,” but he’s actually spamming. While it’s a different flavor of spam, it’s still spam (again, see: definition of spam). This kind of spam is hard to detect because the content is often perfectly fine and doesn’t contain the normal keywords or traits that spam filters are trained to look for. But this flavor of spam can cost an ESP dearly, because it tends to generate the bad kind of engagement (high complaints, high bounces, high unsubs) that can get our IPs blacklisted by email gateways and ISPs.
How exactly does one detect the lack of permission in someone’s account? Across over 230,000 accounts? Sure, we’ve got a well-trained compliance team who can review a new user’s account, and in the blink of an eye, judge whether or not they’re going to cause trouble for us. But as good as we are, a human review team is just not scalable enough to deal with hundreds of thousands of senders. Not to mention that someone we might approve as a “good sender” can eventually become a “bad” sender. Rigorous, 24/7 account review becomes a necessity.
So our abuse desk decided long ago that we had to change the way we think about handling abuse. We began experimenting and analyzing massive amounts of data in 2008, which led to our list activity score feature. The idea here was to stop classifying customers as good or bad (and giving them access to special IP ranges for better deliverability), and start looking at their list management practices instead.
This then led to even more granular analysis: subscriber engagement tracking. We now treat email delivery differently, depending on the engagement level of your subscribers. Which is nice, considering ISPs are also looking at engagement to decide whose emails show up in the inbox or not. As a sender, you can segment your campaigns based on subscriber engagement, or clean out the inactive members.
But it was when we came up with the idea for our freemium plan that we knew we needed a completely automated, intelligent abuse detection system in place. Without a scalable abuse prevention system, there’d be no (scalable) way to protect the deliverability of our servers from the abuse that comes with free. So we stepped up our research and created Omnivore.
What Omnivore Does
Omnivore is a program that runs in the background and analyzes email campaign and user account data. Non-stop.
When it finds anything suspicious about a MailChimp user or his campaigns, it’ll do one of two things:
- Send the user a warning for something that looks problematic.
- Suspend a user’s account for something bad, send them a warning, and alert our abuse team to investigate the account.
What Omnivore Doesn’t Do
Most important of all, Omnivore doesn’t replace or reduce our human abuse desk team. And despite what some angry people out there might think (or tweet), Omnivore doesn’t shut down “totally innocent, opt-in users” with “absolutely no warning.” Humans review reports from Omnivore. If an account’s been suspended or flagged by Omnivore for problems, our team investigates. So long as the user is not obviously an evil spammer, we attempt to contact the sender with some advice or instructions for account reinstatement. If you’re curious about how our abuse team makes its decisions, check out these compliance tips.
How Omnivore Works
Chad, our lead engineer, headed up the Omnivore Project. I’ve asked him to provide some technical insight into how it all works.
Ben: Without revealing too much of the secret sauce, how does Omnivore work? I heard the team discussing something about “genetic optimization?”
Chad: Yes, in a nutshell, genetic optimization is a method of determining the best option from a large set of possible choices. When the universe of possibilities is large enough, it isn’t practical to just try all of them and pick the best – you have to use an optimization algorithm to narrow down on the best choices. Genetic optimization uses a process that roughly mirrors how natural selection processes can incrementally produce the fittest candidate over many generations, hence the name. You create a population of possible options, then breed and mutate the top performers until you get a good enough solution to stop. Assuming that choices that are similar to each other will perform similarly, this can get you to a good answer relatively efficiently.
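To make the "breed and mutate the top performers" idea concrete, here is a minimal sketch in Python. It is not Omnivore's code: the bit-string encoding, the population size, the mutation rate, and the toy fitness function (count the 1 bits) are all assumptions chosen only to keep the example short.

```python
import random

def genetic_optimize(fitness, n_genes, pop_size=30, generations=60,
                     mutation_rate=0.05, seed=0):
    """Toy genetic optimizer over fixed-length bit strings (illustrative only)."""
    rng = random.Random(seed)
    # Start from a random population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates and keep the top half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # Parents survive unchanged, so the best score never gets worse.
        population = parents + children
    return max(population, key=fitness)

# Hypothetical fitness: the number of 1 bits (the classic "OneMax" toy problem).
best = genetic_optimize(fitness=sum, n_genes=20)
print(best, sum(best))
```

In a real system the fitness function would be the expensive part, since it has to score each candidate against historical data. That is roughly why the search space gets sampled this way instead of being tried exhaustively.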
Ben: So how’d you apply that to email marketing and spam?
Chad: We took every bad campaign that had ever been shut down by our human reviewers as well as every bad campaign that managed to get through, and started looking for common patterns. We know a lot about every campaign that goes through our systems, as well as every list we manage and customer we sign up. Our human experts had a laundry list of the traits that scream “bad campaign”, but for this thing to scale we needed to be absolutely, mathematically certain. So we used a series of large scale genetic optimization tests running against every campaign we’ve ever sent to confirm which traits were predictive, and how predictive they were.
We did this for both negative reactions (bounces, unsubscribes, abuse complaints) and signs of engagement (opens, clicks) to give our team a complete picture of the likely results of any campaign, before the campaign is ever sent. If Omnivore sees something that it’s certain will be bad, it alerts the abuse desk to review the campaign before it’s let through the system.
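One simple way to ask "how predictive is this trait?" is to compare the rate of bad outcomes among campaigns that have the trait with the rate among campaigns that don't. The sketch below does only that. It is not the genetic optimization Chad describes, and the field names (`trait_x`, `bad`) and the records are made up for illustration.

```python
def bad_outcome_lift(campaigns, trait):
    """Ratio of the bad-outcome rate with the trait to the rate without it."""
    with_trait = [c for c in campaigns if c[trait]]
    without = [c for c in campaigns if not c[trait]]
    if not with_trait or not without:
        return None  # the trait never varies, so it cannot be evaluated
    rate_with = sum(c["bad"] for c in with_trait) / len(with_trait)
    rate_without = sum(c["bad"] for c in without) / len(without)
    if rate_without == 0:
        return float("inf") if rate_with > 0 else None
    return rate_with / rate_without

# Made-up toy records; "trait_x" and "bad" are hypothetical field names.
campaigns = [
    {"trait_x": True,  "bad": True},
    {"trait_x": True,  "bad": True},
    {"trait_x": True,  "bad": False},
    {"trait_x": True,  "bad": False},
    {"trait_x": False, "bad": True},
    {"trait_x": False, "bad": False},
    {"trait_x": False, "bad": False},
    {"trait_x": False, "bad": False},
]
lift = bad_outcome_lift(campaigns, "trait_x")
print(lift)  # 2.0: half of the trait_x campaigns went bad vs. a quarter of the rest
```

A lift near 1.0 would suggest the trait tells you little on its own, which is the kind of result that can contradict a reviewer's intuition.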
Ben: I hear you tried this on the machines at the office and they were too slow?
Chad: Right – even early small-scale tests would run for weeks before giving good results. The full tests would have taken years to complete. We ended up getting an nVidia Tesla and writing the process in highly-optimized C code, which was able to give us our preliminary results in a couple of hours. After we knew our algorithm was pretty close to what we wanted, we converted the process to a giant Hadoop Map/Reduce program running on a cluster of Amazon EC2 servers for about 20 days to get the final results for the first version. Smaller optimization processes still run continuously to test new ideas and refine the model.
Ben: So this is totally different than just checking all outgoing campaigns with a spam filter?
Chad: Yes. It’s using the detailed sender information that we have as an ESP to look for that permission “gray area” mentioned above.
More importantly, we needed to be sure that Omnivore would continue to be efficient and predictive as our customer base grew and morphed after the free program was put into place. Unlike static rules or blacklist-based methods of detecting spam, all of the major Omnivore systems are learning algorithms that keep up with changing user behavior without losing their predictive power.
Ben: After all is said and done, any fun or surprising observations to share?
Chad: Some traits and keywords that we thought we should focus on were actually poor predictors of bad behavior. For example, highly-targeted campaigns don’t do much better than other campaigns when it comes to abuse or unsubscribe rates. Other things that you’d think are totally irrelevant at first glance turned out to be effective predictors, like the length of the subject line.
Ben: So a subject line that’s too short, or um — too long — would be a sign of trouble?
Chad: Something like that. Keep in mind it takes a combination of traits that add up in order for Omnivore to determine “this looks like lack of permission.”
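As a rough illustration of traits "adding up," the sketch below sums a weight for each trait that is present and maps the total to an action. The trait names, weights, thresholds, and action labels are all invented for the example. The real model is not described in this post and is presumably more involved.

```python
def risk_score(traits, weights):
    """Sum the weights of every trait that is present."""
    return sum(weights[name] for name, present in traits.items() if present)

def decide(score, warn_at=1.0, suspend_at=2.0):
    """Map a combined score to an action; the thresholds are invented."""
    if score >= suspend_at:
        return "suspend_and_alert"
    if score >= warn_at:
        return "warn"
    return "none"

# Hypothetical trait weights; no single trait reaches the warning threshold.
weights = {"trait_a": 0.5, "trait_b": 0.75, "trait_c": 1.0}
one_trait = {"trait_a": True, "trait_b": False, "trait_c": False}
all_traits = {"trait_a": True, "trait_b": True, "trait_c": True}

print(decide(risk_score(one_trait, weights)))   # none
print(decide(risk_score(all_traits, weights)))  # suspend_and_alert
```

The point of the toy weights is that a single odd trait does nothing on its own, while several together cross a threshold.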
Ben: Any other interesting observations?
Chad: When we started this process, we went straight to our team of human reviewers to show us the patterns that they were looking at when evaluating a new customer. A lot of it was right on the money – particular industries definitely have a profile, and the language used when describing where permission came from is crucially important. However, some of the patterns turned out to be less predictive, like having a mailing address displayed prominently in the content and some of the other details of CAN-SPAM compliance. It was also a bit surprising to discover exactly how bad most spam filters are at predicting permission issues. Whether or not a campaign passes any of the free or commercial spam filters generally has little impact on its predicted outcomes.
Results So Far
As MailChimp scales and sends more campaigns, Omnivore will collect more data and adapt. It’s by no means complete. There are switches and knobs we haven’t even turned on yet. We’re currently running some of Omnivore’s scanning in “observation mode,” and not letting it act on anything. As it gets smarter, we’ll gradually activate more functionality and grant it more decision-making power.
But so far, here are some of the results:
- As of January 6, 2010, Omnivore has automatically sent 19,581 warnings to 9,349 users for exhibiting bad behavior. Of course, we also include tips and pointers on how they can change their ways.
- Omnivore has automatically suspended 2,249 users since September 1st 2009.
- 861 of those users ultimately had to be shut down. We hate losing customers (because we love money), but no customer is worth jeopardizing the deliverability and reputation of our entire system.
Looking ahead (literally)
We built Omnivore because we wanted to change the way we think about abuse. The project involved so much data crunching that it resulted in some interesting byproducts. Our subject line suggester is one example, as well as the engagement ranking and segmenting tools we mentioned earlier.
But Omnivore is learning more every day, and is actually getting good at predicting not just bad behavior, but good behavior too. Here’s a snapshot from our internal dashboard:
As you can see, Omnivore’s predicting open and click rates for this particular campaign, along with the “bad” stuff. As we feed it more data, the margin of error narrows, making it a powerful new feature we could be offering to our customers one day.
Omnivore’s predictive reporting is changing the way we deal with abuse, but might end up changing the way we think about email marketing in general.


I have to say that that is just insanely cool. Kudos to the Monkey-folk. Wow!
“…making it a powerful new feature we could be offering to our customers one day.”
Sounds like I could create a campaign and ask Omnivore to “Please Predict my Campaign” (before sending) and it will give me an estimate of open rate, click through, headline recommendation, etc.
That would be pretty nifty.
Thanks for the insight and detail about this project.
Hi Bob – The plan is to offer this as a feature one day, but our focus right now for Omnivore is abuse prevention.
Sounds very cool. I look forward to trying it out.
This is really cool functionality. Great idea!
It’s worth noting that very few ISPs look at clicks, opens and other measures of “engagement” when making inbox placement determinations. They look at the basics (complaints, unknown users, spam traps) and more sophisticated techniques (ham/spam voting panels, “Not Spam” reports, “This is Spam” reports from reliable reporters).
However, in our experience, the engagement data is highly predictive of problems with the stuff that ISPs actually measure.
Really good stuff.
Thanks for the clarification, George!
You should consider using other models, like logistic regression or SVMs, for prediction. They tend to converge to better solutions much faster than genetic algorithms. I dare you to do a controlled comparison!
@Joseph – Heck no. Last time we tried a controlled comparison on a logistic regression (*also* on a dare), our floopjack capacitor totally pizzled inwards on itself (topologically speaking), forcing us to evacuate quadrant 5 of the MailChimp Lab. Yep, we have five quadrants.
That’s only because you did not feed it the requisite 1.21 jiggawatts!
LOL. Thanks!
Yup, but maybe the flux capacitor wasn’t configured properly?
suds
I find this really surprising. Did you try adding an L2 regularization term?
I’m so happy that you’re using this not just for abuse, but to go beyond to things like suggesting how to create better campaigns. That’s awesome!
Ben –
Yikes! That is … well … uh … VERY FREAKING COOL. Once you’ve got it “perfected” (ha ha) and in V 5.0, this would be an incredible resource for the industry. If you chose to go that route.
(We’d love it at Blue Sky Factory).
On another note, I’m hoping we get to meet face to face some day soon. I hung out with Amy in Miami last week. She’s pretty cool. A good nab for MC!
DJ Waldow
Director of Community, Blue Sky Factory
@djwaldow
DJ, I wish I had 1/100th of your enthusiasm! I’m more of a nerd who sends email, and not an “email marketer.” So I don’t go to many email events. Amy more than makes up for that! :-)
Ben –
I try! All I’m saying is that it would be cool to meet some day. You are like Mr. Snuffleupagus. Ha ha.
http://en.wikipedia.org/wiki/Mr._Snuffleupagus
DJ Waldow
Director of Community, Blue Sky Factory
@djwaldow
Once again the MailChimp team Shines! It’s great to know that you’re taking additional measures to ensure the deliverability of our campaigns through your IPs.
Great idea! You should consider licensing it out to other ESPs if you don’t mind empowering some competitors. Disagree w/ George on ISPs using opens and clicks — the top ones do (e.g. AOL & Yahoo) and have been explicit and public about it – so using engagement data (the same types of open & click metrics mailers use for performance metrics) is brilliant and likely to become strong predictors of deliverability/reputation.
Only Yahoo has been public about it – and uses those sorts of metrics (opens, views, clicks). AOL doesn’t use those sorts of metrics. They say “engagement” but they mean other sorts of metrics.
Engagement (click, opens, etc.) is clearly important to track but isn’t a broadly direct determinant of delivery rates – at least yet.
Actually George you are wrong. We went through a detailed review with AOL and they absolutely do use activity / engagement (such as Opens, Clicks, and Replies) in determining placement! Clearly the ESP (and not a DSP such as you or PV) is in the best position to measure this and Mailchimp is right-on in leaping on this.
Thanks, Deirdre. We have some perfecting to do before we license this out to anybody, but that’s something we’re considering. A few ESPs have already reached out to me. In terms of empowering anybody, we’re leaning a little more toward ISPs than ESPs right now (like to help make our little email ecosystem a little cleaner) but it’s still early, and there’s a lot more optimization to be done.
Hi Ben, the larger ISPs already have something similar … although they term it adaptive-learning filters, but your small & mid size ISPs might be a great audience. I hazard to say though … a lot more money might be made from licensing this out to ESPs *and* in house mailers….as there are more of them and the interesting marketing benefits that might evolve from this (e.g. a chance to create a better campaign before you deploy to the ISP) would be incredibly appealing to them. Additionally, I would envision that the ISPs would want licensed software …thus, more a flat rate price model …and, if this is market preference, then the millions of mailers & hundreds/thousands of ESPs represent a larger market. Would love to chat sometime on this!
Thanks, Deirdre. Omnivore still has a lot of learning to do before we can even consider offering it to other ESPs, but the inquiries we’re getting certainly show a lot of demand. We just have to get the predictions better before we even think about that.
This is impressive, sir.
As a competitor of yours, I must say this is really cool! I look forward to hearing more about this in the future as it may become a service in and of itself you could commoditize.
We do something similar at Bronto (http://brontoversity.com/2009/05/12/dont-let-your-campaign-jump-the-shark-keep-your-sender-ratings-high/) where we actually track the usual stats for deliverability/abuse (as George mentions, complaints, bounces, volume, burst in frequency) but have also augmented the algorithm to take into account user engagement metrics as well. But, this definitely gives us something to think about and keeps us (and dare I say other ESPs) on our toes. Predictability comes with previous behavior of the mailings, though, and has not quite made it to the pure predictive measures you have in place where it sounds like a client can, without even “sending” an email, have a quantitative forecast of what the outcome will be.
Thanks for sharing!
-Chris
Thanks, Chris. Re: engagement scoring, sounds like great minds think alike.
Nice work! sounds very cool.
AREA TESTED | DESCR OF TEST | SCORE
comment | FLATTERED_BY_COMMENT | 5.12
thank you sir!
what’s a Jiggawatt?? :))
1.21 Jiggawatts was the fictional amount of power required to run Doc Brown’s time machine.
I thought it was a mispronunciation of gigawatts, much like people say cobalt instead of COBOL.
I’m sure not a lot of people know that MailChimp is a part of The Rocket Science Group – you’re doing your company name a lot of justice with this project ;-)
VERY IMPRESSIVE!
Ben,
This is awesome. As a client side guy, I would love to be able to predict things before they happen. It’s great to see that some of the education in terms of effective email practices is being automated in such a fashion that people are notified and action is taken.
Andrew
Pure Rocket Science:) Cool!
There had to have been a time during Omnivore development when one of the programmers said “sheesh, wish we would have just let users have smtp auth”.
Amazing! Just pure amazing!
Where did you get those chimps and what do they have for breakfast?
Wow, I didn’t understand all of that, which is really cool.
I have a plebeian end-user type question. I have a client that has been dipping their tiny toe in, I’ve set up sign up forms, explained CAN-SPAM, templates etc but so far they’ve just hand-forwarded emails.
We’re getting ready to do a great big ‘new newsletter’ announcement to existing firm contacts but they really want to auto-subscribe them with an easy/clear opt-out link at the top. We’ve been seeing this a lot lately from others so I have a hard time saying no.
Is Omnivore going to eat us for lunch??
Mahalie, in that situation, you’re dealing with presumably paying clients of the firm who do have a business relationship. It’s a bit of a gray area. So in the initial “big announcement” newsletter, be sure to start it off with some good, “remember us?” kinda text. If Omnivore decides to suspend your account for abuse complaints, a human is notified internally. If you then came in and asked for a human review, all it takes is a quick evaluation to get reinstated.
This is really amazing, I can’t imagine how you managed to get something like this working at acceptable times with such amounts of data.
I know you won’t say how you did it, but maybe you can at least satisfy part of my (our?) curiosity:
Where does the name (“Omnivore”) come from? Anything to do with the Carnivore project?
FYI SpamAssassin link near top of this link is broken.
Darn, looks like that broke when we moved our blog. Fixed now. Thanks!
Omnivore is slick, but as a former user, my single biggest complaint about Mailchimp is that they enable automated shutdowns–described here–without providing humans to entertain responses from “ignorant” customers on the receiving end. It’s pretty rough when Omnivore halts your autoresponders and there’s no one on the other end to help.
Our system suspends accounts, then it goes into a queue for human review. It’s up to a human on our compliance team to make the decision to permanently shut down an account or not, and the process involves asking the customer certain questions about their list management practices. It’s only under extreme circumstances that an account is completely shut down by an algorithm.
I’d be interested in knowing how this improves over time. An estimated open rate of 5 to 40% pretty much describes any campaign these days. Only a totally outstanding campaign will see over 40% open rate and you’d have to churn out a boatload of dreck to see under 5%.
How did you model your fitness function to know you’re not getting overly broad results?
So that was a year and a half ago. Has this thing changed its name to Skynet yet? Awesome project!
It’s great to see what MailChimp is doing with this concept. Not only is it good for MailChimp, but more importantly it’s good for the people that use MailChimp’s services.
But I hope that MailChimp continues to keep humans in the loop as the final decision authority. Robbie’s comment above is a good reminder that even the perception of an automated account shutdown is not good.
I wish you could share Omnivore’s scores with false positives.
Good Work Monkeys!