Every once in a while, we ask some random questions about email here at MailChimp. Questions like:
- Remember that blog network that just got hacked, and how all their user data was posted to the public? Wonder if any bad guys are importing that email list into MailChimp anywhere. Would be nice to shut them down, and maybe even report them to the FBI.
- Hey, what if we purchased some spam lists ourselves, and just used them to scan all users’ imported lists for high levels of correlation?
- Across all the emails we’ve ever sent, what’s a realistic “average shelf life” for a subscriber’s engagement?
- Is there a *real* “best time” and “worst time” to send email? Of course, people will always say “it depends,” but what if we actually crunched (all) the numbers anyway? Would we find interesting patterns?
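To make the spam-list question a little more concrete, here is a minimal Python sketch of what "scanning an imported list for high levels of correlation" could look like. It is not how MailChimp actually does it. The function names (`normalize`, `overlap_ratio`, `looks_suspicious`) and the 0.5 threshold are made up for illustration, and "correlation" is simplified to the share of imported addresses that also appear on a known spam list.

```python
def normalize(address: str) -> str:
    # Assumption: trimming whitespace and lowercasing is enough to compare
    # addresses. Real-world matching may need more care.
    return address.strip().lower()


def overlap_ratio(imported, known_spam):
    """Return the fraction of imported addresses found on the known spam list."""
    imported_set = {normalize(a) for a in imported if a.strip()}
    if not imported_set:
        return 0.0
    spam_set = {normalize(a) for a in known_spam}
    return len(imported_set & spam_set) / len(imported_set)


def looks_suspicious(imported, known_spam, threshold=0.5):
    # Hypothetical cutoff: flag the import when at least half of it overlaps.
    return overlap_ratio(imported, known_spam) >= threshold
```

A flagged import would still need a human look. A big overlap is a hint, not proof.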
And some questions can be real dilemmas, like:
- If user X imports a list, and we find a bunch of hard bounces, why don’t we prevent those bad email addresses from being imported into our system by user Y? (After all, lots of bounces can lead to delivery problems at some of the big ISPs.)
- If we know a particular subscriber is a habitual (false) complainer, should we keep allowing them to subscribe to lists that we host? Even if there’s double opt-in proof?
MailChimp Engineers: “Shut up, already. Go look it up yourself.”
I guess all these questions finally annoyed our engineers enough to make them set up The Email Genome Project, which scans MailChimp’s 600,000 users, the hundreds of millions of subscribers they manage, and the 40 million (and growing) messages they send every day for nuggets of information that we can use to improve our deliverability and train our Omnivore abuse prevention algorithms.
The fun part of all this? The nerds get to play with cool toys…

