Every once in a while, we ask some random questions about email here at MailChimp. Questions like:
- Remember that blog network that just got hacked, and how all their user data was posted to the public? Wonder if any bad guys are importing that email list into MailChimp anywhere. Would be nice to shut them down, and maybe even report them to the FBI.
- Hey, what if we purchased some spam lists ourselves, and just used them to scan all users’ imported lists for high levels of correlation?
- Across all the emails we’ve ever sent, what’s a realistic “average shelf life” for a subscriber’s engagement?
- Is there a *real* “best time” and “worst time” to send email? Of course people will always say “it depends” but what if we actually crunched (all) the numbers anyway? Would we find interesting patterns?
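To make the "scan imported lists against known spam lists" question a bit more concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption for illustration: the function names, the normalization step, and the 50% threshold are hypothetical and are not how MailChimp actually does it.

```python
def normalize(address: str) -> str:
    """Lowercase and trim an address so trivial variations still match."""
    return address.strip().lower()


def overlap_ratio(imported, known_bad):
    """Return the fraction of imported addresses that appear in known_bad.

    Both arguments are iterables of email address strings.
    Blank entries in the imported list are ignored.
    """
    imported_set = {normalize(a) for a in imported if a.strip()}
    if not imported_set:
        return 0.0
    bad_set = {normalize(a) for a in known_bad}
    return len(imported_set & bad_set) / len(imported_set)


def is_suspicious(imported, known_bad, threshold=0.5):
    """Flag an import when its overlap meets the (hypothetical) threshold."""
    return overlap_ratio(imported, known_bad) >= threshold
```

A real system would also have to pick a sensible threshold and cope with lists far too large for in-memory sets, which is presumably where the big servers come in.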
And some questions can be real dilemmas, like:
- If user X imports a list, and we find a bunch of hard bounces, why don’t we prevent those bad email addresses from being imported into our system by user Y? (after all, lots of bounces can lead to delivery problems at some of the big ISPs)
- If we know a particular subscriber is a habitual (false) complainer, should we keep allowing them to subscribe to lists that we host? Even if there’s double opt-in proof?
MailChimp Engineers: “Shut up, already. Go look it up yourself.”
I guess all these questions finally annoyed our engineers enough to make them set up The Email Genome Project, which scans MailChimp’s 600,000 users, the hundreds of millions of subscribers they manage, and the 40 million (and growing) messages they send every day for nuggets of information that we can use to improve our deliverability and train our Omnivore abuse prevention algorithms.
The fun part of all this? The nerds get to play with cool toys…
First, they set up a server that’s used for some occasional pre-test “heavy lifting.” To be honest with you, I don’t think they really needed this one. I’m pretty sure they got it for fun. Whatever the case, here are the specs:
- 4 x Xeon X7550 CPUs, each with 8 cores @ 2.0 GHz with HT
- 128 GB of DDR3 RAM
- Hardware BBU-backed RAID 10 of Intel X25-E SLC SSDs
And then they set up another server that is not quite as impressive (with “only” 2 × 6-core Xeons for a total of 24 threads, and 36 GB of RAM). This one was configured more for storage, with a 12-disk RAID 10 of 15k SAS drives giving ~4 TB of usable space.
I pretty much have no idea what I just typed there. Sounds impressive, though. The monthly bill certainly made an impression on me.
But hey, all in the name of R&D. If they wanna use the toys to play Doom (people still play that game, right?) or test their password cracking skills, it’s all good.
Anyway, the high level goal of the Email Genome Project is to help improve the email ecosystem. Specifically, we want to provide answers — fast. The more we learn about email, the better we can help prevent the abuse of it.
We’ll talk more about our findings here on the MailChimp blog soon.
For now, to get a feel for what kind of data our Email Genome Project can produce, you should sign up for Dan Zarrella’s “Science of Email Marketing” webinar.
He asked us a few questions about email marketing. We scanned 10 billion emails, and gave him some answers:
See, it’s things like this that make MailChimp rock! With the vast treasure trove you guys have, there is so much opportunity for a small group of smart people to truly revolutionize email marketing.
Love the part about leveraging hard bounce data across your subscriber base. This is a feature that as a customer I’d even pay more for since it would enhance my deliverability score with almost no interaction on my part.
Keep up the great work, and keep asking the hard (and interesting) questions! Big props also to Ben for giving his engineers creative rein to explore the cool stuff.
Okay but, bottom line, your compliance team is still hanging on a few very limited parameters…
I remember a post of yours from a year ago about Tesla computer setups at MC for the same kind of calculations…
In the end, you don’t care about important metrics (open %, click rate, & so on)…
It seems like you’re stuck on complaints.
My previous startup (a BtoC dating system) was even worse than my current one (a BtoB invoicing app).
After much thinking, I feel like Europeans just use the “Report as spam” button as a trash can. Maybe it’s different in the US?
Actually, using the “Report as spam” button as a proxy for “go away” is pretty common throughout the email ecosystem. We do also look at all of your metrics when making a compliance decision, but complaints remain very important because they matter so much to the ISPs. Generally, if you’re in the bottom 0.5% of our customer base in any of the negative metrics we track, you’re likely going to get warned. If you’re in the bottom 0.05% of anything, you’re going to get reviewed and potentially blocked. If you’re actually hearing from our compliance team regularly, chances are you’re really doing something wrong, intentionally or not.
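For anyone who wants to see that rule of thumb spelled out, here is a rough sketch. Only the 0.5% and 0.05% cutoffs come from the comment above; the function names, the "higher is worse" convention, and the way ties are counted are all assumptions made for illustration, not the compliance team's actual logic.

```python
def bottom_fraction(value, all_values):
    """Fraction of customers whose metric is at least as bad as `value`.

    Assumes a negative metric (e.g. complaint rate) where higher is worse.
    """
    if not all_values:
        raise ValueError("all_values must not be empty")
    at_least_as_bad = sum(1 for v in all_values if v >= value)
    return at_least_as_bad / len(all_values)


def compliance_action(value, all_values, warn=0.005, review=0.0005):
    """Map one customer's metric to 'review', 'warn', or 'ok'.

    warn=0.005 is the bottom 0.5%; review=0.0005 is the bottom 0.05%.
    """
    fraction = bottom_fraction(value, all_values)
    if fraction <= review:
        return "review"
    if fraction <= warn:
        return "warn"
    return "ok"
```

In practice a check like this would be run once per negative metric, since the comment says the cutoffs apply to "any of the negative metrics we track."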
[...] on their account, because their list was evidently being used by multiple competitors. We used our Email Genome Project servers to search our system, and found at least three other users — in the same industry and [...]
I have a question
On the charts:
Effect of Time of Day on Clickthrough Rate
Effect of Day of Week on Clickthrough Rate
Most Clicked Subject Line Words
Most Abuse Reported Subject Line Words
I need to know the demographics behind these charts. Which continents are the people surveyed from? And if possible, can I have the percentages by continent? This is very important for a project that I need an answer on.
This is from Dan Zarrella’s Webinar on HubSpot
So many of my store customers tell me that they never get my emails. My open rate has gotten lower over the last year. I ran a test: I sent the same email to one group that had opened the previous email and to a second group that had not. The open rates were vastly different. The same folks always open the emails, period. The rest must never get them, or they go to the spam folder. They’ve all requested to be on my list. I’ve tried smaller groups; that didn’t work. Grouping them by server was no help. Even the latest email, sent only to totally new sign-ups, was opened by just one person. I’m almost thinking that sending emails is a useless waste of energy. Suggestions?
Yes, my suggestion is to contact MailChimp support rather than commenting randomly on an unrelated blog post. There are too many possible reasons for low open rates to go into here. MailChimp support can best advise you.
Hi Pamela, all this means is you have some really loyal subscribers, and you have some “meh” subscribers. IMHO, email’s best for keeping an ongoing relationship going with those loyal ones. It’s not a waste of time, and you shouldn’t abandon email; just maybe think of it as a different kind of tool. I’ll contact you via email with some ideas.
[...] what if you fat-finger the domain? This past month, I was doing some big data wrangling for our Email Genome Project, and I saw something funky going on with fat-fingered domains of large ISPs and freemail [...]
[...] look at how people read email and what’s working in email marketing today. MailChimp’s Email Genome Project scans the messages its 600,000 users send, looking for ways to improve its service and curb spam. [...]