Aug 28, 2013

Making Big Data Accessible in MailChimp

Back in v8.6, we launched a feature called "Discover Similar Subscribers." It’s our first major use of that "big data" stuff all the kids are talking about, and we’re pretty excited about it. Our data scientist has been blogging (and speaking) about how we use data science to protect the email ecosystem, and our head of UX has been talking about our use of data to improve our interface. Indeed, tons of companies out there are trying to find ways to use big data in their business, but it’s extremely difficult to wrangle vast amounts of data, let alone make it accessible to customers. Which is what we’ve done here (we actually made it a button). This is a feature that you can use yourself. Here are some example scenarios for Discover Similar Subscribers.

Discovering more VIPs on your list

Over the years, I’ve found and marked about 15 people on our customer list as VIPs. Half of them are well-known designers and creatives. The other half are tech journalists or people who work at a tech publication of some sort. I’ve always wondered if there were more VIPs on my list, but I’ve never had the time to sift through all 3.5 million subscribers (I knoooow, "excuses excuses").

With this new feature, I can copy/paste that handful of tech journalists from my VIP list, and then ask MailChimp to "Discover Similar Subscribers" on my list:



I simply press the "Discover" button, and in 4 minutes and 38 seconds, MailChimp analyzes my 3.5 million subscribers, compares them with the 2+ billion other emails across our system, and finds 5,221 subscribers on my list who have interests similar to those VIPs. You’re right–5,221 subscribers is still too many for me to sift through, but I can easily whittle this down in our new segmentation builder. Here, I narrowed it down to people in the discovered segment who are also in the media and publishing industry, and who have a member rating greater than 2 stars.



Notice that "Discovered Segments" are saved? We also added "saved segments" to MailChimp in v8.6, so you can use them to build new segments like this.
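The narrowing step described above boils down to filtering a saved segment on two criteria. Here is a minimal sketch of that idea in Python. The `Subscriber` class, its field names, and the `refine` helper are hypothetical and only for illustration; they are not MailChimp's segmentation builder or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscriber:
    # Hypothetical record; field names are assumptions for this sketch.
    email: str
    industry: str
    member_rating: int  # stars


def refine(segment, industry, rating_above):
    """Keep subscribers in `industry` whose rating is strictly greater than `rating_above`."""
    return [
        s for s in segment
        if s.industry == industry and s.member_rating > rating_above
    ]


# Made-up stand-in for a saved "discovered segment".
discovered = [
    Subscriber("a@example.com", "Media and Publishing", 4),
    Subscriber("b@example.com", "Media and Publishing", 2),
    Subscriber("c@example.com", "Software and Web App", 5),
]

vip_candidates = refine(discovered, "Media and Publishing", rating_above=2)
print([s.email for s in vip_candidates])
```

Only the first subscriber survives: the second has the right industry but a rating of exactly 2, and the third is in a different industry.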

Specifying the INDUSTRY criteria above resulted in a segment of 227 members. That’s a lot easier to browse than 5,221. Lo and behold, I discovered subscribers on my list from @pandodaily and a few other publications.

Some of you might be thinking, "Great, now you can blast those people press releases!" If so, please unplug your computer now. I can think of no better way to lose a customer than to "blast" them with–well, anything. I’ve never actually reached out to my VIPs for anything. I simply pick customers I like, then watch whether or not our content resonates with them (I use our Golden Monkeys app to do that).

Finding nerds in a haystack

Let’s say I want to tell people about Mandrill, our transactional email service. I could go for maximum awareness by sending an email to all 3.5 million MailChimp users, but then I’d have to unplug my computer and punch myself in the gut for being a jerk. Instead, let’s start with a segment like this:



Basically, I’m looking for people who match the profile of an engaged Mandrill customer. This gives me 22 people who’ve already integrated MailChimp with Mandrill, AND who are relatively engaged, AND who clicked this previous campaign that we sent to a segment of existing Mandrill users.

Next, I choose "Discover Similar Subscribers" from the dropdown:



and after about 4 minutes, MailChimp suggests 7,286 subscribers on my list who are similar to those initial 22. Skimming through the 7,286 subscriber profiles, I can see it includes people who work in dental clinics, libraries, and advertising agencies. Hmm, this still feels too broad. So first, I’m going to save this "discovered segment" as "Similar to Mandrill customers who’ve integrated w/MC." Then I’ll use the segmentation builder to narrow things down a little further:


See how I’ve added the "INDUSTRY" criteria again? These are people who’ve signed up for MailChimp and indicated that their industry is "Software and Web App." It’s self-reported, but likely accurate, since their primary motivation is to compare their results to their peers.

Anyway, that gives me 975 customers that I can email about Mandrill. By the way, we’ve recently made it easy to browse through subscribers within segments. Just click on a member profile, then look for this navigation:


I’m seeing lots of SaaS companies in this segment, so I’m feeling confident that I’ve got a good batch of people to send to.

In my case, I’m keying off of user-provided data, plus user behavior (clicks in a past campaign). I’m lucky to have years of data collection behind me. If you haven’t yet collected that kind of data, you could theoretically do something similar with your advertising. If you track where subscribers sign up from (like a tech blog, or an ad you ran in a tech newsletter via Launchbit), you could take that seed list and look for other techies on your list. Somewhat related: In many cases, MailChimp automatically tracks signup referrers.

Expanding segments

What if I wanted to send an email to people about Craftmonkey, a nice little integration in our Connect Directory that helps Etsy sellers build pretty newsletters? For starters, I can find customers who’ve integrated their MailChimp accounts with our own Etsy integration (something we built a loooong time ago).



That finds me 3,222 people. Not too shabby, but surely there are more crafty and artistic people on MailChimp’s customer list than that.

So I used the "Discover Similar Subscribers" feature and got 382,640 people who have similar interests to those engaged Etsy integrators. That’s a bit too many for my comfort (I need the confidence that I’m not going to piss off a bunch of people by sending irrelevant stuff). These 382,640 people aren’t necessarily artists themselves–they’re all just subscribed to email newsletters about art, crafts, or maybe to common Etsy sellers or eCommerce sites. So I’ll refine that down to people in the "Arts and Artists" industry:



That gets me down to 15,199 subscribers, which is quite nice. Browsing through these subscribers, I’m seeing a lot of artistic-looking email addresses (pottery sites, jewelry sites, knitting, fashion, etc), plus some personal email addresses (names mixed with words like "whimsy" and "creative").




From their member profiles, I can click over to their websites and see that most have Etsy badges. I notice that some don’t link to Etsy, but they do link to Pinterest. Which makes sense. So I’d probably edit my email campaign to include content about our Pinterest integration as well.

While I’m at it, I might also include a reference to our Online Seller’s Guide. That would make the content a little more broad, and more likely to be relevant to the segment. Again, I could’ve sent this email to all 3.5 million users on my list, but roughly 3,485,000 of my customers would probably find it irrelevant. Segmenting my list reduces irrelevance, and data science helps me target.

Tips and advice

This feature builds on top of a MailChimp Labs experiment that we introduced in 2012 called Wavelength, which basically looked at subscriber overlap (people who are subscribed to similar newsletters). Since then, we’ve been finding ways to improve Wavelength’s algorithm and make its queries so fast that we could just pull it into "MailChimp Proper." Analyzing a list of millions takes minutes, and a list of thousands takes seconds. Because the feature’s a bit intensive (and works better with lists of at least a few thousand anyway), it’s only available to our paying customers.

We all know that just because two people are subscribed to the same list, it doesn’t necessarily mean those people are alike. But we use some proprietary math stuffs that I cannot even begin to explain (expect another post from our data science team with details) to generate a segment that we’re reasonably confident in.
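The real math is proprietary and not described in this post, so the following is purely an illustration of what "subscriber overlap" can mean, not how the feature works. This sketch assumes each subscriber is represented by the set of newsletters they receive and scores candidates by their average Jaccard similarity to the seed list. The function names, the 0.5 threshold, and the sample data are all made up for the example.

```python
def jaccard(a, b):
    """Share of newsletters two subscribers have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def discover_similar(subscriptions, seed, threshold=0.5):
    """Return (email, score) pairs for non-seed subscribers who resemble the seed.

    `subscriptions` maps email -> set of newsletter names (hypothetical data shape).
    The score is the mean Jaccard similarity against every seed member.
    """
    seed_sets = [subscriptions[e] for e in seed if e in subscriptions]
    if not seed_sets:
        return []
    scored = []
    for email, newsletters in subscriptions.items():
        if email in seed:
            continue
        score = sum(jaccard(newsletters, s) for s in seed_sets) / len(seed_sets)
        if score >= threshold:
            scored.append((email, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example data.
subscriptions = {
    "seed1@example.com": {"tech-weekly", "startup-news"},
    "seed2@example.com": {"tech-weekly", "startup-news", "gadget-digest"},
    "x@example.com": {"tech-weekly", "startup-news"},
    "y@example.com": {"knitting-monthly"},
}
seed = {"seed1@example.com", "seed2@example.com"}

result = discover_similar(subscriptions, seed)
print(result)
```

Here `x@example.com` shares most of its newsletters with both seed members and is returned, while `y@example.com` shares none and is dropped. A plain overlap score like this also shows the weakness mentioned above: sharing a list is only weak evidence that two people are alike.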

Suffice to say the "seed list" you provide is very important. Ideally, it needs at least 10 email addresses of people who are very similar in interests. A list of people who simply bought the same sheepskin carseat cover wouldn’t quite generate a list of other people who’d buy that same carseat cover, unless there are a bunch of awesome sheepskin carseat cover newsletters out there I don’t know about. But a list of rabid car fanatics might generate a list of other car fanatics who are subscribed to car fan newsletters.

Even though MailChimp scans our entire system, it won’t return email addresses from lists that don’t belong to you. It only returns addresses that are subscribed to your own lists. Furthermore, you can copy-paste addresses into MailChimp from other MailChimp accounts, but MailChimp will only accept and analyze the ones that are on your list. Find lots more details about that in our knowledge base.
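The two seed-list rules above (addresses must already be on your list, and you ideally want at least 10 of them) can be pictured as a set intersection plus a size check. This is a rough sketch under stated assumptions: `usable_seed` is a hypothetical helper, and the lowercasing and whitespace trimming are my own choices, not documented MailChimp behavior.

```python
MIN_SEED_SIZE = 10  # "at least 10 email addresses", per the advice above


def usable_seed(pasted_emails, my_list):
    """Return (accepted addresses, whether the seed meets the suggested minimum).

    Only pasted addresses that are already on `my_list` are accepted.
    """
    mine = {e.strip().lower() for e in my_list}
    accepted = {e.strip().lower() for e in pasted_emails} & mine
    return accepted, len(accepted) >= MIN_SEED_SIZE


# Made-up example: one pasted address is not on my list.
pasted = ["A@example.com", "b@example.com", "stranger@example.org"]
my_list = ["a@example.com", "b@example.com", "c@example.com"]

accepted, big_enough = usable_seed(pasted, my_list)
print(sorted(accepted), big_enough)
```

In this example the stranger's address is dropped, and the two remaining addresses fall short of the suggested minimum, so the seed would likely be too small to give useful results.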

We’ve made it really easy for small businesses to use big data. But it still requires a little manual work (Arnold Schwarzenegger taught us that if computers didn’t require some human intervention, we’d all be ruled by robots right now, so consider yourselves lucky). As you saw above, I still needed to tweak and refine all results with basic segmentation filters. But hey, it beats having to set up your own cluster of mega servers, or winning a couple IBM Watsons in a high stakes Jeopardy game or something.