Oct 23, 2012
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network
MailChimp recently launched Wavelength, which allows users to find publishers like themselves and discover at a high level what other content their readership is engaged with. MailChimp has more than two million customers and sends more than three billion emails a month, so we’ve got all kinds of data in the Email Genome Project to bring to bear in understanding our vast network of publishers and readers.
But I wanted to find out if we could take Wavelength’s data one step further, one step deeper.
Who are my neighbors? (a short math lesson)
At a high level, Wavelength finds lists with membership that’s a lot like yours. I wanted to drill down to the individual address level, look at any address on your list, and find out which other addresses on the list were most similar to it, based on the newsletters they were subscribed to across MailChimp. Let’s take a look at three fake email addresses:
OK, so on this fake list I’ve got three addresses, and just as in Wavelength, I can see other publishers who send to those addresses. 1 means “subscribed” and 0 means “not subscribed.” Let’s take a look at Coach Eric Taylor. Is Veronica his neighbor more than Lord Grantham?
Well, Veronica and Eric share three subscriptions, while Lord Grantham and Eric share two, so it’s Veronica for the win, right? Mmmm, not really, because Veronica is subscribed to everything. It’s hard to say that her interests truly align with Eric’s. Instead we’re going to use a simple calculation called cosine similarity. Basically, the weight of the connection between Eric and Veronica is their three shared subscriptions divided by the product of two numbers: the square root of Eric’s total subscriptions (square root of 3) and the square root of Veronica’s total subscriptions (square root of 7).
If you’re taking notes, that means Eric Taylor and Veronica Mars get a neighborliness score of 3/sqrt(21) = 0.65, whereas he and Lord Grantham (who also has three subscriptions) get 2/sqrt(9) = 0.67.
Lord Grantham wins by a hair!
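For the code-inclined, here is a minimal Python sketch of that calculation. The 0/1 vectors are my own stand-ins for the fake table: the column order and which publishers overlap are assumptions, chosen only so the totals match the numbers above (Eric has 3 subscriptions, Veronica has 7, Lord Grantham has 3 and shares 2 with Eric).

```python
from math import sqrt

# Hypothetical subscription vectors over seven publishers (1 = subscribed).
# The layout is an assumption; only the counts come from the example above.
eric     = [1, 1, 1, 0, 0, 0, 0]
veronica = [1, 1, 1, 1, 1, 1, 1]
grantham = [0, 1, 1, 1, 0, 0, 0]

def cosine_similarity(a, b):
    """Shared subscriptions divided by the product of the vector lengths."""
    shared = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an address with no subscriptions has no neighbors
    return shared / (norm_a * norm_b)

print(round(cosine_similarity(eric, veronica), 2))  # 0.65
print(round(cosine_similarity(eric, grantham), 2))  # 0.67
```

Because every entry is 0 or 1, the sum of squares is just the subscription count, which is why the denominator reduces to sqrt(3) times sqrt(7) for Eric and Veronica.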
Let’s do more of that
So what if we ran these similarity calculations on every pair of addresses on your list? Then we’d know who was close to whom on your list. We could find little pockets or communities of subscribers that are different than the rest.
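Here is a rough sketch of what "every pair" could look like in Python. It is a toy, not our production code: the addresses, the publisher names, and the `pairwise_similarity` helper are all made up for illustration. It also shows why this gets big fast, since a list of n addresses has n(n-1)/2 pairs to score.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: each address maps to the set of publishers it subscribes to.
subscriptions = {
    "eric@example.com": {"pub_a", "pub_b", "pub_c"},
    "veronica@example.com": {"pub_a", "pub_b", "pub_c", "pub_d",
                             "pub_e", "pub_f", "pub_g"},
    "grantham@example.com": {"pub_b", "pub_c", "pub_d"},
}

def pairwise_similarity(subs):
    """Score every pair of addresses that shares at least one publisher."""
    scores = {}
    for a, b in combinations(sorted(subs), 2):
        shared = len(subs[a] & subs[b])
        if shared:  # both sets are non-empty here, so no division by zero
            scores[(a, b)] = shared / sqrt(len(subs[a]) * len(subs[b]))
    return scores

for pair, score in sorted(pairwise_similarity(subscriptions).items()):
    print(pair, round(score, 2))
```

Storing subscriptions as sets makes the shared count a simple intersection, which gives the same result as the 0/1 vector version.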
Well, we tried it. Lo and behold, we get some cool stuff:
Click the image for a zoom.it-hosted deep zoom image of the graph. Crazy.
The number of subscription checks we have to perform to make this graph for just one user is in the billions. It’s truly a big data problem that only works because of our vast user base.
Like most projects in EGP, this one started out as weapons development. We’re able to use this technique to find communities of blog comment spammers. Their scripts end up signing up for the blog’s newsletter whenever they sign up for an account to comment, and that’s when they find themselves in users’ lists. But by clustering, we can locate them. Here’s an image of a comment spam cluster detected on one of our publishers’ lists:
The red is our tumor of bad email addresses (each point is an address). And digging further into EGP data, guess where these addresses came from? The Sony breach where LulzSec made a bunch of addresses and passwords public. Many of those email accounts were then breached and used for spam purposes. Whoops.
But there are a lot more interesting ways for email addresses to cluster than around comment spam. What about all our good readers?
Here are the results for a couple different users. Note: As I go through these results, I’ll overlay snapshots from the campaigns of the top lists the cluster is interested in.
Reader clustering for a fashion and beauty sender
We looked at a user with a large list that sends fashion and beauty tips. We wanted to know if there were clusters on their list that would help them:
- Understand unique pockets of their audience
- Better target individuals based on their interests
So we calculated these email address neighbors, created a graph of the top three connections of each address to others on the list, and loaded it into Gephi. (Gephi is a tool for analyzing and visualizing large network graphs. Fun stuff for nerds.) We were able to suss out quite a few clusters.
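If you want to try the "top three connections" step yourself, here is a small Python sketch. The `top_k_edges` and `to_edge_csv` helpers and the sample scores are hypothetical. The CSV uses Source, Target, and Weight columns, which is the edge-list layout Gephi's spreadsheet import expects as far as I know; treat the exact import steps as something to check against Gephi's docs.

```python
import csv
import io
from collections import defaultdict

def top_k_edges(scores, k=3):
    """Keep, for each address, only its k strongest connections."""
    neighbors = defaultdict(list)
    for (a, b), weight in scores.items():
        neighbors[a].append((weight, b))
        neighbors[b].append((weight, a))
    edges = {}
    for node, candidates in neighbors.items():
        for weight, other in sorted(candidates, reverse=True)[:k]:
            # Sort the pair so an undirected edge is stored only once.
            edges[tuple(sorted((node, other)))] = weight
    return edges

def to_edge_csv(edges):
    """Render the edges as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, f"{weight:.4f}"])
    return buf.getvalue()

# Hypothetical similarity scores between four addresses.
sample_scores = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2, ("c", "d"): 0.4}
print(to_edge_csv(top_k_edges(sample_scores, k=3)))
```

Trimming to the top few neighbors keeps the graph sparse enough to lay out, at the cost of dropping weaker ties that might still matter.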
Let’s start with this one (click the image for a full-size look):
The cluster has unified interests in knitting, wedding music, wedding invitations, and customer jewelry. These interests also clue us into probable age and gender and stage in life, right? If I’m this sender, perhaps I should keep that in mind when creating content for these folks.
Here are a couple of fun ones:
We’ve got organic shampoo, organic clothing, wellness, and world peace. These are our Whole-Foodies. Let’s contrast them with this cluster on the same list:
Tea party stuff. And volumizing shampoo.
Needless to say, these two groups have wildly divergent interests. You probably wouldn’t do well trying to market fashion tips to both clusters in the same way.
Now, let’s contrast all the previous clusters with this sucker:
Whoa. I see fashion, fashion, fashion, fashion, fashion, and fashion. These are the people that made Nina Garcia into a rock star (and rightfully so—who doesn’t love Nina?).
To take a look at more clusters from this list, you can check out this swoopy, zoomy presentation over at Prezi.com. Beware: The presentation is large. Only open it in Firefox (not Chrome!) on a box with more than 8GB of RAM. May the odds be ever in your favor.
Let’s move on to another user.
Reader clustering for an e-commerce sender
We checked out one of our E-commerce360 users that sells snarky t-shirts. How would their list break down? Here’s an image of it at a high level:
Alright. I’m seeing some clusters there. Let’s investigate a few. Here’s the first one:
Fantasy sports! Guns! And flowers, for what I can only assume are apologies for doing something stupid with the first two.
Then we’ve got this group:
A bong newsletter, outdoor gear, and a dubstep festival that’s booked these guys. What, no Aphex Twin? Kids these days…
I can see, then, if I’m sending newsletters selling snarky t-shirts, that thematically different ones are going to appeal to these two groups. How about this group?
Fitness, fitness, really nice knives, high-end sunglasses, and “Dance Floor Filth 2.”
I could go on, but I’m gonna stop the fun there. To view all the clusters in another zoomy presentation, go over to Prezi. This one’s a bit more manageable than the previous one.
So what can we do with all this?
This is very much Skunk Works stuff here. We’re in the early days of data exploration. Maybe you could use this for ad placement (“Ah, lemme advertise in the Vogue Knitting Twitter feed”). If we combined it with engagement data, we could figure out which clusters are engaged more than your list as a whole, and then you could go after those types of readers with your ad dollars. Maybe gun fans spend more than potheads. You could segment and view your E-commerce360 data. We’re working on that data overlay piece, as well as clustering by click vectors rather than subscription vectors.
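To make the engagement idea concrete, here is a speculative Python sketch. Nothing like this exists in the product yet: the cluster labels, the open rates, and the `engagement_lift` helper are all invented for illustration. It just compares each cluster's average open rate with the average for the whole list.

```python
from collections import defaultdict

def engagement_lift(cluster_of, open_rate):
    """Return each cluster's mean open rate minus the list-wide mean."""
    by_cluster = defaultdict(list)
    for address, cluster in cluster_of.items():
        by_cluster[cluster].append(open_rate[address])
    overall = sum(open_rate[a] for a in cluster_of) / len(cluster_of)
    return {c: sum(r) / len(r) - overall for c, r in by_cluster.items()}

# Hypothetical cluster assignments and per-address open rates.
cluster_of = {"a": "fitness", "b": "fitness", "c": "knitting", "d": "knitting"}
open_rate = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.2}

for cluster, lift in sorted(engagement_lift(cluster_of, open_rate).items()):
    print(f"{cluster}: {lift:+.2f}")
```

A positive number means the cluster opens more than the list as a whole, so that would be the kind of reader to chase with ad dollars.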
What do you think? How might you use such a feature? Or what would you need to make it work for your MailChimp account? I set up a little feedback form to take your suggestions if you’ve got them.
P.S. For more discussion on this clustering technique, there’s a tutorial here on doing it in Excel.
More Email Research:
- What Good Marketers Can Learn from V14gr4 Spammers
- This Just In: Subject Line Length Means Absolutely Nothing
- Comacast and Gmai: All Your Typos Are Belong To Us
- Are Daily Deals Dead? We analyzed 4B emails to find out
- MailChimp’s Email Genome Project is Born

benchestnut
“Instead we’re going to use a simple calculation called cosine similarity. ” http://t.co/MtiChWrg Flashbacks to Calc5 classes. #coldsweats
10.23.2012
iamacyborg
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/ZIAVbYXU #emailmarketing
10.23.2012
awsweet
“@benchestnut: “Instead we’re going to use a simple calculation called cosine similarity. ” http://t.co/2dZcjFYg #coldsweats” Great mining!
10.23.2012
digitamarketeer
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network: Graphing interest… http://t.co/XXb0W6Sj
10.23.2012
MarketSmartNow
MailChimp News: Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/VppF07cb
10.23.2012
newsletterblog
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network: Gr… http://t.co/y3b0W0LG via @MailChimp
10.23.2012
MailChimp
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network https://t.co/QzU1nxFp
10.23.2012
multivariates
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network: Graphing interest… http://t.co/zeMlQFhS
10.23.2012
CorleyBrooklyn
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/TCHt6afp
10.23.2012
aaronjgoodman
Wow! More great content coming from @mailchimp’s @John4man: Clustering in mailchimp’s network. http://t.co/2JxU8GPn
10.23.2012
PositiveCareIE
We use and really like @MailChimp – this post (about discovering what else your subscribers are interested in) is great http://t.co/pPs2TQOs
10.23.2012
SolveSoft
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network: MailChimp recentl… http://t.co/avNSSGDI
10.23.2012
mriggen
Love how @john4man translates @mailchimp data into stories Digging Deeper into Wavelength & EGP Data: http://t.co/toB5Yn7K via @mailchimp
10.23.2012
Jason
fascinating reading cluster and charts in @MailChimp’s huge email network http://t.co/97qJ48Kq
10.23.2012
PhilipHotchkiss
< insight from big data RT @Jason: fascinating reading cluster and charts in @MailChimp’s huge email network http://t.co/uAxnPT7K
10.23.2012
provirtual
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/SZZ0pO8x // Over my head…
10.23.2012
HelloChrisMoore
In case anyone is interested in what you’ll learn in the Social Network Analysis course on @coursera … http://t.co/xOM28ps5 via @MailChimp
10.23.2012
howardscottj
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/AlB6uYYT by @mailchimp #WICKEDCOOL
10.23.2012
NasNL
Nice analyses: Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/oeNvaAx2
10.23.2012
eunho
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/y8ZRCmR6
10.23.2012
t1c1
RT @John4man: Excited about my new @MailChimp blog post on subscriber clustering to understand your audience. #bigdata #analytics http://t.co/nfG8JunL
10.23.2012
secretmirth
Think you know your email list? Finding Interest Clusters in MailChimp’s Network http://t.co/X48DwmoM via @John4man
10.23.2012
jsilton
Finding Interest Clusters in MailChimp’s Network #email #segmentation #targeting http://t.co/bQL4dWIb
10.23.2012
tisal
I just love data http://t.co/aK2gkzkb
10.23.2012
mikemcmurray
@MailChimp’s huge email network http://t.co/VJSq1huk Smart use of their data set incl finding tumours of compromised email addresses.
10.24.2012
heinrichdsf
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/914D3n0n via @mailchimp
10.24.2012
Push_Creative
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/OibSET59 via @mailchimp
10.24.2012
devinjelliot
“If your business has campaigns, √ out mailchimp or be behind the curve. These guys are doing some cool stuff http://t.co/BTiGR2DF
10.24.2012
bwertz
Great example of the power of pooled data: “Finding Interest Clusters in MailChimp’s Network” http://t.co/CdAA4O5B
10.24.2012
CaseyLy2
I knew I should have paid more attention in math: http://t.co/vrrCzTOs
10.24.2012
henrylf
Finding Interest Clusters In MailChimp’s Network: Geeky but interesting big data analysis on emails http://t.co/FTowKXPp
10.24.2012
alisonborealis
Holy crap! @MailChimp’s analytics are AMAZING! … glad I’m working with their programs… http://t.co/y35zH1Bl
10.24.2012
byosko
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/Fp3AybxC
10.24.2012
davidcrow
RT @byosko: Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in @MailChimp’s Network http://t.co/oZCD5xAj
10.24.2012
PaulDavidMadden
@djdclarke ok try http://t.co/ho2y0i9b as the starting point :)
10.24.2012
waworld
Use mailchimp? Check out these subscriber cluster graphs from @John4man of MailChimp #bigdata #analytics http://t.co/ETCBQC6W
10.25.2012
GS_Creative
Do you love e-mail marketing and nerding out over data? Well, my goodness, @mailchimp have quite the blog for you! http://t.co/TS4M3e2C
10.25.2012
whangsf
Detecting interest clusters using cosine similarity. #graphdataproblems http://t.co/98aGXdgD
10.25.2012
abcwatson
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/GhAUPwZy via @mailchimp
10.26.2012
darios
@alebegoli Hi Alessandra, I don’t know if you’ve read this, but it might interest you: http://t.co/LUFbeO0U
10.26.2012
vuffray
This is huge: Digging Deeper into Wavelength and EGP Data http://t.co/thbfLL29 #BigData #spam
10.26.2012
kele_on
Whoa, MailChimp’s blog totally satisfied my geek side with this post on Interest Clusters: http://t.co/beQe5xAa #marketing #email
10.26.2012
KellyMitchell
RT @kele_on: Whoa, MailChimp’s blog totally satisfied my geek side with post on Interest Clusters: http://t.co/XEJ6yrGW *Gr8 share! Mahalo!
10.26.2012
andygambles
Digging Deeper into Wavelength and EGP Data: Finding Interest Clusters in MailChimp’s Network http://t.co/iET7Y25v
10.27.2012
Chris Carmouche
Looks like a great feature for ad placement and the ability to reach out to like-minded list owners for joint projects etc… hope you can implement it.
10.27.2012
Erroin
I am always amazed by all the great features @Mailchimp builds. One major reason I love this program and always recommend it!
10.30.2012
UglyFashion
Absolutely crazy maths geeks! Brilliant though! I wish I could pay them to analyse my own lists!
10.31.2012
Ertuğrul ÇAKIR
I just love data …
11.01.2012
Bets10
I tried MailChimp before and interested. Very very nice data.
11.26.2012
tctjr
Yea good stuff this is very much in line with what we found when we had our stuff running on Forbes.com and DailyCandy.com with some tech at BeliefNetworks. We used prefuse to visualize in real time and had the same results across Twitter, Blogs and Premium content although Gephi is way cooler viz!
11.29.2012