Mar 6, 2009

Cloudmark Fingerprinting Algorithm


MailChimp’s abuse desk runs Cloudmark to perform occasional "customer audits." We basically scan for problem campaigns on our system that might jeopardize the deliverability of our servers. What’s Cloudmark, why do we use it, and how does it work?

Cloudmark is an advanced "message security" system that protects more than 300 million inboxes and works with more than 100 of the world’s largest ISPs and mobile operator networks such as EarthLink, Comcast, Cablevision, Charter Communications, Cox Communications, NTT Communications, Sprint Nextel, Virgin Media and Swisscom, as well as hosted messaging providers, including domainFACTORY and NuVox.

So if you send lots of email marketing, it’s kind of important to know who they are.

But how does their spam filtering technology (its fingerprinting algorithm) work?

Well, it’s a secret. Understandably so, because if they told everyone how they work, that would kind of defeat the purpose.

But here’s what they will tell you (from their website sales material):

Cloudmark’s Advanced Message Fingerprinting™ algorithms were designed to target sophisticated spamming and virus proliferation techniques. Unlike rules, Cloudmark fingerprinting algorithms are extremely lightweight, each optimized to perform only minimal processing on a message. As a result, message throughput is extremely fast and less processing CPU is required.

And here’s how they explain their Fingerprinting algorithm:


So they’re taking chunks of your message (which I assume could be content, Sender Score reputation, and code) and taking them out of the context of your email campaign. I don’t know if this is done for speed, or as some kind of "double blind" methodology, or what. Then they classify the chunks into "fingerprints." Then they compare the fingerprints from your campaign with the fingerprints in their database that have already been classified as spam.
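To make that chunk-and-compare idea a bit more concrete, here’s a toy sketch in Python. To be clear, this is NOT Cloudmark’s algorithm (that’s the secret part). It’s just my guess at the general shape. The function names (fingerprints, spam_overlap), the 4-word chunk size, and the use of SHA-1 are all assumptions I made up for illustration.

```python
import hashlib
import re


def fingerprints(message: str, chunk_words: int = 4) -> set[str]:
    """Hash every run of `chunk_words` consecutive words in a message.

    Hypothetical helper: the chunk size and hash are illustrative guesses,
    not anything Cloudmark has published.
    """
    # Normalize: lowercase and keep only word-like tokens, so punctuation
    # and capitalization changes don't produce different fingerprints.
    words = re.findall(r"[a-z0-9']+", message.lower())
    if not words:
        return set()
    if len(words) < chunk_words:
        chunks = [" ".join(words)]
    else:
        chunks = [
            " ".join(words[i:i + chunk_words])
            for i in range(len(words) - chunk_words + 1)
        ]
    # Each chunk becomes a short hash, detached from the rest of the message.
    return {hashlib.sha1(c.encode("utf-8")).hexdigest()[:16] for c in chunks}


def spam_overlap(message: str, known_spam: set[str]) -> float:
    """Return the fraction of a message's fingerprints already flagged as spam."""
    prints = fingerprints(message)
    if not prints:
        return 0.0
    return len(prints & known_spam) / len(prints)


# A stand-in for the "database" of fingerprints classified as spam.
known_spam = fingerprints(
    "Act now to claim your free prize before this offer expires"
)

campaign = "Hi there, act now to claim your free prize today"
print(spam_overlap(campaign, known_spam))
```

The point of the sketch: even though the campaign isn’t a word-for-word copy of the known spam, a good portion of its chunks still match. That’s why borrowing a few "spammy" phrases can be enough to get flagged.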

This is where I invite any geek out there who knows way better than me to please comment below. Please.

What to do if Cloudmark blocks you

If you get blocked by Cloudmark (and our abuse desk sent you to this page), our recommendation is to take a long, hard look at your content. There’s something in there that looks spammy. Given that Cloudmark is installed across 300 million inboxes and 100+ ISPs around the world, it’s safe to say that your campaign looks spammy to a LOT of people.

If you’re not sure what "looks spammy" means, I’m not so sure you’re ready to be sending lots of email marketing.

Okay, maybe that was a bit out of line. I work at the abuse desk, so I get jaded sometimes. So here are a couple resources you need to read quick:

If you’re looking for a simple, silver bullet kind of answer for "how to just get me past the spam filters," prepare to be frustrated. There is no single answer. The best answer I’ve been able to give people is:

  1. Open up your email program’s junk folder.
  2. Look at what spammers do.
  3. Then, don’t do that.

Cloudmark is everywhere

We’re members of the ESPC, and once sat in on a presentation that Cloudmark gave to the group. It was fascinating. Mostly because it was a "marketing guy" talking who actually knew his stuff. No offense to marketing guys or anything. In the cases where he didn’t know something, he was smart enough to admit it. I distinctly remember a slide in his presentation where he showed almost every single major ISP in North America using Cloudmark. IIRC, the only ISP not on the list was AOL. They’re even partnered with Return Path (who we’re also partnered with) so that they can pull in sender reputation data.

If you run an ESP (and manage the abuse desk at an ESP), it’s the kind of slide that makes you gulp really loud. So I’m really glad we’ve got this in place for our abuse desk. I’ll post something later about how we’re using it to make better decisions about email abuse, who we warn, who we suspend, and who we shut down.