MailChimp’s abuse desk runs Cloudmark to perform occasional “customer audits.” We basically scan for problem campaigns on our system that might jeopardize the deliverability of our servers. What’s Cloudmark, why do we use it, and how does it work?
Cloudmark is an advanced “message security” system that protects more than 300 million inboxes and works with more than 100 of the world’s largest ISPs and mobile operator networks such as EarthLink, Comcast, Cablevision, Charter Communications, Cox Communications, NTT Communications, Sprint Nextel, Virgin Media and Swisscom, as well as hosted messaging providers, including domainFACTORY and NuVox.
So if you send lots of email marketing, it’s kind of important to know who they are.
But how does their spam filtering technology (their fingerprinting algorithm) work?
Well, it’s a secret. Understandably so, because if they told everyone how they work, that would kind of defeat the purpose.
But here’s what they will tell you (from their website sales material):
Cloudmark’s Advanced Message Fingerprinting™ algorithms were designed to target sophisticated spamming and virus proliferation techniques. Unlike rules, Cloudmark fingerprinting algorithms are extremely lightweight, each optimized to perform only minimal processing on a message. As a result, message throughput is extremely fast and less processing CPU is required.
And here’s how they explain their Fingerprinting algorithm:
So they’re taking chunks of your message (which I assume could include content, code, and maybe even sender reputation data like Sender Score) and taking them out of the context of your email campaign. I don’t know if this is done for speed, as some kind of “double blind” methodology, or for some other reason. Then they turn the chunks into “fingerprints” and compare the fingerprints from your campaign with other fingerprints in their database that have been classified as spam.
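To make the chunk-and-compare idea a bit more concrete, here’s a toy sketch in Python. This is not Cloudmark’s algorithm (which, again, is a secret); it only assumes that chunks are blank-line-separated paragraphs, that each chunk is fingerprinted with SHA-1 after normalizing case and whitespace, and that the “database” is an in-memory set. The names `chunk_message`, `fingerprint`, `matching_chunks`, and `KNOWN_SPAM` are all hypothetical.

```python
import hashlib


def chunk_message(body: str) -> list[str]:
    """Split a message into paragraph-sized chunks (assumption: blank lines separate chunks)."""
    return [p.strip() for p in body.split("\n\n") if p.strip()]


def fingerprint(chunk: str) -> str:
    """Hypothetical fingerprint: SHA-1 of the chunk with case and whitespace normalized."""
    normalized = " ".join(chunk.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# Stand-in for a database of fingerprints already classified as spam.
KNOWN_SPAM = {fingerprint("Buy penny stocks now at mypennystockscam.com!")}


def matching_chunks(body: str) -> list[str]:
    """Return the chunks whose fingerprints appear in the known-spam set."""
    return [c for c in chunk_message(body) if fingerprint(c) in KNOWN_SPAM]


campaign = "Hi there,\n\nBUY penny stocks   now at mypennystockscam.com!\n\nThanks for reading."
print(matching_chunks(campaign))  # ['BUY penny stocks   now at mypennystockscam.com!']
```

The normalization step is why the shouty, oddly spaced copy still matches: only the chunk that looks like known spam is flagged, and the rest of the message is left alone.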
This is where I invite any geek out there who knows way better than me to please comment below. Please.
What to do if Cloudmark blocks you
If you get blocked by Cloudmark (and our abuse desk sent you to this page), our recommendation is to take a long, hard look at your content. There’s something in there that looks spammy. Given that Cloudmark is installed across 300 million inboxes and more than 100 ISPs around the world, it’s safe to say that your campaign looks spammy to a LOT of people.
If you’re not sure what “looks spammy” means, I’m not so sure you’re ready to be sending lots of email marketing.
Okay, maybe that was a bit out of line. I work at the abuse desk, so I get jaded sometimes. So here are a couple of resources you need to read right away:
If you’re looking for a simple, silver-bullet kind of answer to “how do I just get past the spam filters,” prepare to be frustrated. There is no single answer. The best answer I’ve been able to give people is:
- Open up your email program’s junk folder.
- Look at what spammers do.
- Then, don’t do that.
Cloudmark is everywhere
We’re members of the ESPC, and once sat in on a presentation that Cloudmark gave to the group. It was fascinating. Mostly because it was a “marketing guy” talking, who actually knew his stuff. No offense to marketing guys or anything. He knew about this stuff, and in the cases where he didn’t, he was smart enough to admit it. I distinctly remember a slide in his presentation where he showed almost every single major ISP in North America using Cloudmark. IIRC, the only ISP not on the list was AOL. They’re even partnered with ReturnPath (who we’re also partnered with) so that they can pull in sender reputation data.
If you run an ESP (and manage the abuse desk at an ESP), it’s the kind of slide that makes you gulp really loudly. So I’m really glad we’ve got this in place for our abuse desk. I’ll post something later about how we’re using it to make better decisions about email abuse: who we warn, who we suspend, and who we shut down.


I’d guess from the description and that little diagram that they take certain HTML nodes in the message content (links, headers, images, maybe even paragraphs) and run them through a hashing function to get a numeric code describing the content. Then they see if that same code has been entered into their database of naughty email pieces. If enough of them match (which means your email has a number of the same parts as known spam, for example a link to mypennystockscam.com), you are likely to be spam too.
That would be distinct from a rule-based system, which checks the message for many specific conditions like {Contains the phrase “penny stocks”}. Fingerprinting would be faster and less susceptible to spammer trickery like misspelling words and moving pieces around randomly.
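Here’s a rough Python sketch of that guess, contrasting the two approaches. It assumes (purely for illustration) that only link and image URLs are fingerprinted, using SHA-1 and Python’s standard `html.parser`; `LinkCollector`, `SPAM_NODES`, `rule_check`, and `fingerprint_check` are hypothetical names, not anything Cloudmark publishes.

```python
import hashlib
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects link and image URLs, the kind of HTML nodes the comment guesses get hashed."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in {("a", "href"), ("img", "src")} and value:
                self.urls.append(value.strip().lower())


def node_fingerprints(html: str) -> set[str]:
    """SHA-1 fingerprint of every collected URL."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return {hashlib.sha1(url.encode("utf-8")).hexdigest() for url in parser.urls}


# Hypothetical database entry: the fingerprint of a known spam link.
SPAM_NODES = {hashlib.sha1(b"http://mypennystockscam.com/").hexdigest()}


def rule_check(text: str) -> bool:
    """A rule-based filter: one specific condition."""
    return "penny stocks" in text.lower()


def fingerprint_check(html: str) -> bool:
    """Flag the message if any node fingerprint is already in the spam database."""
    return bool(node_fingerprints(html) & SPAM_NODES)


msg = '<p>Hot p3nny st0cks!</p><a href="http://mypennystockscam.com/">Buy</a>'
print(rule_check(msg))         # False: the misspelling slips past the rule
print(fingerprint_check(msg))  # True: the link's fingerprint is already known
```

The rule has to anticipate every spelling, while the fingerprint only cares that the same link showed up in spam before.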
The simplest kind of “fingerprinting” you can do on a piece of data is to compute its hash using an algorithm such as SHA-1. This reduces a piece of text of any length to a 20-byte “fingerprint”.
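A quick check with Python’s standard `hashlib` module shows the fixed output size: a two-letter string and a million-character string both come out as 20-byte SHA-1 digests (40 characters when written in hex).

```python
import hashlib

short = hashlib.sha1(b"Hi").digest()
big = hashlib.sha1(b"x" * 1_000_000).digest()

print(len(short), len(big))          # 20 20
print(hashlib.sha1().digest_size)    # 20
print(len(hashlib.sha1(b"Hi").hexdigest()))  # 40 hex characters
```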
Because changing the input to a hash algorithm by even one letter completely alters the output, the fingerprint generated for any given piece of text is essentially unique and unpredictable.
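To see that sensitivity for yourself, hash two sentences that differ only in the final character. The example sentences are made up; the script just prints both fingerprints and counts how many hex digits differ (typically most of them).

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


a = sha1_hex("You are receiving this email because you signed up.")
b = sha1_hex("You are receiving this email because you signed up!")

# Count how many of the 40 hex characters differ between the two fingerprints.
differing = sum(x != y for x, y in zip(a, b))
print(a)
print(b)
print(f"{differing} of {len(a)} hex digits differ")
```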
This is probably why Cloudmark extracts numerous chunks from the message rather than computing a single fingerprint from the entire message body. Otherwise your merge tags would make each person’s email within a single campaign ever-so-slightly different, radically altering the “fingerprint” for each message.
This is why spammers insert random garbage text into the body or subject line of spam email: they’re trying to evade tools that blacklist emails using only a single fingerprint computed over the entire message body.
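Both effects (merge tags and random garbage) can be simulated with a made-up template. In this sketch `TEMPLATE`, `render`, and `chunk_fingerprints` are hypothetical, and chunks are again assumed to be blank-line-separated paragraphs. The whole-body fingerprint differs between recipients, but the per-chunk fingerprints for the shared template text still line up.

```python
import hashlib
import secrets


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


TEMPLATE = (
    "Hi {name},\n\n"
    "Our spring sale starts Monday.\n\n"
    "You are receiving this email because you signed up."
)


def render(name: str, junk: str = "") -> str:
    """Fill the hypothetical merge tag and optionally append random 'hash-buster' text."""
    return TEMPLATE.format(name=name) + junk


def chunk_fingerprints(body: str) -> set[str]:
    """One fingerprint per blank-line-separated chunk."""
    return {sha1_hex(p.strip()) for p in body.split("\n\n") if p.strip()}


alice = render("Alice")
bob = render("Bob", junk="\n\n" + secrets.token_hex(8))

# A single whole-body fingerprint differs for every recipient...
print(sha1_hex(alice) == sha1_hex(bob))  # False
# ...but the fingerprints of the shared template chunks still match.
shared = chunk_fingerprints(alice) & chunk_fingerprints(bob)
print(f"{len(shared)} shared chunk fingerprints")  # 2 shared chunk fingerprints
```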
With numerous fingerprints calculated, Cloudmark can then check how many of them are spammy according to its database.
So even if your mail does have some text/fingerprints that happen to coincide with some spam (e.g. common phrases like “You are receiving this email because…”), hopefully you’ll have a much higher concentration of good, unique, non-spammy content.
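A simple way to express that “concentration” idea is the fraction of a message’s chunk fingerprints found in the spam database. The 0.5 threshold, the database contents, and the function names below are assumptions for illustration only, not real Cloudmark parameters.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical database: a common footer phrase has ended up associated with spam.
SPAM_DB = {
    sha1_hex("You are receiving this email because you signed up."),
    sha1_hex("Buy penny stocks now!"),
}

THRESHOLD = 0.5  # assumed cut-off, not a real Cloudmark parameter


def spam_fraction(chunks: list[str]) -> float:
    """Fraction of a message's chunk fingerprints that appear in the spam database."""
    prints = [sha1_hex(c) for c in chunks]
    return sum(p in SPAM_DB for p in prints) / len(prints) if prints else 0.0


newsletter = [
    "Our spring catalog is here.",
    "Meet the team behind our new store.",
    "Three recipes for the weekend.",
    "You are receiving this email because you signed up.",
]
spam = ["Buy penny stocks now!", "You are receiving this email because you signed up."]

for name, msg in [("newsletter", newsletter), ("spam", spam)]:
    score = spam_fraction(msg)
    print(name, score, "spam" if score >= THRESHOLD else "ok")
# newsletter 0.25 ok
# spam 1.0 spam
```

The newsletter shares one boilerplate line with known spam, but its unique content keeps the score low.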
This is probably a big oversimplification of how Cloudmark works, and possibly a bit too geeky, but there you are.
When I wrote this post, I was worried that customers would think they were gonna have access to our Cloudmark system, which is strictly for our abuse desk.
I just received word from ReturnPath that they’re adding Cloudmark and Barracuda to our Inbox Inspector tool. Woo-hoo!
We’re gonna look into getting these up and running. Super exciting news!
@Ryan – that sounds pretty good to me!
@Christopher and @Ryan – Nerd alert! Can’t believe you guys fell for this.
Seriously, thanks very much for the scientific explanations.
I would also like to thank Christopher and Ryan for the explanations.
Does Cloudmark actually “block” emails from reaching their destination, or does it filter them to the spam folder if they’re fingerprinted? Also, if the better part of your email deployment is fingerprinted (most of it hits the spam box), can that result in a domain block, or do consumers actually have to mark it as spam?