<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" > <channel><title>Comments on: Cloudmark Fingerprinting Algorithm</title> <atom:link href="http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/feed/" rel="self" type="application/rss+xml" /><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/</link> <description>MailChimp, email marketing, and monkeys!</description> <lastBuildDate>Thu, 09 Feb 2012 21:21:24 +0000</lastBuildDate> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>By: ครีมมะขามพะเยา</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-33801</link> <dc:creator>ครีมมะขามพะเยา</dc:creator> <pubDate>Sat, 24 Sep 2011 04:42:27 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-33801</guid> <description>I’d guess from the description and that little diagram that they take certain HTML nodes in the message content (links, headers, images, maybe even paragraphs) and run them through a hashing function to get a numeric code describing the content. Then they see if that same code has been entered into their database of naughty email pieces. 
If enough of them match (which means your email has a number of the same parts as known spam, for example a link to mypennystockscam.com), you are likely to be spam too. This would be distinct from a rule-based system which checks the message for many specific conditions like {Contains the phrase “penny stocks”} and would be faster and less susceptible to spammer trickery like misspelling words and moving pieces around randomly.</description> <content:encoded><![CDATA[<p>I’d guess from the description and that little diagram that they take certain HTML nodes in the message content (links, headers, images, maybe even paragraphs) and run them through a hashing function to get a numeric code describing the content. Then they see if that same code has been entered into their database of naughty email pieces. If enough of them match (which means your email has a number of the same parts as known spam, for example a link to mypennystockscam.com), you are likely to be spam too.</p><p>This would be distinct from a rule-based system which checks the message for many specific conditions like {Contains the phrase “penny stocks”} and would be faster and less susceptible to spammer trickery like misspelling words and moving pieces around randomly.</p> ]]></content:encoded> </item> <item><title>By: Jrad5221</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-21369</link> <dc:creator>Jrad5221</dc:creator> <pubDate>Wed, 10 Aug 2011 18:33:34 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-21369</guid> <description>Does Cloudmark actually &quot;block&quot; emails from reaching their destination or do they filter emails to SPAM if it&#039;s fingerprinted? 
Also, if the better part of your email deploy is fingerprinted (most hit the SPAM box), can that result in a domain block or do consumers actually have to mark it as a SPAM complaint?</description> <content:encoded><![CDATA[<p>Does Cloudmark actually &#8220;block&#8221; emails from reaching their destination or do they filter emails to SPAM if it&#8217;s fingerprinted? Also, if the better part of your email deploy is fingerprinted (most hit the SPAM box), can that result in a domain block or do consumers actually have to mark it as a SPAM complaint?</p> ]]></content:encoded> </item> <item><title>By: Tiago</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-3203</link> <dc:creator>Tiago</dc:creator> <pubDate>Sat, 07 Mar 2009 02:25:17 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-3203</guid> <description>I also would like to thank Christopher and Ryan for the explanations.</description> <content:encoded><![CDATA[<p>I also would like to thank Christopher and Ryan for the explanations.</p> ]]></content:encoded> </item> <item><title>By: Ben</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-3201</link> <dc:creator>Ben</dc:creator> <pubDate>Sat, 07 Mar 2009 00:57:56 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-3201</guid> <description>@Christopher and @Ryan - Nerd alert! Can&#039;t believe you guys fell for this. Seriously, thanks very much for the scientific explanations.</description> <content:encoded><![CDATA[<p>@Christopher and @Ryan &#8211; Nerd alert! 
Can&#8217;t believe you guys fell for this.</p><p>Seriously, thanks very much for the scientific explanations.</p> ]]></content:encoded> </item> <item><title>By: Ben</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-3200</link> <dc:creator>Ben</dc:creator> <pubDate>Sat, 07 Mar 2009 00:20:12 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-3200</guid> <description>@Ryan - that sounds pretty good to me!</description> <content:encoded><![CDATA[<p>@Ryan &#8211; that sounds pretty good to me!</p> ]]></content:encoded> </item> <item><title>By: Ben</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-3199</link> <dc:creator>Ben</dc:creator> <pubDate>Sat, 07 Mar 2009 00:19:21 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-3199</guid> <description>When I wrote this post, I was worried that customers would think that they were gonna have access to our Cloudmark system. Because this is strictly for our abuse desk. I just received word from ReturnPath that they&#039;re adding Cloudmark and Barracuda to our Inbox Inspector tool. Woo-hoo! We&#039;re gonna look into getting these up and running. Super exciting news!</description> <content:encoded><![CDATA[<p>When I wrote this post, I was worried that customers would think that they were gonna have access to our Cloudmark system.</p><p>Because this is strictly for our abuse desk.</p><p>I just received word from ReturnPath that they&#8217;re adding Cloudmark and Barracuda to our Inbox Inspector tool. Woo-hoo!</p><p>We&#8217;re gonna look into getting these up and running. 
Super exciting news!</p> ]]></content:encoded> </item> <item><title>By: Christopher</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-3198</link> <dc:creator>Christopher</dc:creator> <pubDate>Sat, 07 Mar 2009 00:18:08 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-3198</guid> <description>The simplest kind of &quot;fingerprinting&quot; you can do on a piece of data is to compute its hash using an algorithm such as SHA-1.  This reduces a piece of text, of any length, into a 20-byte &quot;fingerprint&quot;. Because changing the input to a hash algorithm even by one letter completely alters the output, the fingerprints generated are - for any given piece of text - essentially unique and unpredictable. This is probably why Cloudmark extract numerous chunks from the message rather than computing the fingerprint from the entire message body.  Otherwise your merge tags would make each person&#039;s email within a single campaign ever-so-slightly different, radically altering the &quot;fingerprint&quot; for each mail. This is why spammers insert random garbage text into the body or subject line of spam email: they&#039;re trying to evade tools that blacklist emails by only generating a single fingerprint against the entire message body. With numerous fingerprints calculated, Cloudmark can then see how many of them are spammy according to their database. So even if your mail does have some text/fingerprints that happen to coincide with some spam (e.g. common phrases like &quot;You are receiving this email because...&quot;), hopefully you&#039;ll have a much higher concentration of good, unique, non-spammy content. Probably much over-simplified from how Cloudmark works, and possibly a bit too geeky, but there you are.. :)</description> <content:encoded><![CDATA[<p>The simplest kind of &#8220;fingerprinting&#8221; you can do on a piece of data is to compute its hash using an algorithm such as SHA-1.  
This reduces a piece of text, of any length, into a 20-byte &#8220;fingerprint&#8221;.</p><p>Because changing the input to a hash algorithm even by one letter completely alters the output, the fingerprints generated are &#8211; for any given piece of text &#8211; essentially unique and unpredictable.</p><p>This is probably why Cloudmark extract numerous chunks from the message rather than computing the fingerprint from the entire message body.  Otherwise your merge tags would make each person&#8217;s email within a single campaign ever-so-slightly different, radically altering the &#8220;fingerprint&#8221; for each mail.</p><p>This is why spammers insert random garbage text into the body or subject line of spam email: they&#8217;re trying to evade tools that blacklist emails by only generating a single fingerprint against the entire message body.</p><p>With numerous fingerprints calculated, Cloudmark can then see how many of them are spammy according to their database.<br /> So even if your mail does have some text/fingerprints that happen to coincide with some spam (e.g. common phrases like &#8220;You are receiving this email because&#8230;&#8221;), hopefully you&#8217;ll have a much higher concentration of good, unique, non-spammy content.</p><p>Probably much over-simplified from how Cloudmark works, and possibly a bit too geeky, but there you are.. 
<img src='http://blog.mailchimp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p> ]]></content:encoded> </item> <item><title>By: Ryan</title><link>http://blog.mailchimp.com/cloudmark-fingerprinting-algorithm/#comment-3197</link> <dc:creator>Ryan</dc:creator> <pubDate>Fri, 06 Mar 2009 23:42:44 +0000</pubDate> <guid isPermaLink="false">http://blog.mailchimp.com/?p=2580#comment-3197</guid> <description>I&#039;d guess from the description and that little diagram that they take certain HTML nodes in the message content (links, headers, images, maybe even paragraphs) and run them through a hashing function to get a numeric code describing the content. Then they see if that same code has been entered into their database of naughty email pieces. If enough of them match (which means your email has a number of the same parts as known spam, for example a link to mypennystockscam.com), you are likely to be spam too. This would be distinct from a rule-based system which checks the message for many specific conditions like {Contains the phrase &quot;penny stocks&quot;} and would be faster and less susceptible to spammer trickery like misspelling words and moving pieces around randomly.</description> <content:encoded><![CDATA[<p>I&#8217;d guess from the description and that little diagram that they take certain HTML nodes in the message content (links, headers, images, maybe even paragraphs) and run them through a hashing function to get a numeric code describing the content. Then they see if that same code has been entered into their database of naughty email pieces. 
If enough of them match (which means your email has a number of the same parts as known spam, for example a link to mypennystockscam.com), you are likely to be spam too.</p><p>This would be distinct from a rule-based system which checks the message for many specific conditions like {Contains the phrase &#8220;penny stocks&#8221;} and would be faster and less susceptible to spammer trickery like misspelling words and moving pieces around randomly.</p> ]]></content:encoded> </item> </channel> </rss>
