Feb 4, 2009

Most Common Spam Filter Triggers

We’re working on an experiment in the MailChimp Lab to help us automatically detect when someone’s about to send something too spammy from MailChimp (no, this is not what the supercomputer is for). We’re using Cloudmark, Barracuda, and SpamAssassin (and possibly Postini in the near future). We picked those because they’re the most commonly used, and most vexing, spam filters.

We’re not planning to expose any secret formulas, or help customers "get around spam filters." It’s more of a behind-the-scenes, "big brother" tool to help us catch exceptionally bad campaigns before they get sent. That’s the idea, at least, and we’re not sure when this’ll go live.

For now, we’re doing research. We’re currently scanning a few hundred thousand campaigns sent through MailChimp over the years, to see how many "false positives" we might trigger.

In the process, we’re uncovering a lot of innocent mistakes made by senders, plus a few surprises.

We’ve written about How Spam Filters Work in the past. Basically, spam filters look for certain "spammy criteria" in your messages. Each criterion gets a different score. Your message’s total score determines whether or not you’re blocked.

For example, putting the word "viagra" in your subject line is dangerous, for obvious reasons.

There are other, not-so-obvious criteria used by spam filters too. Like poorly coded HTML (spammers are notoriously bad coders). Or my personal favorite, using Microsoft FrontPage. Ha. Also, simply using the word "Oprah" will get you a few points (for the record, the spam filters probably have nothing against Oprah; methinks her name is just used a lot by spammers).
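If it helps to see the scoring idea in code, here’s a minimal sketch of an additive, rule-based filter. It is not how Cloudmark, Barracuda, or SpamAssassin are actually implemented. The rule names, the all-caps score, and the 5.0 cutoff are assumptions made up for illustration; the 2.7 and 3.1 values are borrowed from the examples further down in this post.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Rule = Tuple[str, Callable[[Message], bool], float]

# Hypothetical rules: the names are invented, and the 1.5 score is a guess.
# The 2.7 and 3.1 scores are the ones quoted later in this post.
RULES: List[Rule] = [
    ("SUBJECT_ALL_CAPS", lambda m: m["subject"].isupper(), 1.5),
    ("DEAR_SOMETHING", lambda m: m["body"].lower().startswith("dear "), 2.7),
    ("EXTRA_INCHES", lambda m: "extra inches" in m["body"].lower(), 3.1),
]

# Assumed cutoff for this sketch; real filters let admins tune this.
THRESHOLD = 5.0


def score_message(message: Message) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that matched."""
    total = 0.0
    hits: List[str] = []
    for name, test, points in RULES:
        if test(message):
            total += points
            hits.append(name)
    return total, hits


def is_flagged(message: Message) -> bool:
    """A message is flagged once its total score reaches the threshold."""
    total, _ = score_message(message)
    return total >= THRESHOLD
```

The point of the sketch: no single rule has to be damning. An innocent "Dear" plus one unlucky phrase can add up to a blocked email.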

If this is new and fascinating to you, I encourage you to read How Spam Filters Work.

Anyway, we’re looking at the most common triggers that MailChimp customers have been setting off.

Some of them are pretty surprising.

Top 10 Most Common Spam Filter Triggers

By far, the most common reason MailChimp customers have been flagged by spam filters is "too many images, not enough text." This is a very common mistake (see: Stupid HTML Email Design Mistakes), and I’ve blogged about this in the past. Over and over. (See: How Your Email Designs Can Get You Blacklisted, and this and this).

Anyway, here’s the top 10 list of spam filter criteria that MailChimp users are most guilty of. I’ve included the corresponding number of detected matches (keep in mind the system is not done scanning—it might take another week to finish):

  1. BODY: HTML has a low ratio of text to image area    (1,217 matches)
  2. BODY: Message only has text/html MIME parts    (971)
  3. BODY: HTML has a low ratio of text to image area    (729)
  4. BODY: HTML and text parts are different    (625)
  5. Subject is all capitals    (324)
  6. BODY: HTML and text parts are different    (279)
  7. BODY: HTML: images with 2400-2800 bytes of words    (211)
  8. BODY: HTML: images with 2000-2400 bytes of words    (194)
  9. BODY: HTML: images with 1200-1600 bytes of words    (178)
  10. BODY: HTML: images with 1600-2000 bytes of words    (178)

Number 5 is just idiotic. TYPING IN ALL CAPS = SCREAMING AND IS RUDE. Don’t type in all caps in your emails, please. Who does that?

Number 2 means somebody was lazy and only included the HTML version of their email, with no plain-text alternative alongside it. Spam filter rules can be cryptic sometimes (intentionally, perhaps), but this one reads fairly literally: the only MIME parts in the message are text/html.
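For the curious, here’s roughly what "including both" looks like if you build a message by hand with Python’s standard email library. This is only a sketch, not how MailChimp assembles campaigns; the function name and the example addresses are hypothetical.

```python
from email.message import EmailMessage


def build_campaign(subject: str, sender: str, recipient: str,
                   text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative message with text and HTML versions."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain-text version goes in first...
    msg.set_content(text_body)
    # ...and the HTML version is attached as an alternative to it.
    msg.add_alternative(html_body, subtype="html")
    return msg


if __name__ == "__main__":
    # Hypothetical addresses and content, for illustration only.
    campaign = build_campaign(
        "Monthly news",
        "news@example.com",
        "reader@example.com",
        "Hello in plain text",
        "<p>Hello in HTML</p>",
    )
    print(campaign.get_content_type())
    print([part.get_content_type() for part in campaign.iter_parts()])
```

Mail clients that can’t (or won’t) render HTML fall back to the text part, and the two parts should say the same thing, since a mismatch between them is its own trigger on the list above.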

But the rest of the detections on that list basically mean that the senders sent way, way too many images, and not enough readable text. Spam filters can’t read images. Spammers know that, so they often send spam that’s nothing but a big, ginormous image. And spam filters know that, so they in turn block email that they can’t read.
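You can do a crude version of this check yourself before sending. The sketch below counts images and visible text characters in an HTML email. Real filters compare text against image area, which this does not attempt, and the 200-characters-per-image cutoff is an assumption I picked for the example, not a number any filter publishes.

```python
from html.parser import HTMLParser
from typing import Tuple


class TextImageCounter(HTMLParser):
    """Count <img> tags and visible text characters in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.image_count = 0
        self.text_chars = 0
        self._skip_depth = 0  # greater than zero while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.image_count += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore CSS and JavaScript; count only text a reader would see.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def summarize(html: str) -> Tuple[int, int]:
    """Return (number of images, number of visible text characters)."""
    parser = TextImageCounter()
    parser.feed(html)
    parser.close()
    return parser.image_count, parser.text_chars


def looks_image_heavy(html: str, min_chars_per_image: int = 200) -> bool:
    """Rough heuristic: too little text for the number of images."""
    images, chars = summarize(html)
    return images > 0 and chars < images * min_chars_per_image
```

An email that is one big image with no text at all will trip this every time, which is exactly the case the filters are worried about.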

The battle between spam filters and spammers is brutal and never ending, and sometimes legit marketers get caught in the crossfire. Understand how both sides work, and do your best to cope.

But don’t try too hard to appease the spam filters. They don’t like that either (looks needy).

Not-So-Common Spam Filter Triggers

During our user research, we found some surprising spam filter triggers. Here are some examples:

  • The phrase "extra inches" will get you a score of 3.1 from SpamAssassin. The phrase sounds like it came from some kind of "appendage enhancement" pharma-spam, right? Turns out it popped up 4 times in MailChimp, from relaxation & beauty spas. As in, "if your New Year’s resolution is to shed some extra inches off your waistline, come in and…"
  • Dear FNAME, = "not very dear at all!" Do you merge the recipient’s FNAME into your messages? If so, don’t use the d-word. Turns out "Dear" will get you 2.7 spam points. That’s about halfway to getting your email blocked. Use something else, like "Howdy." At MailChimp, we use "dear" in just about all our demo videos and tutorials, because it’s the easiest way to explain mail merge tags. When we say, "Dear *|FNAME|*," people just get it. We might stop using this example. I’ve written about how salutations can waste valuable space anyway.
  • "Stop Further Distribution" – In your footer, when you give people that unsubscribe link, don’t try to be all official and corporate sounding. The phrase, "stop further distribution" will get you 3.1 spammy points. By the way—"distribution?" Nobody says that.
  • "You registered with a partner" – If the body of your email contains that phrase, chances are very good that your email list is not permission-based. This actually sets off a few red flags in MailChimp’s list setup process (we get alerted when people enter that into their permission reminder), and I was pleasantly surprised to see that spam filters look for it too.

As you can see, your emails can get flagged as spam, even if you’re not a spammer. Your email delivery can suffer, even from an innocent mistake. If enough innocent mistakes happen, MailChimp’s overall deliverability can suffer. So we’re working on preventing that. Hopefully, you won’t be hearing from us soon.