Aug 2, 2005

Cleaning Microsoft Word’s HTML

Do you create simple HTML email newsletters by typing them up in Microsoft Word, then saving as a web page? You may have noticed that the HTML that Microsoft generates is horribly bloated with lots of mysterious code…

So What IS All That Code That Microsoft Inserts?
We’ve seen cases where HTML that should be one line of code turns into more than 100 lines! According to Microsoft, all that "extra" code is there for "roundtripping," which means you can export a Word document as HTML, then import the HTML back into Word without losing any formatting.

Microsoft’s Office 2000 HTML Cleaner
But Microsoft admits that all this code makes pages take a whole lot longer to download, so it created a tool (the Office 2000 HTML Filter) that works as an add-in to Word 2000 and exports leaner HTML code.

We gave it a try, and it did a pretty decent job. It trimmed a lot of the "fat" and brought a 150-line .htm file down to about 60. If you’re making fairly simple HTML emails from Word, but your code seems to take a while to download and render, we recommend you at least give the Office filter a try.

Textism’s Word Cleaner
Here’s another nice little tool (it’s free, and it’s browser-based) that cleans Microsoft’s extra code out of your HTML.

It’s really, really strict, and it deletes a whole lot more code from your file, so you might end up adding some back (like font tags and CSS). But if you want really pure code, this is pretty nifty.
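If you’re curious what a cleaner like this actually removes, here’s a minimal Python sketch. It’s our own illustration, not how Textism’s tool (or Microsoft’s filter) works. It assumes the usual Word leftovers are conditional comments like `<!--[if gte mso 9]>`, empty `<o:p>` tags, `MsoNormal`-style class names and `mso-` declarations inside `style` attributes; the function name `clean_word_html` is made up for this example. Regular expressions are a blunt tool for HTML, so treat this as a demonstration rather than a replacement for a real cleaner.

```python
import re

# Rough patterns for markup Word commonly emits. This is an illustrative,
# non-exhaustive list, not the rule set of any particular cleaner.
CONDITIONAL_COMMENT = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->",
                                 re.IGNORECASE | re.DOTALL)
OFFICE_TAG = re.compile(r"</?o:p\s*/?>", re.IGNORECASE)  # <o:p> and </o:p> markers
# Only matches a class attribute whose whole value is a single Mso* name.
MSO_CLASS = re.compile(r'\s+class=("?)Mso\w*\1(?=[\s/>])', re.IGNORECASE)
# Inline style attributes in either double or single quotes.
STYLE_ATTR = re.compile(r"""\s+style=(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def _strip_mso_declarations(match: re.Match) -> str:
    """Keep an inline style attribute but drop its mso-* declarations."""
    quote, body = match.group(1), match.group(2)
    declarations = [d.strip() for d in body.split(";") if d.strip()]
    kept = [d for d in declarations if not d.lower().startswith("mso-")]
    if not kept:
        return ""  # nothing left, so drop the whole attribute
    joined = ";".join(kept)
    return f" style={quote}{joined}{quote}"


def clean_word_html(html: str) -> str:
    """Strip a few common Word-only constructs from an HTML string."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = OFFICE_TAG.sub("", html)
    html = MSO_CLASS.sub("", html)
    return STYLE_ATTR.sub(_strip_mso_declarations, html)
```

For example, `clean_word_html('<p class=MsoNormal>Hi<o:p></o:p></p>')` returns `'<p>Hi</p>'`. Unlike a really strict cleaner, this sketch leaves ordinary CSS such as `color:red` in place, so you don’t have to add it back.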

Text Editors Are Still Best
If you’re a little more willing to invest some time in learning HTML, you should look into some tools better suited to HTML email coding. Raw text editors work best, because they keep everything simple and "pure." We highly recommend BBEdit (for Mac), NoteTab Pro (PC), HomeSite (PC) and Dreamweaver (Mac and PC). If you prefer to use Adobe GoLive, here’s a useful HTML email guide we found for you.

HTML Cheat Sheets
Also, bookmark’s useful HTML cheatsheet and special characters guide.

Common Mistakes from Users of Microsoft Word

Here are common problems we’ve experienced with emails generated by Microsoft Word:

  • Code is too bloated with Microsoft-specific tags. Run it through a cleaner.
  • Using fonts that aren’t "web safe." Hey, just because you have Comic Sans doesn’t mean everyone else does. Only specify fonts that are installed on a wide range of machines. Fonts like Arial, Helvetica, Times, Courier, Trebuchet and Verdana are safe.
  • Wacky and outrageous colors. Don’t make every other word in your email a different color. We’re not sure what it is about Word, but it seems to tempt people into going berserk with font colors. Keep in mind that assigning lots of different font colors (especially blue and red, according to some sources) can set off more spam filters (that’s what the spammers do, remember?).
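The font and color mistakes above are easy to spot-check with a script. Here’s a rough Python sketch using the standard library’s `html.parser`: it collects every font named in `<font face>` tags and inline `font-family` styles, flags any that aren’t on the web-safe list from this post, and counts distinct colors. The names `FontColorAudit` and `audit` are hypothetical, adding the full names "Times New Roman", "Courier New" and "Trebuchet MS" to the safe list is our assumption, and the color count is just a number: we don’t know what threshold any particular spam filter uses.

```python
from html.parser import HTMLParser

# Web-safe fonts from this post, plus their full names (our assumption).
WEB_SAFE_FONTS = {"arial", "helvetica", "times", "times new roman", "courier",
                  "courier new", "trebuchet", "trebuchet ms", "verdana"}
# Generic CSS fallbacks are always fine to list.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace"}


def _style_declarations(style: str) -> dict[str, str]:
    """Split an inline style like 'color:red; font-family:Arial' into a dict."""
    result = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            result[name.strip().lower()] = value.strip()
    return result


class FontColorAudit(HTMLParser):
    """Collects font names and colors from <font> tags and inline styles."""

    def __init__(self) -> None:
        super().__init__()
        self.fonts: set[str] = set()
        self.colors: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names and unescapes their values.
        attributes = {name: value or "" for name, value in attrs}
        declarations = _style_declarations(attributes.get("style", ""))
        families = declarations.get("font-family", "")
        if tag == "font":
            families = ",".join(filter(None, [families, attributes.get("face", "")]))
            if attributes.get("color"):
                self.colors.add(attributes["color"].lower())
        for family in families.split(","):
            name = family.strip().strip("\"'").lower()
            if name:
                self.fonts.add(name)
        if "color" in declarations:
            self.colors.add(declarations["color"].lower())


def audit(html: str) -> tuple[list[str], int]:
    """Return (fonts that are not web safe, number of distinct colors)."""
    parser = FontColorAudit()
    parser.feed(html)
    parser.close()
    risky = sorted(parser.fonts - WEB_SAFE_FONTS - GENERIC_FAMILIES)
    return risky, len(parser.colors)
```

Fed Word output containing `<font face="Comic Sans MS" color="red">`, `audit` lists `"comic sans ms"` among the risky fonts. It treats `red` and `#FF0000` as different colors, so the count is only a rough signal.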

Word is really only good for generating very simple HTML email newsletters, with simple font formatting. It’s much, much better to use a real text editor to code your emails by hand. If you don’t know an ounce of HTML, it’s usually worth the investment to have a professional designer create a template for you. Then, with MailChimp, you can simply re-use the template for every campaign.