Simplify, Simplify (HTML portion)

If your E-book’s .html file has too much junk in it, take Thoreau’s advice and “simplify, simplify.”

Here’s what we’ll do.

  1. Make multiple backup copies.
  2. Change our HTML file to HTML5.
  3. Tidy up our HTML code automatically.
  4. Tidy up our HTML code manually.
  5. Tidy up the CSS part (next time).

1. Make multiple copies of our .html (or .htm) file

Do it. We’ll be performing major surgery on these files. E-mail one to yourself. Put one on a USB thumb drive. Put one on Dropbox. Redundancy defines our age. I’m starting with the file “YBR_1_MSWord_Style_Formatted.htm”. All Yellow Buick Review source files are free to download.
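If you’d rather not drag files around by hand every time, here’s a minimal Python sketch that copies one file into several folders and adds a timestamp to each copy’s name, so successive backups get distinct names. The function name is my own, and it assumes the destination folders already exist. It won’t e-mail anything for you.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up(source, folders):
    """Copy `source` into each existing folder in `folders`, adding a
    timestamp to the file name so successive backups get distinct names."""
    src = Path(source)
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    copies = []
    for folder in folders:
        target = Path(folder) / f'{src.stem}_{stamp}{src.suffix}'
        shutil.copy2(src, target)  # copy2 also keeps the modification time
        copies.append(target)
    return copies
```

For example, `back_up("YBR_1_MSWord_Style_Formatted.htm", ["/Volumes/THUMB/backups", "/Users/me/Dropbox/backups"])`, where both folder paths are made-up placeholders for wherever your thumb drive and Dropbox folder actually live.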

2. Fire up an HTML editor and convert to HTML5.

Once you’ve settled on an HTML editor, let’s set (or change) our document type declaration, or Doctype, to HTML5, which is, at this writing, the standard for E-book HTML. Nearly any HTML editor can do this through a menu command. In Dreamweaver, that’s “Modify -> Page Properties,” then this:

HTML5 Doctype change in Dreamweaver

(I’ve added in a title, since Dreamweaver asked.) That changes our header info from this:


<meta name=Title content="">
<meta name=Keywords content="">
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=Generator content="Microsoft Word 14 (filtered)">

to this:

<!doctype html>
<meta name="Title" content="Yellow Buick Review">
<meta name="Keywords" content="">
<meta charset="UTF-8">
<title>Yellow Buick Review</title>
<meta name="Generator" content="Microsoft Word 14 (filtered)">

(If the “Title” change in Dreamweaver doesn’t stick, just type it in after “content.”) Strictly speaking, the Doctype portion of that is just the <!doctype html> line; HTML5 is pretty terse about doctype. But Dreamweaver’s conversion doesn’t stop at the header: it rewrites markup throughout the document. Scroll down to around line 346. (Note: Your line numbers ought to be the same as mine if you’re using my files. But different HTML editors can make their changes in slightly different ways, so allow a few lines +/- if you’ve made any edits.) Before the change, these lines read:

<h2>April Contest Winners</h2>

<p class=ProseStanza>We are delighted to announce the winners of our third

Now that I’ve converted it to HTML5, the same lines start around line 348, and they read:

<h2>April Contest Winners</h2>

<p class="ProseStanza">We are delighted to announce the winners of our third

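Notice the quotes around “ProseStanza”? I certainly wouldn’t want to hand-code hundreds or thousands of quotation marks around my class names. And those quotation marks need to be there. A Web browser will forgive their absence (HTML5 allows unquoted attribute values in simple cases like this one), but E-book formats are stricter: EPUB content documents must be XHTML, which requires every attribute value to be quoted, so an E-reader or conversion tool may well choke without them.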
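Dreamweaver handles this conversion with a menu command. If your editor doesn’t, or you’re curious what the conversion boils down to, here’s a rough Python sketch. It is not what Dreamweaver does internally. It assumes Word-style input like the header shown above: it swaps the old doctype and the http-equiv charset line for their HTML5 equivalents, then wraps bare attribute values in double quotes. The function and pattern names are my own, and because it uses regular expressions rather than a real HTML parser, it assumes no > character appears inside an attribute value.

```python
import re

# Any existing doctype at the very start of the file, in any letter case.
OLD_DOCTYPE = re.compile(r'^\s*<!doctype[^>]*>\s*', re.IGNORECASE)
# Word's HTML4-style charset line, with or without quotes.
OLD_CHARSET = re.compile(
    r'''<meta\s+http-equiv=["']?Content-Type["']?\s+'''
    r'''content=["']?text/html;\s*charset=([\w-]+)["']?\s*/?>''',
    re.IGNORECASE)
# A start tag: "<", a letter, then everything up to the next ">".
# Assumption: no ">" appears inside attribute values.
START_TAG = re.compile(r'<[A-Za-z][^>]*>')
# name=value, where value is double-quoted, single-quoted, or bare.
ATTR = re.compile(r'''([^\s"'<>/=]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'=<>`]+)''')


def _quote_attr(m):
    name, value = m.groups()
    if value[0] in '"\'':
        return m.group(0)          # already quoted: leave it alone
    return f'{name}="{value}"'     # bare value: wrap it in double quotes


def to_html5(html):
    """Swap in the HTML5 doctype and charset, then quote bare attribute values."""
    html = OLD_CHARSET.sub(lambda m: f'<meta charset="{m.group(1).upper()}">', html)
    html = START_TAG.sub(lambda t: ATTR.sub(_quote_attr, t.group(0)), html)
    html = OLD_DOCTYPE.sub('', html, count=1)
    return '<!doctype html>\n' + html
```

Quoted values are matched whole before anything else, so text inside them (like the charset=utf-8 inside a content="..." value) is never quoted a second time.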

3. Tidy up our HTML code automatically.

MS Word, regardless of the settings you use for export, puts a lot of extra junk into its HTML code, particularly in the CSS portion. (Remember, at the moment, we’re working from one big file, so the CSS language and the HTML language are in the same .html file.) Fortunately, Dreamweaver has some built-in tools to help me clear out some of the clutter. I go to the Dreamweaver menu “Commands -> Clean Up Word HTML.”

dialog window screenshot

It’s not perfect, but it sure is easy.

These fixes and consolidations apply throughout my .html document. A few minutes ago, “April Contest Winners” was on line 346:

Lines 346-375 before Word Cleanup: the code had a lot of white space and blank lines.

Now it’s on line 219:

Dreamweaver tightens up the code and reduces MS Word’s bloat

Dreamweaver removed many of the blank lines in the code. That may seem efficient or claustrophobic depending on your point of view. Please note that blank lines in HTML or CSS code do not show up in the final presentation. There will be no difference in how a Web browser or a Kindle interprets the following:

<p class="PoemStanzaIndent1">lorem quis tristique imperdiet,</p>
<p class="PoemLineIndent1">nulla dolor auctor augue,</p>
<p class="PoemLineIndent1">quis egestas elit ante id nisi.</p>
<p class="PoemStanza">integer pretium non</p>
<p class="PoemLine">nunc et tincidunt.</p>
<p class="PoemLine">morbi ante magna,</p>

And this:

<p class="PoemStanzaIndent1">lorem quis tristique imperdiet,</p>

<p class="PoemLineIndent1">nulla dolor auctor augue,</p>

<p class="PoemLineIndent1">quis egestas elit ante id nisi.</p>

<p class="PoemStanza">integer pretium non</p>

<p class="PoemLine">nunc et tincidunt.</p>

<p class="PoemLine">morbi ante magna,</p>

Personally, I don’t care if my <p> lines are single-spaced, with no blank lines between them. But I usually put a blank line of “breather” before an <h2> element or a new section. It doesn’t show up in the E-book, but it does make my own code easier to read and maintain.
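If you want that layout without tidying by hand, here’s a minimal Python sketch (the function name is my own) that drops every blank line and then puts a single blank “breather” line back before each line that starts with an <h2> tag. That’s safe here because, as shown above, blank lines between tags don’t change the rendered page. The one exception is text inside a <pre> block, which this sketch doesn’t special-case.

```python
def tidy_blank_lines(html):
    """Remove blank lines, but keep one blank "breather" line before each <h2>."""
    out = []
    for line in html.splitlines():
        if not line.strip():
            continue                      # drop blank or whitespace-only lines
        if line.lstrip().lower().startswith('<h2') and out:
            out.append('')                # breather before a new section heading
        out.append(line)
    return '\n'.join(out) + '\n'
```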

I’ll change the title of this document to “Yellow Buick Review cleaned up” and save it as “YBR_2_DW_Word_Cleanup.html”. (That’s the file to open next if you’re following along at home with the sample files.) Note that I’ve added an “l” (lowercase L) to the file extension; that’s just personal preference, since .htm and .html files work the same way.

This file is now technically cleaner, but that doesn’t mean it looks better in a Web browser. Far from it! Consider how the MS Word-formatted .htm file compares with the Dreamweaver “cleanup” .html file:

The .htm file straight from Word.

The HTML file “cleaned” by Dreamweaver.

See that? The Word file has extra blank lines in the code, but the proper line spacing in the rendered document. The Dreamweaver file has no extra blank lines in the code, but terrible line spacing in the rendered document. Even worse, the Dreamweaver document has clipped the left margin, so the first letter of every line is cut off!

It’s okay. Truly it is. After all, Word is just a crutch. The fault isn’t in the HTML portion of our file, which is the portion we’ll (mostly) be keeping. The fault is in the terrible CSS portion, which we’ll (mostly) be tossing out. I mean, look at this garbage:

What IS this junk?

We’ll clear out the CSS junk in a future blog post.

4. Tidy up our HTML code manually.

The cleanup from Dreamweaver was just a first step, not a complete fix. Even after Dreamweaver has loosened the pickle jar lid for me, I have extra and outdated code. Here’s where I’ll use my find/replace tools.

I’ll look for <i> tags and replace them with <em> tags. Both render as italics by default, but <em> (for emphasis) seems to be the standard these days. Kindle is okay either way. Same deal with <b> and <strong>; Kindle specifically states that these are equivalent, but why take chances? (<span>, in my opinion, is just asking for trouble.) I replace eight <i> tags with <em> and eight </i> tags with </em>. I replace five <b> tags with <strong>, and five </b> tags with </strong>.

It’s important that my opening < > and closing </ > tags match. It’s as if I were “balancing the checkbook” and making sure I don’t have unclosed tags. I’ll save this file as “YBR_3_DW_FindReplace.html”. Switch to that file now if you’re following along.
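The same find/replace and “checkbook” check can be scripted. Here’s a minimal Python sketch, assuming the tags look the way Word writes them (<i>, or <i> with attributes); the function names and the list of tags to check are my own choices. The balance check only counts opening and closing tags for each name, so it catches a missing </em> but won’t notice tags that are closed in the wrong order.

```python
import re

# Old presentational tag -> preferred tag.
SWAPS = {'i': 'em', 'b': 'strong'}
# <i>, </i>, <b>, </b>, with or without attributes. Whitespace or ">" must
# follow the name, so <br>, <body>, and <img> are left alone.
OLD_TAG = re.compile(r'<(/?)(i|b)(\s[^>]*)?>', re.IGNORECASE)


def swap_tags(html):
    """Return (new_html, number_of_tags_replaced)."""
    def repl(m):
        slash, name, attrs = m.group(1), m.group(2).lower(), m.group(3) or ''
        return f'<{slash}{SWAPS[name]}{attrs}>'
    return OLD_TAG.subn(repl, html)


def unbalanced(html, tags=('em', 'strong', 'span', 'div', 'p')):
    """Map each tag whose opening and closing counts differ to (opens, closes)."""
    problems = {}
    for tag in tags:
        opens = len(re.findall(rf'<{tag}(?:\s[^>]*)?>', html, re.IGNORECASE))
        closes = len(re.findall(rf'</{tag}\s*>', html, re.IGNORECASE))
        if opens != closes:
            problems[tag] = (opens, closes)
    return problems
```

If my counts above hold, running swap_tags on the sample file should report 26 replacements (8 + 8 for the italics, 5 + 5 for the bold), and unbalanced() on the result should come back as an empty dictionary.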

Next, I’ll look for <div>s, <span>s, and breaks I don’t want or need. Line 172 reads

<div class="WordSection1">

and that’s almost certainly useless Word junk. Lines 176-177 say

<br clear="all" style='page-break-before:always'>

That can go for sure. I’ll replace all seven of those in the document with plain old <br /> tags for now. If I want page breaks, I’ll define special .classes for them, probably on my <h2> tags. And I’ll delete spans and needless breaks, such as these:

I can delete the highlighted lines.
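The page-break swap described above can be scripted too. Here’s a minimal Python sketch, assuming Word’s breaks look like the example from lines 176-177: a <br> tag whose attributes include page-break-before:always, possibly wrapped onto a second line. The function name is my own. It returns the number of replacements so you can confirm you caught all seven.

```python
import re

# A <br> tag whose attributes contain "page-break-before: always", in any
# letter case, quoted or not. [^>] also matches newlines, so a tag that
# wraps onto a second line is still caught.
WORD_PAGE_BREAK = re.compile(
    r'<br\b[^>]*page-break-before\s*:\s*always[^>]*>', re.IGNORECASE)


def strip_word_page_breaks(html):
    """Replace Word's page-break <br> tags with plain <br /> tags.
    Returns (new_html, number_of_replacements)."""
    return WORD_PAGE_BREAK.subn('<br />', html)
```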

When I delete these things, I also ought to delete any tags that close them. When I delete the <div> on line 172 up above, I’ll also go delete the </div> line that concludes it, way down in lines 410-411. After deleting these lines, I’ll save my .html file as “YBR_4_DW_DivSpan.html”.
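Finding the </div> that matches a given <div> is the fiddly part, because any closing </div> could belong to one of several nested <div>s. Here’s a minimal Python sketch, assuming the wrapper is identified by its class name (the function name is my own). It finds the opening tag, counts nested <div>s until the depth returns to zero, and removes that opening tag and its matching </div> while keeping everything between them.

```python
import re

# Any opening <div ...> or closing </div>; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r'<(/?)div\b[^>]*>', re.IGNORECASE)


def unwrap_div(html, class_name):
    """Remove the first <div class="class_name"> and its matching </div>,
    keeping everything between them. Raises ValueError if it is never closed."""
    opener = re.compile(
        rf'''<div\b[^>]*\bclass\s*=\s*["']?{re.escape(class_name)}(?![\w-])[^>]*>''',
        re.IGNORECASE)
    start = opener.search(html)
    if start is None:
        return html                        # nothing to unwrap
    depth = 1
    for m in DIV_TAG.finditer(html, start.end()):
        depth += -1 if m.group(1) else 1   # "</div>" closes, "<div ...>" opens
        if depth == 0:
            return html[:start.start()] + html[start.end():m.start()] + html[m.end():]
    raise ValueError(f'no closing </div> found for class {class_name!r}')
```

For the sample file, that would be `unwrap_div(html, "WordSection1")`.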

We’re halfway there! We’ve deleted most of the junk from the HTML half of this file. Next up, we’ll work on the CSS half.

Updated March 1, 2015 with revised line numbers and artwork.

Filed under Code Samples, Look and Feel
