## May 11, 2003

### Bullet-Proofing II

For the second in my series of “How-To” articles on MovableType, I’ll continue on the theme of bullet-proofing against the inclusion of invalid content. Aside from the content you, yourself, write, there’s stuff other people write that gets included in your blog. Even if you trust yourself to produce valid markup, you can’t necessarily trust others to do the same. Hence the need for bullet-proofing.

Last week, we dealt with comments posted to your blog. In this case, the answer was pretty simple. Since you call the shots, all you have to do is run the comment through the Validator and ask the poster to correct the errors before allowing them to post the comment to your blog. Since Alexei Kosut was kind enough to wrap the W3C Validator script as a MovableType plugin, the job of setting this up was much simplified.

Next on the list are Trackbacks and Syndicated RSS Feeds. Since these are, by definition, stuff written elsewhere, you don’t have any control over the content. If it’s invalid, you can’t ask the author to correct it; you just have to deal. Consequently, our solution will be more ham-handed.

Let’s look at the snippet of template code for listing a Trackback on my blog (before any bullet-proofing):

<div class="trackback" id="p<$MTPingID$>">
Read the post <a href="<$MTPingURL$>" target="new"><$MTPingTitle$></a><br />
<b>Weblog:</b> <$MTPingBlogName$><br />
<b>Excerpt:</b> <$MTPingExcerpt$><br />
<b>Tracked:</b> <$MTPingDate$></div>

Of the various <$MTPing*$> tags in the above code snippet, the ones supplied by the person who sent the Trackback are <$MTPingURL$>, <$MTPingTitle$>, <$MTPingBlogName$> and <$MTPingExcerpt$>.

Let’s start with the last item. What evil stuff might the <$MTPingExcerpt$> contain? You name it: invalid HTML markup, unescaped entities (e.g., &) and control characters.

“Control characters?” you say, “Who would insert control characters in their blog?” Well, if you copied and pasted the previous sentence from your browser window into the composition window of your blog and posted it, then, depending on what OS you are using, you probably did just that. The trouble is the way non-ASCII characters (like the curly quotes above) are handled by your OS. If you want to do it right, do a “View Source” on this page and copy from there. Needless to say, most people don’t do it right, and control characters in blogs are as common as dirt.
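To make the mechanism concrete, here is a minimal Python sketch. It is not part of the setup described in this post, and the choice of encodings is an assumption about one typical case: a curly quote stored as a single Windows-1252 byte, then read by software that treats each byte as an ISO-8859-1 code point.

```python
import unicodedata

# Assumption: the poster's OS stores curly quotes as Windows-1252 bytes.
curly = "\u201cControl characters?\u201d"
raw = curly.encode("cp1252")

# Assumption: the receiving software maps each byte straight to a code point,
# which is what decoding as ISO-8859-1 (latin-1) does.
misread = raw.decode("latin-1")

# Collect the characters that Unicode classifies as control characters (Cc).
controls = [hex(ord(ch)) for ch in misread if unicodedata.category(ch) == "Cc"]
print(controls)
```

The two quote marks end up in the C1 control range (0x80 to 0x9F), which is exactly the sort of thing a validator rejects.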

<$MTPingURL$> could very well contain unescaped &s, and you never know what people will put in the title of their posts.

So, what to do?

MovableType provides global filters to strip HTML and to encode entities, and last week I wrote a plugin to strip control characters. The mt-safe-href plugin takes care of escaping &s in URLs. You can use it to protect your own content with constructions like <$MTEntryBody safe_urls="1"$>, or, as here, to protect just a single URL.
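For readers who want to see what “escaping &s in URLs” amounts to, here is a rough Python sketch. It is a hypothetical stand-in, not the code of the mt-safe-href plugin: the function name `safe_url` and the regular expression are my own assumptions about a reasonable way to do it, leaving ampersands that already begin an entity or character reference alone.

```python
import re

# Matches an "&" that does not already start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")

def safe_url(url: str) -> str:
    """Hypothetical stand-in for a safe_url-style filter: escape bare ampersands."""
    return BARE_AMP.sub("&amp;", url)

raw_url = "http://example.com/mt.cgi?id=1&entry=2"
escaped_once = safe_url(raw_url)
escaped_twice = safe_url(escaped_once)  # a second pass should change nothing

print(escaped_once)
print(escaped_twice)
```

Skipping ampersands that are already escaped means the filter can be applied to a URL of unknown provenance without producing `&amp;amp;`.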

Let’s change the above code to

<div class="trackback" id="p<$MTPingID$>">
Read the post <a href="<$MTPingURL safe_url="1"$>" target="new"><$MTPingTitle strip_controlchars="1" remove_html="1" encode_html="1"$></a><br />
<b>Weblog:</b> <$MTPingBlogName$><br />
<b>Excerpt:</b> <$MTPingExcerpt strip_controlchars="1" remove_html="1" encode_html="1"$><br />
<b>Tracked:</b> <$MTPingDate$></div>

Voila! Bullet-proofed.

Well, … erm … I didn’t do anything to bullet-proof the Blog Name. I haven’t seen an invalid one yet. I’m kinda curious to see whether any exist. A more cautious sort would bullet-proof that one too.
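The combined effect of the three filters on a title or excerpt can be sketched in Python. This is only an approximation under stated assumptions: the regular expressions below are my own simplifications, not the code behind MovableType’s `remove_html` and `encode_html` filters or the control-character plugin, and the fixed order of steps is chosen for the illustration.

```python
import html
import re

# C0 and C1 control characters, except tab, newline and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")
# Naive tag matcher; a simplification, not a real HTML parser.
TAGS = re.compile(r"<[^>]*>")

def sanitize(text: str) -> str:
    """Approximate strip_controlchars, remove_html, encode_html, in that order."""
    text = CONTROL_CHARS.sub("", text)  # drop control characters
    text = TAGS.sub("", text)           # drop markup
    return html.escape(text)            # escape &, <, > and quotes

cleaned = sanitize("\x93Fish\x94 & <b>chips</b>")
print(cleaned)
```

Whatever the sender put in the excerpt, what reaches the page is plain text with its special characters escaped.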

A similar story with the Syndicated RSS Feeds in my Blogroll. The mt-rssfeed plugin provides the tags

<$MTRSSFeedItemLink$>
<$MTRSSFeedItemTitle$>
<$MTRSSFeedItemDescription$>

These need to be replaced by

<$MTRSSFeedItemLink safe_url="1"$>
<$MTRSSFeedItemTitle strip_controlchars="1" remove_html="1" encode_html="1"$>
<$MTRSSFeedItemDescription strip_controlchars="1" remove_html="1" encode_html="1"$>

Similar techniques should take care of other included content you might have.

That leaves only your own content to validate. Guess that will have to wait for another post.

Update (5/13/2003): I just installed Adam Kalsey’s Technorati plugin. This is another brilliant example of how invalid HTML on other people’s blogs — served up via the Technorati API — can mess with an XML parser (in this case, the one used by Adam’s plugin). I found the plugin practically unusable until I applied this patch, which escapes ampersands. Not a complete bullet-proofing job, but good enough.

Update (5/14/2003): The fix is in.

Update (9/13/2003): Well, it finally happened! I got a trackback with an invalid <$MTPingBlogName$>. I’m afraid, dear readers, that you need to bulletproof that one too.

Update (4/10/2004): I’ve released a new version of the MTStripControlChars plugin, with somewhat more sophisticated behavior.

Update (6/10/2004): Actually, with very recent versions of Perl, there is a problem with the technique explained above. With previous versions of Perl, the global filters would be executed in a predictable order. Not so any longer! If you have an MT tag with multiple filters applied to it, they will execute in a random order. If those filters “conflict” in some way, you will get random problems when you rebuild your pages. Sometimes it will work right, sometimes it won’t. To fix this, we need to enforce a certain order of execution of the filters. In particular, we want the strip_controlchars filter to execute before the encode_html filter. To do this, we use the MTBlock plugin. For instance, we want to write

<b>Excerpt:</b> <MTBlock encode_html="1"><$MTPingExcerpt remove_html="1" strip_controlchars="2"$></MTBlock><br />

for the Excerpt line of the above code snippet (and similarly for other occurrences of strip_controlchars and encode_html).
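Why the order matters is easy to see in a small Python sketch. The two functions are simplified stand-ins of my own (assumptions, not MovableType’s implementations), and they illustrate order-dependence in general using `remove_html` and `encode_html` rather than the specific `strip_controlchars`/`encode_html` pair discussed above.

```python
import html
import re
from itertools import permutations

def remove_html(text: str) -> str:
    """Simplified stand-in: drop anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", text)

def encode_html(text: str) -> str:
    """Simplified stand-in: escape &, <, > and quotes."""
    return html.escape(text)

sample = "<b>AT&T</b>"
results = {}
for order in permutations([remove_html, encode_html]):
    out = sample
    for apply_filter in order:
        out = apply_filter(out)
    results[" then ".join(f.__name__ for f in order)] = out

for label, out in results.items():
    print(f"{label}: {out}")
```

If the encoding step happens to run first, the tags are already escaped by the time the stripping step looks for them, so they survive into the page as visible text. Nesting the tag inside a block that applies `encode_html` last removes that dependence on chance.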

#### Update (2/20/2005):

With my new, “internationalized” trackback setup, bulletproofing trackbacks is a bit easier. Escaping is handled internally, so all we need to do in the templates is

<div class="trackback" id="p<$MTPingID$>">
Read the post <a href="<$MTPingURL safe_url="1"$>" target="new"><$MTPingTitle remove_html="1" strip_controlchars="2"$></a><br />
<b>Weblog:</b> <$MTPingBlogName remove_html="1" strip_controlchars="2"$><br />
<b>Excerpt:</b> <$MTPingExcerpt remove_html="1" strip_controlchars="2"$><br />
<b>Tracked:</b> <$MTPingDate$></div>

Posted by distler at May 11, 2003 9:08 AM


Weblog: kurcula.com
Excerpt: I like Josh's idea of dumping links in blog entries. So here is the list: CSS3 modules to Candidate Recommendation:...
Tracked: May 15, 2003 2:58 PM

Weblog: ecrosstexas: the texas blog
Excerpt: Jacques Distler, Physics Professor at UT Austin provides the first two parts in a series of "How-to" articles on Validating Comments and Bullet-Proofing II in Moveable Type....
Tracked: May 19, 2003 2:40 PM

Weblog: Reflective Reality.
Excerpt: Technorati plugin v1 - This is the best plugin yet! Thank you Adam Kalsey! Using the Technorati API it creates links to the blogs that link you. Top toy! No referral logs, no dodgy blog title problems (c/o Musings) just...
Tracked: August 16, 2003 4:40 AM

Weblog: ecrosstexas: the texas blog
Excerpt: Jacques Distler, Physics Professor at UT Austin provides the first two parts in a series of "How-to" articles on Validating Comments and Bullet-Proofing II in Moveable Type....
Tracked: October 27, 2003 12:50 PM

Read the post Time to convert XHTML 1.0 Strict
Weblog: Eclectic Echoes
Excerpt: After the New York trip, I think it will be time to change the site over from XHTML 1.0 Transitional to XHTML 1.1. The changes should be limited, mostly changing the javascript and form areas. The main area’s right now to break would be:&lt;s...
Tracked: January 13, 2004 3:37 PM

Read the post Front-end and Back-end Changes
Weblog: Procrastination
Excerpt: There have been a lot of changes here recently, most of them on the back-end. Most of this work was related to having a bilingual (English and Urdu) blog along with MathML equations. This required valid XHTML 1.1 and serving...
Tracked: February 11, 2005 8:24 AM

Read the post Are You Being Served? -- Part II
Weblog: Upon Reflection
Excerpt: Step 3: MathML At this point you should have valid XHTML 1.1 pages served with the correct MIME type to all the good little browsers.... It is much easier to use some kind of shorthand to enter equations, and let Movable Type convert it to MathML when...
Tracked: September 30, 2005 1:02 PM

Read the post Are You Being Served? -- Part II
Weblog: Upon Reflection
Excerpt: In Part I, we discussed how to validate your web site and configure your server to send math-able web pages. At this point you should have valid XHTML 1.1 pages served with the correct MIME type to all the good...
Tracked: August 29, 2008 1:49 AM
