## April 29, 2003

### Yummy, Yummy Tag Soup

A while back, Evan Goer said of this blog that

> Jacques Distler may well be the only person on the planet who understands the XHTML 1 specification and uses it properly.

While flattering, this is surely hyperbole. There are plenty who understand XHTML much better than I. And I don’t believe that people can’t do XHTML properly. Most people simply don’t need XHTML. So there’s no incentive for them to do it right.

If they use it anyway, it’s probably a matter of Geek-chic. Slapping an XHTML DOCTYPE on your weblog is like wearing sunglasses at night. It looks cool! But isn’t necessarily very functional.

Anyway, true to his scientific training, Evan decided to test the quality of the XHTML “in the wild”. He decided to focus on the weblogs of the “Alpha Geeks” — the programmers, web-designers, and web-standards advocates — who are, surely, the most hip, Standards-savvy, knowledgeable folks around. If anyone can do XHTML right, they can.

The results of his survey of 119 Alpha Geek XHTML websites were pretty dismal. Only one site passed his 3 tests (he decided not to apply his 4th, somewhat subjective, “Why Are You Doing This?” test) and a startling 74% didn’t even have a main page that validated.

So, what should we conclude? Wearing sunglasses at night is not only useless, it can be downright dangerous.

Posted by distler at April 29, 2003 10:48 AM


### jhttp is not an IANA-registered protocol

Did you spike the “survey” link with that j before the protocol for effect?

Posted by: Phil Ringnalda on April 30, 2003 3:01 AM

### Re: jhttp is not an IANA-registered protocol

Thanks for the catch!

As you probably know, the W3C Validator makes no attempt to check the validity of attribute values.
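To make the point concrete: a validator checks structure against the DTD, so a typo such as `jhttp://` in an `href` goes unnoticed. Below is a minimal sketch of a separate check, using only Python's standard library. The allow-list `KNOWN_SCHEMES`, the helper name `suspicious_hrefs`, and the sample URLs are illustrative assumptions, not anything the W3C Validator provides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Illustrative allow-list; extend it to whatever schemes your site really uses.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto"}


def suspicious_hrefs(xhtml_text):
    """Return href values whose URL scheme is not in KNOWN_SCHEMES."""
    root = ET.fromstring(xhtml_text)  # raises ParseError if not well-formed
    bad = []
    for anchor in root.iter(XHTML_NS + "a"):
        href = anchor.get("href")
        if href is None:
            continue
        scheme = urlsplit(href).scheme
        # Relative links have an empty scheme and are left alone.
        if scheme and scheme not in KNOWN_SCHEMES:
            bad.append(href)
    return bad


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>t</title></head>
<body><p><a href="jhttp://example.org/survey">survey</a>
<a href="http://example.org/">ok</a> <a href="/local">rel</a></p></body></html>"""

print(suspicious_hrefs(SAMPLE))
```

Only the `jhttp://` link is reported; the well-known scheme and the relative link pass.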

Posted by: Jacques Distler on April 30, 2003 7:31 AM

### Why XHTML

I wonder how long it will be before those of us with comments enabled think twice about linking to a weblog post without comments, knowing that we will become the comment host for that entry, for our readers.

You are certainly the only person I know of with an evident need for XHTML. However, there are also uses for XHTML that aren’t apparent at first glance. I switched over because I needed to do some self-scraping, and doing it with an XML parser seemed saner than using a huge chain of regexes or writing my own SGML parser.
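As an illustration of what such self-scraping can look like (a sketch only, not the code actually used on any weblog mentioned here), the following pulls entry titles out of a well-formed XHTML page with Python's standard XML parser. The page layout (`div class="entry"` containing an `h3`) and the function name `entry_titles` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace map so XPath-style queries can address XHTML elements.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def entry_titles(xhtml_text):
    """Return the text of every <h3> that sits directly inside a div.entry."""
    root = ET.fromstring(xhtml_text)
    return [
        "".join(h3.itertext()).strip()
        for h3 in root.findall(".//x:div[@class='entry']/x:h3", NS)
    ]


# Hypothetical page; a real weblog template will differ.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>weblog</title></head>
<body>
<div class="entry"><h3>First <em>post</em></h3><p>Hello.</p></div>
<div class="entry"><h3>Second post</h3><p>World.</p></div>
<div class="sidebar"><h3>Links</h3></div>
</body></html>"""

print(entry_titles(PAGE))
```

The query ignores the sidebar heading and handles nested inline markup, which is the kind of thing that gets painful with a chain of regexes.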

I’d also quibble with his third test: the spec doesn’t say MUST use application/xhtml+xml, it says SHOULD. The word “the” isn’t normative, SHOULD is. MUST means must; SHOULD means you should know why not and what it will break. I don’t use application/xhtml+xml because the gains are minimal, and since I’m not using a real XHTML tool to produce my pages, there’s the risk of delivering Mozilla’s XML error page between the time that I save a screwed-up entry, or someone leaves a comment that gets past Sanitize, and the time I see it and fix it. By delivering XHTML as text/html, I get the safety net of browser error handling while still getting the benefits of XML for myself. Hixie’s understandably focused on browsers and nothing but, but that’s not why I need or want XHTML.

Posted by: Phil Ringnalda on April 30, 2003 3:29 AM

### Re: Why XHTML

> You are certainly the only person I know of with an evident need for XHTML. However, there are also uses for XHTML that aren’t apparent at first glance. I switched over because I needed to do some self-scraping, and doing it with an XML parser seemed saner than using a huge chain of regexes or writing my own SGML parser.

If you are going to use an XML parser, you need to have well-formed XML (not necessarily valid XHTML, just well-formed). That’s what the application/xhtml+xml MIME type advertises: this here is a well-formed X(HT)ML document.

> I’d also quibble with his third test: the spec doesn’t say MUST use application/xhtml+xml, it says SHOULD…

This is partly a meta-comment. Evan’s been thinking about XHTML 2. In XHTML 2.0, “SHOULD” becomes “MUST”. Hixie does present arguments why serving XHTML as text/html is the wrong thing to do (from both the content provider’s and the client’s perspectives). I tend to disagree somewhat with Evan on the persuasiveness of those arguments.

Anyway, only 12 of 119 sites passed tests 1 and 2 but failed test 3. That’s a pretty disastrous record.

> By delivering XHTML as text/html, I get the safety net of browser error handling while still getting the benefits of XML for myself.

If you have well-formed XML, then Mozilla will render it when sent as application/xhtml+xml. Your code does not need to be valid, just well-formed. If you don’t have well-formed XML, then your server-side XML parser will barf. So it seems to me that Phil Ringnalda has nothing to lose (and his readers may have something to gain, since they, too, can do XML parsing) by serving his blog as application/xhtml+xml.
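The distinction between well-formed and valid can be seen with any non-validating XML parser. Here is a minimal sketch using Python's standard library; the helper name `is_well_formed` and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if a non-validating XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Well-formed but NOT valid XHTML: no <head>/<title>, and <p> directly in <html>.
INVALID_BUT_WELL_FORMED = '<html xmlns="http://www.w3.org/1999/xhtml"><p>Hi</p></html>'

# Not well-formed: a bare ampersand.
BARE_AMPERSAND = '<p xmlns="http://www.w3.org/1999/xhtml">Tom & Jerry</p>'

# Not well-formed: <br> is never closed, so </p> does not match.
UNCLOSED_TAG = '<p xmlns="http://www.w3.org/1999/xhtml">line one<br>line two</p>'

for sample in (INVALID_BUT_WELL_FORMED, BARE_AMPERSAND, UNCLOSED_TAG):
    print(is_well_formed(sample))
```

The first fragment would fail the validator yet parses fine; the other two are what tag-soup error handling in text/html mode silently papers over.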

Posted by: Jacques Distler on April 30, 2003 7:54 AM

### Why not application/xhtml+xml

I just typed an ampersand in your comment entry textarea, clicked Preview, and said “yep, MT’s still not quite ready for application/xhtml+xml.”

Give me a tool that refuses to allow me or anyone else to save not-well-formed or invalid XHTML, and I’m in.

Posted by: Phil Ringnalda on April 30, 2003 11:37 AM

### Re: Why not application/xhtml+xml

Hmmmm.

I do use the mt-safe-href plugin, which takes care of unescaped ampersands in URLs. That’s been the only source of problems that I’ve had with ill-formed XHTML in comments.
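For readers unfamiliar with the problem, here is a rough sketch of the kind of fix involved: turning bare ampersands into `&amp;` while leaving existing entity and character references alone. This is not the mt-safe-href plugin's code (which is not reproduced here); the regular expression and the function name `escape_bare_ampersands` are assumptions for illustration.

```python
import re

# An ampersand NOT followed by something that looks like an entity reference
# (&name;), a decimal character reference (&#38;) or a hex one (&#x26;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace bare '&' with '&amp;', leaving existing references untouched."""
    return BARE_AMP.sub("&amp;", text)


print(escape_bare_ampersands("http://example.org/?a=1&b=2"))
print(escape_bare_ampersands("http://example.org/?a=1&amp;b=2&c=3"))
```

A pattern like this cannot tell a genuine entity from a query parameter that happens to look like one (say `&copy;=1`), so it is a safety net rather than a guarantee of well-formedness.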

I’ve disabled the post button, so that folks have to preview their comments before posting them. This will catch ill-formed XHTML submitted by Mozilla users. It won’t do squat to prevent IE users from posting crap, though.

I just haven’t (yet) had a problem with that. YMMV.

Posted by: Jacques Distler on April 30, 2003 1:13 PM

### Re: Why not application/xhtml+xml

This should help.

Try posting some invalid XHTML in your comments now!

Posted by: Jacques Distler on April 30, 2003 5:16 PM

### Re: Yummy, Yummy Tag Soup

On “spiking the survey”: No, Jacques’s site was not included. :)

As for the “SHOULD” versus “MUST” – yup, I recognize that. And I understand the need to serve up pages as text/html to XHTML-unaware browsers. That’s why Test #3 simply checks whether you’re serving up the right MIME-type to an XHTML-aware agent, namely Mozilla. My feeling is that if you’re advanced enough to use XHTML, you’re advanced enough to add a couple of lines to your .htaccess file.
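The decision those couple of lines have to make is small. Here it is sketched in Python rather than as Apache configuration: look at the request's `Accept` header and send `application/xhtml+xml` only to agents that list it. The function name `choose_content_type` is hypothetical, and the parsing is deliberately simplified (it does not rank competing types by their q-values).

```python
def choose_content_type(accept_header):
    """Pick a MIME type for an XHTML page from the request's Accept header."""
    for item in (accept_header or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return "application/xhtml+xml"
    # Agents that do not advertise XHTML support get tag-soup-friendly HTML.
    return "text/html"


# Example header values, chosen for illustration.
XHTML_AWARE = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.1"
GENERIC = "*/*"

print(choose_content_type(XHTML_AWARE))
print(choose_content_type(GENERIC))
```

A server that varies its response this way should also send `Vary: Accept` so that caches keep the two variants apart.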

But okay, yes – I suppose one can quibble with test #3. There is a lot of confusion out there on this issue, and the unhappy state of browser compliance is not helping matters. Let’s say for the sake of argument that I completely drop test #3. In that case, what Jacques says still stands: A) even on pure validation, compliance is pretty miserable (about 10%, among the Alpha Geeks) and B) the point of my exercise is really to look ahead to XHTML2, where the “SHOULD” becomes a “MUST”. At least badly-formed, improperly MIME-typed XHTML1 can fall back into tag-soup HTML. XHTML2 is when it starts getting scary.

Posted by: Evan on May 1, 2003 12:03 PM

### Re: Yummy, Yummy Tag Soup

I want more data. Why did the sites fail? Was it for reasons not immediately under the author’s control (e.g. poorly written user comments)?

I think it would be useful to release research which shows why these sites are failing validation. The fact is, most people’s blogs are personal projects and therefore may not get the time and attention they need, and just as a clean bedroom slowly becomes messy as clothes accumulate on the floor, many content authors do not have the time to go through their code and clean it up all the time.

Posted by: Chason on May 15, 2003 1:40 PM
