## November 21, 2004

### Poke a Stick in it

I thought about posting something about politics. Say, about how Rep. Istook (R, Oklahoma) managed to insert a paragraph into the 1300 page Omnibus Spending Bill, granting Congressional Committee Chairmen and their staff the right to examine the tax returns of anyone in the country, free of the restraints of privacy regulations. Or about how the House Republicans just voted to allow indicted felons to retain their Committee Chairmanships (how convenient: they could turn around and scrutinize the tax returns of the Prosecutor who indicted them).

Nah, I thought, just another typical week on Capitol Hill. I’ll only really get worried when the Republican Caucus votes to allow convicted felons to retain their Chairmanships and the Appropriations Bill contains language authorizing them to order FBI surveillance of whomever they choose.

Nope, if I want to choose a controversial topic, I should write something about XHTML and MIME types, what we, here at Musings, like to call the “third rail” of Web Design. There are those who say serving XHTML as text/html is just fine and those who say it’s evil incarnate. Sort of a Red State/Blue State thing, but with much lower stakes.

Me, I figure it’s pointless to argue about whether people currently serving XHTML as text/html should switch to serving it as application/xhtml+xml. 99% of it is ill-formed tag soup, which would instantly produce a yellow screen of death if served with the correct MIME type.

But what about the remaining 1%? “I’ve got this XHTML thing down,” they say, “I could switch anytime I want. I just don’t feel the need to … yet.” Yes, pick up a case of the Patch at the pharmacy, and you’ll do fine.

Except …

Aside from the obvious difference that well-formedness now becomes an absolute requirement (and not an easy one to satisfy), there are lots of subtler differences between otherwise identical documents served as text/html and as application/xhtml+xml. You may not have heard about them, since most discussions get bogged down well before they get to the subtle stuff.

Fortunately, Gez Lemon has started putting together a test suite to illustrate the differences. Help him out by contributing your own examples of how switching MIME types affects the rendering of XHTML documents.

[Gez also has a service which lets you test serving up any page as application/xhtml+xml. It’s somewhat more advanced than the one I linked to above.]

Posted by distler at November 21, 2004 11:54 PM


### Re: Poke a Stick in it

I’ve been thinking I might add an entry “How is the treatment of application/xhtml+xml documents different from the treatment of text/html documents?” in the Mozilla Web Author FAQ. It’s not a frequent newsgroup question but it seems to be a frequent blog issue.

Perhaps “Should I serve application/xhtml+xml to Mozilla?” would also warrant an entry in the FAQ.

Posted by: Henri Sivonen on November 22, 2004 6:34 AM | Permalink | Reply to this

### Re: Poke a Stick in it

Tracked by bug 271261. First draft of the new material is attached to the bug.

Posted by: Henri Sivonen on November 22, 2004 1:17 PM | Permalink | Reply to this

### Re: Poke a Stick in it

A good start. There are items on your list that aren’t (yet) in Gez’s test suite, and vice-versa.

It would be good to coordinate…

Posted by: Jacques Distler on November 22, 2004 3:10 PM | Permalink | PGP Sig | Reply to this

### Re: Poke a Stick in it

I’ve switched back from application/xhtml+xml to text/html because of my live Textile preview for comments. With HTML I can simply use innerHTML to inject the Textile generated HTML on the page, but with XHTML I have to use CreateContextualFragment, which, paradoxically, doesn’t work with the application/xhtml+xml MIME.

(BTW, the @code@ tag seems to be not working with the Textile text filter here…)

Posted by: Roberto on November 22, 2004 9:17 AM | Permalink | Reply to this

### Textile formatting

(BTW, the @code@ tag seems to be not working with the Textile text filter here…)

Sorry. We’re still using Textile 1.0 here. I wrote to Brad Choate about fixing Textile 2.0 to work with MathML content, but I never heard back.

Personally, I prefer Markdown these days. And John Gruber answers his emails.

…but with XHTML I have to use CreateContextualFragment, which, paradoxically, doesn’t work with the application/xhtml+xml MIME.

Yet another case for Gez’s list? It’s just as non-standard as innerHTML, albeit less widely used.

Posted by: Jacques Distler on November 22, 2004 9:59 AM | Permalink | PGP Sig | Reply to this

### Re: Textile formatting

Sorry. We’re still using Textile 1.0 here. I wrote to Brad Choate about fixing Textile 2.0 to work with MathML content, but I never heard back.

That’s ok, I just made myself a Textile favelet. :)

Posted by: Roberto on November 24, 2004 12:07 PM | Permalink | Reply to this

### XHTML MIME type in Internet Explorer

Were you aware of this?

“Although a number of people seem to think that Internet Explorer doesn’t support it, the real answer is that it is just missing an entry in the registry to tell it what to do with that MIME type. At least, that is all I had to do to get it to work. (I’m sure some XHTML purist will say that IE is still broken even with this hack, but I don’t really care. All I wanted was to be able to browse the XSLTunit web site!)”

Posted by: Chris W. on November 22, 2004 10:56 AM | Permalink | Reply to this

### Re: XHTML MIME type in Internet Explorer

Cool!

Though, for an even better-tasting browser experience, you could download the MathPlayer plugin, and then IE/6 will handle, not just XHTML, but MathML as well.

Posted by: Jacques Distler on November 22, 2004 11:07 AM | Permalink | PGP Sig | Reply to this

### Re: XHTML MIME type in Internet Explorer

Does that send the doc down the tag soup code path? Parsing application/xhtml+xml with a tag soup parser is very wrong.

Posted by: Henri Sivonen on November 22, 2004 11:50 AM | Permalink | Reply to this

### Re: XHTML MIME type in Internet Explorer

IE does, AFAIK, have an XML parser and will handle XML files opened locally. It just won’t handle XML files received over the web.

That’s what this registry tweak, or the MathPlayer plugin, fixes.

Posted by: Jacques Distler on November 22, 2004 12:01 PM | Permalink | PGP Sig | Reply to this

### Re: XHTML MIME type in Internet Explorer

Have you verified that? If this did allow application/xhtml+xml as tag soup, we could be one default registry change away from killing XHTML-as-XML altogether.

Posted by: jgraham on November 22, 2004 12:06 PM | Permalink | Reply to this

### Dum da dum dum…

If IE renders the page, pull the knife out, pitch it in a nearby trashcan, and walk away, whistling…

Posted by: Jacques Distler on November 22, 2004 12:17 PM | Permalink | PGP Sig | Reply to this

### Fraud!

And, indeed, it turns out that this hack is a fraud: it tricks IE into thinking a document of type application/xhtml+xml is actually of type text/html.

Thus the document gets sent to the tag-soup parser, exactly as Henri and James feared.

This is, at best, useless. At worst, it is dangerous, if the user assumes, mistakenly, that the document has been parsed by IE’s XML parser.

Posted by: Jacques Distler on January 10, 2005 7:40 PM | Permalink | PGP Sig | Reply to this

### et tu, Bruté ?

To be utterly fair, MathPlayer 2.0 doesn’t turn IE into an XHTML UA either.

I thought it did, but Rikkert Koppes straightened me out.

MathPlayer 2.0 also sends the document down the tag-soup code-path. The MathML fragments are parsed using the XML parser (and must be well-formed). But the rest of the document can be tag-soup.

So, a registry hack is unnecessary. With a simple (and useful!) plugin, the world’s most popular browser parses application/xhtml+xml as tag-soup.

Posted by: Jacques Distler on February 11, 2005 5:51 PM | Permalink | PGP Sig | Reply to this

### Re: Poke a Stick in it

Factoid: The FIR thing mentioned on Mike Davidson’s page actually removes the element it replaces. Since this typically means a heading element, this technique has the property of removing all the heading semantics from a document just before a UA has the opportunity to put them to use. It would be funny if the alternative (display:none) wasn’t so blindingly obvious as to imply that no one using the technique has any idea what the point of document semantics is or indeed who might make use of those semantics (i.e. not just users without javascript/flash).

Differences between XHTML and text/html with the wrong doctype and a broken syntax: Evaluation of XPath expressions breaks. I may be the only person in the world who has ever encountered this problem (since no-one else uses XPath with HTML), but with XHTML you need to remember to include both fully qualified names (e.g. html:h1 rather than just h1) and a namespace resolving function that takes the prefix (e.g. html:) and returns the XHTML Namespace URI.
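
A minimal sketch of the XPath point, assuming a browser that implements DOM Level 3 XPath (`document.evaluate`). The names `xhtmlResolver`, `headingExpression` and `findHeadings` are hypothetical, chosen only for illustration; the namespace URI is the standard XHTML one.

```javascript
// The XHTML namespace URI, as fixed by the XHTML specification.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical resolver: maps a prefix used in the expression to a namespace URI.
// The XPath engine passes the bare prefix ("html"), without the trailing colon.
function xhtmlResolver(prefix) {
  return prefix === "html" ? XHTML_NS : null;
}

// Hypothetical helper: pick the expression that matches how the page is parsed.
// Served as XML, element names must be qualified; as tag soup, plain names work.
function headingExpression(servedAsXml) {
  return servedAsXml ? "//html:h1" : "//h1";
}

// Collect the matching nodes from a document-like object exposing evaluate().
function findHeadings(doc, servedAsXml) {
  const result = doc.evaluate(
    headingExpression(servedAsXml),
    doc,
    servedAsXml ? xhtmlResolver : null,
    7, // numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
    null
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a page this would be called as `findHeadings(document, true)` when the document was delivered as application/xhtml+xml, and `findHeadings(document, false)` otherwise. How a given browser treats unprefixed names in text/html documents may vary, so treat the second case as an assumption to test.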

Posted by: jgraham on November 22, 2004 12:03 PM | Permalink | Reply to this

### SIFR

Factoid: The FIR thing mentioned on Mike Davidson’s page actually removes the element it replaces. Since this typically means a heading element, this technique has the property of removing all the heading semantics from a document just before a UA has the opportunity to put them to use. It would be funny if the alternative (display:none) wasn’t so blindingly obvious as to imply that no one using the technique has any idea what the point of document semantics is or indeed who might make use of those semantics (i.e. not just users without javascript/flash).

I’m stunned.

You mean they don’t just do a {whatever}.style.display="none" ?? …[checks the source of the latest version] … Damnit! You seem to be right:

Document semantics? We don’t need no steenkin’ semantics!

Posted by: Jacques Distler on November 23, 2004 12:03 AM | Permalink | PGP Sig | Reply to this

### Re: SIFR

Actually, sIFR moves the text to a span inside the original element (the header), and that span is then moved off screen. In no way is the text removed or display-none-d.

Posted by: Mark Wubben on November 23, 2004 2:38 PM | Permalink | Reply to this

### Re: SIFR

Maybe I misunderstand your javascript code.

But Mozilla’s DOM inspector says the headers are gone and Opera’s S and W key navigation is disabled on the sIFR-“replaced” headers on Mike’s blog.

Posted by: Jacques Distler on November 23, 2004 2:50 PM | Permalink | PGP Sig | Reply to this

### Re: Poke a Stick in it

I am apparently not smart enough to even figure out if you’re dissing sIFR or complimenting it. Judging from Jacques’s “tag soup” link and associated trackback, I assume whatever conversation is going on here has a negative connotation.

Oh well, at least I don’t care. :)

Posted by: Mike D. on November 23, 2004 1:06 PM | Permalink | Reply to this

Hint: Fire up Opera. Go to the main page of my blog. Try navigating it using the S and W keys. Now try the same thing on your blog.

(I believe there’s an extension for Mozilla, and various IE add-ons that achieve the same thing, but Opera comes with this functionality built in.)

Any questions?

Posted by: Jacques Distler on November 23, 2004 1:17 PM | Permalink | PGP Sig | Reply to this

To be fair to Mike, hiding the header text with {element}.style.display="none" disables this functionality in Opera. But the (by now) more accepted

{element}.style.position="absolute";
{element}.style.left="-999px";
{element}.style.width="990px";

has the desired effect of hiding the text, while still leaving header navigation via the keyboard enabled.
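
As a rough sketch, the three assignments above can be wrapped in a small helper. The name `hideOffscreen` is hypothetical; the property values are the ones quoted above, and the function assumes it is handed an object with a `style` property, such as a DOM element.

```javascript
// Hypothetical helper: move an element off screen instead of using display:none,
// so that it stays in the DOM tree with its heading semantics intact.
function hideOffscreen(element) {
  element.style.position = "absolute";
  element.style.left = "-999px";
  element.style.width = "990px";
  return element;
}
```

In a page this might be applied to each replaced header, for example `hideOffscreen(document.getElementsByTagName("h2")[0])`.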

Posted by: Jacques Distler on November 23, 2004 2:16 PM | Permalink | PGP Sig | Reply to this

### Re: Poke a Stick in it

Interesting, when I fire up your blog in Opera, I receive a popup window (?!) telling me that IE doesn’t support MathML. I’m not sure why you’d want this to come up for Opera users. If you’re using a user agent sniffer to sniff for IE, it’s easy to tell when Opera is masquerading as IE, as it apparently is in this case. Note: I never use Opera and I haven’t changed any default settings, so this is the experience you’re providing for at least my default installation of Opera. Also, when I click around to other pages, I receive another popup window prompting me to download the IE version of the MathML plugin every time. Anyways, I’m sure this isn’t intentional, but I thought I’d let you know.

Also, with regards to the S and W keys… fair enough. Not an important feature to me, and not one that I think 99.9% of the world uses, but if you use it and you can’t use it on my site, then I guess it’s technically a shortcoming of my site. I can live with that though.

Also, keep in mind that my site has not been converted over to sIFR yet. Until the release version of sIFR 2.0 comes out (very soon), it still uses the old IFR code. sIFR does not use display: none to hide text.

Posted by: Mike D. on November 23, 2004 2:58 PM | Permalink | Reply to this

### Popups

when I fire up your blog in Opera, I receive a popup window (?!) telling me that IE doesn’t support MathML. I’m not sure why you’d want this to come up for Opera users. If you’re using a user agent sniffer to sniff for IE, it’s easy to tell when Opera is masquerading as IE, as it apparently is in this case.

Opera doesn’t support MathML either, so it is not unreasonable to inform them that my site’s functionality is broken for them, too.

I probably could be more discriminating about customizing the message for the UA that is actually being used. But it’s Opera Software ASA that decided to spoof IE, not me. I think their users ought to be given a hint about that.

Also, when I click around to other pages, I receive another popup window prompting me to download the IE version of the MathML plugin every time.

You should only receive that second popup once (unless you have cookies disabled).

Also, with regards to the S and W keys… fair enough. Not an important feature to me, and not one that I think 99.9% of the world uses, but if you use it and you can’t use it on my site, then I guess it’s technically a shortcoming of my site. I can live with that though.

There are a dozen other reasons why good document semantics can be helpful to your users (and conversely, why deliberately stripping out those semantics is bad for your users).

But I’m not going to argue the point. You’ll probably just tell me that those “features” aren’t important to you either.

Also, keep in mind that my site has not been converted over to sIFR yet. Until the release version of sIFR 2.0 comes out (very soon), it still uses the old IFR code.

The Javascript code I read in sIFR 2.0RC1 seemed to remove the header. Mark says I misread it. If so, I apologize.

In any case, I trust that the final release of sIFR 2.0 will not remove, turn into semantically-meaningless <span>s, or display:none the “replaced” <hn> elements.

Posted by: Jacques Distler on November 23, 2004 4:17 PM | Permalink | PGP Sig | Reply to this

### Re: Poke a Stick in it

Yeah, I’m not a big Opera fan either, and it’s ok to let them know their browser doesn’t support MathML, but to fully denigrate them to the status of IE users is a bit harsh :) . It’s kind of like telling someone they are illiterate for not knowing a certain isolated word. My personal policy is that when I want to use a feature which requires something above and beyond the browser (like a plug-in), I have it fail silently using javascript… but to each his own. I don’t begrudge anyone anything they want to do on their own sites.

With regard to detection though, Opera still includes the word “opera” in the user agent string, even when it’s spoofing IE, so detecting it shouldn’t be a problem. It’s not so much hiding the fact that it is Opera… it’s more saying “I’m Opera and IE”.
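
A minimal sketch of that check, assuming only what is said above: the string contains “opera” even when it also claims to be MSIE. The function name `classifyUserAgent` and the sample strings in the usage note are illustrative assumptions, not taken from any particular site's sniffer.

```javascript
// Hypothetical classifier: test for Opera before MSIE, because a spoofing
// Opera advertises both tokens in the same user agent string.
function classifyUserAgent(userAgent) {
  const ua = String(userAgent).toLowerCase();
  if (ua.indexOf("opera") !== -1) {
    return "opera";
  }
  if (ua.indexOf("msie") !== -1) {
    return "ie";
  }
  return "other";
}
```

With an illustrative spoofed string such as `"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]"` the function reports Opera, while the same string without the Opera token is reported as IE. The order of the two tests is the whole point: checking for MSIE first would misfile the spoofing browser.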

As for document semantics, I’m all for including them when it is reasonable to do so. I’m not entirely convinced they are a panacea or that we will ever see this grand Semantic Web that some people believe in, but I think they clearly do a bit of good and not really any harm at this point in time. Thus, it is worth creating documents with good semantics with the hope that the future value of semantics is greater than the current value. I am convinced that this will be the case, but just not to what degree.

sIFR is designed to give you rich typography while leaving your document structure and HTML/XHTML completely untouched. As Mark pointed out, that is exactly what it does.

Posted by: Mike D. on November 23, 2004 5:15 PM | Permalink | Reply to this

### Opera Users

Yeah, I’m not a big Opera fan either, and it’s ok to let them know their browser doesn’t support MathML, but to fully denigrate them to the status of IE users is a bit harsh :) .

You are absolutely right. I will look into doing a bit more sophisticated job of browser detection.

Safari doesn’t support MathML either and I don’t subject Safari users to dunning popups. Opera users deserve at least the same treatment.

My objective of increasing the percentage of readers using MathML-capable browsers from the current 40% (Sep 2004 figures) to 70% is not really advanced by sending incomprehensible dunning notices to Opera users. (Both because they’re incomprehensible and because I have too few Opera users to make a difference.)

sIFR is designed to give you rich typography while leaving your document structure and HTML/XHTML completely untouched. As Mark pointed out, that is exactly what it does.

I got a chance to reread the sIFR source code (planes are great for that) and can confirm that it does do the right thing.

I’m glad to hear you’ll be upgrading from whatever it is you’re running now to sIFR in the near future.

What counts is not what the original (X)HTML markup looked like, but what the DOM tree looks like after all the Javascript rewriting is done. That’s what the UA actually uses. As this little Opera example shows, functionality can be lost if you strip out semantics at the DOM-scripting level, as your current setup does.

Posted by: Jacques Distler on November 24, 2004 12:30 AM | Permalink | PGP Sig | Reply to this

### Re: Opera Users

I got a chance to reread the sIFR source code (planes are great for that) and can confirm that it does do the right thing.

Ah, excellent! I’m sorry for asserting that it was misbehaving; like Jacques, a misreading of the code and ‘confirmation’ via the DOM inspector gave me entirely the wrong impression.

Posted by: jgraham on November 24, 2004 4:24 AM | Permalink | Reply to this
