
Note: These pages make extensive use of the latest XHTML and CSS standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is currently supported only in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the standards-compliant and open-source Mozilla browser.

April 18, 2006

Atom Torture Test

For somewhat inexplicable reasons, I got into a conversation about RSS feeds in general, and mine in particular, over at Sam’s blog.

As you might have noticed, I’ve deprecated several of my RSS 2.0 feeds, in favour of their Atom alternatives. There’s a reason for that. If you’ve subscribed to a full-content feed, it’s probably because you want to be able to read the content of my posts in your feedreader. That’s cool with me, but for a variety of reasons, those RSS 2.0 feeds will not display the content correctly. And there is really no way to fix them.

At least in principle, the Atom alternatives ought to be better. But are they?

Here are 4 areas where your RSS feedreader fell on the floor. Let’s see if its Atom support is any better. For the first one, there are already tests in the suite of Atom conformance tests. For the latter three, I wrote a little test feed of my own.

  1. Relative URLs: relative URLs in my posts (or in comments thereon) should have been interpreted as relative to the URL in the <link> element of each <item>. Since such eventualities are not actually covered in the RSS 2.0 “Spec”, chances are those links were broken. Atom adds explicit support for xml:base. Does your feedreader actually implement it properly?
  2. XHTML: RSS 2.0 has no mechanism for telling your feedreader that the markup in the posts is anything other than tag-soup. In fact, without explicit extensions, like <content:encoded>, it doesn’t even have a mechanism for telling the feedreader that it contains markup at all (the feedreader has to guess). Atom includes a type="xhtml" attribute which tells the feedreader that the content is actually XHTML. Does your feedreader pay attention, or does it just assume that everything anyone writes is tag-soup?
  3. MathML: if your feedreader recognizes the content is XHTML, and the rendering engine it uses is MathML-aware, then you might just be able to see the equations. (Thunderbird ought to fall into this class, as do, I am told, some Windows-based feedreaders when the MathPlayer Plugin is installed.)
  4. SVG: I use SVG for figures. No, I don’t include the SVG inline (though that would make a nice torture-test). I include them using the <object> element. Nested inside the <object> element is a GIF image, to be used as a fallback.
    <object type="image/svg+xml" data="http://golem.ph.utexas.edu/~distler/blog/svg/bhformation.svg">
      <img src="http://golem.ph.utexas.edu/~distler/blog/svg/bhformation.gif" alt="..." />
    </object>
    Now, many feedreaders quite properly sanitize their inputs to eliminate hostile applets (loaded via the <object> element). I suppose I can forgive feedreader authors for stripping out all <object> elements, rather than trying to distinguish between potentially hostile ones, like applets, and obviously benign ones, like SVG files. But, either way, it is inexcusable to also strip out the fallback content of the <object> element. You should, at the very least, see the GIF image.
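Here is a minimal sketch, in Python, of the behaviour argued for above: rather than deleting an <object> outright, a sanitizer can unwrap it and leave its fallback content in place. It assumes the entry content has already been parsed as XHTML into an xml.etree.ElementTree tree; the helper names are hypothetical, and <param> children are simply discarded.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
OBJECT = f"{{{XHTML}}}object"
PARAM = f"{{{XHTML}}}param"


def _append_text(parent: ET.Element, index: int, text) -> None:
    """Append text just before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap_objects(parent: ET.Element) -> None:
    """Replace every <object> below `parent` with its fallback content, in place.

    Hypothetical helper: <param> children are dropped; everything else
    (e.g. the fallback <img>) takes the object's place, so the reader
    still sees the GIF even though the SVG embed is gone.
    """
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != OBJECT:
            unwrap_objects(child)
            i += 1
            continue
        parent.remove(child)
        _append_text(parent, i, child.text)
        j = i
        for sub in list(child):
            if sub.tag == PARAM:
                _append_text(parent, j, sub.tail)  # keep text that followed the param
                continue
            parent.insert(j, sub)
            j += 1
        _append_text(parent, j, child.tail)
        # Do not advance i: the spliced-in fallback is revisited, so nested
        # <object> elements are unwrapped too.


src = (
    f'<div xmlns="{XHTML}"><p>Figure: '
    '<object type="image/svg+xml" data="bhformation.svg">'
    '<img src="bhformation.gif" alt="black hole formation"/></object> done.</p></div>'
)
root = ET.fromstring(src)
unwrap_objects(root)
ET.register_namespace("", XHTML)
print(ET.tostring(root, encoding="unicode"))
```

An aggregator that does want to render the SVG could, of course, whitelist object elements with type="image/svg+xml" instead; this sketch only covers the minimum of keeping the fallback.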

So how did your feedreader do? Did switching to Atom actually fix any of these issues for you?

Update (4/20/2006):

In response to a reader request, I added another test of XHTML-compliance, one that doesn’t use peculiarities of the handling of the TBODY element.
Posted by distler at April 18, 2006 8:08 AM

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/793

25 Comments & 4 Trackbacks

Re: Atom Torture Test

Care to create a ConformanceTest page on the Atom wiki, and seed it with a row or two of results? If you do, I’m sure that others will add additional rows and maintain it from there.

The hope behind capturing pointers to all such tests in one place is that eventually all actively maintained aggregators will converge on the correct behavior.

Posted by: Sam Ruby on April 18, 2006 8:46 AM

Re: Atom Torture Test

Well, the first question I have is: what ought to count as conformance?

  1. It’s clear what xml:base support means. But that’s already covered.
  2. It’s also fairly clear what supporting type="xhtml" ought to mean: the content should be parsed using XHTML rules. (My testcase tests for just one of several instances where this departs from HTML parsing behaviour.)
  3. MathML support? Feedreader authors, generally, don’t write their rendering engine, so they are dependent on their platform for MathML support. Those whose platforms support MathML (Thunderbird, Windows-based feedreaders with the MathPlayer plugin installed) ought to support it. But it’s probably unreasonable to demand it as a general conformance requirement.
  4. The real question is: what should be the correct behaviour with respect to the SVG figure test? At a minimum, I would demand that feedreaders display the fallback GIF image. Is it reasonable to ask for more? Should feedreaders distinguish between malicious <object> elements and innocuous ones, stripping out the former but not the latter?
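To make “parsed using XHTML rules” a bit more concrete, here is a small Python sketch. RFC 4287 says type="xhtml" content is a single xhtml:div whose children are the actual content, and an XML parser never invents elements. The example assumes a table written without an explicit <tbody> (it is not the actual test markup): under XML rules the <tr> stays a direct child of the <table>, whereas an HTML parser would insert an implied <tbody>, which is the sort of difference a style rule such as table > tr can detect.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"


def xhtml_content(entry: ET.Element) -> Optional[ET.Element]:
    """Return the wrapper xhtml:div of an entry's type="xhtml" content.

    RFC 4287 requires exactly one XHTML div; the div itself is not part of
    the content, its children are.
    """
    content = entry.find(f"{{{ATOM}}}content")
    if content is None or content.get("type") != "xhtml":
        return None
    children = list(content)
    if len(children) != 1 or children[0].tag != f"{{{XHTML}}}div":
        raise ValueError('type="xhtml" content must be a single xhtml:div')
    return children[0]


entry = ET.fromstring(
    f'<entry xmlns="{ATOM}"><content type="xhtml">'
    f'<div xmlns="{XHTML}"><table><tr><td>cell</td></tr></table></div>'
    "</content></entry>"
)
div = xhtml_content(entry)
table = div.find(f"{{{XHTML}}}table")
# Parsed as XML, <tr> is a direct child of <table>: no implied <tbody>.
child_names = [child.tag.split("}", 1)[1] for child in table]
print(child_names)
```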
Posted by: Jacques Distler on April 18, 2006 9:21 AM

Re: Atom Torture Test

It sounds like you have a clear idea of what a “meets minimum” standard is for #2 and #4.

If you look at the LinkConformanceTests page, you will see an informational table allowing individual readers to voluntarily indicate whether they support something that is optional.

I would imagine that support for MathML would be low initially, but perhaps some of the high end (i.e., for purchase) aggregators may someday want to consider adding support for this.

Posted by: Sam Ruby on April 18, 2006 1:20 PM

Re: Atom Torture Test

I would imagine that support for MathML would be low initially, but perhaps some of the high end (i.e., for purchase) aggregators may someday want to consider adding support for this.

I don’t think it has to be high-end. I think any feed reader that is based on Gecko and does not use the XML code path for displaying type='xhtml' (and, therefore, does not get free MathML support) is broken.

Unfortunately, it is rather normal that feed readers using an XML-capable XHTML-aware rendering engine still use the text/html code path for type='xhtml' and lose correctness in doing so.

Posted by: Henri Sivonen on April 18, 2006 2:42 PM

Re: Atom Torture Test

FWIW, work on the development branch of Liferea currently revolves around making everything XHTML, motivated mainly by the need for xml:base to make the many-items-at-once view work correctly. At my suggestion, the aggregator now uses libxml2’s HTML parsing mode to reserialise tag soup from Atom and RSS items as XHTML, so that the HTML widget can use Gecko’s XHTML renderer.

The upshot is that Atom type='xhtml' content will also be preserved faithfully, properly exposing MathML and SVG content to Gecko.

It’s all very cool.

Posted by: Aristotle Pagaltzis on April 18, 2006 9:03 PM

The one and only

So is Liferea the only aggregator which treats type="xhtml" as XHTML?

None of the aggregators I’ve looked at on the Mac (including the cross-platform Thunderbird and Sage aggregators) do so.

(In addition to the ones tested below, I also tested Shrook, with same results as NewsFire.)

I’d be surprised if any Windows-based aggregators passed the test either.

Which leaves … ?

Posted by: Jacques Distler on April 20, 2006 2:06 AM

The future one and only

So is Liferea the only aggregator which treats type="xhtml" as XHTML?

To be precise, not even Liferea does that yet. In 1.0.x, the stable branch, Gecko is invoked with text/html. The tag soup fixup for RSS which makes invocation of Gecko with application/xhtml+xml feasible in the first place is new to SVN trunk@HEAD, which is slated to become 1.1.

(The main impetus for this was fixing base URIs when simultaneously displaying items from multiple feeds. In that case, an aggregator only has two options: either munging relative URIs, or using xml:base in XHTML.)
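For what it's worth, here is a rough Python sketch of the first option, munging relative URIs: each element's base URI is its own xml:base (resolved against the inherited one) or else the inherited base, and relative attributes are resolved against it with urllib.parse.urljoin. The function name, the example URLs and the short list of URI-bearing attributes are assumptions for illustration; real content has more such attributes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree expands the predeclared xml: prefix to this namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
# Assumption: only these attributes carry URIs in the content we care about.
URI_ATTRS = ("href", "src", "data")


def absolutize(elem: ET.Element, base: str) -> None:
    """Resolve relative URI attributes against the in-scope xml:base, in place."""
    # An element's own xml:base is itself resolved against the inherited base;
    # urljoin(base, "") returns base unchanged when there is no xml:base.
    base = urljoin(base, elem.get(XML_BASE, ""))
    for attr in URI_ATTRS:
        value = elem.get(attr)
        if value is not None:
            elem.set(attr, urljoin(base, value))
    for child in elem:
        absolutize(child, base)


# Two items from different feeds shown together in one view.
view = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    '<div xml:base="http://example.org/blog/2006/04/"><a href="post.html">post</a></div>'
    '<div xml:base="http://example.net/~user/"><img src="svg/fig.gif" alt=""/></div>'
    "</div>"
)
absolutize(view, "http://aggregator.example/")
for el in view.iter():
    for attr in URI_ATTRS:
        if el.get(attr):
            print(attr, el.get(attr))
```

The second option avoids this rewriting entirely: keep the xml:base attributes on each item's wrapper and hand the combined document to an XML-aware renderer, which does the same resolution itself.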

Posted by: Aristotle Pagaltzis on April 20, 2006 8:49 AM

Re: The future one and only

I’m running Liferea 1.2.10 on Ubuntu 7.04 and the test rendered beautifully.

Just letting you know.

Posted by: Ed Hou on May 16, 2007 7:08 PM

Re: Atom Torture Test

Sam: did you find the tuits to implement my suggestion for on-wiki maintenance of the test suite?

This has, in fact, gotten a little more important to me since my web host recently implemented mod_security, causing loads of 403s on my feeds – very annoying, and it will be a while before I can do anything about it.

Posted by: Aristotle Pagaltzis on April 18, 2006 8:54 PM

Re: Atom Torture Test

For some reason, Bloglines has a problem with the Atom feed…but it’s a minor one. Namely, there are no spaces in the feed before or after a link, e.g., “I wrote a littletest feedof my own.”

I’m not sure why this is happening…it’s very likely a Bloglines problem, but I don’t see it on other Atom feeds.

Posted by: TheMatt on April 18, 2006 1:41 PM

Compliance

I’m not sure why this is happening…it’s very likely a Bloglines problem, but I don’t see it on other Atom feeds.

Probably because most feeds don’t use type="xhtml".

Your comment, and Henri’s, point to the need for a more comprehensive test suite for type="xhtml" compliance. Clearly, there are a lot of XHTML-capable aggregators that are broken in various different ways. My single, quick 'n dirty, test is insufficient.

If someone (perhaps Henri or Aristotle) were willing to work on a comprehensive test suite, I’d be happy to contribute suggestions.

There would remain my tests for MathML and <object> support.

Neither of those are really tied to the Atom specification, per se. But they are things that I, and perhaps others, would find desirable in an Atom aggregator.

Posted by: Jacques Distler on April 18, 2006 3:57 PM

Re: Atom Torture Test

I tried your test with the latest available Safari. It didn’t show the SVG image (e.g. with Adobe SVG) or the GIF image, and it stripped the MathML markup out (leaving the contents of the tags), but the table test worked (text was a nice shade of black).

Posted by: Alex Mc-J on April 18, 2006 6:54 PM

Test Results

Indeed, Safari is a bit of a disappointment.

  • With the latest release version of Safari (Version 2.0.3 (417.9.2)), neither the SVG nor the GIF figure loads. Of course, MathML is not supported. But, as you said, the text in the XHTML test is black, indicating that it is being treated using XHTML parsing rules.
  • NetNewsWire fails all three tests. (NNW 2.1b30, released on 4/19/2006, passes the SVG test, displaying the SVG figure.)
  • Thunderbird fails the XHTML test and (hence) also the MathML test. It displays the GIF image in the SVG test.
  • NewsFire is our hero for tonight. MathML is not supported (as, alas, will be the case for any WebKit-based app). But it passes (correction: fails) the XHTML test, and displays the SVG figure (using the Adobe Plugin).
Posted by: Jacques Distler on April 19, 2006 12:12 AM

Re: Test Results

I just tried NewsFire (most recent), but my copy failed the XHTML test.

I also tried the Firefox extension Sage. Because of the way it generates the page, it can’t pass the XHTML test (or the MathML test): it generates an HTML4 file, regardless of input. And Firefox renders the SVG incorrectly.

And Firefox doesn’t appear to be able to handle the feed, as highlighting the feed in my bookmarks menu results in a “LiveBookmark failed to load.” item.

Posted by: Alex Mc-J on April 19, 2006 7:22 AM

Re: Test Results

I just tried NewsFire (most recent), but my copy failed the XHTML test.

Hmmm. Weird. So does mine now. Was I hallucinating?

Firefox renders the SVG incorrectly

Yes, the built-in SVG support is broken in various respects. If you have the Adobe plugin, there’s a hidden Firefox pref to turn off the built-in SVG support.

Posted by: Jacques Distler on April 19, 2006 8:58 AM

Tweak

And Firefox doesn’t appear to be able to handle the feed, as highlighting the feed in my bookmarks menu results in a “LiveBookmark failed to load.” item.

I tweaked the feed to add a <link rel="alternate"> element to each <entry>. Firefox/Sage seems to like this better (as may other aggregators).
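For reference, RFC 4287 says that an atom:link with no rel attribute is to be treated as rel="alternate", and the alternate link is what a reader typically uses as the entry's permalink. Here is a hedged Python sketch of how a reader might pick it out; we don't know exactly what Firefox or Sage look for, so treat this as an illustration of the spec rule rather than of their code. The function name and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM = "http://www.w3.org/2005/Atom"


def alternate_link(entry: ET.Element) -> Optional[str]:
    """Return the href of an Atom entry's alternate link, or None.

    RFC 4287: a link with no rel attribute is treated as rel="alternate".
    """
    for link in entry.findall(f"{{{ATOM}}}link"):
        if link.get("rel", "alternate") == "alternate":
            return link.get("href")
    return None


entry = ET.fromstring(
    f'<entry xmlns="{ATOM}">'
    '<link rel="replies" href="http://example.org/comments"/>'
    '<link rel="alternate" type="text/html" href="http://example.org/post.html"/>'
    "</entry>"
)
print(alternate_link(entry))
```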

Posted by: Jacques Distler on April 19, 2006 12:55 PM

Safari

… but the table test worked (text was a nice shade of black).

That seems to have been a fluke (a false positive). As you can see, Safari fails the second XHTML-compliance test, indicating that it is, apparently, treating the content as tag-soup.

So far, we’re still looking for any aggregator that treats type="xhtml" as XHTML.

Posted by: Jacques Distler on April 20, 2006 2:46 PM

Re: Atom Torture Test

I’m the author of Shrook. I think your TBODY test is kind of bullshit - nitpicky and fantastically unimportant. That you have to go to such lengths to find holes in interpreting Atom shows how far we’ve come since RSS.

Posted by: Graham Parks on April 20, 2006 6:44 AM

Bullshit

I think your TBODY test is kind of bullshit - nitpicky and fantastically unimportant.

I assume you mean, not that the test itself is bullshit, but that the goal — having type="xhtml" treated using the XML code-path — is bullshit. (The test, as I freely admit, is a quick-'n-dirty hack.)

From my point of view, it’s all about MathML support. Since WebKit doesn’t support MathML and, judging by the lack of progress, probably won’t for a while, you may be forgiven for thinking that supporting XHTML is “bullshit.”

Well … almost.

The development version of WebKit has built-in SVG support. Which means that we could have been talking about inline SVG, instead of MathML. Shrook loses any chance of (what would have been automatic) support for inline SVG by treating type="xhtml" as tag-soup.

In the meantime, you can enjoy the utterly-broken rendering of my actual Atom feed in Shrook.

Posted by: Jacques Distler on April 20, 2006 8:18 AM

Re: Bullshit

So you’re basically just making a feature request that Shrook and other aggregators support MathML and/or SVG? Fair enough.

But you don’t have to embellish it by making up all the bullshit complaints about internal implementation details or TBODY behaviour or claiming that everything is fundamentally “broken”. It’s just not constructive.

Posted by: Graham Parks on April 20, 2006 9:42 AM

Details

Neither you nor any other aggregator author (as far as I am aware) writes their own rendering engine. So this “feature request” is a request that, on encountering type="xhtml", you invoke the XML code-path in your chosen rendering engine.

Asking for MathML support in Shrook would, realistically, be an (additional) request that you switch rendering engines from WebKit to Gecko.

I can imagine that you would not take that request too seriously.

As to inline-SVG, that’s not something I use. So it’s not a “feature request” on my part. It’s a free bonus that comes from treating type="xhtml" correctly.

Anyway, it was Henri Sivonen, not I, who said:

I think any feed reader that is based on Gecko and does not use the XML code path for displaying type='xhtml' (and, therefore, does not get free MathML support) is broken.

though I can’t say that I disagree.

Posted by: Jacques Distler on April 20, 2006 10:30 AM

Re: Atom Torture Test

Nice tests. Some comments on how they’re interpreted by Snarfer

For the first test we strip the style element (for security reasons) so although the text doesn’t show in green this has got nothing to do with xhtml support. I imagine you’ll find several other aggregators do the same thing - it’s not an uncommon practice.

In the second test, we display both sentences in bold. The fact that the uppercase B is still treated as bold is deliberate. I think it’s more likely that someone will accidentally use the wrong case for bold markup than deliberately use an uppercase B that isn’t meant to be interpreted as bold. Not that this isn’t a valid test, but I don’t see it as something we would want to “fix”.
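The case test works because XML, unlike HTML, is case-sensitive: in type="xhtml" content, <B> is not the XHTML b element at all, just an unrecognised element in the XHTML namespace, whereas an HTML parser folds it to b. A small Python illustration (the markup is made up, not the actual test):

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
BOLD = f"{{{XHTML}}}b"

div = ET.fromstring(
    f'<div xmlns="{XHTML}"><b>bold</b> <B>not bold under XHTML rules</B></div>'
)
for child in div:
    local_name = child.tag.split("}", 1)[1]
    # Element names are compared exactly; "B" never matches "b".
    print(local_name, child.tag == BOLD)
```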

The MathML test we fail because (again for security reasons) we strip all markup that we don’t recognise or trust and MathML falls into the first category. It’s possible that we could add MathML elements to our white list, but even then it would still probably fail because we’re using Internet Explorer for our renderer. At some point we would like to offer support for the Mozilla renderer as an option, so this may be doable in the long term.

We semi-pass the last test in that we display the GIF image. The reason for not rendering the SVG natively is because (as mentioned before) we strip markup that we don’t trust and object tags definitely fall into that category. That behaviour is not likely to change anytime soon.

Posted by: James Holderness on April 20, 2006 7:30 PM

Snarfer

For the first test we strip the style element (for security reasons) so although the text doesn’t show in green this has got nothing to do with xhtml support.

I suspect that’s why Safari gave a false positive.

Your explanation of Snarfer’s behaviour illustrates the pitfalls of using tricks of this sort to try to detect whether the content is being treated as XHTML. In any individual instance, a false-positive result may result from some unrelated behaviour of the aggregator (in your case, stripping out style elements).

The MathML test we fail because (again for security reasons) we strip all markup that we don’t recognise or trust and MathML falls into the first category. It’s possible that we could add MathML elements to our white list,

My whitelist, including all the MathML elements and attributes produced by itex2MML, is here.
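Since XHTML and MathML elements live in different namespaces, a whitelist only really works if it is keyed on (namespace, local name) rather than on bare tag names. Here's a minimal Python sketch; the element sets are a small illustrative subset (not the itex2MML whitelist mentioned above), the function names are hypothetical, and attribute filtering, which any real sanitizer also needs, is left out.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Illustrative subset only; a real whitelist is far longer.
ALLOWED = {
    XHTML: {"div", "p", "a", "b", "em", "img", "span", "table", "tr", "td"},
    MATHML: {"math", "mrow", "mi", "mo", "mn", "msup", "mfrac", "msqrt"},
}


def split_tag(tag: str) -> Tuple[str, str]:
    """Split ElementTree's '{namespace}local' form into its two parts."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def sanitize(parent: ET.Element) -> None:
    """Drop non-whitelisted descendants (with their content), keeping tail text."""
    for child in list(parent):
        ns, local = split_tag(child.tag)
        if local in ALLOWED.get(ns, set()):
            sanitize(child)
            continue
        # Move the text that followed the removed element onto its neighbour.
        idx = list(parent).index(child)
        if child.tail:
            if idx == 0:
                parent.text = (parent.text or "") + child.tail
            else:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + child.tail
        parent.remove(child)


doc = ET.fromstring(
    f'<div xmlns="{XHTML}"><p>Area: <math xmlns="{MATHML}"><mi>r</mi></math>'
    "<script>alert(1)</script> units</p></div>"
)
sanitize(doc)
print([split_tag(e.tag)[1] for e in doc.iter()])
```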

but even then it would still probably fail because we’re using Internet Explorer for our renderer.

The Design Science people assure me that their plugin yields system-wide MathML support to applications which use IE as a renderer.

I haven’t seen this in action, so I don’t know the ins-and-outs of triggering the plugin from other applications. For documents served to the browser over the web, you need to set the MIME-type to application/xhtml+xml.
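Setting the MIME type is a matter of a single response header. Here is a minimal sketch as a Python WSGI app with a made-up test page; it says nothing about whether MathPlayer then activates inside any particular host application, which is the open question here.

```python
# A made-up XHTML page containing a little MathML (x squared).
PAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>MathML test</title></head>
<body><p><math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>x</mi><mn>2</mn></msup></math></p></body>
</html>
"""


def app(environ, start_response):
    """Serve the page as application/xhtml+xml so it takes the XML code path."""
    headers = [
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(PAGE))),
    ]
    start_response("200 OK", headers)
    return [PAGE]
```

To try it locally, you could hand app to the standard library's wsgiref.simple_server.make_server and load the page in IE with MathPlayer installed.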

Posted by: Jacques Distler on April 20, 2006 8:47 PM

Re: Design Science plugin

I think there must be a trick to getting the plugin working. I just ran your feed through FeedDemon and RSSBandit as a test. I’m fairly sure they both use IE for their HTML rendering and neither filters out the MathML, but I’m still just seeing the unformatted equations.

In both cases the markup they’re passing to the renderer appears to be XHTML. Everything is namespaced properly and at least one of them even had an XHTML DTD at the top. RSSBandit had ActiveX controls disabled by default, but enabling them didn’t make any difference.

I also checked your webpage itself in Internet Explorer and the plugin worked fine there. It didn’t handle the G character from Unicode Plane 1, but then again Firefox’s native MathML renderer had problems with your square root symbols so it wasn’t perfect either.

When I get a chance I’ll experiment some more in my own code. If I get it working I’ll let you know.

Posted by: James Holderness on April 22, 2006 9:19 PM

FeedDemon

Here are their instructions for using MathPlayer with FeedDemon.

In an email (which I hope is OK for me to excerpt), Paul Topping explained the situation thusly:

Just FYI, the fact that MathPlayer enhances MSHTML, not just IE, means that there is MathML support in virtually any application that supports HTML rendering. This includes most help engines, mail clients, weblog clients, and so on. There are ways that hosts of MSHTML can defeat this, however, sometimes on purpose and other times by accident. Outlook, the most popular Windows email client, uses MSHTML to display HTML email. Unfortunately, it disables extensions like MathPlayer as a security precaution. FeedDemon, the weblog client, also uses MSHTML. It disabled MathML support by copying the <body> of the HTML page containing MathML in to a local file, thereby stripping out the <head> plumbing that connects MathPlayer to the MathML.

Posted by: Jacques Distler on April 22, 2006 9:40 PM
Read the post Adding Atom support to PlanetPlanet
Weblog: Sam Ruby
Excerpt: This weekend’s recreational programming project involved PlanetPlanet and ensuring that there is adequate support for Atom. And there’s nothing like live data to help identify integration issues. It turns out that upgrading to the latest F
Tracked: April 23, 2006 8:33 PM
Read the post Quality of Implementation
Weblog: franklinmint.fm
Excerpt: Reading Tim, Sam, and Jacques made me kind of irritated at first. None of that stuff is Really Simple. I...
Tracked: April 24, 2006 6:16 AM
Read the post Re-syndicating vs sanitizing
Weblog: Sam Ruby
Excerpt: Just over a month ago, Tim Bray pointed both to Jacques' Atom Torture Test, and noted with apparent delight that NetNewsWire was able to tell him which entries he had already seen due to the fact that Planet made an effort to retain atom id’s. Un
Tracked: May 29, 2006 10:52 PM
Read the post 5 Years
Weblog: Musings
Excerpt: The honeymoon is definitely over.
Tracked: October 19, 2007 3:56 PM
