

March 13, 2004

Core Dump

Some random computer notes.

  1. Installing Crypt::OpenPGP under MacOSX is a real bear. The basic steps are as follows.
    1. Install libpari.
    2. Install Math::Pari.
    3. Use CPAN to install Crypt::OpenPGP and all of its prerequisites.
    There’s a page on installing Math::Pari on MacOSX which will guide you through steps 1 and 2. Unfortunately, it badly needs to be updated for Panther, but it ought to give you the general idea. Once you’ve got Math::Pari installed, the rest is fairly easy. Just use CPAN to install Crypt::OpenPGP. It will prompt you to install all of the prerequisite modules first. There are a zillion of them. Many are required, but some are optional. All of the optional modules will compile except Crypt::IDEA. When it asks you whether to install a list of optional modules which includes Crypt::IDEA, answer “no”. You’ll get asked again, later on, about the other optional modules, but you don’t want it to even attempt to install Crypt::IDEA. After quite a bit of churning away, you should finally have a working copy of Crypt::OpenPGP.
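    A rough sketch of the sequence (source directories, prefixes, and prompts are illustrative; consult the linked page for the real details):

```shell
# 1. Build and install the PARI library (libpari)
cd pari-*/ && ./Configure --prefix=/usr/local && make all && sudo make install

# 2. Build Math::Pari against the installed libpari
cd ../Math-Pari-*/ && perl Makefile.PL && make && make test && sudo make install

# 3. Let CPAN pull in Crypt::OpenPGP plus its prerequisites;
#    answer "no" when offered the optional Crypt::IDEA module
sudo perl -MCPAN -e 'install Crypt::OpenPGP'
```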
  2. I have an experimental Atom feed for this blog. It contains both a <summary type="text/plain"> and a <content type="application/xhtml+xml" mode="escaped"> element. The latter means, theoretically, that if there were a client which supported it, people could read my MathML-enabled posts in their Aggregator. This sounds far-fetched, but it really isn’t. Dave Hyatt has, at least, talked about the possibility of MathML support in Safari. If he and his team ever deliver on that, NetNewsWire users will get MathML support “for free.”

    My feed validates, but I would still like some feedback from real Atom mavens as to what I might be doing wrong and what could be improved.

    For instance, I think I am using the xml:base attribute incorrectly:

    <content type="application/xhtml+xml" mode="escaped" xml:lang="en" xml:base="<$MTBlogURL encode_xml="1"$>">

    That’s taken straight from the default MovableType Atom Template. Shouldn’t it be xml:base="<$MTEntryLink$>"?

    Unfortunately, NetNewsWire doesn’t seem to support xml:base at all, which makes it difficult to test my assumption in practice.

    (Update: Oh, to heck with it! The MT template is plainly wrong, and I shouldn’t need NetNewsWire to figure that out. Fixed.)
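    With that fix (keeping the encode_xml modifier from the original template), the element reads:

```xml
<content type="application/xhtml+xml" mode="escaped" xml:lang="en" xml:base="<$MTEntryLink encode_xml="1"$>">
```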

    If I decide to keep the new Atom feed, who would object if I were to drop the RSS 0.91 feed, and replace it with this one?

  3. Speaking of validating feeds. Mark and Sam’s Validator has long complained about the onclick and onkeypress attributes which occur in certain anchor tags in my full-content RSS feed. These are not, strictly speaking, invalid (how could they be?), but they are flagged as examples of poor sportsmanship, anyway, much to my chagrin.

    I finally realized that I could use the tagmogrify plugin to strip these attributes out of my feeds, and now the Feed Validator no longer complains.
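    The transformation itself is simple. Here’s a rough Python sketch of the idea (not the tagmogrify plugin’s actual code; a real implementation would want a proper HTML parser rather than a regex):

```python
import re

def strip_event_attrs(html):
    """Strip onclick/onkeypress attributes from <a> tags, leaving other markup alone."""
    def clean(match):
        # Remove the two offending attributes from the matched opening tag
        return re.sub(r'\s+on(?:click|keypress)="[^"]*"', '', match.group(0))
    return re.sub(r'<a\b[^>]*>', clean, html)

print(strip_event_attrs(
    '<a href="/archives/000332.html" onclick="popup()" onkeypress="popup()">link</a>'))
# -> <a href="/archives/000332.html">link</a>
```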

  4. Oh, yeah, OpenSSH 3.8p1 is out. Gotta keep up with the Joneses.
Posted by distler at March 13, 2004 10:24 PM

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/332

15 Comments & 1 Trackback

Re: Core Dump

There are at least a couple of Mozilla-based aggregators which could theoretically support MathML today. In fact, I suspect they do support MathML, but I’m not so sure they support Atom. Still, I’d be fairly confident that a Mozilla-based news reader will support Atom before Safari supports MathML, although it would be nice if I were wrong.

Posted by: jgraham on March 14, 2004 6:48 PM | Permalink | Reply to this

No XML parsing

In fact, I suspect they do support MathML.

I doubt that.

Mozilla only supports MathML when served with the correct MIME-type. Since there’s no way to indicate, in any RSS flavour, that the content of a <content:encoded> element is anything other than HTML tag-soup (if you’re lucky), I don’t expect that such Aggregators will ever trigger the XML parser to handle it.

With Atom, that becomes possible. I have no doubt that some Mozilla-based Aggregator may be the first to get MathML support.

Posted by: Jacques Distler on March 14, 2004 10:01 PM | Permalink | PGP Sig | Reply to this

Re: No XML parsing

NewsMonster claims to support a “RSS module” called XHTML:body which seems to allow inline XHTML in posts. I guess that one could use that to get MathML-in-RSS (although it might screw over everyone else).

Having said that, I hadn’t considered the MIME type issue.

Posted by: jgraham on March 15, 2004 4:28 AM | Permalink | Reply to this

Re: Core Dump

You’re right, MT’s Atom template is subtly wrong.

My Universal Feed Parser supports xml:base. http://diveintomark.org/projects/feed_parser/feedparser-3.0-beta-19.zip

Posted by: Mark on March 18, 2004 10:16 PM | Permalink | Reply to this

Re: Core Dump

Ah, this was one of the open tabs during my last browser crash, wasn’t it? (Note: if a bug in Bugzilla says it’s a crasher, bookmark everything before you test it.)

Does it actually make a difference? If you don’t have your archives in the same directory as your main page, you can only do server-rooted relative links anyway, and /~distler/foo/bar/whatever resolves to the same thing whether your base is golem.ph.utexas.edu/~distler/blog/ or golem.ph.utexas.edu/~distler/blog/archives/.

Or am I missing the subtlety?

Posted by: Phil Ringnalda on March 18, 2004 10:44 PM | Permalink | Reply to this

Base URL

If the xml:base URI is the archive page of your post, then internal links (<a href="#foo">) will work correctly, as will server-relative links (<a href="/foo.html">) and path-relative links (<a href="foo.html">).

See RFC 2396, section 5.2 for how these are resolved.
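A quick way to check these resolution rules is Python’s urljoin, which implements the algorithm as restated in RFC 3986, the successor to RFC 2396 (the base URL below is illustrative):

```python
from urllib.parse import urljoin

# Illustrative entry-archive URL standing in for <$MTEntryLink$>
base = "http://golem.ph.utexas.edu/~distler/blog/archives/000332.html"

print(urljoin(base, "#foo"))       # internal link: resolves against the base
print(urljoin(base, "/foo.html"))  # server-relative link: keeps only the host
print(urljoin(base, "foo.html"))   # path-relative link: replaces the filename
```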

Posted by: Jacques Distler on March 18, 2004 11:00 PM | Permalink | PGP Sig | Reply to this

Re: Base URL

I’ll grant you that one might want to avoid path-relative links, given the way typical MT blogs are set up.

Posted by: Jacques Distler on March 18, 2004 11:05 PM | Permalink | PGP Sig | Reply to this

Re: Base URL

I only see two choices: either <MTBlogURL> and <MTEntryLink> only differ in filename, in which case relative links resolve the same way for either one, or it’s just not possible to use path-relative links, because they’ll either be broken in the main page or in the archives.

Now, #foo is an interesting question. My reading of RFC 2396 is that it refers to “the current document”. What that means (other than probably “leave it alone, and hope it works when you throw it in a browser”), I’m not quite sure.

Posted by: Phil Ringnalda on March 19, 2004 12:29 AM | Permalink | PGP Sig | Reply to this

Re: Base URL

“Leave it alone” indeed. I should have been looking at section 4.2 for that part, where they are more emphatic about it. So I can’t see a case where BlogURL wouldn’t work if your HTML pages work, and I can see one (<base href="<MTBlogURL>"> in your archive HTML template) where EntryLink wouldn’t work.

Posted by: Phil Ringnalda on March 19, 2004 1:32 AM | Permalink | PGP Sig | Reply to this

Re: Base URL

I’m sorry, Phil. I’m not clear on what your use-case is.

Every <entry> has a

<link rel="alternate" href="<$MTEntryPermalink$>" />

Within an <entry>, an internal hyperlink,

<content> ... <a href="#foo"></a>...</content>

should resolve to <$MTEntryLink$>#foo.

There is no guarantee that <$MTBlogURL$>#foo is a valid URL.

Posted by: Jacques Distler on March 19, 2004 8:34 AM | Permalink | PGP Sig | Reply to this

Re: Base URL

Nope, #foo shouldn’t get relative URI resolution at all. 5.2 2) isn’t quite as obvious, but from 4.2

A URI reference that does not contain a URI is a reference to the current document. In other words, an empty URI reference within a document is interpreted as a reference to the start of that document, and a reference containing only a fragment identifier is a reference to the identified fragment of that document.

So if your Atom feed itself is being rendered, #foo means index.atom#foo, and if your entry is being rendered as part of a bunch of other stuff in myfeeds.html, it means myfeeds.html#foo. Which would seem to mean that if you use named anchors for footnotes, you should probably namespace them, so you don’t collide with someone else’s footnotes.

Posted by: Phil Ringnalda on March 19, 2004 10:21 AM | Permalink | PGP Sig | Reply to this

Current Document

I think you’ve just explained why, in the context of parsing the XHTML in the <content type="application/xhtml+xml" xml:base=""> element, interpreting “the current document” as anything other than the document referred to in xml:base="" is fraught with peril.

How, to pick one of your points, can one otherwise guarantee that the id attributes therein are unique to the “current document”?

In one of your scenarios, the feed author doesn’t even control what the “current document” is!

Posted by: Jacques Distler on March 19, 2004 10:43 AM | Permalink | PGP Sig | Reply to this

Re: Current Document

Fraught with peril doesn’t mean it’s wrong, just that it’s something you should only do under rather limited circumstances.

First problem: <MTBlogURL>#foo isn’t (roughly the same as) <MTEntryLink>#foo. Well, you’re already toast, since when you saved that entry href="#foo" went into both places, in your HTML.

Second problem: id="foo" exists in both places, but outside the entry itself. Sorry, but I can’t find a single word in 2396 or the XML Base spec to say that #foo means anything other than “in the current document” no matter what that document may be. If you need to have #license refer to the footer of either your main page or any given entry archive page, then you need to regex it into whatever.html#license yourself before you put it into a feed, because a correctly done XML Base resolver shouldn’t touch #license at all.

Third problem: ish. I probably wouldn’t ever have thought of that. Whether or not you namespace your anchors, an aggregator author needs to namespace them and any references to them on top of you.

Posted by: Phil Ringnalda on March 19, 2004 4:32 PM | Permalink | PGP Sig | Reply to this

Re: Current Document

OK, I’ll concede the point about fragment identifiers, though I could try to find some solace in

4.2 … However, if the URI reference occurs in a context that is always intended to result in a new request, as in the case of HTML’s FORM element, then an empty URI reference represents the base URI of the current document and should be replaced by that URI when transformed into a request.

NetNewsWire acts this way. Clicking on a link <a href="#foo"></a> opens the blog main page#foo in the browser. At least, it does for RSS2 feeds, where the Base URI is assumed to be the blog main page. NetNewsWire’s xml:base support for Atom feeds is currently broken.

In any case, relative URIs found in <$MTEntryMore$> (which appears only on the individual archive page, not on the blog main page) ought to resolve correctly. Hence the Base URI should be <$MTEntryLink$>, as I’ve said.

(Obviously, using relative URIs in <$MTEntryBody$>, which occurs in both places, is a little less useful.)

Posted by: Jacques Distler on March 19, 2004 11:13 PM | Permalink | PGP Sig | Reply to this

Isn’t this to be expected?

This is rather tangential to the current topic, but it seems to me that your discussion supports my (hitherto unexpressed) position that both RSS and Atom are fundamentally overengineered, at least from the point of view of an end user.

Now, I’ll be the first to admit that I don’t actually use a newsreader. But I have tried to do so and found it to be, at best, a frustrating experience:

  • All feeds are different. Some people syndicate everything, some people just syndicate links to the article, some people syndicate summaries, and lots of people have 17 different feeds that you’re expected to choose from.
  • Feedreading also needs a browser. Even where the full content of a feed is being syndicated, links point to content on the web that the feedreader can’t handle. So one has to continually switch back and forth between browser and newsreader. For title-only syndication, the situation is even worse.
  • Switching between browser and feedreader sucks. Even when it’s set up perfectly, having to switch apps to read content is dreadful. It’s slow. Short of drag and drop, there’s no way to specify that a link should open in a new browser tab (or not).
  • Finding feeds sucks. People hide the links to their feeds on secondary pages or behind nondescript icons. This makes finding the link almost impossible. I know autodiscovery is supposed to magically fix this, but it’s still a huge pain.
  • The content is different. When you read a feed, you typically won’t get the same content you would on the actual site. For example, Jacques’ MathML is broken. The presentation is always broken (although the merits of this are debatable).

Moreover, newsreading doesn’t consistently allow me to do things that I want to do. For example, I can’t see any way (in general) to follow the addition of comments to an item that I am interested in. So I still have to manually note which threads I’m following and open them in a browser later.

To make matters worse, I see problems on the distribution side - authors are expected to maintain multiple versions of their content and aggregators have to download the entire feed every time that the smallest possible change is made.

The problem, as my uneducated eyes see it, is that RSS/Atom break the fundamental model of the web. As far as I can see, the two underlying concepts in the web are resources (denoted by a URI) and links to connect different resources. However, current syndication formats break the idea of links binding resources together by throwing together a pile of separate resources, creating a big blob of content and pretending that it’s just a single resource. Linking to this new resource is useless (because it doesn’t stay static) and linking from the new resource is difficult because it’s really a collection of separate resources and so the assumptions behind the URI syntax break down.

Given that, trying to fix up the URIs in Atom feeds using namespacing or whatever begins to look like papering over the cracks in a broken format.

Personally, as a user, I would prefer a feed that made use of the fact that syndicated resources typically already exist on the web and that the preferred way of referring to them is by URI. Essentially, what this means is that rather than syndicating all the content, all that would be syndicated would be the metadata. In particular, I would like a syndication format that provided:

At the feed level: A title. A URI. The URI of the feed.
Optionally: Some other gunk like a description or whatever else people want.

At the entry level: A title. A URI. A published date. A last-modified date.
Optionally: A description / summary. A last modified type (edit, comment, trackback, etc.), an author, some categories, a list of all previous edits and their times / types, a list of all associated URIs and their relationship (could be ‘external’ for a link external to the article, ‘comment’ for the URI of a comment attached to the article, etc.)
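A hypothetical instance of such a metadata-only feed might look something like this (all element names are invented for illustration; this is not an existing format, and the URIs are merely examples):

```xml
<feed>
  <title>Musings</title>
  <uri>http://golem.ph.utexas.edu/~distler/blog/</uri>
  <feed-uri>http://golem.ph.utexas.edu/~distler/blog/index.atom</feed-uri>
  <entry>
    <title>Core Dump</title>
    <uri>http://golem.ph.utexas.edu/~distler/blog/archives/000332.html</uri>
    <published>2004-03-13T22:24:00-06:00</published>
    <modified>2004-03-19T23:13:00-06:00</modified>
    <modified-type>comment</modified-type>
  </entry>
</feed>
```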

So what are the advantages?

  • All feeds would be equal. The amount of information in the feed wouldn’t be dependent on the author (at least not to the extent witnessed today).
  • The obvious place to put a feedreader would be in the browser. That would make for a seamless net-surfing experience. People could still choose to use non browser based readers, if they desired.
  • People would be able to keep up with threads they were interested in (at least via the last-modified date, even if the author didn’t include any extra information).
  • Less data to transfer, so better performance. Fewer problems for bandwidth heavy sites.
  • Better for small devices. No hulking great RSS full-text feeds to parse.
  • No difficulties synchronising different versions of the content - the user gets the same content if they visit on the web or via the feedreader (a clever browser could use history to ensure that the updated / unchanged status of a resource was changed even if the person visited the resource without going through the feedreader UI).
  • No working around the design of the URI scheme or the other issues that plague Atom.
  • Syndication of any formats (XML, binary, plain text, whatever) could be achieved without resorting to special hacks or performing difficult transformations (e.g. base64 encoding PDF files) - all the content would simply open in the user’s default application. All the author would need to do is assign the content a URI.

Right, hopefully no-one’s reading this because otherwise I’ll probably be flamed by all the Atom and RSS advocates…

Posted by: jgraham on March 22, 2004 3:10 PM | Permalink | Reply to this
Read the post A Tale of woe
Weblog: Transcendent Ether
Excerpt: In order to provide server-side verification using OpenPGPComments, the Perl module Crypt::OpenPGP is required. Among the prerequisite modules for Crypt::OpenPGP (there are several) is Math::Pari....
Tracked: May 14, 2004 1:16 PM
