## December 23, 2006

### Instiki

#### Update (2/15/2007):

My branch of Instiki has its own website, now.

The discussion of Wiki software in my previous post really crystallized things for me. The bottom line was that, if I was going to “invest” in some Wiki software, I was, almost inevitably, going to pop the hood and meddle with the source code. And, in all my experience, the source code for any project written in PHP is going to look like the racoons have gotten loose in the trash again.

So, if I went down the list of Wiki software, and eliminated all those written in PHP, that would drastically narrow the scope of my indecision. Of the handful of packages left, Instiki, written in Ruby on Rails, seemed like a good bet.

What does it look like? Well, here’s a copy of the front page of my wiki (the wiki itself is password-protected). More details below the fold.

#### Instiki

There were some prerequisites.

MacOSX 10.4 comes with Ruby 1.8.2. But Instiki requires 1.8.4. Fink to the rescue:

fink install ruby
fink install sqlite3-rb18
fink install ruby18-dev

For reasons that will be clear in a moment, I also had to install SWIG 1.3.

Getting Instiki, itself, up and running was absurdly easy.

1. Download and unpack the latest source. (There’s a binary installer for an older MacOSX version; don’t bother.)
2. Type
./instiki
3. Connect to http://localhost:2500 and configure your wiki.
4. Done!

#### itex2MML

Next, I used SWIG to turn itex2MML into a Ruby extension¹. itex2MML 1.1.8 has two new make targets.

make
sudo make install

will, as before, create a stream filter usable by lots of different software, including my MovableType plugin. But now

make ruby
make test_ruby
sudo make install_ruby

will produce a Ruby extension. To use it, you do something like

require 'itextomml'
itex = Itex2MML::Parser.new
itex.html_filter(string)

which returns the filtered text.

Anyway, the procedure for building a Ruby extension using SWIG is a little bit platform-dependent. To accommodate the idiosyncrasies of different platforms, the Makefile relies on special features of GNU Make (the default version of make on Linux and MacOSX).

#### Update (12/24/2006):

Justin Bonnar wrote some wrapper code, which made my original code thread-safe. I added some unit tests (one nice thing about Ruby is the ease of including unit tests) and packaged the whole thing up for easy installation. You can get the latest itex2MML either as the usual gzipped tar file (including source and MacOSX + Linux binaries) or you can get the latest source via BZR.

#### XHTML

Modifying Instiki to emit well-formed XHTML was pretty easy. (This is why I wanted to stay away from all those spaghetti-code PHP projects.) The entire patch file (including the code to create a Markdown+itex2MML filter) is a mere 1/7 the size of my MovableType patch file. Update: Please don’t use this. Instead, use this package, based on the current Instiki trunk. I can’t (and haven’t tried to) back-port all of my fixes to Instiki 0.11.

Of course, I don’t have anything nearly as bullet-proof. And I currently don’t even bother to do content-negotiation. Everyone gets application/xhtml+xml; since the Wiki is a private research tool, I don’t care about browsers that don’t support XHTML+MathML. Update: My version of Instiki now does Content-Negotiation, serving application/xhtml+xml to browsers that support it, including IE+MathPlayer.
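A minimal sketch of that kind of content negotiation, in Ruby. This is not Instiki's code: the helper name `negotiate_content_type` is hypothetical, q-values in the Accept header are ignored, and the check for a `MathPlayer` token in the User-Agent string is an assumption about how IE+MathPlayer identifies itself.

```ruby
XHTML_TYPE = 'application/xhtml+xml'
HTML_TYPE  = 'text/html'

# Hypothetical helper: pick the Content-Type to serve for one request.
def negotiate_content_type(accept_header, user_agent = '')
  accept = accept_header.to_s
  agent  = user_agent.to_s

  # Browsers that can handle XHTML list its MIME type in the Accept header.
  return XHTML_TYPE if accept.include?(XHTML_TYPE)

  # Assumption: IE with the MathPlayer plugin shows "MathPlayer" in its
  # User-Agent, even though its Accept header does not mention XHTML.
  return XHTML_TYPE if agent.include?('MathPlayer')

  HTML_TYPE
end
```

A stricter version would parse the q-values rather than doing a substring match, so that a browser sending `application/xhtml+xml;q=0` is not handed XHTML.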

Still, it bugs me that Markdown does little to ensure well-formedness, so it’s still painfully easy to screw up and produce a Yellow-Screen-of-Death. At some point, I’ll have to figure out how to hook in the Validator.
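Short of hooking in a full validator, a rough well-formedness check could be run on the rendered page before it is served. The sketch below uses REXML from Ruby's standard distribution; the method name `well_formedness_error` is made up for illustration, and REXML is a well-formedness parser only, so this says nothing about validity against the XHTML+MathML DTD.

```ruby
require 'rexml/document'

# Returns nil if the fragment parses as XML, otherwise the parser's message.
def well_formedness_error(xhtml_fragment)
  # Wrap the fragment in a dummy root element, so that a page body with
  # several top-level elements is still a single XML document.
  REXML::Document.new("<div>#{xhtml_fragment}</div>")
  nil
rescue REXML::ParseException => e
  e.message
end
```

The idea would be to fall back to an error page (or to escaped source text) whenever the method returns a message, instead of letting the browser show its Yellow-Screen-of-Death.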

#### TODO

1. Either make the thing bullet-proof, or use a hack to serve MathML in a text/html environment.
2. One of the features that appealed to me about Instiki was its “Export to TeX” feature. That doesn’t really work right (I don’t think the author was anticipating actual equations in the text.) But it could be fixed. Which would be very cool.
3. Since Instiki (by default, at least), runs in its own webserver, I can’t rely on Apache for access-control. You can publish multiple wikis (“Webs” is what Instiki calls them) under the same installation, and give each one its own password. But this has it precisely backwards. I don’t want to have to remember N passwords for N different wikis. I want to assign each person a password, and then use access-controls to determine what areas that person can access.
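The access-control model in the last item (one password per person, plus a list of the Webs that person may enter) can be sketched in a few lines of Ruby. Everything here is hypothetical: `User`, `AccessControl` and `allowed?` are illustrative names, not part of Instiki, and the bare SHA-256 digest stands in for the salted, deliberately slow password hash a real deployment would want.

```ruby
require 'digest'

# One record per person: a single password and the Webs they may access.
User = Struct.new(:name, :password_digest, :webs)

class AccessControl
  def initialize(users)
    # Index the users by name for quick lookup.
    @users = users.to_h { |user| [user.name, user] }
  end

  # True only if the user exists, the password matches, and the
  # requested Web is on that user's list.
  def allowed?(name, password, web)
    user = @users[name]
    return false unless user
    return false unless user.password_digest == Digest::SHA256.hexdigest(password)

    user.webs.include?(web)
  end
end
```

With this arrangement, adding a new Web means editing access lists, not handing out yet another password.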

Probably there’s other stuff, but right now I should probably add some more notes…

#### Update (1/9/2007): Ruby Sucks

Having spent some more time with Instiki, I begin to see its limitations.

First of all, Ruby’s Regex parser is incredibly lame. If you search the web, you’ll find plenty of complaints about its stack overflowing should you happen to sneeze in its vicinity. In my case, BlueCloth would choke on a modest 3-row, 10-column XHTML table with the error:

Stack overflow in regexp matcher: /
^				# Start of line
<(p|div|h[1-6]|blockquote|pre|table|tr|td|dl|ol|ul|script) # Start tag: \2
\b				# word break
(.*\n)*?			# Any number of lines, minimal match
<\/\1>				# Matching end tag
[ ]*				# trailing spaces
(?=\n+|\Z)			# End of line or document
/ix

Moreover, if I tweaked BlueCloth to add `<math>` to the list of tags that it passes through unmolested (something that is standard in the Perl version of Markdown)

BlockTags = %w[ p div h[1-6] blockquote pre table dl ol ul script math]

then all Hell would break loose, with stack overflows right and left.

I quickly retreated from that idea, though I did manage to accommodate the table by the following amusing hack. I disabled the special processing of HTML comments in BlueCloth:

# Matching constructs for tokenizing X/HTML
HTMLCommentRegexp  = %r{ <! ( -- .*? -- \s* )+ > }mx
XMLProcInstRegexp  = %r{ <\? .*? \?> }mx
- MetaTag = Regexp::union( HTMLCommentRegexp, XMLProcInstRegexp )
+ MetaTag = XMLProcInstRegexp
HTMLTagOpenRegexp = %r{ < [a-z/!$] [^<>]* }mx
HTMLTagCloseRegexp = %r{ > }x

and then inserted

<!--> Hack to work around Ruby's feeble Regex parser </table> <table> <-->

inside the table, to break it into manageable chunks. That “works,” but is lame beyond belief.

More troublesome is that Instiki’s speed deteriorates drastically with the size of the page. Modest-sized pages are still very fast. But a more substantial page (19 KB of Markdown+itex, 96 KB after being processed by itex2MML.rb, and 101 KB when rendered to XHTML+MathML by BlueCloth) takes nearly 5 minutes (!) to render. Once rendered, it’s kept in memory, and Instiki delivers it lightning-fast. But every edit of the page is painful.

The Perl version of Markdown (as used in MovableType) has no trouble processing such a page. I don’t think the problem is incompetence on the part of the authors of Instiki and BlueCloth. I think the problem is that Ruby sucks (at string processing).

#### Update (1/23/2007): Fixed!

Replacing the default BlueCloth by another Markdown implementation proved to be the solution. The wiki I’m now running is based on the development version of Instiki (which contains some essential improvements over 0.11). You can get it, either as a tarball, or by doing a bzr pull from my BZR repository.

1 Yes, I know about Ritex. But that’s based on a really old version of itex2MML, the author does not seem interested in updating it to incorporate the features of more recent versions, and I am not interested in trying to keep two different code-bases in sync.

Posted by distler at December 23, 2006 11:50 PM

TrackBack URL for this Entry: http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1088

## 10 Comments & 2 Trackbacks

### Re: Instiki

Have you looked into RedCloth? Claims to produce XHTML strict.

Posted by: Sam Ruby on December 24, 2006 9:27 AM

### RedCloth, BlueCloth

Neither RedCloth (Textile) nor BlueCloth (Markdown) guarantees well-formedness. It is quite possible (easy, in fact) to hand them strings which will result in ill-formed gobbledygook as output.

Instiki has both RedCloth and BlueCloth built in. In my experience, Textile requires more extensive patching than Markdown in order to work with MathML. I believe the current version of Markdown works out-of-the-box. BlueCloth is based on a slightly older version and requires a 1-line patch.

Posted by: Jacques Distler on December 24, 2006 10:09 AM

### Re: Instiki

Jacques Distler wrote:

And, in all my experience, the source code for any project written in PHP is going to look like the racoons have gotten loose in the trash again.

This deserves to be made into a meme!

I want to assign each person a password, and then use access-controls to determine what areas that person can access.

This tallies with some things I have been wanting to do, so I will probably set about implementing some kind of user/user group-based access control, in my copious free time.

A working “Export to TeX” feature would also address the issues brought up over in this Cafe thread.

Posted by: Blake Stacey on December 24, 2006 9:04 PM

### Export to LaTeX

A working “Export to TeX” feature would also address the issues brought up over in this Cafe thread.

One approach is to fix the existing Textile→LaTeX conversion to work with Markdown+itex→LaTeX. Alternatively, since we presumably have well-formed XHTML+MathML, we could use an XSLT stylesheet to convert that to LaTeX.

Posted by: Jacques Distler on December 25, 2006 12:05 AM

### Re: Instiki

There’s a new Ruby Markdown interpreter called Maruku that might be of use. I believe it uses a line-by-line parser for block-level processing and only resorts to regular expressions for the span-level stuff.

Posted by: Norman Gerre on January 11, 2007 12:01 AM

### Maruku

Thanks! I’ll take a look.

But, from the description, I see a big flaw. Markdown treats `<math>` as a “block-level element” which, in practical terms, means that it ignores characters like “*” that are significant in Markdown, when they occur (and you better believe they occur!) in a MathML formula.

BlueCloth also passes block-level elements through, without trying to “interpret” (i.e. spooge) their content. It’s trivial to add `<math>` to the list of block-level elements. In BlueCloth, that causes Ruby’s Regexp parser to cough up a hairball, but hey.

In Maruku, it (apparently) doesn’t matter whether you declare `<math>` a “block-level” element, it will go ahead and happily spooge it anyway.

I’d be happy meeting Maruku (or BlueCloth) halfway. If they would pass through the contents of `$$…$$` and `$…$`, unmolested, then I could apply the itex2MML filter after the Maruku/BlueCloth filter, and everyone would be happy.
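One way to get that pass-through behaviour without touching the Markdown engine is to hide the math spans behind placeholders, run Markdown, and put the spans back before handing the text to itex2MML. The following is only a sketch under stated assumptions: `with_protected_math` and the placeholder format are invented for illustration, the Markdown step is whatever block the caller supplies, and escaped dollar signs are not handled.

```ruby
# Matches $$…$$ first, then $…$ (non-greedy, may span several lines).
MATH_SPAN = /\$\$.+?\$\$|\$.+?\$/m

# Hypothetical helper: yields the text with math spans masked, then
# restores the original spans in whatever the block returns.
def with_protected_math(text)
  stash = []
  masked = text.gsub(MATH_SPAN) do |span|
    stash << span
    # The placeholder contains only letters and digits, so Markdown
    # has no reason to alter it.
    "MATHSPAN#{stash.size - 1}ENDMATHSPAN"
  end

  processed = yield(masked)

  processed.gsub(/MATHSPAN(\d+)ENDMATHSPAN/) do
    stash[Regexp.last_match(1).to_i]
  end
end
```

The block is where the Maruku or BlueCloth call would go; the restored text still contains the raw `$…$` spans, ready for the itex2MML filter.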

Posted by: Jacques Distler on January 11, 2007 3:04 AM

### Re: Instiki

I think the problem is that Ruby sucks (at string processing).

I think everyone knows at this point that the main Ruby implementation in C is dog slow. Most of the time it doesn’t matter… but when I hear reports like yours, I guess it matters more often than I assumed.

You could certainly shell out to perl and read the output back in in much less time than 5 minutes. Heh.
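A minimal sketch of what shelling out would look like from Ruby, using `Open3.capture2` from the standard library. The helper name `filter_through` is hypothetical, and the location of the Perl script in the usage note is an assumption.

```ruby
require 'open3'

# Pipe `input` through an external command (given as an argv array)
# and return whatever the command writes to standard output.
def filter_through(command, input)
  output, status = Open3.capture2(*command, stdin_data: input)
  unless status.success?
    raise "#{command.first} exited with status #{status.exitstatus}"
  end

  output
end
```

Something like `filter_through(['perl', 'Markdown.pl'], page_source)` would then do the conversion, assuming a copy of the Perl Markdown script sits at that path.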

Posted by: Aristotle Pagaltzis on January 13, 2007 11:42 PM

### Re: Instiki

I’ve been in feverish discussions with the author of Maruku, an alternate Markdown implementation in Ruby which, rather than doing string manipulations using Regexps, parses the input and then uses REXML to assemble the XHTML (+MathML) output.

It’s no speed-demon, but the time to handle the aforementioned Wiki page is measured in seconds, rather than minutes.

So, anyway, I now have Instiki+Maruku+itex2MML working tolerably well. (And there’s even a mostly working LaTeX output. Since he’s using REXML internally, improving that output should be a lot easier than the alternatives I was thinking about.)

Posted by: Jacques Distler on January 14, 2007 12:14 AM

### Re: Instiki

Automatic equation-numbering test.

This equation is labeled ‘foo’.

$$a = b \tag{1}$$

This one is unlabelled.

$$u = v \tag{2}$$

Unnumbered equations are delimited by `$$…$$`:

$$w = z$$

Numbered ones are delimited by `\[…\]` and can have an optional `\label{bar}`:

$$c = d \tag{3}$$

We reference equations by writing \eqref{foo} or (eq:bar), which produce, respectively, (1) or (3).

This all works in the “itex2MML with parbreaks”, “Markdown with itex2MML” and “Textile with itex2MML” text filters.

More to follow…

Posted by: Jacques Distler on January 16, 2007 1:04 AM

Read the post One Thing Led to Another ...
Weblog: Musings
Excerpt: New software for a new year.
Tracked: January 18, 2007 3:07 AM

### Re: Instiki

To solve your web server problem you could run Apache to proxy the Instiki web server, restrict Instiki to serve only to localhost, and then use Apache’s security directives to apply Basic authentication on a per-path basis. With the new Apache mod_proxy it is pretty easy to set up and it’s used in lots of situations.

Posted by: Kevin on October 30, 2008 1:34 PM