Software Forum - Recent Posts by Andrew Stacey

posted 11 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

There’s a bug with maruku’s table handling: it doesn’t like whitespace at the end of the line (a previous version did, and it would seem to me that whitespace here should be fine). This causes tables that used to render to no longer do so.

The fix is to modify the regexp for splitting cells: line 515 of lib/maruku/input/parse_block.rb should read:


if (/^[|].*[|]\s*$/ =~ s)

(I notice that this isn’t the same as the version of maruku on github, so don’t know if this would be superseded by updating to the latest version from there.)
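
A minimal sketch of what the extra \s* buys. The stricter pattern STRICT_ROW is only an assumption about what a whitespace-intolerant check might look like; it is not copied from maruku.

```ruby
# Pattern from the fix above: a table row starts with a pipe and ends with a
# pipe, optionally followed by trailing whitespace.
TOLERANT_ROW = /^[|].*[|]\s*$/

# Hypothetical stricter pattern (an assumption, not maruku's actual code):
# the closing pipe must be the very last character on the line.
STRICT_ROW = /^[|].*[|]$/

# True if the line looks like a table row under the given pattern.
def table_row?(line, pattern)
  !(pattern =~ line).nil?
end

puts table_row?("| a | b |", STRICT_ROW)        # => true
puts table_row?("| a | b |   ", STRICT_ROW)     # => false
puts table_row?("| a | b |   ", TOLERANT_ROW)   # => true
```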

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

With the above changes, expire_caches triggers an error when renaming a page. I think it is due to expiring pages that redirect to the given page. This is called in expire_caches in after_save. If the page name has changed, then the function self.pages_redirected_to in wiki_references.rb is called with the old name, but that no longer exists in the database, so page = web.page(page_name) returns nil. I’ve put in a conditional to test for this, which at least removes the error, but I don’t know whether some caches that ought to be expired are now being missed. I’ll check the logs to trace this.


  def self.pages_redirected_to(web, page_name)
    names = []
    redirected_pages = []
    if web.has_page?(page_name)
      page = web.page(page_name)
      redirected_pages.concat page.redirects
      redirected_pages.concat Thread.current[:page_redirects][page] if
            Thread.current[:page_redirects] && Thread.current[:page_redirects][page]
    end
    redirected_pages.uniq.each { |name| names.concat self.pages_that_reference(web, name) }
    names.uniq    
  end

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

I’ve put those changes in place and will keep an eye on what happens.

Incidentally, when manually clearing out the cache I note that / in page names gets translated into subfolders. So a page name Sandbox/Set creates a folder Sandbox with a file Set.cache. Should the page name be completely sanitised under such circumstances?
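
One way the sanitising could look, as a sketch only. The helper cache_file_name is hypothetical and not part of Instiki; it just shows that percent-encoding the page name keeps the slash from being read as a directory separator.

```ruby
require 'uri'

# Hypothetical helper (an assumption, not Instiki code): turn a page name
# into a single file name so that "/" cannot create a subfolder.
def cache_file_name(page_name)
  URI.encode_www_form_component(page_name) + ".cache"
end

puts cache_file_name("Sandbox/Set")   # => Sandbox%2FSet.cache
```

Whether flattening the names like this is desirable is a separate question: the subfolders may be intentional.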

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

Spoke too soon. The cache bug is not dead.

I shoved in a load of logger.info statements to the revise method of page.rb and to the before_save and after_save methods of revision_sweeper.rb to see what was happening. I noticed that the cache sweep only happens when a revision is saved, not when a page is saved. At least, that’s how I interpret the if record.is_a?(Revision) conditional.

Anyway, what I found was interesting. I got different behaviour depending on whether I was creating a new revision or updating an old one (to switch between the two I used two author strings).

Here’s the “same author” sequence of events:


Processing WikiController#edit (for 127.0.0.1 at 2013-05-29 21:56:25) [GET]
Cache Bug Log: Before save page called for previous name.
Cache Bug Log: After save page called for previous name.
Processing WikiController#save (for 127.0.0.1 at 2013-05-29 21:56:32) [POST]
Cache Bug Log: Renaming 'previous name' to 'current name'.
Cache Bug Log: Updating previous revision.
Cache Bug Log: Before save revision called for previous name.
Cache Bug Log: After save revision called for previous name.
Cache Bug Log: Saving new page.
Cache Bug Log: Before save page called for current name.
Cache Bug Log: After save page called for current name.
Cache Bug Log: Before save page called for previous name.
Cache Bug Log: After save page called for previous name.
Processing WikiController#show (for 127.0.0.1 at 2013-05-29 21:56:33) [GET]

As I said above, only the Before save revision and After save revision actually seem to do any cache sweeping. So in this sequence, the previous name and only the previous name gets swept. This is okay, but not optimal: if there is a stale cache file then renaming a page to the name of a stale file means that the stale file doesn’t get swept (I checked this).

But here’s what happens when I change the author name to force a new revision:


Processing WikiController#edit (for 127.0.0.1 at 2013-05-29 21:57:34) [GET]
Cache Bug Log: Before save page called for previous name.
Cache Bug Log: After save page called for previous name.
Processing WikiController#save (for 127.0.0.1 at 2013-05-29 21:57:45) [POST]
Cache Bug Log: Renaming 'previous name' to 'current name'.
Cache Bug Log: Creating new revision.
Cache Bug Log: Saving new page.
Cache Bug Log: Before save page called for current name.
Cache Bug Log: Before save revision called for current name.
Cache Bug Log: After save revision called for current name.
Cache Bug Log: After save page called for current name.
Cache Bug Log: Before save page called for previous name.
Cache Bug Log: After save page called for previous name.
Processing WikiController#show (for 127.0.0.1 at 2013-05-29 21:57:45) [GET]

Notice that the Saving new page message now occurs before the save revision steps. The Saving new page message was inserted just before the save statement in the revise method. So this ought to happen after the new revision is created. However, it would appear that the new revision is not actually saved until the page is saved.

With this sequence, the before_save and after_save functions for the revision are called with the current name, meaning that the old name does not get swept. This is “the cache bug”.

Looking at the two different sequences of events, it would seem that the safest place to sweep the cache is when before_save is called when saving a page, not a revision. Moreover, as the before_save and after_save are always called with the same page name then it seems overkill to put the sweep in both.

Of course, my analysis may well be incorrect as instiki (well, ruby-on-rails) is very much a black box to me so I have no idea as to what’s going on under the bonnet.

Incidentally, the above was carried out using sqlite as the database, but I got the same results with mysql with the updated gem.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

I put 2.9.1 in the Gemfile but I don’t know if that’s the minimum value. Interestingly, bundle won’t update to the latest version unless you specify such a minimum.

Comparing the logs, it would appear that something in the mysql module was returning the updated page name when instiki was expecting the old page name. So when instiki swept the cache using what it thought was the old page name, it was actually using the new one, and thus the old cache page wasn’t being deleted.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

I think I’ve killed the cache bug!

I did a fresh install of Instiki+MySQL on a debian virtual machine and … no cache bug. So I went back to my live installs and … cache bug.

Poking around, I discovered that my fresh install was using version 2.9.1 of the mysql gem but the live installs were using 2.8.1. So, on a hunch, I updated to 2.9.1 and tried reproducing the cache bug … and it had gone.

I don’t know if I’ve well and truly killed it, but the steps I put above were reliably showing it for me on the nlab and on mathsnotes and now that I’ve updated the mysql gem then those steps no longer exhibit it so I figure that’s enough for a small celebration.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

Thanks.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

Okay, so that was a pretty dubious feature request!

How about this one: if a page exists (meaning, really exists - not just a redirect) then a request to <web>/new/page should redirect either to <web>/edit/page or to <web>/show/page. If the page does exist then the effect of going to <web>/new/page and submitting stuff is the same as submitting an edit except that you don’t get the previous edit in the text box so there’s nothing to show that you’re replacing something already there. The argument for <web>/show/page being that if a page exists and you didn’t know it then you should probably have a good look at what’s already there before writing something new.

(This came up most recently because a Google search for a page led to the /new/ link even though the page exists. Google had clearly found the link somewhere and added it to its list; it does this even if a robots.txt file exists, since the link exists on a page that it can read.)
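
The requested behaviour, sketched as a plain function. target_for_new and its parameters are hypothetical names for illustration, not Instiki's controller code; existing_pages stands for the pages that really exist, redirects excluded.

```ruby
# Hypothetical routing helper (an assumption, not Instiki's actual code).
# Returns the path a request for <web>/new/<page> should end up at.
def target_for_new(web, page_name, existing_pages, prefer: :show)
  if existing_pages.include?(page_name)
    # The page really exists: send the visitor to show (or edit) instead.
    "/#{web}/#{prefer}/#{page_name}"
  else
    # No such page yet: the "new" form is the right place.
    "/#{web}/new/#{page_name}"
  end
end

puts target_for_new("nlab", "Sandbox", ["Sandbox"])                  # => /nlab/show/Sandbox
puts target_for_new("nlab", "Sandbox", ["Sandbox"], prefer: :edit)   # => /nlab/edit/Sandbox
puts target_for_new("nlab", "Brand New", ["Sandbox"])                # => /nlab/new/Brand New
```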

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

… and in the time since you asked for clarification, it would appear that you’ve fixed it anyway as it no longer appears having just updated instiki.

Thanks.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

Distillation for Distler (sorry, …)

Create a page with an apostrophe in the title, say apostrophe's.
Edit page.
Page magically becomes apostrophe's.
Edit page, changing its name back to apostrophe's.
Page name is now correct.
Edit page.
Page name is mangled again.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

Not sure if this is a bug or a feature request …

Google searches now include author information which it tries to glean from the page. It would appear that it uses the “Revised by XYZ” information to do this. It’s been suggested that this is because that is in a div with class name byline. I’m going to try changing this to see if it stops Google from assuming that to be the author. I don’t yet know how to override Google’s ad hoc method (which really does seem ad hoc if it uses a CSS class name as evidence).

I’ll report back on whether or not it works. If it does, consider this a feature request for changing byline to something like revisedby.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

There would appear to be a bug on pages with apostrophes in their names. See http://nforum.mathforge.org/discussion/4757/apostrophes-in-page-titles-lead-to-weird-behaviour for details.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

I just tried on a completely fresh install and found that it happened just as you described: the name used was the old name (and it gets swept twice). That was a sqlite3 database.

I’ve tried to install mysql on my mac to test this there, but get to a crash when I run instiki. Not sure why, seems to be related to the gem not finding my mysql lib files but it’s taken too much of my time already to try to fix it to test further. I can try it on my linux machine later.

However, the evidence certainly suggests that there is a difference between mysql and sqlite3 on this one.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

I know you don’t believe in the Cache Bug …

I had a page on the nLab which I renamed. Looking at the logs, then I see the save command. It ends with:

 "alter_title"=>"1", "new_name"=>"adequate subcategory > history", "author"=>"Andrew Stacey", "web"=>"nlab", "id"=>"adequate subcategory"}

The next log items are:

24808: 2012-10-18 14:22:26 +0400: Reading page 'adequate subcategory' from web 'nlab'
24808: 2012-10-18 14:22:26 +0400: Page 'adequate subcategory'  found
24808: 2012-10-18 14:22:26 +0400: Checking DNSBL 214.7.76.192.bl.spamcop.net
24808: 2012-10-18 14:22:26 +0400: Checking DNSBL 214.7.76.192.sbl-xbl.spamhaus.org
24808: 2012-10-18 14:22:27 +0400: 192.76.7.214 added to DNSBL passed cache
24808: 2012-10-18 14:22:27 +0400: 192.76.7.214
24808: 2012-10-18 14:22:27 +0400: Reading page 'adequate subcategory' from web 'nlab'
24808: 2012-10-18 14:22:27 +0400: Page 'adequate subcategory'  found
24808: 2012-10-18 14:22:27 +0400: Maruku took 0.148016386 seconds.
24808: 2012-10-18 14:22:27 +0400: Maruku took 0.165203678 seconds.
24808: 2012-10-18 14:22:28 +0400: Expired fragment: views/nlab/show/adequate+subcategory+>+history (0.3ms)

There are then a slew of more expirations, the first ones being adequate+subcategory+>+history. The last ones are also adequate+subcategory+>+history.

I just tried on my course wiki. Here are the steps I took:

Edit a page and change its name.
Remove the automatically-inserted redirect.
Save the page.

Result: the expiration sweep does not include the old name and includes the new name twice.

I’m wondering if this could be the culprit: app/controllers/revision_sweeper.rb:

  def before_save(record)
    if record.is_a?(Revision)
      expire_cached_page(record.page.web, record.page.name) 
      expire_cached_revisions(record.page)
    end
  end

I notice that in the save post data then the name is the new name and the id is the old name. Could it be that the record object is populated with the new name before this action takes place? I’m not very good at tracing through ruby code, but if my guesses about what happens are right then the before_save action takes place just before the save action is executed in page.revise. If so, then by this time the name is the new name and the old name has been forgotten: in wiki.rb then the call is page.revise(content, new_name, revised_at, author, renderer) and very early on in the revise method then I see self.name = name.

Perhaps the revise should save its name in old_name and then the before_save can use that if it differs from the current name?
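
A rough sketch of that idea, using a stand-in class rather than Instiki's real Page model. SketchPage and names_to_expire are hypothetical; the only point is that the old name is remembered before it is overwritten, so a sweeper can expire both names.

```ruby
# Stand-in for a page (an assumption; not Instiki's Page model).
class SketchPage
  attr_reader :name, :old_name

  def initialize(name)
    @name = name
    @old_name = nil
  end

  # Remember the previous name before overwriting it with the new one.
  def revise(new_name)
    @old_name = @name if new_name != @name
    @name = new_name
  end
end

# Hypothetical sweeper helper: the names whose cache entries should be
# expired, including the old name when the page has been renamed.
def names_to_expire(page)
  [page.old_name, page.name].compact.uniq
end

page = SketchPage.new("previous name")
page.revise("current name")
p names_to_expire(page)   # => ["previous name", "current name"]
```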

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Debugging uninterruptible sleep

Agree completely.

The processes that were handling the list call were entering uninterruptible sleep and were using a large amount of memory, of the order of 300MB to 500MB. The system would get bogged down if there were more than one of them, but even one would take a reasonable amount of time to complete.

My suspicion is therefore that it relates to writing the file to disk for caching. So I suspect that there really is a problem with the hardware and that having several processes trying to write the same file was exposing it.

(There was a change in hardware underpinning the VPS recently.)

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Debugging uninterruptible sleep

I’ve added the timestamps and process ids to the logs and that’s made things a lot clearer. It looks as though it is the “All Pages” request that is clogging up the works, and there appear to be some spiders that don’t respect robots.txt and find “All Pages” fairly early on in their crawl.

You’ve mentioned before the possibility of adding a paginate routine to “All Pages”. Would that help me, do you think? Or is it easier just to disable it? At 7000 pages it’s a bit cumbersome, to say the least, so I’ve no compunction about simply disabling it altogether.

Incidentally, I found I’d forgotten that bzr doesn’t set permissions so some stuff in public was unreadable. That might explain why the SVG editor wasn’t working for me as there were a couple of files from that affected.

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Debugging uninterruptible sleep

I strongly doubt that it is due to instiki itself, but it would be nice to isolate exactly what causes the problem. Thanks for the link.

After watching top all day, I think things are going into D state far more than I would expect so I’m going to contact our server provider.

To help in debugging this, I’m going to add the process pid to the logging messages as that’ll make it easier to link what I see in top to what I see in the production.log.
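
A minimal sketch of prefixing each log line with the pid, using Ruby's standard Logger. How Instiki's own logger is wired up is not shown here, so pid_logger is a hypothetical stand-alone helper and the exact line format is an assumption.

```ruby
require 'logger'
require 'stringio'

# Build a logger whose lines start with "pid: timestamp: ", so that entries
# can be matched against the PID column in top.
def pid_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |_severity, time, _progname, msg|
    "#{Process.pid}: #{time.strftime('%Y-%m-%d %H:%M:%S %z')}: #{msg}\n"
  end
  logger
end

out = StringIO.new
pid_logger(out).info("Reading page 'Sandbox' from web 'nlab'")
puts out.string
```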

posted 12 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Debugging uninterruptible sleep

I’m getting a slew of processes getting into “uninterruptible sleep” and staying there. They sit there, eating CPU and memory, until the system slows down enough that folks complain.

Do you happen to know how to debug these? From what I’ve read, this is likely to be something getting stuck on I/O.

One thing that occurred to me was that all the processes are logging to the same file. Could they get stuck in some sort of queue for that?

(Also from my reading around, it would appear that the root cause of this is more likely to be at the kernel end, and thus an issue with drivers and hardware, than with instiki. Still, I’d like to know what it is that is triggering the sleep.)

posted almost 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

And another one, this time in how maruku parses its meta-data. It would seem that not leaving a space at the end of {: #identifier} means that the } gets into the identifier.

Presumably this is the case here as well:

What’s the id of this element?

I get:

<ul>
<li id='list}'>What’s the <code>id</code> of this element?</li>
</ul>

in the source.

posted almost 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Bugs

I’m unable to run the inbuilt SVG editor on my computer (running Mac OS X, Lion). The window launches but none of the icons are present and although the buttons highlight when I hover over them, nothing happens when I click on one.

This is with Firefox, Chrome, and Safari. Not sure what additional information you would like on this.

posted 13 years ago

Andrew Stacey 118 posts

Forum: itex2MML – Topic: itex and other languages

I thought it might be worth noting that the nForum (and the other mathematical forums that I run) now use itex directly in PHP. (I probably won’t get the words right on this.) That is, I’ve compiled itex2MML into a PHP extension (using swig) and am calling that now instead of farming the conversion off to the nLab. I’ve also installed MathJax to support (as best I can) non-compliant browsers.

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

I was checking this forum to see if you’d posted a notice that you’d fixed the bug, but I didn’t see that you’d edited your previous comment rather than posting a new one - and at the start of the week then I don’t always click through links or check RSS feeds. All of my instiki installations are now up to date. Thanks.

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

Wrong thread, then!

posted 13 years ago

Andrew Stacey 118 posts

edited 13 years ago

Forum: Instiki – Topic: Feature Requests

Take a look at http://ncatlab.org/nlab/show/Sandbox (feel free to ignore the request, I put that there to ensure that no-one messed with them before you’d seen it). Each theorem has .num_theorem style so the CSS counts them. However, the counting that is done by Instiki to find the theorem numbers for use in \ref{...} commands only counts those with ID tags. So the labels for the theorems that I do want to refer to are miscounted because I prefer to have all my theorems numbered rather than only number those that I explicitly refer to.

Does the same issue happen here? Let’s experiment:

Theorem

The first letter of the English alphabet is A.

Theorem

The second letter of the English alphabet is B.

Theorem

The third letter of the English alphabet is C.

Theorem

The fourth letter of the English alphabet is D.

Theorem

The fifth letter of the English alphabet is E.

Theorem

The sixth letter of the English alphabet is F.

Theorem 1 refers to A, Theorem 3 to C, and Theorem 6 to F.

Yes, so it’s the same here. Thus, no need to check the Sandbox. Here’s the source of what I just typed:


+-- {: .num_theorem #a}
###### Theorem

The first letter of the English alphabet is A.
=--

+-- {: .num_theorem}
###### Theorem

The second letter of the English alphabet is B.
=--

+-- {: .num_theorem #c}
###### Theorem

The third letter of the English alphabet is C.
=--

+-- {: .num_theorem}
###### Theorem

The fourth letter of the English alphabet is D.
=--

+-- {: .num_theorem}
###### Theorem

The fifth letter of the English alphabet is E.
=--

+-- {: .num_theorem #f}
###### Theorem

The sixth letter of the English alphabet is F.
=--

Theorem \ref{a} refers to A, Theorem \ref{c} to C, and Theorem \ref{f} to F.

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

The issue of needing ids to keep theorem numbers in step has been raised (again). Mike came up with a suggestion which seems reasonable on first reading which is that instiki (or whichever part of it is appropriate) adds an automatic id if one isn’t present, much in the way that the table of contents part does.

So if I write

+-- {: .num_theorem}
###### Theorem
$X$ is nuclear and Banach if and only if it is finite dimensional
=--

then instiki adds an automatic id, say id="theorem1" where the 1 gets incremented according to the internal counter (which it has, if I remember right, since it uses that to figure out the ref number). Would this be feasible?
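
To make the suggestion concrete, here is a sketch under a simplified model: each theorem block is a hash with an optional :id, and the reference number is found by counting only the blocks that have ids. Both the data model and the function names are assumptions, not Instiki's implementation.

```ruby
# Give every block without an id an automatic one based on its position.
def assign_theorem_ids(blocks)
  blocks.each_with_index.map do |block, index|
    block[:id] ? block : block.merge(id: "theorem#{index + 1}")
  end
end

# Model of a counter that only counts blocks carrying an id.
def ref_number(blocks, id)
  with_ids = blocks.select { |b| b[:id] }
  with_ids.index { |b| b[:id] == id } + 1
end

# Six theorems, of which only the first, third and sixth have explicit ids.
blocks = [{ id: "a" }, {}, { id: "c" }, {}, {}, { id: "f" }]

puts ref_number(blocks, "f")                       # => 3
puts ref_number(assign_theorem_ids(blocks), "f")   # => 6
p assign_theorem_ids(blocks).map { |b| b[:id] }
# => ["a", "theorem2", "c", "theorem4", "theorem5", "f"]
```

With automatic ids in place, the id-only count agrees with a count of every theorem on the page.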

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

That sounds like a request for macro-support in itex. For a variety of reasons, that’s unlikely to happen.

I, for one, would be against this. It would actually hinder collaboration as everyone would have to learn the local conventions every time they wanted to edit a page, and stuff that worked on one wouldn’t work on another.

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

(Hopefully minor) feature request: this came up in a discussion on citing the nLab. It would be convenient to have the current revision number displayed on the page somewhere obvious. Perhaps the footer could read:

Version 144, revised on December 1, 2011 11:24:39 by …

I know it’s easy to deduce - take the number after “Back in time” and add 1 - which is why I said “convenient” rather than anything stronger.

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: nlab

The “save”s were all due to one bot and none actually made it to the database.

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: nlab

Okay, so looking through the week’s log for bots (bot, spider, crawler), I get 33,517 hits (actual time period: 11th December 6:25am to 16th December 11:27am, so that’s an average of a little over 4 hits per minute). These break down as follows:

25690: show
2953: new
1345: history
1290: edit
1029: source
395: cancel_edit
81: files
67: atom_with_headlines
53: recently_revised
15: save
13: atom_with_content

There are a few that I’ve missed out in between - there are clearly some bad links to the nlab.

I’d say that only show should show in that list. source could, but I don’t really see why. The saves are a bit worrying - I’m going to check those!

Next is to analyse how those are distributed.
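
The arithmetic behind those figures, as a quick check. The counts and the time window are the ones quoted above; the constant and method names are just labels for this sketch.

```ruby
# Hits per action, as listed above.
HITS_BY_ACTION = {
  "show" => 25_690, "new" => 2_953, "history" => 1_345, "edit" => 1_290,
  "source" => 1_029, "cancel_edit" => 395, "files" => 81,
  "atom_with_headlines" => 67, "recently_revised" => 53,
  "save" => 15, "atom_with_content" => 13
}.freeze

TOTAL_HITS = 33_517

# 11th December 6:25am to 16th December 11:27am is 5 days, 5 hours, 2 minutes.
WINDOW_MINUTES = 5 * 24 * 60 + 5 * 60 + 2

def hits_per_minute
  (TOTAL_HITS / WINDOW_MINUTES.to_f).round(2)
end

listed = HITS_BY_ACTION.values.sum
puts WINDOW_MINUTES        # => 7502
puts hits_per_minute       # => 4.47
puts listed                # => 32931
puts TOTAL_HITS - listed   # => 586 (hits for actions not in the list)
```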

posted 13 years ago

Andrew Stacey 118 posts

Forum: Instiki – Topic: nlab

Any thoughts on the following idea?

From time to time, the nLab gets a whole host of spiders and other bots crawling all over it. While I understand that they’re part of what makes the internet work, they can be a bit annoying and slow down the server for everyone else. So I thought of channelling requests a little more cleverly than I currently do. At the moment, I use a global queue in passenger which is fine until all the slots get a slow request. So what I thought was to have a semi-global queue with slow requests (like feeds and lists) and bots being handled by a few dedicated processes, normal requests by some others, and maybe a “priority” list as well. Since passenger doesn’t do this itself (it either has global queue or individual queues) I think that what I’d have to do is to have three virtual versions of the nLab, at least as far as apache and passenger are concerned. Then apache would examine the request and classify it according to which type it was and send it to the right version of the nLab. Passenger wouldn’t know that these are the same so would have a global queue for each, and that way requests get segregated and so don’t hold up others in other segments. The way that I’d have three virtual versions is simply with symlinks in the filesystem: “nlab”, “nlabPriority”, and “nlabSlow” would all be symlinks to the same instiki installation.

Can you see any immediate problems with that? As far as Instiki is concerned, it’s just like being run under passenger as there will be multiple instances of instiki running concurrently, which is what already happens. So that shouldn’t be affected. Apache, also, eats this sort of thing for breakfast, and passenger can cope with different programs as well. So I don’t see an immediate flaw.

(Of course, it may be that this won’t solve the blockage, but it’s less drastic than moving servers which is the other option.)
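
The classification step might look something like the sketch below. In the real setup the decision would be made by apache, not Ruby; this only illustrates the logic. The three names are the symlinks mentioned above, the bot pattern uses the same three words I searched the logs for, and the lists of slow and priority actions are assumptions (in particular, nothing above says which requests would count as priority).

```ruby
# Assumed lists of actions; adjust to taste.
SLOW_ACTIONS = %w[list recently_revised atom_with_headlines atom_with_content].freeze
PRIORITY_ACTIONS = %w[save].freeze
BOT_PATTERN = /bot|spider|crawler/i

# Pick which virtual copy of the nLab should handle a request.
def queue_for(action, user_agent)
  return "nlabSlow" if BOT_PATTERN.match?(user_agent.to_s)
  return "nlabSlow" if SLOW_ACTIONS.include?(action)
  return "nlabPriority" if PRIORITY_ACTIONS.include?(action)
  "nlab"
end

puts queue_for("show", "ExampleBot/1.0")   # => nlabSlow
puts queue_for("list", "Mozilla/5.0")      # => nlabSlow
puts queue_for("save", "Mozilla/5.0")      # => nlabPriority
puts queue_for("show", "Mozilla/5.0")      # => nlab
```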