Recent Posts

455 posts found

posted 12 years ago
Andrew Stacey 118 posts

Forum: Heterotic Beast – Topic: Bugs

Deleting the first post in a topic leads to a “page does not exist” error.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Instiki – Topic: nlab

Bother. I can’t count. I saw that the last log file was numbered 23 and assumed that that meant I hadn’t missed any log files. Sadly not. So my logs aren’t complete. Ah well. Incidentally, why name them numerically? Why not name them by a date-and-time stamp? Then they wouldn’t all have to be renamed when the next one is created.
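A minimal sketch of the date-and-time naming scheme suggested here, written in Python rather than Instiki's Ruby. The function name `rotate_if_large` and the timestamp format are hypothetical. The idea is that a full log is renamed once, to a name that never changes again, instead of every older file being shifted up by one.

```python
import os
import time


def rotate_if_large(path, max_bytes, now=None):
    """Rename `path` to `path.YYYYMMDD-HHMMSS` once it grows past `max_bytes`.

    Previously rotated files are never touched, so nothing gets renumbered.
    Returns the new file name, or None if no rotation was needed.
    Note: two rotations within the same second would collide (hypothetical sketch).
    """
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return None
    # time.localtime(None) uses the current time; `now` lets callers fix it.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    rotated = f"{path}.{stamp}"
    os.rename(path, rotated)
    return rotated
```

With this scheme, a directory listing sorted by name is also sorted by date, and a backup script can copy only the files it has not seen before.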

Thinking about Recently Revised and All Pages, you suggested (somewhere) taking them out of the sweeper as a way of stopping them being regenerated every time a page is edited (I don’t know if this was one of your “If you’re going to do something crazy, here’s a way of limiting how crazy you’re going to be” suggestions or if you thought this was actually a good idea). Then I’d have to manually regenerate them every, say, hour by deleting the cached copy so that the next hit recreated it. It occurred to me that the same cron job that deleted the cache could also hit the page in the server to force the regeneration. It then occurred to me that it was silly having that go via the webserver when it was on the same machine as the program.

With my current knowledge of instiki, what I thought of for getting round this was to have an instiki process running invoked “from the command line” and listening on, say, port 2500. I block that port from all outside traffic and use it only for localhost. Then the cron job hits that port, avoiding the webserver. The alternative would be to have it so that I could call instiki nlab/recentlyrevised on the commandline, whereupon instiki fires up, renders the page, and disappears into the ether again.
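The cron-job half of this plan could look roughly like the following Python sketch: delete the cached copy, then request the page once from the local-only instiki process so that it is regenerated before any visitor asks for it. The cache-file path, the URL, and the function name are hypothetical; port 2500 is the example port from the post. The `fetch` parameter exists only so the network call can be swapped out in testing.

```python
import os
import urllib.request


def refresh_cached_page(cache_file, url, fetch=urllib.request.urlopen):
    """Invalidate a cached page, then hit it once so the server rebuilds it.

    Returns the HTTP status of the regenerating request.
    """
    try:
        os.remove(cache_file)
    except FileNotFoundError:
        pass  # already gone: nothing to invalidate
    # The instiki process is assumed to listen on localhost only.
    with fetch(url) as response:
        return response.status
```

Run hourly from cron, a call such as `refresh_cached_page("/path/to/cache/recentlyrevised.html", "http://localhost:2500/nlab/recentlyrevised")` (both arguments hypothetical) would keep Recently Revised at most an hour stale without sending the regeneration through the public webserver.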

 
posted 12 years ago
distler 123 posts

Forum: Instiki – Topic: nlab

Did a lot of profiling this weekend, and produced a few tweaks to Maruku’s parsing, which speeded it up a little.

Unfortunately, the main discovery was that (with that test page as input), 3/4 of Maruku’s time is spent in the #to_html output method; only 1/4 is spent in parsing the original input. Thus, my efforts, which maybe improved the parsing speed by 5%, contributed at best a 1% speedup in the total Instiki processing time, i.e., something you would never notice.
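The arithmetic here is Amdahl's-law bookkeeping: speeding up one phase only saves that phase's share of the total. A minimal Python sketch, assuming “improved the parsing speed by 5%” means 5% less parsing time, and taking parsing as 1/4 of Maruku's time as reported above:

```python
def overall_saving(fraction, local_saving):
    """Share of total run time saved when a phase that takes `fraction`
    of the total has its own time cut by `local_saving`."""
    return fraction * local_saving


# Parsing is ~1/4 of Maruku's time; parsing time cut by ~5%.
saving = overall_saving(0.25, 0.05)
print(f"{saving:.2%} of Maruku's time saved")
```

This prints 1.25%, and that is 1.25% of Maruku's time only. Since Maruku is not the whole of Instiki's processing, the saving in total time is smaller still, which is consistent with the “at best 1%” estimate.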

I hope that one of your guys finds formal grammars sufficiently “categorical” to be worthy of a small bit of their attention.

 
posted 12 years ago
distler 123 posts

Forum: Instiki – Topic: nlab

Pretty much everything beyond the standard Markdown syntax needs to be written, though some folks in the peg-markdown network seem to have included Michel Fortin’s Markdown-Extra extensions (albeit, along with some of their own, incompatible, extensions).

Presumably, this fork of peg-markdown would be directly linked to the itex2MML functions which process inline and display equations (i.e., it would not use the Ruby bindings provided by the itextomml gem). So there’s a little bit more to do than write the peg grammar. But not a lot more …

As to whether you want to ask someone to do this, I’ve already explained my desire to replace Maruku (for licensing reasons). Here’s another motivation, from efficiency. Instiki’s performance does genuinely suck, in this instance.

The question is, are they (your nlab colleagues) willing to do anything about improving it?

Feel free to point them to this discussion, and to the previous one on Markdown alternatives.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Heterotic Beast – Topic: Bugs

May I suggest “Other users online: XYZ”.


Okay, that’s odd. I suggested the above having seen the “Users online: distler” message, and not seeing my own name. Now I click back to the forums list and I am listed there this time. So either it got it wrong first time, or you’re playing with the code and I should shut up and let you get on with it.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Instiki – Topic: nlab

Now that I see the break-down, I agree with you that it’s a waste of time optimising the wikilinks for now. That’s an astonishing amount of time for maruku to take! I wonder how much of that is itex; I guess I can test that for myself by running maruku on a few files on my machine here, with and without itex.

I could ask on the nForum about volunteers for writing a PEG grammar. That sounds like a good, specific task that someone might just be willing to do. Shall I ask?

Do you have a clear idea of which bits of maruku’s syntax are missing, or would the task involve determining that?

 
posted 12 years ago
distler 123 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

On golem (the iMac on my desk), rendering a copy of that page takes

Completed in 27588ms (View: 27356, DB: 192)

Note that it does spend (what I consider to be) a significant amount of time querying the database, but it is totally dwarfed by the rendering time. (I don’t know why yours is spending an order of magnitude longer querying the database. Seems like something’s very wrong, there, even though the conclusion is the same.)

The page, of course, contains a number of markup errors, like

... [[transversal map]s ...

which sent Maruku into convulsions. Surprisingly, correcting those errors did not appreciably affect the rendering time (the above-reported time is after making the relevant markup corrections).

On my laptop, a typical time was

Completed in 48775ms (View: 48635, DB: 77)

SQLite3 is faster than MySQL, but the machine itself is significantly slower than the iMac.

Of those 49 seconds, spent rendering the page, 43 of them were spent in Maruku (for obscure reasons, Maruku has to be run twice, so really, we’re talking about 22 seconds to process the 175KB source).

Maruku doesn’t particularly care about the number of WikiLinks, so that has nothing to do with why it takes so long to render this particular page.

Of the remaining 6 seconds, 4 seconds were spent in the Instiki Sanitizer. I don’t think there’s much to optimize there.

The remaining 2 seconds were, largely, spent in the Chunk-Handler – the thing that processes Wikilinks (and, presumably, cares about how many of them there are). 2 seconds is still a long time, but it’s not surprising. Doing on the order of 10³ RegExp substitutions (5360 chunk-masking and 686 chunk-unmasking operations, to be precise) on a 175KB string takes significant time. Using Regexps to process long strings sucks.
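For readers unfamiliar with chunk masking, here is a toy Python version of the idea: each wikilink is swapped for a numbered placeholder so that the Markdown processor never sees it, and the placeholders are swapped back afterwards. This is a hypothetical sketch, not Instiki's Chunk-Handler. Unlike the thousands of separate substitutions counted above, it does each direction in a single `re.sub` pass with a callback, which is one way to avoid rescanning a long string over and over.

```python
import re

# Hypothetical formats: a wikilink is [[...]] with no nested brackets, and the
# placeholder uses NUL characters, which are assumed not to occur in page source.
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")
PLACEHOLDER = re.compile(r"\x00chunk(\d+)\x00")


def mask(text):
    """Replace every [[wikilink]] with a numbered placeholder, in one pass."""
    chunks = []

    def stash(match):
        chunks.append(match.group(0))
        return f"\x00chunk{len(chunks) - 1}\x00"

    return WIKILINK.sub(stash, text), chunks


def unmask(text, chunks, render=lambda link: link):
    """Put the (optionally rendered) chunks back, again in a single pass."""
    return PLACEHOLDER.sub(lambda m: render(chunks[int(m.group(1))]), text)
```

In a real pipeline, `render` would turn each stashed link into HTML after the Markdown pass has run on the masked text.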

I have looked at various optimizations of the Chunk-Handler code, but nothing I can do will contribute much to the speedup of rendering this page, which is dominated by Maruku.

Now, if one of your nLab folks were to volunteer to write a PEG grammar for Maruku’s extended Markdown syntax, …

Update:

Since I’m not gonna hold my breath for that to happen, I decided to spend some time (alas, more than I expected) making Maruku faster. The new rendering times for that page are

Completed in 13228ms (View: 13024, DB: 198)

on golem and

Completed in 21666ms (View: 20979, DB: 83)

on my laptop. Roughly a factor of 2 improvement in the total rendering time, in both cases.
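As a quick check of the “factor of 2”, here are the before/after figures from the log lines quoted in this post, divided out in Python:

```python
# (total ms, view ms) before and after the Maruku changes, as quoted above.
timings = {
    "golem": {"before": (27588, 27356), "after": (13228, 13024)},
    "laptop": {"before": (48775, 48635), "after": (21666, 20979)},
}


def speedups(t):
    """Return (total speedup, view speedup) as before/after ratios."""
    (total_before, view_before) = t["before"]
    (total_after, view_after) = t["after"]
    return total_before / total_after, view_before / view_after


for machine, t in timings.items():
    total, view = speedups(t)
    print(f"{machine}: total x{total:.2f}, view x{view:.2f}")
```

The total-time ratios come out at roughly 2.09 on golem and 2.25 on the laptop, so “roughly a factor of 2” is, if anything, slightly conservative.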

Still not great, but it’s the best that I am going to achieve.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Heterotic Beast – Topic: Bugs

Vanilla stores some information in the user database, including the last comment in each discussion that you read.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Heterotic Beast – Topic: Bugs

Incidentally, if you want to get a feel for what Vanilla looks like but don’t want to sign on to the nForum, I have a test forum set up: http://www.math.ntnu.no/~stacey/Mathforge/Test/. I can easily add any plugins from the nForum that you might want to play with.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Instiki – Topic: nlab

The “print” view wasn’t much faster: 71303ms.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Instiki – Topic: nlab

Okay, let’s take http://ncatlab.org/nlab/show/smooth%20infinity-groupoid%20--%20structures which, according to grep, has on the order of 409 wikilinks. (Actually, it has 409 hits for the string [[.) On first load, it was already cached and produced the following figures: 169ms first time, 62ms second. I got 0.026s for “time to first byte” and 0.663s for “from click to complete display”.
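Since a raw count of [[ also picks up broken markup such as [[transversal map]s, a slightly better estimate counts only well-formed links. A minimal Python sketch, assuming a wikilink is [[...]] on a single line with no nested brackets (a simplification of the real wikilink syntax):

```python
import re

# Hypothetical simplification: [[...]] on one line, no brackets inside.
WELL_FORMED = re.compile(r"\[\[[^\[\]\n]+\]\]")


def count_links(source):
    """Return (raw '[[' hits, well-formed [[...]] links) for a page source."""
    return source.count("[["), len(WELL_FORMED.findall(source))
```

For example, `count_links("a [[b]] c [[d e]] f [[transversal map]s g")` gives `(3, 2)`; the gap between the two numbers points at malformed links worth fixing.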

Now I delete it from the cache, and try again. A cup of tea later, and I get the following: 74074ms (View: 72773, DB: 1276). On the receiving end, I get 82s and 87s for the delivery times. Second time, similar figures.

The time this gets a bit annoying is when editing a page, since then it has to regenerate it each time. That’s a fair wait if you’ve only changed a couple of spelling mistakes.

 
posted 12 years ago
distler 123 posts

edited 12 years ago

Forum: Heterotic Beast – Topic: Bugs

It would be more sensible if it took me to the point where I last read up to (which I presume it knows).

How does Vanilla keep track of what posts you’ve read?

 
posted 12 years ago
distler 123 posts

Forum: Instiki – Topic: nlab

One thing that I am pretty sure that slows down a page load is if the page has a lot of wikilinks on it. I don’t know how it checks all the links, but is there some way that that could be speeded up?

Could you compare (by deleting the cached page and reloading) how long it takes to build a wikilink-heavy page, versus a “normal” one?

I’m particularly interested in the database-lookup times. As I said via email, the WikiReferences model uses a lot of raw SQL queries (which, therefore, do not benefit from ActiveRecord caching). But I am a little skeptical that this is the cause of much of a slowdown.

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Heterotic Beast – Topic: Bugs

When I click on a discussion/whatever then it takes me to the top of the page. It would be more sensible if it took me to the point where I last read up to (which I presume it knows).

 
posted 12 years ago
Andrew Stacey 118 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

Right, so on the crashing then I’m just watching the new system and waiting to see what it does and how it responds. I’ll keep a hold of all the logs for statistical purposes (not that I’ve any real idea about statistics …).

One thing that I am pretty sure that slows down a page load is if the page has a lot of wikilinks on it. I don’t know how it checks all the links, but is there some way that that could be speeded up? Urs has some pages with loads and loads of links, and there’s talk of having some pages where everything possible is linked (with CSS to lessen the visual impact).

 
posted 12 years ago
Andrew Stacey 118 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

Yes, I was thinking that 1MB was a bit low. When the googlebot hit last night then it was creating a new log file every 20 minutes or so!

 
posted 12 years ago
distler 123 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

It keeps 25 files, of size 1MB each.

Both of these numbers are configurable.
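For comparison, the same “25 files of 1MB” policy expressed with Python's standard `logging.handlers.RotatingFileHandler`. This is an analogue for illustration, not what Instiki uses; the logger name is hypothetical. Like the scheme described here, it renames older files on every rollover (`.1` becomes `.2`, and so on).

```python
import logging
import logging.handlers

MAX_BYTES = 1024 * 1024  # 1MB per log file
BACKUP_COUNT = 25        # rotated files kept before the oldest is discarded


def make_logger(path):
    """Return a logger that rotates `path` by size, plus its handler."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=MAX_BYTES, backupCount=BACKUP_COUNT,
        delay=True)  # open the file lazily, on the first record
    logger = logging.getLogger("rotation_demo")  # hypothetical name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger, handler
```

A fixed file count bounds disk use, not time coverage: at the rate reported during the googlebot visit (a new file roughly every 20 minutes), 25 files hold only about 25 × 20 = 500 minutes, a little over 8 hours, of history.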

 
posted 12 years ago
Andrew Stacey 118 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

It’s even possible that it didn’t crash this morning. It was very slow loading the main page so I restarted it. I couldn’t see from the logs anything special, though.

I’m not inured to instiki crashing! I would love to get to the bottom of it, but it’s hard to know what to monitor, and how to monitor it (especially as it may have been my monitoring procedures that contributed to its crashing).

As you say, let’s see how long it can last.

By the way, how many log files does it keep? Does it just keep renumbering them each time it rotates them? If so, I’d better move them out of the log directory each day.

 
posted 12 years ago
distler 123 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

I wouldn’t count this morning’s crash as anything special.

You seem to be inured to the idea of Instiki crashing. I am not. It shouldn’t crash, and there’s something wrong if it does. I’m not even convinced that my PassengerPoolIdleTime theory explains the phenomenon. Let’s see if it can go a week without hiccup.

 
posted 12 years ago
Andrew Stacey 118 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

I’ve set that one that kills off processes after 20 requests (PassengerMaxRequests), and I’ve set it up to allow 10 concurrent processes (PassengerMaxPoolSize).

I wouldn’t count this morning’s crash as anything special. More likely just teething problems.

 
posted 12 years ago
admin 63 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

atom_with_content renders the last 15 updated pages. So it should take approximately 15 times as long as it takes to render 1 page.

Incidentally, the nLab had crashed when I came in this morning…

Ouch. OK. Now I am puzzled.

Is there a process-limit that gets exceeded on your VPS? (Seems unlikely: if there were too many active processes, you wouldn’t be able to log in.) Could it be that Passenger is killing off a long-running response because it thinks Instiki is “inactive”? If so, try setting PassengerPoolIdleTime to 0.

The default (300) clearly won’t work for you if

  1. A list request (say) takes more than 300s to complete.
  2. That’s the only request received during the 5-minute period in question. You receive, on average, 3 requests/minute, so it’s not inconceivable that you go for long stretches with no requests at all.
 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Heterotic Beast – Topic: Bugs

Is the colour of the icon next to the forum title meant to tell me if there’s new stuff there? If so, it’s not always in sync.

 
posted 12 years ago
Andrew Stacey 118 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

I was blocking them at the Apache level. That seemed to be what you were complaining about! Maybe I misunderstood.

I can understand list being so big, but atom_with_content is limited to the last … whatever … revisions, so that shouldn’t be so big, should it?

(I think that to get a better picture from the logs then I need to figure out a way to distinguish those that got cached from those that needed a serious operation. But as I’m back on “vanilla” instiki, my hacks to allow this have been taken out.)

Incidentally, the nLab had crashed when I came in this morning, but it’s been running fairly stably all day. So that might just have been first night jitters (or the rampaging googlebot that came through at about midnight).

 
posted 12 years ago
admin 63 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

But it didn’t crash, or become otherwise unresponsive. Which is better than what you were doing before your “clean” install.

Some actions, like list, are O(N). With a Wiki your size, they will take a long time to complete¹. If you want to prevent people from calling the list action (or other O(N) actions), I suggest blocking/redirecting it at the Apache level.

I assume you know how to use mod_rewrite to good effect.

But it appears that the nLab operates acceptably, even when you allow such actions.


  1. Incorporating will_paginate (which is used by Heterotic Beast, for just this reason) in those actions will make them O(1), again.
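The effect of pagination can be sketched in Python with the standard `sqlite3` module: instead of loading every page name for the list action, fetch one fixed-size page of results per request. This is an analogue of what will_paginate does, not its API; the `pages` table, `name` column, and page size are hypothetical.

```python
import sqlite3


def page_of_pages(conn, page, per_page=50):
    """Return the names on 1-based result page `page`.

    The rows fetched and rendered per request are capped at `per_page`,
    however large the wiki gets. Table and column names are hypothetical.
    """
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT name FROM pages ORDER BY name LIMIT ? OFFSET ?",
        (per_page, offset))
    return [row[0] for row in rows]
```

One caveat on the O(1) claim: the database may still step over the skipped rows to honour OFFSET, but the expensive part, rendering, is bounded by the page size.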

 
posted 12 years ago
Andrew Stacey 118 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

Okay, so after a day’s trading at the nLab, here are the closing prices. This is with nothing disabled.

atom_with_content: 192 requests, 176 taking less than 10s to process, 16 taking 50s to process.

list: 6 requests, 2 taking less than 10s to process, 4 taking 5 minutes to process.

Total number of requests: 4223, breakdown of time taken to process (rounded down to the nearest 10s):

  0-9s: 4142
  10-19s: 47
  20-29s: 8
  30-39s: 3
  40-49s: 1
  50-59s: 17
  60-69s: 1
  70-79s: 1
  300-309s: 1
  310-319s: 2
  320-329s: 1

So atom_with_content and list account for a major part of the requests that take longer than 30s to deliver.
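A breakdown like the one above can be produced from the production log with a few lines of Python. This sketch assumes each request writes a line of the form “Completed in NNNms …”, as in the timings quoted elsewhere in this thread; it groups requests into 10-second buckets, where bucket k holds requests that took between 10k and 10k+10 seconds.

```python
import re
from collections import Counter

# Assumed log format, matching lines like
# "Completed in 27588ms (View: 27356, DB: 192)".
COMPLETED = re.compile(r"Completed in (\d+)ms")


def time_histogram(lines):
    """Count requests per 10-second bucket (bucket k = 10k to 10k+10 s)."""
    buckets = Counter()
    for line in lines:
        match = COMPLETED.search(line)
        if match:
            buckets[int(match.group(1)) // 10_000] += 1
    return buckets
```

Typical use would be `time_histogram(open("log/production.log"))`; attributing the slow buckets to specific actions would additionally require pairing each “Completed” line with the request line that precedes it.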

 
posted 12 years ago
distler 123 posts

Forum: Instiki – Topic: Feature Requests

As far as I can tell, updating the application files on a running Rails application (in production mode) has no effect, until the application is restarted. I, honestly, haven’t thought about the bundled Gems, but I expect the answer is the same.

In any case, ruby bundle is pretty fast (of course, that’s because it’s actually superfluous) if the Gemfile hasn’t changed. I suppose you can stat the Gemfile to see whether you actually need to run ruby bundle at all.
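One way to “stat the Gemfile” is to compare its modification time against a marker file touched after each successful bundle run. A minimal Python sketch; the `.last_bundle` marker and both function names are hypothetical, and it deliberately does not rely on anything Bundler itself writes.

```python
import os

STAMP = ".last_bundle"  # hypothetical marker, touched after each successful `ruby bundle`


def bundle_needed(app_dir):
    """True if there is no record of a bundle run, or the Gemfile is newer than it."""
    gemfile = os.path.join(app_dir, "Gemfile")
    stamp = os.path.join(app_dir, STAMP)
    if not os.path.exists(stamp):
        return True
    return os.stat(gemfile).st_mtime > os.stat(stamp).st_mtime


def mark_bundled(app_dir):
    """Record that `ruby bundle` just ran (call only after it succeeds)."""
    stamp = os.path.join(app_dir, STAMP)
    with open(stamp, "a"):
        pass  # create the file if it is missing
    os.utime(stamp, None)  # set its mtime to now
```

An update script would then run `ruby bundle` only when `bundle_needed(...)` is true, and call `mark_bundled(...)` afterwards.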

But I don’t think that’s your issue…

 
posted 12 years ago
Andrew Stacey 118 posts

Forum: Instiki – Topic: Feature Requests

Okay. So simpler just to have separate copies of the code.

I guess what I’m stumbling around with is some way to make the update cycle a little simpler. At the moment, I update a live instiki installation from golem, and with the ruby bundle step as well, the update can take a noticeable length of time. It would be better to pull the changes to an offline copy on that machine, then update the live versions from the offline copy.

I can easily do that with the bzr stuff, but how would it work with the gems? Perhaps if the live versions weren’t bzr repositories but were more simply synced with the offline version in some fashion (thinking a bit like how rsync works). Hmm, so the workflow would be:

  1. Pull the latest instiki revisions to the “dead” code
  2. Run ruby bundle on that
  3. Copy any updated files from the “dead” version to the live versions, being sure not to clobber anything special
  4. Restart instiki on the live versions

would that work, do you think?

Or is this another case of me thinking something is important which really isn’t?
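Step 3 of that workflow (copy updated files from the “dead” copy to a live one without clobbering anything special) could be sketched in Python as follows. The protected set contains only config/database.yml, the per-instance file mentioned in this thread; a real setup would probably need to add more. The function name is hypothetical.

```python
import filecmp
import os
import shutil

# Relative paths that belong to a live instance and must never be overwritten.
PROTECTED = {"config/database.yml"}


def sync_tree(staging, live, protected=PROTECTED):
    """Copy new or changed files from `staging` into `live`.

    Protected paths and .bzr metadata are skipped, and nothing in `live`
    is ever deleted. Returns the relative paths that were copied.
    """
    copied = []
    for root, dirs, files in os.walk(staging):
        dirs[:] = [d for d in dirs if d != ".bzr"]  # don't copy VCS metadata
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, staging)
            if rel.replace(os.sep, "/") in protected:
                continue
            dst = os.path.join(live, rel)
            if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
                continue  # unchanged: leave it alone
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied
```

Because it never deletes, files removed upstream would linger in the live copies; that is the conservative choice for a sketch, and the same trade-off rsync exposes through its optional delete behaviour.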

 
posted 12 years ago
Andrew Stacey 118 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

I have a script that exports the day’s revisions to a set of bzr repositories (this is to make it easier for people to keep incremental backups of their webs). I haven’t reinstalled it as a cron job yet, and again I can run it manually.

Root has its usual cron jobs (at least, it does now: the installed image didn’t run them itself but left them to anacron, which is fine except that on a server anacron isn’t running; cron only checks whether anacron exists, not whether it is running). But that’s all.

 
posted 12 years ago
distler 123 posts

edited 12 years ago

Forum: Instiki – Topic: Feature Requests

Given that instiki can be installed as a gem …

Instiki cannot be installed as a Gem.

There’s an old (~0.10.x) version, which worked as a Gem, and which is probably still floating around (on the internets, nothing ever really disappears). But that was long before my time, and I have not even thought about packaging the current version as a Gem.

Of course, under Passenger, you can run multiple instances of a Rails application (including Instiki), under different subdirectories (or subdomains, if you have virtual hosts enabled).

(At least with Instiki, that would require separate copies of the code, as each instance would have to point to its own database (in config/database.yml). I suppose one could use soft-links astutely, so that there was really only one copy of the source code, shared by these different instances.)
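The soft-link idea might look like this in Python: each instance directory links back to one shared copy of the source, except that it gets its own config directory (whose entries are linked individually, minus database.yml) and its own writable directories. Which directories count as per-instance (`db`, `log`, `tmp` here) is an assumption, as are the function names.

```python
import os

# Hypothetical: writable state that each instance keeps to itself.
PER_INSTANCE = {"db", "log", "tmp"}


def link_entries(src_dir, dst_dir, skip):
    """Symlink every entry of `src_dir` into `dst_dir`, except names in `skip`."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if name not in skip:
            target = os.path.abspath(os.path.join(src_dir, name))
            os.symlink(target, os.path.join(dst_dir, name))


def make_instance(shared_src, instance_dir):
    """Share code with `shared_src` via symlinks; leave config/database.yml
    to be written separately for this instance."""
    link_entries(shared_src, instance_dir, skip={"config"} | PER_INSTANCE)
    link_entries(os.path.join(shared_src, "config"),
                 os.path.join(instance_dir, "config"), skip={"database.yml"})
    for name in PER_INSTANCE:
        os.makedirs(os.path.join(instance_dir, name), exist_ok=True)
```

After `make_instance(...)`, each instance still needs its own config/database.yml pointing at its own database, which is exactly the part that cannot be shared.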

 
posted 12 years ago
distler 123 posts

edited 12 years ago

Forum: Instiki – Topic: nlab

Let’s keep it “off” for the time-being; I don’t have any plans to change anything for the next week, at least.

Do you have any other scripts/cron-jobs running?