nlab
Andrew Stacey
118 posts
edited 13 years ago |
I’ve now upgraded the nLab server and reinstalled everything. In particular, I pulled a fresh copy of instiki from the repository and made only two changes. Notably, I’m not mucking about with the logs just yet. We’ll see how this goes for now. Now, should I enable the daily check on the instiki source code, or shall I keep that off for the time being and update manually? |
distler
Moderator
123 posts
edited 13 years ago |
Let’s keep it “off” for the time-being; I don’t have any plans to change anything for the next week, at least. Do you have any other scripts/cron-jobs running? |
Andrew Stacey
118 posts
edited 13 years ago |
I have a script that exports the day’s revisions to a set of bzr repositories (this is to make it easier for people to keep incremental backups of their webs). I haven’t reinstalled it as a cron job yet, and again I can run it manually. Root has its usual cron jobs (at least, it does now: in the installed image then it doesn’t run them but lets anacron do it, which is fine except that on a server, anacron isn’t running; however, cron just checks for the existence of anacron, not whether or not it is running). But that’s all. |
Andrew Stacey
118 posts
edited 13 years ago |
Okay, so after a day’s trading at the nLab, here are the closing prices. This is with nothing disabled.
Total number of requests: 4223. Breakdown of time taken to process (rounded down to the nearest 10s): …
So those … |
admin
Administrator
63 posts
edited 13 years ago |
But it didn’t crash, or become otherwise unresponsive. Which is better than what you were doing before your “clean” install. Some actions, like … (I assume you know how to use …), take longer. But it appears that the nLab operates acceptably, even when you allow such actions.
|
Andrew Stacey
118 posts
edited 13 years ago |
I was blocking them at the Apache level. That seemed to be what you were complaining about! Maybe I misunderstood; I can understand … . (I think that to get a better picture from the logs, I need to figure out a way to distinguish those requests that were served from the cache from those that needed a serious operation. But as I’m back on “vanilla” instiki, my hacks to allow this have been taken out.) Incidentally, the nLab had crashed when I came in this morning, but it’s been running fairly stably all day. So that might just have been first-night jitters (or the rampaging googlebot that came through at about midnight). |
admin
Administrator
63 posts
edited 13 years ago |
Ouch. OK. Now I am puzzled. Is there a process-limit that gets exceeded on your VPS? (Seems unlikely: if there were too many active processes, you wouldn’t be able to log in.) Could it be that Passenger is killing off a long-running response, because it thinks Instiki is “inactive”? If so, try raising the relevant setting; the default (300) clearly won’t work for you if …
|
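For reference, a minimal sketch of the kind of setting being alluded to here, on the assumption (mine, not stated above) that the directive in question is Phusion Passenger’s PassengerPoolIdleTime, whose default is 300 seconds; the value shown is only illustrative:

```apache
# Hypothetical Apache/Passenger fragment.
# PassengerPoolIdleTime controls how long an application process may sit
# idle before Passenger shuts it down; 0 keeps the processes alive
# indefinitely, so a quiet spell doesn't force a slow respawn later.
PassengerPoolIdleTime 0
```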
Andrew Stacey
118 posts
edited 13 years ago |
I’ve set the one that kills off processes after 20 requests (PassengerMaxRequests), and I’ve set it up to allow 10 concurrent processes (PassengerMaxPoolSize). I wouldn’t count this morning’s crash as anything special. More likely just teething problems. |
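For concreteness, this is roughly how those two settings look in the Apache configuration; the directives are the ones named above and the values are the ones quoted in this post:

```apache
# Recycle each Instiki application process after it has served 20 requests.
PassengerMaxRequests 20
# Allow at most 10 application processes to run concurrently.
PassengerMaxPoolSize 10
```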
distler
Moderator
123 posts
edited 13 years ago |
You seem to be inured to the idea of Instiki crashing. I am not. It shouldn’t crash, and there’s something wrong if it does. I’m not even convinced that my … |
Andrew Stacey
118 posts
edited 13 years ago |
It’s even possible that it didn’t crash this morning. It was very slow loading the main page, so I restarted it. I couldn’t see anything special in the logs, though. I’m not inured to instiki crashing! I would love to get to the bottom of it, but it’s hard to know what to monitor, and how to monitor it (especially as it may have been my monitoring procedures that contributed to its crashing). As you say, let’s see how long it can last. By the way, how many log files does it keep? Does it just keep renumbering them each time it rotates them? If so, I’d better move them out of the log directory each day. |
distler
Moderator
123 posts
edited 13 years ago |
It keeps 25 files, of size 1MB each. Both of these numbers are configurable. |
Andrew Stacey
118 posts
edited 13 years ago |
Yes, I was thinking that 1MB was a bit low. When the googlebot hit last night, it was creating a new log file every 20 minutes or so! |
Andrew Stacey
118 posts
edited 13 years ago |
Right, so on the crashing front, I’m just watching the new system and waiting to see what it does and how it responds. I’ll keep hold of all the logs for statistical purposes (not that I’ve any real idea about statistics …). One thing that I’m pretty sure slows down a page load is the page having a lot of wikilinks on it. I don’t know how it checks all the links, but is there some way that could be speeded up? Urs has some pages with loads and loads of links, and there’s talk of having some pages where everything possible is linked (with CSS to lessen the visual impact). |
distler Moderator 123 posts |
Could you compare (by deleting the cached page and reloading) how long it takes to build a wikilink-heavy page, versus a “normal” one? I’m particularly interested in the database-lookup times. As I said via email, the WikiReferences model uses a lot of raw SQL queries (which, therefore, do not benefit from ActiveRecord caching). But I am a little skeptical that that is the cause of much of a slowdown. |
Andrew Stacey 118 posts |
Okay, let’s take http://ncatlab.org/nlab/show/smooth%20infinity-groupoid%20--%20structures which, according to grep, has of the order of 409 wikilinks (actually, it has 409 hits for the string …). Now I delete it from the cache, and try again. A cup-of-tea later, I get the following: 74074ms (View: 72773, DB: 1276). On the receiving end, I get 82s and 87s for the delivery times. Second time, similar figures. The time this gets a bit annoying is when editing a page, since then it has to regenerate it each time. That’s a fair wait if you’ve only changed a couple of spelling mistakes. |
Andrew Stacey 118 posts |
The “print” view wasn’t much faster: 71303ms. |
distler
Moderator
123 posts
edited 13 years ago |
On my iMac, I get: … .

Note that it does spend (what I consider to be) a significant amount of time querying the database, but it is totally dwarfed by the rendering time. (I don’t know why yours is spending an order of magnitude longer querying the database. Seems like something’s very wrong there, even though the conclusion is the same.) The page, of course, contains a number of markup errors, like …, which sent Maruku into convulsions. Surprisingly, correcting those errors did not appreciably affect the rendering time (the above-reported time is after making the relevant markup corrections). On my laptop, a typical time was … .

SQLite3 is faster than MySQL, but the machine itself is significantly slower than the iMac. Of those 49 seconds spent rendering the page, 43 were spent in Maruku (for obscure reasons, Maruku has to be run twice, so really we’re talking about 22 seconds to process the 175KB source). Maruku doesn’t particularly care about the number of WikiLinks, so that has nothing to do with why it takes so long to render this particular page. Of the remaining 6 seconds, 4 were spent in the Instiki Sanitizer. I don’t think there’s much to optimize there. The remaining 2 seconds were, largely, spent in the Chunk-Handler – the thing that processes Wikilinks (and, presumably, cares about how many of them there are). 2 seconds is still a long time, but it’s not surprising: doing on the order of 6,000 RegExp substitutions (5360 chunk-masking and 686 chunk-unmasking operations, to be precise) on a 175KB string takes significant time. Using Regexps to process long strings sucks. I have looked at various optimizations of the Chunk-Handler code, but nothing I can do will contribute much to the speedup of rendering this page, which is dominated by Maruku. Now, if one of your nLab folks were to volunteer to write a PEG grammar for Maruku’s extended Markdown syntax, …

Update: Since I’m not gonna hold my breath for that to happen, I decided to spend some time (alas, more than I expected) making Maruku faster. The new rendering times for that page are … on the iMac and … on my laptop. Roughly a factor of 2 improvement in the total rendering time, in both cases. Still not great, but it’s the best that I am going to achieve. |
Andrew Stacey 118 posts |
Now that I see the break-down, I agree with you that it’s a waste of time optimising the wikilinks for now. That’s an astonishing amount of time for maruku to take! I wonder how much of that is itex; I guess I can test that for myself by running maruku on a few files on my machine here, with and without itex. I could ask on the nForum about volunteers for writing a PEG grammar. That sounds like a good, specific task that someone might just be willing to do. Shall I ask? Do you have a clear idea of which bits of maruku’s syntax are missing, or would the task involve determining that? |
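A rough sketch of the sort of test being proposed, using the maruku gem’s `Maruku.new(...).to_html` API; the file name is a placeholder, and the math-related option names are taken from Maruku’s documentation from memory, so they may not match Instiki’s bundled copy exactly:

```ruby
# Crude timing of Maruku on a saved nLab page source, with and without
# itex math processing.
require 'rubygems'
require 'benchmark'
require 'maruku'

source = File.read('smooth-infinity-groupoid.md') # hypothetical local copy

Benchmark.bm(18) do |bm|
  bm.report('math disabled') do
    MaRuKu::Globals[:math_enabled] = false        # assumed option name
    Maruku.new(source).to_html
  end
  bm.report('itex2mml math') do
    MaRuKu::Globals[:math_enabled] = true         # assumed option names
    MaRuKu::Globals[:html_math_engine] = 'itex2mml'
    Maruku.new(source).to_html
  end
end
```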
distler Moderator 123 posts |
Pretty much everything beyond the standard Markdown syntax needs to be written, though some folks in the peg-markdown network seem to have included Michel Fortin’s Markdown-Extra extensions (albeit, along with some of their own, incompatible, extensions). Presumably, this fork of peg-markdown would be directly linked to the itex2MML functions which process inline and display equations (i.e., it would not use the Ruby bindings provided by the itextomml gem). So there’s a little bit more to do than write the PEG grammar. But not a lot more … As to whether you want to ask someone to do this, I’ve already explained my desire to replace Maruku (for licensing reasons). Here’s another motivation, from efficiency. Instiki’s performance does genuinely suck, in this instance. The question is, are they (your nLab colleagues) willing to do anything about improving it? Feel free to point them to this discussion, and to the previous one on Markdown alternatives. |
distler Moderator 123 posts |
Did a lot of profiling this weekend, and produced a few tweaks to Maruku’s parsing, which speeded it up a little. Unfortunately, the main discovery was that (with that test page as input) 3/4 of Maruku’s time is spent in the … . I hope that one of your guys finds formal grammars sufficiently “categorical” to be worthy of a small bit of their attention. |
Andrew Stacey 118 posts |
Bother. I can’t count. I saw that the last log file was numbered 23 and assumed that that meant I hadn’t missed any log files. Sadly not. So my logs aren’t complete. Ah well. Incidentally, why name them numerically? Why not name them by a date-and-time stamp? Then they wouldn’t have to all be renamed when the next one is created.

Thinking about Recently Revised and All Pages: you suggested (somewhere) taking them out of the sweeper as a way of stopping them being regenerated every time a page is edited (I don’t know if this was one of your “If you’re going to do something crazy, here’s a way of limiting how crazy you’re going to be” suggestions or if you thought this was actually a good idea). Then I’d have to manually regenerate them every, say, hour by deleting the cached copy so that the next hit recreated it. It occurred to me that the same cron job that deleted the cache could also hit the page on the server to force the regeneration. It then occurred to me that it was silly having that go via the webserver when it was on the same machine as the program. With my current knowledge of instiki, what I thought of for getting round this was to have an instiki process running, invoked “from the command line”, and listening on, say, port 2500. I block that port from all outside traffic and use it only for localhost. Then the cron job hits that port, avoiding the webserver. The alternative would be to have it so that I could call … |
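For concreteness, a rough sketch of the cron approach described above, assuming cached pages live as .html files under Instiki’s public/ directory and that a local-only Instiki instance answers on port 2500; the cache path and file names are placeholders, not the actual nLab layout:

```sh
#!/bin/sh
# Hypothetical hourly cron job: throw away the cached copies of the two
# expensive pages, then request them once so the next visitor gets a
# fresh, already-cached copy.
CACHE=/path/to/instiki/public/nlab    # placeholder cache directory
rm -f "$CACHE/recently_revised.html" "$CACHE/list.html"

# Hit the local-only port to force regeneration without going through
# the public webserver.
curl -s -o /dev/null http://localhost:2500/nlab/recently_revised
curl -s -o /dev/null http://localhost:2500/nlab/list
```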
distler
Moderator
123 posts
edited 13 years ago |
The former. You’re trading off workload on the server for stale data. Since computers are supposed to serve humans, rather than the other way around, the question is: does this improve the user experience? Say you implement the above suggestion. On the one hand, the user always (or almost always, depending on implementation) receives the cached page, i.e. gets a quick response. On the other hand, the data is invariably stale. Leaving these alone, the user is guaranteed to receive fresh data, but there could be a significant delay if the page has to be regenerated. What percentage of requests for these pages hit the cache? A better solution is to pull in the … . As to moving away from Maruku, to some …
|
Andrew Stacey 118 posts |
(Not so much a bug, but also not a feature request, so it gets put here.) Is there a reason why the italic style in theorems is done via … rather than … ? The … will come up as italic, I think. Actually, I can test it here:

Theorem. …

Yes, it did. So the normal CSS inheritance is effectively bypassed by the … . I’m generally reluctant to modify stuff that you’ve put in place! Is there a reason for the …? |
admin Administrator 63 posts |
Works for me.

Theorem. This is in italics. This is normal. And this is italic, again.

This whole paragraph is normal.

This paragraph is in italics.

That was generated by … . Aside from the kludgy bit about applying styles to spans, I have no idea what your issue is. |
Andrew Stacey 118 posts |
It’s about inheritance. Let me do my example again.

Theorem. This is italic text. This is normal, but this is not.

The link should inherit the … . |
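To make the inheritance point concrete, here is a purely illustrative CSS sketch; the selectors are hypothetical, not Instiki’s actual stylesheet. If the italics are put on the theorem container, a link inside it inherits them; if they are put only on spans of surrounding text, the link stays upright:

```css
/* Variant A (hypothetical): style the container. An <a> inside the
   theorem inherits font-style: italic unless a more specific rule
   overrides it. */
div.num_theorem { font-style: italic; }

/* Variant B (hypothetical): style only text spans. A link that is not
   wrapped in one of these spans does not pick up the italics, so normal
   inheritance is effectively bypassed. */
div.num_theorem span.theorem_text { font-style: italic; }
```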
admin Administrator 63 posts |
Ah. I see. I was deceived by your example:

Theorem. This is italic text. But this is not.

In general, the more specific CSS selector wins, and there was some circumstance where inheritance from … |
Andrew Stacey 118 posts |
If you can’t remember, then I’ll experiment with taking it out and see who complains, and about what. |
Andrew Stacey
118 posts
edited 13 years ago |
Urs is complaining about slow times again … Looking at the logs, I see a heck of a lot of “Expired fragment”s. Do these add to a page’s rendering time? Or does the cache sweeper act asynchronously (or whatever it’s called)? For example, in the logs I see 815 lines saying “Expired fragment:” and the request that they seem to be a part of (I can’t be totally sure, due to several processes running simultaneously) takes about 6s to render. A more systematic search of the logs shows it’s not guaranteed, but it is often the case that the following request is slow, and the times when it isn’t could be due to it being a parallel process (that would need a more sophisticated search). |
admin Administrator 63 posts |
The “Expired fragment” log-entries all have times (in ms) attached to them, so you would be in a better position than I to tell how long they took to execute. But, yes, when you ask for new features (like …) … |
Andrew Stacey 118 posts |
Okay, so I just did a crude time count. I do have some lists of expiries totalling of the order of seconds: I get one at 20s, three at 10s, and about 30 over 1s. (Usual caveats: I can’t tell that all of these are from the same request; they occur concurrently in the logs.) |
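For anyone repeating this, a rough sketch of the crude count described above, assuming the Rails log lines look like “Expired fragment: views/... (12.3ms)”; the log path and the exact line format are assumptions:

```sh
# Sum the per-fragment expiry times (the trailing "(NNNms)" figures) in the
# current log. As noted in the post, concurrent requests are interleaved,
# so this is only a rough aggregate.
grep 'Expired fragment' /path/to/instiki/log/production.log \
  | sed 's/.*(\([0-9.]*\)ms).*/\1/' \
  | awk '{ total += $1 } END { printf "%.1f ms across %d fragments\n", total, NR }'
```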