distler Moderator 123 posts
On my machine, each Expired fragment takes 0.1-0.2 ms. So to get 10-20 s, you’re talking about expiring tens of thousands of fragments?! Wow! That’s impressive.
Andrew Stacey 118 posts
On occasion an expiration takes longer than 0.2 ms, and when the number is in the thousands the likelihood of that happening increases considerably, so that count is an overestimate. The largest number that I see in my slightly refined test is about 6000. That takes about 3 s normally. In this log run, I had one taking 7 s with 1400 expirations.

What are the rules for which pages need fragments expiring when a page is saved? I’m getting a heck of a lot of expired fragments in the logs.

Looking a little further, the above figures are underestimates because they don’t take into account the fact that the logs might be split over several files, or be separated by the logs for other requests. I have one log file consisting of 16876 lines; 15581 of them are ‘Expired fragment’s. There appear to be quite a lot of duplicates as well. In that lot, then
distler Moderator 123 posts |
That would be a consequence of
The first is further complicated by the facility for renaming pages. That means we need to expire all the pages that refer to the old page and all the pages that refer to the new page. I guess that could be optimized better for the case where the page doesn’t change names, as we don’t have to expire the same pages twice. I think the current procedure was motivated by complaints (from y’all) that, in some circumstances, pages were not being expired when they should.
Andrew Stacey 118 posts
Given the number of links that Urs has on some of the nLab pages, I think this might well be a case for optimisation. If the page name doesn’t change, then surely you don’t have to expire any of the pages that refer to it? So it’s not a “don’t have to expire twice”, it’s a “don’t have to expire once”, isn’t it? Or am I missing something?
distler Moderator 123 posts |
It’s the number of inbound links that matters, but yeah.
You do, for a newly-created page… but not, I agree, for a revision of an existing page. I was, somewhat crudely, not distinguishing between those cases. It occurs to me that I can use an
Andrew Stacey 118 posts
No, no! Thank you. Let’s see if this makes Urs a little happier!
Andrew Stacey 118 posts
Back on the CSS thing. I’m going to experiment with taking it out on my course wiki (safer than on the nLab). Since it’s in the main instiki file I guess I have to take it out system-wide (though I could put it back on a per-web basis, I guess). What’s the safest way to do that given that this is a file in the VCS? Should I comment out the line, or delete it? (I want to avoid - as much as possible - breaking things when I do a
admin Administrator 63 posts
I suppose there is a marginally higher probability that a ‘merge’ will succeed with some lines commented out, instead of deleted. But I expect that it’s a small effect; hardly worth obsessing over.
Andrew Stacey 118 posts
This isn’t nLab-specific, and it’s neither a bug nor a feature request: more of a “How do I?”. There’s an effect that I’d like to put on a page (or a family of pages). It’s achievable in CSS using some fancy pseudo-classes, but some browsers don’t support it (notably mobile browsers) so I was pondering a javascript solution. Essentially, it would just modify some CSS properties of certain elements (selected by class) when a link was clicked upon. The details aren’t particularly important. What I want to know is whether or not there is an easy way to add a bit of javascript to a page. I suppose it could be added to all pages, though it would be better if it applied only to the pages in a particular web. Something a bit like the stylesheet tweaks, but for javascript.
admin Administrator 63 posts
Sorry. There isn’t a way to localize the Javascript. You could, however, add some site-wide Javascript which attaches an event listener to some element(s), based on the request-URL.
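Something along these lines, perhaps (a minimal sketch only: the path prefix and both class names are made-up examples, and the real selectors would depend on the pages in question):

```javascript
// Only act on pages in one particular web, picked out from the request URL.
// ("/mycourseweb/" is a hypothetical prefix, not a real path.)
if (window.location.pathname.indexOf('/mycourseweb/') === 0) {
  // Delegate a single click listener to the document rather than wiring up
  // every link individually.  (Assumes plain text inside the links, so that
  // the click target is the anchor itself.)
  document.addEventListener('click', function (event) {
    var link = event.target;
    // React only to anchors carrying the (hypothetical) class "toggle-link".
    if (link.tagName === 'A' &&
        (' ' + link.className + ' ').indexOf(' toggle-link ') !== -1) {
      event.preventDefault();
      // Flip a CSS property on every element of the (hypothetical) class "revealable".
      var targets = document.getElementsByClassName('revealable');
      for (var i = 0; i < targets.length; i++) {
        targets[i].style.display =
          (targets[i].style.display === 'none') ? '' : 'none';
      }
    }
  }, false);
}
```

Delegating the listener to the document means it also catches links added after the page loads, and the pathname test keeps the script inert everywhere else.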
Andrew Stacey 118 posts
Okay, I’ll take a look at that. Is it obvious which file to add it to, or should I create a new file and add it to the page template?
Andrew Stacey 118 posts
The Azimuth Project just got a massive spam hit, 317 pages in total. To deal with that, I ended up working at the database level. What I did was to try to simulate “rollbacks”: copy the data from the last decent revision and paste it in as a new row in the “revisions” table. That seemed the safest approach. But it did get me thinking about the database and specifically the “revisions” table. Two questions:
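(In outline, the “simulated rollback” described above amounted to an insert along these lines; the column names and the id are placeholder guesses at the shape of the revisions table, not its exact schema.)

```sql
-- Append the last known-good revision of the spammed page as a fresh row,
-- so the spam revision is superseded rather than edited in place.
-- The id 98765 is a placeholder for the last good revision's id.
INSERT INTO revisions (page_id, content, author, ip, revised_at)
SELECT page_id, content, author, ip, CURRENT_TIMESTAMP
FROM revisions
WHERE id = 98765;
```

The point of appending rather than editing is that the new row, carrying a later timestamp, becomes the current version while the spam stays in the history.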
Andrew Stacey 118 posts
Just noticed that you got hit by the same spammer. I think that instiki.org also got hit, but then it’s hard to tell with that site anymore. Seems as though this spammer has gone for every instiki installation under the sun!
admin Administrator 63 posts
The
The history of a page is reconstructed by sorting on the
Andrew Stacey 118 posts
Ah, I’d better fix that first one then. I’ll keep the second in mind for next time this happens and choose my dates more precisely.
Andrew Stacey 118 posts
Errr … the
Andrew Stacey 118 posts
Any thoughts on the following idea? From time to time, the nLab gets a whole host of spiders and other bots crawling all over it. While I understand that they’re part of what makes the internet work, they can be a bit annoying and slow down the server for everyone else. So I thought of channelling requests a little more cleverly than I currently do.

At the moment, I use a global queue in Passenger, which is fine until all the slots get a slow request. So what I thought was to have a semi-global queue, with slow requests (like feeds and lists) and bots being handled by a few dedicated processes, normal requests by some others, and maybe a “priority” list as well. Since Passenger doesn’t do this itself (it has either a global queue or individual queues), I think what I’d have to do is have three virtual versions of the nLab, at least as far as Apache and Passenger are concerned. Then Apache would examine the request, classify it according to which type it was, and send it to the right version of the nLab. Passenger wouldn’t know that these are the same, so it would have a global queue for each, and that way requests get segregated and don’t hold up others in other segments. The way that I’d have three virtual versions is simply with symlinks in the filesystem: “nlab”, “nlabPriority”, and “nlabSlow” would all be symlinks to the same Instiki installation.

Can you see any immediate problems with that? As far as Instiki is concerned, it’s just like being run under Passenger, as there will be multiple instances of Instiki running concurrently, which is what already happens. So that shouldn’t be affected. Apache, also, eats this sort of thing for breakfast, and Passenger can cope with different programs as well. So I don’t see an immediate flaw. (Of course, it may be that this won’t solve the blockage, but it’s less drastic than moving servers, which is the other option.)
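For concreteness, the Apache side of that might look roughly like the following. This is only a sketch of the idea, not a tested configuration: the RailsBaseURI lines assume the usual Passenger sub-URI deployment via symlinks in the document root, and the User-Agent and path patterns are illustrative guesses at what should count as “slow”.

```apache
# Three symlinks in the DocumentRoot -- nlab, nlabPriority, nlabSlow --
# all pointing at the same Instiki public/ directory.  Registering each
# with Passenger makes them look like three separate applications, so
# each gets its own pool of workers and its own queue.
RailsBaseURI /nlab
RailsBaseURI /nlabPriority
RailsBaseURI /nlabSlow

RewriteEngine On

# Anything announcing itself as a bot goes to the "slow" copy.
RewriteCond %{HTTP_USER_AGENT} (bot|spider|crawler) [NC]
RewriteRule ^/nlab/(.*)$ /nlabSlow/$1 [PT,L]

# So do known-expensive requests (the path patterns are illustrative).
RewriteCond %{REQUEST_URI} /(list|recently_revised|atom_with) [NC]
RewriteRule ^/nlab/(.*)$ /nlabSlow/$1 [PT,L]

# A "priority" class could be picked out the same way (by IP, cookie or
# path) and rewritten to /nlabPriority; everything else stays on /nlab.
```

Since Passenger treats each base URI as a separate application, the queues really are independent, which is the segregation described above.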
distler Moderator 123 posts |
Is it obvious why the spiders aren’t just hitting the cache (in which case, they should not slow down the system at all)? Are they asking for all revisions of some page (or whatever), which would entail a large percentage of cache-misses? I ask just because it seems to me that, if they are operating correctly, spiders shouldn’t lead to an undue slowdown. Maybe I’ve been remiss about
directives. In any case, is it clear that your 3-queue scheme is better than having one queue with a larger number of worker processes? (I.e., do these spiders insist on making multiple simultaneous connections, or do they access the nLab serially?)
Andrew Stacey 118 posts
Okay, so looking through the week’s log for bots (bot, spider, crawler), I get 33,517 hits (actual time period: 11th December 6:25am to 16th December 11:27am, so that’s an average of a little over 4 hits per minute). These break down as follows:
There’s a few that I’ve missed out in between - there are clearly some bad links to the nLab. I’d say that only
Next is to analyse how those are distributed.
Andrew Stacey 118 posts
The “save”s were all due to one bot and none actually made it to the database.