The arXiv in your pocket.
Posted by Robert M.
As pointed out on several other blogs, Joanna Karczmarek has been testing the waters with a downloadable version of the eprint arXiv. For the last few months, anyone with BitTorrent installed on their machine has been able to download all of 2004’s papers in one go. But, as of yesterday, the whole shebang is up for grabs.
You can grab the .torrent file either here or here. If you decide to download the arXiv, please let the BitTorrent software continue to run, even after your download has finished. This makes the process faster and easier for everyone. In fact, it would be great if you just let the software run, in the background, for a few weeks. After all, if you’re on a University connection, you probably won’t even notice the 20 kB/s or so of bandwidth it uses.
By the way, the whole thing is only 7.4 GB. That’s roughly a third of the smallest iPod on the market. So yes, you can carry the arXiv around in your pocket. The fun part, though, is how to index all of this data. It would be so boring if we just used the standard SLAC-type searches that we’re all used to. If you’re a Windows user, you might want to check out Google Desktop. In a few weeks, Mac users who upgrade to Tiger can use Spotlight. Linux users have a few options as well, such as Beagle.
Update: As more people have started downloading the arXiv, the download speeds have really picked up. My download finished earlier today. One of the first things I noticed as I looked through the papers is that certain papers seem to be missing. My desktop environment generates small thumbnails of pdf files in place of icons, and I noticed that a few of the papers weren’t pdf files at all, but html with .pdf extensions.
Upon closer inspection, the html content of these files turns out to be the usual blurb that the arXiv offers up when it tries to convert a paper’s source to pdf. Clearly, these are the papers where the scripts failed to generate pdf. For instance, go check out hep-th/9108012 and try to grab a pdf version of that paper. After a few moments, the arXiv will return an error message, stating that it can’t generate a pdf file due to “incomplete or corrupted files”.
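If you would rather not rely on thumbnails to spot these impostors, a short script can do the checking. The following is only a rough sketch, not anything that ships with the torrent: it assumes the download sits in a single directory tree, and it treats a file as a real pdf only if it begins with the bytes `%PDF-`. The function names `looks_like_pdf` and `find_fake_pdfs` are made up for this example.

```python
from pathlib import Path

# A well-formed pdf normally begins with this header. Some pdfs carry a few
# junk bytes before it, so treat a "fake" verdict as a hint, not a proof.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path):
    """Return True if the file starts with the pdf header bytes."""
    with open(path, "rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


def find_fake_pdfs(root):
    """List files under root that are named *.pdf but do not start like a pdf."""
    return sorted(
        p for p in Path(root).rglob("*.pdf")
        if p.is_file() and not looks_like_pdf(p)
    )
```

Calling `find_fake_pdfs` on the top-level download directory should return the paths of the html-in-disguise files, which you can then map back to paper identifiers by hand.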
Not to worry, though. It seems as if the sources for some of these missing papers will produce valid pdf files with a minimum of fuss. For instance, if you download the source for hep-th/9108012 and pdflatex it, you’ll get a few errors. But you’ll also get a pdf version of the paper.
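For more than a handful of papers, the same trick can be scripted. What follows is a minimal sketch under a few assumptions: `pdflatex` is on your path, the source has already been downloaded and unpacked by hand, and the main file is LaTeX rather than plain TeX. The helper names `pdflatex_command` and `compile_despite_errors` are hypothetical. The `-interaction=nonstopmode` flag tells pdflatex to keep going past errors instead of stopping to ask what to do.

```python
import subprocess
from pathlib import Path


def pdflatex_command(tex_file):
    """Build a pdflatex invocation that does not stop to prompt on errors."""
    return ["pdflatex", "-interaction=nonstopmode", Path(tex_file).name]


def compile_despite_errors(tex_file, passes=2):
    """Run pdflatex a couple of times and return the pdf path, or None.

    A non-zero exit status is ignored on purpose: the only question is
    whether a pdf came out at the end.
    """
    tex_file = Path(tex_file).resolve()
    pdf = tex_file.with_suffix(".pdf")
    if pdf.exists():
        pdf.unlink()  # drop any stale output so it is not mistaken for a fresh one
    for _ in range(passes):
        subprocess.run(
            pdflatex_command(tex_file),
            cwd=tex_file.parent,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    return pdf if pdf.exists() else None
```

A pdf produced this way may still be missing figures or cross-references, so it is worth opening the result before trusting it.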
Posted at April 13, 2005 5:51 AM UTC
Re: The arXiv in your pocket.
There is an interesting race going on between the development of digital memory and that of electronic information bandwidth - and the way this affects our use of both of them.
On the one hand we can rejoice that mass-storage devices are becoming so small and cheap that electronic memory tends to behave as if infinite for practical purposes. So why not download all of arXiv and do all literature research with Google Desktop - offline?
On the other hand internet access is getting so fast and ubiquitous that one can come to the opposite conclusion: Why should I care about downloading archives of anything at all, when I can always access them remotely just as easily?
My personal computer recently suffered a hard disk failure of some sort and a couple of files were lost or damaged. Some of these files have almost existential importance for me. But I relax, since all my files are mirrored on one or two servers that I access remotely.
In fact it is the other way round: my permanent directories live on these servers, and my personal computer mirrors what I need on a day-to-day basis.
So why should I not feel quite happy without the arXiv on my hard drive, much less on my MP3 player? What can I do with Google Desktop that I cannot do with Google? I routinely use Google to find my own files on the net.
Of course I can see that it is a cool thing to carry hep-th around on my iPod, with people asking me:
;-)