Corruption
I must just be unlucky. Here’s how I managed to waste my afternoon.
I committed an update to the Instiki BZR repository, and then did a
bzr log -v
which yielded an ominous
⋮
KnitCorrupt: Knit <bzrlib.knit._KnitAccess object at 0x24b6170> corrupt: While reading {distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t} got IOError(CRC check failed 3032481332 2792320114)
Yikes!
The .bzr directory is a labyrinth of plain text and gzip-compressed files. Evidently one of the latter was corrupted. But which?
find . -name '*.kndx' -print0 \
  | xargs -0 grep -l 'distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t' \
  | sed -e 's/\.kndx/.knit/' | xargs zcat >/dev/null
failed on the file .bzr/repository/revisions.knit. Replace the whole file from backups? Not ideal, since I’d just made a commit. Further experimentation revealed that
bzr log -v -r1..204
and
bzr log -v -r206..231
worked fine. But anything including revision 205 yielded the above CRC error.
How to fix it?
The first thing to understand is that the .knit file is a concatenation of gzipped files. Looking in .bzr/repository/revisions.kndx, I found the line
distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t fulltext 94495 460 230 :
This says that the gzip file corresponding to the revision in question is 460 bytes long, and starts at offset 94495 from the beginning of the file.
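Since each index line pins down one gzip member, a single record can be carved out and tested on its own, instead of running zcat over the whole file. What follows is only a minimal shell sketch, not anything bzr ships: knit_record and check_knit are hypothetical helper names, and the field layout (id, options, offset, length) is assumed from the index line quoted above.

```shell
# Hypothetical helper: print LENGTH bytes of FILE starting at byte OFFSET (0-based).
knit_record() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}

# Hypothetical helper: read index lines assumed to look like
#   "id options offset length parents... :"
# and print the id of every record that fails gzip's integrity test.
check_knit() {
    index=$1 knit=$2
    # The "|| [ -n "$id" ]" keeps a final line that lacks a trailing newline.
    while read -r id _opts offset length _rest || [ -n "$id" ]; do
        # Skip the header and any line whose offset/length are not plain numbers.
        case $offset in ''|*[!0-9]*) continue ;; esac
        case $length in ''|*[!0-9]*) continue ;; esac
        knit_record "$knit" "$offset" "$length" | gzip -t 2>/dev/null || echo "$id"
    done < "$index"
}

# Usage with the numbers from the index line above:
#   knit_record .bzr/repository/revisions.knit 94495 460 | gzip -t
#   check_knit .bzr/repository/revisions.kndx .bzr/repository/revisions.knit
```

Testing one record at a time names the damaged revision directly, which is the information the bisection with bzr log -r was after.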
So I split the file into 3 pieces, corresponding to revisions 1–204, the troublesome revision 205, and revisions 206–231. Now to find a replacement for the bad piece. If I had been at the office, I would have tried retrieving something from backup tapes. As it was, it was more convenient to poke around in the BZR repository on my home machine. I extracted the corresponding file. Oddly, it was 466 bytes long. Ungzipping it, it looked like
version distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t 11 eead9ea844a899fc58fd9b9d15a2dbe20226549a
<revision committer="Jacques Distler <distler@golem.ph.utexas.edu>" format="5" inventory_sha1="be93437c54bcbb9ff6a3c73ffe9a50a835513ae5" revision_id="distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t" timestamp="1199772095.276" timezone="-21600"> <message>Update to latest HTML5lib, Add Maruku testdir Sync with the latest html5lib. Having the Maruku unit tests on-hand may be useful for debugging; so let's include them. </message> <parents> <revision_ref revision_id="distler@golem.ph.utexas.edu-20080103212703-037sbbvkyntk6mqs" /> </parents> <properties><property name="branch-nick">svn</property> </properties> </revision>
end distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t
which looked just fine. So I cat’ed the three files together, shifted the offsets of each of the subsequent lines in .bzr/repository/revisions.kndx by 6 bytes and changed the length of this one from “460” to “466”. Then I moved the new revisions.knit and revisions.kndx files into position, and crossed my fingers.
Problem solved!
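For the record, the splice and the index fix-up can be sketched in a few lines of shell. This is a rough reconstruction under assumptions, not the exact commands used: splice_knit, fix_index and the file name good-record.gz are hypothetical, and the index is assumed to be whitespace-separated with the offset in field 3 and the length in field 4, as in the line quoted earlier.

```shell
# Hypothetical helper: write to OUT a copy of KNIT in which the LENGTH-byte
# record at byte OFFSET is replaced by the contents of REPLACEMENT.
splice_knit() {
    knit=$1 offset=$2 length=$3 replacement=$4 out=$5
    {
        dd if="$knit" bs=1 count="$offset" 2>/dev/null   # records before the bad one
        cat "$replacement"                               # the good copy of the record
        tail -c +"$((offset + length + 1))" "$knit"      # records after the bad one
    } > "$out"
}

# Hypothetical helper: filter an index on stdin, giving record ID the length
# NEWLEN and shifting the offset on every later line by the size difference.
fix_index() {
    awk -v id="$1" -v newlen="$2" '
        seen && $3 ~ /^[0-9]+$/ { $3 += delta }                        # later records move
        !seen && $1 == id       { delta = newlen - $4; $4 = newlen; seen = 1 }
        { print }
    '
}

# Usage with the numbers from this post (good-record.gz is a made-up name for
# the 466-byte record taken from the other repository):
#   splice_knit revisions.knit 94495 460 good-record.gz revisions.knit.new
#   fix_index 'distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t' 466 \
#       < revisions.kndx > revisions.kndx.new
```

Two caveats: awk rebuilds every line it edits with single spaces and always ends its output with a newline, so the result is worth diffing against the original index before moving anything into place.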
But, surely, surely there must be a better way.
Re: Corruption
Hm… seems the right thing would be:
bzr check
bzr reconcile http://www.example.com/clean/branch/copy
An alternative would be to create a bundle of the change you just committed, then branch afresh from a copy elsewhere, then commit your bundle.
Should cc this post to bazaar@lists.canonical.com if you want the… ehem… canonical answer.