## April 13, 2008

### Corruption

I must just be unlucky. Here’s how I managed to waste my afternoon.

I committed an update to the Instiki BZR repository, and then did a

```
bzr log -v
```

which yielded an ominous

```
⋮
KnitCorrupt: Knit <bzrlib.knit._KnitAccess object at 0x24b6170> corrupt:
got IOError(CRC check failed 3032481332 2792320114)
```

Yikes!

The .bzr directory is a labyrinth of plain text and gzip-compressed files. Evidently one of the latter was corrupted. But which?

```
find . -name '*.kndx' -print0 \
| xargs -0 grep -l 'distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t' \
| sed -e 's/\.kndx/.knit/' | xargs zcat >/dev/null
```

failed on the file .bzr/repository/revisions.knit. Replace the whole file from backups? Not ideal, since I’d just made a commit. Further experimentation revealed that

```
bzr log -v -r1..204
```

and

```
bzr log -v -r206..231
```

worked fine. But anything including revision 205 yielded the above CRC error.
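The bisection above (log ranges that do or don't touch revision 205) can also be done directly against the index. Here is a minimal Python sketch, not a bzr tool; the function names are hypothetical. It assumes each record line in the `.kndx` has the shape `<version-id> <options> <offset> <length> <parents...> :` (as in the index line quoted below), skips lines that start with `#` or lack the trailing `:`, and simply tries to decompress every gzip member in the `.knit`. Python's `gzip` module verifies the CRC-32 stored in each member's trailer, so a damaged member raises an `OSError` subclass, much like the `IOError(CRC check failed ...)` bzr reported.

```python
import gzip
import zlib


def kndx_records(kndx_text):
    """Yield (version_id, offset, length) for each complete index line.

    Assumed line shape: "<version-id> <options> <offset> <length> <parents...> :".
    Header lines (starting with '#'), blank lines, and lines without the
    trailing ':' (e.g. a half-written entry) are skipped.
    """
    for line in kndx_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[-1] != ":":
            continue
        yield fields[0], int(fields[2]), int(fields[3])


def find_corrupt_records(knit_bytes, kndx_text):
    """Return the version ids whose gzip member in the .knit is damaged."""
    bad = []
    for version_id, offset, length in kndx_records(kndx_text):
        member = knit_bytes[offset:offset + length]
        if len(member) != length:
            # The .knit is shorter than the index claims: truncated file.
            bad.append(version_id)
            continue
        try:
            # gzip.decompress checks the CRC-32 and size stored in the
            # member's trailer, the same kind of check that failed in bzr.
            gzip.decompress(member)
        except (OSError, EOFError, zlib.error):
            bad.append(version_id)
    return bad
```

Run against the repository, something like `find_corrupt_records(Path(".bzr/repository/revisions.knit").read_bytes(), Path(".bzr/repository/revisions.kndx").read_text())` (with `from pathlib import Path`) should name the bad revision directly, instead of bisecting with `bzr log`.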

How to fix it?

The first thing to understand is that the .knit file is a concatenation of gzipped files. Looking in .bzr/repository/revisions.kndx, I found the line

```
distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t fulltext 94495 460 230 :
```

This says that the gzip file corresponding to the revision in question is 460 bytes long, and starts at offset 94495 from the beginning of the file.
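To make the arithmetic concrete, here is a small Python sketch that parses an index line of this shape and cuts the `.knit` into the three pieces described next. Only the offset and length fields are confirmed by the text above; the rest is my assumption: `fulltext` looks like a comma-separated options field, and the trailing `230` appears to be a parent reference (I believe an integer there refers to an earlier line of the index). The sketch just keeps those as strings, and its names are hypothetical.

```python
from typing import NamedTuple


class KndxEntry(NamedTuple):
    version_id: str
    options: list   # e.g. ["fulltext"]; assumed comma-separated in the file
    offset: int     # byte offset of the gzip member in the .knit
    length: int     # size of the gzip member in bytes
    parents: list   # raw parent references, kept as strings (here ["230"])


def parse_kndx_line(line):
    fields = line.split()
    if not fields or fields[-1] != ":":
        raise ValueError("incomplete index line (no trailing ':')")
    version_id, options, offset, length, *parents = fields[:-1]
    return KndxEntry(version_id, options.split(","), int(offset), int(length), parents)


def split_knit(knit_bytes, entry):
    """Cut the .knit into (before, record, after) around one gzip member."""
    end = entry.offset + entry.length
    return knit_bytes[:entry.offset], knit_bytes[entry.offset:end], knit_bytes[end:]


line = "distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t fulltext 94495 460 230 :"
entry = parse_kndx_line(line)
print(entry.offset, entry.offset + entry.length)  # 94495 94955
```

So the record for revision 205 occupies bytes 94495 up to, but not including, 94955; everything before it is revisions 1–204 and everything after it is revisions 206–231.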

So I split the file into 3 pieces, corresponding to revisions 1–204, the troublesome revision 205, and revisions 206–231. Now to find a replacement for the bad piece. If I had been at the office, I would have tried retrieving something from backup tapes. As it was, it was more convenient to poke around in the BZR repository on my home machine, from which I extracted the corresponding piece. Oddly, it was 466 bytes long. Ungzipped, it looked like

```
version distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t 11 eead9ea844a899fc58fd9b9d15a2dbe20226549a
<revision committer="Jacques Distler &lt;distler@golem.ph.utexas.edu&gt;" format="5" inventory_sha1="be93437c54bcbb9ff6a3c73ffe9a50a835513ae5" revision_id="distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t" timestamp="1199772095.276" timezone="-21600">
<message>Update to latest HTML5lib, Add Maruku testdir
Sync with the latest html5lib.
Having the Maruku unit tests on-hand may be useful for debugging; so let's include them.
</message>
<parents>
<revision_ref revision_id="distler@golem.ph.utexas.edu-20080103212703-037sbbvkyntk6mqs" />
</parents>
<properties><property name="branch-nick">svn</property>
</properties>
</revision>
end distler@golem.ph.utexas.edu-20080108060135-7ujf0nen62ge328t
```

which looked just fine. So I cat’ed the three files together, shifted the offsets of each of the subsequent lines in .bzr/repository/revisions.kndx by 6 bytes, and changed the length of this one from “460” to “466”. Then I moved the new revisions.knit and revisions.kndx files into position and crossed my fingers.
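The splice itself can be scripted. This is a hedged Python sketch under the same assumptions about the index line format, not bzr's own repair code, and `splice_record` is a hypothetical name. It checks that the replacement member decompresses and that its first and last lines are the `version <id> ...` and `end <id>` lines seen above, splices it into the `.knit`, updates that record's length, and adds the size difference to every offset at or beyond the end of the old record. With the numbers in this post the difference is 466 - 460 = 6 bytes.

```python
import gzip


def _record_fields(line):
    """Split an index line into fields, or return None for headers,
    blank lines, and incomplete lines (no trailing ':')."""
    fields = line.split()
    if not fields or fields[0].startswith("#") or fields[-1] != ":":
        return None
    return fields


def splice_record(knit_bytes, kndx_text, version_id, replacement):
    """Swap one gzip member in a .knit; return (new_knit, new_kndx)."""
    # 1. Sanity-check the replacement before touching anything.
    text = gzip.decompress(replacement).decode("utf-8", errors="replace")
    lines = text.splitlines()
    if (not lines or lines[0].split()[:2] != ["version", version_id]
            or lines[-1] != "end " + version_id):
        raise ValueError("replacement is not the record for " + version_id)

    # 2. Find the old record's position (a later line for the same id wins).
    index_lines = kndx_text.splitlines(keepends=True)
    old = None
    for line in index_lines:
        fields = _record_fields(line)
        if fields and fields[0] == version_id:
            old = (int(fields[2]), int(fields[3]))
    if old is None:
        raise KeyError(version_id)
    old_offset, old_length = old
    old_end = old_offset + old_length
    delta = len(replacement) - old_length

    # 3. Bytes before the record + new member + bytes after the record.
    new_knit = knit_bytes[:old_offset] + replacement + knit_bytes[old_end:]

    # 4. Fix the index: new length for this record, shifted offsets after it.
    out = []
    for line in index_lines:
        fields = _record_fields(line)
        if fields:
            offset = int(fields[2])
            changed = True
            if fields[0] == version_id and offset == old_offset:
                fields[3] = str(len(replacement))
            elif offset >= old_end and delta:
                fields[2] = str(offset + delta)
            else:
                changed = False
            if changed:
                # Rebuild with single spaces, keeping the original line ending
                # (assumes bzr splits index lines on whitespace).
                ending = line[len(line.rstrip("\r\n")):]
                line = " ".join(fields) + ending
        out.append(line)
    return new_knit, "".join(out)
```

As in the post, it is safest to write the results to new files, keep the originals, and only then move them into place and run `bzr check`.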

Problem solved!

But, surely, surely there must be a better way.

Posted by distler at April 13, 2008 11:07 PM


### Re: Corruption

Hm… seems the right thing would be:

```
bzr check
bzr reconcile http://www.example.com/clean/branch/copy
```

An alternative would be to create a bundle of the change you just committed, then branch afresh from a copy elsewhere, then commit your bundle.

Should cc this post to bazaar@lists.canonical.com if you want the… ehem… canonical answer.

Posted by: anon on April 14, 2008 12:59 AM

### Re: Corruption

bzr check failed with the same CRC error. I don’t know that a bzr reconcile would have succeeded (though I confess that I didn’t try it).

My impression, from googling around, is that BZR just doesn’t tolerate corrupt .knit files. The prevailing attitude seems to be that, if you find yourself saddled with such a file, you must be a bad person, and no, we won’t help you fix it.

Posted by: Jacques Distler on April 14, 2008 1:11 AM

### Re: Corruption

Well, bzr check should fail, was the point (though, as in, give the error, not spew a traceback).

I think you are right in saying there is less emphasis in the bzr project on stable, error resistant, recoverable repository formats.

This makes sense however, as unlike with the central-repo model (where if it gets screwed up no one can do any work), everyone has their own copy of everything. So, worst case is you lose your recent divergence. Even then you should still have your working tree, so have your actual work and can commit it to a new copy.

Or you can spend quite a bit of time recovering the repo by hand and writing an interesting post about the process; I’m happy either way.

Posted by: anon on April 14, 2008 1:22 AM

### Re: Corruption

> Well, bzr check should fail, was the point (though, as in, give the error, not spew a traceback).

It failed, as in “spew[ed] a traceback”.

> I think you are right in saying there is less emphasis in the bzr project on stable, error resistant, recoverable repository formats.
>
> This makes sense however, as unlike with the central-repo model (where if it gets screwed up no one can do any work), everyone has their own copy of everything. So, worst case is you lose your recent divergence. Even then you should still have your working tree, so have your actual work and can commit it to a new copy.

I believe this is the SCM equivalent of an application vendor saying, “You should have backed up your data.”

Yes, multiple redundant code repositories means that, if any one of them goes belly-up, the data can be recovered from one of the others. But, no, that’s no excuse for failing to provide “stable, error resistant, recoverable repository formats.”

To be fair, in this case any BZR operation that did not involve accessing Revision 205 continued to work fine.

But it would not be a terrible idea to build in some tools to automate what I spent the afternoon doing:

1. Isolate the corrupt files within the .knits.
2. Replace them with clean copies from some other source (e.g., from another branch or from a backup).

Posted by: Jacques Distler on April 14, 2008 1:57 AM

### Re: Corruption

> I believe this is the SCM equivalent of an application vendor saying, “You should have backed up your data.”

Okay, that did come off as a bit of a free-software whine; it wasn’t the intention. I was trying to make the point that, for a DVCS developer, time spent making repo propagation as trivial as possible buys a bunch of integrity insurance for free.

> But it would not be a terrible idea to build in some tools to automate what I spent the afternoon doing:
>
> 1. Isolate the corrupt files within the .knits.
> 2. Replace them with clean copies from some other source (e.g., from another branch or from a backup).

I believe this is the intention of bzr reconcile, even if it is not the current effect.

On the wider topic of the frustrations of unexpectedly losing half a day to computer issues: yes, it’s a pain. I think the first two slightly complex things I tried to do with bzr led to my forgetting what I was trying to do in the first place and trying to patch problems instead.
There are, I’m sure, people for whom having their schedule thrown off is a complete disaster. Thankfully I’m not one of them: it annoys me every time something falls over, but most of the fun and useful software in the world just isn’t fully baked.

Posted by: anon on April 14, 2008 4:15 AM

### Re: Corruption

Meanwhile, git’s repository format is explicitly engineered for reliability in several different ways.

Posted by: Aristotle Pagaltzis on April 14, 2008 4:08 AM

### Re: Corruption

Yay, internet: it turns any problem anecdote into a holy war.

- “I was using Program X and had a problem!”
- “Use Program Y!”

For balance, git gives you a selection of ways to screw up public-facing repos, and if you do, the response will very much be You Shouldn’t Have Done That.

In the event that one of the contenders “wins” the DVCS war (which, based on past holy wars, seems unlikely), it’ll be on the back of social factors, not technical merit, and certainly not repo format.

Posted by: anon on April 14, 2008 4:37 AM

### Holy Wars

I’m not really interested in any holy wars. What I did do, after writing this, was a

```
bzr check
```

(no further problems were found) and then a

```
bzr upgrade
```

The striking thing about BZR is that it has more repository formats than you can shake a stick at. By upgrading to the latest (default) format, I hope I am doing my part to ease the task of the developers in creating some recovery tools.

Posted by: Jacques Distler on April 14, 2008 8:37 AM

### Re: Corruption

I would agree if the contest were between any of the other DVCSs, but git is interesting in many ways other than as an SCM (a facility that was bolted on as an afterthought, and hence was once very hard to use and still lags in usability today), and ultimately those ways all revolve around the repo format.

The social factor here is actually the Subversion-is-enough faction, and I think ultimately the majority choice in the DVCS space will come down to git vs whatever the reluctant Subversion defectors feel least uncomfortable with. Mercurial seems to be the winning contender for that role at this time. (The fact that git has by far the smoothest Subversion integration will probably factor into the competition in some way, but I am not sure whether that will be strengthening or weakening Subversion and how it will affect the relative positions of git and whichever the other DVCS turns out to be.)

Posted by: Aristotle Pagaltzis on April 15, 2008 12:37 AM
