Darcs Hacking!

Cripes. This was meant to be a short follow-up note about a few more quick darcs hacks. So much for that: I’ve had to write an outline for this post, for heaven’s sake.

(Side note: if someone wants a new title for their blog, the above’s free of charge!)

So, when last we met, darcs-repo had just come into the world, and we were still choking on the cigar smoke. Following that, there were a couple of discussion threads. Interesting mails include this one, about letting you just ask for a repository rather than a “branch” of a “project” and having the program work out how that’s stored; this one (and its followups), about calling a collection of related repositories an “archive” and renaming darcs-repo to darcshive; and this one (and followups from December), which includes some (applied!) patches to darcs itself that let me get rid of the horrific ssh/scp hacks.

Where does that leave us? Pretty much at the point of moving from a prototype/proof-of-concept darcs-repo to a functional darcshive. It’s been essentially self-hosting from the beginning, but hosting darcs itself is a more challenging task, since darcs probably exercises most of the interesting features of the darcs repository format.

This turned out to be trickier than I’d hoped, right from the start: generating a “complete” patch bundle is a non-trivial affair. I suppose I should’ve guessed when creating the necessary null context (darcs init; darcs changes --context) gave me an internal error (Fail: Prelude.head: empty list). That error can just be ignored; what’s suckier is that darcs send still takes ages to run against a null context. Eventually I gave up and just copied the patches and inventories into darcshive, munging as I went.

The inventory munging is easy: you just need to concatenate all the tag inventories in the right order (date order is a fairly safe assumption for tags, so you can go by filename) and trim the “Starting with tag:” line off the top of each one:

for i in inventories/* inventory; do
  # drop line 1 if it is a "Starting with tag:" header; print everything else
  sed -n '1{s/^Starting with tag://;t
};p' "$i"
done

The patch munging is harder, unfortunately. The first problem is that patches used to use dates like Wed_Dec__8_14:57:25_EST_2004, instead of the modern 20041208145725. The all-numeric version is still needed since it appears in the filename, but the old version is also needed because it gets hashed to work out the filename too. Yay parsing! Fortunately, the timezone is more or less ignored by darcs, and I can get away with completely ignoring it. Considering some patches had a timezone of “Pacific Daylight Time”, I’m very happy that turned out to be the case.
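For illustration, here’s a minimal sketch of the date conversion in shell. It assumes the old format is always weekday, month, day, time, timezone (possibly several words), year, separated by runs of underscores; old_date_to_new is a made-up name for this example, and, like darcs, it simply drops the timezone:

```shell
# old_date_to_new: hypothetical helper turning an old-style darcs date
# such as Wed_Dec__8_14:57:25_EST_2004 into 20041208145725.
# Assumes the year is the last field and that the timezone (one or more
# underscore-separated words) sits between the time and the year.
old_date_to_new() {
  printf '%s\n' "$1" | awk -F'_+' '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = i
    }
    {
      # $1 weekday, $2 month, $3 day, $4 hh:mm:ss, $NF year;
      # any timezone fields in between are ignored
      split($4, t, ":")
      printf "%04d%02d%02d%02d%02d%02d\n", $NF, month[$2], $3, t[1], t[2], t[3]
    }'
}
```

Splitting on runs of underscores takes care of the double underscore before single-digit days, and taking the year from the last field means a multi-word zone like “Pacific Daylight Time” doesn’t need any special handling.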

More fun were a couple of other issues in the patches. In particular, single patch bundles sometimes weren’t stored as a sequence, so the opening and closing curly braces I was relying on weren’t there, and my parser got completely confused. vi fixed that. Similarly annoying, old versions of darcs seem to have ended long comments with a closing bracket at the end of the last comment line (“]$”) rather than on a line of its own (“^]$”). But that’s ambiguous: what happens if the long comment includes a closing square bracket at the end of a line? Apparently darcs deals with that somehow, but I chose not to and fell back on vi again. Fortunately, when calculating the hash, darcs ignores newline characters entirely, so adding and removing them doesn’t cause any problems.
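To see why the newlines don’t matter, here’s a rough sketch of darcs 1.x patch naming. The details are an assumption worth checking against the PatchInfo code in the darcs source: the numeric date, the first five hex digits of the SHA-1 of the author, and the SHA-1 of the name, author, original date string, comment lines and an inversion flag, all run together with no separators. darcs_patch_filename is a hypothetical helper, and sha1sum from GNU coreutils is assumed to be available:

```shell
# darcs_patch_filename NAME AUTHOR OLDDATE NEWDATE [COMMENTLINE...]
# Hypothetical helper sketching darcs 1.x patch naming (assumed layout,
# not checked against every darcs version).
darcs_patch_filename() {
  local name=$1 author=$2 olddate=$3 newdate=$4
  shift 4
  local a_hash p_hash
  # first five hex digits of the SHA-1 of the author string
  a_hash=$(printf '%s' "$author" | sha1sum | cut -c1-5)
  # SHA-1 of name, author, raw (old-style) date string, each comment line
  # and "f" (assumed "not inverted" flag), concatenated with no separators,
  # so newlines between comment lines never reach the hash
  p_hash=$(printf '%s' "$name" "$author" "$olddate" "$@" f | sha1sum | cut -c1-40)
  printf '%s-%s-%s.gz\n' "$newdate" "$a_hash" "$p_hash"
}
```

Under this model, splitting or joining comment lines differently gives the same filename as long as the characters themselves don’t change, while swapping the old-style date string for the numeric one changes the hash, which is why both forms of the date have to be kept around.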

A couple of other tweaks were needed too. darcshive had a bug in calculating the hash when the last line of the description included a closing square bracket followed by a space, which the darcs history triggered. And recent versions of darcs have changed the way patch bundles are produced (weirdly, darcs send and darcs push generate slightly different formats for their patch bundles), so I had to make my parser more liberal about blank lines as well.

But at this point, I can store darcs in darcshive, do a darcs get on a darcshive:// URL to fetch it, and it comes out buildable and everything. Sweet. Thanks to the DARCS_MGET_ stuff, using darcshive is even quicker than darcs-over-http.

I had hoped it might be really quick to have darcshive generate a patch bundle against the null context, so that you could “get” a repository by downloading a single patch bundle and applying it to a freshly initialised darcs repository. Unfortunately, darcs is just as slow pulling into a null repository as it is pushing to one, so you have to stick with get anyway. Oh well.

Where does that leave us? Well, the concept seems proven, so now it’s a matter of tidying things up. Having darcshive deal with TAGs better would be nice: there are some bugs in how “Starting with tag:” is handled in patch bundles, and it would be handy to be able to just point at an archive and say “get me this tag, I don’t care what repository you have to get it from”. The files darcshive uses probably need renaming too; at the moment you can’t have a repository with “inventory”, “patches” or “commuted-patches” in its name. Having the CGI return some useful human-readable pages for browsing, as well as the darcs metadata, would be good too.

Also coming up: darcs, dpkg and debootstrap! Well, eventually…

Oh, for those of you wondering:

darcs get http://www.erisian.com.au/darcs/darcshive/mainline
