Incremental Packages.gz Downloads

One of the more annoying things about managing a Debian system is sitting around waiting for apt-get update to finish. On broadband it’s irritating, and on a modem it’s absolutely horrendous. The worst part is when it takes longer to do an update (get the information we need) than it takes to do a dist-upgrade (work out what to upgrade, and upgrade it).

Basically the problem is that we need to have a current list of available packages to work out what to upgrade to, but that list is absolutely huge, sitting at about 10MB uncompressed, or 2.0MB using bzip2 compression. On the upside we usually already have a list that was current not too long ago, and generally not all that much will have changed in the intervening time. So what we’d like to be able to do is just download the updates.

One way of doing this is using the rsync algorithm. Unfortunately that only works well on uncompressed files, and it’s something of a brute force solution. It looks at a file in big chunks, so if you change one line, the entire chunk containing it will be downloaded. That’s okay, and it’s reported to be quite useful, with downloads reduced to about 10% of what they were beforehand.
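To see why a one-line change costs a whole chunk, here’s a rough Python sketch of block-wise comparison. It is not the real rsync algorithm (which, among other things, uses a rolling checksum so it can match blocks at any offset); the function name changed_blocks, the 1024-byte block size and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def changed_blocks(old: bytes, new: bytes, block_size: int = 1024) -> list[int]:
    """Return the indices of blocks in `new` that match no block in `old`.

    Simplified illustration only: blocks are compared at fixed offsets,
    so an insertion that shifts later data would make every following
    block look changed. Real rsync avoids that with a rolling checksum.
    """
    # Hash every fixed-size block of the copy we already have.
    old_hashes = {
        hashlib.sha256(old[i:i + block_size]).digest()
        for i in range(0, len(old), block_size)
    }
    # Any block of the new file whose hash we have not seen must be fetched.
    return [
        i // block_size
        for i in range(0, len(new), block_size)
        if hashlib.sha256(new[i:i + block_size]).digest() not in old_hashes
    ]


# Hypothetical data: a fake Packages-like file, then the same file
# with a single byte altered somewhere in the middle.
old = b"".join(b"Package: pkg%04d\n" % i for i in range(1000))
new = old[:5000] + b"X" + old[5001:]
print(changed_blocks(old, new))  # one byte changed, one whole block to resend
```

Only one byte differs between the two inputs, but the unit of transfer is the block, so the full 1024 bytes around it would have to be downloaded.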

But we know what changes are likely, so we can optimise for that. Packages files are sorted by stanza (a stanza being a group of non-blank lines, with stanzas separated by a single blank line), and usually only a few lines in any given stanza will change on an update (the Version: line might change while the Description: lines don’t, for example). So we want a way of listing just the lines that change. The easy way to do that is with the diff --ed command, which produces a file saying things like:

6,7c
foo
bar
.

The above example says “replace lines 6-7 with two new lines saying foo and bar”. Brief and simple. It can be applied by saying (cat EDPATCH; echo w) | red - FILE. The patch program can also handle it, but has been flaky in the past.
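If you’d rather not shell out to red, the format is simple enough to apply by hand. Here’s a minimal Python sketch, assuming the script only uses the a (append), c (change) and d (delete) commands, and that commands appear from the bottom of the file upwards (as diff --ed emits them) so earlier line numbers stay valid. The function name apply_ed_script is made up for this example, and the special handling diff needs for text lines consisting of a single “.” is not covered.

```python
import re


def apply_ed_script(lines: list[str], script: list[str]) -> list[str]:
    """Apply the a/c/d subset of ed commands to a list of lines.

    `lines` and `script` are lists of strings without trailing newlines.
    Returns a new list; the input is left untouched.
    """
    result = list(lines)
    i = 0
    while i < len(script):
        # A command looks like "6,7c", "12d" or "0a".
        m = re.fullmatch(r"(\d+)(?:,(\d+))?([acd])", script[i])
        if m is None:
            raise ValueError(f"unsupported ed command: {script[i]!r}")
        first = int(m.group(1))
        last = int(m.group(2) or first)
        cmd = m.group(3)
        i += 1

        text = []
        if cmd in "ac":
            # Input text runs up to a line containing only ".".
            while i < len(script) and script[i] != ".":
                text.append(script[i])
                i += 1
            if i == len(script):
                raise ValueError("input text not terminated by '.'")
            i += 1  # skip the terminating "."

        if cmd == "a":
            result[first:first] = text      # insert after line `first`
        elif cmd == "c":
            result[first - 1:last] = text   # replace lines first..last
        else:
            result[first - 1:last] = []     # delete lines first..last
    return result


# The example from above, applied to a hypothetical eight-line file.
old = [f"line {n}" for n in range(1, 9)]
print(apply_ed_script(old, ["6,7c", "foo", "bar", "."]))
```

Because the commands run from the end of the file backwards, each one can use line numbers from the original file without having to adjust for edits already made.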

So, how cool is that? Well, this isn’t remotely a new idea, and Robert Tiberius Johnson did some analyses back in the day. Basically, using Packages files that were about two-thirds the size they are now, you can reduce the average apt-get update to about 13kB per day since your last update. That’s as little as 0.7% of what was previously required.
