Hacking dak

Who can resist a good rhyme? Or a bad one?

So this round of dak hacking turned out to make the AJ Market scheme another notch more confusing, hence the delay in blogging and the teaser in my last post. The issue leading to the confusion is that the major item on the list to start hacking on was SCC, which, unlike the projects I’ve undertaken so far, is more than a day’s hacking. In fact, because mirrors need a chance to adapt to the new system between it being implemented and actually used, it’s more of a multi-week task. And doing it as a one-day-a-week project would extend that into a multi-month task afaics. I guess that’s still better than never, but obviously it’s worth looking into alternatives.

Naturally, then, the first phase of a longer project like this is threefold. We’ll call it “the three P’s”.

Planning

My theory at this point was to come up with a plan for what to do, try figuring out how much work it’d take, and then see what sort of financial arrangements might be plausible: ones that don’t involve me cutting a few weeks out of my life for spare change, but that don’t make the whole thing an unleapable chasm from what the AJ Market’s currently managing either. I figured writing it up as a semi-formal proposal makes the most sense:

Summary

Implementation of the outstanding mirror split proposal for the Debian archive to allow new architectures, particularly amd64, to be included in the archive.

Benefit

In spite (or perhaps because) of its simplicity, this project has been languishing for over two years, and is not currently being worked on; so at present it’s not even possible to estimate when it would otherwise be completed.

It is most notably preventing amd64 from being integrated into the normal Debian development environment, causing derived distributions to maintain amd64 specific patches themselves.

In the longer term, reducing the constraints imposed on the archive size may allow the introduction of additional suites, such as backports or volatile, as well as additional architectures; though significant further discussion on this would be needed.

Background

Since at least mid-2003 the Debian archive has been closed to new architectures due to the already large amount of space and bandwidth required to become a Debian mirror. At present, the archive uses some 158GiB of disk, and about 1GiB per day; additional architectures are expected to require approximately an additional 10GiB each, and there are likely around half a dozen architectures that will be considered for addition once the moratorium on new architectures is rescinded (incl amd64, armeb, sh variants, kfreebsd and possibly partial architectures for arch variants such as s390x and ppc64).

The primary work needed to fix this involves:

  • ensuring the mirror network operates correctly when a majority of mirrors are partial; this reduces the impact on bandwidth and storage capacity
  • optimising portions of the archive maintenance software, particularly apt-ftparchive; this reduces the load on the archive server
  • providing appropriate guidelines on the qualification criteria new architectures need to meet in order to be added to the archive; this provides a limit on future increases, allowing growth to be appropriately controlled

Actual work

I expect there will be six phases to the project:

  1. clean up the archive as it stands, and establish a clear categorisation of its contents to define what a partial mirror by architecture or suite should officially contain
  2. provide appropriate scripts to ensure mirror sites can easily comply with the previously defined expectations for partial mirroring
  3. devise an appropriate structure for the new mirror network, that can easily incorporate existing mirrors, and coexist with the existing structure
  4. provide information on the new structure to both mirror admins and users; assist with the transition, and resolve any problems found
  5. ensure the archive management software is appropriately optimised, and that archive inclusion criteria have been debated and established
  6. add new ports that have passed the qualification requirements to the archive

In theory, a couple of days for each of those sounds plausible, making that twelve days of actual work (with a couple of weeks’ delay in between for mirrors to have some time to adapt to the new network). On the downside, twelve days at a day a week is around three months of real time, not counting the possibility of doing other things with the one day a week, or Christmas, or the aforementioned delay for mirrors. Yick.

So much for planning.

Preparation

So the next “p” is preparation. In this case that’s finally getting around to fixing dak CVS, which has been slightly broken since May. The extent of the actual breakage was just the loss of the ChangeLog history, aiui (or at least, that was the unrecovered breakage), but the result of that was months of uncommitted changes on both ftp-master and security (and reportedly from Ubuntu’s dak installation too). The changelog for the first set of commits (not counting buildd changes from ftp-master, security changes or Ubuntu changes that haven’t made it to ftp-master) looks like:

        * tiffani: new script to do patches to Packages, Sources and Contents
        files for quicker downloads.
        * ziyi: update to authenticate tiffani generated files

        * dak: new script to provide a single binary with less arbitrary names
        for access to dak functionality.

        * cindy: script implemented

        * saffron: cope with suites that don't have a Priority specified
        * heidi: use get_suite_id()
        * denise: don't hardcode stable and unstable, or limit udebs to unstable
        * denise: remove override munging for testing (now done by cindy)
        * helena: expanded help, added new, sort and age options, and fancy headers
        * jennifer: require description, add a reject for missing dsc file
        * jennifer: change lock file
        * kelly: propogation support
        * lisa: honour accepted lock, use mtime not ctime, add override type_id
        * madison: don't say "dep-retry"
        * melanie: bug fix in output (missing %)
        * natalie: cope with maintainer_override == None; add type_id for overrides
        * nina: use mtime, not ctime

        * katie.py: propogation bug fixes
        * logging.py: add debugging support, use | as the logfile separator

        * katie.conf: updated signing key (4F368D5D)
        * katie.conf: changed lockfile to dinstall.lock
        * katie.conf: added Lisa::AcceptedLockFile, Dir::Lock
        * katie.conf: added tiffani, cindy support
        * katie.conf: updated to match 3.0r6 release
        * katie.conf: updated to match sarge's release

        * apt.conf: update for sarge's release
        * apt.conf.stable: update for sarge's release
        * apt.conf: bump daily max Contents change to 25MB from 12MB

        * cron.daily: add accepted lock and invoke cindy  
        * cron.daily: add daily.lock
        * cron.daily: invoke tiffani
        * cron.daily: rebuild accepted buildd stuff
        * cron.daily: save rene-daily output on the web site
        * cron.daily: disable billie
        * cron.daily: add stats pr0n

        * cron.hourly: invoke helena

        * pseudo-packages.maintainers,.descriptions: miscellaneous updates
        * vars: add lockdir, add etch to copyoverrides
        * Makefile: add -Ipostgresql/server to CXXFLAGS

        * docs/: added README.quotes
        * docs/: added manpages for alicia, catherine, charisma, cindy, heidi,
        julia, katie, kelly, lisa, madison, melanie, natalie, rhona.

        * TODO: correct spelling of "conflicts"

Ugh. Still, that’s enough to start work.

And the final “P”? Come on, be honest with yourself, you know what it’s going to be.

Procrastination

Okay, that’s not entirely fair; the irrelevant bit of work was actually on the TODO list before SCC (mostly because it was something that I could get done reasonably quickly) and in fact was this line of the above changelog:

        * dak: new script to provide a single binary with less arbitrary names
        for access to dak functionality.

All the various model/actress names have been getting more than a little confusing recently, with almost forty in the dak suite, and another two dozen or so in use elsewhere — and then there’s the fact that the whole hot babes thing is both a bit offensive, and getting a bit old. OTOH, you need something to rename them to. We ended up deciding on the “version control solution” and introducing a “dak” command that’d launch all the different little bits of functionality depending on arguments, in the same way cvs, svn, tla, bzr, darcs etc do.

The implementation’s kinda neat: we have a list of commands (like “ls”) and their description (“Show which suites packages are in”), along with the python module and function they’re in (“madison”, “main()”). That lets us not actually have to change any of the other scripts immediately, and lets “dak ls foo” work the same as “madison foo”. It also means that down the track we don’t need to have separate modules for each subcommand, and that we can rename modules and functions without affecting the user interface.
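For the curious, here’s a rough sketch of what that sort of table-driven dispatch can look like. It’s an illustration only, not the actual dak code: the only thing taken from above is the “ls” row, while the table layout, the names COMMANDS, usage() and dispatch(), and the use of importlib and modern Python are all my assumptions for the sake of the example.

```python
import importlib
import sys

# Subcommand table: name -> (module, function, description).
# Only the "ls" row comes from the post; the layout is an assumption.
COMMANDS = {
    "ls": ("madison", "main", "Show which suites packages are in"),
}


def usage():
    """Build a help listing straight from the command table."""
    lines = ["Usage: dak COMMAND [ARGS...]", "", "Commands:"]
    for name in sorted(COMMANDS):
        lines.append("  %-10s %s" % (name, COMMANDS[name][2]))
    return "\n".join(lines)


def dispatch(argv):
    """Run the subcommand named in argv[1] and return an exit status."""
    if len(argv) < 2 or argv[1] not in COMMANDS:
        print(usage(), file=sys.stderr)
        return 1
    module_name, func_name, _description = COMMANDS[argv[1]]
    # Import lazily, so only the requested script's module gets loaded.
    module = importlib.import_module(module_name)
    # Rewrite sys.argv so the old script sees the same arguments it
    # would have seen when invoked under its original name.
    sys.argv = [module_name] + list(argv[2:])
    result = getattr(module, func_name)()
    return 0 if result is None else result
```

A wrapper script would then just call sys.exit(dispatch(sys.argv)). Since the user-visible name lives only in the table key, renaming a module or function means editing one row and nothing else.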

Of course it also means that all the internal scripts haven’t changed to use the new names yet, leaving the new interface a bit underused, but hey, “dak ls” at least manages to be one character shorter than “madison”, so that’s a win!

To be continued…
