{"id":126,"date":"2008-04-15T23:53:43","date_gmt":"2008-04-15T13:53:43","guid":{"rendered":"http:\/\/www.erisian.com.au\/wordpress\/?p=126"},"modified":"2008-04-15T23:53:43","modified_gmt":"2008-04-15T13:53:43","slug":"jigdo-downloads","status":"publish","type":"post","link":"https:\/\/www.erisian.com.au\/wordpress\/2008\/04\/15\/jigdo-downloads","title":{"rendered":"Jigdo Downloads"},"content":{"rendered":"<p>Last month we had a <a href=\"http:\/\/lists.debian.org\/debian-devel\/2008\/03\/msg00540.html\">brief discussion<\/a> on debian-devel about what images would be good to have for lenny &#8212; we&#8217;re apparently up to about 30 CDs or 4 DVDs per architecture, which over 12 architectures adds to about 430GB in total. That&#8217;s a lot, given it&#8217;s only one release, and meanwhile the entire Debian archive is only 324GB.<\/p>\n<p>The obvious way to avoid that is to make use of jigdo &#8212; which lets you recreate an iso from a small template and the existing Debian mirror network. I&#8217;ve personally never used jigdo much, half because I don&#8217;t usually use isos anyway, but also because the few times I have tried jigdo it always seemed really unnecessarily slow. So the other day I tried writing my own jigdo download tool focussed on making sure it was as fast as possible.<\/p>\n<p>The official jigdo download tool, ttbomk, is jigdo-lite &#8212; which you give a .jigdo file, and the url of a local mirror. It then downloads the first ten files using wget, and once they&#8217;re all downloaded, it calls jigdo-file to get them merged into the output image. This gets repeated until all the files have been downloaded.<\/p>\n<p>By doing the download in sequence like this, you miss out on using your full network connection in two ways: one during the connection setup latency when starting to download the next package, and also while jigdo-lite stops downloading to run jigdo-file. 
And if you&#8217;ve got a fast download link, but a slower CPU or disk, you can also find yourself constrained in that you&#8217;re maxing those out while running jigdo-file, but leaving them more or less idle while downloading.<\/p>\n<p>To avoid this, you want to do multiple things at once: most importantly, to be writing data to the image at the same time as you&#8217;re downloading more data. With jigdodl (the name I&#8217;ve given to my little program), I went a little bit overboard, and made it not only do that, but also manage four downloads and the decompression of the raw data from the template. That&#8217;s partly due to not being entirely sure what needed to be done to get a speedy jigdo program, and partly because the <a href=\"http:\/\/azure.humbug.org.au\/~aj\/blog\/2008\/04\/10#2008-04-10-select-and-generators\">communicate<\/a> module I&#8217;d just written to deal with this sort of parallelism made that somewhat natural.<\/p>\n<p>In the end, it works: from wireless over ADSL to my ISP&#8217;s Debian mirror, I get the following output:<\/p>\n<blockquote>\n<pre>\nJigsaw download:\n  Filename: debian-40r3-amd64-CD-1.iso\n  Length:   675477504\n  MD5sum:   d3924cdaceeb6a3706a6e2136e5cfab2\nTotal: 679 s; d\/l: 586 MB at 883 kB\/s; dump: 57 MB at 57 MB\/s          \n\nFinished!\n<\/pre>\n<\/blockquote>\n<p>which is only slightly short of maxing out my downstream bandwidth, taking a total of about 11m20s. Running jigdodl with a closer mirror works pretty well too, though evidently some of my more recent changes weren&#8217;t so great, because I&#8217;ve gone from 9153 kB\/s on a 100 Mbps link down to 7131 kB\/s or lower. The CPU usage also seems a bit high, hovering between five and ten percent at 900 kB\/s.<\/p>\n<p>For comparison, running jigdo-lite on the same file took 17m41s, which is about 566 kB\/s, with the overhead being about 6m20s. 
What that means is if I doubled my bandwidth to about 20Mbps, jigdodl would halve its time for the download to about 5m50s, while jigdo-lite would still have about the same non-download overhead, and thus take 12m10s, which is still 69% of its original time. Going from 10Mbps ADSL speed to 100Mbps LAN gets jigdodl down to 1m31s (13% of the time, with optimal being 10%), while jigdo-lite would be expected to still be about 7m51s (43% of its original time).<\/p>\n<p>I suspect the next thing to do is to rewrite the downloading code to use python-curl instead of running curl, so that multiple files can be downloaded over a single connection, and to tweak the code so that it writes the file in order, rather than updating whichever parts are ready first.<\/p>\n<p>Anyway, debs are <a href=\"http:\/\/azure.humbug.org.au\/~aj\/jigdodl\/\">available<\/a> for anyone who wants to try it out, along with source in the new git source package format.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last month we had a brief discussion on debian-devel about what images would be good to have for lenny &#8212; we&#8217;re apparently up to about 30 CDs or 4 DVDs per architecture, which over 12 architectures adds to about 430GB in total. 
That&#8217;s a lot, given it&#8217;s only one release, and meanwhile the entire Debian [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"_links":{"self":[{"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/posts\/126"}],"collection":[{"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/comments?post=126"}],"version-history":[{"count":0,"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/posts\/126\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/media?parent=126"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/categories?post=126"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.erisian.com.au\/wordpress\/wp-json\/wp\/v2\/tags?post=126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}