BeanBag — Easy access to REST APIs in Python

I’ve been doing a bit of playing around with REST APIs lately, both at work and for my own amusement. One of the things that was frustrating me a bit was that actually accessing the APIs was pretty baroque — you’d have to construct urls manually with string operations, manually encode any URL parameters or POST data, then pass that to a requests call with params to specify auth and SSL validation options and possibly cookies, and then parse whatever response you get to work out if there’s an error and to get at any data. Not a great look, especially compared to XML-RPC support in python, which is what REST APIs are meant to obsolete. Compare, eg:

server = xmlrpclib.Server("http://foo/XML-RPC")
print server.some.function(1,2,3,{"foo": "bar"})

with:

base_url = "https://api.github.com"
resp = requests.get(base_url + "/repos/django/django")
if resp.ok:
    res = resp.json()
else:
    raise Exception(resp.json())

That’s not to say the python way is bad or anything — it’s certainly easier than trying to do it in shell, or with urllib2 or whatever. But I like using python because it makes the difference between pseudocode and real code small, and in this case, the xmlrpc approach is much closer to the pseudocode I’d write than the requests code.

So I had a look around to see if there were any nice libraries to make REST API access easy from the client side. Ended up getting kind of distracted by reading through various arguments that the sorts of things generally called REST APIs aren’t actually “REST” at all according to the original definition of the term, which was to describe the architecture of the web as a whole. One article that gives a reasonably brief overview is this take on REST maturity levels. Otherwise doing a search for the ridiculous acronym “HATEOAS” probably works. I did some stream-of-consciousness posts on Google-Plus as well, see here, here and here.

The end result was I wrote something myself, which I called beanbag. I even got to do it mostly on work time and release it under the GPL. I think it’s pretty cool:

github = beanbag.BeanBag("https://api.github.com")
x = github.repos.django.django()
print x["name"]

As per the README in the source, you can throw in a session object to do various sorts of authentication, including Kerberos and OAuth 1.0a. I’ve tried it with github, twitter, and xero’s public APIs with decent success. It also seems to work with Magento and some of Red Hat’s internal tools without any hassle.
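For instance, hooking basic auth up via a requests session looks something like this (going from memory of the README, so double-check the session keyword against it; the credentials and the /user call are just for illustration):

import beanbag
import requests

session = requests.Session()
session.auth = ("myuser", "s3cret")        # hypothetical credentials
github = beanbag.BeanBag("https://api.github.com", session=session)

me = github.user()                         # GET /user as the authenticated user
print me["login"]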

Parental Leave

Two posts in one month! Woah!

A couple of weeks ago there was a flurry of stuff about the Liberal party’s Parental Leave policy (viz: 26 weeks at 100% of your wage, paid out of the general tax pool rather than by your employer, up to $150k), mostly due to a coalition backbencher coming out against it in the press (I’m sorry, I mean, due to “an internal revolt”, against a policy “detested by many in the Coalition”). Anyway, I hadn’t had much cause to give it any thought before now — it’s been a policy since the 2010 election, I think; but it seems like it might have some interesting consequences, beyond just being more money to a particular interest group.

In particular, one of the things that doesn’t seem to me to get enough play in the whole “women are underpaid” part of the ongoing feminist, women-in-the-workforce revolution, is how much both the physical demands of pregnancy and being a primary caregiver justifiably diminish the contributions someone can make in a career. That shouldn’t count just the direct factors (being physically unable to work for a few weeks around birth, and taking a year or five off from working to take care of one or more toddlers, eg), but the less direct ones like being less able to commit to being available for multi-year projects or similar. There’s also probably some impact from the cross-over between training for your career and the best years to get pregnant — if you’re not going to get pregnant, you just finish school, start working, get more experience, and get paid more in accordance with your skills and experience (in theory, etc). If you are going to get pregnant, you finish school, start working, get some experience, drop out of the workforce, watch your skills/experience become out of date, then have to work out how to start again, at a correspondingly lower wage — or just choose a relatively low skill industry in the first place, and accept the lower pay that goes along with that.

I don’t think either the baby bonus or the current Australian parental leave scheme has any effect on that, but I wonder if the Liberal’s Parental Leave scheme might.

There’s three directions in which it might make a difference, I think.

One is for women going back to work. Currently, unless your employer is more generous, you have a baby, take 16 weeks of maternity leave, and get given the minimum wage by the government. If that turns out to work for you, it’s a relatively easy decision to decide to continue being a stay at home mum, and drop out of the workforce for a while: all you lose is the minimum wage, so it’s not a much further step down. On the other hand, after spending half a year at your full wage, taking care of your new child full-time, it seems a much easier decision to go back to work than to be a full-time mum; otherwise you’ll have to deal with a potentially much lower family income at a time when you really could choose to go back to work. Of course, it might work out that daycare is too expensive, or that the cut in income is worth the benefits of a stay at home mum, but I’d expect to see a notable pickup in new mothers returning to the workforce around six months after giving birth anyway. That in turn ought to keep women’s skills more current, and correspondingly lift wages.

Another is for employers dealing with hiring women who might end up having kids. Dealing with the prospect of a likely six-month unpaid sabbatical seems, on its own, a lot easier than dealing with a valued employee quitting the workforce entirely; but it seems to me like having, essentially, nationally guaranteed salary insurance in the event of pregnancy would make it workable for the employee to simply quit, and just look for a new job in six months’ time. And dealing with the prospect of an employee quitting seems like something employers should expect to have to deal with whoever they hire anyway. Women in their 20s and 30s would still have the disadvantage that they’d be more likely to “quit” or “take a sabbatical” than men of the same age and skillset, but I’m not actually sure it would be much more likely in that age bracket. So I think there’s a good chance there’d be a notable improvement here too, perhaps even to the point of practical equality.

Finally, and possibly most interestingly, there’s the impact on women’s expectations themselves. One is that if you expect to be a mum “real soon now”, you might not be pushing too hard on your career, on the basis that you’re about to give it up (even if only temporarily) anyway. So, not worrying about pushing for pay rises, not looking for a better job, etc. It might turn out to be a mistake, if you end up not finding the right guy, or not being able to get pregnant, or something else, but it’s not a bad decision if you meet your expectations: all that effort on your career for just a few weeks’ pay-off, and then you’re on minimum wage and staying home all day. But with a payment based on your salary, the effort put into your career at least gives you six months’ worth of return during motherhood, so it becomes at least a plausible investment whether or not you actually become a mum “real soon now”.

According to the 2010 tax return stats I used for my previous post, the gender gap is pretty significant: there’s almost 20% fewer women working (4 million versus 5 million), and the average working woman’s income is more than 25% less than the average working man’s ($52,600 versus $71,500). I’m sure there are better ways to do the statistics, etc, but just on those figures, if the female portion of the workforce was as skilled and valued as the male portion, you’d get a $77 billion increase in GDP — if you take 34% as the proportion the government takes, it would be a $26 billion improvement to the budget bottom line. That, of course, assumes that women would end up no more or less likely to work part time jobs than men currently are; that seems unlikely to me — I suspect the best that you’d get is that fathers would become more likely to work part-time and mothers less likely, until they hit about the same level. But that would result in a lower increase in GDP. Based on the above arguments, there would be an increase in the number of women in the workforce as well, though that would get into confusing tradeoffs pretty quickly — how many families would decide that a working mum and stay at home dad made more sense than a stay at home mum and working dad, or a two income family; how many jobs would be daycare jobs (counted as GDP) in place of formerly stay at home mums (not counted as GDP, despite providing similar value, but not taxed either), etc.
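For the curious, the back-of-envelope arithmetic behind those two figures, using the rounded averages above, is just:

women, women_avg = 4.0e6, 52600.0
men, men_avg = 5.0e6, 71500.0

gdp_gain = women * (men_avg - women_avg)   # pay the existing female workforce the male average
print gdp_gain / 1e9                       # ~75.6, ie roughly the $77 billion above
print 0.34 * gdp_gain / 1e9                # ~25.7, ie roughly the $26 billion to the budget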

I’m somewhat surprised I haven’t seen any support for the coalition’s plans along these lines anywhere. Not entirely surprised, because it’s the sort of argument that you’d make from the left — either as a feminist, anti-traditional-values, anti-stay-at-home-mum plot for a new progressive genderblind society; or from a pure technocratic economic point-of-view; and I don’t think I’ve yet seen anyone with lefty views say anything that might be construed as being supportive of Tony Abbott… But I would’ve thought someone on the right (Bolt or Albrechtsen, or Australia’s leading libertarian and centre-right blog, or the Liberal party’s policy paper) might have canvassed some of the other possible pros to the idea rather than just worrying about the benefits to the recipients, and how it gets paid for. In particular, the argument for any sort of payment like this shouldn’t be about whether it’s needed/wanted by the recipient, but how it benefits the rest of society. Anyway.

Messing with taxes

It’s been too long since I did an economics blog post…

Way back when, I wrote fairly approvingly of the recommendations to simplify the income tax system. The idea being to more or less keep charging everyone the same tax rate, but to simplify the formulae from five different tax rates, a medicare levy, and a gradually disappearing low-income tax offset, to just a couple of different rates (one kicking in at $25k pa, one at $180k pa). The government’s adopted that in a half-hearted way — raising the tax free threshold to $18,200 instead of $25,000, and reducing but not eliminating the low-income tax offset. There’s still the medicare levy with its weird phase-in procedure, and there’s still four different tax rates. And there’s still all sorts of other deductions etc to keep people busy around tax time.

Anyway, yesterday I finally found out that the ATO publishes some statistics on how many people there are at various taxable income levels — table 9 of the 2009-2010 stats is the best I found, anyway. With that information, you can actually mess around with the tax rules and see what effect it actually has on government revenue. Or at least what it would have if there’d been no growth since 2010.

Anyway, by my calculations, the 2011-2012 tax rates would have resulted in about $120.7B of revenue for the government, which roughly matches what they actually reported receiving in that table ($120.3B). I think putting the $400M difference (or about $50 per taxpayer) down to deductions for dependants and similar that I’ve ignored seems reasonable enough. So going from there, if you followed the Henry review’s recommendations, dropping the Medicare levy (but not the Medicare surcharge) and low income tax offset, the government would end up with $117.41B instead, so about $3.3B less. The actual changes between 2011-2012 and 2012-2013 (reducing the LITO and upping the tax free threshold) result in $118.26B, which would have only been $2.4B less. Given there’s apparently a $12B fudge-factor between prediction and reality anyway, it seems a shame they didn’t follow the full recommendations and actually make things simpler.

Anyway, upping the Medicare levy by 0.5% seems to be the latest idea. By my count doing that and keeping the 2012-2013 rates otherwise the same would result in $120.90B, ie back to the same revenue as the 2011-2012 rates (though biased a little more to folks on taxable incomes of $30k plus, I think).

Personally, I think that’s a bit nuts — the medicare levy should just be incorporated into the overall tax rates and otherwise dropped, IMO, not tweaked around. And it’s not actually very hard to come up with a variation on the Henry review’s rates that simplifies the tax levels, doesn’t increase tax on any individual by very much, and does increase revenue. My proposal would be: drop the medicare levy and low income tax offset entirely (though not the medicare levy surcharge or the deductions for dependants etc), and set the rates as: 35% for earnings above $25k, 40% for earnings above $80k, and 46.5% for earnings above $180k. That would provide government revenue of $120.92B, which is close enough to the same as the mooted NDIS levy increase. It would cap the additional tax any individual pays at $2000 compared to 2012-13 rates, ie it wouldn’t increase the top marginal rate. It would decrease the tax burden on people with taxable incomes below $33,000 pa — the biggest winners would be people earning $25,000 who’d go from paying $1200 tax per year to nothing. The “middle class” earners between $35,000 and $80,000 would pay an extra $400-$500; higher income earners between $80,000 and $180,000 get hit between $500 and $2000, and anyone above $180,000 pays an extra $2,000. Everyone earning over about $34,000 but under about $400,000 would pay more tax than if the NDIS levy were just increased, while everyone earning between $18,000 and $34,000 would be better off.
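To make the bracket arithmetic concrete, here’s a quick sketch of the per-person calculation under that proposal (just the three rates above; the revenue totals come from summing something like this across the ATO’s income distribution):

BRACKETS = [(25000, 0.35), (80000, 0.40), (180000, 0.465)]

def tax(income):
    t = 0.0
    for i, (threshold, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i+1][0] if i+1 < len(BRACKETS) else income
        t += rate * max(0, min(income, upper) - threshold)
    return t

print tax(25000)     # 0.0: the $25k earner who currently pays about $1200
print tax(100000)    # 27250.0: 35% of $55k plus 40% of $20k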

On a dollar basis, the comparison looks like:

Translating that to a percentage of income, it looks like:

Not pleasant, I’m sure, on the dual-$80k families in western Sydney who are just scraping by and all, but I don’t think it’s crazy unreasonable.

But the real win comes when you look at the marginal rates:

Rather than seven levels of marginal rate, there’s just three; and none of them are regressive — ie, you stop having cases like someone earning $30,000 paying 21% on their additional dollar of income versus someone earning $22,000 paying 29% on their extra dollar. At least philosophically and theoretically that’s a good thing. In practice, I’m not sure how much of a difference it makes:

There’s spikes at both the $80,000 and $35,000 points which involve 8% and 15% increases in the nominal tax rates respectively, which I think is mostly due to people transferring passive income to family members who either don’t work or have lower paid jobs — if you earn a $90,000 salary better to assign the $20,000 rental income from your units to your kid at uni and pay 15% or 30% tax on it, than assign it to yourself and pay 38%, especially if you then just deposit it back into your family trust fund either way. The more interesting spikes are those around the $20,000 and $30,000 points, but I don’t really understand why those are shaped the way they are, and couldn’t really be sure that they’d smooth out given a simpler set of marginal rates.

Anyway, I kind-of thought it was interesting that it’s not actually very hard to come up with a dramatically simpler set of tax rates, that’s both not overly punitive and gives about the same additional revenue as the mooted levy increase.

(As a postscript, I found it particularly interesting to see just how hard it is to get meaningful revenue increases by tweaking the top tax rate; there’s only about $33B being taxed at that rate, so you have to bump the rate by 3% to get a bump of $1B in overall revenue, which is either pretty punitive or pretty generous before you’re making any useful difference at all. I chose to leave it matching the current income level and rate; longer term, I think making the levels something like $25k, $100k, $200k, and getting the percentages to rounder figures (35%, 40%, 45%?) would probably be sensible. If I really had my druthers, I’d rather see the rate be 35% from $0-$100,000, and have the government distribute, say, $350 per fortnight to everyone — if you earn over $26k, that’s just the equivalent of a tax free threshold of $26k; if you don’t, it’s a helpful welfare payment, whether you’re a student, disabled, temporarily unemployed, retired, or something else like just wanting to take some time off to do charity work or build a business.)

On Employment

Okay, so it turns out an interesting, demanding, and rewarding job isn’t as compatible as I’d naively hoped with all the cool things I’d like to be doing as hobbies (like, you know, blogging more than once a year, or anything substantial at all…) Thinking it’s time to see if there’s any truth in the whole fitness fanatic thing of regular exercise helping…

Bits

Yikes. Been a while. I can’t think of anything intelligent to blog, so some linky tidbits instead:

  • rpm 4.10 includes “~” as a special versioning character, just like dpkg has for ages. Holds a special place in my heart since I did the original patch for dpkg a bit over 11 years ago now. (And looking at that bug history now, it appears it was accepted for my birthday a year later, awww). “ls” from coreutils also supports it (they borrowed the code from dpkg, based on a copyright disclaimer request I’ve finally gotten around to replying to), though I don’t think it’s actually documented.

  • Read a couple of interesting takes on the Facebook IPO: one from a “blue-collar hedge fund manager” (yeah, riiight), who blamed NASDAQ for not handling trades properly on opening day, then essentially forcing traders to sell their stock immediately to be compensated for trades not executed properly; and one from Nanex via ZeroHedge with an animation showing that NASDAQ was not actually making a market in Facebook stock for an extended period (claiming offers to buy for $43 and offers to sell for $42.99, but not executing them and clearing them out), screwing up the other exchanges they connect to and the high-frequency algorithmic traders that use them. To me, Google’s reverse-auction IPO that tried to ensure there wasn’t a day-one stock price “pop” just seems better all the time…

  • Like I needed more reasons not to be impressed by US politics.

  • SpaceX has been doing a pretty amazing demo: correctly handled launch abort, quick turnaround on the fix; successful launch; successful delivery of payload to the ISS. Get it back down again, and they’ve really got something. Also impressive: per Wikipedia at least, “As of May 2012, SpaceX has operated on total funding of approximately one billion dollars in its first ten years of operation”, 80% of which has come from payments by customers (“progress payments on long-term launch contracts and development contracts”). That is, about the same amount as what Facebook paid for Instagram and its 13 employees…

  • Via Andrew Pollock’s g+ feed, Leap Motion looks nifty.

  • I rode down to linux.conf.au in Ballarat this year — about 5000 kilometres of awesomeness. Here’s a photo from the route I took back, which I call “Hay, Hell and Booligal”:

Owning the new now

Things are different now. That’s certain.

Or at least that’s what one of the marketing sites for my new employer has to say.

Back in March I started at Red Hat’s Brisbane office working in release engineering (or the “Release Configuration Management” team). Short summary: it’s been pretty fun so far.

Googling just now for something to link that provides some sort of context, I came upon a video with my boss (John Flanagan) and one of my colleagues (Jesse Keating) — neither of whom I’ve actually met yet — giving a talk to the ACM chapter at Northeastern University in Boston. (It’s an hour long, and doesn’t expect much background knowledge of Linux; but doesn’t go into anything in any great depth either)

My aim in deciding to go job hunting late last year was to get a large change of scenery and get to work with people who understood what I was doing — it eventually gets a bit old being a black box where computer problems go in, solutions come out, and you can only explain what happens in between with loose analogies before seeing eyes glaze over. Corporate environment, Fedora laptop, enterprise customers, and a zillion internal tools that are at least new to me, certainly counts as a pretty neat change of scenery; and I think I’ve now got about five layers of technical people between me and anyone who doesn’t have enough technical background to understand what I do on the customer side. Also, money appears in my bank account every couple of weeks, without having to send anyone a bill! It’s like magic!

The hiring process was a bit odd — mostly, I gather, because while I applied for an advertised position, the one I ended up getting was something that had been wanted for a while, but hadn’t actually had a request open. So I did a bunch of interviews for the job I was applying for, then got redirected to the other position, and did a few interviews for that without either me or the interviewers having a terribly clear idea what the position would involve. (I guess it didn’t really help that my first interview, which was to be with my boss’s boss, got rearranged because he couldn’t make it in due to water over the roads, and then Brisbane flooded; that the whole point of the position is that they didn’t have anyone working in that role closer than the Czech Republic is probably also a factor…)

As it’s turned out, that’s been a pretty accurate reflection of the role: I’ve mostly been setting my own priorities, which mostly means balancing between teaching myself how things work, helping out the rest of my team, and working with the bits of Red Hat that are local, or at least operate in compatible timezones. Happily, that seems to be working out fairly okay. (And at least the way I’ve been doing it isn’t much different to doing open source in general: “gah, this program is doing something odd. okay, find the source, see what it’s doing and why, and either (a) do something different to get what you want, or (b) fix the code. oh, and also, you now understand that program”)

As it turned out, that leads into the main culture shock I had on arriving: what most surprised me was actually the lack of differences compared to being involved in Debian — which admittedly might have been helped by a certain article hitting LWN just in time for my first day. “Ah, so that list is the equivalent of debian-devel. Good to know.” There’s a decent number of names that pop up that are familiar from Debian too, which is nice. Other comfortingly familiar first day activities were subscribing to more specific mailing lists, joining various IRC channels, getting my accounts set up and setting up my laptop. (Fedora was suggested, “not Debian” was recommended ;)

Not that everything’s the same — there’s rpm/yum versus dpkg/apt obviously, and there’s a whole morass of things to worry about working for a public company. But a lot of it fits into either “different to Debian, but not very” and “well, duh, Red Hat’s a for-profit, you have to do something like this, and that’s not a bad way of doing it”.

Hmm, not sure what else I can really talk about without at least running it by someone else to make sure it’s okay to talk about in public. I think there’s only a couple of things I’ve done so far that have gone via Fedora and are thus easy — the first was a quick python script to make publishing fedora torrents easier, and the other was a quick patch to the fedora buildsystem software to help support analytics. Not especially thrilling, though. I think Dennis is planning on throwing me into more Fedora stuff fairly soon, so hopefully that might change.

Pro-Linux bias at linux.conf.au

Reading through some of the comments from last year’s Linux Australia Survey, a couple struck me as interesting. One’s on Java:

linux.conf.au seems to have a bias against Java. Since Java is an open source language and has a massive open source infrastructure, this has not made a lot of sense to me. It seems that Python, Perl, PHP, Ruby are somehow superior from an open source perspective even though they are a couple of orders of magnitude less popular than Java in the real world. This bias has not changed since openjdk and I’m guessing is in the DNA of the selectors and committee members. Hence *LUG* has lost a lot of appeal to me and my team. It would be good if there was an inclusive open source conference out there…

and the other’s more general:

I appreciate LCA’s advocacy of open source, but I feel that a decoupling needs to be made in the mindshare between the terms “open source” and “Linux”. Unfortunately, for people involved in open source operating systems that aren’t Linux, we may feel slightly disenfranchised by what appears to be a hijacking of the term “open source” (as in “the only open source OS is linux” perception).

My impression is that bias is mostly just self-selection; people don’t think Java talks will get accepted, so don’t submit Java talks. I guess it’s possible that there’s a natural disconnect too: linux.conf.au likes to have deep technical talks on topics, and maybe there’s not much overlap between what’s already there and what people with deep technical knowledge of Java stuff find interesting, so they just go to other conferences.

That said, it seems like it’d be pretty easy to propose either a mini-conference for Java, or BSD, or non-traditional platforms in general (covering free software development for say BSD, JVM, MacOS and Windows) and see what happens. Especially given Greg Lehey’s on the Ballarat organising team (from what I’ve been told), interesting BSD related content seems like it’d have a good chance of getting in…

Silly testcase hacks

Martin Pool linked to an old post by Evan Miller on how writing tests could be more pleasant if you could just do the setup and teardown parts once, and (essentially) rely on backtracking to make sure it happens for every test. He uses a functional language for his example, and it’s pretty interesting.

But it is overly indented, and hey, I like my procedural code, so what about trying the same thing in Python? Here’s my go at it. The code under test was the simplest thing I could think of — a primality checker:

def is_prime(n):
    if n < 2: return False
    i = 2
    while i*i <= n:
        if n % i == 0: return False
        i += 1
    return True

My test function then tests a dozen numbers which I know are prime or not, returning True if is_prime got the right answer, and False otherwise. It makes use of a magic "branch" function to work out which number to test:

def prime_test(branch):
    if branch(True, False):
        n = branch(2,3,5,7,1231231)
        return is_prime(n)
    else:
        n = branch(1,4,6,8,9,10,12312312)
        return not is_prime(n)

In order to get all the tests run, we need a loop, so the test harness looks like:

for id, result in run_tests(prime_test):
    print id, result

(Counting up successes, and just printing the ids of failures, would make more sense, probably.) In any event, the output looks like:

[True, 2] True
[True, 3] True
[True, 5] True
[True, 7] True
[True, 1231231] True
[False, 1] True
[False, 4] True
[False, 6] True
[False, 8] True
[False, 9] True
[False, 10] True
[False, 12312312] True

Obviously all the magic happens in run_tests which needs to work out how many test cases there'll end up being, and provide the magic branch function which will give the right values. Using Python's generators to keep some state makes that reasonably straightforward, if a bit head-twisting:

def run_tests(test_fn):
    def branch(*options):
        # first time we've reached this branch point on this run: take option 0
        if len(idx) == state[0]:
            idx.append(0)
        n = idx[state[0]]
        # remember the deepest branch point that still has untried options
        if n+1 < len(options):
            state[1] = state[0]
        state[0] += 1
        vals.append(options[n])
        return options[n]

    idx = []   # idx[i] is which option to take at the i'th branch point
    while True:
        state, vals = [0, None], []   # [current branch depth, deepest unexhausted point]
        res = test_fn(branch)
        yield (vals, res)
        if state[1] is None: break
        # advance the deepest unexhausted branch point, forget everything deeper
        idx[state[1]] += 1
        idx[state[1]+1:] = []
This is purely a coding optimisation -- any setup and teardown in prime_test is performed each time, there's no caching. I don't think there'd be much difficulty writing the same thing in C or similar either -- there's no real use being made of laziness or similar here -- I'm just passing a function that happens to have state around rather than a struct that happens to include a function pointer.
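As an illustration (a made-up example, not from Evan's post), a test that needs per-case setup and teardown just does it inline, and every case run_tests generates goes through the whole function again:

import os, tempfile

def file_roundtrip_test(branch):
    # setup: runs afresh for every case run_tests generates
    fd, path = tempfile.mkstemp()
    try:
        data = branch("", "hello", "x" * 4096)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read() == data
    finally:
        os.unlink(path)            # teardown: also runs for every case

for ids, ok in run_tests(file_roundtrip_test):
    print ids, ok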

Anyway, kinda nifty, I think!

(Oh, this is also inspired by some of the stuff Clinton was doing with abusing fork() to get full coverage of failure cases for code that uses malloc() and similar, by using LD_PRELOAD)

Silly hacks

One thing that keeps me procrastinating about writing programs I have is doing up a user interface for them. It just seems like so much hassle writing GUI code or HTML, and if I just write for the command line, no one else will use it. Of course, most of the reason I don’t mind writing for the command line is that interaction is so easy, and much of that is thanks to the wonders of “printf”. But why not have a printf for GUIs? So I (kinda) made one:

f, n = guif.guif("%t %(edit)250e \n %(button)b",
                 "Enter some text", "", "Press me!")

In theory, you can specify widget sizes using something like “%10,12t” to get a text box with a width of 10 and a height of 12, but it doesn’t seem to actually work at the moment, and might be pixel based instead of character based, which I’m not sure is a win. I was figuring you could say “%-t” for left aligned, and “%+t” for right aligned; and I guess you could do “%^t” for top and “%_t” for bottom alignment. I’ve currently just got it doing a bunch of rows laid out separately — you’d have to specify explicit widths to get things lined up; but the logical thing to do would be to use “\t” to automatically align things. It also doesn’t handle literals inside the format string, so you can’t say “Enter some text: %e\n%b”.

At the moment the two objects it returns are the actual frame (f), and a dictionary of named elements (n) in case you want to reference them later (to pull out values, or to make buttons actually do something, etc). That probably should be merged into a single object though.

I guess what I’d like to be able to write is a complete program that creates and displays a simple gui with little more than:

#!/usr/bin/env python
import guif, wx

f = guif.guif("Enter some text: %(edit)250e \n %(done)b", 
    "", "Done!",
    stopon = ("done", wx.EVT_BUTTON))

print "Hey, you entered %s!" % f.edit.GetValue()
f.Close()

I figure that should be little enough effort over the command line equivalent to be pleasant:

#!/usr/bin/env python
import sys

print "Enter some text:",
x = sys.stdin.readline()
print "Hey, you entered %s!" % x.strip()

NBN Business Plan

“We will be releasing a 50-page document that summarises the NBN Co business case,” Ms Gillard said.

So the 50 page NBN Co business case summary came out yesterday. It runs to 36 pages, including the table of contents.

According to the document, they’re going to be wholesale providers to retail ISPs/telcos, and be offering a uniform wholesale price across the country (6.3). There’ll be three methods of delivery — fibre, wireless and satellite, though I didn’t notice any indication of whether people would pay more for content over satellite than over fibre. They’re apparently expecting to undercut the wholesale prices for connectivity offered today (6.3.1). They’ve pulled some “market expectation” data from Alcatel/Lucent which has a trend line of exponential increase in consumer bandwidth expectations up to 100Mb/s in 2015 or so, and 1Gb/s around 2020 for fixed broadband — and a factor of 100 less for wireless broadband (6.3.2, chart 1). Contrary to that expectation, their own “conservative” projections A1 and A2 (6.3.2, exhibit 2) have about 50Mb/s predicted for 2015, and 100Mb/s for 2020 — with A2 projecting no growth in demand whatsoever after 2020, and A1 hitting 1Gb/s a full 20 years later than the Alcatel/Lucent expectations.

Even that little growth in demand is apparently sufficient to ensure the NBN Co’s returns will “exceed the long term government bond rate”. To me, that seems like they’re assuming that the market rates for bandwidth in 2015 or 2020 (or beyond) will be comparable to rates today — rather than exponentially cheaper. In particular, while the plan goes on to project significant increase in demand for data usage (GB/month) in addition to speed (Mb/s), there’s no indication of how the demand for data and speed gets translated into profits over the fifteen year timespan they’re looking at. By my recollection, 15 years ago data prices in .au were about 20c/MB, compared to maybe 40c/GB ($60/mo for 150GB on Internode small easy plan) today.

Given NBN Co will be a near-monopoly provider of bandwidth, and has to do cross-subsidisation for rural coverage (and possibly wireless and satellite coverage as well), trying to inflate the cost per GB seems likely to me: getting wires connected up to houses is hard (which is why NBN Co is budgeting almost $10B in payments to Telstra to avoid it where possible), and competing with wires with wireless is hard too (see the 100x difference in speed mentioned earlier), so you’re going to end up paying NBN Co whatever they want you to pay them.

However they plan on managing it, they’re expecting to be issuing dividends from 2020 (6.7), that will “repay the government’s entire investment by 2034”. That investment is supposedly $27.1B, which would mean at least about $2B per year in profits. For comparison, Telstra’s current profits (across all divisions, and known as they are for their generous pricing) are just under $4B per year. I don’t think inflation helps there, either; and there’s also the other $20B or so of debt financing they’re planning on that they’ll have to pay back, along with the 12-25% risk premium they’re expecting to have to pay (6.8, chart 5).

I’m not quite sure I follow the “risk premium” analysis — for them to default on the debt financing, as far as I can see, NBN Co would have to go bankrupt, which would require selling their assets, which would be all that fibre and access to ducts and whatnot: effectively meaning NBN Co would be privatised, with first dibs going to all the creditors. I doubt the government would accept that, so it seems to me more likely that they’d bail out NBN Co first, and there’s therefore very, very little risk in buying NBN Co debt compared to buying Australian government debt, but a 12-25% upside thrown in anyway.

As a potential shareholder, this all seems pretty nice; as a likely customer, I’m not really terribly optimistic.

Rocket Tracking

While I was still procrastinating doing the altosui and Google Earth mashup I mentioned last post, Keith pointed out that Google Maps has a static API, which means it’s theoretically possible to have altosui download maps of the launch site before you leave, then draw on top of them to show where your rocket’s been.

The basic API is pretty simple — you get an image back centred on a given latitude and longitude; you get to specify the image size (up to 640×640 pixels), and a zoom level. A zoom level of 0 gives you the entire globe in a 256×256 square, and each time you increase the zoom level you turn each pixel into four new ones. Useful zoom levels seem to be about 15 or so. But since it’s a Mercator projection, you don’t have to zoom in as far near the poles as you do around the equator — which means the “or so” is important, and varies depending on the latitude.
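For reference, a request ends up looking something like the following; this is a sketch from memory, the parameter names may have changed since, newer versions of the API want a key, and the coordinates are just a made-up spot near Brisbane:

def map_url(lat, lng, zoom=15):
    return ("http://maps.google.com/maps/api/staticmap?"
            "center=%.6f,%.6f&zoom=%d&size=640x640&maptype=hybrid&sensor=false"
            % (lat, lng, zoom))

print "wget -O map.png '%s'" % map_url(-27.5, 152.5)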

Pulling out the formula for the projection turns out to be straightforward — though as far as I can tell, it’s not actually documented. Maybe people who do geography stuff don’t need docs to work out how to convert between lat/long and pixel coordinates, but I’m not that clever. Doing a web search didn’t seem to offer much certainty either; but decoding the javascript source turned out to not be too hard. Formulas turn out to be (in Java):

Point2D.Double latlng2coord(double lat, double lng, int zoom) {
    double scale_x = 256/360.0 * Math.pow(2, zoom);
    double scale_y = 256/(2.0*Math.PI) * Math.pow(2, zoom);
    Point2D.Double res = new Point2D.Double();

    res.x = lng*scale_x;
    double e = Math.sin(Math.toRadians(lat));
    e = limit(e, -1+1.0E-15, 1-1.0E-15);  // clamp e just inside (-1,1) so the log below can't blow up at the poles
    res.y = 0.5*Math.log((1+e)/(1-e))*-scale_y;
    return res;
}

That gives you an absolute coordinate relative to the prime meridian at the equator, so by the time you get to zoom level 15, you’ve got an 8 million pixel by 8 million pixel coordinate system, and you’re only ever looking at a 640×640 block of that at a time. Fortunately, you also know the lat/long of the center pixel of whatever tile you’re looking at — it’s whatever you specified when you requested it.

The inverse function of the above gives you the latitude and longitude for centrepoints of adjacent maps, which then lets you tile the images to display a larger map, and choosing a consistent formula for the tiling lets you download the right map tiles to cover an area before you leave, without having to align the map tiles exactly against your launch site coordinates.
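For what it’s worth, the inverse mapping is just as short; here’s a sketch in Python rather than Java, assuming the same 256 pixel tiles and zoom conventions as above:

import math

def coord2latlng(x, y, zoom):
    # undo latlng2coord: pixel offsets from the prime meridian/equator back to degrees
    scale_x = 256 / 360.0 * 2 ** zoom
    scale_y = 256 / (2.0 * math.pi) * 2 ** zoom
    lng = x / scale_x
    e = math.tanh(-y / scale_y)        # inverse of 0.5*log((1+e)/(1-e)), ie atanh
    lat = math.degrees(math.asin(e))
    return lat, lng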

In Java, the easy way to deal with that seems to be to set up a JScrollable area, containing a GridBagLayout of the tiles, each of which is an image set as the icon of a JLabel. Using the Graphics2D API lets you draw lines and circles and similar on the images, and voila, you have a trace:

Currently the “UI” for downloading the map images is that it’ll print out some wget lines on stdout, and if you run them, next time you run altosui for that location, you’ll get maps. (And in the meantime, you’ll just get a black background)

It’s not rocket surgery

…except when it is:

Anyhoo, somehow or other I’m now a Tripoli certified rocket scientist, with some launches and data to show for it:

Bunches of fun — and the data collection gives you an excuse to relive the flight over and over again while you’re analysing it. Who couldn’t love that? Anyway, as well as the five or six rocket flights I’ve done without collecting data (back in 2007 with a Rising Star, and after DebConf 10 at Metra with a Li’l Grunt), I’ve now done three flights on my Little Dog Dual Deploy (modified so it can be packed slightly more efficiently — it fits in my bag that’s nominally ok for carry-on, and in my bike bag) all of which came back with data. I’ve done posts on the Australian Rocketry Forums on the first two flights and the third flight. There’s also some video of the third flight:

But anyway! One of the things rocketeering focusses on as far as analysis goes is the motor behaviour — how much total impulse it provides, average thrust, burn time, whether the thrust is even over the burn time or if it peaks early or late, and so on. Commercial motors tend to come with stats and graphs telling you all this, and there are XML files you can feed into simulators that will model your rocket’s behaviour. All very cool. However, a lot of the guys at the Metra launch make their own motors, and since it tends to be way more fun to stick your new motor in a rocket and launch it than to put it on a testing platform, they only tend to have guesses at how it performs rather than real data. But Keith mentioned it ought to be possible to derive the motor characteristics from the flight data (you need to subtract off gravity and drag from the sensed acceleration, then divide out the mass to get force, ideally taking into account the fact that the motor is losing mass as it burns, that drag varies according to speed and potentially air pressure, and gravity may not be exactly aligned with your flight path), and I thought that sounded like a fun thing to do.
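Roughly, the arithmetic looks like the following sketch; none of this is altosui code, the constants are illustrative, and a real version would use time-varying mass and measured air density:

G = 9.81        # m/s^2
RHO = 1.225     # kg/m^3, sea level air density

def motor_thrust(accel, velocity, mass, cd, area):
    # accel is the rocket's actual vertical acceleration during boost (m/s^2)
    drag = 0.5 * RHO * cd * area * velocity * velocity   # opposes the ascent
    return mass * (accel + G) + drag                     # Newton's second law, rearranged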

Unfortunately when I looked at my data (which comes, of course, from Bdale and Keith’s Telemetrum board and AltOS software), it turned out there was a weird warble in my acceleration data while it was coasting — which stymied my plan to calculate drag, and raised a question about the precision of the acceleration under boost data too. After hashing around ideas on what could be causing it on IRC (airframe vibration? board not tied down? wind?), I eventually did the sensible thing and tried recording data while it was sitting on the ground. Result — exactly the same: weird warbling in the accel data even when it’s just sitting there. As it turned out, it was a pretty regular warble too — basically a square wave with a wavelength of 100ms. That seemed to tie in with the radio — which was sending out telemetry packets ten times a second between launch and apogee. Of course, there wasn’t any reason for the radio to be influencing the accelerometer — they’re even operating off separate voltages (the accelerometer being the one 5V part on the board).

Hacking the firmware to send out telemetry packets at a different rate confirmed the diagnosis though — the accelerometer was reporting lower acceleration while the radio’s sending data. Passing the buck to Keith, it turned out that being the one 5V part was a problem — the radio was using enough current to cause the supply voltage to drop slightly, which caused all the other sensors to scale proportionally (and thus still be interpreted correctly), but the accelerometer kept operating at 5V leading to a higher output voltage which gets interpreted as lower acceleration. One brief idea was to try comparing the acceleration sensor to the 1.25V generated by the cpu/radio chip, but unfortunately it gets pulled down worse than the 3V does.

Fortunately this happens on more than just my board (though not all of them), so hopefully someone’ll think up a fix. I’m figuring I’ll just rely on cleaning up the data in post-processing — since it’s pretty visible and regular, that shouldn’t be too hard.

Next on the agenda though is trying some real-time integration with Google Earth — basically letting altosui dump telemetry data as normal, but also watching the output file for updates, running a separate altosui process to generate a new KML file from it, which Google Earth is watching and displaying in turn. I think I’ve got all the pieces for that pretty ready, mostly just waiting for next weekend’s QRS launch, and crossing my fingers my poor HP Mini 2133 can handle the load. In any event, I hacked up some scripts to simulate the process using data from my third flight, and it seemed to work ok. Check out the recording:

BTW, if that sounds like fun (and if it doesn’t, you’re doing it wrong), now would probably be a good time to register for lca and sign up to the rocketry miniconf — there’s apparently still a couple of days left before early bird prices run out.

Progressive taxation

I saw a couple of things over the last couple of days about progressive taxation — one was a Malcolm Gladwell video on youtube about how a top tax rate of 91% is awesome and Manhattan Democrats are way smarter than Floridian Republicans; the other an article by Greg Mankiw in the New York Times about how he wants to write articles, but is disinclined to because if he does, Obama will steal from his kids.

Gladwell’s bit seems like almost pure theatre to me — the only bit of data is that during and after WW2 the US had a top marginal tax rate of just over 90% on incomes of $200,000 (well, except that WW2 and the debt the US accrued in fighting it isn’t actually mentioned). Gladwell equates that to a present day individual income of two million a year, which seems to be based on the official inflation rate; comparing it against median income at the time (PDF) gives a multiplier of 13.5 ($50,000/$3,700) for a top-tax bracket household income of $5.4 million ($2.7 million individual). I find it pretty hard to reason about making that much money, but I think it’s interesting to notice that the tax rate of households earning 5x the median income (ie $250,000 now, $18,500 then) is already pretty similar: 33% now, 35% then. Of course in 1951 the US was paying off debt, rather than accruing it… (I can’t find a similar table of income tax rates or median incomes for Australia; but our median household income is about $67,000 now and a household earning $250,000 a year would have a marginal rate between 40% and 45%, and seems to have been about 75% for a few years after WW2)

Meanwhile, Mankiw’s point comes down to some simple compound interest maths: getting paid $1000 now and investing it at 8% to give to your kids in 30 years would result in: (1) a $10,000 inheritance if it weren’t taxed, or (2) a $1,000 inheritance after income tax, dividend tax and estate tax — so effectively those taxes add up to a 90% tax rate anyway. If you’re weighing up whether to spend the money now or save it for your kids, you get two other options: (3) spend $523 on yourself, or (4) spend $1000 through your company. An inflation rate of just 2.2% (the RBA aims for between 2% and 3%) says (3) is better than (2), and if you want to know why evil corporations are so popular, comparing (3) and (4) might give it away…

An approach to avoiding that problem is switching to consumption taxes like the GST instead of income taxes — so you discourage people spending money rather than earning it. At first glance that doesn’t make a difference: there’s no point earning money if you can’t spend it. But it does make a huge difference to savings. For Mankiw’s example: 47.7% income tax ($1000 – $477 = $523) equates to 91.2% consumption tax (as compared to 10% GST); but your kids get $10,000 so can buy $5,230 worth of goods and still afford the additional $4,770 in taxes. As opposed to only getting $1,000 worth of goods without any consumption taxes.
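Spelling out the round numbers in that comparison:

print 1000 * 1.08 ** 30      # ~10,063: the "about $10,000" before any tax
print 1000 * (1 - 0.477)     # 523: spend it now, after 47.7% income tax
print 1000 / 1.912           # ~523: same purchasing power under a 91.2% consumption tax
print 10000 / 1.912          # ~5,230 of goods for the kids...
print 10000 - 10000 / 1.912  # ...and ~4,770 of consumption tax paid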

The other side of the coin is what happens to government revenues. In Mankiw’s example, the government would receive $477 in the first year’s tax return, $1,173 over the next thirty years (about $40 per year), and $571 when the funds are inherited for a total of $2,221. That would work out pretty much the same if the government instead sold 30-year treasury bonds to match that income, and then paid off that debt once it collected the consumption tax. Since 30-year US Treasuries are currently at 3.75%, that turns into $3,900 worth of debt after thirty years, which in turn leaves the government better off by $870. The improvement is due to the difference between the private return on saving (8%) versus the government’s cost of borrowing (3.75%).

Given the assumptions then, everyone wins: the parent, the kids, the government. It’s possible that would be the case in reality too; though it’s not certain. The main challenges are in the rates: if there’s a lot more saving going on (because it’s taxed less and thus more effective), then interest rates are liable to go down unless there’s a corresponding uptick in demand, which for interest rates means an uptick in economic activity. If Mankiw’s representative in being more inclined to work more in that scenario, that’s at least a plausible outcome. Similarly, if there’s a lot more government borrowing going on (because their revenue is becoming more deferred), then their rates might rise. In the scenario above, a bond rate of 4.85% is the break-even point in terms of a single 91.2% consumption tax matching a 47.7% tax rate on income and dividends and a 35% inheritance tax.

Not worrying about taxing income makes a bunch of things easier: there’s no more worries about earned income, versus interest income, versus superannuation income, versus dividend income, versus capital gains, versus fringe benefits, etc.

One thing it makes harder is having a progressive tax system — which is to say that people who are “worth” more are forced to contribute a higher share of their “worth” to government finances. With a progressive income tax, that means people who earn more pay more. With a progressive consumption tax, that would mean that people who spend more pay more — so someone buying discount soup might pay 10% GST (equivalent to 9.1% income tax), someone buying a wide screen tv might pay 50% (33% income tax) and someone buying a yacht might pay 150% (60% income tax). Because hey, if your biggest expenses are cans of soup, you probably can’t afford to contribute much to the government, but if you’re buying yachts…

One way to handle that would be to make higher GST rates kick in at higher prices — so you pay 10% for things costing up to $100, 50% for things costing up to $10000, and 150% for things costing more than that. The disadvantage there is the difference in your profit margin between selling something for $9,999 including 50% GST and $16,668 including 150% GST is $1.20, which is going to distort things. Why spend $60,000 on a nice car at 150% GST, if you can spend $9,999 on a basic car, $9,999 on electronics, $9,999 on other accessories, and $9,999 on labour to get them put together and end up with a nicer car, happier salesmen, and $20,000 in savings?

Another way to get a progressive consumption tax would be by doing tax refunds: everyone pays the highest rate when they buy stuff, but you then submit a return with your invoices, and get a refund. If you spend $20,000 on groceries over the year, at say 20% GST, then reducing your GST to 10% would be a refund of $1,667. If you spend $50,000 on groceries and a car, you might only get to reduce your GST to an average of 15%, for a refund of $2,090. If you spend $1,000,000 on groceries, a car, and a holiday home, you might be up to an average of 19.5% for a refund of just $4,170. Coming up with a formula that always gives you more dollars the more expenditure you report (so there’s no advantage to under reporting), but also applies a higher rate the more you spend (so it’s still progressive) isn’t terribly hard.
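For example, here’s one hypothetical formula with those properties; the rates and constant are made up and don’t match the examples above, but the refund always grows with reported spending while the average rate paid also keeps rising:

TOP, BASE, K = 0.20, 0.10, 50000.0

def avg_rate(spend):
    # climbs smoothly from 10% for small spenders towards the 20% top rate
    return TOP - (TOP - BASE) * K / (spend + K)

def refund(spend):
    # everyone pays TOP up front; refund the difference from their average rate
    return (TOP - avg_rate(spend)) * spend

print refund(20000), avg_rate(20000)        # ~1429 back, ~12.9% average
print refund(1000000), avg_rate(1000000)    # ~4762 back, ~19.5% average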

The downside is the paying upfront is harshest on the poorest: if you’re spending $2,000 a month on food it doesn’t help to know that $1,200 of that is 150% GST and you’ll get most of it back next year if you’re only earning $900 a month. But equally it wouldn’t be hard to have CentreLink offices just hand out $1,120 a month to anyone who asks (and provides their tax file number), and confidently expect to collect it back in GST pretty quickly. Having the “danger” be that you hand out $1,120 to someone who doesn’t end up spending $2,000 a month or more doesn’t seem terribly bad to me. And there’s no problem handing out $1,200 to someone making thousands a week, because you can just deduct it from whatever they were going to claim on their return anyway.

As I understand it, there’s not much problem with GST avoidance for three structural reasons: one is that at 10%, it’s just not that big a deal; another is that since it’s nationwide, avoiding it legally tends to involve other problems whether it be postage/shipping costs, delays, timezone differences, legal complexities or something else; and third is that because businesses get to claim tax credits for their purchases there’s paper trails at both ends meaning it’s hard to do any significant off-book work without getting caught. Increasing the rate substantially (from 10% to 150%) could end up encouraging imports — why buy a locally built yacht for $750,000 (150% GST) when you could buy it overseas for $360,000 (20% VAT say) and get it shipped here for $50,000? I don’t know if collecting GST at the border is a sufficiently solved problem to cope with that sort of incentive… On the other hand, having more people getting some degree of refund means it’s harder to avoid getting caught by the auditors if you’re not passing on the government’s tithe, so that’s possibly not too bad.

LCA Schedule

It appears the first draft of the linux.conf.au 2011 schedule (described by some as a thing of great beauty) is up as of this morning. Looks promising to me.

Of note:

  • There’s lots of electronics-related talks (Arduino miniconf, Rocketry miniconf, Lunar Numbat, Freeing Production, “Use the Force, Linus”, All Chips No Salsa, e4Meter, Growing Food with Open Source, Lightweight Messaging, Misterhouse, and the Linux Powered Coffee Roaster). If you count mesh telephony too and don’t count the TBD slot, you can spend every day but Wednesday ensconced in hardware-hacking talks of one sort or another.
  • There seems like reasonable female representation — Haecksen miniconf, LORE, HTML5 Video, Documentation, Intelligent Web, Incubation and Mentoring, Perl Best Practices, Project Managers, Growing Food with Open Source; so 7% of the miniconfs and 13% of the talks so far announced.
  • Speaking of oppressed minorities, there’s also a couple of talks about non-Linux OSes: pf and pfsync on OpenBSD, and HaikuOS. Neato.
  • Maybe it’s just me, but there seems to be a lot of “graphics” talks this year: GLSL, OptlPortal, Pixels from a Distance, X and the Future of Linux Graphics, HTML5 Video, Anatomy of a Graphics Driver; and depending on your point of view Print: The Final Frontier, Non-Visual Access, Can’t Touch This, and the X Server Development Process.
  • The cloud/virtualisation stuff seems low-key this year: there’s Freeing the Cloud, Roll Your Own Cloud, Virtual Networking Performance, Virtualised Network Bandwidth Control, and ACID in the Cloud (that somehow doesn’t include an acid rain pun in the abstract). Of course, there’s also the “Freedom in the Cloud” and “Multicore and Parallel Computing” miniconfs which are probably pretty on point, not to mention the Sysadmin and Data Storage miniconfs which could see a bunch of related talks too.

And a bunch of other talks too, obviously. What looks like eight two-hour tutorial slots are yet to be published, maybe six more talks to be added, and three more keynotes (or given the arrangement of blank slots, maybe two more talks and four more keynotes). Also, there’s the PDNS on Wednesday, Penguin Dinner on Thursday, both over the river. And then there’s Open Day on Saturday, and an as yet not completely organised rocket launch sometime too…

Some Lenny development cycle stats

I’ve been playing with some graphing tools lately, in particular Dan Vanderkam’s dygraphs JavaScript Visualization Library. So far I’ve translated the RC bug list (the “official” one, not the other one) into the appropriate format, generated some numbers for an LD50 equivalent for bugs, and on Wouter’s suggestion the buildd stats.

One of the nice things about the dygraphs library is it lets you dynamically play with the date range you’re interested in; and you can also apply a rolling average to smooth out some of the spikiness in the data. Using that to restrict the above graphs to the lenny development cycle (from etch’s release in April 2007 to lenny’s release in February 2009) gives some interesting stats; it’s worth remembering that the freeze started in late July 2008 (Debconf 8 was a couple of weeks later, in August 2008).

RC bugs first:

Not sure there’s a lot of really interesting stuff to deduce from that, but there’s a couple of interesting things to note. One is that before the freeze, there were some significant spikes in the bug count — July 2007, September 2007, November 2007, and April 2008, in particular; but after the freeze, the spikes above trend were very minor, both in size and duration. Obviously all of those are trivial in comparison to the initial spurt in bugs between April and June 2007, though. Also interesting is that by about nine months in, lenny had fewer RC bugs than the stable release it was replacing (etch) — and given that’s against a 22 month dev cycle, it’s only 40% of etch’s life as stable. Of course some of that may simply be due to a lack of accuracy in tracking RC bugs in stable; or a lack of accuracy in the RC bugcount software.

Quite a bit more interesting is the trend of the number of bugs (of all sorts — wishlist, minor, normal, RC, etc) filed each week — it varies quite a bit up until the freeze, but without any particular trend; but pretty much as soon as the freeze is announced it trends steadily downward until the release occurs, at which point there’s about half as many bugs being filed each week as there were before the freeze. And after lenny’s released it starts going straight back up. There’s a few possible explanations for the cause of that: it might be due to fewer changes being uploaded due to the freeze, and thus fewer bugs being filed; it might be due to people focussing on fixing bugs rather than finding them; it might be due to something completely unrelated.

A measure of development activity that I find intriguing is what I’m calling the “LD50” — the median number of days it takes a bug to be closed, or the “lethal dosage of development time for 50% of the bug population”. That’s not the same as a half life, because there’s not necessarily an exponential decay behaviour — I haven’t looked into that at all yet. But it’s a similar idea. Anyway, working out the LD50 for cohorts of bugs filed in each week brings out some useful info. In particular for bugs filed up until the lenny freeze, the median days until a fix ranged from as low as 40 days to up to 120 days; but when the freeze was declared, that shot straight up to 180 days. Since then it’s gradually dropped back down, but it’s still quite high. As far as I can tell, this feature was unique to the lenny release — previous releases didn’t have the same effect, at least to anywhere near that scale. As to the cause — maybe the bugs got harder to fix, or people started prioritising previously filed bugs (eg RC bugs), or were working on things that aren’t tracked in the BTS. But it’s interesting to note that was happening at the same time that fewer bugs were being filed each week — and indeed it suggests an alternative explanation for fewer bugs being filed each week: maybe people noticed that Debian bugs weren’t getting fixed as quickly, and didn’t bother reporting them as often.
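For the curious, the LD50 calculation is roughly the following; this assumes you’ve already pulled (filed, closed) date pairs out of the BTS, and still-open bugs would need separate handling:

from collections import defaultdict

def ld50_by_week(bugs):
    # bugs is a list of (filed_date, closed_date) datetime.date pairs
    cohorts = defaultdict(list)
    for filed, closed in bugs:
        cohorts[filed.isocalendar()[:2]].append((closed - filed).days)
    result = {}
    for week, ages in sorted(cohorts.items()):
        ages.sort()
        result[week] = ages[len(ages) // 2]   # median days until the bug was closed
    return result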

This is a look at the buildd “graph2”, which is each architecture’s percentage of (source) packages that are up to date, out of the packages actually uploaded on that architecture. (The buildd “graph” is similar, but does a percentage of all packages that are meant to be built on the architecture) Without applying the rolling average it’s a bit messy. Doing a rolling average over two weeks makes things much simpler to look at, even if that doesn’t turn out that helpful in this case:

Really the only interesting spots I can see in those graphs are that all the architectures except i386 and amd64 had serious variability in how up to date their builds were right up until the freeze — and even then there was still a bit of inconsistency just a few months before the actual release. And, of course, straight after both the etch and lenny releases, the proportion of up to date packages for various architectures drops precipitously.

Interestingly, comparing those properties to the current spot in squeeze’s development seems to indicate things are promising for a release: the buildd up-to-dateness for all architectures looks like it’s stabilised above 98% for a couple of months; the number of bugs filed each week has dropped from a high of 1250 to about 770 at the moment; and the LD50 has dropped from 170 days shortly after lenny’s freeze to just under 80 days currently (though that’s still quite a bit higher than the 40 days just before lenny’s freeze). The only downside is the RC bug count is still fairly high (at 550), though the turmzimmer RC count is a little better at only 300, currently.