Parental Leave

Two posts in one month! Woah!

A couple of weeks ago there was a flurry of stuff about the Liberal party’s Parental Leave policy (viz: 26 weeks at 100% of your wage, paid out of the general tax pool rather than by your employer, up to $150k), mostly due to a coalition backbencher coming out against it in the press (I’m sorry, I mean, due to “an internal revolt”, against a policy “detested by many in the Coalition”). Anyway, I haven’t had much cause to give it any thought beforehand — it’s been a policy since the 2010 election I think; but it seems like it might have some interesting consequences, beyond just being more money to a particular interest group. In particular, one of the things that doesn’t seem to me to get enough play in the whole “women are underpaid” part of the ongoing feminist, women-in-the-workforce revolution, is how much both the physical demands of pregnancy and being a primary caregiver justifiably diminish the contributions someone can make in a career. That shouldn’t count just the direct factors (being physically unable to work for a few weeks around birth, and taking a year or five off from working to take care of one or more toddlers, eg), but the less direct ones like being less able to commit to being available for multi-year projects or similar. There’s also probably some impact from the cross-over between training for your career and the best years to get pregnant — if you’re not going to get pregnant, you just finish school, start working, get more experience, and get paid more in accordance with your skills and experience (in theory, etc). If you are going to get pregnant, you finish school, start working, get some experience, drop out of the workforce, watch your skills/experience become out of date, then have to work out how to start again, at a correspondingly lower wage — or just choose a relatively low skill industry in the first place, and accept the lower pay that goes along with that. I don’t think either the baby bonus or the current Australian parental leave scheme has any affect on that, but I wonder if the Liberal’s Parental Leave scheme might. There’s three directions in which it might make a difference, I think. One is for women going back to work. Currently, unless your employer is more generous, you have a baby, take 16 weeks of maternity leave, and get given the minimum wage by the government. If that turns out to work for you, it’s a relatively easy decision to decide to continue being a stay at home mum, and drop out of the workforce for a while: all you lose is the minimum wage, so it’s not a much further step down. On the other hand, after spending half a year at your full wage, taking care of your new child full-time, it seems a much easier decision to go back to work than to be a full-time mum; otherwise you’ll have to deal with a potentially much lower family income at a time when you really could choose to go back to work. Of course, it might work out that daycare is too expensive, or that the cut in income is worth the benefits of a stay at home mum, but I’d expect to see a notable pickup in new mothers returning to the workforce around six months after giving birth anyway. That in turn ought to keep women’s skills more current, and correspondingly lift wages. Another is for employers dealing with hiring women who might end up having kids. Dealing with the prospect of a likely six-month unpaid sabbatical seems a lot easier than dealing with a valued employee quitting the workforce entirely on its own, but it seems to me like having, essentially, nationally guaranteed salary insurance in the event of pregnancy would make it workable for the employee to simply quit, and just look for a new job in six month’s time. And dealing with the prospect of an employee quitting seems like something employers should expect to have to deal with whoever they hire anyway. Women in their 20s and 30s would still have the disadvantage that they’d be more likely to “quit” or “take a sabbatical” than men of the same age and skillset, but I’m not actually sure it would be much more likely in that age bracket. So I think there’s a good chance there’d be a notable improvement here too, perhaps even to the point of practical equality. Finally, and possibly most interestingly, there’s the impact on women’s expectations themselves. One is that if you expect to be a mum “real soon now”, you might not be pushing too hard on your career, on the basis that you’re about to give it up (even if only temporarily) anyway. So, not worrying about pushing for pay rises, not looking for a better job, etc. It might turn out to be a mistake, if you end up not finding the right guy, or not being able to get pregnant, or something else, but it’s not a bad decision if you meet your expectations: all that effort on your career for just a few weeks pay off and then you’re on minimum wage and staying home all day. But with a payment based on your salary, the effort put into your career at least gives you six month’s worth of return during motherhood, so it becomes at least a plausible investment whether or not you actually become a mum “real soon now” or not. According to the 2010 tax return stats I used for my previous post, the gender gap is pretty significant: there’s almost 20% less women working (4 million versus 5 million), and the average working woman’s income is more than 25% less than the average working man’s ($52,600 versus $71,500). I’m sure there are better ways to do the statistics, etc, but just on those figures, if the female portion of the workforce was as skilled and valued as the male portion, you’d get a$77 billion dollar increase in GDP — if you take 34% as the proportion of that that the government takes, it would be a $26 billion improvement to the budget bottom line. That, of course, assumes that women would end up no more or less likely to work part time jobs than men currently are; that seems unlikely to me — I suspect the best that you’d get is that fathers would become more likely to work part-time and mothers less likely, until they hit about the same level. But that would result in a lower increase in GDP. Based on the above arguments, there would be increase the number of women in the workforce as well, though that would get into confusing tradeoffs pretty quickly — how many families would decide that a working mum and stay at home dad made more sense than a stay at home mum and working dad, or a two income family; how many jobs would be daycare jobs (counted as GDP) in place of formerly stay at home mums (not counted as GDP, despite providing similar value, but not taxed either), etc. I’m somewhat surprised I haven’t seen any support for the coalition’s plans along these lines anywhere. Not entirely surprised, because it’s the sort of argument that you’d make from the left — either as a feminist, anti-traditional-values, anti-stay-at-home-mum plot for a new progressive genderblind society; or from a pure technocratic economic point-of-view; and I don’t think I’ve yet seen anyone with lefty views say anything that might be construed as being supportive of Tony Abbott… But I would’ve thought someone on the right Bolt or Albrechtsen or Australia’s leading libertarian and centre-right blog or the Liberal party’s policy paper might have canvassed some of the other possible pros to the idea rather than just worrying about the benefits to the recipients, and how it gets paid for. In particular, the argument for any sort of payment like this shouldn’t be about whether it’s needed/wanted by the recipient, but how it benefits the rest of society. Anyway. Messing with taxes It’s been too long since I did an economics blog post… Way back when, I wrote fairly approvingly of the recommendations to simplify the income tax system. The idea being to more or less keep charging everyone the same tax rate, but to simplify the formulae from five different tax rates, a medicare levy, and a gradually disappearing low-income tax offset, to just a couple of different rates (one kicking in at$25k pa, one at $180k pa). The government’s adopted that in a half-hearted way — raising the tax free threshold to$18,200 instead of $25,000, and reducing but not eliminating the low-income tax offset. There’s still the medicare levy with its weird phase-in procedure, and there’s still four different tax rates. And there’s still all sorts of other deductions etc to keep people busy around tax time. Anyway, yesterday I finally found out that the ATO publishes some statistics on how many people there are at various taxable income levels — table 9 of the 2009-2010 stats are the best I found, anyway. With that information, you can actually mess around with the tax rules and see what affect it actually has on government revenue. Or at least what it would have if there’d been no growth since 2010. Anyway, by my calculations, the 2011-2012 tax rates would have resulted in about$120.7B of revenue for the government, which roughly matches what they actually reported receiving in that table ($120.3B). I think putting the$400M difference (or about $50 per taxpayer) down to deductions for dependants and similar that I’ve ignored seems reasonable enough. So going from there, if you followed the Henry review’s recommendations, dropping the Medicare levy (but not the Medicare surcharge) and low income tax offset, the government would end up with$117.41B instead, so about $3.3B less. The actual changes between 2011-2012 and 2012-2013 (reducing the LITO and upping the tax free threshold) result in$118.26B, which would have only been $2.4B less. Given there’s apparently a$12B fudge-factor between prediction and reality anyway, it seems a shame they didn’t follow the full recommendations and actually make things simpler.

Anyway, upping the Medicare levy by 0.5% seems to be the latest idea. By my count doing that and keeping the 2012-2013 rates otherwise the same would result in $120.90B, ie back to the same revenue as the 2011-2012 rates (though biased a little more to folks on taxable incomes of$30k plus, I think).

Personally, I think that’s a bit nuts — the medicare levy should just be incorporated into the overall tax rates and otherwise dropped, IMO, not tweaked around. And it’s not actually very hard to come up with a variation on the Henry review’s rates that both simplify tax levels, don’t increase tax on any individual by very much, and do increase revenue. My proposal would be: drop the medicare levy and low income tax offset entirely (though not the medicare levy surcharge or the deductions for dependants etc), and set the rates as: 35% for earnings above $25k, 40% for earnings above$80k, and 46.5% for earnings above $180k. That would provide government revenue of$120.92B, which is close enough to the same as the NDIS levy. It would cap the additional tax any individual pays to $2000 compared to 2012-13 rates, ie it wouldn’t increase the top marginal rate. It would decrease the tax burden on people with taxable incomes below$33,000 pa — the biggest winners would be people earning $25,000 who’d go from paying$1200 tax per year to nothing. The “middle class” earners between $35,000 and$80,000 would pay an extra $400-$500; higher income earners between $80,000 and$180,000 get hit between $500 and$2000, and anyone above $180,000 pays an extra$2,000. Everyone earning over about $34,000 but under about$400,000 would pay more tax than if the NDIS were just increased, everyone earning between $18,000 and$34,000 would be better off.

On a dollar basis, the comparison looks like:

Translating that to a percentage of income, it looks like:

Given NBN Co will be a near-monopoly provider of bandwidth, and has to do cross-subsidisation for rural coverage (and possibly wireless and satellite coverage as well), trying to inflate the cost per GB seems likely to me: getting wires connected up to houses is hard (which is why NBN Co is budgeting almost $10B in payments to Telstra to avoid it where possible), and competing with wires with wireless is hard too (see the 100x difference in speed mentioned earlier), so you’re going to end up paying NBN Co whatever they want you to pay them. However they plan on managing it, they’re expecting to be issuing dividends from 2020 (6.7), that will “repay the government’s entire investment by 2034”. That investment is supposedly$27.1B, which would mean at least about $2B per year in profits. For comparison, Telstra’s current profits (across all divisions, and known as they are for their generous pricing) are just under$4B per year. I don’t think inflation helps there, either; and there’s also the other 20B or so of debt financing they’re planning on that they’ll have to pay back, along with the 12-25% risk premium they’re expecting to have to pay (6.8, chart 5). I’m not quite sure I follow the “risk premium” analysis — for them to default on the debt financing, as far as I can see, NBN Co would have to go bankrupt, which would require selling their assets, which would be all that fibre and axis to ducts and whatnot: effectively meaning NBN Co would be privatised, with first dibs going to all the creditors. I doubt the government would accept that, so it seems to me more likely that they’d bail out NBN Co first, and there’s therefore very, very little risk in buying NBN Co debt compared to buying Australian government debt, but a 12-25% upside thrown in anyway. As a potential shareholder, this all seems pretty nice; as a likely customer, I’m not really terribly optimistic. Rocket Tracking While I was still procrastinating doing the altosui and Google Earth mashup I mentioned last post, Keith pointed out that Google Maps has a static API, which means it’s theoretically possible to have altosui download maps of the launch site before you leave, then draw on top of them to show where your rocket’s been. The basic API is pretty simple — you get an image back centred on a given latitude and longitude; you get to specify the image size (up to 640×640 pixels), and a zoom level. A zoom level of 0 gives you the entire globe in a 256×256 square, and each time you increase the zoom level you turn each pixel into four new ones. Useful zoom levels seem to be about 15 or so. But since it’s a Mercator projection, you don’t have to zoom in as far near the poles as you do around the equator — which means the “or so” is important, and varies depending on the the latitude. Pulling out the formula for the projection turns out to be straightforward — though as far as I can tell, it’s not actually documented. Maybe people who do geography stuff don’t need docs to work out how to convert between lat/long and pixel coordinates, but I’m not that clever. Doing a web search didn’t seem to offer much certainty either; but decoding the javascript source turned out to not be too hard. Formulas turn out to be (in Java):  Point2D.Double latlng2coord(double lat, double lng, int zoom) { double scale_x = 256/360.0 * Math.pow(2, zoom); double scale_y = 256/(2.0*Math.PI) * Math.pow(2, zoom); Point2D.Double res = new Point2D.Double(); res.x = lng*scale_x; double e = Math.sin(Math.toRadians(lat)); e = limit(e, -1+1.0E-15, 1-1.0E-15); res.y = 0.5*Math.log((1+e)/(1-e))*-scale_y; return res; }  That gives you an absolute coordinate relative to the prime meridian at the equator, so by the time you get to zoom level 15, you’ve got an 8 million pixel by 8 million pixel coordinate system, and you’re only ever looking at a 640×640 block of that at a time. Fortunately, you also know the lat/long of the center pixel of whatever tile you’re looking at — it’s whatever you specified when you requested it. The inverse function of the above gives you the the latitude and longitude for centrepoints of adjacent maps, which then lets you tile the images to display a larger map, and choosing a consistent formula for the tiling lets you download the right map tiles to cover an area before you leave, without having to align the map tiles exactly against your launch site coordinates. In Java, the easy way to deal with that seems to be to setup a JScrollable area, containing a GridBagLayout of the tiles, each of which are images set as the icon of JLabels. Using the Graphics2D API lets you draw lines and circles and similar on the images, and voila, you have a trace: Currently the “UI” for downloading the map images is that it’ll print out some wget lines on stdout, and if you run them, next time you run altosui for that location, you’ll get maps. (And in the meantime, you’ll just get a black background) It’s not rocket surgery …except when it is: Anyhoo, somehow or other I’m now a Tripoli certified rocket scientist, with some launches and data to show for it: Bunches of fun — and the data collection gives you an excuse to relive the flight over and over again while you’re analysing it. Who couldn’t love that? Anyway, as well as the five or six rocket flights I’ve done without collecting data (back in 2007 with a Rising Star, and after DebConf 10 at Metra with a Li’l Grunt), I’ve now done three flights on my Little Dog Dual Deploy (modified so it can be packed slightly more efficiently — it fits in my bag that’s nominally ok for carry-on, and in my bike bag) all of which came back with data. I’ve done posts on the Australian Rocketry Forums on the first two flights and the third flight. There’s also some video of the third flight: But anyway! One of the things rocketeering focusses on as far as analysis goes is the motor behaviour — how much total impulse it provides, average thrust, burn time, whether the thrust is even over the burn time or if it peaks early or late, and so on. Commercial motors tend to come with stats and graphs telling you all this, and there are XML files you can feed into simulators that will model your rocket’s behaviour. All very cool. However, a lot of the guys at the Metra launch make their own motors, and since it tends to be way more fun to stick your new motor in a rocket and launch it than to put it on a testing platform, they only tend to have guesses at how it performs rather than real data. But Keith mentioned it ought to be possible to derive the motor characteristics from the flight data (you need to subtract off gravity and drag from the sensed acceleration, then divide out the mass to get force, ideally taking into account the fact that the motor is losing mass as it burns, that drag varies according to speed and potentially air pressure, and gravity may not be exactly aligned with your flight path), and I thought that sounded like a fun thing to do. Unfortunately when I looked at my data (which comes, of course, from Bdale and Keith’s Telemetrum board and AltOS software), it turned out there was a weird warble in my acceleration data while it was coasting — which stymied my plan to calculate drag, and raised a question about the precision of the acceleration under boost data too. After hashing around ideas on what could be causing it on IRC (airframe vibration? board not tied down? wind?), I eventually did the sensible thing and tried recording data while it was sitting on the ground. Result — exactly the same: weird warbling in the accel data even when it’s just sitting there. As it turned out, it was a pretty regular warble too — basically a square wave with a wavelength of 100ms. That seemed to tie in with the radio — which was sending out telemetry packets ten times a second between launch and apogee. Of course, there wasn’t any reason for the radio to be influencing the accelerometer — they’re even operating off separate voltages (the accelerometer being the one 5V part on the board). Hacking the firmware to send out telemetry packets at a different rate confirmed the diagnosis though — the accelerometer was reporting lower acceleration while the radio’s sending data. Passing the buck to Keith, it turned out that being the one 5V part was a problem — the radio was using enough current to cause the supply voltage to drop slightly, which caused all the other sensors to scale proportionally (and thus still be interpreted correctly), but the accelerometer kept operating at 5V leading to a higher output voltage which gets interpreted as lower acceleration. One brief idea was to try comparing the acceleration sensor to the 1.25V generated by the cpu/radio chip, but unfortunately it gets pulled down worse than the 3V does. Fortunately this happens on more than just my board (though not all of them), so hopefully someone’ll think up a fix. I’m figuring I’ll just rely on cleaning up the data in post-processing — since it’s pretty visible and regular, that shouldn’t be too hard. Next on the agenda though is trying some real-time integration with Google Earth — basically letting altosui dump telemetry data as normal, but also watching the output file for updates, running a separate altosui process to generate a new KML file from it, which Google Earth is watching and displaying in turn. I think I’ve got all the pieces for that pretty ready, mostly just waiting for next weekend’s QRS launch, and crossing my fingers my port HP Mini 2133 can handle the load. In any event, I hacked up some scripts to simulate the process using data from my third flight, and it seemed to work ok. Check out the recording: BTW, if that sounds like fun (and if it doesn’t, you’re doing it wrong), now would probably be a good time to register for lca and sign up to the rocketry miniconf — there’s apparently still a couple of days left before early bird prices run out. Progressive taxation I saw a couple of things over the last couple of days about progressive taxation — one was a Malcolm Gladwell video on youtube about how a top tax rate of 91% is awesome and Manhattan Democrats are way smarter than Floridian Republicans; the other an article by Greg Mankiw in the New York Times about how he wants to write articles, but is disinclined too because if he does, Obama will steal from his kids. Gladwell’s bit seems like almost pure theatre to me — the only bit of data is that during and after WW2 the US had a top marginal tax rate of just over 90% on incomes of200,000 (well, except that WW2 and the debt the US accrued in fighting it isn’t actually mentioned). Gladwell equates that to a present day individual income of two million a year, which seems to be based on the official inflation rate; comparing it against median income at the time (PDF) gives a multiplier of 13.5 ($50,000/$3,700) for a top-tax bracket household income of $5.4 million ($2.7 million individual). I find it pretty hard to reason about making that much money, but I think it’s interesting to notice that the tax rate of households earning 5x the median income (ie $250,000 now,$18,500 then) is already pretty similar: 33% now, 35% then. Of course in 1951 the US was paying off debt, rather than accruing it… (I can’t find a similar table of income tax rates or median incomes for Australia; but our median household income is about $67,000 now and a household earning$250,000 a year would have a marginal rate between 40% and 45%, and seems to have been about 75% for a few years after WW2)

Meanwhile, Mankiw’s point comes down to some simple compound interest maths: getting paid $1000 now and investing it at 8% to give to your kids in 30 years would result in: (1) a$10,000 inheritance if it weren’t taxed, or (2) a $1,000 inheritance after income tax, dividend tax and estate tax — so effectively those taxes add up to a 90% tax rate anyway. If you’re weighing up whether to spend the money now or save it for your kids, you get two other options: (3) spend$523 on yourself, or (4) spend $1000 through your company. An inflation rate of just 2.2% (the RBA aims for between 3% and 4%) says (3) is better than (2), and if you want to know why evil corporations are so popular, comparing (3) and (4) might give it away… An approach to avoiding that problem is switching to consumption taxes like the GST instead of income taxes — so you discourage people spending money rather than earning it. At first glance that doesn’t make a difference: there’s no point earning money if you can’t spend it. But it does make a huge difference to savings. For Mankiw’s example: 47.7% income tax ($1000 – $477 =$523) equates to 91.2% consumption tax (as compared to 10% GST); but your kids get $10,000 so can buy$5,230 worth of goods and still afford the additional $4,770 in taxes. As opposed to only getting$1,000 worth of goods without any consumption taxes.

The other side of the coin is what happens to government revenues. In Mankiw’s example, the government would receive $477 in the first year’s tax return,$1,173 over the next thirty years (about $40 per year), and$571 when the funds are inherited for a total of $2,221. That would work out pretty much the same if the government instead sold 30-year treasury bonds to match that income, and then paid off that debt once it collected the consumption tax. Since US Treasury’s are currently worth 3.75% at 30 years at the moment, that turns into$3,900 worth of debt after thirty years; which in turn leaves the government better off by $870. The improvement is due to the difference between the private return on saving (8%) versus the government’s cost of borrowing (3.75%). Given the assumptions then, everyone wins: the parent, the kids, the government. It’s possible that would be the case in reality too; though it’s not certain. The main challenges are in the rates: if there’s a lot more saving going on (because it’s taxed less and thus more effective), then interest rates are liable to go down unless there’s a corresponding uptick in demand, which for interest rates means an uptick in economic activity. If Mankiw’s representative in being more inclined to work more in that scenario, that’s at least a plausible outcome. Similarly, if there’s a lot more government borrowing going on (because their revenue is becoming more deferred), then their rates might rise. In the scenario above, bond rates of 4.85% is the break even point in terms of a single 91.2% consumption tax matching a 47.7% tax rate on income and dividends and a 35% inheritance tax. Not worrying about taxing income makes a bunch of things easier: there’s no more worries about earned income, versus interest income, versus superannuation income, versus dividend income, versus capital gains, versus fringe benefits, etc. One thing it makes harder is having a progressive tax system — which is to say that people who are “worth” more are forced to contribute a higher share of their “worth” to government finances. With a progressive income tax, that means people who earn more pay more. With a progressive consumption tax, that would mean that people who spend more pay more — so someone buying discount soup might pay 10% GST (equivalent to 9.1% income tax), someone buying a wide screen tv might pay 50% (33% income tax) and someone buying a yacht might pay 150% (60% income tax). Because hey, if your biggest expenses are cans of soup, you probably can’t afford to contribute much to the government, but if you’re buying yachts… One way to handle that would be to make higher GST rates kick in at higher prices — so you pay 10% for things costing up to$100, 50% for things costing up to $10000, and 150% for things costing more than that. The disadvantage there is the difference in your profit margin between selling something for$9,999 including 50% GST and $16,668 including 150% GST is$1.20, which is going to distort things. Why spend $60,000 on a nice car at 150% GST, if you can spend$9,999 on a basic car, $9,999 on electonics,$9,999 on other accessories, and $9,999 on labour to get them put together and end up with a nicer car, happier salesmen, and$20,000 in savings?

Another way to get a progressive income tax would be by doing tax refunds: everyone pays the highest rate when they buy stuff, but you then submit a return with your invoices, and get a refund. If you spend $20,000 on groceries over the year, at say 20% GST, then reducing your GST to 10% would be a refund of$1,667. If you spend $50,000 on groceries and a car, you might only get to reduce your GST to an average of 15%, for a refund of$2,090. If you spend $1,000,000 on groceries, a car, and a holiday home, you might be up to an average of 19.5% for a refund of just$4,170. Coming up with a formula that always gives you more dollars the more expenditure you report (so there’s no advantage to under reporting), but also applies a higher rate the more you spend (so it’s still progressive) isn’t terribly hard.

The downside is the paying upfront is harshest on the poorest: if you’re spending $2,000 a month on food it doesn’t help to know that$1,200 of that is 150% GST and you’ll get most of it back next year if you’re only earning $900 a month. But equally it wouldn’t be hard to have CentreLink offices just hand out$1,120 a month to anyone who asks (and provides their tax file number), and confidently expect to collect it back in GST pretty quickly. Having the “danger” be that you hand out $1,120 to someone who doesn’t end up spending$2,000 a month or more doesn’t seem terribly bad to me. And there’s no problem handing out $1,200 to someone making thousands a week, because you can just deduct it from whatever they were going to claim on their return anyway. As I understand it, there’s not much problem with GST avoidance for three structural reasons: one is that at 10%, it’s just not that big a deal; another is that since it’s nationwide, avoiding it legally tends to involve other problems whether it be postage/shipping costs, delays, timezone differences, legal complexities or something else; and third is that because businesses get to claim tax credits for their purchases there’s paper trails at both ends meaning it’s hard to do any significant off-book work without getting caught. Increasing the rate substantially (from 10% to 150%) could end up encouraging imports — why buy a locally built yacht for$750,000 (150% GST) when you could buy it overseas for $360,000 (20% VAT say) and get it shipped here for$50,000? I don’t know if collecting GST at the border is a sufficiently solved problem to cope with that sort of incentive… On the other hand, having more people getting some degree of refund means it’s harder to avoid getting caught by the auditors if you’re not passing on the government’s tithe, so that’s possibly not too bad.

LCA Schedule

It appears the first draft of the linux.conf.au 2011 schedule (described by some as a thing of great beauty) is up as of this morning. Looks promising to me.

Of note:

• There’s lots of electronics-related talks (Arduino miniconf, Rocketry miniconf, Lunar Numbat, Freeing Production, “Use the Force, Linus”, All Chips No Salsa, e4Meter, Growing Food with Open Source, Lightweight Messaging, Misterhouse, and the Linux Powered Coffee Roaster). If you count mesh telephony too and don’t count the TBD slot, you can spend every day but Wednesday ensconced in hardware-hacking talks of one sort or another.
• There seems like reasonable female representation — Haecksen miniconf, LORE, HTML5 Video, Documentation, Intelligent Web, Incubation and Mentoring, Perl Best Practices, Project Managers, Growing Food with Open Source; so 7% of the miniconfs and 13% of the talks so far announced.
• Speaking of oppressed minorities, there’s also a couple of talks about non-Linux OSes: pf and pfsync on OpenBSD, and HaikuOS. Neato.
• Maybe it’s just me, but there seems to be a lot of “graphics” talks this year: GLSL, OptlPortal, Pixels from a Distance, X and the Future of Linux Graphics, HTML5 Video, Anatomy of a Graphics Driver; and depending on your point of view Print: The Final Frontier, Non-Visual Access, Can’t Touch This, and the X Server Development Process.
• The cloud/virtualisation stuff seems low-key this year: there’s Freeing the Cloud, Roll Your Own Cloud, Virtual Networking Performance, Virtualised Network Bandwidth Control, and ACID in the Cloud (that somehow doesn’t include an acid rain pun in the abstract). Of course, there’s also the “Freedom in the Cloud” and “Multicore and Parallel Computing” miniconfs which are probably pretty on point, not to mention the Sysadmin and Data Storage miniconfs which could see a bunch of related talks too.

And a bunch of other talks too, obviously. What looks like eight two-hour tutorial slots are yet to be published, maybe six more talks to be added, and three more keynotes (or given the arrangement of blank slots, maybe two more talks and four more keynotes). Also, there’s the PDNS on Wednesday, Penguin Dinner on Thursday, both over the river. And then there’s Open Day on Saturday, and an as yet not completely organised rocket launch sometime too…

Some Lenny development cycle stats

I’ve been playing with some graphing tools lately, in particular Dan Vanderkam’s dygraphs JavaScript Visualization Library. So far I’ve translated the RC bug list (the “official” one, not the other one) into the appropriate format, generated some numbers for an LD50 equivalent for bugs, and on Wouter’s suggestion the buildd stats.

One of the nice things about the dygraphs library is it lets you dynamically play with the date range you’re interested in; and you can also apply a rolling average to smooth out some of the spikiness in the data. Using that to restrict the above graphs to the lenny development cycle (from etch’s release in April 2007 to lenny’s release in February 2009) gives some interesting stats. Remembering that the freeze started in late July 2008 (Debconf 8 was a couple of weeks later in August 2008).

RC bugs; first:

Not sure there’s a lot of really interesting stuff to deduce from that, but there’s a couple of interesting things to note. One is that before the freeze, there were some significant spikes in the bug count — July 2007, September 2007, November 2007, and April 2008, in particular; but after the freeze, the spikes above trend were very minor, both in size and duration. Obviously all of those are trivial in comparison to the initial spurt in bugs between April and June 2007, though. Also interesting is that by about nine months in, lenny had fewer RC bugs than the stable release it was replacing (etch) — and given that’s against a 22 month dev cycle, it’s only 40% of etch’s life as stable. Of course some of that may simply be due to a lack of accuracy in tracking RC bugs in stable; or a lack of accuracy in the RC bugcount software.

Quite a bit more interesting is the trend of the number of bugs (of all sorts — wishlist, minor, normal, RC, etc) filed each week — it varies quite a bit up until the freeze, but without any particular trend; but pretty much as soon as the freeze is announced trends steadily downward until the release occurs at which point there’s about half as many bugs being filed each week as there were before the freeze. And after lenny’s released it starts going straight back up. There’s a few possible explanations for the cause of that: it might be due to fewer changes being uploaded due to the freeze, and thus less bugs being filed; it might be due to people focussing on fixing bugs rather than finding them; it might be due to something completely unrelated.

An measure of development activity that I find intriguing is what I’m calling the “LD50” — the median number of days it takes a bug to be closed, or the “lethal dosage of development time for 50% of the bug population”. That’s not the same as a half life, because there’s not necessarily an exponential decay behaviour — I haven’t looked into that at all yet. But it’s a similar idea. Anyway, working out the LD50 for cohorts of bugs filed in each week brings out some useful info. In particular for bugs filed up until the lenny freeze, the median days until a fix ranged from as low as 40 days to up to 120 days; but when the freeze was declared, that shot straight up to 180 days. Since then it’s gradually dropped back down, but it’s still quite high. As far as I can tell, this feature was unique to the lenny release — previous releases didn’t have the same effect, at least to anywhere near that scale. As to the cause — maybe the bugs got harder to fix, or people started prioritising previously filed bugs (eg RC bugs), or were working on things that aren’t tracked in the BTS. But it’s interesting to note that was happening at the same time that fewer bugs were being filed each week — and indeed it suggests an alternative explanation for fewer bugs being filed each week: maybe people noticed that Debian bugs weren’t getting fixed as quickly, and didn’t bother reporting them as often.

This is a look at the buildd “graph2”, which is each architecture’s percentage of (source) packages that are up to date, out of the packages actually uploaded on that architecture. (The buildd “graph” is similar, but does a percentage of all packages that are meant to be built on the architecture) Without applying the rolling average it’s a bit messy. Doing a rolling average over two weeks makes things much simpler to look at, even if that doesn’t turn out that helpful in this case:

Really the only interesting spots I can see in those graphs are that all the architectures except i386 and amd64 had serious variability in how up to date their builds right up until the freeze — and even then there was still a bit of inconsistency just a few months before the actual release. And, of course, straight after both the etch and lenny release, the proportion of up to date packages for various architectures drops precipitiously.

Interestingly, comparing those properties to the current spot in squeeze’s development seems to indicate things are promising for a release: the buildd up-to-dateness for all architectures looks like it’s stabilised above 98% for a couple of months; the weekly number of bugs filed has dropped down from a high of 1250 a week to about 770 at the moment; and the LD50 has dropped from 170 days shortly after lenny’s freeze to just under 80 days currently (though that’s still quite a bit higher than the 40 days just before lenny’s freeze). The only downside is the RC bug count is still fairly high (at 550), though the turmzimmer RC count is a little better at only 300, currently.

LMSR Implementation Notes, two

At the end of my previous post I mentioned some thoughts on dealing with more interesting initial states ($$q^0$$). We’ll define our initial state by choosing the amount of funds we’re willing to lose $$F$$, and a set of initial prices $$0 < p_i(q^0) < 1$$. Unless $$p_i(q^0) = \frac{1}{n}$$ for all $$i$$, we will be forced to set $$q^0_i > 0$$ in some (possibly all) cases. We will treat this as implying a virtual payout from the market maker to the market maker.

The maximum loss, is then given by $$C(q^0) – \min(q^0_i) = F$$ (since the final payout will be $$q_j – q^0_j$$, the money collected will be $$C(q)-C(q^0)$$, and $$C(q) \ge q_j$$).

If we wish to restrict quantities $$q_i$$ to be integers, we face a dificulty at this point. Working from the relationship between $$p_i(q^0)$$ and $$q^0_i$$ gives:

\begin{aligned} p_i(q^0) & = \frac{e^{q^0_i/\beta}}{\sum_{j=1}^{n}{e^{q^0_j/\beta}}} \\ & = \frac{e^{q^0_i/\beta}}{e^{C(q^0)/\beta}} \\ & = e^{q^0_i/\beta – C(q^0)/\beta} \\ \beta \ln( p_i(q^0) ) & = q^0_i – C(q^0) \\ q^0_i & = C(q^0) + \beta \ln( p_i(q^0 ) ) \end{aligned}

Since $$C(q^0)$$ is independent of $$i$$, we can immediately see that the $$i$$ with minimal $$q^0_i$$ will be the one with minimal price. Without loss of generality, assume that this is when $$i=1$$, then we can see:

\begin{aligned} F & = C(q^0) – q^0_1 \\ & = C(q^0) – \left( C(q^0) + \beta \ln(p_1(q^0)) \right) \\ & = – \beta \ln(p_1(q^0)) \\ \beta & = \frac{F}{\ln\left(p_1(q^0)^{-1}\right)} \end{aligned}

In the case where $$p_i(q^0) = \frac{1}{n}$$ is common for all outcomes this simplifies to the formula seen in the previous post.

Note that this is unlikely to result in a value of $$\beta$$ that is particularly easy to work with. However simply rounding down to the nearest representable number works fine — since $$\beta$$ is in direct proportion to the amount of funds at risk, this simply rounds down the amount of funds at risk at the same rate.

Likewise, keeping track of $$p_i(q)$$ as an implementation choice will restrict us to rational prices, and thus likely irrational values for $$q_i$$. However it’s likely we’d prefer to only offer precisely defined payoffs for precisely defined costs, even if only for ease of accounting. In order to deal with this, we can treat $$q_i = m_i(q) + g_i(q)$$ where $$m_i(q) \ge q^0_i$$ represents the (possibly increasing) virtual payout the market maker will receive, and $$g_i(q)$$ are the (integer) payouts participants will receive. In particular, we might restrict $$q^0_i \le m_i(q) < q^0_i + 1$$, so that we can calculate costs and payouts using the normal floor and ceiling functions and ensure any proceeds go to participants. This gets us very close to being able to adjust the outcomes being considered dynamically; so that we can either split a single outcome into distinct categories to achieve a more precise estimate, or merging multiple outcomes into a single category to reduce the complexity of calculations. If we look at changing the $$m \dotso n$$th outcomes from $$q$$ into new outcomes $$m' \dotso n'$$ in $$r$$, then our presumed constraints are as follows. First, if this is the most accurate assignment between the old states and the new states we can come up with (and if it's not, use those assignments instead), then we need to set the payout for all the new cases to the worst case payout for the old cases: $$g_{i'}(r) = \left\{ \begin{array}{l l} g_i(q) & \quad 1 \le i' < m \\ \max_{m \le i \le n}(g_i(q)) & \quad m' \le i' \le n \\ \end{array} \right.$$ Also, since we're not touching the prices for the first $$m-1$$ outcomes, and our prices need to add up to one, we have: \begin{aligned} p_{i'}(r) & = p_{i'}(q) \quad \forall 1 \le i' < m \\ \sum_{i'=m'}^{n'} p_{i'}(r) & = \sum_{i=m}^{n} p_i(q) \end{aligned} And most importantly, we wish to limit the additional funds we commit to $$\Delta F$$ (possibly zero or negative), and thus $$C(r) = C(q) + \Delta F$$. Using the relationship between $$p_i(r)$$ and $$r_i$$ again, gives: \begin{aligned} C(r) & = r_i - \gamma \ln(p_i(r)) \\ & = m_i(r) + g_i(r) - \gamma \ln(p_i(r)) \\ \Delta F & = m_i(r) + g_i(r) - C(q) - \gamma \ln(p_i(r)) \\ \Delta F & \ge g_i(r) - C(q) - \gamma \ln(p_i(r)) \\ \Delta F & \ge \left\{ \begin{array}{l l} g_i(q) - C(q) - \gamma \ln(p_i(q)) & \quad 1 \le i < m \\ g_i(r) - C(q) - \gamma \ln(p_i(r)) & \quad m \le i \le n \\ \end{array} \right. \\ \Delta F & \ge \left\{ \begin{array}{l l} g_i(q) - C(q) - \left(q_i - C(q)\right) & \quad 1 \le i < m \\ \max_{m \le j \le n}(g_j(q)) - C(q) - \gamma \ln(p_i(r)) & \quad m \le i \le n \\ \end{array} \right. \\ \Delta F & \ge \left\{ \begin{array}{l l} -m_i(q) & \quad 1 \le i < m \\ \max_{m \le j \le n}\left( g_j(q) - \left( q_j - \beta \ln(p_j(q)) \right) \right) - \gamma \ln(p_i(r)) & \quad m \le i \le n \\ \end{array} \right. \\ \Delta F & \ge \left\{ \begin{array}{l l} -m_i(q) & \quad 1 \le i < m \\ \max_{m \le j \le n}\left( -m_j(q) + \beta \ln(p_j(q)) \right) - \gamma \ln(p_i(r)) & \quad m \le i \le n \\ \end{array} \right. \\ \end{aligned} Setting where $$\mu$$ to be the modified outcome with maximum payout (that is, $$g_\mu(q) = \max_{m \le j \le n}(g_j(q))$$, $$m \le \mu \le n$$) and $$\nu$$ to be the least new price (so $$m' \le \nu \le n$$ such that $$p_\nu(r) = \min_{m' \le j \le n'}(p_i(r))$$) lets us simplify this to: $$\Delta F \ge -m_i(q) \quad \forall 1 \le i < m$$ and $$\gamma \le \frac{\Delta F + m_\mu(q) + \beta \ln\left(p_\mu(q)^{-1}\right)}{\ln\left(p_\nu(r)^{-1}\right)}$$ Since $$m_i(q) \ge 0$$, one simple approach to satisfying the inequalities is to simply drop the $$m_i(q)$$ terms, giving: $$\Delta F \ge 0 \quad \text{and} \quad \gamma \le \frac{\Delta F + \beta \ln(p_\mu(q)^{-1})}{\ln(p_\nu(r)^{-1})}$$ This has the drawback of not providing the maximum liquidity for the funds thought to be at risk, however.