MathJax is pretty cool — it’s essentially a client-side JavaScript implementation of LaTeX, so you can write maths in ASCII, like “x^n + y^n = z^n”, surround it with dollar signs, and have it look like:

$$ x^n + y^n = z^n $$

And, of course, you can be more complicated if you like:

$$ C(\mathbf{q}) = b(\mathbf{q}) \log\left( \sum_i e^{\frac{q_i}{b(\mathbf{q})}} \right) $$

Inclusion in WordPress is easy: you unpack the MathJax beta on your website, add a “script” line so that the MathJax JavaScript is loaded, and it dynamically displays the maths when the page is loaded. It also manages to do it with real fonts, so you can select bits of the equations and don’t have to deal with ugly images. Oh, and it zooms nicely.

Of course, there’s a downside to having a client side script redisplay the formulas, and I suspect everyone reading via RSS will have already picked up on what it is…

The semiotic web

Quite a long time ago I read a fascinating article on semiotics and user-interface design. My recollection is that it made the argument that computer user interfaces could be broken up into roughly three branches: “menus”, where you have a few options to choose between, and that’s it; “WIMP paradigm” where you’ve got windows, icons, menus and a pointer and can gesticulate to get things done; and “command oriented” where you type commands in to have things happen.

While the WIMP paradigm is obviously pretty good, it’s restricted by its “metaphoric” nature: you have to represent everything you want to do with a picture, so if you don’t have a picture for something, you can’t do anything with it. In effect, it reduces your interaction with computers to point-and-grunt, which is really kind of demeaning for its operators. Can you imagine if the “communication skills” expected of you in a business management role were the ability to point accurately and make two distinct grunting noises?

On the other hand, if your system’s smart enough to actually do what you want just based on a wave of your hand that is pretty appealing — it’s just that when you want something unusual — or when your grunts and handwaving aren’t getting your point across — you can’t sit down and explain what you want merely with more grunts and pointing.

Obviously that’s where programming and command lines come in — both of which give you a range of fairly powerful languages to communicate with computers, and both of which are what people end up using when they want to get new and complicated things done.

It’s probably fair to say that the difference between programming languages and command line invocations is similar to essays and instant messaging — programs and essays tend to be long and expect certain formulas to be followed, but also tend to remain relevant for an extended period; an IM or a command line invocation tends to be brief, often a bit abbreviated, and only really interesting exactly when it’s written. Perhaps “tweet” or “facebook status update” would be a more modern version of IM — what can I say, I’m an old fogey. In any event, my impression is that the command line approach is often a good compromise when point-and-grunt fails: it’s not too much more effort, but brings you a lot more power. For instance,

$ for a in *.htm; do mv "$a" "${a%.htm}.html"; done

isn’t a very complicated way of saying “rename all those .htm files to .html”, compared to first creating a program like:

#!/usr/bin/env python
# Rename every *.htm file in the current directory to *.html
import os
for name in os.listdir("."):
    if name.endswith(".htm"):
        os.rename(name, name[:-4] + ".html")

and then running it. And obviously, one of the advantages of Unix systems is that they have a very powerful command line system.

In any event, one of the things that strikes me about all the SaaS and cloud stuff is that there really isn’t much of a linguistic equivalent to the command line for the web. If I want to do something with gmail, or flickr, or facebook, I’m either pointing and grunting, or delving deeply into HTML, JavaScript, URLs, REST interfaces and whatever else to make use of whatever arbitrary APIs happen to be available.

A few services do have specialised command line tools of course — there’s GoogleCL, various little things to upload to flickr, the bts tool in devscripts to play with the Debian bug tracking system, and so forth.

But one of the big advantages of the web is that you aren’t meant to need special client side tools — you just have a browser, and leave the smarts on whichever web server you’re accessing. And you don’t get that if you have to install a silly little app to interface with whichever silly little website you happen to be interested in.

So I think there ought to be a standard “command line” API for webapps, so that you can say something like:

$ web search -q='hello world'

to do a Google search for ‘hello world’. The mapping from the above command line to a URL is straightforward: up until the option arguments, each word gets converted into a portion of the URL path, so the base url is, and options get put after a question mark and separated by ampersands, with regular URL quoting (spaces become plusses, irregular characters get converted to a percent and a hex code), in this case ?q=hello+world.
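The mapping is simple enough to sketch. Below is a minimal Python version, assuming the tool is handed a base URL (the `base` parameter and the `args_to_url` name are hypothetical; nothing above says how `web` picks the host) and that any argument starting with a dash is an option. By the time the program runs, the shell has already removed the quotes from `-q='hello world'`.

```python
from urllib.parse import quote, urlencode


def args_to_url(base, args):
    """Hypothetical sketch of the `web` argument-to-URL mapping.

    Plain words become URL path segments; arguments starting with '-'
    become query parameters (leading dashes stripped, split on '=').
    """
    path, query = [], []
    for arg in args:
        if arg.startswith("-"):
            key, _, value = arg.lstrip("-").partition("=")
            query.append((key, value))
        else:
            path.append(quote(arg, safe=""))  # percent-encode each segment
    url = base.rstrip("/") + "/" + "/".join(path)
    if query:
        # urlencode uses quote_plus: spaces -> '+', odd characters -> %XX
        url += "?" + urlencode(query)
    return url


if __name__ == "__main__":
    # Prints https://example.com/search?q=hello+world
    print(args_to_url("https://example.com", ["search", "-q=hello world"]))
```

Everything host-specific lives in `base`, which is roughly where aliases or bookmarks would come in.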

The obvious advantage is you can then use the same program for other webapps, such as the Debian BTS:

$ web cgi-bin bugreport.cgi --bug=123456 --mbox=yes
From Tue Dec 11 11:32:47 2001
Received: (at submit) by; 11 Dec 2001 17:32:47 +0000
Received: from [] (root)
	by with esmtp (Exim 3.12 1 (Debian))
	id 16Dqlr-0007yg-00; Tue, 11 Dec 2001 11:32:47 -0600

It obviously looks cleaner when you use the shorter URL (web 123456), although due to the way the BTS is set up, you then also lose the ability to specify things like mbox format.

Of course, web pages are in all sorts of weird formats, too: having Google’s HTML and javascript splatter all over your terminal isn’t very pleasant, for instance. But that’s what pipes are for, right?

$ web chart --cht=p3 \
    --chs=400x150 --chd=t:2,3,5,10,20,60 \
    --chl='Alice|Bob|Carol|Dave|Ella|Fred' | display


It’d probably be interesting to make “web” clever enough to automatically pipe images to display and HTML to firefox and so on, depending on what media type is returned.
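One hypothetical way to pick a viewer from the returned media type is sketched below. The `VIEWERS` table and `choose_viewer` are made up for illustration, and how each program actually receives the data (standard input or a temporary file) is left out.

```python
# Hypothetical table: full media type or major type -> viewer command.
VIEWERS = {
    "image": ["display"],      # ImageMagick, as in the pipe above
    "text/html": ["firefox"],
}


def choose_viewer(content_type):
    """Pick a viewer for a Content-Type header value; default to cat."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in VIEWERS:          # exact match, e.g. text/html
        return VIEWERS[media_type]
    major = media_type.split("/", 1)[0]
    return VIEWERS.get(major, ["cat"])  # e.g. any image/*, else pass through
```

Trying the exact type before the major type lets `text/html` go to a browser while other `text/*` still just lands on the terminal.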

Obviously you can use aliases just like you’d use bookmarks on the web, so saying:

$ alias gchart='web chart'
$ alias debbug='web cgi-bin bugreport.cgi'

lets you type a little less.

Anyway, I think that makes for a kind-of interesting paradigm for looking at the web. And the “web” app above is pretty trivial too: as described, all it does is convert arguments into a URL according to the given formula.

Things get a little more interesting if you try to make things interactive; a webapp that asks you your name, waits for you to tell it, then greets you by name is made unreasonably difficult if you try to do it on a single connection (with FastCGI and nginx for instance, the client has to supply the exact length of all the information you’re going to send before it will receive anything, and if you don’t know what you’re going to need to send up front…). Which means that so far my attempts to have web localhost bash behave as expected aren’t getting very far.

The other thing that would be nice would be passing files to remote web apps — being able to say “upload this avi to youtube” would be more elegant as web upload ./myvideo.avi than web upload <./myvideo.avi, but when web doesn’t know what “youtube” or “upload” actually means, that’s a bit hard to arrange. After all, maybe you were trying to tell youtube to do the uploading to your computer, and ./myvideo.avi was where you wanted it to end up.

Anyway. Thoughts appreciated.

Resource Rent Maths, take 2

My previous post apparently didn’t do the economics for the resource rent analysis quite right — it seems that the idea is a cleverer company would be able to use the resource rent tax to find cheaper sources of funding, which changes things…

The idea then would be that you start your mining project seeking 60% in risky funding (they get whatever profits you make and the totality of the loss), and 40% in risk-free funding (they get the same return as they would if they invested in government bonds, whether the project succeeds or fails). That’s as opposed to the current approach of seeking 100% in risky funding.

So say you’ve raised $5B. You spend your $5B doing surveys, setting up your mine, etc. Failure here means you declare bankruptcy and the government gives you enough money to pay back the $2B of risk-free investment, plus interest, presuming the Greens don’t have their way. On the other hand, your mine might be a success, and you might, eg, start getting $1.5B in revenue, against $500M in expenses. At this point you first have to pay your “super profit” tax, which is, apparently, 40% of:

  • gross receipts: $1.5B
  • less depreciation: assuming 20 year expected life, 5% of 5B = $250M
  • less running expenses: $500M
  • less “normal return” on debt/equity: 6% of $5B = $300M
  • totalling: $450M

So $180M on resource rents. You then pay corporate income tax of 30% (eventually 28%) of:

  • gross receipts: $1.5B
  • less depreciation: assuming 20 year expected life, 5% of 5B = $250M
  • less running expenses: $500M
  • less resource rent: $180M
  • totalling: $570M

So $171M ($159.6M at 28% in 2014 or so).

You then pay the risk-free return to your risk-free investors, which is 6% of $2B or $120M. (Actually, this might be tax deductible too)

So after paying expenses ($500M), resource rents ($180M), income tax ($171M) and the risk-free dividend ($120M), your $1.5B of earnings is down to $529M. Paying all of that out to your $3B of risky investors gives an annual return of 17.63%, fully-franked.

That compares to doing things the current way as follows: you raise $5B of risky investment; your mine succeeds and makes $1.5B in revenue, against $500M in expenses. You just pay company tax at 30% after expenses and depreciation, so that’s 30% of $750M, or $225M. That leaves you $775M to pay in dividends, which is an annual return of 15.5%, fully-franked.
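Both scenarios fit in a few lines of Python. This is a sketch that assumes the figures above: straight-line depreciation over 20 years, a 6% bond rate used both as the “normal return” and as the risk-free interest, and risk-free interest that isn’t tax deductible. Amounts are in $M, and the function names are hypothetical.

```python
def with_rent_tax(investment, revenue, expenses, life_years=20, bond_pct=6,
                  rrt_pct=40, company_pct=30, risk_free_pct=40):
    """60% risky / 40% risk-free funding, with the resource rent tax."""
    depreciation = investment / life_years
    normal_return = bond_pct * investment / 100
    rrt = max(0, rrt_pct * (revenue - depreciation - expenses - normal_return) / 100)
    company_tax = company_pct * (revenue - depreciation - expenses - rrt) / 100
    risk_free_funds = risk_free_pct * investment / 100
    risk_free_interest = bond_pct * risk_free_funds / 100  # not deducted
    to_risky = revenue - expenses - rrt - company_tax - risk_free_interest
    return {"rrt": rrt, "company_tax": company_tax, "to_risky": to_risky,
            "return_pct": 100 * to_risky / (investment - risk_free_funds)}


def current_way(investment, revenue, expenses, life_years=20, company_pct=30):
    """100% risky funding, company tax only (no loss handling)."""
    depreciation = investment / life_years
    company_tax = company_pct * (revenue - expenses - depreciation) / 100
    dividends = revenue - expenses - company_tax
    return {"company_tax": company_tax, "dividends": dividends,
            "return_pct": 100 * dividends / investment}


if __name__ == "__main__":
    print(with_rent_tax(5000, 1500, 500))  # return_pct about 17.63
    print(current_way(5000, 1500, 500))    # return_pct 15.5
```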

That, obviously, is an entirely convincing investment. It relies on the government refunding the $2B of “risk-free” investment in the event that the mine falls apart, though — which, as I understand it, is the part of the plan the Greens oppose. But otherwise, the above’s fairly plausible.

The difference in those sums (profit rising from 15.5% to 17.63%) is due to the level of depreciation in the above sums. If those formulas for calculating the rent and company taxes are correct, then your return on investment increases by two-thirds of your annual depreciation (as a fraction of the initial investment) and decreases by a fifth of the risk-free rate. In the above case, annual depreciation was 5% of the entire initial investment, and the risk-free rate was 6%, which implies an improvement of 2/3 × 5% - 6%/5, which is the 2.13% we saw.
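Writing P = R - E for annual operating profit, I for the initial investment, d for the annual depreciation rate and r for the risk-free rate, the two-thirds and one-fifth figures fall out directly. This sketch assumes the 40% rent tax, 30% company tax, 60/40 risky/risk-free split and non-deductible risk-free interest used above, and that the rent tax base is positive; ρ is the annual return to risky investors.

```latex
\begin{aligned}
\text{RRT} &= 0.4\,(P - dI - rI) \\
\text{CT} &= 0.3\,(P - dI - \text{RRT}) = 0.18P - 0.18dI + 0.12rI \\
\text{to risky} &= P - \text{RRT} - \text{CT} - 0.4rI = 0.42P + 0.58dI - 0.12rI \\
\rho_{\text{new}} &= \frac{0.42P + 0.58dI - 0.12rI}{0.6I} = 0.7\frac{P}{I} + \frac{29}{30}d - \frac{r}{5} \\
\rho_{\text{old}} &= \frac{P - 0.3\,(P - dI)}{I} = 0.7\frac{P}{I} + 0.3d \\
\rho_{\text{new}} - \rho_{\text{old}} &= \frac{2}{3}d - \frac{r}{5}
\end{aligned}
```

The operating-profit term 0.7P/I is identical in both cases, which is why only depreciation and the risk-free rate show up in the difference.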

In reality, you’d probably need to offer a higher return to your “risk-free” investors, because if you didn’t, they’d probably just buy bonds directly from the government in the first place. And if I’m not mistaken, you still need to repay the principal to your risk-free investors over the life of your mine. So hopefully that simply evens out in the end.

There’s not a lot of difference between that scenario and having the government borrow enough to maintain 40% ownership in every mining operation in Australia. They’ll then receive 40% of the after-tax profits, and have to pay interest on their borrowings at the long term bond rate, which would mean (in the above example) getting $225M in company tax, then $310M in franked dividends, then paying out $120M in interest costs, for a total of $415M extra per annum. That’s more than the total of $351M in receipts ($180M resource rent plus $171M company tax) in the above example, I think due to the depreciation deduction in the resource rent tax calculation.
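A quick check of the $415M figure, as a sketch assuming the government buys 40% of the $5B mine with money borrowed at the 6% bond rate and collects 30% company tax as usual (amounts in $M, function name hypothetical):

```python
def government_share(investment, revenue, expenses, life_years=20,
                     company_pct=30, stake_pct=40, bond_pct=6):
    """Net annual take if the government simply owned a 40% stake."""
    depreciation = investment / life_years
    company_tax = company_pct * (revenue - expenses - depreciation) / 100
    dividends = revenue - expenses - company_tax
    gov_dividends = stake_pct * dividends / 100
    borrowed = stake_pct * investment / 100
    interest = bond_pct * borrowed / 100
    return company_tax + gov_dividends - interest


if __name__ == "__main__":
    print(government_share(5000, 1500, 500))  # 225 + 310 - 120
```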

Mechanically, there are a few differences: the company has to gain two sorts of investment (risky shares and risk-free bonds, for instance), if it fails it has to go to a lot more trouble to pay back the risk-free investors (getting the tax office to issue a refund in cash), and the government gets to keep it mostly off its books (doesn’t have to raise funds directly, investment losses turn into tax refunds).

In any event, that should make it easier for mining companies to raise funds — they only need to raise 60% of the amount at the risky level, for the same return they previously offered.

I don’t see anything stopping you from being tricky and doing a two stage capital raising: raising $3B of risky funds to do exploration; and if that fails repaying your investors 40% ($1.2B) of their capital — then doing the risk-free fund raising to get enough cash to start production. The initial fund raising then has a chance at a 17% ongoing return, or a 60% loss — compared to currently having a chance at a 15% return or a 100% loss. Again, that should make it easier to raise funds for new projects.

On the other hand, I also still don’t see anything stopping you from transferring your profits. Say you’re a public investment company. You’ve got plenty of money from offering superannuation products or what not, and you want to get into mining because you hear it gives a high return for your investors. So you allocate a few billion to start a mining company, which does some prospecting and opens a mine. That works out, and it starts making super profits. You decide you want to reduce your tax, and get more dividends. So instead of having one privately held subsidiary mining company, whose balance sheet looks like:

  • Revenue: $1500M
  • Expenses: $500M
  • Resource rent tax: $180M
  • Company tax: $171M
  • Dividends: $649M

you decide to invest in a transport company as well. Hopefully one that’s already making a decent profit, but paying a bit more than market value works too. You then have them make an agreement that the mine will exclusively use your transport company for the next 10 or 20 years, for whatever excuse satisfies appropriate laws. Then have the transport company seriously jack up the price. Your balance sheets should then look like:

                    Mine      Mine change   Transport change   Total change
Revenue             $1500M                  +$700M             +$700M
Expenses            $1200M    +$700M                           +$700M
Resource rent tax   $0M       -$180M        n/a                -$180M
Company tax         $15M      -$156M        +$210M             +$54M
Dividends           $285M     -$364M        +$490M             +$126M
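The table can be reproduced with a small model of the mine. It’s a sketch assuming the $5B investment, 20-year depreciation and 6% “normal return” from the earlier example, that the transport company’s extra $700M is pure profit taxed at 30%, and that (like the balance sheets above) risk-free investor interest is left out. Names are hypothetical; amounts are in $M.

```python
def mine_accounts(revenue, expenses, investment=5000, life_years=20,
                  bond_pct=6, rrt_pct=40, company_pct=30):
    """One year of the mine's accounts under the resource rent tax."""
    depreciation = investment / life_years
    normal_return = bond_pct * investment / 100
    rrt = max(0, rrt_pct * (revenue - expenses - depreciation - normal_return) / 100)
    company_tax = company_pct * (revenue - expenses - depreciation - rrt) / 100
    return {"rrt": rrt, "company_tax": company_tax,
            "dividends": revenue - expenses - rrt - company_tax}


def transfer_effect(price_rise, revenue=1500, expenses=500, company_pct=30):
    """Group-wide change when a captive supplier overcharges the mine."""
    before = mine_accounts(revenue, expenses)
    after = mine_accounts(revenue, expenses + price_rise)
    transport_tax = company_pct * price_rise / 100  # supplier's extra profit
    transport_dividends = price_rise - transport_tax
    return {
        "rrt": after["rrt"] - before["rrt"],
        "company_tax": after["company_tax"] + transport_tax - before["company_tax"],
        "dividends": after["dividends"] + transport_dividends - before["dividends"],
    }


if __name__ == "__main__":
    print(transfer_effect(700))  # rrt -180, company_tax +54, dividends +126
```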

And voila, your resource rent tax has been reallocated to your dividends (except for the 30% that goes to company tax, of course). It doesn’t have to be a transport company, either — any private company that you can buy outright, that isn’t hit by the resource tax, and that you can find some excuse to make your exclusive supplier of a necessary product/service will do fine. And even better, as far as I can see, even when you get rid of all the resource rent proceeds the government was hoping for from your mine, they’ve still covered 40% of your initial risk…

Resource Taxes

So it seems that taxing oil and gas is the only significant result that’s going to come out of the Henry Review, and that probably means I should work out an actual opinion on it.

As I understand it, the Resource Rent Tax as proposed by the review is meant to be a different way of charging for non-renewable resources extracted from Australia: coal, oil, gas, uranium, whatever. The aim is to increase the government’s share of the value, while maintaining profit incentives to actually find, extract and sell it. And the way the Henry review recommends achieving that is for the government to make itself a 40% partner in the investment, with its initial capital contribution made via tax concessions, and then to receive dividend payments worth 40% of profits via the tax system. See Nicholas Gruen’s take for more on this line of thinking.

Of course, the government’s cheap — it’s not going to actually put money up front to become a 40% partner like anyone else would; it’s obviously hoping to get all the benefits without any of the risks. But that’s not economically sound — it would mess up the incentives for investment, effectively making investing in Australia 66% more expensive [0].

(And, of course, the government’s not going to be an ordinary investor either — just buying shares in mining companies would be way too straightforward…)

Instead the government’s contribution is in the form of payment of the state taxes and tax deductions, and because that’s not as valuable as actual money, their payback only kicks in when the endeavour starts making lots of money (where lots is defined as more than you’d make just loaning to the government).

On that basis the maths would go like this: in the first year you spend $3B to set up a mine, but don’t make any revenue yet. The government gives you $2B in tax credits you can use later (or possibly against other projects you’re working on). Your mine starts production, earning its first $1B ($1.5B in revenue, $500M in expenses). You then owe $280M in company tax, and another $288M in resource tax. You deduct those from your tax credits. You can presumably then pay out $432M as fully franked dividends to your investors; I’m not sure whether the remaining $568M can be franked (if it can’t, $568M in unfranked dividends is equivalent to about $400M in franked dividends, the difference going to the government via the investors’ income tax). Anyway, that goes on for three and a half years or so until your tax credits have all been used up: you’re either paying out $1B in dividends a year to your investors (a 33% return) and no tax, or $832M in dividends (a 28% return) and $168M in tax (16.8% of earnings). After the three and a half years are up, you switch to $432M in dividends (a 14.4% return) and $568M in tax (56.8% of earnings). Presuming the initial investment is entirely unrecoverable (the trucks you bought wear out over the life of the mine, it’s cheaper to demolish the buildings and rebuild than try moving them to your next mine, etc.), that would mean over the first three and a half years investors recover either 110% of their investment or 97% of their investment, and then earn a 7% return.
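The per-year figures can be checked with a short sketch. It assumes, as the paragraph does, $1B of annual earnings on a $3B investment, 28% company tax, resource tax at 40% of earnings after company tax, and $2B of credits used to pay both; the function name is hypothetical and amounts are in $M.

```python
def credit_schedule(credits=2000, earnings=1000, investment=3000,
                    company_pct=28, rrt_pct=40):
    """Annual taxes, dividends and how long the tax credits last."""
    company_tax = company_pct * earnings / 100
    rrt = rrt_pct * (earnings - company_tax) / 100  # on after-company-tax earnings
    total_tax = company_tax + rrt
    franked = earnings - total_tax
    return {
        "company_tax": company_tax,
        "rrt": rrt,
        "franked": franked,
        "years_of_credits": credits / total_tax,
        "return_while_credits_pct": 100 * earnings / investment,
        "return_after_pct": 100 * franked / investment,
    }


if __name__ == "__main__":
    for key, value in credit_schedule().items():
        print(f"{key}: {value:.2f}")
```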

With just company tax, the same scenario would have resulted in $700M in fully franked dividends each year (23.3% return), so investors would get 93.3% of their money back after four years, and then earn 23.3%.

Except, of course, things aren’t actually that simple either, because, AIUI, some of the ongoing costs will get counted as well, so the $500M in annual expenses might mean up to an additional $333M in tax credits each year, which would not be very sound — but if some of those expenses are for “expanding the mine” they possibly should be counted as additional “investment”. Additionally, interest is earned on unspent tax credits at the government bond rate, but that would be pretty insignificant in the above example. And it’s possible the government doesn’t plan on providing tax credits worth 40% overall, but only puts in 40% of the previous investment (which would be worth 28.5%).

There’s also the “super profits” aspect: I can’t see how that’s intended to be calculated. It could be simply via the interest rate on unspent tax credits: if you’ve got $2B in tax credits, and earn the 6% bond rate on that for an additional $120M in tax credits, then you could just spend the “interest” to reduce your $288M annual resource rent liability to a $168M annual liability in perpetuity. The $120M saving, then, is the resource rent tax (40%) on the non-super part of the annual profits (6% of $5B). Of course, if you work things that way, you don’t have the few years of no/low tax. I wouldn’t have thought the tax office would let you work things that way, either, to be honest; but economically it’s probably meant to be treated as an equivalent outcome. Anyway, that totals to a 44.8% tax on annual earnings in the above example ($280M company tax plus $168M resource rent on $1B), leaving 55.2% for investors.
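The interest-offset variant can be sketched the same way, assuming the $2B of credits earns the 6% bond rate and that the interest is applied against the resource rent liability each year (hypothetical function, amounts in $M):

```python
def interest_offset(credits=2000, earnings=1000, bond_pct=6,
                    company_pct=28, rrt_pct=40):
    """Resource rent liability after spending the credits' interest."""
    company_tax = company_pct * earnings / 100
    rrt = rrt_pct * (earnings - company_tax) / 100
    interest = bond_pct * credits / 100
    net_rrt = rrt - interest
    total_tax = company_tax + net_rrt
    return {"interest": interest, "net_rrt": net_rrt,
            "tax_pct": 100 * total_tax / earnings,
            "kept_pct": 100 * (earnings - total_tax) / earnings}


if __name__ == "__main__":
    print(interest_offset())
```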

Beyond that there’s the effect on risky projects. If a mine doesn’t turn out to make money, at the moment you lose lots of money. With the government being a 40% partner, you still lose that money, but you get 66.6% of it back in tax credits. And if you didn’t “lose” the money, so much as paid your nephew (or subsidiary company, whatever) to do some prospecting for you, well, hey, that’s pretty neat, right? The risk there is pretty simple: the government wants to be counted as an investor in all the profitable mining companies, without actually exercising any judgement on what’s likely to be a good company and what’s not. And if you’ve got an investor with no judgement, people are going to take advantage of that.

Of course, being the government, you can write the laws to your own advantage, so you can claim all the profits when things go great, and disclaim any losses when things go bad. That seems to be the Greens’ plan:

Resources Minister Martin Ferguson said the government was committed to keep a controversial plan to reimburse miners for 40 per cent of their losses.

But with opinion polls predicting the Australian Greens gaining the balance of power in the Senate after the next election, the government may have to scrap this.

Greens leader Bob Brown has said that while he supported the resource profits tax in principle, he did not want miners being rebated for their losses.

That goes directly against the positioning of the government as a “co-investor”, though; compare Ken Henry’s reported comments or Terry McCrann’s expansion from senate testimony.

And really, nothing in the calculations above actually had anything to do with resources, just investment: you can invest in a restaurant too, and except for scaling down the numbers, the same calculations and arguments would apply. If it was really about the resources, it would make more sense for the resources themselves to be the government’s initial contribution [1] to the investment. But that just goes straight back to charging royalties on whatever’s dug up, which is the system we’ve already got.

As far as sovereign risk goes, that seems a pretty simple calculation. After, what, 30 years of Hawke, Keating, Howard and Howard 2.0 Rudd, people might’ve expected pretty simple and sound economic policies — floating the dollar, privatising the banks, independent reserve bank, compulsory superannuation, the GST, free trade agreements, low inflation, gradual lowering of income tax rates, pretty good handling of the Asian currency crisis in the ’90s and the recent financial crisis. What are the odds the government will suddenly start doubling the amount it taxes various companies? Before the “super tax”, you might’ve said pretty low. Now, not so much. Will it happen again? Who knows — but I bet more people would guess it would now, than would have previously. So yeah, Australia’s sovereign risk seems way higher.

Ultimately, this is looking more and more to me like one of those ideas that sounds great at the height of a boom (“look, those people are making lots of money, gosh I wish we were those people!”), but that turns out too clever by half, and all the little complexities involved in turning theory into practice end up biting you in the butt.

[0] If (eg) you currently have to invest $10B in setup costs for every $2B profit per year your mine makes, then with the government taking 40% of that, $10B would only get you $1.2B in profit. To get $2B in after-resource-tax profit, you’d need $3.3B in before-resource-tax profit, which would mean $16.6B in setup costs, a 66% increase.
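The arithmetic in footnote [0], written out as a sketch using the footnote’s own figures:

```latex
\begin{aligned}
\text{profit needed} &= \frac{\$2\text{B}}{1 - 0.4} = \$3.\overline{3}\text{B} \\
\text{setup cost} &= \$10\text{B} \times \frac{\$3.\overline{3}\text{B}}{\$2\text{B}} = \$16.\overline{6}\text{B} \\
\text{increase} &= \frac{1}{1 - 0.4} - 1 = \tfrac{2}{3} = 66.\overline{6}\%
\end{aligned}
```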

[1] Though that runs into the “which government?” problem — the resources are owned by the states, and it’s the federal government that wants to collect more money… And that in spite of the state governments having the bigger budget problems at the moment…

The Gold Standard

I’ve been trying for a while now to figure out why I dislike the gold standard, that is, pegging currency against the price of gold. I think currencies are fundamentally arbitrary: they’re a convention that needs to be (roughly) agreed on, but whether that’s $1 for a ham sandwich or $1000 for a ham sandwich doesn’t make much difference, as long as everyone agrees which of the two it is. By that argument, saying $1 is worth 23 milligrams of gold should be fine. Now sure, that breaks down if you suddenly get a huge increase in the supply of gold: if someone posts a video on youtube showing how to turn grass clippings into pure gold, you’re going to get some pretty serious (and worse, uncontrollable) inflation, and it’ll cost more than 230 milligrams of gold (or a mower load of grass, or $10) to get a big mac and coke. But worries about vast new sources of gold are probably not a realistic objection, at least until we have asteroid mining.

So I think there must be something else to justify opposition to the gold standard, and in particular that at some point you have to argue that not being able to inflate your currency whenever you want is actually a bad thing.

To some extent, the consequences of being unable to devalue your currency or inflate your money supply are playing out in Greece now, and, depending on whose explanation you believe, were a cause of the Great Depression.

In both cases, the theory goes that crushing debt (war reparations, too much spending) and an inability to actually pay that debt can’t go on forever, and inflation is an easy way to get out of that. At any rate, easier than bank failures, easier than government defaults, and easier than going to war. Basically, inflation turns into a way to force everyone to forgive their debtors by a given percentage, rather than having to pick some people who get nothing back, while others get everything.

On the other hand, inflation only “really” helps with long term debt: if you have to pay someone a million dollars in ten years’ time, a 15% annual inflation rate lets you pay it all back by doing the equivalent of $135,000 worth of work in each of the last two years, even if you do nothing for the first eight years. But if you owe someone a million dollars in two years’ time, and can only earn $135,000 each year, you’d better hope for inflation of over 540%, or you’ll want to start bankruptcy proceedings now.
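A rough check of the ten-year case, as a sketch assuming your earnings track inflation exactly, so that work worth $135,000 in today’s money earns $135,000 × 1.15^y nominal dollars in year y (the function name is hypothetical):

```python
def nominal_earnings(real_per_year, inflation_pct, years):
    """Nominal dollars earned in the given years, if wages track inflation."""
    growth = 1 + inflation_pct / 100
    return sum(real_per_year * growth ** y for y in years)


if __name__ == "__main__":
    # Work worth $135,000 (today's money) in each of years 9 and 10.
    print(round(nominal_earnings(135_000, 15, [9, 10])))
```

At 15% this comes to a little over $1M, enough to cover the debt.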

That’s essentially what happens with debt that has a variable interest rate (or rolling debts) — the lender guesses what inflation’s likely to be, and says “ok, I won’t send the boys around to collect my $1M today, but next week you owe me an extra 1%”. Which of course means inflation isn’t going to do you much good if your debts are all short term (135B euros of Greek bonds due within the next five years, a five year fixed rate mortgage, credit card debt, etc) — the people loaning you money have been clever enough to factor in the possibility of inflation and still make you pay what you owe.

In theory, all that’s fine and proper: you shouldn’t borrow more than you can pay back, and there should be some negative consequences to living off promises you never make good on.

In practice, people get into situations where it’s simply impossible to pay back a debt. Whether that’s a thin veneer over slavery in the form of debt bondage, or managing to spot a bunch of uncovered short sellers who were willing to commit to selling (in effect) more than 100% of a company, or something else.

There’s probably no simple solution to that: people are always going to want to buy now and pay later, people are always going to try making that “pay later” part as expensive as possible, and people are always going to make mistakes in estimating what’s possible, all of which leads to people getting into more debt than they can actually pay. In that view, going to the gold standard to stop governments getting themselves out of debt by printing money just makes it harder to get out of too much debt; it doesn’t actually decrease the factors that get people (or governments) into debt in the first place, and thus actually makes things worse, not better (at least overall: people holding long term government bonds whose worth might be inflated away have every reason to like the idea).

Of course, the main resolution surely has to be liquidation/bankruptcy proceedings, where creditors only end up getting a percentage of what they’re owed adjudicated by some trusted third party, the debtor gets put on a list of bad people who don’t pay their debts reliably, and otherwise everyone goes back to living their lives, usually including the bankrupt individual. That approach seems a lot better than the pure debt-market approach of having risky debts become increasingly short term and increasingly expensive until either someone rich comes along and provides a bailout, or there’s a global recession.

Henry Tax Review, post release

And here was me thinking forming an opinion on yesterday’s tax review would be hard. Turns out, not so much: the review itself was really well done, pretty much what you’d hope for from a professional public service; the government’s response, on the other hand, was impressively gutless.

The most interesting recommendation (to me) in the Henry Review was the changes to personal income tax, which I’d summarise as:

  • Raise the tax-free threshold from $6,000 to $25,000
  • Change the official income tax rate to 35% for up to $180,000 per annum (ie, almost everyone), and leave it at 45% for above that
  • Drop the Medicare Levy, Low Income Tax Offset, etc, and just have a single rate
  • Fringe benefits tax should be simplified (particularly for cars), moved to market valuations, and taxed progressively rather than always at the top marginal rate
  • Introduce a standard deduction for work related expenses to simplify filing

At first glance, I thought the 35% rate seemed high (it’s currently 15% to $35,000 and 30% to $80,000 and 38% to $180,000). But graphing the rates seems to dispute that thought:

Tax Rates

There is some loss: people earning between $35,000 and $65,000 pay $250 more in tax per year, which then rises to $1,000 more per year at $80,000, dropping back to parity at $113,000. People earning more than that get a small tax break that eventually levels off at a flat $2,000 tax benefit for people earning $180,000 or more. At the other end of the scale, people earning between $18,000 and $30,000 pay between $450 and $1500 less tax per year, which seems sensible. And of course, everyone benefits from having a simpler tax system, and (in theory) not having to pay an accountant for help filing their return. And the marginal tax rates become both easier to understand and generally the same or lower, which hopefully means fewer people are in the situation of thinking “well, I could work a few days a week, but I’d end up with less money that way, so I’ll sit at home and watch Oprah instead”.

(Caveat: those numbers aren’t strictly right — they’re based on the current marginal rates and the LITO; so they don’t include the Medicare levy, and probably other things. This is why I’m not the treasury department. But I think it’s a fair indication of what the effect would be)

It’s not clear to me what Labor’s planning to do with the recommendations here — they haven’t accepted them, but they didn’t officially reject them yet either. Presumably they’ll have to say something, sometime, about it, but I don’t see any advantage to waiting if they were going to take this and run with it. I guess that makes them an exercise in cowardice: doing something about it would be too hard, as would finding actual flaws with it, so let’s just ignore it and hope we get re-elected anyway.

The company tax changes seem similarly motivated — dropping two percent over five years? Is anyone seriously going to pay attention to that? I don’t think so; and the Henry review’s recommendation was, in my opinion, much less subtle: dropping from 30% to 25%, the idea being merely to stay in line with international trends, particularly those for small economies. I suppose I can appreciate taking some time to cut the rate, but not if you’re also only going to cut it by what looks like a token amount.

As far as I can see the only reason that recommendation even got the token support from the government that it did was that the Henry review explicitly linked it with the 40% resource rent tax — recommending that the 25% company tax and the 40% resource tax be balanced to maintain an overall 55% tax rate (25% + (1 - 25%) × 40% = 25% + 30% = 55%). I can’t say I understand the resource rent tax (or the “super profits tax” as the government calls it) — but then I don’t understand the motivation for it either; if you get $90B of profits, how do you only pay $10B in “resource taxes” when you should be paying at least 30% company tax on profits, which would be $27B? Or are we not counting some tax receipts, in order to make the profits sound more unfair? The numbers all sound very shoddy there.
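The combined-rate arithmetic is easy to check. The second tax only applies to what's left after the first, so the rates don't simply add. Below is a minimal Python sketch; `combined_rate` is just an illustrative name. The formula is symmetric, so it doesn't matter which tax is applied first.

```python
def combined_rate(first, second):
    """Overall rate when `second` is levied on what's left after `first`.

    Rates are fractions, e.g. 0.25 for 25%. The result is symmetric:
    first + (1 - first) * second == first + second - first * second.
    """
    return first + (1 - first) * second


# Henry review pairing: 25% company tax with a 40% resource rent tax
print(round(combined_rate(0.25, 0.40), 4))

# Company tax alone at the current 30% on $90B of profits, in $B
print(round(0.30 * 90, 1))
```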

And of course, the government is using the “super profits tax” to pay for superannuation concessions, which I’m sure makes a clever sound bite, whereas the Henry review was recommending it be tied into infrastructure spending, which seems like an actual logical link (Losing non-renewable resources? Spend the proceeds on stuff that will last…). But a $700M infrastructure fund versus a $9,000M resource rent tax doesn’t sound like an impressive match to me.

As far as simplification goes, there seems to be lots in the review’s recommendations, and pretty much none in the government’s changes. Whether it’s justified or not, the resource tax is a bunch of extra regulation, that’s not accompanied (as far as I can see) by any reduction in regulation. I guess I’m not terribly surprised, but that was the one election promise that I was actually impressed by and that I figured the government might be willing to keep.

Henry Tax Review

The Henry Tax Review is supposed to be released tomorrow. Since that might warrant a blog post, and possibly even some criticism, I thought it might be interesting to note down some criteria beforehand to remove one avenue for bias.

One issue for regulatory reform is whether changes make the entire system simpler or more complex — more complex regulation potentially handles trickier situations more “fairly”, but at the same time forces everyone to incur the cost of understanding all the complications, even if only to be sure they don’t apply in their situation. The Rudd government made an election promise to that effect:

Labor believes that when making new regulations, governments should remove an existing regulation and should design rules with small businesses in mind. We call this approach ‘think small’. It will require government departments and agencies to better understand the realities faced by businesses on the ground. Labor will adopt a ‘one-in, one-out’ principle for federal government regulation. This means that when a new regulation is proposed it must be accompanied by a proposal to remove an existing regulation.

There’s a deregulation group as part of the Department of Finance, but I haven’t seen much talk either way as to how this promise has been holding up. In theory, based on this principle, the Henry review should be proposing about as much reduction in regulation as new regulation though.

One of the obvious ways to reduce the complexity of the tax system would be to remove the various GST-free categories of goods (unprocessed food, etc). It would probably be appropriate to compensate that with a small increase in some welfare payments.

It’s probably also one of the few changes to the GST that’s within the review’s purview, given the clause in its terms of reference that goes “The review will reflect the government’s policy not to increase the rate or broaden the base of the goods and services tax (GST); preserve tax-free superannuation payments for the over 60s; and the announced aspirational personal income tax goals”. It’ll be especially interesting to see how closely the Henry review has stuck to that policy, given the conclusions some people are drawing from Rudd’s hospital plan that there’s been a backflip there.

Personally, I quite like the “Reform 30/30” proposal, which involves a massive simplification of both welfare payments and income tax. Supposedly it would boost government revenue by $15B per year, which is a significant fraction of the $125B in income tax or $43B in GST received in the 2008/9 financial year. On the other hand it comes at a cost of not giving welfare bonuses to people doing good things (having kids, buying houses, studying, etc) and taking less account of various other ways in which you might be rich other than having a high-paying job (rich parents, rich spouse, money already in the bank, nice house, etc).

Presumably anything like that would be a non-starter politically, but some movement in that direction ought to be plausible. There’s been some talk for a while now about having a simplified tax return, so that you can just tick a box and accept whatever the ATO says rather than fill out a bunch of forms — basically heaps easier and quicker, but you don’t get to claim lots of deductions. Given the ATO’s electronic systems and reporting of interest payments by banks, and PAYG contributions by employers, that ought to be pretty plausible to set up, and might start paving the way for cutting out lots of personal tax deductions — why keep them if barely anyone’s using them, after all?

That, at least, is kind-of like cutting welfare payments — a tax deduction for $1,000 is roughly the same as receiving a cheque from the government for $300 if you’re at a 30% tax rate. Of course that means deductions are more considerate of the welfare of people paying more tax, which is similar to being more considerate of the people who least need consideration.
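The deduction-versus-cheque equivalence is just multiplication by your marginal rate. Here's a quick Python sketch. The 15% and 45% rates are included purely as illustrations of how the same deduction is worth more the higher your rate.

```python
def deduction_value(deduction, marginal_rate):
    """Tax saved by claiming `deduction`, i.e. the equivalent cheque size."""
    return round(deduction * marginal_rate, 2)


# The same $1,000 deduction at a few illustrative marginal rates
for rate in (0.15, 0.30, 0.45):
    print(f"{rate:.0%}: ${deduction_value(1000, rate):,.2f}")
```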

I can’t see how the Henry review will be able to recommend much in the way of cutting welfare expenditures in general ($125B of expenses in 2008/9), but they’ve at least been told “The review should take into account the relationships of the tax system with the transfer payments system and other social support payments, rules and concessions, with a view to improving incentives to work, reducing complexity and maintaining cohesion”. So maybe there will be some ideas on this.

Maybe this will also mean the Ergas review gets revealed soon. It looks like it’s even more out there than the 30/30 proposal, with a roughly flat 20% income tax, raising tax on income from superannuation, and taxing the family home. I’m pretty surprised that there’s anything out there more wacky than what the Liberal Democratic Party came up with, but maybe that’s due to its progenitor — supposedly Turnbull ordered the review as shadow treasurer without bothering to even tell Brendan Nelson. Still, it would be interesting to be able to compare its reasoning and recommendations to those in the Treasury Secretary’s report tomorrow.

WoBloMo 2, Epilogue

This year’s woblomo was a bit more consistent than last time — every post was either on the appropriate odd day of March, or before midday the next day. (I did backdate a few posts that actually got posted between midnight and about 4am the next day, just to keep the calendar widget in the sidebar pretty)

I felt a bit pressured this time around on what I was posting — there were a couple of topics I would’ve liked to have posted on, but didn’t because I wasn’t sure I’d be able to finish them in time. On the other hand most of the posts were interesting to me at least, and I learnt a few things in writing them (first time I’ve played with R, or done a YouTube screencast, in particular). Overall I’d call it a pretty good experience.

I think for April I’m going to try to do a bunch of blog posts again, but aiming to be a bit more bursty (so if I want to post about X, I can spend a couple of days thinking about it first). I’m trying out’s getting things done tips at the moment too, which I think should work okay with that plan.

Excuses to use gearman

Some time ago I stumbled across gearman and thought it looked cool — map/reduce and distributed processing for shell commands? Neato! Unfortunately, at the time I was looking for a distributed database (and found couchdb) so that didn’t go anywhere. The session at lca reminded me how cool it was, but I didn’t get much further in finding an actual use for it.

I’m thinking now, though, that it might be a good match for my notmuch usage: that is, I could have incoming mail tagged as unread and filed into an inbox or list or whatever, and then separately have all that mail sent through gearman to get flagged as spam. The win there being that mail gets delivered immediately, still gets flagged pretty much ASAP, and can easily get flagged as spam before I see it even if the initial check doesn’t flag it as spam. Gearman (and notmuch’s tagging) should allow that to be rate limited, queued, and handled asynchronously without much hassle.
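The sketch below shows only the shape of the pipeline, not gearman itself. Python's standard `queue` and `threading` modules stand in for the job server, and a dict of tag sets stands in for notmuch's tags. Delivery tags mail immediately and just queues its id. A background worker later moves anything that looks like spam out of the inbox. `looks_like_spam()` is a hypothetical placeholder for a real classifier such as spamassassin.

```python
import queue
import threading

tags = {}             # message id -> set of tags (stand-in for notmuch)
jobs = queue.Queue()  # stand-in for the gearman job server


def looks_like_spam(msg_id):
    # Hypothetical classifier; a real setup would run spamassassin or similar
    return "viagra" in msg_id


def deliver(msg_id):
    """Deliver immediately, then queue a spam check for later."""
    tags[msg_id] = {"inbox", "unread"}
    jobs.put(msg_id)


def spam_worker():
    while True:
        msg_id = jobs.get()
        if msg_id is None:  # sentinel: no more work
            jobs.task_done()
            break
        if looks_like_spam(msg_id):
            tags[msg_id] -= {"inbox"}
            tags[msg_id].add("spam")
        jobs.task_done()


worker = threading.Thread(target=spam_worker)
worker.start()
for m in ["hello-from-mum", "cheap-viagra-now"]:
    deliver(m)
jobs.put(None)
worker.join()
print(tags)
```

Rate limiting would come from how many workers you run, which is the part gearman would handle for real.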

Fingers crossed. :)

Email: how much does it suck?

Yes, this post is going to mention notmuch. Whether that’s the answer to the question posed is another matter…

My email habits have defaulted to mutt and procmail for quite some years now (and prior to mutt, pine, which was essentially the same except older and less nifty). I had a brief interlude under OS X with Thunderbird and IMAP right up until poor filesystem performance drove me back to Linux, and with it mutt. Spam has obviously also been a problem, and I’ve variously used spamassassin, greylisting and dspam to combat it. Greylisting was great when it started, but now seems mostly useless, at least for my mail — and the delay it causes when you register with websites that want to send you an email for confirmation gets annoying. dspam was working great for a while, but ended up taking too much time and CPU the way I was using it, and trying to move it from my laptop to my server broke it almost completely.

When I went off to this year I decided to take my netbook, and decided it wasn’t worth trying to move my email from my laptop over to my netbook (having 40k mails in your inbox isn’t very workable, least of all when 80% of the recent ones are spam), so decided to get all my new mail copied to my gmail account instead. I figured when I got back I could work something out, and in the meantime, gmail was at least easy.

The only real work I ended up needing for that was to setup a bunch of filters for incoming list mail (I’ve got 19 at the moment) and a couple more to do away with stupid automated mails I don’t actually want to ever see. That worked, but what was really shocking was just how wonderful it was that all my spam just went away — my gmail spam folder currently has 7057 messages from the last 30 days, but I only ever actually see maybe a couple a day. Presumably the fact that gmail sees heaps of emails and has lots of users pressing “Report spam” makes it a lot less likely for any individual user to see any individual spam, too. And having effectively declared email bankruptcy was nice too; a clean inbox really is much easier to work with. I’m not especially convinced by tags; I’m not really getting much more value out of them than folders, but having filters that can automatically (and quickly) apply to existing mails as well as new mail is quite useful. Another advantage Gmail has is it’s easy to access from multiple devices — different laptops and my mobile phone in particular.

Last time I looked at Gmail I dismissed it with the comment:

GMail is kind-of nice, but I like to be able to read my mail offline, so whatever.

And though I suppose I could try Google Gears or something to deal with that, ultimately I still feel pretty much the same way. There have been a couple of times Gmail’s been not working quite right (not loading an email in particular), and it’s often a little bit slow just due to network lag, and, honestly, I want all my email archives in one place, and I don’t particularly want to upload it all to Google. Gmail also seems somewhat unfriendly to people who want to send/read their mail in fixed-width, 80-column format.

So the question becomes how to run my own email system again and have it not only be usable, but have it be as pleasant as I’m finding Gmail.

So far, I’m thinking that (despite what I just said above) tagging is probably a key feature: not so much because it makes the mail client different to use, but because it makes mail easier to work with. Tagging messages as spam just seems lots easier to deal with than moving them from one Maildir to another. And not having messages get constantly renamed when being read or replied to would be nice too. So using the aforementioned notmuch seems like a win on that score.

The idea, I think, would be to basically mimic gmail: have tags for “inbox”, “unread”, “starred”, “spam” and “trash” as well as tags for different lists I follow, and tags for other collections of things (receipts for tax purposes, particular projects). Make sure those tags are applied automatically to incoming mail, and then have some folders corresponding to various tags.
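Here's a minimal Python sketch of that Gmail-style model. Tags are plain sets, and each "folder" is just a saved query over them. The list ids in `LIST_TAGS`, and the choice to keep list mail out of the inbox, are illustrative assumptions. With notmuch proper, the tagging would presumably be done with `notmuch tag`, and the folders would be search queries.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    subject: str
    list_id: str = ""
    tags: set = field(default_factory=set)


# Hypothetical mapping from mailing-list ids to tags
LIST_TAGS = {
    "notmuch.notmuchmail.org": "notmuch",
    "debian-devel.lists.debian.org": "debian-devel",
}


def tag_incoming(msg):
    """Apply the automatic tags to a newly delivered message."""
    msg.tags |= {"inbox", "unread"}
    list_tag = LIST_TAGS.get(msg.list_id)
    if list_tag:
        # Assumption: list mail skips the inbox, like a Gmail filter might
        msg.tags.discard("inbox")
        msg.tags.add(list_tag)


# "Folders" are saved queries over tags rather than places mail lives
FOLDERS = {
    "Inbox": lambda t: "inbox" in t and not t & {"spam", "trash"},
    "Starred": lambda t: "starred" in t,
    "Spam": lambda t: "spam" in t,
}


def folder(name, messages):
    return [m for m in messages if FOLDERS[name](m.tags)]
```

Marking something as spam is then just adding a tag, and it drops out of the inbox query without being moved anywhere.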

I’m not quite sure how I want to deal with spam. A major problem with dspam for me was that once a few spams got through (perhaps because I simply wasn’t reading mail), the hit rate would drop off and even more would get through afterwards. I’m not sure if that’s simply to be expected, a configuration error on my part, or what, but I think the solution needs to ultimately be doing spam scans on unread but already delivered mail, as well as during delivery. That way once I mark a few mails as spam, the system can recheck others and make them go away too, reducing my workload. I think I’d also like to start using something like Vipul’s Razor, to maintain some of the benefit of many other people noticing something is spam. I’m still not sure whether I prefer spamassassin or dspam or both, though.

For mobile access, IMAP seems like a no-brainer of a way to go, especially since Gmail’s already come up with a pretty reasonable behaviour mapping from IMAP to tags. IMAP also has a SEARCH command which might possibly be a reasonable way of exposing notmuch’s searching. An IMAP server would also solve the problem that you can currently only really use notmuch from emacs or vim, neither of which appeals to me. I guess it would also support disconnected operation to some extent by way of offlineimap. Ideally, though, I guess there’d be a way to synchronise the Xapian database. In theory it ought to be possible to make extra notmuch functionality available via IMAP extensions, but you’d need the client to support those too.

I’m not sure what client I’d want to use — I suspect it’d just be a toss-up between mutt and Thunderbird. The only possible drawback there is that neither of them has the clever per-thread view of messages that notmuch and Gmail have adopted. I’m not sure how much I’d miss that.

Anyway, I’ve now imported my old mail into notmuch (mostly), and I’m currently running my old inbox through spamassassin to try to clear out all the garbagey spam stuff. The notmuch stuff went quickly enough, but spamassassin’s only about a quarter of the way through.

(And yeah, I’m claiming the Hawaii/New York woblomo exemption again. Even missed UTC this time…)

Consumer reports: Mobicity

When I was looking at what new smartphone to get, the cheapest place to get it from seemed to be Mobicity, which appears to be a local shopfront for a Hong Kong warehouse. That there was a discount voucher on tjoos helped too. Anyway, the price was right and it got here reasonably quickly so it all sounded good.

The other week I put it on to charge overnight and woke up the next morning to find it was dead — the “charging” light would come on, but nothing else would do anything, plugged in or not. It came with a one-year warranty, via Australian Warranty Services (who seem to sell warranties for imported phones generally too, maybe), and you fill out a web form to do claims. Despite Mobicity being just a suburb away, I got an email saying that I’d need to send my phone to Sydney Cellular Repairs. As it happened I was going to Sydney for the LA F2F the next weekend so I dropped it off on the way; and at that point things were pretty straightforward: they took the phone, got some receipt number for Mobicity, checked the phone over and a couple of days later I got an email that it was fixed and on the way back to me. And it arrived working, so… sweet! Cheap and supported. Who knew?

On not being able to think straight

When I last posted about my pygame/trigrid hax0ring, I said:

I’m now at the point where that all works, but there’s no intelligence: peons will buy and transport goods without checking first that anyone actually wants them.

Getting past that is proving troublesome, so this is me thinking out loud about it.

The key scenario is having two markets, with an agent who can move goods between them. In the example that should hopefully appear at the right, we’ve got two visible markets (represented by blue circles), and an agent who can carry goods from one to the other (represented by the smaller green circle on the red line). There are three other agents connected to each of the markets, but we’re not concerned about them for the time being. The agent might already own some goods, which could either be stored at one of the markets or be in transit right now. The agent can move between the two markets (which takes some time), possibly carrying some goods. While at a market, the agent can drop off any goods being carried, or pick up some goods to carry.

As far as trade is concerned, each market maintains a list of offers to sell goods that agents can either add to or accept. In the example pictured, one market has a good offered for $9, and the other has a good offered for $15. Once an offer is accepted, the agent that made the offer is expected to ensure the goods are at the market (by delivering them, eg), and to indicate when this happens, at which point the market transfers ownership to the agent that accepted the offer, and transfers the associated payment in the other direction. At the moment, each agent follows this simple, and stupid, procedure:

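while True:
    # Wait for an offer worth taking at one of our two markets
    (src, jobid) = wait_for_a_job()
    # Accept it straight away, paying before we know anyone wants the goods
    (cargo, price) = accept_offer(src, jobid)
    wait_for_job_completion(src, jobid)

    # Re-offer the goods at the other market with our markup
    dst = other_end(src)
    dst_jobid = make_offer(dst, cargo, price+profit)
    job_is_completed(dst, dst_jobid)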

What it should be doing is something more like:

  • scanning available offers at one market and making a new offer with some additional profit at the other market
  • only accepting a supplier’s offer once the corresponding resale offer has been accepted
  • getting payment upfront, so you don’t need to pay the supplier before you’re paid
  • dealing with offers and acceptances asynchronously with actually fulfilling them

Somehow the code for that should look something like:

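def update_offers(profit, deliver_time):
    # Mirror each offer at one market as an offer of ours at the other,
    # marked up by our profit and delayed by the delivery time
    for (src, dst) in [(left, right), (right, left)]:
        for (cargo, price, arrive_time) in get_offers(src):
            add_offer(dst, cargo, price + profit, arrive_time + deliver_time)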

That risks two instances of unbounded recursion: if get_offers() returns offers made by me, I’ll be making offers to deliver from left to right to left to right to left to… with huge costs and delays. So get_offers() shouldn’t return offers you made. But also, if I offer from my left to my right, someone else offers from right to “up”, and someone else offers from “up” to my left, we have the same problem.

I figure that’s best solved by adding a “contingent” field, to say “you can only accept this offer if the offerer is able to accept the contingent offer; otherwise it’s void”. If you have a chain of offers to get from A to B to C to D, that allows the markets to accept all the offers simultaneously, rather than giving some other agent time to accept the offer from B, deliver it to E, and muck up the whole transaction.

def update_offers(profit, deliver_time):
    for (src, dst) in [(left, right), (right, left)]:
        for (jobid, cargo, price, arrive_time) in get_offers(src):
            # Our offer at dst is only valid if we can still accept this one
            contingency = (src, jobid)
            add_offer(dst, cargo, price + profit, arrive_time + deliver_time, contingency)

If you let markets keep track of the “root” of the contingency tree, that also lets you limit the possibility of multi-agent offer loops.
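Here's a rough, self-contained sketch of the contingency idea. Each market rejects any new offer whose contingency chain shares a root with an offer it already holds. The `Offer` and `Market` classes, and this signature for `update_offers()`, are made up for illustration; the code above passes job ids and tuples around instead. The cargo and prices in the test are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Offer:
    maker: str
    market: str
    cargo: str
    price: float
    arrive_time: float
    contingency: Optional["Offer"] = None  # offer the maker must accept first

    @property
    def root(self):
        """The original supplier's offer at the bottom of the chain."""
        offer = self
        while offer.contingency is not None:
            offer = offer.contingency
        return offer


class Market:
    def __init__(self, name):
        self.name = name
        self.offers = []

    def get_offers(self, agent):
        # Never show an agent its own offers (stops the left/right/left loop)
        return [o for o in self.offers if o.maker != agent]

    def add_offer(self, offer):
        # Refuse offers whose chain already has an offer here (multi-agent loop)
        if any(o.root is offer.root for o in self.offers):
            return False
        self.offers.append(offer)
        return True


def update_offers(agent, left, right, profit, deliver_time):
    for src, dst in [(left, right), (right, left)]:
        for o in src.get_offers(agent):
            dst.add_offer(Offer(agent, dst.name, o.cargo, o.price + profit,
                                o.arrive_time + deliver_time, contingency=o))
```

One side effect of this rule is that only one offer per root can sit in a given market. Two transporters therefore can't both relay the same supplier's offer into the same market. Whether that's acceptable probably depends on how competitive the agents are meant to be.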

That then needs to be hooked into actually doing the deliveries. When an offer of ours is accepted, two things happen: we’re committed to handing over some goods at one market at time t, and someone else is committed to doing the same at the other market at time t-d. We can thus maintain two lists for each market we interact with: a set of times when some goods are expected to arrive at this market, and a set of times when we’re meant to deliver goods to this market. We can then resolve our obligations by noting that either cargo will arrive at each market before it needs to be delivered, and we can collect our profit without actually doing any work; or else (exactly) one market will require some goods before they arrive, and we’ll have to transport some goods from the other market to satisfy this. Maintaining a schedule of what we should do, and working out a minimum deliver_time for our additional offers, should be straightforward at that point.
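Here's a Python sketch of the obligation bookkeeping, assuming a single kind of interchangeable cargo. `uncovered()` and `trips_needed()` are hypothetical helpers, not part of the pygame/trigrid code. They walk each market's arrival and delivery times in order, and report the deliveries that local arrivals can't cover in time. They also give the latest time we'd need to set off from the other market. They don't check that goods are actually on hand at the other end when we leave.

```python
def uncovered(arrivals, deliveries):
    """Delivery times at one market that local arrivals can't cover.

    Goods arriving at time a can satisfy a delivery due at time t if a <= t.
    Events are processed in time order (arrivals first on ties), and any
    goods in stock are used for the earliest delivery still owed.
    """
    events = sorted([(t, 0) for t in arrivals] + [(t, 1) for t in deliveries])
    stock, short = 0, []
    for t, kind in events:
        if kind == 0:      # goods arrive
            stock += 1
        elif stock > 0:    # delivery covered from local stock
            stock -= 1
        else:              # nothing here yet: must be carried in
            short.append(t)
    return short


def trips_needed(markets, travel_time):
    """markets: {name: (arrivals, deliveries)}.

    For each market, the latest times we'd have to leave the other market
    carrying goods to it. All-empty lists mean we profit without moving.
    """
    return {name: [t - travel_time for t in uncovered(arr, dlv)]
            for name, (arr, dlv) in markets.items()}
```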

(For completeness, agents should be able to be penalised for failing to deliver, and potentially rewarded for delivering goods early. Occasionally it’ll work out that the penalties for dropping one commitment will be outweighed by the rewards of delivering on something else — I’m optimistic that just coding that logic should make the overall system much more dynamic while remaining fairly understandable and predictable)

Hmm, I think that covers the next step. Guess we’ll find out when we start coding…

Thoughts on auditing systems

One of the XP systems I look after had a trojan this month — looks like it came from a fake “UPS package” mail with a zipped attachment that got clicked on, then stuck its tendrils into the registry and all over the place, and started popping warnings about viruses and instructions on how to pay for a fix. After a couple of attempts at removing the infection and finding it just coming back, looks like a reinstall is going to be easier.

Of course, on a free operating system it’s at least theoretically reasonable to know what everything on the system is supposed to be doing, so it should be possible to fix that sort of problem. There’s been a bit of discussion this past month about the md5sums control files which goes some of the way to handling that for Debian — but of course, that assumes your md5sum files aren’t compromised along with the rest of your system.

Ultimately you want two things to cope with potential compromises like this — one is to detect them as early as possible, and the other is to work out what’s infected and what’s recoverable. Which basically means you need a description of how things should be and the ability to compare that to how things actually are.

In some respects, that’s difficult to do: “how a system should work” is hard to define, and tends to change over time — and often people don’t think their systems work as they “should” even when they’re freshly installed and completely uncompromised. But if you aim a little lower, you can at least get somewhere. You could say “my system should be built from the latest Debian testing packages” and verify that, for example. Or you could keep a running tally of packages installed and removed, and say “each entry in my running tally should say what happened and be dated and match my recollection, and the packages from that tally should be Debian packages, and the files on my system should match those packages”.

Knowing what packages you’re meant to have is probably the first challenge — maybe you’re running puppet or similar and have an easy answer to that, but if you just run apt-get and aptitude whenever you want something, it’s a bit harder to tell. Are you running an ircd because you thought one day it’d be a fun thing to do, or because some warez kiddies are using it to control their botnet?

Once you know you’re meant to have, say, python-llvm installed, you need to know which version it’s meant to be. You could say “the last version I installed, of course” — except your only record of that might be on your compromised system. You might say “well, I follow testing, so the latest version in that”, except that there might have been an update to that package in testing while you were compromised, or you might have installed something from unstable or experimental (or backports, or compiled it from scratch). You certainly want to know the architecture, version number and whether it was from Ubuntu, Debian or somewhere else.

Going from that step to knowing what the contents of the package are meant to be is slightly harder. If you happen to know you’re looking for the current version of python-llvm in Debian testing, then you can establish a trusted path to verify what its contents should be by downloading the testing Release, Release.gpg and Packages.gz files, which will give you a verified download of a deb file (assuming you trust gpg and sha256, which is reasonable for the moment at least).
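The chain from Release to Packages.gz to the .deb can be sketched in Python. This assumes Release.gpg has already been checked against Release with gpg, which isn't shown. It only parses the `SHA256:` section of Release and the `Package`, `Version` and `SHA256` fields of Packages stanzas. The function names are hypothetical.

```python
import hashlib


def release_sha256(release_text, path):
    """SHA256 listed for `path` (e.g. main/binary-i386/Packages.gz)
    in the SHA256 section of an already gpg-verified Release file."""
    in_section = False
    for line in release_text.splitlines():
        if not line.startswith(" "):
            # A new top-level field: are we entering the SHA256 list?
            in_section = line.strip() == "SHA256:"
            continue
        if in_section:
            digest, _size, name = line.split()
            if name == path:
                return digest
    return None


def packages_sha256(packages_text, package, version):
    """SHA256 of the .deb for package/version from a Packages file."""
    for stanza in packages_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields.get("Package") == package and fields.get("Version") == version:
            return fields.get("SHA256")
    return None


def verify(data, expected):
    """True if `data` hashes to the expected hex SHA256."""
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

You'd verify the downloaded Packages.gz against Release, decompress it, then verify the .deb against the Packages entry.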

If you’re running an outdated version of the package, you’ve got more problems. You could find the original .changes file uploaded with the package to verify it based on the developer’s signature — but that will only tell you that that developer built that package, not that it was uploaded to Debian, distributed far and wide, and installed on your machine. You could find the Release/Packages files that were current when you downloaded it, and verify them, but that’s something of a chore in and of itself. You could make a note of the name, version, location and sha256sum of every package you install and keep it somewhere secure, but that’s a chore too. The easiest solution I can think of is just to treat “outdated” as “potentially compromised”, and install the current version of the package anyway. (For locally generated packages, you should presumably be able to either find an uncompromised version to compare against easily enough, or you’ll have to rebuild it from scratch as part of your recovery anyway)

Once you’ve downloaded the deb file, it’s a relatively simple matter to verify the package is correctly unpacked; a good approximation is something like:

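CKSUM=md5sum  # or sha1sum, sha256sum, ...
export CKSUM  # tar runs TAR_CMD in a child shell, which needs to see it
# Emit one "<hash>  <path>" line per file in the package's data archive
TAR_CMD='printf "%s%s\n" "$($CKSUM - | sed s/-$//)" "${TAR_FILENAME#./}"'
ar p "${DEB_PATH}" data.tar.gz | tar --to-command="$TAR_CMD" -xzf - > "$HASHES"
# Check the installed copies, with paths relative to /
(cd / && $CKSUM -c) < "$HASHES"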

(Caveats: assumes data.tar.gz, though some debs have data.tar.bz2 instead; the extraction command above takes about 7 minutes on my netbook (HP mini 2133) for the 420 or so debs that happen to be in my /var/cache/apt/archives (about 480MB worth); the above assumes that you have a trustworthy ar, GNU tar, gzip (or bzip2), md5sum (or sha1sum etc), and filesystem, as well as a copy of the .deb; the above includes conffiles in /etc, many of which will have been intentionally modified; some, but very few, .debs expect some of their distributed files outside /etc to be modified too)

You can skip generating the hashes yourself if you use the md5sums files shipped with debs, but that comes with a few drawbacks: you're forced to rely on the md5sums files, which can be lost, absent, incomplete or, if you're using the local copies that dpkg keeps in /var/lib/dpkg/info, potentially compromised along with the rest of your system. The upside is there's an existing tool to verify them (debsums).
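debsums already does this job properly. Purely to show how little is involved, here's a Python sketch that checks one dpkg md5sums file. Each line has the form `<md5>  <path>`, with the path relative to `/`. The `root` parameter is an addition so the sketch can be pointed at a mounted copy of a suspect system, or at a test directory. It reads each file whole, which is fine for a sketch.

```python
import hashlib
import os


def check_md5sums(md5sums_text, root="/"):
    """Yield (path, status) for each line of a dpkg md5sums file.

    status is "OK", "FAILED" or "MISSING".
    """
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)
        try:
            with open(os.path.join(root, path), "rb") as f:
                actual = hashlib.md5(f.read()).hexdigest()
        except FileNotFoundError:
            yield path, "MISSING"
            continue
        yield path, "OK" if actual == digest else "FAILED"
```

Usage might be something like `check_md5sums(open("/var/lib/dpkg/info/coreutils.md5sums").read())`, with the same caveat as above: that file may itself be compromised.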

Personally, I'm now running a patched version of dpkg that generates its own .hashes files as packages are installed. That doesn't do anything about lost or compromised files, but it does ensure they're complete and at least initially present.

But even if all the files that are meant to be installed are exactly as they should be, that's not enough. You've also got to worry about extra files -- maybe your "ls" command isn't invoking "/bin/ls" but "/usr/local/bin/ls" which has been compromised. To some extent that's easy enough with tools like cruft, but there are quite a few places where extra files can screw you over.
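A rough way to look for extra files is to compare what's on disk with the file lists dpkg keeps in /var/lib/dpkg/info/*.list. This Python sketch does that for one directory tree. Expect false positives: files created by maintainer scripts, alternatives and the like aren't in those lists, which is the sort of thing cruft has to special-case. And of course the lists themselves could be compromised.

```python
import glob
import os


def packaged_paths(info_dir="/var/lib/dpkg/info"):
    """All absolute paths dpkg says it installed, from the *.list files."""
    paths = set()
    for list_file in glob.glob(os.path.join(info_dir, "*.list")):
        with open(list_file, encoding="utf-8", errors="replace") as f:
            paths.update(line.rstrip("\n") for line in f if line.strip())
    return paths


def unexpected_files(top, known):
    """Files under the absolute directory `top` that no package claims."""
    extra = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in known:
                extra.append(path)
    return sorted(extra)
```

For example, `unexpected_files("/usr/local/bin", packaged_paths())` would flag a rogue `ls` there.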

Probably the hardest part is checking your configuration files are correct. On both Linux and Windows, you can do a great job of taking over a system just by messing with configuration files, whether that be zeroing a password field, or adding a preload so that every time you run a program a trojan starts up as well, or a timed job to start your trojan back up if it gets disabled. If there's enough configuration data, you might be able to hide a copy of your trojan in there, so that it'll be re-extracted even if the rest of the system is completely cleaned up.

I'm not really sure what the solution here is. For Windows and its registry (and other configuration scattered about the place) I don't think it's solvable; there's just too much of it, changed in too many ways to really control. So as far as I can see, it's a matter of scrubbing everything and reinstalling from scratch there.

For Debian and Linux in general, things still probably aren't great, but there are at least a few things you can do. You can probably rely on /etc not changing too much, which means you can do things like track changes with something like etckeeper and review the diffs to make sure they're sensible. Unfortunately reviewing configuration diffs is probably something of a chore, but with distributed version control and a remote append-only repository you've got a chance of it being feasible to put that review off until you're actually trying to recover your system.

That doesn't help you with dot-files in your home directory though, and honestly I'm not sure anything will. Compared to 10MB in /etc on my netbook, there's 86MB in ~/.mozilla alone for me, often in inscrutable XML and binary files. Worse, applications feel free to create their own dot files at any time, and also to hide them underneath other directories (.config/gnome-session, .gnome2/evince, .kde/share/apps/ScanImages etc). Some need to be per-machine, others don't.

You could imagine having a .bashrc that sets LD_PRELOAD to include some file in .mozilla/firefox/194653e1.default/Cache/36A45162d01, which then checks every now and then to see if it can run "sudo" without needing a password to give itself root permissions, for example. Perhaps a .bashrc and LD_PRELOAD would be noticeable (though I suspect not to many people), but there's also .xsession and a myriad of other bits of configuration that'll let you get a trojan started up that way.

On the other hand, the amount of valuable configuration in dotfiles isn't that large -- manually deciding which dotfiles are interesting and keeping them in version control, while scrubbing the rest every now and then (when things break, when you switch computers, once a week, whatever) could be feasible.
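The whitelist approach to dotfiles is simple enough to sketch in a few lines of Python. `KEEP` is a hypothetical list. Anything else at the top level of your home directory that starts with a dot gets reported as a candidate for scrubbing. Nested files such as .config/gnome-session go along with whatever happens to their top-level directory. The kept files still need comparing against their version-controlled copies, since .bashrc is exactly where an LD_PRELOAD trick would live.

```python
import os

# Hypothetical whitelist: the dotfiles you've decided are worth keeping
KEEP = {".bashrc", ".profile", ".gitconfig", ".ssh", ".vimrc"}


def dotfiles_to_scrub(home, keep=KEEP):
    """Top-level dotfiles and dot-directories in `home` not on the keep list."""
    return sorted(name for name in os.listdir(home)
                  if name.startswith(".") and name not in keep)
```

Listing rather than deleting is deliberate: moving the results aside and seeing what breaks seems safer than removing them outright.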

Another place to worry about stuff is /var. It's usually a little safer in that it's generally full of data, so it won't spontaneously launch software quite so much, but not completely so. Adding something to /var/spool/cron/crontabs/root could get you into trouble pretty quickly, eg. If you modified /bin/ls to do evil things, and someone tried reinstalling with apt-get, if you'd also added some code to /var/lib/dpkg/info/coreutils.prerm you could make sure /bin/ls was reinfected immediately.

I'm honestly not too sure what there is to be done about that either. It might be feasible to monitor just the "risky" parts of /var in a useful way, but it would be pretty easy to miss things. It might be possible to classify great swathes of /var as "not-risky" and treat the other bits similarly to /etc, but I don't think there are tools to do that at present. It might be possible to get programs to move the risky bits into /etc, /usr or users' home directories, but I know people were talking about some of those things over a decade ago, so it's not likely to happen soon.
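To make the "classify the risky bits of /var" idea concrete, here's a deliberately tiny Python sketch. The prefix list is hypothetical and nowhere near complete; it only covers the crontab and dpkg maintainer-script examples mentioned earlier. That incompleteness is exactly the problem, since anything not on the list is silently treated as plain data.

```python
# Hypothetical, incomplete list of /var locations that can cause code to run
RISKY_VAR_PREFIXES = (
    "/var/spool/cron/",     # per-user crontabs
    "/var/lib/dpkg/info/",  # maintainer scripts such as *.prerm
)


def is_risky(path):
    """True if `path` is somewhere in /var that should be watched like /etc."""
    return path.startswith(RISKY_VAR_PREFIXES)
```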

Finally, there's the disk and filesystem in general -- having /etc/shadow be world readable or having a misplaced setuid bit can ruin your whole day, and you can put a fair bit of information in extended attributes these days if you're looking to hide it from the suspicious admin. You also want to make sure your boot isn't compromised -- perhaps your bootloader is jumping to code other than the kernel you thought you were pointing at, or your BIOS firmware has some code to set up a timer and a ring-0 trap that'll take control of your kernel a little while after it's booted. On the upside, there's nothing inherently difficult about dealing with that: just reflash your system and your bootloader; all your configuration should be elsewhere in the filesystem, so that should be easy. (Whether it is or not is just a matter of how good your tools are.)
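The permission checks, at least, are easy to script. A minimal Python sketch using only the os and stat modules: one function checks whether a file such as /etc/shadow has the world-read bit set, and another walks a tree listing regular files with the setuid bit, which you'd then compare against a list you trust. It doesn't look at extended attributes, the bootloader or the firmware.

```python
import os
import stat

def is_world_readable(path):
    """True if the 'other' read permission bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)

def find_setuid(root):
    """Walk `root` (os.walk doesn't follow directory symlinks by default)
    and yield regular files that have the setuid bit set."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: don't follow file symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(mode) and mode & stat.S_ISUID:
                yield path
```

On a sane system `is_world_readable("/etc/shadow")` should be False, and `sorted(find_setuid("/usr"))` gives you something to diff against last week's list.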

Ever tried modelling?

Subtitled: David Pennock’s Wall-Street pick up lines

Dr Pennock’s latest post is about fitting stockmarket data. He comes up with a nicely matching randomly generated histogram based on a Laplace distribution over the daily log differences (that is, take the log of the ratio between daily close prices, so if you gained 20% in a day, take the log of 1.2). As well as making pretty pictures, log differences have the nice property that their sums and averages are actually meaningful: if you invested p dollars at an average log difference of x over n days, then your total at the end of the n days is $p e^{nx}$.
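A small worked example of why the sums are meaningful, using made-up prices: the daily log differences add up to the log of the overall price ratio, so $p e^{nx}$ with x the average log difference recovers the final value exactly.

```python
import math

def log_differences(closes):
    """Daily log differences: log(close[i] / close[i-1])."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

closes = [100.0, 120.0, 90.0, 99.0]   # made-up prices, purely illustrative
diffs = log_differences(closes)        # the first entry is log(1.2): a 20% gain
n = len(diffs)
x = sum(diffs) / n                     # average daily log difference
p = 1000.0                             # amount invested
final = p * math.exp(n * x)            # p * e^(n x)
print(round(final, 2))                 # 990.0, i.e. 1000 * 99/100
```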

Dr Pennock doesn’t state the figures he came up with, but by my maths (well, R’s maths, assuming I issued the right incantation, and Yahoo’s data) the average daily difference for the S&P 500 over the 60 years between 1950 and 2010 is 0.0004596 (with a variation of b=0.006505). Annualising that (ie, multiplying it by 365) and converting it to a percentage, $(e^{x}-1) \times 100$, gives an 18.2% annual return over that period. All very reminiscent of the way of thinking about interest via logarithms I posted about some time ago.
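The post doesn't say which R incantation produced b. For a Laplace distribution the maximum-likelihood fit takes the median as the location and the mean absolute deviation about the median as b, so this sketch does that, alongside the annualising step from the paragraph above. days_per_year defaults to the 365 used in the post; daily close data only has entries for trading days (roughly 252 a year), so it's left as a parameter.

```python
import math
import statistics

def fit_laplace(diffs):
    """Maximum-likelihood Laplace fit: location = median, b = mean |d - median|."""
    mu = statistics.median(diffs)
    b = sum(abs(d - mu) for d in diffs) / len(diffs)
    return mu, b

def annualised_return_pct(mean_daily_log_diff, days_per_year=365):
    """Scale a mean daily log difference up to a year, then convert: (e^x - 1) * 100."""
    x = mean_daily_log_diff * days_per_year
    return (math.exp(x) - 1) * 100

print(round(annualised_return_pct(0.0004596), 2))   # 18.26
```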

Of course, you only get that result by averaging some really good years and some really bad years, but there’s no reason you have to apply the model to exactly that sixty-year period: you could, eg, apply it to 365 59-year periods starting anywhere up to a year after the start of the data.

One of the things Dr Pennock notes is:

At the aggregate level the stock market is well behaved: it’s randomness is remarkably predictable. It’s amazing that this social construct -- created by people for people, and itself often personified -- behaves so much like a physical process, more so than any other man-made entity I can think of.

If the stockmarket were a random physical process — like beta decay or similar — the parameters pulled from the statistical fitting would have a physical meaning, and scientists would look at them to see if they were fundamental constants or if (and how) they varied depending on external influences. These parameters probably can’t be given too much meaning because they only relate things to the US dollar, which has all sorts of other influences, but at least we can have a look at how the parameters change over time.

(I sat up late reading Feynman anecdotes last night. I’m trusting that taking a physics-esque approach to questions will be a short-lived consequence.)

Anyway, taking 20-year periods gives us 40 years’ worth of data points (ie, investments beginning between 1950 and 1990, or equivalently ending between 1970 and 2010). Graphing the mean and variation for 400 of those periods gives something like the following:
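The sliding-window fits can be sketched like this, assuming a plain Python list of daily log differences in date order. window is measured in trading days (something like 20 * 252 for a 20-year period, which is my assumption rather than anything stated above), and step, a hypothetical parameter, thins the windows out so you get a few hundred data points rather than one per trading day.

```python
import statistics

def rolling_fits(diffs, window, step=1):
    """For windows of `window` consecutive daily log differences, starting every
    `step` days, return (start index, mean, b), where b is the mean absolute
    deviation about the window's median (the Laplace scale estimate)."""
    results = []
    for start in range(0, len(diffs) - window + 1, step):
        chunk = diffs[start:start + window]
        med = statistics.median(chunk)
        b = sum(abs(d - med) for d in chunk) / window
        results.append((start, statistics.fmean(chunk), b))
    return results
```

Plotting the second and third fields of each tuple against the start date gives charts along the lines of the ones in this post.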

[Chart: S&P 500, 20-year periods]

An interesting thing to note from that is that the mean is both positive and fairly consistent — meaning that if you invested your money in the S&P 500 for 20 years, it doesn’t much matter when you did it, the log of your daily returns would average between 0.00025 and 0.0005 (generally in the 0.0003-0.0004 range) — which by the maths above means an annual return of between 9.5% and 20% (generally 11.5% to 15.7%), which compounded over 20 years is between 521% and 3757% (generally 794% to 1757%). It’d be interesting to see how that changed when adjusted for inflation.
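The conversions in that paragraph can be reproduced directly from the daily figures. This sketch keeps the 365-day annualisation used above; working from the raw daily numbers rather than rounded annual percentages means the totals may differ slightly from the ones quoted.

```python
import math

def annual_return_pct(mean_daily_log_diff, days_per_year=365):
    """Annual percentage return implied by a mean daily log difference."""
    return (math.exp(mean_daily_log_diff * days_per_year) - 1) * 100

def total_gain_pct(mean_daily_log_diff, years, days_per_year=365):
    """Total percentage gain after `years`: (e^(x * days_per_year * years) - 1) * 100."""
    return (math.exp(mean_daily_log_diff * days_per_year * years) - 1) * 100

# The range of average daily log differences read off the 20-year chart.
for x in (0.00025, 0.0003, 0.0004, 0.0005):
    print(f"x={x}: {annual_return_pct(x):.1f}% a year, "
          f"{total_gain_pct(x, 20):.0f}% over 20 years")
```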

The other interesting aspect of that chart is that the variation seems to be gradually increasing — meaning that while the overall result of the 20 year investment is roughly the same (in so far as a 6x return and a 38x return is “the same”), on a day to day basis you can expect to see both larger gains and larger losses in more recent investments.

If you start reducing the investment period things get a bit more lively, though. With a ten year investment, if you have particularly lousy timing you might have no more money than you started with:

[Chart: S&P 500, 10-year periods]

The variation isn’t as stable here either — you can pick some periods of fairly constant variation, some increases and this time even some decreases. There’s also a very sharp increase in the variation fit for investments that span the last couple of years.

Shortening the period still further to a five year investment gives us the possibility of ending up with less cash than we started with:

[Chart: S&P 500, 5-year periods]

Though it’s worth noting both that losses are still pretty rare at that point (pretty much limited to people trying to cash out during the 1970s by the looks), and that even investments that ended anytime in the past five years or so look like they should have made a reasonably healthy profit (financial crisis or not).

Investing for a period of just one or two years is still somewhat reasonable, but you’re starting to have some bigger risks of losing money, and it’s getting hard to predict just how chaotic things are going to seem if you check your net worth every day.

[Chart: S&P 500, 2-year periods]

[Chart: S&P 500, 1-year periods]

Of course, the modelling is breaking down at this point too; without lots of data, guesses at the mean and variation aren’t going to be incredibly meaningful. So if you shorten the period further, to just a month or a quarter, you get pretty useless results:
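One way to put a number on "without lots of data": a Laplace distribution with scale b has standard deviation sqrt(2)·b, so if you treat days as independent (a big assumption), the standard error of an n-day average is sqrt(2)·b/sqrt(n). Plugging in the b and mean quoted earlier, with my own guesses of 21 trading days a month and 252 a year, the standard error for a one-month window comes out at more than four times the mean itself, while for a 20-year window it's well under half of it.

```python
import math

def standard_error_of_mean(b, n):
    """Standard error of the mean of n independent Laplace(mu, b) draws.
    A Laplace distribution with scale b has standard deviation sqrt(2) * b."""
    return math.sqrt(2) * b / math.sqrt(n)

b = 0.006505       # scale quoted above for 1950-2010
mean = 0.0004596   # average daily log difference quoted above

# Trading days per period are assumptions (21 a month, 252 a year).
for label, n in [("month", 21), ("quarter", 63), ("1yr", 252), ("20yr", 20 * 252)]:
    se = standard_error_of_mean(b, n)
    print(f"{label:8s} SE={se:.5f}  SE/mean={se / mean:.1f}")
```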

[Chart: S&P 500, quarterly periods]

[Chart: S&P 500, monthly periods]

In general, though, the Laplace analysis seems to support ideas about index funds and long-term investing being productive and relatively safe ways of dealing with the sharemarket, and possibly provides some interesting ways to analyse different funds.

At least, if I’ve been doing my maths right, anyway…

The simple scripts in life are often the best

Possibly my longest blog post title ever?

Anyway, here’s a link to today’s little bit of scripting. I’ve now written this script three or four times, so I figure that means it’s useful and maybe worth keeping around. I’m calling it dir2tree and all it does is take a (sorted) list of pathnames and convert it into a tree structure. So, eg:

$ dpkg -L samba | grep 'man.*gz'


$ dpkg -L samba | grep 'man.*gz' | sort | dir2tree

So yeah. There you go.
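The original script isn't reproduced here, so what follows is only a guess at the idea in Python: read sorted paths on stdin and print each path component once, indented two spaces per level (the indentation style is my assumption). It compares each path only against the previous one, which is why the input needs to be sorted first.

```python
import sys

def dir2tree(paths):
    """Turn a sorted sequence of slash-separated paths into indented tree lines.
    Each component is emitted once, indented two spaces per level of depth."""
    lines = []
    prev = []
    for path in paths:
        parts = [p for p in path.strip().split("/") if p]
        # How many leading components does this path share with the previous one?
        common = 0
        while common < min(len(prev), len(parts)) and prev[common] == parts[common]:
            common += 1
        # Only emit the components that haven't already been shown.
        for depth in range(common, len(parts)):
            lines.append("  " * depth + parts[depth])
        prev = parts
    return lines

def main():
    for line in dir2tree(p for p in sys.stdin if p.strip()):
        print(line)

if __name__ == "__main__":
    main()
```

Saved as, say, dir2tree.py (a hypothetical name), it would slot into the pipeline as `dpkg -L samba | grep 'man.*gz' | sort | python3 dir2tree.py`.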

(A related useful tool is tree, which generates a prettier tree and does the directory walking itself. I wanted something that I could use with find, and I couldn’t spot anything that already existed.)