Side project #1: Pageant

So as per my post from a week ago, here comes the description of my first little side project. But first a quick reiteration of the aim: I’m trying to get a feel for what it’s like actually doing a tech startup; so not charging for my time, but rather making something once that I can then sell repeatedly without having to do a lot more work. This is intended to make me more experienced rather than wealthy, so “success” means learning something, rather than making much money. As a consequence I’m aiming for business ideas that are in the bad-to-mediocre range, that will nevertheless involve some interesting/useful technology. That way if the business part goes badly, I don’t feel like I’ve screwed up a chance to make a bazillion dollars, or wasted my time doing something pointless.

So the first interesting-tech/mediocre-business idea I have is related to popcon. I like to think a comment I made once helped inspire popcon’s existence back in the day:

I think It’d be interesting to have a debian-survey style package that when installed, informs the `project’ (stats@debian.org?)  who’s using which packages. This would allow us to get a *much* better indication on which packages’s are in fact moderately stable and tested, and which are just gathering dust; and give us a better idea of what’s appropriate for inclusion in stable and/or unreleased.

Sadly that mail disappeared from the web (it was in the archives mentioned at the bottom of one of my initial posts to debian-devel regarding (what became) the testing suite, but disappeared after an upgrade/reinstall of www.debian.org) — but it was nominally in the public domain as of late July 1998, and lo and behold, popularity-contest appeared some three months later, doing everything I’d thought of and more. (For all I know, my comment played absolutely no part in Avery’s implementation, but I still like to think it did :)

Anyway, cool as popcon (and my original idea!) is, there’re interesting ways you could extend it, getting more information, and doing more with it. You could, for instance, survey more information about packages: what version’s installed would give you hints about how many people are pulling from backports, or mixing stable and unstable, or Debian and Ubuntu; or checking conffiles against their original md5sum might give you useful information about how often the default configuration is sufficient. Or you could analyse the information more thoroughly: eg, seeing if there are any unexpected correlations between people who use particular combinations of packages, or doing a netflix-like “I see you use package foo, many other people who use it also use bar, maybe that might be worth investigating.” (I once tried to do that sort of analysis on the popcon data, but all I ended up with was a pretty animated gif, that apparently crashed some people’s browsers… Red dots were systems, blue dots packages, with a package being installed on a system implying attraction, and uninstalled implying repulsion)
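For the “people who use foo also use bar” idea, here’s a minimal sketch of how the counting might go. Everything in it is an assumption for illustration: the function names, the made-up package lists, and the 50% threshold are mine, and it isn’t how popcon (or the animated gif) worked.

```python
from collections import Counter
from itertools import combinations


def co_install_counts(systems):
    """Count how many systems have each package, and each unordered pair."""
    singles = Counter()
    pairs = Counter()
    for packages in systems:
        unique = sorted(set(packages))
        singles.update(unique)
        pairs.update(combinations(unique, 2))
    return singles, pairs


def suggest(package, installed, singles, pairs, min_share=0.5):
    """Suggest packages that at least min_share of `package` users also have."""
    suggestions = []
    for (a, b), together in pairs.items():
        if package not in (a, b):
            continue
        other = b if a == package else a
        share = together / singles[package]
        if other not in installed and share >= min_share:
            suggestions.append((other, share))
    # Highest share first, ties broken alphabetically.
    return sorted(suggestions, key=lambda item: (-item[1], item[0]))


# Made-up reports: each inner list is one system's installed packages.
reports = [
    ["foo", "bar", "baz"],
    ["foo", "bar"],
    ["foo", "qux"],
    ["bar", "qux"],
]
singles, pairs = co_install_counts(reports)
print(suggest("foo", {"foo"}, singles, pairs))
```

With real data you’d presumably want to correct for packages that are installed nearly everywhere, since they’d co-occur with everything.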

You could also gather completely different data, like information about the hardware, or things like the default language or timezone, or potentially even things from logs. That would let you answer questions like “do many people run Debian on HP hardware?” or “which IBM hardware is popular with Linux users?” which might influence future hardware development or purchases; or tell you surprising things about where Linux is actually being used; or give you some feedback on questions like “is the OOM killer a common occurrence?” or “is IPv6 adoption actually going anywhere?”

As well as just gathering data from otherwise passive users, you could also use the data collection as an opportunity to make introductions between users — having established you’re running Debian and have a particular Intel graphics card, you could be automatically given the address of a section of the Debian wiki that’s dedicated to issues with that card; with the idea being that you can see any helpful solutions other users have already come up with to problems you’re having, or leave your own tips for future users. The same principle potentially applies to other sorts of data: if you have an old version of wordpress installed, it might be reasonable to point you at some security alerts that apply to it, or having determined you’re running Debian on some HP server, you might get directed at some updated management software that enables some extra features.

Another interesting improvement I think you could make is to provide ways users can aggregate and anonymise their own data. Even in the age of social networks and ubiquitous transparency, managing privacy of this sort of data is important: it would be spectacularly bad to provide a website that told people exactly which machines were vulnerable to which security exploit, but that’s exactly what a list of which machines have which versions of which packages installed would provide. The popularity-contest software goes to some lengths to avoid that, by identifying data against a randomly generated UUID rather than an internet address, email or username; by not storing detailed information about package versions; and by restricting who has the ability to run any detailed analysis on the data. But you can go further than that by aggregating and filtering the data even before it makes its way to a centralised server: eg, rather than have each individual machine on a network report its statistics to Debian, you could have the information sent to a proxy server that aggregates all the packages into a single report (30 computers, 10 of which have apache, 15 of which have exim, …), thus removing certain correlations (do all the machines running apache also run exim? or do none of them?), and potentially filtering things like the UUID (which might reveal something about the random number generator, particularly given Debian’s recent issue with randomness…), popcon version (which gives an indication what version of Debian is in use, and in some cases how recently it’s been updated) or timestamp (that may give away that the machine has been down). And if you’re running a network that’s intended to be somewhat locked down, it might be more reasonable to have computers reporting to a machine that you control, rather than one just out there in the wild.
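The proxy aggregation step could look something like the sketch below. The report layout (the `uuid`, `popcon_version`, `timestamp` and `packages` keys) and the sample values are hypothetical, invented for the example; this isn’t popcon’s actual submission format.

```python
from collections import Counter

# Fields a proxy might strip before forwarding; these key names are hypothetical.
IDENTIFYING_FIELDS = ("uuid", "popcon_version", "timestamp")


def strip_identifiers(report):
    """Return a copy of one machine's report without the identifying fields."""
    return {k: v for k, v in report.items() if k not in IDENTIFYING_FIELDS}


def aggregate_reports(reports):
    """Collapse per-machine reports into a single count-only summary."""
    package_counts = Counter()
    for report in reports:
        cleaned = strip_identifiers(report)
        # A set, so a package listed twice on one machine is counted once.
        package_counts.update(set(cleaned["packages"]))
    return {"machines": len(reports), "packages": dict(package_counts)}


# Made-up reports from three machines behind the proxy.
reports = [
    {"uuid": "a1", "popcon_version": "1.46", "timestamp": 1238500000,
     "packages": ["apache2", "exim4"]},
    {"uuid": "b2", "popcon_version": "1.46", "timestamp": 1238500100,
     "packages": ["exim4"]},
    {"uuid": "c3", "popcon_version": "1.45", "timestamp": 1238500200,
     "packages": ["apache2"]},
]
summary = aggregate_reports(reports)
print(summary)
```

The summary only says how many machines have each package, so the upstream server can no longer tell whether the apache machines are the same ones running exim.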

So that, in very rough terms, is the spec for this project, which is currently going by the name “pageant” (ie, a popularity contest that takes itself a bit more seriously…) The technical goal is to provide a pageant client that people can run on their systems, which can report potentially arbitrary information to a central server and can receive and present relevant snippets of advice related to that information; a pageant proxy that can intermediate and filter pageant clients to provide a slightly higher level of anonymity/privacy; and a pageant server that can collect the data, provide relevant advice to clients, and analyse the data. I think it’s feasible to do an interesting job of that, that should go a little further than existing programs, and be usable by actual people, though I suspect the server side will have to be a bit beta-ish to be finished within a week or so.

The business goal, obviously, is to turn some of the hypothetical benefits touched on above into actual income, ideally without turning it into a vast NSA-like data hoarding corporate conspiracy. I figure there’s a few reasonable ways to approach that:

  • First, I figure that providing the same information other systems currently do at no charge makes sense: so getting basic stats on how many Debian users have nickle installed, or Ubuntu users have network-manager, or Fedora users have a Synaptics touchpad should be free.
  • Second, I figure providing further analysis for companies and researchers should probably be possible, and cost something: probably more depending on how complicated the analysis is. Possibly there could be an extra fee for the analysis to not be also made available to the public; that could be entertaining.
  • Third, I figure that it probably should be possible for companies to at least provide advice to users of their hardware through the system, and that at least in some cases, that probably should be for a fee. I’m not sure if there’s a line in there somewhere between necessary advice (security updates?), helpful tips (here’s some non-free drivers for that hardware?), or outright advertising (buying our hard drives will give you 200% better performance!) that might mean “advice” should vary between free, paid and blocked. An approach might be to say distros’ advice is free, other people pay.
  • Fourth, I think it would be interesting to allow users to optionally pay a fee to register their hardware. This could have a couple of benefits: it provides a low-maintenance way to discourage ballot stuffing (it’s not at all difficult to hack up popcon to pretend you have thousands of servers running your favourite package to try to bias the statistics, but it’s somewhat harder to come up with even a few dollars thousands of times); and possibly more interestingly, it provides an easy means to link a small payment for “using Linux” with the software that’s being used, so distributing 80%-90% of those fees to the authors of the software that’s actually being used might be an efficient way of helping support free software development.
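As a rough sketch of that last point, here’s one way the fee split might be calculated. The pro rata weighting by install count, the 85% figure (picked from the middle of the 80%-90% range) and the function name are all my assumptions for illustration, not a settled design.

```python
from fractions import Fraction


def split_fees(total_cents, install_counts, author_share=Fraction(85, 100)):
    """Split the authors' share of a fee pool pro rata by install count.

    Returns (payouts in cents per package, cents retained by the service).
    Payouts are rounded down, so any rounding remainder stays retained.
    """
    total_installs = sum(install_counts.values())
    if total_installs == 0:
        return {}, total_cents
    pool = int(total_cents * author_share)
    payouts = {
        name: pool * count // total_installs
        for name, count in install_counts.items()
    }
    retained = total_cents - sum(payouts.values())
    return payouts, retained


# Made-up example: $100.00 of registration fees across three packages.
payouts, retained = split_fees(10_000, {"apache2": 10, "exim4": 15, "nickle": 5})
print(payouts, retained)
```

Working in integer cents avoids handing out fractions of a cent; a real system would also need to decide who counts as “the authors” of a package, which is one of the not entirely little complications.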

Anyway, that’s the project! My notes have a few other things in them worth mentioning — there’s a couple of not entirely little complications in a few of the above ideas, for one — but this is already long enough, and it’s not like I can’t blog again later. Even though there’s a few similar projects around (popcon and smolt in particular) I’m planning on taking a NIH approach and starting from scratch, on the basis that current stuff is mostly pretty basic to reimplement, and getting an architecture I’m comfortable with is pretty important in making it appropriately generic. As always, helpful tips, questions and/or any general encouragement appreciated, either by email or the comment link…

On organising oneself

Matt writes on ideas, organization and overflow, and that he’s ending up with so many awesome ideas that, even when he notes them down for future reference, he’s so busy that he ends up independently reinventing them before finding time to actually make them happen. (Spoiler: he likes the approach of Stephen Covey’s Habit Three and has some ideas for improving it, that he’s probably already noting down for future reference… I kid, I kid)

I have what I guess is a similar collection of interesting little ideas that I think would be worth seeing through, with many of them not really seeing the light of day. If I had any sense, I’d be keeping them all in a wiki, but in reality I tend to just use a collection of TODO files in my home directory. Fortunately, that’s not really the problem for me: I’m mostly able to either track down or reinvent the nuggets of ideas, and I don’t usually forget ideas entirely, however long I’ve put them aside.

What I have been having difficulty with is actually getting them finished — I’ve got a bunch of neat ideas to the point where I think I see how to finish them — ie, right up to the point where the next bit is Hard Work. And apparently I’m no longer at the point where I like doing complicated coding gymnastics just to prove I can. And beyond that, the overall motivation, ie that it’d be kinda cool when finished, just isn’t enough to actually get stuff done.

(And the worst part about trying to motivate yourself to do anything hard, is that when you don’t succeed n times, that becomes an additional reason not to succeed at attempt n+1. But pfft, we’ve addressed that, right?)

(Of course, the worst part for you, dear reader, is having to skim through all this filler I’m writing while trying to avoid getting to the actual point…)

One aspect to my motivations is that I’m interested in business and entrepreneurship these days, and hey, I figure it’d be a lot easier to deal with having lots of Fantastic Ideas if you happened to have a profitable business with a bunch of employees you could tell to implement them for you. (I’m sure there are other complications of some sort in that plan, but hey) And having decided to pass on the exciting sounding BootUpCamp next month in favour of the inaugural (and much more local) kernel.conf.au and I guess the forthcoming Brisbane BarCamp now too, I’m left lacking an awesome opportunity to hone said interests.

Being the sort of person who likes looking for mass-kills when considering stones and birds, an idea came to me. If I want to try starting some businesses just for practice, and have a bunch of kinda cool ideas I want to finish, why not turn them into mediocre business ideas, build them, and see how it goes? Adds an extra reason to actually get the cool ideas implemented (it’s a business necessity!), makes them slightly more challenging (can’t just be useful, but have to be at least potentially profitable), and means that even having the business part fail completely is still an objective win (because there’s a kinda cool creation that’s at least functional, if not finished), as well as being a learning experience (which of course is also a win!). And if the business part happens to be successful, well, there’s all those wins, plus some extra cash!

Great in theory! In practice, the abject terror it inspires is something of a drawback — though also possibly motivating in its own way. Anyway, I figure sometime in the near future I’m going to try running a handful of projects through roughly the following formula:

  1. Pick a neat idea I’ve been putting off, that shouldn’t take too long to actually get up and running (say a week or two)
  2. Work out how it improves the world — who would be better off, and why
  3. Work out a plausible way of charging some-or-all of those people before they get some-or-all of that benefit.
  4. Blog about that, on the basis that (a) it gives me a huge incentive to actually finish in a timely manner, a la WoBloMo, and (b) if I’m really lucky someone comments or emails with an even better business model.
  5. Write the software and whatnot.
  6. Set up as simple a charging mechanism as possible.
  7. Publish both.
  8. Take a breath.
  9. Go back to step 1.

My initial idea was to commit myself to doing one of those each week for about a month starting yesterday, but while that might be plausible for the coding part (or might not be, too), it’s a bit too daunting for the business part. So I’m thinking I’ll just be starting “soon”, and aiming to get each project “launched” within a week or two, and seeing how that goes. There’s a few other daunting bits too, like setting things up to automatically deal with payments, and there’s a few aspects to some of the ideas that require confusing things like setting up websites… But hey, learning experience!

Anyway, that’s the theory. Whether it ends up bearing any resemblance to practice, I guess we’ll see…

Passions

(not a post about Spike’s favourite soap opera)

As an INTP, I generally try to make “rational” decisions — which is to say, ones I can rationalise and explain and logically support. That in turn is something I can rationalise, explain and support: I’m fairly good at logic, and I’ve been taught lots of ways of analysing problems that people have discovered over centuries, which helps make better decisions. But the counter-argument to that is that it’s still easy to make mistakes, and mistaken logic can lead you to all sorts of bad ideas; a lot of deep, rational thought went into eugenics, for example. So for me, I like to keep a handle on that at least partly by trying to keep things aligned with my emotional response. If something doesn’t feel right, that’s a good time to look back through the logic, because there’s probably a mistake. If you can’t find a mistake, and that doesn’t make you feel better, it’s a good time to be cautious in other ways; if you do feel better, that probably means you’ve got a better understanding than you did before and definitely means you’ll be able to act on the ideas more effectively; and if you do find a mistake, well, you get a chance to fix it.

Going the other way, rationalising whatever you already feel, isn’t so effective to my mind; it’s often easier to make an apparently logical argument that’s actually wrong, than to work out how it’s wrong. That can be useful as a defense mechanism to rebuff challenges to what you want to do, but you can generally rationalise anything without much effort, so it’s not actually adding information or improving your decision. And if you infer from the fact you’ve come up with a rationalisation that your decision is the only rational one to make, you can end up with a closed mind to better alternatives, leaving you with a worse decision than if you’d just continued going with your gut feeling. Personally, I tend to take that as a fun game: take a completely subjective and illogical response to something (eg, “orange is the best colour”), then “logically” and “conclusively” prove it’s the only justifiable response. At the very least, it’s a good way to keep a little humility about the value of a good argument.

Logically investigating something you had a gut feeling about (“I like orange. Hmm, is there some way to tell what the best colour is?”) is a different matter entirely, of course. As you think about it, the analysis may lead you to find something different to what you expected (“that’s odd, I think I just proved chartreuse is the best colour”), which might lead you to investigate different definitions for your terms (“perhaps orange is best in some ways, and chartreuse is better in others”) if it doesn’t lead you to change your opinions (“oh. my. god. this chartreuse cape is to die for!”)

Of course, if you’re not naturally comfortable with coming up with logical arguments, or trained enough to do them well, you’ve probably got other, more personally appropriate, ways of coming up with decisions anyway, and maybe none of this applies. But hey, that’s not my problem.

The other advantage of keeping your feelings in accord with your thoughts is that it tends to be more motivating — “passionate” tends to be a decent description both of someone pretty emotional and of someone pretty motivated and active, and there’s no point to making good, rational, decisions without acting on them. In some respects, the more intense the emotion the better; it’s easy to want to quit doing something difficult that’s not immediately rewarding, no matter how logically you’ve convinced yourself that it’s a good idea, but it’s a lot harder to shake off broiling rage, true love, or abject terror, eg.

The trick, then, is if you’ve found something that inspires that sort of emotion, to make sure it’s working in the same direction as the goals you’ve carefully and logically examined. That can be really easy: if your primary emotion is that you care deeply about helping people, seeing someone who’s had a bad day or week or year get a break and maybe break a smile is a good way to keep yourself working in a charity or a hospital, if that happens to be what you want to do. But it’s often not — maybe you’re overwhelmed by anger at the stupid bureaucratic nonsense that’s getting in the way of your hospital helping people, or maybe you’d like to help out in a soup kitchen but you’re terrified of violence in the area.

But, at least sometimes, those can be harnessed too. “Use your anger” isn’t exactly “use the force”, but it still gets some 40,000 hits on google offering useful advice. My feeling (which I wish I’d had earlier than today, but oh well) is there’s probably similar ways to grab most of those emotions, and turn them into allies, rather than just trying to figure out ways of making them go away.

  • frustration, anger, hate: Figure out exactly what it is that’s the deserving object of your ire, and find ways to harm it. There’s lots of entirely reasonable ways to hurt things: a death blow, divide and conquer, death by a thousand cuts, subversion and betrayal; and most entirely reasonable ways of contributing to the good of society can be rephrased into something that’s more acceptable to anger. Annoyed by ignorance on the Internet? Deal it a death blow by creating a site like Wikipedia or snopes; create a debating forum so ignorant people are fighting each other instead of you, and maybe learning something as a result; contribute to Wikipedia or snopes or just help your friends avoid spreading urban myths;  find a group of people who seem particularly ignorant, join them, become well-educated in their customs, befriend them, and then help them get access to all the knowledge they’ve been missing.
  • worry, fear, terror: Be thorough. If you’re worried anyway, you’re going to naturally be thinking of every single way every single thing can go wrong, so take advantage of that and do something about each of those things you think of. Maybe it seems more rational to ignore your worries, and just charge ahead (and maybe it is), but there’s an equally good chance that will just make you worry more and make you less effective, whereas if you actually nervously go around making sure everything is absolutely perfect, you’re at least spending your time contributing to your goals. And every little thing you do fix up is one less thing to worry about, so it’s possible you might end up naturally less worried anyway. Probably not, of course…
  • affection, appreciation, love: Dedicate your work, do it in appreciation, or in honour, and make it something that’s worthy of the object of your admiration, whether that be a person or an idea. It’s always tempting to cut corners or strive for something other than your absolute best, but much less so when what you’re doing is making a devout offering to something or someone you care about.
  • pride, arrogance, narcissism: You think you’re the best, so do something that demonstrates it. Repeat.
  • greed, envy, lust: Be a free-market capitalist — get more of what you want by doing more of what other people want.
  • embarrassment, guilt, shame: Accept, apologise, and then do something worthwhile to atone?
  • indifference, apathy, sloth: No idea. (Is this an emotion, or the absence of emotion? If the latter, find an actual emotion? If you can’t, just try to avoid watching Dexter for lifestyle tips?)

Anyway, there’s my thought for the day. YMMV. The following quote may or may not add support to the thesis presented:

Peace is a lie, there is only passion.
Through passion, I gain strength.
Through strength, I gain power.
Through power, I gain victory.
Through victory, my chains are broken.
The Force shall free me.

The Sith Code

Social meeja

Ben posts about the point of twitter:

As far as I can tell, Twitter is a flakier, crappier knockoff of Facebook, that has even less monetization potential than Facebook.

Meanwhile, identi.ca is an open source knock-off of twitter, FriendFeed is a knockoff of Facebook, LinkedIn is knocking-off a little bit of both of them, everything’s getting comments and tags and automatic recommendations, and everyone and their lolcat is starting up their own social network of some sort or another. It’s all very confusing. Anyway, as a snapshot in time, the social media thingies I’m on at the moment:

  • blogging: almost finished my fifth year of irregular blogging; now with comments enabled, and using email-address based gravatars so people can have an identity while commenting. Still like it.
  • microblogging: have accounts on twitter and identi.ca under the userid “ajtowns”, with the twitter feed syndicated into the sidebar of my blog. Mostly used for trivial techy comments or link sharing (things too banal or already known, or just not fleshed out enough, to be worth blogging). The two accounts are hooked up via ping.fm, so they mostly get the same content. I’ve got both because twitter’s popular in general, and identi.ca’s the “alternative” version for the open source crowd, and the easiest way to follow other people’s comments via those services is if you’re signed up.
  • google reader: in practice I mostly just use this to read other blogs, but I occasionally use the “share” button that shares entries with friends in my (fairly minimal) google contact list. Not really convenient, since it’s a nuisance to share random webpages that you get to by following links, so I’m probably switching to tweeting interesting tech things instead
  • facebook: good for connecting up with non-tech friends and procrastination. Some (but not all) of my techy friends link this with their microblog accounts, so I get the same updates there and here, and any responses/comments they get then get further split. Don’t much like that, hope someone will fix it. Also the only way I ever know anyone’s birthday.
  • youtube: also procrastination
  • linkedin: really good for figuring out who some tech person is, haven’t tried
  • stackoverflow: seems to be better than IRC and forums for getting useful answers to programming questions atm, and answering programming questions is kinda fun too

Having one or more social media accounts seems to be (becoming) a significant part of the way business networking gets done now too — with an @reply/comment and a friend/follow instead of  some idle chat and swapping business cards. Don’t know whether I think that’s a good thing or not, but it seems useful to be aware of anyway.

As far as contributing to the fads go, I like to think I provide useful content to a few of these (blogging, microblogging, linkedin, stackoverflow), I pay my own way for blogging (they’re my thoughts and I’d like to keep them, thanks), and I’ve followed some ads on facebook (though I don’t think I’ve actually purchased anything as a result).

Anyway, that’s my take — whether my contribution is enough to justify my slice of the computing power associated with keeping those sites running, I don’t know, but they’re currently all of some value to me. I suspect my blog is the only one of them I’m willing to keep going at all costs (especially since I know at worst I can just move it to paper).

Funding the NBN

Simon Rumble posts some thoughts on the costings for the national broadband network. He offers some working, and requests corrections, so I thought I might redo the calculations.

First, at its most basic $42B divided by 7.4M households means infrastructure costs of about $5,700 per household. That seems like a lot of cash, when (non-Telstra) ADSL2+ upfront fees are only $130, with no contract. It’s covering businesses too though, and potentially is a qualitative change to the service compared to ADSL coverage (whether due to speed or reliability). For me, that’s way more than the government should be committing to this; if people really think fibre speeds are worth $6000 per household, let an ISP sell it to them privately. I’d completely support having local neighbourhoods able to vote to have fibre (or high speed wireless or similar) installed locally, paid for by an increase in everyone’s rates, eg.

But hey, the point isn’t to see whether it’s worth going ahead, it’s to see what it’ll end up costing when/if it does. Infrastructure development is apparently going to be funded by a bond issue, which means the government sells bits of paper with “$1000 treasury bond” written on them, with a “coupon” rate that’s currently around 5.75%, and a maturity date (up to around ten years away). The government will then pay $28.75 every six months to the bond holder, until the maturity date, at which point they’ll give them the full $1000. Over a ten year period, that’s a total of $1,575 paid out. The initial price is just whatever the government can get, which might be more than $1000 or less, but certainly won’t be $1,575. If I’m reading the RBA’s numbers right, the current price for a $1000 treasury bond with a coupon of 5.75% that matures in 2021 is about $1105.42.

So what’s that mean? To get $42B now at that rate, you have to issue just under 38 million bonds. You then have to pay each person who holds one $57.50 per year, and then pay them $1000 in 2021. That’s $2.185B per year, and $38B in 2021. If you’re balancing the budget, you’re thus hoping to collect $300 per year from every household ($25 per month) over the eleven year period for the coupon costs; and you also have to come up with $38B from somewhere. If you’re going to do a Telstra again and just sell the infrastructure you built, then hopefully it’ll be worth $38B (or more) at that point  and you’re okay. If you’re hoping to have built a public asset, you’ll want to have collected roughly $3B per year more than that, so you’ll be able to pay off your bond holders and keep the infrastructure, which means about $700 per year for every household, or about $60 per month.
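For anyone who wants to redo the working, the arithmetic above can be sketched as follows. The inputs are the figures quoted in this post; the variable names are mine, and spreading the principal evenly over the eleven years (with no interest earned on the money set aside) is a simplifying assumption.

```python
# Inputs quoted in the post.
TARGET_RAISED = 42e9      # dollars to raise now
FACE_VALUE = 1000.0       # dollars repaid per bond at maturity
COUPON_RATE = 0.0575      # annual coupon as a fraction of face value
PRICE_PER_BOND = 1105.42  # what a buyer pays today for one bond
HOUSEHOLDS = 7.4e6
YEARS = 11                # period over which the post spreads the costs

# Number of bonds that must be sold to raise the target amount.
bonds = TARGET_RAISED / PRICE_PER_BOND

# Annual coupon bill and the principal due at maturity.
coupons_per_year = bonds * FACE_VALUE * COUPON_RATE
principal = bonds * FACE_VALUE

# Per-household cost if only the coupons are covered.
coupon_per_household = coupons_per_year / HOUSEHOLDS

# Per-household cost if the principal is also saved up evenly (assumption).
principal_per_year = principal / YEARS
total_per_household = (coupons_per_year + principal_per_year) / HOUSEHOLDS

print(f"bonds issued:        {bonds / 1e6:.2f} million")
print(f"coupons per year:    ${coupons_per_year / 1e9:.3f}B")
print(f"principal at end:    ${principal / 1e9:.1f}B")
print(f"coupons only:        ${coupon_per_household:.0f}/year per household")
print(f"coupons + principal: ${total_per_household:.0f}/year per household")
```

The unrounded results should land close to the rounded figures used in the text; the monthly numbers are just these divided by twelve.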

This is averaged over every household in range of the NBN, including ones that don’t have computers or don’t want access to the internet. That’s likely inaccurate: if it’s paid for by people who use it, and only one in ten of the people who can, do, then that’s $250 to $600 per month instead. On the other hand, if it’s taken out of income tax/GST receipts, it’ll be a different group of people who end up paying for it, and working out how that’ll actually affect tax rates or consumption or other government projects is beyond my ken really.

Those are, presumably, just infrastructure costs, so additionally you presumably need to factor in maintenance fees, tech support, external bandwidth, and other costs too. As Simon pointed out in his post, effectively all your $60/month is getting you is the equivalent of the copper wires we already take for granted and pay Telstra about $20/month for (whether directly or not); running actual data over it is an add-on either way. Going on Internode’s charges, getting 40G of data a month would be an additional $55/month (if you want less than that, I’d presume you don’t care about the NBN anyway). It’s possibly slightly worse than that, in that the $20 that Telstra gets also covers routine network maintenance after the service is initially set up, while it’s not clear the $42B (and hence $25-$60/month) does. And of course, while $42B is a very Hitchhiker’s number, as a government project it might blow its budget and require additional financing later, so multiply it out for that reason as you see fit too.

So by my count, that means retail prices are something like $25-$60 (infrastructure) plus $? (infrastructure maintenance) plus $0-$540 (unused capacity costs borne by early adopters) plus $55 or more (data, service) plus $? (corruption, waste, budget blowouts, profit), which sums to a monthly retail broadband fee between $80 or more and $700 or more.
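Here’s a small sketch of how that range falls out of the known terms. It assumes the infrastructure cost simply scales with one over the take-up rate; the function name is made up for the example, and the unknown “$?” terms are left out entirely.

```python
DATA_AND_SERVICE = 55.0  # $/month for 40G of data, going on Internode's charges


def monthly_infrastructure_cost(cost_per_household, takeup):
    """Infrastructure cost per paying household, given a take-up fraction."""
    if not 0 < takeup <= 1:
        raise ValueError("takeup must be in (0, 1]")
    return cost_per_household / takeup


# Best case: coupons only ($25/month), and every household signs up.
low = monthly_infrastructure_cost(25.0, 1.0) + DATA_AND_SERVICE

# Worst case considered: coupons plus principal ($60/month), one in ten sign up.
high = monthly_infrastructure_cost(60.0, 0.1) + DATA_AND_SERVICE

print(f"known terms only: ${low:.0f} to ${high:.0f} per month")
```

The high end here covers only the quantified terms, so it comes out a bit below the “$700 or more” above, which presumably leaves room for the maintenance and blowout costs that don’t have numbers attached.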

That sounds like it’s in the right ballpark for the scale they’re considering — about $80/month for low-end fibre sounds plausible if you don’t get forced to try to provide it outside of major population areas (ie, the new 90% of the population target, not the old 98%), but only if there’s a high takeup in the area, and it’s implemented with a lot of competence. If there’s low takeup, then you get to multiply the cost accordingly; and if there’s problems in the implementation, you get to add to the cost, and lower the adoption when people avoid it.

Of course, the more likely scenario is the budget doesn’t get balanced, and the final $38B is either rolled over into ongoing debt (“we need to come clean on $38B of bonds? let’s issue more bonds then and pay the old debts with the new debts! ponzi scheme? what’s that?”), taken out of taxpayers’ hides, or we have a round of inflation so that $38B is barely enough for a morning coffee. The other issue is that $42B worth of bonds would almost double the amount of debt Australia currently has on issue, which could easily affect the price we get for our debt, meaning we’d have to issue more bonds than I’ve estimated above or have a higher coupon to get the same cash now, with corresponding increases in the prices needed to keep the budget balanced.

(And of course, if there really is a credit crisis, and people aren’t willing to loan money, issuing $42B of bonds wouldn’t be possible. If there’s just a credit crisis for private borrowers, this probably just makes it worse by giving even less reason for people to loan money to anyone who doesn’t have their own mint and tax agency)

WoBloMo, epilogue

Phew, so March is over in another hour or so, and this post will be my sixteenth of the month, thus kinda completing the woblomo challenge, even if it ended up pretty damn flaky after the 19th… But hey, it was kinda fun, at least from this end.

If you’re looking for actual interesting content, you might like to check out On Topology, by John Moeller who did a better (though still imperfect) job keeping consistent on his blog, with some interesting posts on hyperspheres amongst other things.

Internode quota redux

I guess I posted my previous post too soon, because just after midnight last night the usage that had disappeared magically reappeared. Traffic shaping had already started about six hours earlier, despite internode-quota-check telling me there was a few GB left, and I’d gotten the “over quota” email, so at least it’s all consistent now. Still all a bit odd though.

internode quota, day 2

So snarky comments notwithstanding, I’m sticking with my assessment of the quota behaviour as weird and confusing, but it’s nice to have a pretty picture tracking just how weird and confusing it is. Hopefully tomorrow Internode will be all “haha, it was all just a practical joke, April fools!” and give me another 40GB to play with. Since the first of the month is my usual rollover day, I’m quietly optimistic.

Munin

Okay, so I’m late to the party, but munin is great. I modified Mark Suter’s internode-quota-check to dump output in a form suitable for munin and ended up with some graphs. Today’s is a little confusing:

internode-day

Somehow the blocks of downloads that almost used up the remainder of my quota yesterday just vanished! Awesome. Especially since there apparently wasn’t any downloading happening during those blocks. Apparently there’s rumours that the quota exempt stuff is sometimes added to your usage as it happens, then deducted later, so maybe that’s what happened.

The red “variance” line is how much expected quota usage differs from actual usage — by the above I would’ve expected to have used 2.5GB more than I actually have so far. (The expected usage is just a constant rate that will exhaust the entire quota just in time for it to rollover)
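As a rough sketch of how that variance figure can be computed (this is not the actual munin plugin; the function names, the 30-day period and the 17.5GB usage figure are made up for illustration, and the sign convention is an assumption chosen to match the description above):

```python
def expected_usage_gb(quota_gb, elapsed_days, period_days):
    """Usage so far if the quota were consumed at a constant rate
    that exhausts it exactly at rollover."""
    return quota_gb * elapsed_days / period_days


def variance_gb(quota_gb, elapsed_days, period_days, actual_gb):
    """Expected minus actual usage. Positive means less has been used
    than the constant-rate line predicts (assumed sign convention)."""
    return expected_usage_gb(quota_gb, elapsed_days, period_days) - actual_gb


# Hypothetical figures: 40GB quota, 15 days into a 30-day period, 17.5GB used.
print(variance_gb(40, 15, 30, 17.5))  # 2.5
```

A positive number here corresponds to being under the constant-rate line, ie having quota to spare.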

Exponential Growth

On Wednesday the 25th, I was thinking about project growth. The day before I’d posed a question to the debian-vote list:

Over the next twelve months, what single development/activity/project is going to improve Debian’s value the most? By how much? How will you be involved?

There have only been a couple of replies so far, the first of which was from Russ Allbery, who took issue with the way I’d chosen to focus on growth. Which led to some interesting thoughts; or at least, I find them interesting.

The examples I gave talked about how much that would improve Debian from users’ perspective in percentage terms — this would make Debian three times as good for one out of every ten of our users, eg. That, in turn, implies an exponential rate of growth: if you can consistently improve at a given percentage over a given timeframe, whether that’s 100% a month or 1% a year, you’ll eventually be doing better than anything that can’t remain exponential, given enough time. On the other hand, if you can’t maintain exponential growth, you won’t be able to maintain any given percentage either — linear growth, eg, will give you percentages that drop rapidly: 100%, then 50%, then 33%, then 25%, 20%, 17%, 14%, 12.5%, 11%, 10%, etc.
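The falling percentages for linear growth are easy to reproduce. The sketch below assumes "linear" means adding the same absolute amount each period, starting from one unit; the function names are just for illustration.

```python
def linear_growth_percentages(steps, start=1, increment=1):
    """Percentage improvement per period when a fixed amount is added each time."""
    value = start
    result = []
    for _ in range(steps):
        result.append(100 * increment / value)
        value += increment
    return result


def first_period_exponential_wins(rate, start=1, increment=1):
    """First period at which compounding at `rate` overtakes linear growth."""
    period, exp_value, lin_value = 0, start, start
    while exp_value <= lin_value:
        period += 1
        exp_value *= 1 + rate
        lin_value += increment
    return period


print([round(p, 1) for p in linear_growth_percentages(10)])
# [100.0, 50.0, 33.3, 25.0, 20.0, 16.7, 14.3, 12.5, 11.1, 10.0]

# Even 1% per period eventually beats adding one unit per period.
print(first_period_exponential_wins(0.01))
```

The second function makes the "given enough time" caveat concrete: a small exponential rate does win, but it can take hundreds of periods to do so.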

The interesting thing is how that looks from a user’s perspective. If a user’s already got a working system — running Debian, or Ubuntu, or anything else — what’s the incentive to either upgrade or switch to another distribution? One is that ongoing support might disappear, so you can’t get security support, or Oracle will stop answering your calls because your OS is too old and they just can’t be bothered anymore. That’s a mostly negative approach though: you’re not expecting any benefit, so you just want to minimise the pain of upgrading. New features, behaviour changes, all that stuff is a cost, because you’re just looking to keep doing what you were doing before. The other reason to upgrade is exactly the opposite: that there are new features, or new ways of doing things that are a real benefit to you personally. Perhaps it’s a bunch of small things — a little less power usage, a few less errors here and there, less obnoxious popups, a faster boot, fewer typos around the place, some fixed bugs, some more documentation — that just add up to a more pleasant experience. Perhaps it’s one or two big things — you can replace your last Windows box, eg. Perhaps it’s something that pretty much only matters to you — you changed your name to an unpronounceable symbol, and finally your preferred font has a unicode glyph included that you can use as your real name in your email program.

But it seems to me, that if you’re going to provide a new version of your software and you want users to be happy about upgrading, then there are two things to focus on: making the changeover completely unnoticeable, and making the upgrade give an appreciable benefit to a significant number of users. And if you’re going for the latter path, then that does require a percentage improvement: for x% of your users, their experience using the new software has to be y% more pleasant than using the old software. (And if you decide to go exclusively for the former path, there’s an easy solution: don’t change the software at all — you can’t notice changes that aren’t there)

And note that that’s actually the right answer, too: if you aren’t improving your users’ lives, you shouldn’t be releasing new versions. Upgrades come at a cost, even if they’re nominally free: some things break, you need to learn new things, and you often get forced to upgrade other things too. If you’re only making things 0.0001% better, it’s probably better to delay the upgrade until there’s a bunch of improvements that can be combined to actually make the cost worthwhile.

Getting back to the original question, it seems to me like it’s also fair to focus first on the things that are going to provide the biggest benefit; though obviously there’s plenty of room for debate over whether three small things are better than one big thing, or how to compare a short term benefit with a longer term benefit. But ultimately, I think it’s fair to say that when you multiply all the improvements all your contributors are working on, scaled by the proportion of users they affect and how much they affect those users, you want to end up with something similar to Moore’s law: that is your project becoming twice as “useful” every eighteen months. Maybe it’s a different time frame, maybe it’s three years or five years, but if it’s not in the ballpark, then you’re basically not doing your users any favours.
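To put some numbers on that, here is a small sketch. The doubling-time conversion is plain compound growth. The way improvements are combined (treating each as raising average usefulness by share times gain, then multiplying them together) is only one possible reading of "multiply all the improvements", and the function names are made up.

```python
def annual_growth_for_doubling(months_to_double):
    """Yearly growth rate implied by doubling every `months_to_double` months."""
    return 2 ** (12 / months_to_double) - 1


def combined_improvement(improvements):
    """Multiply together (share_of_users, factor) improvements.

    Assumption: an improvement making things `factor` times as good for a
    `share` of users raises average usefulness by share * (factor - 1).
    """
    total = 1.0
    for share, factor in improvements:
        total *= 1 + share * (factor - 1)
    return total


for months in (18, 36, 60):
    print(months, f"{annual_growth_for_doubling(months):.1%}")
# 18 58.7%
# 36 26.0%
# 60 14.9%

# "Three times as good for one out of every ten users", under this reading:
print(round(combined_improvement([(0.1, 3.0)]), 3))  # 1.2
```

So on this reading, doubling every eighteen months takes roughly the equivalent of two or three such "three times as good for a tenth of users" improvements each year.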

So how do you get there? There’s a few ways of getting that sort of growth. You can keep your userbase constant, and improve your quality. That has the benefit that you’ll likely get more users as well, so do better than you expect, but eventually you’ll hit a wall because your users will be near enough to completely happy that you just can’t make things that much better. Alternatively, you can maintain your quality and expand your userbase; which largely means finding new things to do, rather than doing the current things better. For open source development, that has the benefit that it can increase your contributor base too — if you have one contributor for every twenty users, then if your userbase increases by 100% every eighteen months, so do your contributors. And if each of those contributors is focussed on individually linear growth — improving the system enough that twenty new users will adopt it, for instance — that will rebound back into sustainable exponential growth.

Now, there are limits to that sustainability: eventually you run out of people (or just match the rate at which the population is growing, anyway), or hit the absolute limits of a quality experience. But that just means one of three things: your project should expand into other areas that still usefully contribute to humanity at a significant level, people should stop spending much time on your project and work on other projects that usefully contribute to peoples’ well-being, or the human race has pretty much hit the absolute limits of its potential. Those seem like big calls to me, and at least in the areas that interest me, there still seems like plenty of potential for big improvements.

So at the moment there are three responses to my original mail, from Russ, from Raphael Hertzog, and from (DPL candidate) Stefano Zacchiroli. And they’ve all pretty much avoided the “By how much?” part of the question. If it’s really fair to expect Debian to improve significantly (by 10%, 100%, 300%, whatever) over the course of a year — and as I’ve argued above, I think it is — not making estimates of how much benefit things will actually result in seems both a bad way to establish the project’s priorities, and somewhat disconnected from the usual philosophy of wanting to measure performance and results, that we expect from scientific and engineering endeavours.

Elastic bands

So moving onto Monday the 23rd. Something I’ve been pondering blogging about for ages now is an analogy I came up with for the way Debian is organised. I’m not quite sure of the motivation, but it goes something like this: imagine all of the people in the organisation arranged in a circle. That circle represents everything the organisation does. Around the outside is an elastic band, holding all the people together — and that’s the organisation itself. When someone wants the organisation to do something new, they try to move to the new area, but have to stretch the elastic band to do so, which might be easy or it might be hard, depending on how rigid the elastic is, or how rigid the organisation is. If it doesn’t stretch much, when you try to extend the organisation to do new things, you’ll find instead that you’re pulling the people on the other side of the circle away from the things they’re interested in; and that they’re doing the same to you. The most obvious solution to that is often to pull harder, or to tell the other people to stop getting in your way — but a better solution is often to find some way to make the elastic more stretchy, ie to make the organisation more flexible, or to make the things it does, and the people in it less tightly coupled.

Anyway, it seems a useful way of looking at conflicts to me: are you actually growing the organisation, or are you just spending all your effort getting other people to move in the direction you want, when they don’t actually care?

You could probably extend the analogy to cover forking too — stretching the band so far it snaps, then tying it back together again. I don’t know if that’s very helpful though… Also, there’s probably more than two degrees of freedom, so a hypersphere is probably technically a better model, but, well…

Linux Aus face to face

So, catching up on my WoBloMo posts. On the 21st I was in Melbourne for the Linux Australia council meeting. Saturday was mostly organisational stuff: basically getting an idea what each of the council members thought about the approach we’d take for the rest of the year. Stewart invited Andrew Cowie to give a presentation on corporate governance and related background from LA’s history. It was pretty similar stuff to what Andrew talked about when he was on the committee (from 2003 to 2006), basically that it’s important to have a split between oversight and executive roles (ie, making sure stuff is done properly and actually doing stuff), keeping your head around all the different sorts of strategies and objectives the organisation might pursue, and focussing on being a sustainable organisation, so dealing with people coming and going and prudent management of funds and resources. In some ways it’s a difficult issue for LA because we’re at the point where we have enough resources to want to do lots of cool stuff, without having the resources to handle it in a sustainable manner; we have money, but not enough to hire staff for an extended period; we have income from the conference, but it’s not very diversified and can be quite variable; we have volunteers, but they’re often already overloaded, etc. I don’t think we came up with any answers per se, so much as kept an awareness of the questions; but compared to a few years ago, it seems like LA’s beginning to settle into something approaching a working compromise, which is good.

The other comparison that can be made to a few years ago is more of an absence. When Andrew was on the committee, at least in the year we had in common, he and Pia had a habit of butting heads, more or less on this topic. From where I sat, it was mostly entertaining: a real live dialectic, noble scholars jousting on the field of ideas wearing their philosophies’ favours on their arms — though I gather for both Andrew and Pia it was mostly just frustrating. Particularly since there was something of an impedance mismatch in their roles within the organisation, rather than being a debate between people with equal responsibilities that someone else gets to adjudicate. To attempt to paraphrase Pia’s line of thinking (and without the benefit of it having been reiterated just a few days ago), I’d say her view was that sustainability is very much a secondary issue compared to activity and actually getting things done; that Linux Australia is a volunteer community, so make use of that and get people to do things for free so you don’t have to worry about how much money you have, and reward that contribution with kudos and appreciation, and ultimately if your organisation is doing great things, people will find a way to keep it going one way or another anyway.

I’m personally more biassed towards Andrew’s focus than Pia’s — I’d rather work on the multiplier between effort and results, than increasing efforts with the same multiplier. But not completely so: there’s no point having huge results for very little effort if nobody’s putting in any effort, after all, and there’s no point having an organisation that can sustain itself forever, but that never actually does anyone any good. So for me, it seems valuable to keep the other side of the argument close to mind.

(We discussed a bunch more practical stuff on Sunday, but I couldn’t very well have posted about that the day before, which was the WoBloMo post I’m meant to be making up for here…)

Voted

So I’ve pre-poll voted in preparation for my trip to Melbourne for the LA face-to-face. Not a very exciting range of candidates: Anna Bligh for Labor who’s premier; Mary Carroll for LNP who’s apparently the state secretary of the party; Gary Kane for the Greens who’s running on an anti-developers platform, with light rail to solve local traffic problems; a Socialist Alliance guy who doesn’t actually live in the electorate; David Rendell who wants daylight savings; Derek Rosborough who’s a serial independent candidate and wants a review of water fluoridation; Matt Coates who’s part of the “Reclaim Queensland” bunch of independents; and Merilyn Haines and Greg Martin who I couldn’t find anything out about. Hrm, I suppose one of them might be the “sex party” candidate.

A day late for WoBloMo (unless you go by Hawaii time…) but I figured I shouldn’t blog ’til I’d actually voted…

Bubbles 2: Glubba glubba in the puddles

(Random topic courtesy of Dressy Bessy)

A couple more thoughts on Sunday’s post. In comments, Brendan Scott asks “Why would a trader extrapolate against their estimate v valuation?” But there’s actually a broader question — why would anyone trade at all? The initial scenario provided infinite supply at $500 per item, and gave a randomly chosen demand at $500 per item, which was then naturally fully satisfied. If they thought it was a good idea to buy more at more than $500, they should have bought upfront, and why would they want to sell something they just paid $500 for, for less than $500? Worse, the only difference between the traders is the input they get from the random generator: their strategies are explicitly the same. So no trader can potentially be smarter than another, and profit from their stupidity, they can only profit if the random number generator gives them a lucky number, and someone else an unlucky number. So I don’t think there’s a good answer to “why would they do this?” — participating in this market is fundamentally a mistake, the way most people view the world. For instance, it ends up with a 74% chance that you’ll have less money than you started with, and starts off with perfectly equal wealth amongst all the participants, and ends up with the wealthiest individuals having over $60,000 while the poorest have less than $10.

In a real market, as JD points out you have different people having different information — though of course you’d have to actually have something to have information about. If you decided the assets were batches of ten barrels of oil ready for delivery in twenty-four months time, different people would have better or worse estimates of the value, and depending on other changes in the economy, the underlying value would change too (maybe someone discovers a cheap oil replacement and it drops, maybe there’s a war and it rises).

An interesting theory that I’d never heard of until David Pennock posted about it the other day is the “Kelly Criterion”. Given an estimate of your odds of success, and how much you’ll make, it will tell you the optimal amount to risk to get the biggest advantage from compounding returns. The idea is if you’ve got an almost sure thing, and you only risk a few dollars on it, you won’t make much; but if you continually risk everything, even on sure things, you’ll eventually lose it all, and that’s no good either. The Kelly criterion makes that idea precise, telling you exactly how much of your resources you should commit, assuming you can come up with a reasonable estimate of your odds.

The original paper was from 1956 by John Kelly, apparently in collaboration with Claude Shannon, and as you might therefore expect, came from an information theoretic approach, rather than an economics one. The idea was that you have a secret channel that tells you exactly what to bet on, but unfortunately it’s not a clear channel, and sometimes you mishear what you’re being told and thus bet wrong. Fortunately you’re clever enough to figure out how often this is likely to happen, and thus you can work out when to follow the tips and how much to invest in them, which gives you the aforementioned Kelly criterion. But the signal you get doesn’t have to actually be from the future, it just has to be correct predictably often. If you want to apply that to your intuition, your astrologer, or a groundhog’s shadow, that’s fine, though the lower your odds of success, the lower the amount you’ll be encouraged to invest, and thus the lower your optimal returns will be.
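For a simple bet, the usual textbook form of the criterion is f* = p - (1 - p) / b, where p is your chance of winning and b is the net payout per dollar staked. The sketch below uses that form (which may not be exactly how either Kelly's paper or David Pennock's post presents it) and checks it against the expected log growth it is meant to maximise. The function names are just for illustration.

```python
import math


def kelly_fraction(p_win, net_odds):
    """Fraction of your bankroll to stake: f* = p - (1 - p) / b.

    Returns 0 when the bet has no edge, since the formula goes negative.
    """
    if not 0 <= p_win <= 1 or net_odds <= 0:
        raise ValueError("need 0 <= p_win <= 1 and net_odds > 0")
    return max(0.0, p_win - (1 - p_win) / net_odds)


def expected_log_growth(fraction, p_win, net_odds):
    """Expected log of the bankroll multiplier for one bet at `fraction`."""
    return (p_win * math.log(1 + net_odds * fraction)
            + (1 - p_win) * math.log(1 - fraction))


# A tip that is right 60% of the time, paying even money:
f = kelly_fraction(0.6, 1.0)
print(round(f, 3))  # 0.2

# Staking more or less than that compounds more slowly.
for stake in (0.1, f, 0.3):
    print(round(stake, 2), expected_log_growth(stake, 0.6, 1.0))
```

As the odds of success drop towards a coin flip, the recommended stake drops to zero, which is the "lower your odds, lower your investment" behaviour described above.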

But that’s only relevant if you’ve got an actual meaningful signal and a chance to actually profit, which isn’t what I gave my poor automated traders. If I had, an optimal system would’ve rewarded folks with the best signal, provided useful information for someone, and transferred physical wealth from people who wanted information to people who had it. Redoing the marketplace so that was actually possible would provide a much more interesting endgame.

Nevertheless, it’s interesting to me that even without any fundamentals at all, or any complicated trading techniques, you can pretty easily get behaviour that looks like a bunch of otherwise intelligent people bidding themselves into bubbles and then crashes. If you looked at those graphs as the price of oil or milk or similar, you’d naturally go looking for a cause for the price changes: but at heart, there actually wasn’t one in that case, it was just a combination of coin flips, that happened to be more or less likely, due to traders’ habits, and how much they could happen to afford at the time. You can only reasonably fix that by changing habits, and probably the only way to do that is to bankrupt folks with bad habits so they stop it…

Bubbles: the joy and the laughter

The efficient market hypothesis — that prices in a market immediately adjust to fully reflect new information as soon as it becomes available — is probably the primary foundation of the success of markets at allocating resources: eg, making the prices people are willing to pay at supermarkets influence what farmers produce and how much oil gets drilled to power trucks and trains to transport food around the country or the world. What’s interesting is that despite the copious evidence that it works in practice — food does make it to supermarkets both more reliably and more cheaply using free markets than alternatives — it’s clearly not actually true: prices do get completely out of whack with “reality” and you get a bubble, which ends up forcing pretty painful corrections when they eventually burst. It seems like avoiding bubbles would be a win, but it’s not usually clear when they’re actually happening (after all, big price increases could actually be an accurate indication of reality, right up until a crash proves they weren’t), and in some sense it’s not really even clear why they happen in the first place.

I had a thought the other day that it might be interesting to simulate an asset market to test out an idea that I’d been pondering, to see if bubbles and crashes occurred or not. It didn’t take long to get a pretty serious price crash:

crash

But after the crash, things bounced back okay and stayed pretty stable:

recovery

That time series is twenty times as long as the first, so except for a bit of ups and downs, it looks like it might have actually stabilised at some sort of “fundamental” price. But sadly for our virtual traders, it turned out not:

instability

That time period’s about six times as long as the previous, so about 120 times as long as the initial crash snapshot. And it’s not really a market that looks very pleasant to participate in either — and going on just a little further ends up with what appears to be a permanent price crash:

endgame

The scale here is back to about the same as the initial recovery.

So what’s happening? As a pure simulation, all the behaviour here is implied by the initial setup. And that’s this: fifty simulated traders each start with $11,000, randomly choose an initial valuation for the assets available to trade, and based on that valuation purchase a number of those assets at $500 each. This is then followed by a series of rounds (the graphs above are from round 1 to 160,000), where each trader offers a buy and sell price for one asset. These prices are calculated by taking their current valuation for the assets, and adjusting it by two randomly calculated percentages r^3, where r is between 1% and 51%, so they’re offering to buy for less than their valuation or sell for more than it. Those offers are collected, and if there’s any overlap (someone wants to buy for more than someone else wants to sell), a mid-point price is selected, and the trades are made at that price. If an individual trader is broke, their buy price is 0; if they don’t have any assets, their sell price is set to infinity.

Apart from the initial purchase, fundamentals have absolutely no influence on this market, or, more particularly the traders’ valuations of the worth of the assets being traded. There are no dividends, there’s no intrinsic worth to the assets, no monetary benefits or drawbacks to owning assets, no external influence after the initial setup, and it’s entirely zero-sum: any profits one trader makes come at an equal cost to some other trader.

So the traders try to out-interpret the market, by taking their previous estimated value and the price of the last trade, and extrapolating linearly. So if they thought it was worth $600, and the market thought it was worth $200 more at $800, they figure next round it will have another $200 increase and be worth $1000, and the same on the way down. If there weren’t any trades in the previous round, they’ll randomly change their valuation by between -5% and +5%. So when prices start going up, everyone keeps raising their prices for the next round, until nobody has enough money to buy anymore, and the prices reverse, until it’s down to just above zero, and then it’s back the other way. Medium-term stability only happens by good luck, long-term stability only happens when enough traders are broke that the ability of traders to randomly increase their valuations for even a few rounds in a row is heavily restricted.
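For anyone wanting to play along, here is a minimal sketch of that setup. It is a reconstruction from the description rather than the code behind the graphs, and it fills several gaps with assumptions: how initial valuations and purchases are chosen, capping bids at available cash, keeping valuations above a small positive floor, reading the adjustment as r cubed, and clearing every matched pair at the midpoint of the marginal bid and ask. All names are made up.

```python
import random

START_CASH = 11_000
ISSUE_PRICE = 500
MIN_VALUATION = 0.01  # assumption: valuations never reach zero or below


class Trader:
    def __init__(self, rng):
        self.cash = START_CASH
        # Assumption: initial valuation uniform between 0 and twice the issue price.
        self.valuation = max(MIN_VALUATION, rng.uniform(0, 2 * ISSUE_PRICE))
        # Assumption: spend a share of cash proportional to that valuation.
        budget = int(self.cash * self.valuation / (2 * ISSUE_PRICE))
        self.assets = budget // ISSUE_PRICE
        self.cash -= self.assets * ISSUE_PRICE

    def update(self, last_price, rng):
        if last_price is None:
            # No trades last round: drift by between -5% and +5%.
            self.valuation *= 1 + rng.uniform(-0.05, 0.05)
        else:
            # Extrapolate linearly from old valuation through the last price.
            self.valuation = 2 * last_price - self.valuation
        self.valuation = max(self.valuation, MIN_VALUATION)

    def offers(self, rng):
        bid = self.valuation * (1 - rng.uniform(0.01, 0.51) ** 3)
        ask = self.valuation * (1 + rng.uniform(0.01, 0.51) ** 3)
        bid = min(bid, self.cash)  # assumption: never bid more than you hold
        if self.assets == 0:
            ask = float("inf")
        return bid, ask


def run_round(traders, last_price, rng):
    """Update valuations, collect offers, clear overlaps. Returns price or None."""
    for trader in traders:
        trader.update(last_price, rng)
    quotes = [(trader, *trader.offers(rng)) for trader in traders]
    bids = sorted(quotes, key=lambda q: q[1], reverse=True)
    asks = sorted(quotes, key=lambda q: q[2])
    matched = 0
    while matched < len(traders) and bids[matched][1] > asks[matched][2]:
        matched += 1
    if matched == 0:
        return None
    # Midpoint of the last overlapping bid/ask pair; everyone trades at it.
    price = (bids[matched - 1][1] + asks[matched - 1][2]) / 2
    for i in range(matched):
        buyer, seller = bids[i][0], asks[i][0]
        buyer.cash -= price
        buyer.assets += 1
        seller.cash += price
        seller.assets -= 1
    return price


def simulate(rounds, n_traders=50, seed=0):
    rng = random.Random(seed)
    traders = [Trader(rng) for _ in range(n_traders)]
    prices, last_price = [], None
    for _ in range(rounds):
        last_price = run_round(traders, last_price, rng)
        prices.append(last_price)
    return traders, prices


traders, prices = simulate(2000)
traded = [p for p in prices if p is not None]
print(len(traded), "rounds with trades out of", len(prices))
```

Since the gaps above are guesses, the price series this produces won’t match the graphs, but plotting `prices` for different seeds and longer runs is a quick way to see how sensitive the behaviour is to each assumption.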

Of course, that means every trader is explicitly refuting the efficient market hypothesis. If you do the opposite and assume it, ie have every trader immediately set their internal valuation to the market price, you get one round of trades setting a consensus price from trades based on each trader’s individual, private valuation, but then that’s it. No bubbles, no crashes, no periodic variation.

Which seems like it makes sense: if you’ve got a system that allows positive feedback loops on valuations, you’ll get prices rising or crashing; if the feedback’s limited and randomised, you’ll get both happening randomly. And “extrapolate the current trend” is definitely a positive feedback loop. I’d almost argue that all technical trading is positive-feedback: it assumes there are trends, then acts in ways that partially support the trend, and partially profit from the trend. If there’s an upward price trend, you buy, increasing demand, and supporting the upward price trend; if there’s a stabilising or downward price trend, you sell, dropping demand and supporting a downward price trend.

So I guess that’s my theory: bubbles and their corresponding crashes are a natural feature of markets that have a significant number of technical traders, that is, traders who act based on their predictions of how other traders will act, rather than any inherent value they see in the things they’re trading. That might or mightn’t be an obvious conclusion. It seems kind-of obvious, but given the amount of purely technical advice in sharemarket books — teaching you not how to evaluate companies, so much as how you should expect the market to behave based on the same information everyone else has about how the market has been behaving — maybe it’s not.

If you could go a step further and say that the only profit technical investors can expect is from the losses of other technical investors, that could make things really interesting. I suspect something like it is close to true, if you can assert that your fundamental investors never individually make a loss. You can do that by assuming they have a personal, intrinsic valuation of the asset at $x, and will never buy one for more than that, or sell one for less than that, but that valuation is naturally going to change over time due to external factors (you become richer, so you’re willing to pay more for things; your priorities change and you value the asset more or less; the asset changes to become more or less useful/pleasing), and I don’t really see how to factor that in.

Eee box

So I fell for Zazz’s “Thingy of the Day” last week and ordered an Asus Eee Box. It arrived today, and is pretty respectable — the “screws onto the back of your LCD” form-factor is pretty sweet, and having SD cards as the only removable media seems pretty decent too. Built in wireless, decent number of ports, and it all looks good. It came with Xandros installed (as opposed to XP) which I’ve now replaced with Debian, though haven’t finished setting it up yet. I’m planning on trying to make it replace azure (my co-located server) entirely, though I’m not sure it’ll actually have quite the capacity to handle that. And there’s a few things I’d possibly like to make it do in addition too, like act as a MythTV server for the PS3, provide IPv6 addresses for my home network, work with my laptop so I can have a triple-head display over ethernet since my DVI/VGA ports can’t go that far, etc.