## Blosxom to WordPress

So, a quick post on the switch to WordPress.

The main reason I decided to switch was the ability to have comments. You can do that in blosxom with various plugins, but then you have to worry about spam yourself, whereas with WordPress comments (and comment moderation) are standard and plugins to deal with spam are pretty trivial. Basically, I like having somewhere to chat with like-minded people about various tech things that interest me, and the other places I’ve done that (notably Debian mailing lists and across blogs) just don’t come close to being satisfying anymore. Lists don’t really work because, well, Debian lists have been sucking for ages now and aren’t likely to improve. Cross-blog discussions don’t tend to be very interesting because these days the ones I’m interested in are mostly aggregated onto planets, and end up looking just like mailing lists, with the same old topics discussed by the same old people with the same old conflicts. Boring.

I thought I had a second reason, but I seem to have forgotten what it was now. The two possibilities I can think of are getting my blog compliant with web standards (doctypes, properly closing tags and the like), and doing private posts. In any event, the third reason was that I’d been using Blosxom for ages now, and it was just time for a change. It’s also nice that a bunch of other folks are actively using (not to mention developing) WordPress, and a few others have switched to it lately.

Switching to WordPress turned out not to be too hard. Unfortunately there isn’t an importer for blosxom like there seems to be for a bunch of other blogging software and sites, but that makes reasonable sense given the whole point of blosxom is that it’s easy to hack on to make it work just the way you want. I’d customised the way I wrote links, for example. So I googled around, and came across DeWitt Clinton’s blog post about his conversion. His approach was to run a specially hacked-up script over his blosxom directory to generate, essentially, an RSS feed containing all his posts ever, and then use WordPress’s RSS importer.
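To give a feel for the directory-to-RSS idea, here is a rough Python sketch. It is not DeWitt Clinton's script or my version of it; it just assumes the usual blosxom layout (first line of each `.txt` file is the title, the rest is the body, the file's mtime is the post date, and the directory path is the category). The function name `blosxom_to_rss` and the channel title, link and description are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET
from email.utils import formatdate


def blosxom_to_rss(datadir, extension=".txt"):
    """Walk a blosxom data directory and return one RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel metadata (hypothetical values).
    ET.SubElement(channel, "title").text = "Blosxom export"
    ET.SubElement(channel, "link").text = "http://example.org/"
    ET.SubElement(channel, "description").text = "All posts, for importing"

    for root, _dirs, files in os.walk(datadir):
        for name in sorted(files):
            if not name.endswith(extension):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as handle:
                title = handle.readline().strip()  # first line is the title
                body = handle.read()               # everything else is the post

            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = title
            # blosxom dates posts by file modification time.
            ET.SubElement(item, "pubDate").text = formatdate(os.path.getmtime(path))
            # The directory path relative to the data directory is the category.
            category = os.path.relpath(root, datadir)
            if category != ".":
                ET.SubElement(item, "category").text = category.replace(os.sep, "/")
            # ElementTree escapes the markup, so the body travels as text.
            ET.SubElement(item, "description").text = body

    return ET.tostring(rss, encoding="unicode")
```

Something like `print(blosxom_to_rss("/path/to/blosxom/data"))` would then produce a single feed to hand to the importer. Plugin-specific notation in the post bodies is not handled at all here.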

That didn’t work for me for two reasons. One was my special plugins, particularly the funny link notation I made up and the update plugin I was using; no big deal, since I knew I was going to have to do something about that. The other was that my posts weren’t remotely HTML4 compliant, so when the script tried to validate them it would complain, and even if I disabled that feature, WordPress would complain when I tried importing the results.

Since I still don’t have any intention of writing correct markup by hand, finding an automatic way of fixing that was the biggest trick. HTML Tidy came to my rescue, though. In particular, invoking:

    tidy -q -asxhtml --show-errors 0 --show-body-only auto -wrap 0

seems to convert the body of a blosxom post into a legitimate HTML chunk (as opposed to a full HTML page) with minimal dramas.
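If you wanted to drive that from a script rather than the shell, a minimal Python sketch might look like the following. It assumes a `tidy` executable is on the path and reads the post body on standard input; the helper names `tidy_command` and `tidy_fragment` are invented here, and this is not how blosxom2wp itself calls tidy.

```python
import subprocess

# The same options as the command line above.
TIDY_OPTIONS = [
    "-q", "-asxhtml",
    "--show-errors", "0",
    "--show-body-only", "auto",
    "-wrap", "0",
]


def tidy_command(executable="tidy"):
    """Build the argument list for running tidy on a post body."""
    return [executable] + TIDY_OPTIONS


def tidy_fragment(body, executable="tidy"):
    """Pipe one post body through tidy and return the cleaned-up fragment."""
    result = subprocess.run(
        tidy_command(executable),
        input=body,
        capture_output=True,
        text=True,
    )
    # tidy exits with 1 when it only had warnings, which is the normal case
    # for sloppy markup; 2 means it hit errors.
    if result.returncode > 1:
        raise RuntimeError("tidy failed: " + result.stderr.strip())
    return result.stdout
```

Treating exit status 1 as success matters, since tidy reports it whenever it has had to fix anything.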

I did have a few other problems. The original script was invoking $parser->Parse( "<xml>$blog</xml>"), which unfortunately validates the post as XML rather than HTML, which apparently means it doesn’t know about things like “&ge;” meaning “≥”. So I eventually figured out that adding a DOCTYPE declaration, and changing the enclosing tags to <html> instead of <xml>, would get me what I want. And it pretty much did.
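The entity problem is easy to reproduce outside Perl. The Python sketch below shows a plain XML parser rejecting “&ge;”, and then one possible workaround: rewriting named HTML entities as numeric character references before parsing. That workaround is a different technique from the DOCTYPE fix I actually used, and the helper names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML defines itself; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def html_entities_to_numeric(text):
    """Rewrite named HTML entities (e.g. &ge;) as numeric references (&#8805;)."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own and unknown names alone
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


def parses_as_xml(fragment):
    """Wrap a fragment in <xml> tags, as the original script did, and try to parse it."""
    try:
        ET.fromstring("<xml>%s</xml>" % fragment)
        return True
    except ET.ParseError:
        return False
```

With these, `parses_as_xml("a &ge; b")` fails on the undefined entity, while `parses_as_xml(html_entities_to_numeric("a &ge; b"))` goes through, because numeric references need no DTD.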

That just left dealing with my plugins, which really just meant rewriting them as simply as possible into the script. A cleverer solution would probably invoke the plugins from the script with all the right parameters so it would work for any plugins people might be using, but I just wasn’t fussed enough. A few tests, and the final script seemed to work well enough, with the output importing properly into WordPress (right content, right timestamps, roughly right categories). The testing got a bit tedious: I could only seem to delete 20 posts at a time, so clearing out all 300-odd posts before each reimport took a while, though not long enough to care about. I tried not deleting them to see if WordPress would notice the duplicates and ignore or overwrite them, but it just let them appear twice, giving me 600-odd posts to delete before the next trial. Doh.

I also got rid of the whitespace compression the original script was doing, and I think that’s everything. For anyone interested, here’s my version of blosxom2wp.

As far as WordPress itself is concerned, so far I’m pretty happy with it. It did force me to install MySQL instead of using PostgreSQL like I’m used to; apparently there’s a fork of WordPress that does PostgreSQL, but fundamentally its database access just isn’t very general. The Debian packages (whether you’re looking at stable, testing, unstable or experimental) were out of date, as were the Ubuntu ones, I think, and that was even before the new release (which hasn’t been uploaded to Debian either, at least as I write this). Installing it by hand wasn’t a big deal, though the packaged version does include some nice hackery to allow multiple blogs to use a single WordPress installation. I might look into that if either David or Clinton decide they want to switch to WordPress too, or if I decide I want another (separate) blog for some reason.

What else? I didn’t like the default themes much, and switched to a pretty random and simple one pretty quickly. I guess I might try doing a customish theme some day, but probably only when I’m sure I won’t be forced into worrying about writing valid HTML or CSS by hand. The only interesting plugin I’m using is Ozh’s Better Feed (recommended via a blogger on Planet Ubuntu), pretty much entirely so there’s a link to comments included in my RSS feed.

The biggest drawback is probably the feed spamming, whether to planets or to places like Google Reader or FeedBurner. It’s a real shame the way that happens, but what can you do? As far as I can see, not much. I lost my full category hierarchy during import, I think; I seem to only have one level instead of many now. I don’t think I care though. On the other hand, I can’t say I’m too fond of the way you insert links with GUI editors: highlight link text, click link button, select url from some other window, delete “http://” string, paste the url, click ok. That seems a bit tedious to me. I presume I’m missing something, but I also can’t seem to put stuff in <code> tags without switching from Visual mode to HTML view. Weird. I guess I’m also a little worried about its security and such, but hopefully keeping up with upstream will work reasonably well.

On the upside, automatic support for drafts and scheduled publishing is nice, automatic revision control isn’t bad, not having to worry about accidentally changing the timestamp of old posts when fixing a typo is wonderful, and having the blog engine manage pictures and scripts you’d like to include is pretty nice too. Gravatars (and identicons) are kinda cool too.

All up, I’m pretty happy with it so far.