Problems worthy of attack prove their worth by hitting back. —Piet Hein

Tuesday 12 February 2008

Forgotten EC2 instances

I noticed today that I had an EC2 development cluster still running that I had forgotten to shut down a few days ago. It was only a couple of instances, but even so, it was annoying. Steve Loughran had a good idea for preventing this: have the cluster shut itself down if it detects that you have gone offline, using your chat presence as the signal. You'd probably want to build a bit of a delay into it to avoid losing work due to some network turbulence, but it would work nicely for short-lived clusters which are brought up simply to do a bit of number crunching. Alternatively, and perhaps more lo-tech, the cluster could just email you every few hours to say "I'm still here!".
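To make the delay idea concrete, here is a rough sketch of the presence-based shutdown logic in Python. It is only an illustration under assumptions: the class name PresenceWatchdog, the is_owner_online and shutdown callbacks, and the 15 minute default grace period are all made up for the example. How you would actually query chat presence or terminate the instances is left to those callbacks.

```python
import time


class PresenceWatchdog:
    """Triggers `shutdown` once the owner has been offline for `grace_seconds`.

    Hypothetical sketch: `is_owner_online` and `shutdown` are callbacks the
    caller supplies (e.g. a chat presence check and a cluster teardown).
    """

    def __init__(self, is_owner_online, shutdown,
                 grace_seconds=900.0, clock=time.monotonic):
        self.is_owner_online = is_owner_online
        self.shutdown = shutdown
        self.grace_seconds = grace_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.offline_since = None   # when the current offline spell began
        self.fired = False          # ensures shutdown is called only once

    def poll(self):
        """Check presence once; return True if shutdown has been triggered."""
        if self.fired:
            return True
        if self.is_owner_online():
            # Owner is back: forget any offline spell (network turbulence).
            self.offline_since = None
            return False
        now = self.clock()
        if self.offline_since is None:
            self.offline_since = now
        if now - self.offline_since >= self.grace_seconds:
            self.fired = True
            self.shutdown()
        return self.fired
```

You would call poll() from a loop or a cron job on the cluster's master node every minute or so. Because the offline timer resets whenever the owner reappears, a brief network blip does not take the cluster down.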

I wonder how many forgotten instances are running at Amazon at any one time. Is there a mass calling of ec2-terminate-instances at the end of every month when the owners see their bills?

Friday 1 February 2008

Apache Incubator Proposal for Thrift

There's a proposal for Thrift to go into the Apache Incubator. This seems to me to be a good move. There's increasing interest in Thrift: just look at the number of language bindings that have been contributed, which at the last count were Cocoa/Objective-C, C++, C#, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, and Squeak. It's even fairly painless to compile on Mac OS X now, although it'd be nice to have a Java version of the compiler.

Also, there are some nice synergies with other Apache projects: it is already being used in HBase, and there are moves to make it easier to use in Hadoop Core as a serialization format (so MapReduce jobs can consume and produce Thrift-formatted data).

If the proposal is accepted, it will be interesting to see what happens to Hadoop's own language-neutral record serialization package, Record I/O. The momentum is certainly with Thrift, and discussions on the mailing list suggest that code which uses Record I/O will eventually be ported to use Thrift.