Problems worthy of attack prove their worth by hitting back. —Piet Hein

Friday 28 September 2007

A Java Servlet for Thrift

I've been playing around with Thrift (the new version that came out a few days ago), mainly to see how it might be used as a serialization mechanism in Hadoop, but also because the RPC looks useful. It doesn't come with a Java HTTP server transport, so I whipped up a servlet to make it easy. Exposing a service is then as simple as subclassing the base servlet to supply a Thrift processor to service the request. For the calculator example from the tutorial:
package server;

import tutorial.Calculator;

public class CalculatorTServlet extends TServlet {
    public CalculatorTServlet() {
        super(new Calculator.Processor(new CalculatorHandler()));
    }
}


Invoking the service is easy - you just use the THttpClient transport. Using Thrift over HTTP allows you to use all your existing high-availability and failover infrastructure, which can be attractive. (But also see this Thrift mailing list thread which gives some more detail on how Facebook tackles high-availability and failover.)

Saturday 22 September 2007

Lucene Layer Cake

With a proposal for Pig (a query language interface for very large datasets) to go into the Apache Incubator, it looks like the Lucene family is growing once more. With so many members it gets harder to track the inter-project dependencies, so I created a quick family portrait.

Update: not long after writing this I noticed this patch to run Lucene on HBase (a part of Hadoop), so now my diagram's wrong. It was a bit of an oversimplification anyway; it's only meant to give a rough idea of the building blocks of Lucene.

Tuesday 18 September 2007

Debugging with XStream

A little while back a colleague and I had a problem with some data in a large object graph in our system. There were tens of thousands of objects in the graph, so we didn't fancy pointing a graphical debugger at it to find where the problem lay. Most of the objects didn't define a toString method, so we used XStream to get an XML representation of the object graph and dump it to the console, where we picked through it with command line tools and an XML editor. (We found the problem!) The idiom we used was:

new XStream().toXML(objectGraph, System.out);

Thursday 13 September 2007

Ohloh's Visualizations

Ohloh seems to be positioning itself as the social networking site of the open source project world. It's also worth having a look at for its neat visualizations. I particularly like the sparklines showing commit activity for each committer on a project, the codebase history showing the number of lines of code over time, and the ability to compare projects graphically (a bit like Google Trends). These tools are great for getting a quick feel for a project that you can't really find from its website or source code repository.

Friday 7 September 2007

RESTful Web Services

RESTful Web Services by Leonard Richardson and Sam Ruby is a great book. It's about how to make the web more programmable, and tells you, through a great mix of theory and practical advice, how you can achieve this for the part of the web you're building.

It'll make you think about URLs. (Do you put state in query parameters or the path? See page 121.) It'll make you think about HTTP. (For example, the response code 201 Created is used to show that the server created a new resource in response to a client POST request.) It'll make you think about the web.
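To make the 201 Created point concrete, here is a minimal sketch of my own (not from the book) using only the JDK's built-in HTTP server and client, so it assumes Java 11 or later. The /orders resource, the /orders/1 location and the CreatedExample class name are all made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreatedExample {

    /** Starts a throwaway server, POSTs to it once, and returns "status location". */
    static String createOrder() throws Exception {
        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);

        // Hypothetical resource collection: a POST "creates" order 1.
        server.createContext("/orders", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the request body
            if ("POST".equals(exchange.getRequestMethod())) {
                // 201 Created, with Location pointing at the new resource.
                exchange.getResponseHeaders().add("Location", "/orders/1");
                exchange.sendResponseHeaders(201, -1); // -1 means no response body
            } else {
                exchange.sendResponseHeaders(405, -1); // Method Not Allowed
            }
            exchange.close();
        });
        server.start();

        try {
            int port = server.getAddress().getPort();
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://127.0.0.1:" + port + "/orders"))
                    .POST(HttpRequest.BodyPublishers.ofString("item=widget"))
                    .build();
            HttpResponse<Void> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding());
            String location = response.headers().firstValue("Location").orElse("");
            return response.statusCode() + " " + location;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createOrder());
    }
}
```

The client learns where the new resource lives from the Location header rather than by guessing at the URL scheme, which is the sort of detail the book keeps nudging you towards.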

I found it gave me a (conceptual) framework to design a RESTful API for a product we're building at Kizoom. Sometimes it can be a bit of a struggle to see how to make some operations RESTful (only four verbs, remember!), but the design that emerged was actually very simple once I started thinking about things the right way. Seems like James Strachan had a similar experience; after a rocky start he persevered and now has what looks like a great design for a Pure RESTful API to ActiveMQ via AtomPub.