Tom White: "Hadoop: The Definitive Guide" Coming Soon

Friday, 1 May 2009

After a busy couple of months I've finished the writing for "Hadoop: The Definitive Guide". It's now going through the production process at O'Reilly.

You can pre-order it on Amazon and O'Reilly. You can also get the Rough Cuts version from O'Reilly to read today, although it hasn't yet been refreshed with my latest draft (I hope that will happen in the next few days).

Here's the final chapter listing. Readers of earlier drafts will notice that the number of chapters has grown: this is because the elephantine MapReduce chapter has been split into three (chapters 6, 7, and 8) to make things more digestible.

1. Meet Hadoop
2. MapReduce
3. The Hadoop Distributed Filesystem
4. Hadoop I/O
5. Developing a MapReduce Application
6. How MapReduce Works
7. MapReduce Types and Formats
8. MapReduce Features
9. Setting Up a Hadoop Cluster
10. Administering Hadoop
11. Pig
12. HBase
13. ZooKeeper
14. Case Studies

The writing's done but I still have to package up the example code. I'll be doing this soon, and it will appear on the book's website.

9 comments:

David Dunwoody said...: Congratulations, Tom!

Having worked with a few other authors, I have some appreciation for how much effort it can be.

I look forward to buying my copy.; 6 May 2009 at 22:11
Steve Loughran said...: Congratulations on finishing it! I shall put my UK order in and hope to be in there with support calls shortly afterwards!; 6 May 2009 at 23:29
Paul Carey said...: Congrats Tom, I'm looking forward to reading it.; 7 May 2009 at 09:19
Tom White said...: Thanks everyone!

@Steve - that's helping out with the support calls, right :); 7 May 2009 at 09:50
Otis Gospodnetic said...: Tom, was it really only a couple of months? It took us over 12 months to write Lucene in Action! :); 7 May 2009 at 22:54
Amr said...: Congrats Tom!; 9 May 2009 at 19:32
Arun C Murthy said...: Congratulations Tom!; 12 May 2009 at 07:55
Harold Valdivia Garcia said...: Hi...
I bought your book, but I didn't find any info about how to set up subclusters.

In my research I want to specialize some regions of my Hadoop cluster: for example, one region only for sorting, another only for joins, another only for group-by, and so on.

I don't know if you know what I mean.

Where can I post this problem?

Thanks for everything; 5 August 2009 at 18:17
Tom White said...: Hi Harold,

Most people run jobs across the whole cluster. That way you get to multiplex the work, which leads to better overall efficiency. There are a number of work schedulers available now (e.g. the fair share scheduler) that can be used to segregate jobs and give them guaranteed resource allocations. See http://hadoop.apache.org/common/docs/r0.20.0/fair_scheduler.html for more details.

As for where to ask questions like this, see the mailing lists at http://hadoop.apache.org/common/mailing_lists.html.

Cheers,
Tom; 6 August 2009 at 16:28