Problems worthy of attack prove their worth by hitting back. —Piet Hein

Friday, 1 May 2009

"Hadoop: The Definitive Guide" Coming Soon

After a busy couple of months I've finished the writing for "Hadoop: The Definitive Guide". It's now going through the production process at O'Reilly.

You can pre-order it from Amazon or O'Reilly. You can also get the Rough Cuts version from O'Reilly to read today, although it hasn't yet been refreshed with my latest draft (I hope that will happen in the next few days).

Here's the final chapter listing. Readers of earlier drafts will notice that the number of chapters has grown: this is because the elephantine MapReduce chapter has been split into three (chapters 6, 7, and 8) to make things more digestible.

  1. Meet Hadoop
  2. MapReduce
  3. The Hadoop Distributed Filesystem
  4. Hadoop I/O
  5. Developing a MapReduce Application
  6. How MapReduce Works
  7. MapReduce Types and Formats
  8. MapReduce Features
  9. Setting Up a Hadoop Cluster
  10. Administering Hadoop
  11. Pig
  12. HBase
  13. ZooKeeper
  14. Case Studies

The writing's done, but I still have to package up the example code. I'll be doing this soon, and it will appear on the book's website.

9 comments:

David Dunwoody said...

Congratulations, Tom!

Having worked with a few other authors, I have some appreciation for how much effort it can be.

I look forward to buying my copy.

Steve Loughran said...

Congratulations on finishing it! I shall put my UK order in and hope to be in there with support calls shortly afterwards!

Paul Carey said...

Congrats Tom, I'm looking forward to reading it.

Tom White said...

Thanks everyone!

@Steve - that's helping out with the support calls, right? :)

Otis Gospodnetic said...

Tom, was it really only a couple of months? It took us over 12 months to write Lucene in Action! :)

Amr said...

Congrats Tom!

Arun C Murthy said...

Congratulations Tom!

Harold Valdivia Garcia said...

Hi...
I bought your book, but I didn't find any info about how to set up subclusters.

In my research I want to specialize some regions of my Hadoop cluster: for example, one region only for sorting, another region only for joins, another region only for group by, and so on.

I don't know if you know what I mean.

Where can I post this problem?

Thanks for everything

Tom White said...

Hi Harold,

Most people run jobs across the whole cluster. That way you get to multiplex the work, which leads to better overall efficiency. There are a number of job schedulers available now (e.g. the fair share scheduler) which can be used to segregate jobs and give them guaranteed resource allocations. See http://hadoop.apache.org/common/docs/r0.20.0/fair_scheduler.html for more details.

As for where to ask questions like this, see the mailing lists at http://hadoop.apache.org/common/mailing_lists.html.

Cheers,
Tom