Cassandra, DynamoDB, HBase, Hadoop and Big Data in general
In: Hadoop | 5 Jul 2011
Hadoop Summit is always interesting for Hadoopers. You get to learn the latest and greatest in the Hadoop world and meet the people behind the projects in the Hadoop ecosystem. In this post, I have tried to share my takeaways.
Currently there are many distributions of Hadoop floating around. Besides the main Apache Hadoop distribution, there are distributions from Cloudera, Yahoo and IBM, and even Amazon uses its own distribution for its Elastic MapReduce service. All these distributions were born because the main Apache distribution is not good enough. Yahoo is now launching a separate company, Hortonworks, to fix this problem. It is essentially going to be a Cloudera competitor, but before they start providing support to clients, they are going to fix the main Apache Hadoop distribution. This is certainly good news for the Hadoop community. Not only will Cloudera have more competition (and hence more support options for Hadoop), but there will also be a company focusing on making the main Apache distribution robust.
The first day began with Hortonworks sharing their plans for Hadoop in the upcoming year. They clarified that they don't have any plans to launch a paid, enterprise version yet. You can find their presentations here. Facebook's Kartik Ranganathan presented how Facebook uses HBase for its messaging functionality. He explained the architecture and went into it in great detail. HBase is certainly gaining a lot of momentum since Facebook chose it over other NoSQL technologies.

Amongst others, I liked Matei Zaharia's presentation on the Spark project. Spark aims to provide a framework for faster big data computing. It uses the Mesos cluster manager and loads data into memory to run computations. It is especially suited for use cases where you run computations on the same set of data again and again. Once the data is loaded into (distributed) memory, the computations can be done in seconds!

Besides that, there were some other noteworthy presentations too. MapR is not an open source project, but the product looks very solid. In my opinion they will emerge as an enterprise-level alternative to Hadoop for running MapReduce jobs. Their framework eliminates the single point of failure in HDFS, the NameNode, by distributing the NameNode's metadata work across all the nodes. The framework is much faster than HDFS and provides true snapshotting capabilities, besides a nicer, web-based interface. MapR is also fully compatible with MapReduce jobs written for Hadoop.

Yahoo's Alan Gates introduced HCatalog to everybody. HCatalog is a new Apache Incubator project which aims to bring interoperability across Pig, MapReduce, Streaming and Hive. It's certainly going to solve problems for companies using multiple technologies from the Hadoop world. The effort of importing data from HDFS into Hive or Pig can be reduced or avoided using HCatalog.
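The in-memory caching idea behind Spark mentioned above can be illustrated with a toy sketch. This is plain Python, not Spark's actual API; the dataset, the simulated disk delay and the function names are all made up for illustration. The point is simply that if you pay the load cost once and keep the data in memory, repeated computations over the same data become much cheaper:

```python
import time

def load_from_disk():
    """Simulate an expensive, disk-bound load of a dataset
    (a stand-in for reading from HDFS)."""
    time.sleep(0.1)  # pretend I/O latency
    return list(range(1_000_000))

def total_without_cache():
    # Every call pays the full load cost again.
    return sum(load_from_disk())

_cache = None  # the "distributed memory" of this toy example
def total_with_cache():
    # Load once, then reuse the in-memory copy on every call,
    # which is the gist of running repeated computations on
    # a dataset held in (distributed) memory.
    global _cache
    if _cache is None:
        _cache = load_from_disk()
    return sum(_cache)

start = time.time()
for _ in range(5):
    total_without_cache()
uncached = time.time() - start

start = time.time()
for _ in range(5):
    total_with_cache()
cached = time.time() - start

print(f"uncached: {uncached:.2f}s, cached: {cached:.2f}s")
```

Running five identical computations, the cached version pays the simulated I/O cost only once, so its total time is far lower. In real Spark the cached dataset is partitioned across the cluster's memory rather than held in one process, but the trade-off is the same.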
To me, the second day was better than the first. Officially the conference was only one day; various projects in the Hadoop ecosystem were asked to arrange their own series of presentations on the next day. I attended the HBase track, as we rely on HBase heavily in our company. Three of the six presentations that day were very noteworthy. Rocketfuel, an ad network from Redwood Shores, presented their use of HBase. The interesting part was learning how they tweaked their HBase cluster to obtain sub-20 ms real-time performance. yfrog's Jack Levin also presented their lessons learned. You can find his presentation here. He also shared their Ganglia graphs with everybody. Lars George presented his experience of writing the O'Reilly book HBase: The Definitive Guide. The book will be out soon, but meanwhile it's available here for review. Overall, it was an excellent day for an HBaser like me. I must congratulate Michael Stack (HBase project leader) for organizing a very useful track.
Overall, the conference was a great experience, and anyone who missed it this year should seriously consider attending the next one.