Cassandra, DynamoDB, HBase, Hadoop and Big Data in general
This year’s Hadoop Summit was the biggest ever, with 2,200 people attending. Barring a first-day lunch hiccup (there was no food for vegetarians), everything went smoothly. Storm is getting bigger. Nathan Marz’s talk was as good as his other talks. There was nothing special, but I think the response was noteworthy. Nathan Marz [...]
Amazon recently announced DynamoDB. I have to admit, this time Amazon might have gotten it right! SimpleDB was simply a disaster, but from what I have read so far, DynamoDB looks really promising.
In: HBase
14 Jul 2011
Key HBase community members advise people not to host their HBase cluster on EC2, and they have good reasons for that advice. But in this post I am going to explain why we decided to host our HBase cluster on EC2 and why we continue to host it there. When we began experimenting with [...]
In: HBase
30 May 2011
Recently we learned a few interesting lessons about architecting HBase on EC2. Since the lessons are more related to EC2 than to HBase, I decided to post them on my Amazon Web Services blog. For those who are planning to host their HBase/Hadoop systems on EC2, it’s a must-read: http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/
In: HBase
10 Dec 2010
Distributed counters are an important feature that many distributed databases offer. For an ad network, distributed counters matter for many reasons; for example, real-time ad impression and click data can be used for ad optimization. HBase and Cassandra both support distributed counters. Ultimately, whichever system you choose, scaling distributed counters remains a challenge. It boils [...]
Each HBase region server hosts many regions, possibly hundreds or even thousands. How do you find out which one of them is a hotspot? We saw that CPU usage on one of the region servers was shooting up at peak traffic. But the region server had 4 tables (and hundreds of regions) and their access [...]
In: HBase
22 Aug 2010
There are two useful tutorials on the web devoted to this topic (the HBase wiki and Yaan’s blog), but I think both of them miss a few steps. In spite of following the tutorials, I found myself struggling with compiling Thrift and with Python’s "No module found" errors. Hence this attempt.
In: HBase
12 Jun 2010
How do we accomplish real-time reports in a big data system? What if you want to count ad impressions and give real-time reports to your customers? HBase makes aggregation really easy. I am going to tell you how we accomplished aggregation with HBase. The HTable class in HBase client API [...]
We are an ad network. We need to store impressions and clicks. We were evaluating various big data (or NoSQL, whatever you may choose to call it) systems for our new project. We had used HBase for the past 8 months in an experimental product and were satisfied with it, but the hype about Cassandra [...]