Using thrift python client with HBase

In: HBase

22 Aug 2010

There are two useful tutorials (the HBase wiki and Yaan’s blog) on the web devoted to this topic, but I think both of them miss a few steps. Even after following them, I found myself struggling with compiling thrift and with Python’s "ImportError: No module named" errors. Hence this attempt.

You can use a Python client in two ways:
1) Use Jython, which lets Python code call the HBase Java client library directly
2) Use thrift

In this tutorial, I am going to explain how to use python and thrift to access HBase. Here is the summary of steps you will need to follow:
1) Download thrift
2) Install thrift dependencies
3) Compile and install thrift
4) Generate HBase thrift python module
5) Add HBase thrift python module to pythonpath
6) Start HBase thrift server
7) Use the client!

Following is a detailed explanation of each step. I am assuming that you are using Ubuntu as your development environment, since that’s what I use. I am also assuming that HBase is installed and that HBASE_HOME is defined in your environment.

1) Download thrift
Download the thrift source release. This tutorial uses thrift-0.3.0.tar.gz.

Unzip the tar.gz file using tar -xvzf thrift-0.3.0.tar.gz. Let’s say you unzipped it in /home/horcrux/Software/thrift-0.3.0/
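If you prefer to script this step, Python’s standard tarfile module can do what tar -xvzf does. This is a minimal sketch assuming Python 3; unpack_tarball is a hypothetical helper name of mine, not part of thrift. It returns the top-level directory the archive created, so you can confirm where thrift ended up.

```python
import os
import tarfile


def unpack_tarball(archive, dest):
    """Extract a .tar.gz archive into dest and return its top-level paths."""
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Pythons accept a filter that rejects unsafe member paths.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        # e.g. "thrift-0.3.0/README" -> "thrift-0.3.0"
        top_level = {os.path.normpath(m.name).split(os.sep)[0]
                     for m in tar.getmembers()}
    return sorted(os.path.join(dest, name) for name in top_level)
```

For example, unpack_tarball("thrift-0.3.0.tar.gz", "/home/horcrux/Software") should report /home/horcrux/Software/thrift-0.3.0.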

2) Install thrift dependencies
Thrift requires several packages for compilation: the Boost C++ libraries, flex, Ruby’s mkmf and the usual build essentials. You can install all of these dependencies by executing the following commands (the ruby1.8-dev package is what provides mkmf):

sudo apt-get install build-essential
sudo apt-get install libboost1.40-dev
sudo apt-get install flex
sudo apt-get install ruby1.8-dev
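A missing tool usually shows up only halfway through the build. The sketch below (assuming Python 3; missing_tools is a hypothetical helper of mine) checks that the build executables are on your PATH before you start. It cannot detect the Boost headers or Ruby’s mkmf, since those are libraries rather than commands.

```python
import shutil

# Executables the thrift build needs. Boost headers and Ruby's mkmf are
# libraries, so they cannot be detected by looking at PATH.
BUILD_TOOLS = ("gcc", "g++", "make", "flex")


def missing_tools(tools=BUILD_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools: " + ", ".join(missing))
    else:
        print("All build tools found.")
```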

3) Compile and install thrift
Execute the following commands to configure, compile and install thrift:

cd /home/horcrux/Software/thrift-0.3.0/
./configure
make
sudo make install

Now let’s install the thrift Python library. The following commands install the thrift module into your Python site-packages, so that it is on your Python path:
cd /home/horcrux/Software/thrift-0.3.0/lib/py
sudo python setup.py install
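Since "No module named thrift" was exactly the error that tripped me up, it is worth checking that the install step worked. This sketch assumes Python 3 (importlib.util.find_spec does not exist in Python 2), and module_location is a hypothetical helper name. It reports the file Python would load the thrift package from, or None if it cannot be found. Run it with the same interpreter you used for setup.py install, because each interpreter has its own site-packages.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    location = module_location("thrift")
    if location is None:
        print("thrift is not importable; re-check the setup.py install step")
    else:
        print("thrift found at " + location)
```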

4) Generate HBase thrift python module
Once this is done, the thrift compiler should be on your PATH, so you can run the thrift command from any directory. Now let’s generate the HBase thrift module from the Hbase.thrift interface definition file.

thrift --gen py $HBASE_HOME/src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift

This command creates a gen-py folder in the current working directory. If you run it from your thrift folder (/home/horcrux/Software/thrift-0.3.0), that is where gen-py will end up.
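If you want to pin down where gen-py lands instead of relying on the current directory, the thrift compiler’s -o option sets the output directory. The sketch below assumes Python 3; thrift_gen_command and generated_files_present are hypothetical helpers, and the files they check for (__init__.py, Hbase.py and ttypes.py under gen-py/hbase) are what I would expect the Python generator to produce for the hbase package that the client code imports.

```python
import os


def thrift_gen_command(hbase_home, out_dir):
    """Build the `thrift --gen py` command for HBase's Hbase.thrift file.

    The IDL path matches the older source layout used in this tutorial;
    newer HBase releases keep it under src/main/resources instead.
    """
    idl = os.path.join(hbase_home, "src", "java", "org", "apache", "hadoop",
                       "hbase", "thrift", "Hbase.thrift")
    # -o sets where the gen-py directory is created (default: current dir).
    return ["thrift", "--gen", "py", "-o", out_dir, idl]


def generated_files_present(out_dir):
    """Check that gen-py/hbase contains the expected generated modules."""
    package = os.path.join(out_dir, "gen-py", "hbase")
    expected = ("__init__.py", "Hbase.py", "ttypes.py")
    return all(os.path.isfile(os.path.join(package, name))
               for name in expected)
```

For example, subprocess.check_call(thrift_gen_command(os.environ["HBASE_HOME"], "/home/horcrux/Software/thrift-0.3.0")) runs the generator, after which generated_files_present("/home/horcrux/Software/thrift-0.3.0") should return True.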

5) Add HBase thrift python module to pythonpath
We need to add the gen-py folder to the Python path. You can do so in any of these ways:
a) Add it directly at the top of your Python file:
import sys
sys.path.append('/home/horcrux/Software/thrift-0.3.0/gen-py')

b) If you are using an IDE like PyDev, add it as a PYTHONPATH source folder.
c) Add it to the PYTHONPATH environment variable in your .bashrc:
export PYTHONPATH=$PYTHONPATH:/home/horcrux/Software/thrift-0.3.0/gen-py
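Option a) can be made a little more robust. This sketch assumes Python 3; add_to_python_path is a hypothetical helper name. It fails early with a clear message if the gen-py folder is not where you think it is (instead of a confusing import error later), and it avoids adding the same entry twice.

```python
import os
import sys

# Location used throughout this tutorial; adjust to wherever gen-py was created.
GEN_PY_DIR = "/home/horcrux/Software/thrift-0.3.0/gen-py"


def add_to_python_path(directory):
    """Prepend `directory` to sys.path once; fail early if it is missing."""
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        raise FileNotFoundError("gen-py directory not found: " + directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

Call add_to_python_path(GEN_PY_DIR) before the from hbase import Hbase line.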

6) Start HBase thrift server
You can simply start the thrift server by executing the following command:
$HBASE_HOME/bin/hbase thrift start
This will start the HBase thrift server on port 9090 (the default port).
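Before running the client, you can check that something is actually listening on the port. This is a sketch assuming Python 3; thrift_server_listening is a hypothetical helper. It only proves that a TCP listener exists on that host and port, not that it is the HBase thrift server, but it quickly catches a server that never started or a port mix-up.

```python
import socket


def thrift_server_listening(host="localhost", port=9090, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Connect and immediately close; we only care that it succeeded.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```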

7) Use the client!
Here is sample code that prints all the table names on your HBase server:

from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

transport = TBufferedTransport(TSocket('localhost', 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)

# Open the connection, list the tables, then close the connection.
transport.open()
for table_name in client.getTableNames():
    print(table_name)
transport.close()

That’s it.
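If you use the client in a longer script, it helps to guarantee that the transport is closed even when a call fails. Here is a minimal sketch, assuming Python 3 and the same thrift classes and generated hbase module used above; hbase_client is a hypothetical helper name, not part of thrift or HBase.

```python
import contextlib


@contextlib.contextmanager
def hbase_client(host="localhost", port=9090):
    """Yield a connected Hbase.Client and always close the transport."""
    # Imported here so the module defining this helper can be imported
    # before thrift and gen-py are on the Python path.
    from thrift.transport.TSocket import TSocket
    from thrift.transport.TTransport import TBufferedTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase

    transport = TBufferedTransport(TSocket(host, port))
    transport.open()
    try:
        yield Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
    finally:
        transport.close()
```

For example: with hbase_client('localhost', 9090) as client: print(client.getTableNames()). If your thrift server runs on another port, pass that port instead.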

  • Rob

    Great work! Any examples on how to scan through rows in a table?


  • John

    Great work! By the way, I have a question: how do I access HBase from C++?

  • jon

    This tutorial really helped me out a lot, thank you. Some things I found that helped me are:
    1. Remember to install HBase (this one will hold you up if you’re not paying attention…)
    2. Also do “sudo apt-get install python2.7-dev”
    3. If you can’t find Hbase.thrift, look in “[hbaseroot]/src/main/resources/org/apache/hadoop/hbase/thrift/”
    I bet it will be there.

    The problem I’m having now is that python cannot locate the thrift modules. I would export them to the python path but I’m not sure where they wandered off to.

  • jon

    Ahh I did:

    find / *|grep /thrift/__init__

    to find the thrift module.
    I found it and did:

    export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/site-packages

    Port 9090 was also taken so I did this:

    sh ./bin/ start thrift -p 55575

    then in my python script I just connected to 55575 instead of 9090.
    Got everything working. Thanks again!

  • Phishpit

    The path was very helpful. Thanks Jon

  • Wouter Bolsterlee

    There is also HappyBase, a developer-friendly Python library to interact with Apache HBase.

  • Vincent Lepage

    I’ll try it. We are trying to feed large volumes of data into HBase from Django… I might give HappyBase a try, as it has good reviews.

  • Mnikhil

    HappyBase is much better: it still uses Thrift underneath, but it removes the Thrift cruft from your code. I am happier using it for my needs :-)

  • Anonymous

    Awesome post. Thanks for sharing it.
