Using thrift python client with HBase

22 Aug 2010

There are two useful tutorials (HBase wiki and Yaan’s blog) on the web devoted to this topic, but I think both of them miss a few steps. Even after following them, I found myself struggling with compiling thrift and with python’s "No module named ..." import errors. Hence this attempt.

You can use a python client in two ways:
1) Use jython (the HBase java client library can be accessed directly from python)
2) Use thrift

In this tutorial, I am going to explain how to use python and thrift to access HBase. Here is the summary of steps you will need to follow:
1) Download thrift
2) Install thrift dependencies
3) Compile and install thrift
4) Generate HBase thrift python module
5) Add HBase thrift python module to pythonpath
6) Start HBase thrift server
7) Use the client!

The steps are explained in detail below. I am assuming that you use ubuntu as your development environment (that’s what I use), that HBase is installed, and that HBASE_HOME is defined in your environment.

1) Download thrift
Download thrift by clicking on the link embedded in this sentence.

Extract the tar.gz file using tar -xvzf thrift-0.3.0.tar.gz. Let’s say you extracted it to /home/horcrux/Software/thrift-0.3.0/

2) Install thrift dependencies
Thrift requires many packages for compilation. It requires boost c++ libraries, flex, mkmf and other build essentials. You can install all the dependencies by executing the following commands. ruby1.8-dev is to get mkmf installed.

sudo apt-get install build-essential
sudo apt-get install libboost1.40-dev
sudo apt-get install flex
sudo apt-get install ruby1.8-dev

3) Compile and install thrift
Execute the following commands to compile and install thrift

cd /home/horcrux/Software/thrift-0.3.0/
./configure
make
sudo make install

Now let’s install the thrift python library. The following commands install the thrift module into your python installation, so that python can import it:
cd /home/horcrux/Software/thrift-0.3.0/lib/py
sudo python setup.py install
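
Before moving on, it can be worth confirming that the interpreter really finds the thrift package, since this is where the "No module named ..." errors usually start. Here is a minimal sketch of such a check; the helper name module_available is made up for illustration, and the print call is written for Python 3.

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# 'thrift' is the package that setup.py installs in the step above.
print("thrift importable:", module_available("thrift"))
```

If this prints False, the package was probably installed for a different python than the one you are running.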

4) Generate HBase thrift python module
Once this is done, the thrift compiler should be on your path, and you should be able to execute the thrift command from anywhere. Now let’s generate the HBase thrift module from the Hbase.thrift interface definition file.

thrift --gen py $HBASE_HOME/src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift

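This command creates a gen-py folder in the directory you run it from. Run it from your thrift folder (/home/horcrux/Software/thrift-0.3.0) if you want gen-py to end up there, which is what the paths in the next step assume.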
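
The location of Hbase.thrift is not the same in every HBase release, so the path given to the thrift command may need adjusting. As a rough sketch, assuming HBASE_HOME is set as described earlier, the following walks the HBase directory and lists every file named Hbase.thrift. The function name find_thrift_idl is hypothetical.

```python
import os


def find_thrift_idl(root, filename="Hbase.thrift"):
    """Walk `root` and return a sorted list of paths whose basename is `filename`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# HBASE_HOME is assumed to be defined in the environment.
hbase_home = os.environ.get("HBASE_HOME")
if hbase_home:
    for path in find_thrift_idl(hbase_home):
        print(path)
```

Any path it prints can be passed to thrift --gen py in place of the one shown above.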

5) Add HBase thrift python module to pythonpath
We need to add the gen-py folder to the python path. You can do so in several ways:
a) Add it directly at the top of your python file
import sys
sys.path.append('/home/horcrux/Software/thrift-0.3.0/gen-py')

or
b) If you are using an IDE like pydev, add it as a pythonpath source folder.
or
c) Add it to the PYTHONPATH environment variable in your .bashrc.
export PYTHONPATH=$PYTHONPATH:/home/horcrux/Software/thrift-0.3.0/gen-py
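
A slightly more defensive variant of option a) is sketched below. It assumes the generated code lives in gen-py/hbase/Hbase.py (which is what the import "from hbase import Hbase" used later implies) and only touches sys.path when that file is present. The helper name add_gen_py is invented for this example.

```python
import os
import sys


def add_gen_py(gen_py_dir):
    """Put `gen_py_dir` on sys.path if it holds the generated hbase package.

    Returns True when the directory is on sys.path afterwards, False when
    the expected hbase/Hbase.py file is missing.
    """
    marker = os.path.join(gen_py_dir, "hbase", "Hbase.py")
    if not os.path.isfile(marker):
        return False
    if gen_py_dir not in sys.path:
        sys.path.append(gen_py_dir)
    return True


# The path used throughout this tutorial; adjust it to your own layout.
print(add_gen_py('/home/horcrux/Software/thrift-0.3.0/gen-py'))
```

A False here means the folder is missing or was generated somewhere else, which is easier to diagnose than a later import error.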

6) Start HBase thrift server
You can simply start the thrift server by executing the following command:
$HBASE_HOME/bin/hbase thrift start
This will start HBase thrift server on port 9090 (default port).
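
To check that something is actually listening on that port before writing any client code, a plain TCP connection attempt is enough. This is only a sketch using the standard socket module, written for Python 3; it confirms the port accepts connections, not that the process behind it is the HBase thrift server. The name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
    sock.close()
    return True


# 9090 is the default port mentioned above.
print("thrift server reachable:", port_is_open("localhost", 9090))
```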

7) Use the client!
Here is some sample code that prints all the table names on your HBase server:

from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

# Connect to the HBase thrift server started in step 6
transport = TBufferedTransport(TSocket('localhost', 9090))
transport.open()
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
print(client.getTableNames())
# Release the connection when done
transport.close()
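
A natural next step is scanning the rows of a table. The sketch below assumes that your generated Hbase.Client offers scannerOpen(tableName, startRow, columns), scannerGet(id) and scannerClose(id), that scannerGet returns an empty list once the scan is finished, and that each result has a row attribute and a columns map of cells with a value attribute. These details vary between HBase versions, so check them against your own Hbase.thrift. The helper name iter_rows is made up for illustration.

```python
def iter_rows(client, table, start_row, columns):
    """Yield (row_key, {column_name: value}) pairs from `table`.

    Assumes the scannerOpen / scannerGet / scannerClose calls described
    in the text above; verify them against your generated Hbase module.
    """
    scanner_id = client.scannerOpen(table, start_row, columns)
    try:
        while True:
            results = client.scannerGet(scanner_id)
            if not results:
                # Assumed end-of-scan signal: an empty result list.
                break
            for result in results:
                cells = {name: cell.value for name, cell in result.columns.items()}
                yield result.row, cells
    finally:
        # Always release the server-side scanner, even on errors.
        client.scannerClose(scanner_id)
```

With the client from the sample above still connected, usage would look like "for key, cells in iter_rows(client, 'mytable', '', ['cf:']): print(key, cells)", where 'mytable' and 'cf:' are placeholder table and column family names.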

That’s it.

  • Rob

    Great work! Any examples on how to scan through rows in a table?

    Thanks,
    Rob

  • John

    Great work! By the way, I have a question: how can I talk to HBase using C++?
    Thanks,
    John

  • jon

    This tutorial really helped me out a lot, thank you. Some things I found that helped me are:
    1. Remember to install Hbase (this one will hold you up if you’re not paying attention…)
    2. Also do “sudo apt-get install python2.7-dev”
    3. If you can’t find Hbase.thrift, look in “[hbaseroot]/src/main/resources/org/apache/hadoop/hbase/thrift/”
    I bet it will be there.

    The problem I’m having now is that python cannot locate the thrift modules. I would export them to the python path but I’m not sure where they wandered off to.

  • jon

    Ahh I did:

    find / *|grep /thrift/__init__

    to find the thrift module.
    I found it and did:

    export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/site-packages/thrift

    Port 9090 was also taken so I did this:

    sh ./bin/hbase-daemon.sh start thrift -p 55575

    then in my python script I just connected to 55575 instead of 9090.
    Got everything working. Thanks again!

  • Phishpit

    The path was very helpful. Thanks Jon

  • http://uwstopia.nl/ Wouter Bolsterlee

    There is also HappyBase, a developer-friendly Python library to interact with Apache HBase: https://github.com/wbolster/happybase

  • Vincent Lepage @alephd.com

    i’ll try it, we are trying to feed large volumes of data in HBase from Django… I might give a try to HappyBase, which has good reviews

  • Mnikhil

    HappyBase is much better (although it still uses Thrift) but removes that thrift cruft in the programming… I am happier to use it for my needs :-)
