Archive for category BigData

What’s New and Upcoming in HDFS

Another great presentation on Hadoop!

http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum


Big Data Security with Hadoop

Great presentation!

http://www.slideshare.net/cloudera/big-data-security-apache-accumulo-echeveriafederal-big-data-apache-hadoop-forum


How to Check the HBase Master Status

There is no direct way to find the status, but I found two ways that clearly indicate it:

  1. Web UI: comes with the installation.

  2. ZooKeeper CLI (zkcli)

As mentioned in the previous post, “The HBase master publishes its location to clients via Zookeeper. This is done to support multimaster operation (failover).”

It's important to note that a change of HBase Master (for example, a failover after the active Master dies) is only reflected after roughly the period set by zookeeper.session.timeout.

If you're like us and want scripts that check the status of the HBase Master, ZooKeeper commands are the best way to find the active HBase Master and its status. Here are some useful commands:

Connect to Zookeeper through HBase:

  • hbase zkcli -server ${ZKHOST}:${ZKPORT}
    ZKHOST: ZooKeeper host
    ZKPORT: ZooKeeper client port. Default is 2181.
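Since the point is to script these checks, here is a minimal Python sketch that builds the `hbase zkcli` invocation and runs it with `subprocess`. It assumes `hbase` is on the `PATH` and that, like ZooKeeper's own `zkCli.sh`, `hbase zkcli` runs a command given after `-server` and then exits; verify that against your version before relying on it. `zkcli_argv` and `run_zkcli` are hypothetical helper names.

```python
import subprocess
from typing import List, Sequence


def zkcli_argv(zk_host: str, command: Sequence[str] = (), zk_port: int = 2181) -> List[str]:
    """Build the argv for `hbase zkcli -server ZKHOST:ZKPORT [command ...]`.

    zk_port defaults to 2181, the usual ZooKeeper client port.
    """
    return ["hbase", "zkcli", "-server", f"{zk_host}:{zk_port}", *command]


def run_zkcli(zk_host: str, command: Sequence[str], zk_port: int = 2181,
              timeout: float = 60.0) -> str:
    """Run one zkcli command non-interactively and return everything it printed.

    Assumptions: `hbase` is on PATH and zkcli exits after the trailing command.
    stdout and stderr are merged because which stream a given zkcli message
    (e.g. "Node does not exist") goes to is not something to rely on.
    """
    result = subprocess.run(
        zkcli_argv(zk_host, command, zk_port),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr
```

For example, `run_zkcli("zk1.example.com", ["get", "/hbase/master"])` (hypothetical host) returns roughly what the interactive session prints, plus any logging output.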

Find the active Master node:

  • Once connected to ZooKeeper using the command above, run the following to get the Master node that is currently active:

    get /hbase/master

    Sample output:

    [zk: localhost:2181(CONNECTED) 0] get /hbase/master
    �18182@nodedb2.hbdomainnodedb2.hbdomain,60000,1361996502780
    cZxid = 0xa0000000a
    ctime = Wed Feb 27 12:21:22 PST 2013
    mZxid = 0xa0000000a
    mtime = Wed Feb 27 12:21:22 PST 2013
    pZxid = 0xa0000000a
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x23d1d50b3980001
    dataLength = 65
    numChildren = 0


    If no Master node is active, you get the following error instead:

    [zk: localhost:2181(CONNECTED) 1] get /hbase/master
    Node does not exist: /hbase/master
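Based on the sample output above, a script can turn `get /hbase/master` into a status check: the "Node does not exist" message means there is no active Master, and otherwise the `ephemeralOwner` field holds the ZooKeeper session that created the znode. A minimal Python sketch, assuming the output format shown above; `parse_master_get` and `active_master_session` are hypothetical names.

```python
import re
from typing import Dict, Optional

# zkcli prints the znode's Stat as "name = value" lines after the raw data.
STAT_LINE = re.compile(r"^\s*(\w+) = (.*?)\s*$")
MISSING = "Node does not exist: /hbase/master"


def parse_master_get(output: str) -> Optional[Dict[str, str]]:
    """Parse what `get /hbase/master` printed.

    Returns None when the znode is missing (no active Master); otherwise a
    dict of the Stat fields, e.g. {"ephemeralOwner": "0x23d1d50b3980001", ...}.
    """
    if MISSING in output:
        return None
    stats: Dict[str, str] = {}
    for line in output.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = match.group(2)
    return stats


def active_master_session(output: str) -> Optional[str]:
    """Session id owning the ephemeral /hbase/master znode, or None if no Master."""
    stats = parse_master_get(output)
    if not stats:
        return None  # znode missing, or nothing parseable in the output
    owner = stats.get("ephemeralOwner", "0x0")
    return None if owner == "0x0" else owner  # 0x0 means "not ephemeral"
```

A monitoring script could alert whenever `active_master_session` returns None.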


Find backup Master Nodes waiting to take control:

  • Once connected to ZooKeeper through HBase, run the following command to list the backup Master nodes:

    ls /hbase/backup-masters

    Sample output:

    [zk: localhost:2181(CONNECTED) 7] ls /hbase/backup-masters
    [db-hb1.hbdomain,60000,1362010238030]

    If no backup Master nodes are available (none are running, or they are down), the list is empty:

    [zk: localhost:2181(CONNECTED) 8] ls /hbase/backup-masters
    []
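For a script, the `ls` line can be turned into structured data. Each entry in the sample above is an HBase server name of the form `host,port,startcode`. A minimal Python sketch, assuming zkcli prints the children the way Java's `List.toString()` does, i.e. `[a, b]` with a comma and a space between entries (a separator that never appears inside a server name); `ServerName`, `parse_server_name` and `parse_backup_masters` are hypothetical names.

```python
from typing import List, NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int  # server start time (epoch milliseconds)


def parse_server_name(name: str) -> ServerName:
    # Split from the right so only the last two commas act as separators.
    host, port, start_code = name.rsplit(",", 2)
    return ServerName(host, int(port), int(start_code))


def parse_backup_masters(ls_line: str) -> List[ServerName]:
    """Parse a zkcli list line such as "[db-hb1.hbdomain,60000,1362010238030]"."""
    text = ls_line.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a zkcli list: {ls_line!r}")
    body = text[1:-1].strip()
    if not body:
        return []  # "[]": no backup Masters registered
    return [parse_server_name(entry) for entry in body.split(", ")]
```

A check script could warn when this returns an empty list, since that means no Master is standing by to take over.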


HBase Master

HBase Architecture

  • For more information on HBase Architecture, refer to this link

HBase Master

  • HMaster is the implementation of the Master Server. The Master server is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes.

  • The multi-master feature introduced in 0.20.0 does not add cooperating Masters; there is still just one working Master while the other backups wait. For example, if you start 200 Masters only 1 will be active while the others wait for it to die. The switch usually takes zookeeper.session.timeout plus a couple of seconds to occur.
    http://wiki.apache.org/hadoop/Hbase/MultipleMasters
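The "one active Master, the rest wait" behavior rests on ZooKeeper ephemeral znodes: creating a znode fails if it already exists, so only one Master can own /hbase/master, and the znode disappears when its owner's session expires, which is why the switch takes about zookeeper.session.timeout. The toy Python model below illustrates that idea only. It is not HBase's actual code: `ToyZooKeeper` and `start_masters` are hypothetical, and real backup Masters wait on a ZooKeeper watch rather than re-running the race by hand.

```python
from typing import Dict, List, Optional

MASTER_ZNODE = "/hbase/master"
BACKUP_DIR = "/hbase/backup-masters"


class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper ephemeral znodes (illustration only)."""

    def __init__(self) -> None:
        self.ephemeral: Dict[str, str] = {}  # znode path -> owning session

    def create_ephemeral(self, path: str, session: str) -> bool:
        # Create fails if the path exists, so only one session can win.
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session
        return True

    def delete(self, path: str) -> None:
        self.ephemeral.pop(path, None)

    def expire_session(self, session: str) -> None:
        # On session expiry, every ephemeral znode that session owned is removed.
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session}

    def children(self, parent: str) -> List[str]:
        prefix = parent + "/"
        return sorted(p[len(prefix):] for p in self.ephemeral if p.startswith(prefix))


def start_masters(zk: ToyZooKeeper, candidates: List[str]) -> Optional[str]:
    """Live candidates race for /hbase/master; losers register as backups."""
    for name in candidates:
        if zk.create_ephemeral(MASTER_ZNODE, session=name):
            zk.delete(f"{BACKUP_DIR}/{name}")  # the winner stops being a backup
        else:
            zk.create_ephemeral(f"{BACKUP_DIR}/{name}", session=name)
    return zk.ephemeral.get(MASTER_ZNODE)


zk = ToyZooKeeper()
masters = ["m1", "m2", "m3"]
print(start_masters(zk, masters), zk.children(BACKUP_DIR))  # m1 ['m2', 'm3']
zk.expire_session("m1")  # m1 dies; once its session times out, its znode is gone
print(start_masters(zk, ["m2", "m3"]), zk.children(BACKUP_DIR))  # m2 ['m3']
```

Starting 200 candidates instead of three changes nothing: still exactly one path can hold /hbase/master.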

At Start Up:

  • If run in a multi-Master environment, all Masters compete to run the cluster. If the active Master loses its lease in ZooKeeper (or the Master shuts down), then the remaining Masters jostle to take over the Master role.

  • The HBase master publishes its location to clients via ZooKeeper. This is done to support multimaster operation (failover). So if the HBase master self-discovers its location as a localhost address, it will publish that address. In that case, region servers or clients that look up the master location in ZooKeeper get back an address that is only useful if they happen to be co-located with the master.
    http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg07296.html

  • What happens if the HMaster goes down?
  • A common dist-list question is what happens to an HBase cluster when the Master goes down. Because the HBase client talks directly to the RegionServers, the cluster can still function in a “steady state.” Additionally, per Section 9.2, “Catalog Tables”, ROOT and META exist as HBase tables (i.e., are not resident in the Master). However, the Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a time without the Master, the Master should be restarted as soon as possible.
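The localhost pitfall in the "At Start Up" notes is easy to check for in a script: take the host part of a published server name (`host,port,startcode`) and test whether it is a loopback address. A minimal sketch using Python's standard `ipaddress` and `socket` modules; `is_loopback_host` and `published_address_is_usable` are hypothetical helpers, and hostnames are resolved with the local resolver, which may not match what remote clients see.

```python
import ipaddress
import socket


def is_loopback_host(host: str) -> bool:
    """True if `host` is a loopback literal or resolves only to loopback addresses."""
    if host == "localhost":
        return True
    try:
        # Literal IPv4/IPv6 address: no DNS lookup needed.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        pass
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable here, so not provably loopback
    # info[4][0] is the address string; drop any IPv6 "%scope" suffix.
    addresses = {info[4][0].split("%", 1)[0] for info in infos}
    return bool(addresses) and all(ipaddress.ip_address(a).is_loopback for a in addresses)


def published_address_is_usable(server_name: str) -> bool:
    """False when a published `host,port,startcode` points at a loopback address."""
    host = server_name.rsplit(",", 2)[0]
    return not is_loopback_host(host)
```

Any 127.x.x.x address counts as loopback here, because the whole 127.0.0.0/8 range is reserved for it.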


Big Data – Zookeeper, HDFS, HBase…

It seems BigData is all everyone is talking about these days. So I’m going to start my posts on the subject by mentioning some basics about the platform:
