Archive for category BigData
What’s New and Upcoming in HDFS
Another great presentation on Hadoop!
http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum
Big Data Security with Hadoop
Great presentation!
http://www.slideshare.net/cloudera/big-data-security-apache-accumulo-echeveriafederal-big-data-apache-hadoop-forum
How to check on HBase Master Status?
HBase has no single command that reports the status of the Master. I found two ways that clearly indicate it:
- Web UI - Comes with the installation.
- Zookeeper cli
As mentioned in the previous post, “The HBase master publishes its location to clients via Zookeeper. This is done to support multimaster operation (failover).”
It's important to note that any change to the HBase Master will take effect within the timeout set in zookeeper.session.timeout.
If you’re like us and would like to create scripts to check on the status of the HBase Master, then Zookeeper commands are the best way to find the HBase Master and its status. Here are some useful commands:
Connect to Zookeeper through HBase:
hbase zkcli -server ${ZKHOST}:${ZKPORT}
ZKHOST: Zookeeper Host
ZKPORT: Zookeeper Port. Default is 2181.
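For scripting, the same connection can be driven non-interactively. The following is a minimal Python sketch, assuming the hbase launcher is on the PATH and that zkcli accepts commands piped to its standard input (this may vary between versions). The helper names zkcli_argv and run_zkcli are made up for illustration and are not part of HBase.

```python
import subprocess

def zkcli_argv(zk_host, zk_port=2181):
    """Build the `hbase zkcli -server host:port` command line."""
    return ["hbase", "zkcli", "-server", f"{zk_host}:{zk_port}"]

def run_zkcli(command, zk_host, zk_port=2181, runner=subprocess.run):
    """Send one zkcli command on stdin and return whatever zkcli printed.

    `runner` is injectable so the function can be exercised without a cluster.
    """
    result = runner(
        zkcli_argv(zk_host, zk_port),
        input=command + "\nquit\n",  # assumption: zkcli reads piped commands
        capture_output=True,
        text=True,
        timeout=60,
    )
    # Keep both streams; no assumption is made about which one carries the result.
    return result.stdout + result.stderr
```

With a reachable Zookeeper, print(run_zkcli("get /hbase/master", "localhost")) would be expected to show the same text as an interactive session.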
Find the active HBase Master node:
- Once connected to Zookeeper using the above command, you can run the following to get the Master node that is active at this time (the active Master registers itself in Zookeeper; it does not control Zookeeper):
get /hbase/master
Sample output:
[zk: localhost:2181(CONNECTED) 0] get /hbase/master
�18182@nodedb2.hbdomainnodedb2.hbdomain,60000,1361996502780
cZxid = 0xa0000000a
ctime = Wed Feb 27 12:21:22 PST 2013
mZxid = 0xa0000000a
mtime = Wed Feb 27 12:21:22 PST 2013
pZxid = 0xa0000000a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x23d1d50b3980001
dataLength = 65
numChildren = 0
If none of the master nodes are available then you get the following error:
[zk: localhost:2181(CONNECTED) 1] get /hbase/master
Node does not exist: /hbase/master
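For a script, the two outcomes above can be told apart by inspecting the text that zkcli prints. The sketch below is a hedged Python example: it assumes the znode data looks like the sample output in this post (a pid@host prefix followed by host,port,startcode), which may differ in other HBase versions. The name master_status is illustrative and not an HBase API.

```python
import re

# Assumed layout, taken from the sample output: "<pid>@<host>" followed
# (possibly after some unprintable bytes) by "<host>,<port>,<startcode>".
MASTER_DATA = re.compile(
    r"\d+@(?P<host>[A-Za-z0-9.\-]+?)[^A-Za-z0-9]*(?P=host),"
    r"(?P<port>\d+),(?P<startcode>\d+)"
)

def master_status(zkcli_output):
    """Classify the output of `get /hbase/master`.

    Returns ("down", None) when the znode is missing, ("up", details) when
    the data matches the assumed layout, and ("unknown", None) otherwise.
    """
    if "Node does not exist" in zkcli_output:
        return "down", None
    match = MASTER_DATA.search(zkcli_output)
    if match is None:
        return "unknown", None
    return "up", {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "startcode": int(match.group("startcode")),
    }
```

Returning "unknown" rather than "down" for unrecognised output keeps a format change from being mistaken for an outage.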
Find backup Master Nodes waiting to take control:
- Once connected to Zookeeper through HBase, you can run the following command to list the backup Master nodes:
ls /hbase/backup-masters
Sample Output:
[zk: localhost:2181(CONNECTED) 7] ls /hbase/backup-masters
[db-hb1.hbdomain,60000,1362010238030]
If no backup Master nodes are available (or they are all down), the list is empty:
[zk: localhost:2181(CONNECTED) 8] ls /hbase/backup-masters
[]
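The list printed by ls can be turned into structured data in the same spirit. This is a small Python sketch, assuming each entry has the host,port,startcode shape shown in the sample output; parse_backup_masters is a hypothetical helper name.

```python
import re

def parse_backup_masters(zkcli_output):
    """Return the backup masters listed by `ls /hbase/backup-masters`.

    Each entry is a (host, port, startcode) tuple; an empty list means no
    backup master is currently registered.
    """
    # Entries look like "host,port,startcode". Scan for them directly because
    # the commas inside each entry make a plain split(",") ambiguous.
    entries = re.findall(r"([A-Za-z0-9.\-]+),(\d+),(\d+)", zkcli_output)
    return [(host, int(port), int(startcode)) for host, port, startcode in entries]
```

A monitoring script could, for example, warn when len(parse_backup_masters(output)) is zero, since the cluster then has no standby to fail over to.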
HBase Master
HBase Architecture
- For more information on HBase Architecture, refer to this link
HBase Master
- HMaster is the implementation of the Master Server. The Master server is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes.
- The multi-master feature introduced in 0.20.0 does not add cooperating Masters; there is still just one working Master while the other backups wait. For example, if you start 200 Masters only 1 will be active while the others wait for it to die. The switch usually takes zookeeper.session.timeout plus a couple of seconds to occur.
http://wiki.apache.org/hadoop/Hbase/MultipleMasters
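Since a failover is expected to complete in roughly zookeeper.session.timeout plus a couple of seconds, a monitoring script can poll for about that long before raising an alarm. A minimal Python sketch follows; wait_for_master, the check callback and the 5-second grace period are all assumptions made for illustration.

```python
import time

def wait_for_master(check, session_timeout_s, grace_s=5.0, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll `check()` until it reports an active master or time runs out.

    `check` is any callable returning True once a master is registered.
    The deadline is the session timeout plus a small grace period, mirroring
    the "timeout plus a couple of seconds" rule of thumb.
    """
    deadline = clock() + session_timeout_s + grace_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The clock and sleep parameters exist only so the loop can be tested without actually waiting.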
At Start Up:
- If run in a multi-Master environment, all Masters compete to run the cluster. If the active Master loses its lease in ZooKeeper (or the Master shuts down), then the remaining Masters jostle to take over the Master role.
- The HBase master publishes its location to clients via Zookeeper. This is done to support multimaster operation (failover). So if the HBase master self-discovers its location as a localhost address, then it will publish that. Region servers or clients which go to Zookeeper for the master location will, in that case, get back an address that is only useful if they happen to be co-located with the master.
http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg07296.html
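A status script can flag this misconfiguration by checking whether the host published under /hbase/master is a loopback name or address. The Python sketch below only inspects the string and does not resolve DNS; is_loopback_host is an illustrative helper, not something HBase provides.

```python
import ipaddress

def is_loopback_host(host):
    """Return True when `host` is a loopback name or address."""
    if host.lower() in ("localhost", "localhost.localdomain"):
        return True
    try:
        # Covers 127.0.0.0/8 and ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as a regular hostname.
        return False
```

If this returns True for the published master host, remote region servers and clients are unlikely to be able to reach the Master at that address.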
What happens if the HMaster goes down?
- A common dist-list question is what happens to an HBase cluster when the Master goes down. Because the HBase client talks directly to the RegionServers, the cluster can still function in a “steady state.” Additionally, per Section 9.2, “Catalog Tables” ROOT and META exist as HBase tables (i.e., are not resident in the Master). However, the Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a time without the Master, the Master should be restarted as soon as possible.
Big Data – Zookeeper, HDFS, HBase…
It seems BigData is all everyone is talking about these days. So I’m going to start my posts on the subject by mentioning some basics about the platform:
- Main Apache website: http://zookeeper.apache.org/
- Insightful Link:
http://blog.cloudera.com/blog/2013/02/how-to-use-apache-zookeeper-to-build-distributed-apps-and-why/
- http://hbase.apache.org/
- There are many articles and blogs about why HBase is a great tool to use and how it can outperform relational databases. I’ve included a couple of links below to get things started:
http://hstack.org/why-were-using-hbase-part-1/
http://www.stumbleupon.com/blog/why-we-love-hbase/
- A couple of tools that bundle all of the above (and more) together and make installation, configuration and management easier are listed below. The main advantage of these tools is avoiding the painful task of managing each server and configuration file manually. With these tools, DBAs can update the configuration once through the UI and have it deployed to all servers.
Cloudera
Hortonworks