How to convert Scientific Notation number to Decimal using bash?

Turns out this is pretty easy to do using awk and printf in bash!

Let's say you have a scientific-notation value of “2.1061863333e+03” and you wish to convert it to decimal. You can do something like the following:

echo "2.1061863333e+03" | awk '{ printf "%12.2f\n", $1 }'
     2106.19

  • Obviously you can do the same if you have a series of values in a file, using cat.
  • I found this especially useful when obtaining values from RRDs.
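The same conversion works for a whole file of values. A minimal sketch (values.txt is a hypothetical file name used for this example):

```shell
# Create a sample file with one scientific-notation value per line
# (values.txt is a hypothetical name used for this sketch).
printf '2.1061863333e+03\n1.5e-01\n' > values.txt

# Convert every value to fixed-point decimal with two digits after the point.
awk '{ printf "%.2f\n", $1 }' values.txt
```

Piping with `cat values.txt | awk '{ printf "%.2f\n", $1 }'` works just the same.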


Graphite: How to modify the default retention?

To modify the default retention for all metrics modify storage-schemas.conf file.
So for example if you have Graphite installed at /opt/graphite then do the following:

vi /opt/graphite/conf/storage-schemas.conf

In version 0.9.10 there should be an entry named [default_1min_for_1day] that holds the .* pattern. You can change the section name, but to apply the schema to all metrics you need to keep the pattern as .*. You can also define different schemas for different metrics; in this case we’re going to change the retention for all metrics. Here’s the original retention in storage-schemas.conf.example:

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

The following changes the retention policy for all metrics to keep every 1 min for 365 days and afterwards one datapoint every 5 mins for 3 years.

[default_1min_for_1day]
pattern = .*
retentions = 60s:365d,300s:1095d

  • It’s important to note that the longer the retention policy, the more disk space is needed to store the data.

  • Graphite reads the schema definitions in order and applies the first one whose pattern matches a metric, so the order of the schemas matters.

  • Existing metrics will not automatically adopt the new schema. You must use whisper-resize.py to convert them to the new schema. The other option is to delete the existing whisper files (/opt/graphite/storage/whisper) and restart carbon-cache.py so the files get recreated.

  • Restart of carbon-cache.py is required after making the change.
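To put the disk-space point in numbers: a whisper file stores roughly 12 bytes per datapoint plus a small header, so you can estimate the per-metric cost of the 60s:365d,300s:1095d policy with shell arithmetic. The sketch below is our own back-of-the-envelope math, not part of Graphite:

```shell
# Datapoints per archive for retentions = 60s:365d,300s:1095d
points_1m=$(( 365 * 24 * 60 ))           # one point per minute for 365 days
points_5m=$(( 1095 * 24 * 3600 / 300 ))  # one point per 5 minutes for 1095 days

# ~12 bytes per datapoint in a whisper file (ignoring the small header)
bytes=$(( (points_1m + points_5m) * 12 ))
echo "approx $(( bytes / 1024 / 1024 )) MB per metric"
```

That works out to roughly 10 MB per metric, versus under 20 KB for the original 60s:1d policy, so the new retention multiplies your whisper storage by several hundred times.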


What’s New and Upcoming in HDFS

Another great presentation on Hadoop!

http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum


Big Data Security with Hadoop

Great presentation!

http://www.slideshare.net/cloudera/big-data-security-apache-accumulo-echeveriafederal-big-data-apache-hadoop-forum


How to integrate Ganglia 3.3.x+ and Graphite 0.9.x?

As of Ganglia 3.3.x the Graphite Integration plugin was integrated into gmetad. So if configured properly, when gmetad gathers the data it writes to RRDs and sends the metrics to Graphite.

Now, you might ask why would you want to integrate with Graphite?
Better graphs, more calculation options such as Standard Deviation and Moving Average, and the many other tools that integrate with Graphite (such as Tattle or dynamic graphing tools) are all reasons why you would want to integrate with Graphite.

Why would you not simply go to Graphite and skip Ganglia?
Well, in our case we needed Ganglia because we’re using HBase and Cloudera, and unfortunately Cloudera provides integration with Ganglia and, I believe, JMX. But if you have the option of going directly to Graphite, then go for it. The Graphite website provides pretty good instructions for the install.

Finally, to integrate Ganglia 3.3.x+ with Graphite, simply add the following lines to your gmetad.conf file:

carbon_server server_name/ip
carbon_port 2003
graphite_prefix "ganglia"
  • carbon_port is not mandatory and can be omitted. The default port for Carbon is 2003, so if you haven’t changed it in your Graphite settings you can skip it.
  • graphite_prefix is the name that all your clusters and metrics will reside under. It’s just a directory structure.
  • You must restart gmetad service for changes to take effect.
    service gmetad restart
  • Once you restart the service you must monitor /var/log/messages for any errors.
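To confirm the Graphite side is reachable independently of gmetad, you can hand-feed Carbon a single metric over its plaintext protocol, which accepts "path value timestamp" lines on port 2003. A sketch — server_name below is a placeholder for your Carbon host, and the metric name is made up:

```shell
# Build a metric line in Carbon's plaintext format: "<path> <value> <timestamp>"
ts=$(date +%s)
line="ganglia.test.metric 42 $ts"
printf '%s\n' "$line"

# To actually send it to Carbon (placeholder host, default port 2003):
#   printf '%s\n' "$line" | nc server_name 2003
```

If the test metric shows up under the ganglia prefix in the Graphite web UI, Carbon is accepting data and any remaining problem is on the gmetad side.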


Installing Ganglia 3.5.x

Notes:

  • The instructions were created using CentOS 6.2 OS
  • Ganglia is made up of few components: gmetad, gmond and web UI
  • The Server includes all 3 components but the Client only includes gmond
  • These steps cover both the Server and the Clients. At times the instructions split into Server and Client sections. If a step isn’t marked Server or Client, you can assume it applies to both the Server and Client installs.

Pre-install Steps:
  1. SELinux: Either disable it or make the appropriate changes so that the install can go through and the application can listen on the required ports. To disable SELinux permanently, edit
    /etc/sysconfig/selinux
    and change the SELINUX setting to:
    SELINUX=disabled
  2. IPTables: Same as SELinux. Disable it or allow the required ports through. Use the following commands to disable IPTables:
    service iptables stop;
    chkconfig iptables off;
    chkconfig --list iptables;
    iptables -F
    service iptables save
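The SELinux change from step 1 can also be made non-interactively. A sketch wrapping the edit in a function so it can be pointed at /etc/sysconfig/selinux on a real host (the function name is ours, not a standard tool):

```shell
# Set SELINUX=disabled in the given config file, leaving other lines intact.
disable_selinux() {
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# On a real host you would run:
#   disable_selinux /etc/sysconfig/selinux
```

A reboot is needed for the change to fully take effect; running setenforce 0 turns enforcement off immediately for the current session.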

Installing Dependencies:

  • Server Components
    • Use Yum to install the following packages:

      • httpd.x86_64
      • httpd-devel.x86_64
      • php.x86_64
      • php-cli.x86_64
      • php-devel.x86_64
      • php-gd.x86_64
      • php-mysql.x86_64
      • php-pear.noarch
      • rrdtool-php.x86_64
      • rrdtool-devel.x86_64
      • rrdtool.x86_64
      • rrdtool-perl.x86_64
      • glibc.x86_64
      • glib2-devel.x86_64
      • glibc-devel.x86_64
      • glib2.x86_64
      • glibc-utils.x86_64
      • glibc-common.x86_64
      • gcc.x86_64
      • gcc-c++.x86_64
      • libgcc.x86_64
      • compat-gcc-34.x86_64
      • pcre.x86_64
      • pcre-devel.x86_64
      • python.x86_64
      • rrdtool-python.x86_64
      • yum install httpd.x86_64 httpd-devel.x86_64 php.x86_64 php-cli.x86_64 php-devel.x86_64 php-gd.x86_64 php-mysql.x86_64 php-pear.noarch rrdtool-php.x86_64 rrdtool-devel.x86_64 rrdtool.x86_64 rrdtool-perl.x86_64 glibc.x86_64 glib2-devel.x86_64 glibc-devel.x86_64 glib2.x86_64 glibc-utils.x86_64 glibc-common.x86_64 gcc.x86_64 gcc-c++.x86_64 libgcc.x86_64 compat-gcc-34.x86_64 pcre.x86_64 pcre-devel.x86_64 python.x86_64 rrdtool-python.x86_64
  • Client Components
    • Use Yum to install the following packages:
      • glibc.x86_64
      • glib2-devel.x86_64
      • glibc-devel.x86_64
      • glib2.x86_64
      • glibc-utils.x86_64
      • gcc-c++.x86_64
      • compat-gcc-34.x86_64
      • pcre.x86_64
      • pcre-devel.x86_64
      • python.x86_64
      • expat.x86_64
      • expat-devel.x86_64
      • apr.x86_64
      • apr-util.x86_64
      • apr-devel.x86_64
      • apr-util-devel.x86_64
      • gcc.x86_64
      • libgcc.x86_64
      • gcc-objc.x86_64
      • yum install glibc.x86_64 glib2-devel.x86_64 glibc-devel.x86_64 glib2.x86_64 glibc-utils.x86_64 gcc-c++.x86_64 compat-gcc-34.x86_64 pcre.x86_64 pcre-devel.x86_64 python.x86_64 expat.x86_64 expat-devel.x86_64 apr.x86_64 apr-util.x86_64 apr-devel.x86_64 apr-util-devel.x86_64 gcc.x86_64 libgcc.x86_64 gcc-objc.x86_64
Install Ganglia (gmetad/gmond):
  1. Install libconfuse RPMS:
    rpm -ivh libconfuse-2.6-2.el6.rf.x86_64.rpm libconfuse-devel-2.6-2.el6.rf.x86_64.rpm
  2. Decompress Ganglia gz file to /opt directory:
    tar -xzvf ganglia-3.5.0.tar.gz -C /opt
  3. Change directory to /opt and rename ganglia-3.5.0 to ganglia:
    cd /opt;
    mv ganglia-3.5.0 ganglia;
    cd ganglia;
  4. Run Configure command:
    • Server (with gmetad):
      ./configure --with-libpcre=no --with-gmetad
    • Client:
      ./configure --with-libpcre=no
  5. Run make and make install
    make
    make install
  6. (Server) Create the RRD directory. Note that we run Ganglia as nobody, so here we change the ownership to the nobody user.
    mkdir -p /var/lib/ganglia/rrds;chown -R nobody /var/lib/ganglia/rrds;
Configure GMOND:
  1. Create /etc/ganglia directory
    mkdir -p /etc/ganglia
  2. Create default configuration.
    gmond --default_config > /etc/ganglia/gmond.conf
  3. Create a symbolic link from the default location where Ganglia looks for its configuration to /etc/ganglia. This makes it easier to manage all configuration files in one place.
    ln -s /etc/ganglia/gmond.conf /usr/local/etc/gmond.conf;
  4. Change
    /etc/ganglia/gmond.conf
    to have desired metrics and port number. Note that each cluster/group must have its own unique port number. The default is 8649.
  5. Change the init file to point to the correct binary:
    vi /opt/ganglia/gmond/gmond.init
    and change the GMOND variable to /usr/local/sbin/gmond.
  6. Copy gmond.init to the init.d directory and add to start up:
    cp /opt/ganglia/gmond/gmond.init /etc/rc.d/init.d/gmond;
    chkconfig --add gmond;
    chkconfig --list gmond;
Configure GMETAD:
  1. If this is a new installation, you can find the default gmetad.conf file in the gmetad directory where you unpacked Ganglia, in this case /opt/ganglia/gmetad/gmetad.conf. Copy this file to /etc/ganglia/gmetad.conf:
    cp /opt/ganglia/gmetad/gmetad.conf /etc/ganglia/gmetad.conf
  2. Create a symbolic link in /usr/local/etc to the file you just copied:
    ln -s /etc/ganglia/gmetad.conf /usr/local/etc/gmetad.conf;
  3. Modify gmetad.conf and include all the clusters you wish to include in this installation.
  4. Change the init file to point to the correct binary:
    vi /opt/ganglia/gmetad/gmetad.init
    and change the GMETAD variable to /usr/local/sbin/gmetad.
  5. Copy gmetad.init to the init.d directory and add to start up:
    cp /opt/ganglia/gmetad/gmetad.init /etc/rc.d/init.d/gmetad;
    chkconfig --add gmetad;
    chkconfig --list gmetad;
Install Ganglia Web
  1. Decompress Ganglia Web gz file to /usr/share directory:
    tar -xvzf ganglia-web-3.5.7.tar.gz -C /usr/share
  2. Change directory to /usr/share and rename ganglia-web-3.5.7 to ganglia:
    cd /usr/share;
    mv ganglia-web-3.5.7 ganglia;
  3. There are two directories required for dwoo module. Create them and change their ownership/permission as stated below:
    mkdir -p /usr/share/ganglia/dwoo/compiled
    mkdir -p /usr/share/ganglia/dwoo/cache
    chown -R 1000:1000 /usr/share/ganglia/dwoo
    chmod -R 777 /usr/share/ganglia/dwoo
Configure Ganglia Web
  1. Set correct path for where ganglia conf directory is:
    vi /usr/share/ganglia/conf_default.php
    Change $conf[‘gweb_confdir’] to the web folder. e.g.
    $conf['gweb_confdir'] = "/usr/share/ganglia";
  2. Add Ganglia web folder to Apache HTTPD:
    • Create a new conf file in /etc/httpd/conf.d
      vi /etc/httpd/conf.d/ganglia.conf
      The following is a sample you can use:
      Alias /ganglia /usr/share/ganglia
      <Location /ganglia>
      Order deny,allow
      Allow from all
      </Location>
      
Start all services
  1. Start gmetad service
    service gmetad start
    Check /var/log/messages to make sure there aren’t any errors.
  2. Start gmond service
    service gmond start
    Check /var/log/messages to make sure there aren’t any errors.
  3. Restart HTTPD service
    service httpd restart


How to check on HBase Master Status?

There is no direct way to find the status. There are two ways I found that clearly indicate it:

  1. Web UI - Comes with the installation.

  2. Zookeeper cli

As mentioned in the previous post, “The HBase master publishes its location to clients via Zookeeper. This is done to support multimaster operation (failover).”

It’s important to note that any changes to the HBase Master will take effect within the timeout specified in zookeeper.session.timeout.

If you’re like us and would like to create scripts to check on the status of the HBase Master, then Zookeeper commands are the best way to find the HBase Master and its status. Here are some useful commands:

Connect to Zookeeper through HBase:

  • hbase zkcli -server ${ZKHOST}:${ZKPORT}
    ZKHost: Zookeeper Host
    ZKPORT: Zookeeper Port. Default is 2181.

Find the active Master Node registered in Zookeeper:

  • Once connected to Zookeeper using the command above, you can run the following to get the Master Node that is currently active:

    get /hbase/master

    Sample output:

    [zk: localhost:2181(CONNECTED) 0] get /hbase/master
    �18182@nodedb2.hbdomainnodedb2.hbdomain,60000,1361996502780
    cZxid = 0xa0000000a
    ctime = Wed Feb 27 12:21:22 PST 2013
    mZxid = 0xa0000000a
    mtime = Wed Feb 27 12:21:22 PST 2013
    pZxid = 0xa0000000a
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x23d1d50b3980001
    dataLength = 65
    numChildren = 0


    If none of the master nodes are available then you get the following error:

    [zk: localhost:2181(CONNECTED) 1] get /hbase/master
    Node does not exist: /hbase/master


Find backup Master Nodes waiting to take control:

  • Once connected to Zookeeper through HBase, you can run the following command to return the list of backup Master Nodes:

    ls /hbase/backup-masters

    Sample Output:

    [zk: localhost:2181(CONNECTED) 7] ls /hbase/backup-masters
    [db-hb1.hbdomain,60000,1362010238030]

    If backup Master Nodes are not available or down:

    [zk: localhost:2181(CONNECTED) 8] ls /hbase/backup-masters
    []
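If you are scripting this check, the zkcli output alone is enough to decide whether an active master exists. A minimal sketch — the helper function and its messages are our own, and the commented capture line assumes a reachable Zookeeper:

```shell
# Decide HBase master status from the output of "get /hbase/master".
# In practice you would capture the session output first, e.g.:
#   out=$(echo "get /hbase/master" | hbase zkcli -server zkhost:2181 2>&1)
check_master() {
  if printf '%s\n' "$1" | grep -q 'Node does not exist'; then
    echo "NO ACTIVE MASTER"
  else
    echo "MASTER ACTIVE"
  fi
}

check_master "Node does not exist: /hbase/master"
check_master "nodedb2.hbdomain,60000,1361996502780"
```

The same pattern works for backup masters: an empty `[]` from `ls /hbase/backup-masters` means no standby is available.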


HBase Master

HBase Architecture

  • For more information on HBase Architecture, refer to this link

HBase Master

  • HMaster is the implementation of the Master Server. The Master server is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes.

  • The multi-master feature introduced in 0.20.0 does not add cooperating Masters; there is still just one working Master while the other backups wait. For example, if you start 200 Masters only 1 will be active while the others wait for it to die. The switch usually takes zookeeper.session.timeout plus a couple of seconds to occur.
    http://wiki.apache.org/hadoop/Hbase/MultipleMasters

At Start Up:

  • If run in a multi-Master environment, all Masters compete to run the cluster. If the active Master loses its lease in ZooKeeper (or the Master shuts down), then the remaining Masters jostle to take over the Master role.

  • The HBase master publishes its location to clients via Zookeeper. This is done to support multimaster operation (failover). So if the HBase master self-discovers its location as a localhost address, then it will publish that. Region servers or clients which go to Zookeeper for the master location will get back an address in that case only useful if they happen to be co-located with the master.
    http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg07296.html

  • What happens if the HMaster goes down?
  • A common dist-list question is what happens to an HBase cluster when the Master goes down. Because the HBase client talks directly to the RegionServers, the cluster can still function in a “steady state.” Additionally, per Section 9.2, “Catalog Tables” ROOT and META exist as HBase tables (i.e., are not resident in the Master). However, the Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a time without the Master, the Master should be restarted as soon as possible.


Big Data – Zookeeper, HDFS, HBase…

It seems BigData is all everyone is talking about these days. So I’m going to start my posts on the subject by mentioning some basics about the platform:


New Posts…

It’s been some time since the last post. I’m going to start making an effort to put my day-to-day findings on the blog again.

For the past 7 months I’ve been working a lot on BigData, Oracle and MySQL… I’ll be posting some of my findings shortly.
