Integrate Pig and Apache HBase
Procedure
-
On the client node where Pig is installed, add the following line to
/opt/mapr/conf/env.sh:
export PIG_CLASSPATH=$PIG_CLASSPATH:/location-to-hbase-jar
-
If the client node where Pig is installed also has either the
mapr-hbase-regionserver or mapr-hbase-master packages installed, add the
location of the hbase-<version>.jar file to the PIG_CLASSPATH variable from
the previous step:
export PIG_CLASSPATH="$PIG_CLASSPATH:/opt/mapr/hbase/hbase-<version>/hbase-<version>.jar"
-
If the client node where Pig is installed does not have any HBase packages installed,
copy the HBase JAR from a node that does have HBase installed to a location on the Pig
client node. Add the HBase JAR's location to the PIG_CLASSPATH definition from the previous steps:
export PIG_CLASSPATH=$PIG_CLASSPATH:/opt/mapr/lib/hbase-<version>.jar
-
List the cluster's ZooKeeper nodes:
maprcli node listzookeepers
-
Add the following variable to the
/opt/mapr/conf/env.sh file:
export PIG_OPTS="-Dhbase.zookeeper.property.clientPort=5181 -Dhbase.zookeeper.quorum=<comma-separated list of ZooKeeper IP addresses>"
-
Launch a Pig job and verify that Pig can access HBase tables by using the HBase table
name directly. Do not use the hbase:// prefix.
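The two ZooKeeper steps above can be sketched as a small helper that turns a ZooKeeper node list into the PIG_OPTS line. This is a minimal sketch, not part of MapR or Pig: build_pig_opts is a hypothetical name, and the input is assumed to be a comma-separated string of host:port pairs. Check the actual output of maprcli node listzookeepers on your cluster before relying on that format.

```shell
#!/bin/bash
# Hypothetical helper (not a MapR or Pig command): build the PIG_OPTS
# export line from a comma-separated ZooKeeper list.
# Assumption: entries look like "host" or "host:port".
build_pig_opts() {
    local zk_list="$1"
    local port="${2:-5181}"   # 5181 is the client port used in this procedure
    local quorum
    # Strip any ":port" suffix so only the host names or IPs remain.
    quorum=$(printf '%s' "$zk_list" | sed 's/:[0-9][0-9]*//g')
    printf 'export PIG_OPTS="-Dhbase.zookeeper.property.clientPort=%s -Dhbase.zookeeper.quorum=%s"\n' \
        "$port" "$quorum"
}

# Example with the addresses from the sample env.sh file.
build_pig_opts "10.10.80.61:5181,10.10.80.62:5181,10.10.80.63:5181"
```

After checking the host list, append the printed line to /opt/mapr/conf/env.sh.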
Example
Sample env.sh file for HBase and Pig integration
[root@nmk-centos-60-3 ~]# cat /opt/mapr/conf/env.sh
#!/bin/bash
# Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved
# Please set all environment variable you want to be used during MapR cluster
# runtime here.
# namely MAPR_HOME, JAVA_HOME, MAPR_SUBNETS
export PIG_OPTS="-Dhbase.zookeeper.property.clientPort=5181 -Dhbase.zookeeper.quorum=10.10.80.61,10.10.80.62,10.10.80.63"
export PIG_CLASSPATH="$PIG_CLASSPATH:/opt/mapr/hbase/hbase-<version>/conf:/usr/java/default/lib/tools.jar:/opt/mapr/hbase/hbase-<version>:/opt/mapr/hbase/hbase-<version>/hbase-<version>.jar"
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$PIG_CLASSPATH"
export CLASSPATH="$CLASSPATH:$HADOOP_CLASSPATH"
#export JAVA_HOME=
#export MAPR_SUBNETS=
#export MAPR_HOME=
#export MAPR_ULIMIT_U=
#export MAPR_ULIMIT_N=
#export MAPR_SYSCTL_SOMAXCONN=
#export PIG_CLASSPATH=:$PIG_CLASSPATH
[root@nmk-centos-60-3 ~]#
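A wrong or outdated path in PIG_CLASSPATH is easy to miss in a long colon-separated value like the one in the sample file. The following sketch lists classpath entries that do not exist on the local node. The function name missing_classpath_entries is hypothetical, and the check only tests that each path exists, not that it contains the right JAR.

```shell
#!/bin/bash
# Hypothetical helper: print every entry of a colon-separated classpath
# that does not exist on the local file system.
missing_classpath_entries() {
    local classpath="$1"
    local entries entry
    # Split on ":" without triggering glob expansion.
    IFS=':' read -r -a entries <<< "$classpath"
    for entry in "${entries[@]}"; do
        # Skip empty entries produced by a leading or doubled colon.
        [ -z "$entry" ] && continue
        [ -e "$entry" ] || printf '%s\n' "$entry"
    done
}

# Check whatever PIG_CLASSPATH is set to in the current shell (may be empty).
missing_classpath_entries "${PIG_CLASSPATH:-}"
```

To check the values from the configuration file, source /opt/mapr/conf/env.sh in the shell first. Entries that still contain a literal <version> placeholder are reported as missing.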
Sample HBase insertion script
[root@nmk-centos-60-3 nabeel]# cat hbase_pig.pig
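-- Load the CSV file; each line holds a listing ID, a first name, and a last name.
raw_data = LOAD '/user/mapr/input2.csv' USING PigStorage(',') AS (
    listing_id: chararray,
    fname: chararray,
    lname: chararray);
-- HBaseStorage uses the first field (listing_id) as the row key and maps the
-- remaining fields, in order, to the listed columns.
STORE raw_data INTO 'sample_names' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'info:fname info:lname');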
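The script above expects exactly three comma-separated fields on each input line. As a quick sanity check before running the job, a sketch like the following can flag malformed rows in a local copy of the input file. The function name check_three_fields is hypothetical, and the check counts commas only; it does not understand quoted fields.

```shell
#!/bin/bash
# Hypothetical helper: print the line numbers of rows that do not have
# exactly three comma-separated fields (empty lines are reported too).
# The exit status is non-zero if any such row is found.
check_three_fields() {
    awk -F',' 'NF != 3 { print NR; bad = 1 } END { exit bad }' "$1"
}

# Usage (the file name is a placeholder for a local copy of the input):
#   check_three_fields input2.csv && echo "input looks well-formed"
```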