Run MapReduce Jobs with HBase

Procedure

To run MapReduce jobs with data stored in HBase, set the environment variable HADOOP_CLASSPATH to the output of the hbase classpath command (use TAB completion to fill in the <version> placeholder):
$ export HADOOP_CLASSPATH=`/opt/mapr/hbase/hbase-<version>/bin/hbase classpath`

Note the backticks (`), not single quotes: the shell runs the enclosed command and assigns its output to HADOOP_CLASSPATH.
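The backticks perform shell command substitution. The sketch below runs without a MapR installation and compares three assignments: backticks, the equivalent POSIX $(...) form, and single quotes. The function fake_hbase_classpath is a hypothetical stand-in for the hbase classpath command, and the jar paths it prints are made up.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `/opt/mapr/hbase/hbase-<version>/bin/hbase classpath`
# so the example runs on any machine; the paths it prints are made up.
fake_hbase_classpath() {
  echo "/opt/mapr/hbase/conf:/opt/mapr/hbase/lib/example.jar"
}

# Backticks: the command runs and its output becomes the value.
WITH_BACKTICKS=`fake_hbase_classpath`

# $(...) is the equivalent, nestable POSIX form of command substitution.
WITH_DOLLAR_PAREN=$(fake_hbase_classpath)

# Single quotes: no substitution; the value is the literal command text.
WITH_SINGLE_QUOTES='fake_hbase_classpath'

echo "backticks:     $WITH_BACKTICKS"
echo "\$(...):        $WITH_DOLLAR_PAREN"
echo "single quotes: $WITH_SINGLE_QUOTES"
```

The same rule applies to the procedure's command, so export HADOOP_CLASSPATH=$(/opt/mapr/hbase/hbase-<version>/bin/hbase classpath) is an equivalent way to write it. With single quotes, Hadoop would receive the command text instead of a classpath.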

Example

Example: Exporting a table named t1 with MapReduce

Note: On a node in a MapR cluster, the output directory /hbase/export_t1 used in the example below is created in the MapR filesystem, not on the node's local disk. To list the output files, run the following hadoop fs command from the node's command line:

# hadoop fs -ls /hbase/export_t1

To view the output:

# hadoop fs -cat /hbase/export_t1/part-m-00000
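The file name part-m-00000 follows Hadoop's output naming convention: m marks a file written by a map task (reduce tasks write part-r-00000 and so on), and the five-digit number is the task's index. Export is a map-only job, and the log in the example below reports Launched map tasks=1, so a single output file is expected. The sketch below prints the names a map-only job should produce for a given number of map tasks; expected_part_files is a hypothetical helper, not a Hadoop command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the output file names a map-only job with
# N map tasks is expected to write (part-m-00000, part-m-00001, ...).
expected_part_files() {
  local num_maps=$1
  local i
  for ((i = 0; i < num_maps; i++)); do
    # %05d zero-pads the task index to five digits.
    printf 'part-m-%05d\n' "$i"
  done
}

# The example job launched one map task, so one file is expected.
expected_part_files 1
```

For a larger table that runs several map tasks, hadoop fs -ls should show one such file per task.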
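
The following session exports table t1 to /hbase/export_t1. Run it from the Hadoop installation directory:

# cd /opt/mapr/hadoop/hadoop-0.20.2
# export HADOOP_CLASSPATH=`/opt/mapr/hbase/hbase-0.94.12/bin/hbase classpath`
# ./bin/hadoop jar /opt/mapr/hbase/hbase-0.94.12/hbase-0.94.12.jar export t1 /hbase/export_t1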
11/09/28 09:35:11 INFO mapreduce.Export: verisons=1, starttime=0, endtime=9223372036854775807
11/09/28 09:35:11 INFO fs.JobTrackerWatcher: Current running JobTracker is: lohit-ubuntu/10.250.1.91:9001
11/09/28 09:35:12 INFO mapred.JobClient: Running job: job_201109280920_0003
11/09/28 09:35:13 INFO mapred.JobClient:  map 0% reduce 0%
11/09/28 09:35:19 INFO mapred.JobClient: Job complete: job_201109280920_0003
11/09/28 09:35:19 INFO mapred.JobClient: Counters: 15
11/09/28 09:35:19 INFO mapred.JobClient:   Job Counters
11/09/28 09:35:19 INFO mapred.JobClient:     Aggregate execution time of mappers(ms)=3259
11/09/28 09:35:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/09/28 09:35:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/09/28 09:35:19 INFO mapred.JobClient:     Launched map tasks=1
11/09/28 09:35:19 INFO mapred.JobClient:     Data-local map tasks=1
11/09/28 09:35:19 INFO mapred.JobClient:     Aggregate execution time of reducers(ms)=0
11/09/28 09:35:19 INFO mapred.JobClient:   FileSystemCounters
11/09/28 09:35:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=61319
11/09/28 09:35:19 INFO mapred.JobClient:   Map-Reduce Framework
11/09/28 09:35:19 INFO mapred.JobClient:     Map input records=5
11/09/28 09:35:19 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=107991040
11/09/28 09:35:19 INFO mapred.JobClient:     Spilled Records=0
11/09/28 09:35:19 INFO mapred.JobClient:     CPU_MILLISECONDS=780
11/09/28 09:35:19 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=759836672
11/09/28 09:35:19 INFO mapred.JobClient:     Map output records=5
11/09/28 09:35:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=63
11/09/28 09:35:19 INFO mapred.JobClient:     GC time elapsed (ms)=35
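The first line of the log reports the parameters Export used: one version per cell and a time range from 0 to 9223372036854775807 (2^63 - 1, Java's Long.MAX_VALUE), which covers all timestamps. HBase's Export tool also accepts these as optional positional arguments after the table name and output directory: <versions> [<starttime> [<endtime>]]. The sketch below assembles the same hadoop jar command for any table. build_export_cmd is a hypothetical helper, the HADOOP_HOME and HBASE_JAR values are copied from this example and will differ on other installations, and the helper only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Paths copied from the example session; they are assumptions that will
# differ on installations with other Hadoop or HBase versions.
HADOOP_HOME=/opt/mapr/hadoop/hadoop-0.20.2
HBASE_JAR=/opt/mapr/hbase/hbase-0.94.12/hbase-0.94.12.jar

# Hypothetical helper: print (not run) the Export command for
#   <table> <outputdir> [<versions> [<starttime> [<endtime>]]]
build_export_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_export_cmd <table> <outputdir> [<versions> [<starttime> [<endtime>]]]" >&2
    return 1
  fi
  # "$*" joins all arguments with single spaces, so optional
  # arguments are appended only when they are given.
  echo "$HADOOP_HOME/bin/hadoop jar $HBASE_JAR export $*"
}

# Same command as the example (defaults: 1 version, full time range).
build_export_cmd t1 /hbase/export_t1

# Hypothetical variation: keep up to 3 versions of each cell.
build_export_cmd t1 /hbase/export_t1_v3 3
```

Once a printed command looks right, run it from the node's command line with HADOOP_CLASSPATH set as described in the procedure.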