Run MapReduce Jobs with HBase
Procedure
To run MapReduce jobs with data stored in HBase, set the environment variable HADOOP_CLASSPATH to the output of the hbase classpath command (use TAB completion to fill in the <version> placeholder):
$ export HADOOP_CLASSPATH=`/opt/mapr/hbase/hbase-<version>/bin/hbase classpath`
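Note the backticks (`). They make the shell run the enclosed command and substitute its output; single quotes would assign the literal text instead.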
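The difference between backticks and single quotes can be seen in a minimal sketch. It assumes nothing about the cluster: fake_classpath is a hypothetical stand-in for the hbase classpath command, and the paths it prints are invented for illustration only.

```shell
#!/bin/sh
# Hypothetical stand-in for "hbase classpath" so the sketch runs without HBase installed.
fake_classpath() {
    echo "/opt/example/conf:/opt/example/lib/example.jar"
}

# Backticks: the shell runs the command and stores its output.
WITH_BACKTICKS=`fake_classpath`

# Single quotes: the shell stores the literal text; nothing is executed.
WITH_QUOTES='fake_classpath'

echo "backticks:     $WITH_BACKTICKS"
echo "single quotes: $WITH_QUOTES"
```

With the real command, printing $HADOOP_CLASSPATH after the export is a quick way to confirm that it holds a list of paths rather than the command text.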
Example
Example: Exporting a table named t1 with MapReduce
Note: On a node in a MapR cluster, the output directory /hbase/export_t1 is located in the MapR Hadoop filesystem. To list the output files produced by the example below, run the following hadoop fs command from the node's command line:
# hadoop fs -ls /hbase/export_t1
To view the output:
# hadoop fs -cat /hbase/export_t1/part-m-00000
The export itself is run as follows:
# cd /opt/mapr/hadoop/hadoop-0.20.2
# export HADOOP_CLASSPATH=`/opt/mapr/hbase/hbase-0.94.12/bin/hbase classpath`
# ./bin/hadoop jar /opt/mapr/hbase/hbase-0.94.12/hbase-0.94.12.jar export t1 /hbase/export_t1
11/09/28 09:35:11 INFO mapreduce.Export: verisons=1, starttime=0, endtime=9223372036854775807
11/09/28 09:35:11 INFO fs.JobTrackerWatcher: Current running JobTracker is: lohit-ubuntu/10.250.1.91:9001
11/09/28 09:35:12 INFO mapred.JobClient: Running job: job_201109280920_0003
11/09/28 09:35:13 INFO mapred.JobClient: map 0% reduce 0%
11/09/28 09:35:19 INFO mapred.JobClient: Job complete: job_201109280920_0003
11/09/28 09:35:19 INFO mapred.JobClient: Counters: 15
11/09/28 09:35:19 INFO mapred.JobClient: Job Counters
11/09/28 09:35:19 INFO mapred.JobClient: Aggregate execution time of mappers(ms)=3259
11/09/28 09:35:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/09/28 09:35:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/09/28 09:35:19 INFO mapred.JobClient: Launched map tasks=1
11/09/28 09:35:19 INFO mapred.JobClient: Data-local map tasks=1
11/09/28 09:35:19 INFO mapred.JobClient: Aggregate execution time of reducers(ms)=0
11/09/28 09:35:19 INFO mapred.JobClient: FileSystemCounters
11/09/28 09:35:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=61319
11/09/28 09:35:19 INFO mapred.JobClient: Map-Reduce Framework
11/09/28 09:35:19 INFO mapred.JobClient: Map input records=5
11/09/28 09:35:19 INFO mapred.JobClient: PHYSICAL_MEMORY_BYTES=107991040
11/09/28 09:35:19 INFO mapred.JobClient: Spilled Records=0
11/09/28 09:35:19 INFO mapred.JobClient: CPU_MILLISECONDS=780
11/09/28 09:35:19 INFO mapred.JobClient: VIRTUAL_MEMORY_BYTES=759836672
11/09/28 09:35:19 INFO mapred.JobClient: Map output records=5
11/09/28 09:35:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=63
11/09/28 09:35:19 INFO mapred.JobClient: GC time elapsed (ms)=35
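In the run shown above, the Map input records and Map output records counters are both 5. As a rough sanity check, the two counters can be compared from a saved copy of the job output. The following is only a sketch: the temporary log file and the get_counter helper are hypothetical, and the two log lines are copied from the output above.

```shell
#!/bin/sh
# Hypothetical saved copy of the job output; the two lines are taken from the log above.
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
11/09/28 09:35:19 INFO mapred.JobClient: Map input records=5
11/09/28 09:35:19 INFO mapred.JobClient: Map output records=5
EOF

# Print the numeric value that follows "<counter name>=" in the log.
get_counter() {
    sed -n "s/.*$1=\([0-9][0-9]*\)\$/\1/p" "$LOG_FILE"
}

MAP_IN=$(get_counter "Map input records")
MAP_OUT=$(get_counter "Map output records")

if [ "$MAP_IN" = "$MAP_OUT" ]; then
    echo "OK: $MAP_OUT records exported"
else
    echo "Mismatch: input=$MAP_IN output=$MAP_OUT"
fi

rm -f "$LOG_FILE"
```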