Running MapReduce Jobs with HBase

About this task

To run MapReduce applications with data stored in HBase, use a command such as the following to export table data to the HPE Ezmeral Data Fabric file system:
$ hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar export t1 /user/mapr/t1

or

$  hbase org.apache.hadoop.hbase.mapreduce.Export t1 /user/mapr/t4
Both commands run the same Export tool included in the hbase-server.jar file, so the results are identical:
$ hadoop fs -ls /user/mapr/t1/
Found 2 items
-rwxr-xr-x   3 mapr mapr          0 2019-11-11 15:00 /user/mapr/t1/_SUCCESS
-rw-r--r--   3 mapr mapr        249 2019-11-11 15:00 /user/mapr/t1/part-m-00000
$ hadoop fs -ls /user/mapr/t4/
Found 2 items
-rwxr-xr-x   3 mapr mapr          0 2019-11-11 15:09 /user/mapr/t4/_SUCCESS
-rw-r--r--   3 mapr mapr        249 2019-11-11 15:09 /user/mapr/t4/part-m-00000
$
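
To load the exported data back into a table, you can use the import tool from the same JAR. The following is a minimal sketch: t1_copy is a hypothetical target table that must already exist, with the same column families as t1, before the import runs (a single column family named cf1 is assumed here):
$ echo "create 't1_copy', 'cf1'" | hbase shell
$ hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar import t1_copy /user/mapr/t1

or, equivalently:

$ hbase org.apache.hadoop.hbase.mapreduce.Import t1_copy /user/mapr/t1
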
Following is an example of the full output:
$ hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar export t1 /user/mapr/t1
19/11/11 14:59:41 INFO mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
19/11/11 14:59:42 INFO mapreduce.TableMapReduceUtil: Configured mapr.hbase.default.db hbase
19/11/11 14:59:42 INFO client.ConnectionFactory: ConnectionFactory receives mapr.hbase.default.db(hbase), set clusterType(HBASE_ONLY), user(mapr), hbase_admin_connect_at_construction(false)
19/11/11 14:59:42 INFO zookeeper.RecoverableZooKeeper: Process identifier=TokenUtil-getAuthToken connecting to ZooKeeper ensemble=node5.cluster.com:5181
19/11/11 14:59:43 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2c306a57 connecting to ZooKeeper ensemble=node5.cluster.com:5181
19/11/11 14:59:43 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x100044f486eff26
19/11/11 14:59:45 INFO impl.TimelineClientImpl: Timeline service address: https://node5.cluster.com:8190/ws/v1/timeline/
19/11/11 14:59:45 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to node5.cluster.com/192.168.33.15:8032
19/11/11 14:59:47 INFO client.ConnectionFactory: mapr.hbase.default.db unsetDB is neither MapRDB or HBase, set HBASE_MAPR mode since mapr client is installed.
19/11/11 14:59:47 INFO client.ConnectionFactory: ConnectionFactory receives mapr.hbase.default.db(unsetDB), set clusterType(HBASE_MAPR), user(mapr), hbase_admin_connect_at_construction(false)
19/11/11 14:59:47 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6b63e6ad connecting to ZooKeeper ensemble=node5.cluster.com:5181
19/11/11 14:59:48 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
19/11/11 14:59:48 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x100044f486eff2a
19/11/11 14:59:48 INFO mapreduce.JobSubmitter: number of splits:1
19/11/11 14:59:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572957695341_0001
19/11/11 14:59:48 INFO mapreduce.JobSubmitter: Kind: HBASE_AUTH_TOKEN, Service: 9161aa11-2f19-4b20-82f8-9678db86e0a7, Ident: (org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@0)
19/11/11 14:59:49 INFO security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
19/11/11 14:59:51 INFO impl.YarnClientImpl: Submitted application application_1572957695341_0001
19/11/11 14:59:51 INFO mapreduce.Job: The url to track the job: https://node5.cluster.com:8090/proxy/application_1572957695341_0001/
19/11/11 14:59:51 INFO mapreduce.Job: Running job: job_1572957695341_0001
19/11/11 15:00:05 INFO mapreduce.Job: Job job_1572957695341_0001 running in uber mode : false
19/11/11 15:00:05 INFO mapreduce.Job:  map 0% reduce 0%
19/11/11 15:00:13 INFO mapreduce.Job:  map 100% reduce 0%
19/11/11 15:00:15 INFO mapreduce.Job: Job job_1572957695341_0001 completed successfully
19/11/11 15:00:15 INFO mapreduce.Job: Counters: 42
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=136674
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                MAPRFS: Number of bytes read=59
                MAPRFS: Number of bytes written=249
                MAPRFS: Number of read operations=11
                MAPRFS: Number of large read operations=0
                MAPRFS: Number of write operations=39
        Job Counters
                Launched map tasks=1
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6111
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=6111
                Total vcore-seconds taken by all map tasks=6111
                Total megabyte-seconds taken by all map tasks=6257664
                DISK_MILLIS_MAPS=3056
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Input split bytes=59
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=68
                CPU time spent (ms)=1620
                Physical memory (bytes) snapshot=246943744
                Virtual memory (bytes) snapshot=3582681088
                Total committed heap usage (bytes)=287309824
        HBase Counters
                BYTES_IN_REMOTE_RESULTS=0
                BYTES_IN_RESULTS=93
                MILLIS_BETWEEN_NEXTS=518
                NOT_SERVING_REGION_EXCEPTION=0
                NUM_SCANNER_RESTARTS=0
                NUM_SCAN_RESULTS_STALE=0
                REGIONS_SCANNED=1
                REMOTE_RPC_CALLS=0
                REMOTE_RPC_RETRIES=0
                RPC_CALLS=3
                RPC_RETRIES=0
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=249
The following table shows the tools included in the hbase-server.jar file:

Name (1)            Class (2)               Description
rowcounter          RowCounter              Count rows in an HBase table
CellCounter         CellCounter             Count cells in an HBase table
export              Export                  Write table data to the HPE Ezmeral Data Fabric file system
import              Import                  Import data written by Export
importtsv           ImportTsv               Import data in TSV format
completebulkload    LoadIncrementalHFiles   Complete a bulk data load
copytable           CopyTable               Export a table from the local cluster to a peer cluster
verifyrep           VerifyReplication       Compare the data from tables in two different clusters.
                                            NOTE: This tool does not work for incrementColumnValues cells, because the timestamp is changed after the cell is appended to the log.
WALPlayer           WALPlayer               Replay WAL files
exportsnapshot      ExportSnapshot          Export a specific snapshot to a given file system

(1) Name is used in the form hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar <name> ...

(2) Class is used in the form hbase org.apache.hadoop.hbase.mapreduce.<class> ...
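
For example, based on the two invocation forms shown in the footnotes, either of the following commands should count the rows in the t1 table used earlier in this topic:

$ hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar rowcounter t1

or

$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter t1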