Integrate Spark with HBase

Integrate Spark with HBase or MapR-DB when you want to run Spark jobs on HBase or MapR-DB tables.

About this task

If you installed Spark with the MapR Installer, these steps are not required.

Procedure

  1. Configure the HBase version in the /opt/mapr/spark/spark-<version>/mapr-util/compatibility.version file:
    hbase_versions=<version>
  2. If you want to create HBase tables with Spark, add the following property to the hbase-site.xml file:
    <property>
      <name>hbase.table.sanity.checks</name>
      <value>false</value>
    </property>
  3. Copy the hbase-site.xml file to the {SPARK_HOME}/conf/ directory on each Spark node.
  4. For Spark 1.4.1 or Spark 1.5.2-1512: Add the following line to the spark.executor.extraClassPath property in the /opt/mapr/spark/spark-<version>/conf/spark-defaults.conf file: /opt/mapr/hbase/hbase-<version>/lib/*
  5. To verify the integration, complete the following steps:
    1. Create an HBase or MapR-DB table and populate it with some data.
    2. Run the following command as the mapr user or as a user that mapr impersonates: MASTER=<master-url> <spark-home>/bin/run-example HBaseTest <table-name>
      The master URL for the cluster is spark://<host>:7077, yarn-client, or yarn-cluster.
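
    As a sketch, the verification in step 5 might look like the following. The table name spark_test, the column family cf, and the sample row are hypothetical; substitute your own Spark version in the path and choose the master URL appropriate for your cluster:

    ```shell
    # Create a table with one column family and insert a sample row.
    # (Table name and data are illustrative only.)
    hbase shell <<'EOF'
    create 'spark_test', 'cf'
    put 'spark_test', 'row1', 'cf:a', 'value1'
    exit
    EOF

    # Run the bundled HBaseTest example against the table, as the mapr user
    # or a user that mapr impersonates. Replace <version> with your Spark version.
    MASTER=yarn-client /opt/mapr/spark/spark-<version>/bin/run-example HBaseTest spark_test
    ```

    If the integration is working, the example job scans the table and reports the row count without errors.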