MapR 5.1 is at End of Life (EOL) and no longer supported. Please see the latest documentation. This documentation is not being updated.

Configure High Availability for SparkMaster

Configure High Availability for the Spark Master so that the master does not become a single point of failure.

ZooKeeper provides leader election and some state storage, so you can launch multiple masters in your cluster that connect to the same ZooKeeper instance. ZooKeeper elects one master as the leader, and the others remain in standby mode. If the leader goes down, ZooKeeper elects another master, which recovers the old master's state and resumes scheduling.

Set SPARK_DAEMON_JAVA_OPTS in spark-env.sh with the appropriate ZooKeeper information for the cluster.

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=<zookeeper1:5181,zookeeper2:5181,...>
-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.client=false"
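As an illustration of the setting above, the following sketch assembles the ZooKeeper URL from a list of hosts and builds the option string. The host names (zk1.example.com and so on) are hypothetical placeholders, not values from this guide; the port and the other options are copied from the example above. Treat it as one possible way to fill in the value, not the required form.

```shell
#!/usr/bin/env bash
# Hypothetical ZooKeeper hosts; replace with the ZooKeeper nodes of your cluster.
ZK_HOSTS="zk1.example.com zk2.example.com zk3.example.com"
ZK_PORT=5181  # port shown in the example above

# Join the hosts into host:port,host:port,...
ZK_URL=""
for host in $ZK_HOSTS; do
  ZK_URL="${ZK_URL:+$ZK_URL,}${host}:${ZK_PORT}"
done

# Backslash-newline inside double quotes is a line continuation,
# so the resulting value is a single line.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=${ZK_URL} \
-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf \
-Dzookeeper.sasl.client=false"
export SPARK_DAEMON_JAVA_OPTS

# Print the value so it can be checked before copying it into spark-env.sh.
echo "$SPARK_DAEMON_JAVA_OPTS"
```

Printing the value first makes it easy to spot a missing quote or a malformed host list before the daemons are restarted.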

Restart the Spark Master and Spark HistoryServer services:

maprcli node services -nodes <node-ip> -name spark-master -action restart
maprcli node services -nodes <node-ip> -name spark-historyserver -action restart
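With several masters configured, the spark-master restart above presumably has to be repeated for each master node. The sketch below loops over a list of nodes and, by default, only prints the commands it would run. The node addresses, the DRY_RUN variable, and the run and restart_masters helpers are hypothetical names introduced for illustration; only the maprcli invocation itself comes from the step above.

```shell
#!/usr/bin/env bash
# Hypothetical master node addresses; replace with your own.
MASTER_NODES="10.10.1.11 10.10.1.12"

# DRY_RUN=1 (the default here) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Restart the spark-master service on each listed node, one node at a time.
restart_masters() {
  for node in $MASTER_NODES; do
    run maprcli node services -nodes "$node" -name spark-master -action restart
  done
}

restart_masters
```

Running it with DRY_RUN=0 executes the commands instead of printing them; checking the printed output first is the safer order.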

On the master node, restart the Spark slaves as the mapr user:

/opt/mapr/spark/spark-<version>/sbin/stop-slaves.sh
/opt/mapr/spark/spark-<version>/sbin/start-slaves.sh