Installing Spark on YARN

About this task

This document describes how to install Spark on YARN manually. Alternatively, you can install Spark on YARN using the MapR Installer.

Spark is distributed as two separate packages:

| Package | Description |
|---|---|
| mapr-spark | Install this package on each node where you want to install Spark. This package depends on the mapr-client package. |
| mapr-spark-historyserver | Install this optional package on Spark History Server nodes. This package depends on the mapr-spark and mapr-core packages. |

To install Spark on YARN (Hadoop 2), complete the following steps as root or using sudo:

Verify that JDK 1.7 or later is installed on each node where you want to install Spark.
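One way to perform this check from the command line is sketched below. It assumes that `java` is on the PATH and that the first line of `java -version` carries the version in double quotes; `java_version_ok` is a hypothetical helper written for this example, not part of any MapR package.

```shell
# Hypothetical helper: succeeds when a Java version string such as
# "1.7.0_80" or "11.0.2" means JDK 1.7 or later.
java_version_ok() {
  local version="$1" major minor
  major="${version%%.*}"        # text before the first dot
  major="${major%%[!0-9]*}"     # keep leading digits only
  [ -n "$major" ] || return 1
  if [ "$major" -gt 1 ]; then
    return 0                    # newer scheme: 9, 11, 17, ...
  fi
  minor="${version#*.}"         # text after the first dot
  minor="${minor%%[!0-9]*}"     # keep leading digits only
  [ -n "$minor" ] || return 1
  [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  detected="$(java -version 2>&1 | awk -F'"' 'NR==1 {print $2}')"
  if java_version_ok "$detected"; then
    echo "JDK $detected satisfies the 1.7+ requirement"
  else
    echo "JDK '$detected' is too old or could not be parsed"
  fi
else
  echo "java was not found on the PATH"
fi
```

Run the check on every node that will host Spark, because the requirement applies per node.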
Create the /apps/spark directory on MapR-FS and set the correct permissions on the directory.
```
hadoop fs -mkdir /apps/spark
hadoop fs -chmod 777 /apps/spark
```
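To confirm that the directory exists with the expected permissions, a check along the following lines may help. It is a sketch that assumes `hadoop fs -ls -d` prints the mode string (for example `drwxrwxrwx`) as the first field of its output; `is_mode_777` is a hypothetical helper introduced only for this example.

```shell
# Hypothetical helper: succeeds when an ls-style mode string grants
# read, write, and execute to user, group, and others.
is_mode_777() {
  case "$1" in
    ?rwxrwxrwx*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v hadoop >/dev/null 2>&1; then
  # -d lists the directory entry itself rather than its contents.
  mode="$(hadoop fs -ls -d /apps/spark | awk '$1 ~ /^d/ {print $1; exit}')"
  if is_mode_777 "$mode"; then
    echo "/apps/spark has mode $mode"
  else
    echo "/apps/spark has unexpected mode '$mode'"
  fi
else
  echo "hadoop was not found on the PATH"
fi
```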
Install the packages.
- On Ubuntu:
```
apt-get install mapr-spark mapr-spark-historyserver
```
- On RedHat/CentOS:
```
yum install mapr-spark mapr-spark-historyserver
```
NOTE: The mapr-spark-historyserver package is optional; omit it on nodes that will not run the Spark History Server.
If you want to integrate Spark with MapR Streams, install the Streams Client on each Spark node.
- On Ubuntu:
```
apt-get install mapr-kafka
```
- On RedHat/CentOS:
```
yum install mapr-kafka
```
Run the configure.sh command:
```
/opt/mapr/server/configure.sh -R
```
To test the installation, run the following command as the mapr user:
```
MASTER=yarn-client /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10
```
This command will fail if it is run as the root user.
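Because the test fails under root, a small guard placed before the test command can catch the mistake early. The following is a minimal sketch; `is_non_root_uid` is a hypothetical helper, and the suggestion to use `su - mapr` assumes that the mapr user already exists on the node.

```shell
# Hypothetical helper: succeeds only for a non-zero numeric UID.
is_non_root_uid() {
  [ "$1" -ne 0 ]
}

if is_non_root_uid "$(id -u)"; then
  echo "Running as $(id -un); the SparkPi example can be launched."
else
  echo "Running as root; switch to the mapr user first, for example: su - mapr" >&2
fi
```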