# Installing Spark on YARN

## About this task
Spark is distributed as two separate packages:
| Package | Description |
|---|---|
| `mapr-spark` | Install this package on each node where you want to run Spark. This package depends on the `mapr-client` package. |
| `mapr-spark-historyserver` | Install this optional package on Spark History Server nodes. This package depends on the `mapr-spark` and `mapr-core` packages. |
To install Spark on YARN (Hadoop 2), run the following commands as `root` or using `sudo`:
## Procedure
- Verify that JDK 1.7 or later is installed on each node where you want to install Spark.
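One way to check is to parse the version string that `java -version` reports. A minimal sketch; the hard-coded `1.8.0_181` below is an example value standing in for what your node would actually report:

```shell
# Check that a JDK version string is 1.7 or later.
# "1.8.0_181" is an example; on a real node you would capture it with:
#   java -version 2>&1 | awk -F '"' '/version/ {print $2}'
ver="1.8.0_181"
major=${ver%%.*}          # text before the first dot -> "1"
rest=${ver#*.}
minor=${rest%%.*}         # text between the first and second dots -> "8"
if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }; then
  echo "JDK $ver is new enough for Spark"
else
  echo "JDK $ver is too old; install JDK 1.7 or later" >&2
fi
```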
- Create the `/apps/spark` directory on MapR-FS and set the correct permissions on it:

      hadoop fs -mkdir /apps/spark
      hadoop fs -chmod 777 /apps/spark
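Mode `777` grants read, write, and execute permission to the owner, the group, and all other users, which is what lets any user's Spark jobs write under `/apps/spark`. The same idea demonstrated on a local scratch directory (not MapR-FS; `stat -c` assumes GNU coreutils):

```shell
# Mode 777 = rwx for owner, group, and others.
dir=$(mktemp -d)              # mktemp creates the directory with mode 700
chmod 777 "$dir"              # open it up, as the procedure does for /apps/spark
mode=$(stat -c '%a' "$dir")   # print the octal permission bits
echo "mode of $dir is $mode"
```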
- Install the packages:
  - On Ubuntu:

        apt-get install mapr-spark mapr-spark-historyserver

  - On RedHat/CentOS:

        yum install mapr-spark mapr-spark-historyserver

  NOTE: The `mapr-spark-historyserver` package is optional.
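When scripting the install across mixed clusters, the only difference between the two distributions is the package manager. A hypothetical helper that picks the right one; it prints the command rather than running it, so it is safe to try anywhere:

```shell
# Hypothetical helper: print the install command appropriate for this node.
spark_install_cmd() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt-get install $*"          # Ubuntu / Debian
  elif command -v yum >/dev/null 2>&1; then
    echo "yum install $*"              # RedHat / CentOS
  else
    echo "no supported package manager found" >&2
    return 1
  fi
}
spark_install_cmd mapr-spark mapr-spark-historyserver
```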
- If you want to integrate Spark with MapR Streams, install the Streams Client on each Spark node:
  - On Ubuntu:

        apt-get install mapr-kafka

  - On RedHat/CentOS:

        yum install mapr-kafka
- Run the `configure.sh` command:

      /opt/mapr/server/configure.sh -R
- To test the installation, run the following command as the `mapr` user:

      MASTER=yarn-client /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10

  This command will fail if it is run as the root user.
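Because the example fails under root, a wrapper script can guard against launching it as the wrong user. A hypothetical sketch; `run_unless_root` is not part of the MapR tooling:

```shell
# Hypothetical guard: refuse to launch a command as root, since the
# SparkPi example fails when run as the root user.
run_unless_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo "refusing to run as root; switch to the mapr user" >&2
    return 1
  fi
  "$@"
}
run_unless_root echo "would launch the SparkPi example here"
```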