Integrate Hue with Spark
About this task
NOTE: Spark Notebook is a beta feature that utilizes the Spark REST Job Server (Livy).
Complete the following steps as the root user or by using sudo:

Procedure
- Install the mapr-hue-livy-3.8.1 package on the node where you have installed the mapr-spark package and configured Spark.
  - On Ubuntu:
    apt-get install mapr-hue-livy
  - On RedHat/CentOS:
    yum install mapr-hue-livy
NOTE: If you do not install the mapr-hue-livy package on a node where the mapr-spark package is installed, the Livy service will not start.
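For example, before installing you can confirm that the mapr-spark package is already present on the node; this is a minimal sketch, assuming standard package tooling on each distribution:
  # Verify that mapr-spark is installed before adding mapr-hue-livy.
  # On Ubuntu:
  dpkg -l | grep mapr-spark
  # On RedHat/CentOS:
  rpm -qa | grep mapr-spark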
- For Spark 1.3.1: Copy javax.servlet-api-3.1.0.jar to the Spark lib directory.
  cp /opt/mapr/hue/hue-<version>/apps/spark/java-lib/javax.servlet-api-3.1.0.jar /opt/mapr/spark/spark-<version>/lib/
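For example, with Hue 3.8.1 and Spark 1.3.1 (illustrative versions; substitute the ones installed on your node), the copy and a quick check might look like:
  # Illustrative paths only; adjust hue-<version> and spark-<version> to match your install.
  cp /opt/mapr/hue/hue-3.8.1/apps/spark/java-lib/javax.servlet-api-3.1.0.jar /opt/mapr/spark/spark-1.3.1/lib/
  # Confirm the jar now sits in the Spark lib directory.
  ls /opt/mapr/spark/spark-1.3.1/lib/javax.servlet-api-3.1.0.jar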
- In the spark-env.sh file, configure the SPARK_SUBMIT_CLASSPATH environment variable to include the classpath to the servlet jar before the MAPR_SPARK_CLASSPATH.
  SPARK_SUBMIT_CLASSPATH=$SPARK_SUBMIT_CLASSPATH:/opt/mapr/spark/spark-<version>/lib/javax.servlet-api-3.1.0.jar:$MAPR_SPARK_CLASSPATH
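One way to apply this (a sketch, assuming the default MapR layout under /opt/mapr/spark) is to append the line to spark-env.sh:
  # Append the classpath setting to spark-env.sh; run as root or with sudo.
  # spark-<version> is a placeholder for your installed Spark version.
  # Single quotes keep the variables literal so they expand when spark-env.sh is sourced.
  echo 'SPARK_SUBMIT_CLASSPATH=$SPARK_SUBMIT_CLASSPATH:/opt/mapr/spark/spark-<version>/lib/javax.servlet-api-3.1.0.jar:$MAPR_SPARK_CLASSPATH' >> /opt/mapr/spark/spark-<version>/conf/spark-env.sh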
- In the [spark] section of the hue.ini file, set the livy_server_host parameter to the host where the Livy server is running.
  [spark]
  # IP or hostname of livy server.
  livy_server_host=ubuntu500
NOTE: If the Livy server runs on the same node as the Hue UI, you are not required to set this property because the value defaults to the local host.
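After setting livy_server_host, you can optionally confirm that the Livy server is reachable; a quick sketch, assuming Livy's default REST port of 8998:
  # List active Livy sessions over its REST API (default port 8998 assumed).
  curl http://ubuntu500:8998/sessions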
- If Spark jobs run on YARN, perform the following steps:
  - Restart the Spark REST Job Server (Livy).
    maprcli node services -name livy -action restart -nodes <livy node>
  - Restart Hue.
    maprcli node services -name hue -action restart -nodes <hue node>
  - Restart Spark.
    maprcli node services -name spark-master -action restart -nodes <space-delimited list of nodes>
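Taken together, one restart pass might look like the following sketch; node-a, node-b, and node-c are hypothetical placeholders for nodes in your cluster:
  # Restart Livy, Hue, and the Spark master (hypothetical node names).
  maprcli node services -name livy -action restart -nodes node-a
  maprcli node services -name hue -action restart -nodes node-a
  maprcli node services -name spark-master -action restart -nodes node-b node-c
  # Optionally confirm the services are running again on a node.
  maprcli service list -node node-a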
Results
- NOTE: To access the Notebook UI, select Spark from the Query Editor in the Hue interface.
- If needed, you can use the MCS or maprcli to start, stop, or restart the Livy Server. For more information, see Starting, Stopping, and Restarting Services.
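Following the maprcli pattern used in the procedure above, stopping and starting Livy might look like this sketch:
  # Stop and then start the Livy service (placeholder node name, as in the steps above).
  maprcli node services -name livy -action stop -nodes <livy node>
  maprcli node services -name livy -action start -nodes <livy node>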
NOTE: Troubleshooting Tip
If you have more than one version of Python installed, you may see the following error when executing Python samples:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe...
Workaround: Set the following environment variables in /opt/mapr/spark/spark-<version>/conf/spark-env.sh:
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7
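To apply the workaround end to end, one sketch (reusing the restart command from the procedure; spark-<version> and the Livy node remain placeholders) is:
  # Point both the PySpark workers and the driver at the same interpreter.
  echo 'export PYSPARK_PYTHON=/usr/bin/python2.7' >> /opt/mapr/spark/spark-<version>/conf/spark-env.sh
  echo 'export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7' >> /opt/mapr/spark/spark-<version>/conf/spark-env.sh
  # Restart Livy so new notebook sessions pick up the change.
  maprcli node services -name livy -action restart -nodes <livy node>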