Configure Spark with the NodeManager Local Directory Set to MapR-FS

About this task

This procedure configures Spark to use a MapR-FS directory mounted through NFS, instead of the /tmp directory on the local file system, as the NodeManager local directory.

Procedure

  1. Install the mapr-nfs and nfs-utils packages if they are not already installed. For reference, see Installing MapR Software.
  2. Start the NFS service by using the MapR Control System (MCS), maprcli, or the NFS gateway init script, and then check its status:
    /opt/mapr/initscripts/mapr-nfsserver start
    /opt/mapr/initscripts/mapr-nfsserver status
  3. Mount NFS to MapR-FS on a cluster node by following the steps in Mounting NFS to MapR-FS on a Cluster Node.
  4. To configure Spark Shuffle on NFS, complete these steps on all nodes:
    1. Create a local volume for Spark Shuffle:
      sudo -u mapr maprcli volume create -name mapr.$(hostname -f).local.spark -path /var/mapr/local/$(hostname -f)/spark -replication 1 -localvolumehost $(hostname -f)
    2. Point the NodeManager local directory to the Spark Shuffle volume mounted through NFS by setting the following property in the yarn-site.xml file on the NodeManager nodes (replace my.cluster.com with the name of your cluster):
      <property>
          <name>yarn.nodemanager.local-dirs</name>
          <value>/mapr/my.cluster.com/var/mapr/local/${mapr.host}/spark</value>
      </property>
      
    3. Restart the NodeManager service on each node (and the ResourceManager service on the main node) so that they pick up the yarn-site.xml changes:
      maprcli node services -name nodemanager -action restart -nodes <node 1> <node 2> <node 3>
      maprcli node services -name resourcemanager -action restart
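
Before restarting the services, it can help to confirm on each node that the directory named in yarn.nodemanager.local-dirs is reachable through the NFS mount. The following is a minimal sketch, not part of the MapR tooling: the function names are hypothetical, and it assumes that ${mapr.host} resolves to the same name that hostname -f prints and that MapR-FS is mounted under /mapr.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; none of these names come from MapR.

# Volume name created in step 4.1 for the given host ($1).
spark_volume_name() {
  printf 'mapr.%s.local.spark\n' "$1"
}

# MapR-FS path of that volume for the given host ($1).
spark_volume_path() {
  printf '/var/mapr/local/%s/spark\n' "$1"
}

# Directory that yarn.nodemanager.local-dirs points to through the NFS mount.
# $1 = cluster name (for example my.cluster.com), $2 = host name.
# Assumes MapR-FS is mounted under /mapr.
spark_nfs_dir() {
  printf '/mapr/%s%s\n' "$1" "$(spark_volume_path "$2")"
}

# Succeeds only if the directory ($1) exists and is writable by the current user.
check_local_dir() {
  [ -d "$1" ] && [ -w "$1" ]
}
```

For example, on a node you might run check_local_dir "$(spark_nfs_dir my.cluster.com "$(hostname -f)")" as the user that runs the NodeManager. A non-zero exit status suggests that the volume was not created, that NFS is not mounted, or that the user lacks write permission.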