Deployment Modes

Spark is preconfigured for YARN and requires no additional configuration to run on YARN.

Two deployment modes can be used to launch Spark applications on YARN:

  • In yarn-cluster mode, jobs are managed by the YARN cluster. The Spark driver runs inside an Application Master (AM) process that is managed by YARN, so the client can exit after it submits the application.
  • In yarn-client mode, the Spark driver runs in the client process, and the Application Master is used only to request resources from YARN.

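MapR recommends using yarn-cluster mode instead of yarn-client mode. Because the driver runs on the cluster in yarn-cluster mode, the job still runs to completion if the Spark client that submitted it exits. In yarn-client mode, the driver runs in the client process, so the job depends on that process staying alive.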
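
As an illustration of how the two modes are selected at submission time, the following Python sketch assembles a spark-submit command line without running it. The helper name build_spark_submit, the jar path, and the choice of the SparkPi example class are assumptions made for this example, not part of the product. The sketch passes the mode names used on this page as the --master value. Newer Spark releases express the same choice as --master yarn together with --deploy-mode cluster or --deploy-mode client.

```python
import shlex  # shlex.join requires Python 3.8 or later

# Mode names as used on this page.
YARN_MODES = ("yarn-cluster", "yarn-client")


def build_spark_submit(master, main_class, app_jar, app_args=()):
    """Return the argument list for a spark-submit call on YARN.

    Hypothetical helper for illustration: it only builds the command,
    it does not execute it.
    """
    if master not in YARN_MODES:
        raise ValueError("expected one of %r, got %r" % (YARN_MODES, master))
    return ["spark-submit", "--master", master,
            "--class", main_class, app_jar, *app_args]


# The jar path below is a placeholder; substitute the examples jar
# shipped with your Spark installation.
cmd = build_spark_submit(
    "yarn-cluster",
    "org.apache.spark.examples.SparkPi",
    "/path/to/spark-examples.jar",
    ["10"],
)
print(shlex.join(cmd))
```

Changing the first argument to "yarn-client" is the only difference needed to keep the driver in the client process instead.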

Note: In yarn-cluster mode, the Spark executors and the Spark driver use the local directories that are configured for YARN (yarn.nodemanager.local-dirs). If you specify a different path with SPARK_LOCAL_DIRS (as you would for Spark running in standalone mode), Spark ignores that path.
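
The rule in the note can be modeled with a small Python sketch. The function effective_local_dirs and the sample paths are hypothetical and exist only to illustrate the precedence described above. The sketch covers only the two cases this page mentions (yarn-cluster and standalone) and assumes both settings are given as comma-separated directory lists.

```python
def effective_local_dirs(mode, yarn_local_dirs, spark_local_dirs=None):
    """Return the local directories Spark would use, per the note above.

    Hypothetical helper: both directory arguments are comma-separated
    strings, as they would appear in configuration.
    """
    def split(value):
        return [d.strip() for d in value.split(",") if d.strip()]

    if mode == "yarn-cluster":
        # SPARK_LOCAL_DIRS is ignored; the YARN setting always wins.
        return split(yarn_local_dirs)
    if mode == "standalone":
        # Standalone mode honors SPARK_LOCAL_DIRS. If it is unset,
        # Spark's own default applies, which this sketch does not model.
        return split(spark_local_dirs or "")
    raise ValueError("mode not covered by this sketch: %r" % mode)


# Sample paths are placeholders.
dirs = effective_local_dirs(
    "yarn-cluster",
    yarn_local_dirs="/data1/yarn/local,/data2/yarn/local",
    spark_local_dirs="/tmp/spark",
)
print(dirs)
```

In the yarn-cluster call above, the value passed as spark_local_dirs has no effect on the result, which mirrors the behavior the note describes.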