Deployment Modes
Spark is preconfigured for YARN and does not require any additional configuration to run.
Two deployment modes can be used to launch Spark applications on YARN:
- In yarn-cluster mode, jobs are managed by the YARN cluster. The Spark driver runs inside an Application Master (AM) process that is managed by YARN. This means that the client can go away after initiating the application.
- In yarn-client mode, the Spark driver runs in the client process, and the Application Master is only used to request resources from YARN.
MapR recommends using yarn-cluster mode instead of yarn-client mode. In yarn-cluster mode, if the Spark client that submitted the job exits, the job still runs to completion.
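As a rough illustration of how the two modes are selected at submission time, the following Python sketch builds a spark-submit argument list using the master strings named above. The helper name build_spark_submit, the application class, and the jar name are hypothetical and used only for illustration. Newer Spark releases express the same choice as --master yarn with --deploy-mode cluster or --deploy-mode client, so check which form your Spark version expects.

```python
from typing import List, Sequence

# Master strings as named in this section. Newer Spark releases use
# "--master yarn --deploy-mode cluster|client" instead.
YARN_MODES = ("yarn-cluster", "yarn-client")


def build_spark_submit(mode: str, main_class: str, app_jar: str,
                       app_args: Sequence[str] = ()) -> List[str]:
    """Return a spark-submit argument list for the given YARN mode.

    This helper is a hypothetical convenience wrapper, not part of Spark.
    """
    if mode not in YARN_MODES:
        raise ValueError(f"unknown YARN mode: {mode!r}")
    return ["spark-submit", "--master", mode, "--class", main_class,
            app_jar, *app_args]


if __name__ == "__main__":
    # Hypothetical application class and jar, for illustration only.
    cmd = build_spark_submit("yarn-cluster", "com.example.MyApp",
                             "my-app.jar", ["input.txt"])
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run on a node where spark-submit is on the PATH. In yarn-cluster mode, the submitting process can then exit without affecting the job.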
Note: In yarn-cluster mode, the local directories used by the Spark executors and the Spark driver are the local directories that are configured for YARN (yarn.nodemanager.local-dirs). If you specify a different path with SPARK_LOCAL_DIRS (as you would for Spark running in standalone mode), that path will be ignored.
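To see which directories apply in yarn-cluster mode, one option is to read yarn.nodemanager.local-dirs from the YARN configuration. The sketch below parses Hadoop-style configuration XML with the Python standard library. The function name yarn_local_dirs and the sample paths are hypothetical, and the location of yarn-site.xml on your cluster is not assumed here, so the example parses an inline string instead.

```python
import xml.etree.ElementTree as ET
from typing import List

PROPERTY_NAME = "yarn.nodemanager.local-dirs"


def yarn_local_dirs(yarn_site_xml: str) -> List[str]:
    """Return the directories listed under yarn.nodemanager.local-dirs.

    Expects the text of a Hadoop-style configuration file:
    <configuration><property><name>..</name><value>..</value></property>...
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            value = prop.findtext("value", "")
            # The value is a comma-separated list of directories.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []  # property is not set in this file


# Hypothetical sample configuration, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for directory in yarn_local_dirs(SAMPLE):
        print(directory)
```

To check a real cluster, read the contents of its yarn-site.xml and pass the text to yarn_local_dirs. An empty result means the property is not set in that file.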