Configuring a Multitenant Cluster
Drill operations are memory- and CPU-intensive. Currently, Drill resources are managed outside of any cluster management service, such as the MapR warden service. In a multitenant or any other type of cluster, YARN-enabled or not, you configure memory and memory usage limits for Drill by modifying drill-env.sh, as described in the section "Configuring Drill Memory" in the Apache Drill documentation.
The drill-env.sh file allocates resources for Drill to use during query execution, while configuring the following properties in warden.drill-bits.conf prevents warden from committing the resources to other processes:
service.heapsize.min=<some value in MB>
service.heapsize.max=<some value in MB>
service.heapsize.percent=<a whole number>
Set the service.heapsize properties in warden.drill-bits.conf regardless of whether you changed the defaults in drill-env.sh.
The section "Configuring Drill in a YARN-enabled MapR Cluster" shows an example of setting the service.heapsize properties. The service.heapsize.percent property is the percentage of memory for the service, bounded by the minimum and maximum values. Typically, users change service.heapsize.percent because a percentage setting increases or decreases resources according to each node's configuration. For more information about the service.heapsize properties, see the section "warden.<servicename>.conf".
The following warden configuration files are located in /opt/mapr/conf/conf.d:
warden.drill-bits.conf
warden.nodemanager.conf
warden.resourcemanager.conf
Configure Drill memory by modifying warden.drill-bits.conf in YARN and non-YARN clusters. Configure other resources by modifying warden.nodemanager.conf and warden.resourcemanager.conf in a YARN-enabled cluster.
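Because the three service.heapsize properties should be set in warden.drill-bits.conf in every case, a quick check for missing entries can be useful. The following Python sketch is illustrative only: the helper names parse_properties and missing_heapsize_keys are hypothetical, the sample values are made up, and it assumes the file uses plain key=value lines with # comments.

```python
REQUIRED_KEYS = (
    "service.heapsize.min",
    "service.heapsize.max",
    "service.heapsize.percent",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_heapsize_keys(text):
    """Return the required service.heapsize keys that are absent or empty."""
    props = parse_properties(text)
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Sample content with made-up values; a real file holds other settings too.
sample = """
service.heapsize.min=4096
service.heapsize.percent=10
"""
print(missing_heapsize_keys(sample))  # prints ['service.heapsize.max']
```

To check a real node, read the contents of /opt/mapr/conf/conf.d/warden.drill-bits.conf and pass the text to missing_heapsize_keys.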
Configuring Drill in a YARN-enabled MapR Cluster
To add Drill to a YARN-enabled cluster, change memory resources to suit your application. For example, you have 120G of available memory that you allocate to the following workloads in a YARN-enabled cluster:
- File system = 20G
- HBase = 20G
- YARN = 20G
- OS = 8G
If YARN does most of the work, give Drill 20G, for example, and give YARN 60G. If you expect a heavy query load, give Drill 60G and YARN 20G.
YARN consists of two main services:
- ResourceManager: There is at least one instance in a cluster, more if you configure high availability.
- NodeManager: There is one instance per node.
The warden.resourcemanager.conf and warden.nodemanager.conf files set ResourceManager and NodeManager memory to the following defaults:
service.heapsize.min=64
service.heapsize.max=325
service.heapsize.percent=2
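How these three defaults interact can be sketched in Python. This is a rough illustration, not warden's actual implementation: the function name effective_heap_mb is hypothetical, and it assumes the percentage is taken of the node's total memory and then bounded by the minimum and maximum values.

```python
def effective_heap_mb(node_memory_mb, percent, heap_min_mb, heap_max_mb):
    """Return a heap size in MB: percent of node memory, clamped to [min, max].

    Hypothetical helper. Assumes the percentage applies to the node's total
    memory; warden's real calculation may differ.
    """
    if heap_min_mb > heap_max_mb:
        raise ValueError("heap_min_mb must not exceed heap_max_mb")
    requested = node_memory_mb * percent // 100  # whole megabytes
    return max(heap_min_mb, min(requested, heap_max_mb))


# Defaults from the text: min=64, max=325, percent=2.
# The node sizes below are illustrative.
for node_mb in (2048, 8192, 65536):
    print(node_mb, effective_heap_mb(node_mb, percent=2, heap_min_mb=64, heap_max_mb=325))
```

Under these assumptions, a small node is held at the minimum, a mid-sized node receives its percentage, and a large node is capped at the maximum.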
In /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/yarn-env.sh, you do not set the -Xmx option, allowing memory to grow as needed.
MapReduce v1 Resources
The following settings in /opt/mapr/conf/warden.conf control MapReduce v1 resources:
mr1.memory.percent=50
mr1.cpu.percent=50
mr1.disk.percent=50
Modify these settings to reconfigure MapReduce v1 resources to suit your application needs. Remaining memory is given to YARN applications.
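The effect of mr1.memory.percent on what is left for YARN can be sketched with simple arithmetic in Python. The helper split_mr1_yarn is hypothetical, and it assumes available_mb is the memory left for MapReduce work after other services are accounted for; the figures are illustrative, not measured.

```python
def split_mr1_yarn(available_mb, mr1_memory_percent):
    """Split memory between MapReduce v1 and YARN by percentage.

    Hypothetical helper. Returns (mr1_mb, yarn_mb), where YARN receives
    whatever MapReduce v1 does not claim.
    """
    if not 0 <= mr1_memory_percent <= 100:
        raise ValueError("mr1_memory_percent must be between 0 and 100")
    mr1_mb = available_mb * mr1_memory_percent // 100
    return mr1_mb, available_mb - mr1_mb


print(split_mr1_yarn(40960, 50))  # prints (20480, 20480)
print(split_mr1_yarn(40960, 25))  # prints (10240, 30720)
```

Lowering mr1.memory.percent in this sketch shifts the difference to YARN applications.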
MapReduce v2 and Other Resources
Set the following properties in warden.conf to configure memory for a service:
service.command.<servicename>.heapsize.percent
service.command.<servicename>.heapsize.max
service.command.<servicename>.heapsize.min
Configure memory for other services in the same manner.
How to Manage Drill CPU Resources
Currently, you do not manage CPU resources within Drill. Use Linux cgroups to manage the CPU resources.