Summary of Basic Guidelines For Nodes Running MapReduce v1 Jobs
Here is a summary of the basic guidelines:
-
Operating system memory: about 2 GB. Give the
operating system enough memory that it does not have to swap. There
are three parameters in the
warden.conf
file that specify a minimum and maximum heap size, and a target percentage of total RAM to use for the OS, as well as upper and lower bounds on the resulting value (in MB):-
service.command.os.heapsize.percent=10
-
service.command.os.heapsize.max=4000
-
service.command.os.heapsize.min=256
-
The maximum memory is the lower of the percent and the maximum.
-
MapR-FS memory: By default, 20% (Community
Edition and Enterprise Edition) or 35% (Enterprise Database
Edition) of total memory. To give more memory to MapR-FS, increase
the following parameters in
warden.conf
:-
service.command.mfs.heapsize.percent=35
-
service.command.mfs.heapsize.maxpercent=85
-
service.command.mfs.heapsize.min=512
-
The maximum memory is the lower of the percent and the maximum.
-
Memory per reducer: 3.5 GB (add -Xmx3500m to
mapred.reduce.child.java.opts
inmapred-site.xml
) -
Memory per mapper: 800 MB (add -Xmx800m to
mapred.map.child.java.opts
inmapred-site.xml
). In general, mappers should get memory equal to the value of theio.sort.mb
parameter plus a certain amount of overhead. Twice the value of theio.sort.mb
parameter is a generous starting point, and you can adjust up or down as necessary. -
Chunk size: 256 MB (default) for most
clusters; 128 MB on very small clusters or if application-level
compression is present.
- To change the default chunk size for the cluster, use the
maprcli config save
command. Example:maprcli config save -values ‘{“cldb.default.chunk.sizemb”:”128”}’
Note: Changes to the default chunk size only impact new chunks. - To change the chunk size on a specific directory, use the
hadoop mfs
command. Example:hadoop mfs -setchunksize 128 /mapr/myvolume/directory_a
- To change the default chunk size for the cluster, use the
Remember to change the value of the
io.sort.mb
parameter when you change the chunk
size.
-
io.sort.mb: 380 MB (default) for most
clusters; 190 MB for very small clusters or if application-level is
present. Change the value in the
mapred-site.xml
file. Default is 380 MB; in general,io.sort.mb
should be approximately 150% of the chunk size. -
Number of reducers: No more than half to two
thirds as many reducers as there are disks: set the value of the
mapred.tasktracker.reduce.tasks.maximum
parameter inmapred-site.xml
(the default is -1, which means calculate automatically). -
Number of mappers: twice as many mappers as
disks, up to (cores-2) x 2 for Community Edition and Enterprise
Edition, or (cores-4) x 2 for Enterprise Database Edition: set the
value of the
mapred.tasktracker.map.tasks.maximum
parameter inmapred-site.xml
(the default is -1, which means calculate automatically). -
Set the following additional parameters
in mapred-site.xml:
-
mapred.job.reuse.jvm.num.tasks
: -1 (Default: -1, which means calculate automatically) -
mapreduce.tasktracker.prefetch.maptasks
: 0.5 (Default: 0.0) -
mapred.reduce.parallel.copies
: 12 (Default: 12) -
mapred.reduce.slowstart.completed.maps
: 0.95 (Default: 0.95)
-
- Use sampling to examine the data and create a list of partitioning keys