Summary of Basic Guidelines For Nodes Running MapReduce v1 Jobs

Here is a summary of the basic guidelines:

  • Operating system memory: about 2 GB. Give the operating system enough memory that it does not have to swap. There are three parameters in the warden.conf file that specify a minimum and maximum heap size, and a target percentage of total RAM to use for the OS, as well as upper and lower bounds on the resulting value (in MB):
    • service.command.os.heapsize.percent=10
    • service.command.os.heapsize.max=4000
    • service.command.os.heapsize.min=256

The maximum memory is the lower of the percent and the maximum.

  • MapR-FS memory: By default, 20% (Community Edition and Enterprise Edition) or 35% (Enterprise Database Edition) of total memory. To give more memory to MapR-FS, increase the following parameters in warden.conf:
    • service.command.mfs.heapsize.percent=35
    • service.command.mfs.heapsize.maxpercent=85
    • service.command.mfs.heapsize.min=512

The maximum memory is the lower of the percent and the maximum.

  • Memory per reducer: 3.5 GB (add -Xmx3500m to mapred.reduce.child.java.opts in mapred-site.xml)
  • Memory per mapper: 800 MB (add -Xmx800m to mapred.map.child.java.opts in mapred-site.xml). In general, mappers should get memory equal to the value of the io.sort.mb parameter plus a certain amount of overhead. Twice the value of the io.sort.mb parameter is a generous starting point, and you can adjust up or down as necessary.
  • Chunk size: 256 MB (default) for most clusters; 128 MB on very small clusters or if application-level compression is present.
    • To change the default chunk size for the cluster, use the maprcli config save command. Example: maprcli config save -values ‘{“cldb.default.chunk.sizemb”:”128”}’ Note: Changes to the default chunk size only impact new chunks.
    • To change the chunk size on a specific directory, use the hadoop mfs command. Example: hadoop mfs -setchunksize 128 /mapr/myvolume/directory_a

Remember to change the value of the io.sort.mb parameter when you change the chunk size.

  • io.sort.mb: 380 MB (default) for most clusters; 190 MB for very small clusters or if application-level is present. Change the value in the mapred-site.xml file. Default is 380 MB; in general, io.sort.mb should be approximately 150% of the chunk size.
  • Number of reducers: No more than half to two thirds as many reducers as there are disks: set the value of the mapred.tasktracker.reduce.tasks.maximum parameter in mapred-site.xml (the default is -1, which means calculate automatically).
  • Number of mappers: twice as many mappers as disks, up to (cores-2) x 2 for Community Edition and Enterprise Edition, or (cores-4) x 2 for Enterprise Database Edition: set the value of the mapred.tasktracker.map.tasks.maximum parameter in mapred-site.xml (the default is -1, which means calculate automatically).
  • Set the following additional parameters in mapred-site.xml:
    • mapred.job.reuse.jvm.num.tasks: -1 (Default: -1, which means calculate automatically)
    • mapreduce.tasktracker.prefetch.maptasks: 0.5 (Default: 0.0)
    • mapred.reduce.parallel.copies: 12 (Default: 12)
    • mapred.reduce.slowstart.completed.maps: 0.95 (Default: 0.95)
  • Use sampling to examine the data and create a list of partitioning keys