Example 5

On this example 1000-node cluster, each node has the following characteristics:

  • 4 cores
  • 32 GB memory
  • 2 drives

With only 4 cores per node, this hardware cannot run many tasks in parallel, but it has plenty of memory. Because the nodes are memory-heavy and core-light, give as much memory as possible to MapR-FS; the map output is then likely to fit in memory, which reduces disk I/O.

  • 4 map slots
  • 1 reduce slot
  • Chunk size: 256 MB
  • Remaining 16 GB of memory given to MapR-FS
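  • Set mapred.reduce.slowstart.completed.maps to 0, so that the reducer is scheduled at job start and copies map output while the maps are still running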
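
The slot and slow-start settings above could be expressed roughly as follows. This is a minimal sketch, not a verified configuration: it assumes the settings live in each node's mapred-site.xml and that the slot counts are controlled by the standard Hadoop 1.x TaskTracker properties, which this example does not name. The chunk size and the MapR-FS memory allocation are configured elsewhere and are not shown.

```xml
<!-- Sketch of a mapred-site.xml fragment (file location is an assumption). -->
<configuration>
  <!-- 4 map slots per node, one per core.
       Property name assumed from stock Hadoop 1.x. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>

  <!-- 1 reduce slot per node.
       Property name assumed from stock Hadoop 1.x. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>

  <!-- Fraction of maps that must complete before reducers are scheduled.
       0 lets reducers start as soon as the job starts. -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0</value>
  </property>
</configuration>
```

The slow-start property is a per-job setting, so it could also be supplied at job submission instead of cluster-wide if only some jobs benefit from early reducers.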