Use Compression

Network

To use network bandwidth more efficiently, compress data sent over the wire. If you use application-level compression, turn off MapR-FS compression, and reduce the chunk size to 128 MB and io.sort.mb to 190 MB.
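As a minimal sketch, the io.sort.mb value mentioned above could be set with a property entry like the one below. It assumes the setting lives in the cluster's mapred-site.xml, which is the usual place for job-level Hadoop properties but is not stated on this page. The chunk size and MapR-FS compression are file system settings and are not shown here.

```xml
<!-- Assumed location: mapred-site.xml. Value is in megabytes. -->
<property>
  <name>io.sort.mb</name>
  <value>190</value>
</property>
```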

Disk I/O

Disk reads can be a significant load, because a MapReduce job performs many more reads than writes. To improve disk I/O, use MapR-FS compression on input and output volumes as well as on the volumes used for intermediate files. Use Hadoop sequence files for input and output: they support compression and avoid the overhead of converting to and from Java types.

Configuring Compression

To turn off MapR-FS compression for map outputs, set mapreduce.maprfs.use.compression=false. To use LZO or another compression codec for map outputs instead, set mapreduce.maprfs.use.compression=false and mapred.compress.map.output=true. For more details on selecting a compression algorithm, see Compression.
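The two settings above might look like this when written as configuration properties. This is a sketch that assumes the properties are placed in mapred-site.xml. The third property, which selects the codec, is not named on this page: mapred.map.output.compression.codec is the classic Hadoop property for choosing the map output codec, and com.hadoop.compression.lzo.LzoCodec is the class provided by the separately installed hadoop-lzo library. Treat both as assumptions and substitute the codec available on your cluster.

```xml
<!-- Turn off MapR-FS compression for map outputs. -->
<property>
  <name>mapreduce.maprfs.use.compression</name>
  <value>false</value>
</property>

<!-- Turn on application-level compression of map outputs. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- Assumption: codec selection; requires hadoop-lzo to be installed. -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

The same properties can also be supplied for a single job, for example as -D name=value options on the job command line, if cluster-wide defaults should remain unchanged.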