Use Compression
Network
To use network bandwidth more efficiently, use compression over the wire. If you use application-level compression, turn off MapR-FS compression and reduce the chunk size to 128 MB and io.sort.mb to 190 MB.
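As a rough illustration, the io.sort.mb value above could be expressed as a standard Hadoop property entry. This sketch assumes the property is set in mapred-site.xml, which this page does not itself specify; the value is in megabytes. The chunk size and MapR-FS compression settings are not covered by this fragment.

```xml
<!-- Sketch only: assumes the setting lives in mapred-site.xml -->
<configuration>
  <property>
    <!-- Sort buffer size in MB, as suggested when using application-level compression -->
    <name>io.sort.mb</name>
    <value>190</value>
  </property>
</configuration>
```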
Disk I/O
Disk reads can be a significant load, because a MapReduce job performs many more reads than writes. To improve disk I/O, use MapR-FS compression on input and output volumes as well as on the volumes used for intermediate files. In addition to enabling compression, use Hadoop sequence files for input and output to avoid the overhead of converting to and from Java types.
Configuring Compression
To turn off MapR-FS compression for map outputs, set mapreduce.maprfs.use.compression=false.
To turn on LZO or any other compression, set mapreduce.maprfs.use.compression=false and mapred.compress.map.output=true.
For more details on selecting a compression algorithm, see Compression.
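The two settings for application-level map output compression might look like the following when written as Hadoop property entries. This is a minimal sketch that assumes the properties go in mapred-site.xml (they could equally be supplied per job); the property names and values are the ones given above, and the choice of codec is not shown.

```xml
<!-- Sketch only: assumes the settings live in mapred-site.xml -->
<configuration>
  <property>
    <!-- Turn off MapR-FS compression for map outputs -->
    <name>mapreduce.maprfs.use.compression</name>
    <value>false</value>
  </property>
  <property>
    <!-- Turn on application-level compression of map outputs (LZO or another codec) -->
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
</configuration>
```

To keep MapR-FS compression instead, omit both entries or leave them at their defaults.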