Create a Map Task Pipeline to Prefetch Tasks
When a task is completed, the TaskTracker informs the JobTracker
that a slot is available. MapR allows the JobTracker to
over-schedule tasks on TaskTracker nodes in advance of the
availability of slots, creating a pipeline. To avoid wasting time,
you can prefetch a certain percentage of tasks in anticipation of
the end of tasks in progress. It is important to set this value
correctly: if it is too low, time is wasted waiting for
communication via heartbeats; if it is too high, parallelism
suffers because tasks arrive too soon and must wait to be
processed. This optimization allows the TaskTracker to launch each
map task as soon as the previous running map task finishes. The number
of tasks to over-schedule should be about 25-50% of the total number
of map slots. You can adjust this number with the
mapreduce.tasktracker.prefetch.maptasks parameter in the
mapred-site.xml file.
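
As an illustration, the setting might appear in mapred-site.xml as in the
sketch below. The parameter name comes from the text above; the value format
is an assumption. The sketch assumes the value is a fraction of the node's
map slots, so 0.25 would mean 25%. Check the documentation for your MapR
version before relying on it.

```xml
<!-- mapred-site.xml fragment (illustrative sketch, not a verified default) -->
<property>
  <name>mapreduce.tasktracker.prefetch.maptasks</name>
  <!-- Assumption: the value is a fraction of map slots; 0.25 = 25% -->
  <value>0.25</value>
</property>
```

For a sense of scale, on a TaskTracker with 8 map slots the recommended
25-50% range corresponds to 2 to 4 over-scheduled map tasks.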