Create a Map Task Pipeline to Prefetch Tasks

When a task completes, the TaskTracker informs the JobTracker that a slot is available. MapR allows the JobTracker to over-schedule tasks on TaskTracker nodes before slots become free, creating a pipeline. Because the prefetched tasks are already on the node, the TaskTracker can launch each map task as soon as the previous running map task finishes, instead of waiting for the next heartbeat exchange with the JobTracker.

It is important to set the prefetch amount correctly. If it is too low, time is wasted waiting for communication via heartbeats; if it is too high, parallelism suffers because tasks arrive too soon and must wait to be processed. The number of tasks to over-schedule should be about 25-50% of the total number of map slots. You can adjust this number with the parameter mapreduce.tasktracker.prefetch.maptasks in mapred-site.xml.
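The 25-50% guideline can be turned into a concrete range with a little arithmetic. The Python sketch below is illustrative only: the helper names `prefetch_range` and `prefetch_property` are hypothetical, the rounding choices (round the lower bound up, the upper bound down, never below one task) are assumptions, and the text above does not say whether mapreduce.tasktracker.prefetch.maptasks expects a task count or a fraction of map slots. Check the mapred-default.xml shipped with your MapR release before choosing the value you put in mapred-site.xml.

```python
import math
from xml.sax.saxutils import escape


# Hypothetical helper: turns the guideline "over-schedule about 25-50% of
# the map slots" into a whole-number range of tasks to prefetch.
def prefetch_range(map_slots: int) -> tuple[int, int]:
    if map_slots < 1:
        raise ValueError("map_slots must be at least 1")
    low = max(1, math.ceil(map_slots * 0.25))      # round up, at least one task
    high = max(low, math.floor(map_slots * 0.50))  # round down, never below low
    return low, high


# Hypothetical helper: renders a <property> element for mapred-site.xml.
# The value is passed through as given; whether your release expects a task
# count or a fraction of map slots is an assumption you must verify.
def prefetch_property(value: str) -> str:
    return (
        "<property>\n"
        "  <name>mapreduce.tasktracker.prefetch.maptasks</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    slots = 8  # hypothetical node with 8 map slots
    low, high = prefetch_range(slots)
    print(f"{slots} map slots -> prefetch {low} to {high} tasks")
    print(prefetch_property(str(high)))
```

For a node with 8 map slots, the sketch suggests over-scheduling 2 to 4 tasks. Starting near the low end and raising the value only if TaskTrackers are observed idling between tasks keeps the risk of reduced parallelism small.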