Drill-on-YARN Limitations

Drill-on-YARN has the following limitations:
Hanging requests
Drill-on-YARN and YARN “hang” if YARN cannot fulfill a container request. YARN provides no information about why a request “hangs.”
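Because YARN gives no reason for an unfulfilled request, a launch script can at least bound how long it waits. The following is a minimal sketch of that idea, not part of Drill-on-YARN. The function name wait_for_container and the is_allocated callback are hypothetical. How you detect that the container was allocated depends on your environment and is left to the caller.

```python
import time
from typing import Callable


def wait_for_container(
    is_allocated: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the container is allocated or the timeout expires.

    is_allocated is a hypothetical, caller-supplied check. Returns True
    if it reported success before the deadline, otherwise False.
    """
    deadline = clock() + timeout_s
    while True:
        if is_allocated():
            return True
        if clock() >= deadline:
            # YARN reports no reason, so all we can do is give up and alert.
            return False
        sleep(poll_interval_s)
```

A caller might treat a False result as a signal to review queue capacity and the requested container size rather than waiting indefinitely.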
/tmp directory
With the default YARN settings, Drillbits become unmanaged after a short time because of a /tmp directory issue. See the Exclude the YARN Container Directory from tmpwatch section in Step 3: Configure YARN to Run Drill for information on how to resolve the issue.
Container size
With the default YARN settings, a default Drill cluster cannot run because the default maximum YARN container size is too small. See the Increase Maximum Container Size section in Step 3: Configure YARN to Run Drill for information on how to resolve the issue.
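The size mismatch comes down to simple arithmetic, sketched below. The 8192 MB limit is the stock Apache Hadoop default for yarn.scheduler.maximum-allocation-mb, and your distribution may ship a different value. The Drillbit memory figures are illustrative assumptions, not values taken from this document, and the function names are hypothetical.

```python
# Stock Apache Hadoop default for yarn.scheduler.maximum-allocation-mb.
# Assumption: your cluster has not overridden it.
YARN_DEFAULT_MAX_ALLOCATION_MB = 8192


def drillbit_container_mb(heap_mb: int, direct_mb: int, code_cache_mb: int) -> int:
    """Total memory a Drillbit container would need, in MB."""
    return heap_mb + direct_mb + code_cache_mb


def fits_in_yarn(requested_mb: int,
                 max_allocation_mb: int = YARN_DEFAULT_MAX_ALLOCATION_MB) -> bool:
    """True if YARN could grant a container of the requested size."""
    return requested_mb <= max_allocation_mb


# Illustrative sizing only: 4 GB heap, 8 GB direct memory, 1 GB code cache.
requested = drillbit_container_mb(4096, 8192, 1024)
print(requested, fits_in_yarn(requested))          # 13312 False
print(fits_in_yarn(requested, max_allocation_mb=16384))  # True
```

With these assumed figures the request exceeds the stock limit, which is why the maximum container size has to be raised before the cluster can start.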
Drill disk usage
You can declare Drill disk usage to YARN, but Drill uses all disks regardless of the setting. There is no effective way to manage a Drill cluster that:
  • resizes based on load.
  • is rack-aware in its smaller state.
YARN chooses arbitrary nodes, which can result in large network reads. (MD-1028, MD-1089)
Node Labels
Although the Apache YARN documentation states that you can associate node labels with YARN container requests, users have reported that the feature does not work in practice. The Drill-on-YARN configuration has settings to associate Drillbit container requests with node labels, but using them is not supported. To use node labels, associate them with YARN queues instead, as described in the YARN configuration step in the Migrate Drill to Run Under YARN documentation.