Handling Disk Failures

When a disk fails, MapR raises the node-level alarm NODE_ALARM_DISK_FAILURE on the node with the failed disk (or disks). At the same time, the other disks in the same storage pool as the failed disk are taken offline. To see the alarm, open the MapR Control System (MCS) and click Cluster > Dashboard. The dashboard shows a heatmap of the cluster's nodes and a list of alarms.
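If you would rather check for this alarm from a script than from the MCS, one option is to parse alarm data exported as JSON. The sketch below is a minimal example under stated assumptions: it assumes the JSON comes from `maprcli alarm list -json` and that it is an object with a `data` list whose entries carry `entity` (the node) and `alarm name` keys. Those key names are assumptions, so compare them against the output on your own cluster before relying on this. The function name `nodes_with_disk_failure` is hypothetical.

```python
import json
from typing import List

# Alarm name as given in the text above.
DISK_FAILURE_ALARM = "NODE_ALARM_DISK_FAILURE"


def nodes_with_disk_failure(alarm_json: str) -> List[str]:
    """Return the sorted, de-duplicated nodes that carry a disk-failure alarm.

    ASSUMPTION: alarm_json is an object with a "data" list whose entries have
    "entity" (the node) and "alarm name" keys, as `maprcli alarm list -json`
    is assumed to print. Verify these keys on your own cluster.
    """
    payload = json.loads(alarm_json)
    nodes = set()
    for record in payload.get("data", []):
        entity = record.get("entity")
        # Keep only disk-failure alarms that name a node.
        if record.get("alarm name") == DISK_FAILURE_ALARM and entity is not None:
            nodes.add(entity)
    return sorted(nodes)
```

The JSON text could, for example, be captured with `subprocess.run(["maprcli", "alarm", "list", "-json"], capture_output=True, text=True).stdout` and passed to the function. Keeping the parsing separate from the command call makes it easy to test against saved output.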

Hovering your mouse over the affected node in the heatmap shows more information about the reason for the failure. Clicking the node displays node-specific information, including an alarm summary.

When you see a disk failure alarm, examine the log file at /opt/mapr/logs/faileddisk.log and check the Failure Reason field.
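To pull the Failure Reason out of the log from a script, a small parser is enough. This is a sketch that assumes each report field sits on one line in the form `<Field name> : <value>`, with arbitrary padding around the colon; the exact layout of faileddisk.log may differ between MapR versions, so check it against a real file first. The helper names `failure_reasons` and `read_failure_reasons` are hypothetical.

```python
import re
from typing import List

# ASSUMPTION: each report field is on one line as "<Field name> : <value>",
# with any amount of whitespace around the colon.
_FIELD_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.*?)\s*$")


def failure_reasons(log_text: str) -> List[str]:
    """Return the value of every "Failure Reason" field in the log text, in order."""
    reasons = []
    for line in log_text.splitlines():
        match = _FIELD_LINE.match(line)
        if match and match.group(1) == "Failure Reason":
            reasons.append(match.group(2))
    return reasons


def read_failure_reasons(path: str = "/opt/mapr/logs/faileddisk.log") -> List[str]:
    """Read the log file at the path given in the text and extract its failure reasons."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return failure_reasons(handle.read())
```

Because the log may record more than one failure, the parser returns every Failure Reason it finds rather than only the first.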