Node Alarms

The Node Alarms view displays information about alarms on any node in the cluster that has raised an alarm.

The first two columns display

  • Hlth - a color indicating the status of each node (see Cluster Heat Map)
  • Hostname - the hostname of the node

The remaining columns are based on alarm type, such as:

  • Version Alarm - one or more services on the node are running an unexpected version
  • No Heartbeat Alarm - no heartbeat has been detected for over 5 minutes, and the node is not undergoing maintenance
  • UID Mismatch Alarm - services in the cluster are being run with different usernames (UIDs)
  • Duplicate HostId Alarm - two or more nodes in the cluster have the same Host ID
  • Too Many Containers Alarm - the number of containers on this node reached the maximum limit
  • Excess Logs Alarm - debug logging is enabled on this node, which can fill up disk space
  • Disk Failure Alarm - a disk has failed on the node (the disk health log indicates which one failed)
  • Time Skew Alarm - the clock on the node is out of sync with the master CLDB by more than 20 seconds
  • Root Partition Full Alarm - the root partition ("/") on the node is 99% full and running out of space
  • Installation Directory Full Alarm - the partition /opt/mapr on the node is running out of space (95% full)
  • Core Present Alarm - a service on the node has crashed and created a core dump file
  • High FileServer Memory Alarm - the FileServer service on the node has high memory consumption
  • Pam Misconfigured Alarm - the PAM authentication on the node is configured incorrectly
  • TaskTracker Local Directory Full Alarm - the local directory used by the TaskTracker is full, and the TaskTracker cannot operate as a result
  • CLDB Alarm - the CLDB service on the node has stopped running
  • FileServer Alarm - the FileServer service on the node has stopped running
  • JobTracker Alarm - the JobTracker service on the node has stopped running
  • TaskTracker Alarm - the TaskTracker service on the node has stopped running
  • HBase Master Alarm - the HBase Master service on the node has stopped running
  • HBase RegionServer Alarm - the HBase RegionServer service on the node has stopped running
  • NFS Gateway Alarm - the NFS Gateway service on the node has stopped running
  • Webserver Alarm - the WebServer service on the node has stopped running
  • HostStats Alarm - the HostStats service on the node has stopped running
  • Metrics write problem Alarm - metric data was not written to the database, or there were issues writing to a logical volume

See Alarms Reference.

Note the following behavior on the Node Alarms view:

  • Clicking a node's Hostname navigates to the Node Properties View, which provides detailed information about the node.
  • The left pane of the Node Alarms view displays the available topologies. Click a topology name to view only the nodes in that topology.

Buttons:

  • Properties - navigates to the Node Properties View
  • Forget Node - opens the Forget Node dialog to remove the node(s) from active management in this cluster. Services on the node must be stopped before the node can be forgotten.
  • Manage Services - opens the Manage Node Services dialog, which lets you start and stop services on the node
  • Change Topology - opens the Change Node Topology dialog, which lets you change the rack or switch path for a node