Node Alarms

The Node Alarms view lists each node in the cluster that has raised an alarm, and shows which alarms are raised on it.

The first two columns display:

Hlth - a color indicating the status of each node (see Cluster Heat Map)
Hostname - the hostname of the node

Each of the remaining columns corresponds to an alarm type, such as:

Version Alarm - one or more services on the node are running an unexpected version
No Heartbeat Alarm - no heartbeat has been detected for over 5 minutes, and the node is not undergoing maintenance
UID Mismatch Alarm - services in the cluster are being run with different usernames (UIDs)
Duplicate HostId Alarm - two or more nodes in the cluster have the same Host ID
Too Many Containers Alarm - the number of containers on this node reached the maximum limit
Excess Logs Alarm - debug logging is enabled on this node, which can fill up disk space
Disk Failure Alarm - a disk has failed on the node (the disk health log indicates which one failed)
Time Skew Alarm - the clock on the node is out of sync with the master CLDB by more than 20 seconds
Root Partition Full Alarm - the root partition ("/") on the node is running out of space (99% full)
Installation Directory Full Alarm - the partition /opt/mapr on the node is running out of space (95% full)
Core Present Alarm - a service on the node has crashed and created a core dump file
High FileServer Memory Alarm - the FileServer service on the node has high memory consumption
Pam Misconfigured Alarm - the PAM authentication on the node is configured incorrectly
TaskTracker Local Directory Full Alarm - the local directory used by the TaskTracker is full, and the TaskTracker cannot operate as a result
CLDB Alarm - the CLDB service on the node has stopped running
FileServer Alarm - the FileServer service on the node has stopped running
JobTracker Alarm - the JobTracker service on the node has stopped running
TaskTracker Alarm - the TaskTracker service on the node has stopped running
HBase Master Alarm - the HBase Master service on the node has stopped running
HBase RegionServer Alarm - the HBase RegionServer service on the node has stopped running
NFS Gateway Alarm - the NFS Gateway service on the node has stopped running
Webserver Alarm - the WebServer service on the node has stopped running
HostStats Alarm - the HostStats service on the node has stopped running
Metrics write problem Alarm - metric data was not written to the database, or there were issues writing to a logical volume

For more information about each alarm, see Alarms Reference.
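
Several of the alarms listed above are defined by a numeric threshold (5 minutes without a heartbeat, 20 seconds of clock skew, 99% and 95% disk usage). The following is a minimal sketch of how those conditions could be evaluated for one node. It is illustrative only and is not how the cluster itself raises alarms: the NodeSample class, its field names, and the threshold_alarms function are hypothetical, and treating the disk percentages as "greater than or equal to" is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the alarm descriptions in this section.
HEARTBEAT_TIMEOUT_S = 5 * 60      # No Heartbeat Alarm: over 5 minutes
TIME_SKEW_LIMIT_S = 20            # Time Skew Alarm: more than 20 seconds
ROOT_FULL_PCT = 99                # Root Partition Full Alarm
INSTALL_DIR_FULL_PCT = 95         # Installation Directory Full Alarm (/opt/mapr)


@dataclass
class NodeSample:
    """Hypothetical snapshot of one node; not a structure used by the product."""
    seconds_since_heartbeat: float
    in_maintenance: bool
    clock_skew_s: float           # signed offset from the master CLDB clock
    root_used_pct: float
    opt_mapr_used_pct: float


def threshold_alarms(sample: NodeSample) -> List[str]:
    """Return the names of the threshold-based alarms that apply to the sample."""
    alarms = []
    # A missing heartbeat only counts when the node is not in maintenance.
    if (sample.seconds_since_heartbeat > HEARTBEAT_TIMEOUT_S
            and not sample.in_maintenance):
        alarms.append("No Heartbeat Alarm")
    # Skew counts in either direction, so compare the absolute value.
    if abs(sample.clock_skew_s) > TIME_SKEW_LIMIT_S:
        alarms.append("Time Skew Alarm")
    # Assumption: the alarm applies once usage reaches the stated percentage.
    if sample.root_used_pct >= ROOT_FULL_PCT:
        alarms.append("Root Partition Full Alarm")
    if sample.opt_mapr_used_pct >= INSTALL_DIR_FULL_PCT:
        alarms.append("Installation Directory Full Alarm")
    return alarms
```

For example, a node in maintenance with no recent heartbeat would not be reported under No Heartbeat Alarm, but could still be reported under Time Skew Alarm if its clock has drifted.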

Note the following behavior in the Node Alarms view:

Clicking a node's Hostname navigates to the Node Properties View, which provides detailed information about the node.
The left pane of the Node Alarms view displays the available topologies. Click a topology name to view only the nodes in that topology.

Buttons:

Properties - navigates to the Node Properties View
Forget Node - opens the Forget Node dialog to remove the node(s) from active management in this cluster. Services on the node must be stopped before the node can be forgotten.
Manage Services - opens the Manage Node Services dialog, which lets you start and stop services on the node
Change Topology - opens the Change Node Topology dialog, which lets you change the rack or switch path for a node