MapR 5.1 is at End of Life (EOL) and no longer supported. Please see the latest documentation. This documentation is not being updated.

Example: Using MapR Metrics to Diagnose a Faulty Network Interface Card (NIC)

In this example, a node in your cluster has a NIC that fails intermittently. Because the node is occasionally unreachable, tasks assigned to it take abnormally long to complete. In the Metrics interface, you can display a job's average and maximum task attempt durations for both map and reduce attempts. A large gap between the average and the maximum attempt duration suggests that some task attempts are taking an unusually long time. To find such jobs, sort the list of jobs by maximum map task attempt duration.
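The comparison described above can also be expressed outside the Metrics interface. The following is a minimal sketch, assuming you have already exported per-job summaries into your own records. The `JobSummary` type, its field names, and the `min_ratio` threshold are hypothetical and are not part of MapR Metrics.

```python
from dataclasses import dataclass


@dataclass
class JobSummary:
    # Hypothetical record; field names are illustrative, not MapR Metrics columns.
    job_id: str
    avg_map_attempt_secs: float
    max_map_attempt_secs: float


def rank_by_duration_gap(jobs, min_ratio=3.0):
    """Return jobs whose slowest map attempt took at least min_ratio times
    the average attempt duration, sorted by maximum duration, longest first."""
    flagged = [
        j for j in jobs
        if j.avg_map_attempt_secs > 0
        and j.max_map_attempt_secs / j.avg_map_attempt_secs >= min_ratio
    ]
    return sorted(flagged, key=lambda j: j.max_map_attempt_secs, reverse=True)


# Made-up sample data for illustration only.
jobs = [
    JobSummary("job_a", 40.0, 55.0),
    JobSummary("job_b", 42.0, 610.0),
    JobSummary("job_c", 38.0, 300.0),
]
suspects = rank_by_duration_gap(jobs)
print([j.job_id for j in suspects])  # ['job_b', 'job_c']
```

The ratio threshold is arbitrary here; a suitable value depends on how uniform task attempt durations normally are for your workloads.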

Click a job's name to display information about the job's tasks, then sort the task attempt list by duration to find the outliers. Because the task list shows the node each task attempt runs on, you can see that several of the unusually long-running attempts are assigned to the same node. This pattern suggests that a problem with that specific node is causing task attempts to take longer than usual.
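As a rough illustration of the same reasoning, the sketch below counts unusually slow task attempts per node. It assumes the attempts are available as `(attempt_id, node, duration_secs)` tuples. That tuple layout, the node names, and the `factor` threshold are assumptions made for this example, not a MapR data format.

```python
from collections import Counter
from statistics import median


def slow_attempts_by_node(attempts, factor=3.0):
    """Count, per node, the attempts whose duration exceeds
    factor times the median duration of all attempts."""
    if not attempts:
        return Counter()
    typical = median(duration for _, _, duration in attempts)
    return Counter(
        node for _, node, duration in attempts if duration > factor * typical
    )


# Made-up sample data for illustration only.
attempts = [
    ("m_0", "node1", 41.0),
    ("m_1", "node2", 39.0),
    ("m_2", "node3", 580.0),
    ("m_3", "node1", 44.0),
    ("m_4", "node3", 610.0),
    ("m_5", "node2", 40.0),
    ("m_6", "node3", 43.0),
]
counts = slow_attempts_by_node(attempts)
print(counts.most_common())  # [('node3', 2)]
```

Note that the suspect node also completes some attempts in normal time (`m_6` above), which is consistent with an intermittent rather than a permanent fault.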

When you display summary information for that node, you can see that its network I/O speeds are lower than those of other similarly configured nodes in the cluster. Use that information to examine the node's network I/O configuration and hardware and diagnose the specific cause.
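The peer comparison can be sketched in the same way. The example below assumes you have collected one network throughput figure per node by some means of your own. The dictionary layout, the Mbps unit, and the `fraction` cutoff are hypothetical choices for illustration.

```python
from statistics import median


def nodes_below_peers(throughput_mbps, fraction=0.5):
    """Return the nodes whose network throughput is below
    `fraction` of the median throughput across all nodes."""
    if not throughput_mbps:
        return []
    baseline = median(throughput_mbps.values())
    return sorted(
        node for node, value in throughput_mbps.items()
        if value < fraction * baseline
    )


# Made-up sample data for illustration only.
samples = {"node1": 940.0, "node2": 925.0, "node3": 210.0, "node4": 950.0}
print(nodes_below_peers(samples))  # ['node3']
```

Comparing against the median rather than the mean keeps one badly degraded node from dragging the baseline down. The comparison is only meaningful among similarly configured nodes.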