Node Types

In a production MapR cluster, some nodes are typically dedicated to cluster coordination and management, and other nodes are tasked with data storage and processing duties. An edge node provides user access to the cluster, concentrating open user privileges on a single host. In smaller clusters, the work is not so specialized and a single node may perform data processing as well as management.

Nodes Running ZooKeeper and CLDB

High latency on a ZooKeeper node can lead to an increased incidence of ZooKeeper quorum failures. A ZooKeeper quorum failure occurs when the cluster finds too few copies of the ZooKeeper service running. If the ZooKeeper node is also running other services, competition for computing resources can lead to increased latency for that node. If your cluster experiences issues relating to ZooKeeper quorum failures, consider reducing or eliminating the number of other services running on the ZooKeeper node.

Nodes for Data Storage and Processing

Most nodes in a production cluster are data nodes. Data nodes can be added or removed from the cluster as requirements change over time.

FileServer, TaskTracker and NodeManager run on data nodes. You may want to tune TaskTracker for fewer slots on nodes that include both management and data services. For more information, see Resource Allocation for Jobs and Applications.

Edge Nodes

So-called Edge nodes provide a common user access point for the MapR webserver and other client tools. Edge nodes may or may not be part of the cluster, as long as the edge node can reach cluster nodes. Nodes on the same network can run client services, MySQL for Metrics, and so on.

Node Requirements for Table Replication

If you are using table replication, you need to designate nodes on the destination cluster as gateway nodes. You have to install the mapr-gateway package on these nodes.

You also need to install the HBase 0.98.9 client on your web server nodes (mapr-hbase); otherwise, table replication features will not work in the MCS.