Service Layout in a Cluster

How you assign services to nodes depends on the scale of your cluster and the MapR license level. For a single-node cluster, no decisions are involved: all of the services you are using run on the single node. On medium clusters, the performance demands of the CLDB and ZooKeeper services require them to be assigned to separate nodes. On large clusters, good cluster performance requires that these services run on separate nodes.

The cluster is flexible and elastic. Nodes play different roles over the lifecycle of a cluster. The basic requirements of a node are the same for management nodes and for data nodes.

As the cluster size grows, it becomes advantageous to locate control services (such as ZooKeeper and CLDB) on nodes that do not run compute services (such as TaskTracker). The MapR Converged Community Edition does not include HA capabilities, which restricts how many instances of certain services can run. The number of nodes and the services they run will evolve over the lifecycle of the cluster.
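
The separation of control and compute services described above can be sketched in a few lines of Python. This is only an illustrative planning aid, not a MapR tool: the function name plan_layout, the control_nodes parameter, and its default of 3 are assumptions made for this example, and the sketch does not model license limits on service instances.

```python
from typing import Dict, List

# Service names taken from the text above; the grouping is for illustration.
CONTROL_SERVICES = ["ZooKeeper", "CLDB"]
COMPUTE_SERVICES = ["TaskTracker"]


def plan_layout(nodes: List[str], control_nodes: int = 3) -> Dict[str, List[str]]:
    """Return a hypothetical mapping of node name to the services it runs.

    control_nodes is an assumed planning knob, not a MapR setting.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if control_nodes < 1:
        raise ValueError("control_nodes must be at least 1")

    # Single-node cluster: every service runs on the one node.
    if len(nodes) == 1:
        return {nodes[0]: CONTROL_SERVICES + COMPUTE_SERVICES}

    # Always leave at least one node free for compute services.
    dedicated = min(control_nodes, len(nodes) - 1)

    layout: Dict[str, List[str]] = {}
    for index, name in enumerate(nodes):
        if index < dedicated:
            layout[name] = list(CONTROL_SERVICES)   # control-only node
        else:
            layout[name] = list(COMPUTE_SERVICES)   # compute-only node
    return layout


if __name__ == "__main__":
    for node, services in plan_layout(["n1", "n2", "n3", "n4", "n5"]).items():
        print(node, services)
```

With five nodes and the assumed default, the first three nodes carry ZooKeeper and CLDB and the remaining two carry TaskTracker. Adjust control_nodes to match the instance counts your license level permits.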

The architecture of MapR software allows virtually any service to run on any node, or nodes, to provide a high-availability, high-performance cluster. Below are some guidelines to help plan your cluster's service layout.

NOTE: It is possible to install MapR Hadoop on a one- or two-node demo cluster. Production clusters may harness hundreds of nodes, but five- or ten-node production clusters are appropriate for some applications.