Example Cluster Designs

You can design your cluster in one of the following modes:

  • MapReduce Classic: All nodes in the cluster run MapReduce v1.
  • YARN: All nodes in the cluster run YARN (MapReduce v2 and other applications that can run on YARN).
  • Mixed-Mode: Nodes in the cluster can run YARN or MapReduce v1.
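
As a rough illustration of the three modes, the sketch below classifies a cluster from the services assigned to its nodes. It assumes that JobTracker and TaskTracker indicate MapReduce v1 and that ResourceManager and NodeManager indicate YARN. The node names and the cluster_mode function are hypothetical and are not part of any product tooling.

```python
# Services assumed to indicate each framework (an assumption for this sketch).
MRV1_SERVICES = {"JobTracker", "TaskTracker"}
YARN_SERVICES = {"ResourceManager", "NodeManager"}


def cluster_mode(layout):
    """Return the mode implied by a {node: set_of_services} layout."""
    services = set().union(*layout.values())
    has_mrv1 = bool(services & MRV1_SERVICES)
    has_yarn = bool(services & YARN_SERVICES)
    if has_mrv1 and has_yarn:
        return "Mixed-Mode"
    if has_yarn:
        return "YARN"
    if has_mrv1:
        return "MapReduce Classic"
    return "unknown"


# Hypothetical two-node layout that runs both frameworks.
example = {
    "node1": {"JobTracker", "TaskTracker"},
    "node2": {"ResourceManager", "NodeManager"},
}
print(cluster_mode(example))
```

A layout that contains neither set of services is reported as "unknown" rather than guessed.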

Small Converged Community Edition Cluster

For a small cluster, assign the CLDB, JobTracker, NFS, and WebServer services to one node each. A hardware failure on any of these nodes would result in a service interruption, but the cluster can be recovered. Assign the ZooKeeper service to the CLDB node and two other nodes. Assign the FileServer and TaskTracker services to every node in the cluster.

Example Service Configuration for a 5-Node Cluster with the Converged Community Edition - Runs MapReduce Classic (MapReduce v1)

This cluster has several single points of failure: the nodes that run the CLDB, JobTracker, and NFS services.
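
The assignment described above can be sketched as a simple mapping from nodes to services, which makes the single-instance services easy to list. This is a minimal sketch: the node names and the choice of which node hosts each single-instance service are assumptions made for illustration, not the layout from the original configuration table.

```python
from collections import Counter

# Hypothetical node names; which node hosts each single-instance
# service is an assumption made for this sketch.
layout = {
    "node1": {"CLDB", "ZooKeeper", "FileServer", "TaskTracker"},
    "node2": {"JobTracker", "ZooKeeper", "FileServer", "TaskTracker"},
    "node3": {"NFS", "ZooKeeper", "FileServer", "TaskTracker"},
    "node4": {"WebServer", "FileServer", "TaskTracker"},
    "node5": {"FileServer", "TaskTracker"},
}


def service_counts(layout):
    """Count how many nodes run each service."""
    return Counter(s for services in layout.values() for s in services)


def single_instance_services(layout):
    """Return the services that run on exactly one node."""
    return sorted(s for s, n in service_counts(layout).items() if n == 1)


print(single_instance_services(layout))
# prints ['CLDB', 'JobTracker', 'NFS', 'WebServer']
```

The WebServer service also appears in the result because it is assigned to only one node, like the CLDB, JobTracker, and NFS services.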

Small High-Availability Cluster with the Converged Enterprise Edition, Hadoop module

A small Enterprise Edition cluster can ensure high availability (HA) for all services by providing at least two instances of each service, eliminating single points of failure.

Example Service Configuration for a 5-Node Cluster with the Converged Enterprise Edition, Hadoop module - Runs MapReduce Classic (MapReduce v1)

The example below depicts a 5-node, high-availability cluster with HBase installed. ZooKeeper is installed on three nodes. CLDB, JobTracker, and HBase Master services are installed on two nodes each, spread out as much as possible across the nodes:

This example puts CLDB and ZooKeeper services on the same nodes and places JobTracker services on other nodes, but this is somewhat arbitrary. The JobTracker service can coexist on the same node as ZooKeeper or CLDB services.
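
One way to check a layout like this is to simulate the loss of each node in turn and confirm that every service still has at least one running instance. The sketch below does that for the services named in this example. The node names, the placement of the HBase Master instances, and the survives_single_node_failure function are assumptions made for illustration.

```python
# Hypothetical 5-node layout: CLDB shares nodes with ZooKeeper and
# JobTracker runs on other nodes. The HBase Master placement is assumed.
ha_layout = {
    "node1": {"CLDB", "ZooKeeper"},
    "node2": {"CLDB", "ZooKeeper", "HBase Master"},
    "node3": {"ZooKeeper"},
    "node4": {"JobTracker", "HBase Master"},
    "node5": {"JobTracker"},
}


def survives_single_node_failure(layout):
    """Return True if every service keeps an instance after any one node fails."""
    all_services = set().union(*layout.values())
    for failed in layout:
        remaining = set().union(
            *(services for node, services in layout.items() if node != failed)
        )
        if remaining != all_services:
            return False
    return True


print(survives_single_node_failure(ha_layout))
```

This check covers only service presence. It does not model ZooKeeper quorum or failover time.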

Example Service Configuration for a 10-Node Cluster with the Converged Enterprise Edition, Hadoop module - Runs in Mixed-Mode (both MapReduce v1 and YARN)

The example below depicts a 10-node, high-availability cluster that can run both MapReduce v1 jobs and YARN applications. ZooKeeper, CLDB, JobTracker, and ResourceManager are each installed on three nodes:

Large High-Availability Cluster with the Converged Enterprise Edition, Hadoop module

On a large cluster designed for high availability (HA), services can be assigned to nodes in a layout similar to the following example, which depicts a 150-node cluster.

In the following 150-node cluster example:

  • The majority of nodes are dedicated to the TaskTracker and NodeManager services.
  • ZooKeeper, CLDB, JobTracker, and ResourceManager are installed on three nodes each.
  • The NFS server is installed on most machines, providing high network bandwidth to the cluster.
  • The configuration is in mixed-mode, meaning that both MapReduce v1 and YARN are running.