Replicating and Indexing from a Single MapR Source Cluster

You can replicate to one or more MapR clusters and index table data in one or more Elasticsearch clusters all from a single MapR cluster.

For example, in the following diagram the customers table in the MapR cluster newyork is being replicated to the MapR cluster sanfrancisco. The gateways for replication are in the sanfrancisco cluster. The customers table is also being indexed in a single Elasticsearch cluster.

In the following diagram, the customers table in the MapR cluster newyork is being replicated to the MapR cluster sanfrancisco. The gateways for replication are in the sanfrancisco cluster. The customers table is also being indexed in multiple Elasticsearch clusters. The gateways for indexing are running on newyork. Updates to the customers table are sent to the gateways, which distribute them to Elasticsearch nodes running in the different Elasticsearch clusters. Those nodes then distribute the updates to the nodes where shards of the destination index are located.

Replication to another MapR cluster

For replication to another MapR cluster, MapR-DB knows to use the gateways that are in the remote MapR cluster because the name of that cluster is associated with the gateways.

For example:

If you use the maprcli cluster gateway set command on the newyork cluster, you would specify the name of the remote MapR cluster, sanfrancisco, in the -dstcluster parameter of the command. MapR-DB would then understand that replication to this cluster goes through gateways A and B.
If you choose to use a DNS record, the record would look like this, where A and B are the hostnames or IP addresses of the gateways in the sanfrancisco cluster: gateway.sanfrancisco IN TXT "A B"
If you choose not to use either of these methods but instead rely on an entry in your cluster’s mapr-clusters.conf file (assuming that gateways A and B are on CLDB nodes in sanfrancisco), the entry in this file would start with the cluster name “sanfrancisco”.
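The first two options above can be sketched as a pair of small Python helpers that produce the command line and the DNS record for the replication example. This is an illustrative sketch only: the helper names are hypothetical, the gateway names A and B are the placeholders used above, and the -gateways parameter is an assumption about the maprcli cluster gateway set syntax that should be checked against the maprcli reference for your release.

```python
import shlex


def gateway_set_command(dst_cluster, gateways):
    """Build the argument list for 'maprcli cluster gateway set'.

    Assumption: the gateway list is passed as one space-delimited
    value through a -gateways parameter.
    """
    return [
        "maprcli", "cluster", "gateway", "set",
        "-dstcluster", dst_cluster,
        "-gateways", " ".join(gateways),
    ]


def gateway_txt_record(cluster, gateways):
    """Build the DNS TXT record that names the gateways for a cluster."""
    joined = " ".join(gateways)
    return f'gateway.{cluster} IN TXT "{joined}"'


if __name__ == "__main__":
    # Replication from newyork to sanfrancisco through gateways A and B.
    print(shlex.join(gateway_set_command("sanfrancisco", ["A", "B"])))
    print(gateway_txt_record("sanfrancisco", ["A", "B"]))
```

In both forms the gateways are keyed by the name of the remote cluster, sanfrancisco, even though the command itself is run on newyork.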

Indexing Table Data in an Elasticsearch Cluster

For indexing table data in an Elasticsearch cluster, MapR-DB knows to use the gateways that are in the source MapR cluster because the name of that cluster is associated with the gateways.

When you install the mapr-gateway package on a node, you specify the MapR cluster that the node is a part of.

If you use the maprcli cluster gateway set command on the newyork cluster, you would specify the name newyork in the -dstcluster parameter of the command. MapR-DB would then understand that indexing to the Elasticsearch cluster goes through gateways C and D.
If you choose to use a DNS record, the record would look like this, where C and D are the hostnames or IP addresses of the gateways in the newyork cluster: gateway.newyork IN TXT "C D"
If you choose not to use either of these methods but instead rely on an entry in your cluster’s mapr-clusters.conf file (assuming that gateways C and D are on CLDB nodes in the MapR cluster newyork), the entry in this file would start with the cluster name “newyork”.
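The difference between the two cases comes down to which cluster name the gateways are registered under: the remote cluster for replication, the source cluster for indexing. A minimal Python sketch of that rule follows. The function names and the operation labels "replication" and "indexing" are hypothetical and only model the behavior described above; they are not part of any MapR API.

```python
def gateway_cluster(operation, source_cluster, destination_cluster=None):
    """Return the cluster name under which the gateways are looked up."""
    if operation == "replication":
        # Replication uses the gateways in the remote MapR cluster.
        if destination_cluster is None:
            raise ValueError("replication needs a destination MapR cluster")
        return destination_cluster
    if operation == "indexing":
        # Indexing to Elasticsearch uses the gateways in the source cluster.
        return source_cluster
    raise ValueError(f"unknown operation: {operation!r}")


def gateway_dns_name(cluster):
    """Return the owner name of the DNS TXT record for a cluster's gateways."""
    return f"gateway.{cluster}"


if __name__ == "__main__":
    print(gateway_dns_name(gateway_cluster("replication", "newyork", "sanfrancisco")))
    print(gateway_dns_name(gateway_cluster("indexing", "newyork")))
```

For the customers table on newyork, the first call resolves to the sanfrancisco gateways (A and B) and the second to the newyork gateways (C and D).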