Replicating and Indexing from a Single MapR Source Cluster
You can replicate to one or more MapR clusters and index table data in one or more Elasticsearch clusters all from a single MapR cluster.
For example, in the following diagram the customers table in the MapR cluster newyork is being replicated to the MapR cluster sanfrancisco. The gateways for replication are in the sanfrancisco cluster. The customers table is also being indexed in a single Elasticsearch cluster.
In the following diagram, the customers table in the MapR cluster newyork is being replicated to the MapR cluster sanfrancisco. The gateways for replication are in the sanfrancisco cluster. The customers table is also being indexed in multiple Elasticsearch clusters. The gateways for indexing are running on newyork. Updates to the customers table are sent to the gateways, which distribute them to Elasticsearch nodes running in the different Elasticsearch clusters. Those nodes distribute the updates to the nodes where shards of the destination index are located.
Replication to Another MapR Cluster
For replication to another MapR cluster, MapR-DB knows to use the gateways that are in the remote MapR cluster because the name of that cluster is associated with the gateways.
- If you use the maprcli cluster gateway set command on the newyork cluster, you would specify the name of the remote MapR cluster, sanfrancisco, in the -dstcluster parameter of this command. MapR-DB would then understand that replication to this cluster goes through gateways A and B.
- If you chose to use a DNS record, the record would look like this, where A and B are the hostnames or IP addresses of the gateways in the sanfrancisco cluster:
  gateway.sanfrancisco IN TXT "A B"
- If you choose not to use either of these methods but instead rely on an entry in your cluster's mapr-clusters.conf file (assuming that gateways A and B are on CLDB nodes in sanfrancisco), the entry in this file would start with the cluster name "sanfrancisco".
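The first two options above can be sketched in Python as small formatting helpers. This is only an illustration: the function names are hypothetical, and the -gateways parameter is an assumption, because the text above names only -dstcluster. Check the maprcli reference for your release before relying on it.

```python
def gateway_txt_record(dst_cluster, gateways):
    """Format a DNS TXT record in the form shown above:
    gateway.<cluster name> IN TXT "<space-separated gateways>"
    """
    if not gateways:
        raise ValueError("at least one gateway is required")
    hosts = " ".join(gateways)
    return f'gateway.{dst_cluster} IN TXT "{hosts}"'


def gateway_set_args(dst_cluster, gateways):
    """Build an argument list for `maprcli cluster gateway set`.

    Assumption: the gateway list is passed with -gateways as a single
    space-separated string. Only -dstcluster is confirmed by the text.
    """
    if not gateways:
        raise ValueError("at least one gateway is required")
    return [
        "maprcli", "cluster", "gateway", "set",
        "-dstcluster", dst_cluster,
        "-gateways", " ".join(gateways),
    ]


if __name__ == "__main__":
    # Replication from newyork to sanfrancisco goes through gateways A and B.
    print(gateway_txt_record("sanfrancisco", ["A", "B"]))
    print(" ".join(gateway_set_args("sanfrancisco", ["A", "B"])))
```

In both helpers the cluster name passed in is the remote cluster, sanfrancisco, because that is the name MapR-DB associates with the replication gateways.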
Indexing Table Data in an Elasticsearch Cluster
For indexing table data in an Elasticsearch cluster, MapR-DB knows to use the gateways that are in the source MapR cluster because the name of that cluster is associated with the gateways.
When you install the mapr-gateway package on a node, you specify the MapR cluster that the node is a part of.
- If you use the maprcli cluster gateway set command on the newyork cluster, you would specify the name newyork in the -dstcluster parameter of this command. MapR-DB would then understand that indexing to the Elasticsearch cluster goes through gateways C and D.
- If you chose to use a DNS record, the record would look like this, where C and D are the hostnames or IP addresses of the gateways in the newyork cluster:
  gateway.newyork IN TXT "C D"
- If you choose not to use either of these methods but instead rely on an entry in your cluster's mapr-clusters.conf file (assuming that gateways C and D are on CLDB nodes in the MapR cluster newyork), the entry in this file would start with the cluster name "newyork".
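The difference between the two cases is which cluster name the gateways are registered under. The following Python sketch models that rule only. It is not MapR-DB code: the table and function names are hypothetical, and the gateway letters are the placeholders used in this section.

```python
# Hypothetical registry: cluster name -> gateways associated with that name.
GATEWAYS_BY_CLUSTER = {
    "sanfrancisco": ["A", "B"],  # replication gateways, in the remote cluster
    "newyork": ["C", "D"],       # indexing gateways, in the source cluster
}


def gateway_lookup_key(source_cluster, operation, remote_cluster=None):
    """Return the cluster name under which the gateways are registered.

    Replication is keyed by the remote MapR cluster; Elasticsearch
    indexing is keyed by the source MapR cluster itself.
    """
    if operation == "replication":
        if remote_cluster is None:
            raise ValueError("replication needs a remote cluster name")
        return remote_cluster
    if operation == "indexing":
        return source_cluster
    raise ValueError(f"unknown operation: {operation}")


def gateways_for(source_cluster, operation, remote_cluster=None):
    """Look up the gateways that handle updates for the given operation."""
    key = gateway_lookup_key(source_cluster, operation, remote_cluster)
    return GATEWAYS_BY_CLUSTER[key]


if __name__ == "__main__":
    print(gateways_for("newyork", "replication", "sanfrancisco"))
    print(gateways_for("newyork", "indexing"))
```

With the sample registry, replication from newyork resolves to the gateways registered under sanfrancisco, and indexing from newyork resolves to the gateways registered under newyork.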