Master-slave replication
Several different topologies are possible for master-slave replication:
Replication from one source table to one or more replica tables
In this topology, updates on a source table are replicated to one or more replicas, but updates to the replicas are not replicated back to the source table.
For example, in this diagram, updates to the customers table in the cluster sanfrancisco are being replicated to the newyork and hyderabad clusters. The circles marked G each represent a MapR gateway.
However, changes to the table in the newyork and hyderabad clusters are not replicated back to the table in the sanfrancisco cluster.
You can also replicate within a single cluster. In this example, the cluster sanfrancisco contains both the source table and the replica.
Many-to-one replication
Multiple source tables can replicate to a single replica. In this diagram, operations on customers tables in three different clusters are replicated via gateways to the customers table in the newyork cluster.
One-to-many replication
A single source table can replicate to multiple replicas. In this diagram, operations on the customers table in the sanfrancisco cluster are replicated via gateways to replicas in three other clusters.
Replication loops
When three or more tables need to be kept in sync, you can set up master-slave replication between pairs of them to form a replication loop. Operations on a table are propagated to the other clusters in the loop, but there is no attempt to reapply the operations at the originating table. This is because the operations are tagged with a universally unique identifier (UUID) that identifies the table where the operations originated.
In this diagram, for example, operations on the customers table in the hyderabad cluster are replicated first to the customers table in the tokyo cluster. The operations are then replicated from the tokyo cluster to the customers table in the sanfrancisco cluster. Finally, the operations are replicated from the sanfrancisco cluster to the customers table in the newyork cluster.
The newyork cluster does not replicate the operations to the customers table in the hyderabad cluster.
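The loop behavior described above can be sketched as a small simulation. This is an illustrative model only, not MapR code: the Table class, its put and _apply methods, and the dictionary-shaped operation are all hypothetical names chosen for the sketch. It assumes each operation carries the UUID of the table where it originated and that a table skips forwarding an operation to a replica whose UUID matches that origin.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Table:
    """A toy stand-in for a table in one cluster (hypothetical, not a MapR API)."""
    cluster: str
    table_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    rows: dict = field(default_factory=dict)
    replicas: list = field(default_factory=list)
    applied: int = 0  # number of operations applied at this table

    def put(self, key, value):
        """A client write: tag the operation with this table's UUID."""
        self._apply({"origin": self.table_id, "key": key, "value": value})

    def _apply(self, op):
        self.rows[op["key"]] = op["value"]
        self.applied += 1
        for replica in self.replicas:
            # Never send an operation back to the table where it originated.
            if replica.table_id != op["origin"]:
                replica._apply(op)


# Build the loop hyderabad -> tokyo -> sanfrancisco -> newyork -> hyderabad.
hyderabad, tokyo, sanfrancisco, newyork = (
    Table(name) for name in ("hyderabad", "tokyo", "sanfrancisco", "newyork")
)
loop = [hyderabad, tokyo, sanfrancisco, newyork]
for source, replica in zip(loop, loop[1:] + loop[:1]):
    source.replicas.append(replica)

hyderabad.put("c1", "Alice")
```

After the write, every table in the loop holds the row, and each table has applied the operation exactly once, because newyork does not forward it back to hyderabad.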
Master-slave replication in two directions
You can combine master-slave replication configurations to replicate data between clusters. Two clusters engaged in replication can each act as a source cluster and a destination cluster.
In this example, the data in the customers table in the cluster sanfrancisco is replicated to the customers table in the cluster newyork.
At the same time, the data in the products table in the newyork cluster is replicated to the products table in the cluster sanfrancisco.
In all master-slave configurations, changes made to replica tables are not replicated back to source tables. Therefore, if the replicated data is modified at the replica by client applications, the replica will become out of sync with the source table.
For example, you might replicate the two column families personal and purchases from the customers table in the sanfrancisco cluster to the customers table in the newyork cluster, as in this diagram. (For simplicity, the blue circle labeled G represents two or more gateways, rather than one as in the other diagrams in this topic.)
In master-slave replication, no updates to a replica are replicated back to the source. Any updates that applications might make to those two column families in the customers table in the newyork cluster will not be replicated to the customers table in the sanfrancisco cluster.
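A minimal sketch of how a replica drifts out of sync under one-way replication. The tables are modeled as plain nested dictionaries, and the replicate function is a hypothetical helper written for this illustration; it does not correspond to any MapR command or API.

```python
def replicate(source, replica, families):
    """Copy the listed column families from source to replica (one way only)."""
    for row_key, row in source.items():
        for family in families:
            if family in row:
                # Copy the columns so the replica does not share state with the source.
                replica.setdefault(row_key, {})[family] = dict(row[family])


sf_customers = {"c1": {"personal": {"name": "Alice"}, "purchases": {"count": 2}}}
ny_customers = {}

replicate(sf_customers, ny_customers, ["personal", "purchases"])
in_sync = ny_customers == sf_customers

# A client application updates the replica directly.
ny_customers["c1"]["personal"]["name"] = "Alicia"

# Nothing carries the change back, so the two tables now differ.
diverged = ny_customers != sf_customers
```

In this model the source keeps its original value after the replica is changed, which is the out-of-sync condition the paragraph above describes.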
However, you don’t have to protect a replica from all updates that are not due to replication. For example, the customers table in the newyork cluster might have an additional column family that is not populated with replicated data: reviews.
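One way to reason about this case is to compare only the replicated column families when checking whether a replica matches its source. The sketch below assumes tables modeled as nested dictionaries; replicated_view and REPLICATED_FAMILIES are hypothetical names introduced for illustration, and the sample row values are made up.

```python
REPLICATED_FAMILIES = ("personal", "purchases")


def replicated_view(table, families=REPLICATED_FAMILIES):
    """Return only the replicated column families of every row."""
    return {
        key: {fam: cols for fam, cols in row.items() if fam in families}
        for key, row in table.items()
    }


sf_customers = {"c1": {"personal": {"name": "Alice"}, "purchases": {"count": 2}}}
ny_customers = {
    "c1": {
        "personal": {"name": "Alice"},
        "purchases": {"count": 2},
        "reviews": {"stars": 5},  # written locally in newyork, never replicated
    }
}

# The tables differ as a whole, but the replicated families still match.
whole_tables_equal = ny_customers == sf_customers
replicated_in_sync = replicated_view(ny_customers) == replicated_view(sf_customers)
```

Local writes to reviews change the replica as a whole without affecting the families that replication maintains.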