Setting Up Variations of Master-Slave Replication
These replication topologies are variations of master-slave replication.
Prerequisites
- Configure one or more gateways in the destination cluster. See MapR Gateways.
- If the source and destination clusters are secured, set up security for replication between the clusters. See Configuring MapR Clusters for Replication Between Tables.
- Run the maprcli table info command on the source table to verify that you have the following permissions:
  - readperm, which is required for reading from the table.
  - replperm, which is required for replicating from the table.
- Run the maprcli table info command on the destination table (if it already exists) to verify that you have the following permissions:
  - bulkload, which is required for the initial copy of source data into the destination table.
  - replperm, which is required for receiving replicated updates from the source table.
About this task
Many-to-One Replication
In this topology, two or more source tables replicate to a single replica.
In this diagram, the clusters sanfrancisco, hyderabad, and tokyo each have a table named customers. The operations on these tables are replicated to a replica in the newyork cluster. The blue circles represent gateways.
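The many-to-one wiring can be sketched as a small bash helper. This is a minimal sketch, not an official tool: it assumes the customers replica already exists in the newyork cluster, registers each source with the maprcli table replica add and maprcli table upstream add commands used in the manual setup procedure, and does not copy data that already exists in the source tables. The function name register_many_to_one and the MAPRCLI dry-run variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run hook: set MAPRCLI="echo maprcli" to print commands instead of running them.
MAPRCLI=${MAPRCLI:-maprcli}

# register_many_to_one REPLICA SOURCE...   (hypothetical helper name)
# Registers every SOURCE table with the existing REPLICA table.
# It does not create the replica and does not copy pre-existing source data.
register_many_to_one() {
  [ "$#" -ge 2 ] || { echo "usage: register_many_to_one REPLICA SOURCE..." >&2; return 2; }
  local replica=$1
  shift
  local source
  for source in "$@"; do
    # Source side: declare REPLICA as a replica of this source table.
    $MAPRCLI table replica add -path "$source" -replica "$replica" || return 1
    # Replica side: accept this source table as an upstream (the two-way agreement).
    $MAPRCLI table upstream add -path "$replica" -upstream "$source" || return 1
  done
}
```

For the scenario described above, the call would be register_many_to_one /mapr/newyork/customers /mapr/sanfrancisco/customers /mapr/hyderabad/customers /mapr/tokyo/customers. Because -paused is not passed, replication from each source starts once its upstream registration succeeds.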
One-to-Many Replication
In this topology, a single source table replicates to two or more replicas.
In this diagram, the operations on the customers table in the sanfrancisco cluster are replicated to replicas in the newyork, tokyo, and hyderabad clusters. The blue circles represent gateways.
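One-to-many replication amounts to running maprcli table replica autosetup (described under Automatic Setup) once per replica against the same source table. The bash sketch below assumes each replica table does not exist yet, so that autosetup can create it; the helper name setup_one_to_many and the MAPRCLI dry-run variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run hook: set MAPRCLI="echo maprcli" to print commands instead of running them.
MAPRCLI=${MAPRCLI:-maprcli}

# setup_one_to_many SOURCE REPLICA...   (hypothetical helper name)
# Runs autosetup once per replica and stops at the first failure.
setup_one_to_many() {
  [ "$#" -ge 2 ] || { echo "usage: setup_one_to_many SOURCE REPLICA..." >&2; return 2; }
  local source=$1
  shift
  local replica
  for replica in "$@"; do
    $MAPRCLI table replica autosetup -path "$source" -replica "$replica" || return 1
  done
}
```

For the diagram's scenario: setup_one_to_many /mapr/sanfrancisco/customers /mapr/newyork/customers /mapr/tokyo/customers /mapr/hyderabad/customers. Stopping at the first failure keeps a partially configured topology easy to inspect.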
Master-Slave Replication in Two Directions
In this topology, a source table in one cluster replicates to a replica in a destination cluster, while a source table in that destination cluster replicates to a replica in the cluster where the first source table is located.
This replication topology is a variation of master-slave replication.
In this diagram, operations on the customers table in the sanfrancisco cluster are replicated to the newyork cluster. At the same time, operations on the products table in the newyork cluster are replicated to the sanfrancisco cluster.
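Replication in two directions is two independent master-slave pairs, one per direction. A minimal bash sketch, assuming both replicas are created by maprcli table replica autosetup; the replica path /mapr/sanfrancisco/products, the helper name setup_two_directions, and the MAPRCLI dry-run variable are assumptions, since the text does not name the products replica.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run hook: set MAPRCLI="echo maprcli" to print commands instead of running them.
MAPRCLI=${MAPRCLI:-maprcli}

# setup_two_directions SOURCE_A REPLICA_A SOURCE_B REPLICA_B   (hypothetical helper name)
# Intended layout: REPLICA_A lives in SOURCE_B's cluster and REPLICA_B in SOURCE_A's cluster.
# The function does not check cluster names; it only runs the two autosetup commands.
setup_two_directions() {
  [ "$#" -eq 4 ] || { echo "usage: setup_two_directions SOURCE_A REPLICA_A SOURCE_B REPLICA_B" >&2; return 2; }
  # Direction 1, for example customers: sanfrancisco -> newyork.
  $MAPRCLI table replica autosetup -path "$1" -replica "$2" || return 1
  # Direction 2, for example products: newyork -> sanfrancisco.
  $MAPRCLI table replica autosetup -path "$3" -replica "$4"
}
```

For the diagram's scenario: setup_two_directions /mapr/sanfrancisco/customers /mapr/newyork/customers /mapr/newyork/products /mapr/sanfrancisco/products.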
All updates from a source table arrive at a replica after having been authenticated at a gateway. Therefore, access control expressions on the replica that control permissions for updates to column families and columns are irrelevant; gateways have the implicit authority to update replicas.
Procedure
- Log in to both the source and destination clusters.
- To set up this replication topology, follow one of these procedures:
NOTE: Although this procedure describes the steps to take in the maprcli, you can also set up this replication topology in the MapR Control System (MCS). Log in to MCS and select MapR Tables in the navigation menu. Select a table to be the source table and click the Replicas tab, which contains the actions for setting up replication.
Automatic Setup
About this task
To set up replication automatically, run the maprcli table replica autosetup command, which performs these steps:
- Create a table on the replication cluster. This table has the same column families as the source table.
- Declare the new table to be a replica of the source table.
- Declare the source table as an upstream source for the replica.
- Load a copy of the source data into the replica.
- Start replication.
Example
maprcli table replica autosetup -path <path to source table> -replica <path to replica>
For example, to set up replication from the customers table in the sanfrancisco cluster to a new customers table in the newyork cluster, you could use this command:
maprcli table replica autosetup -path /mapr/sanfrancisco/customers -replica /mapr/newyork/customers
To set up replication from the customersA table in the sanfrancisco cluster to a new customersB table in the same cluster, you could use this command:
maprcli table replica autosetup -path /mapr/sanfrancisco/customersA -replica /mapr/sanfrancisco/customersB
This command takes three optional parameters:
- -columns: The value is a comma-separated list of items, each with one of the following forms:
  <column family>
  <column family>:<column>
  For example, to replicate only the column family purchases and the column stars in the reviews column family, the value would look like this: -columns purchases,reviews:stars
- -synchronous: This parameter specifies whether replication is synchronous or asynchronous. Asynchronous is the default. The values are yes for synchronous and no for asynchronous.
- -multimaster: This parameter specifies whether to set up a multi-master topology. The values are yes and no. For setting up a basic master-slave topology, accept the default value.
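The optional parameters can be assembled by a small wrapper so that only the options you supply are passed to autosetup. A minimal bash sketch; the helper name autosetup_replica and the MAPRCLI dry-run variable are hypothetical, and -multimaster is deliberately never passed so that its default applies, as recommended above for master-slave replication.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run hook: set MAPRCLI="echo maprcli" to print commands instead of running them.
MAPRCLI=${MAPRCLI:-maprcli}

# autosetup_replica SOURCE REPLICA [COLUMNS] [SYNC]   (hypothetical helper name)
#   COLUMNS  optional value for -columns, e.g. "purchases,reviews:stars"
#   SYNC     optional value for -synchronous: "yes" or "no" (asynchronous is the default)
autosetup_replica() {
  [ "$#" -ge 2 ] && [ "$#" -le 4 ] || { echo "usage: autosetup_replica SOURCE REPLICA [COLUMNS] [SYNC]" >&2; return 2; }
  local source=$1 replica=$2 columns=${3:-} sync=${4:-}
  # Rebuild the positional parameters as the maprcli argument list.
  set -- table replica autosetup -path "$source" -replica "$replica"
  if [ -n "$columns" ]; then
    set -- "$@" -columns "$columns"
  fi
  if [ -n "$sync" ]; then
    case $sync in
      yes|no) set -- "$@" -synchronous "$sync" ;;
      *) echo "SYNC must be yes or no" >&2; return 2 ;;
    esac
  fi
  $MAPRCLI "$@"
}
```

For example, autosetup_replica /mapr/sanfrancisco/customers /mapr/newyork/customers purchases,reviews:stars yes replicates only the purchases column family and the reviews:stars column, synchronously. Pass an empty COLUMNS argument ("") to set only SYNC.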
Manual Setup
About this task
To set up replication manually by using the maprcli, follow these steps:
Procedure
- Create the replica manually with the maprcli table create command. Use the -copymetafrom option to ensure that the metadata for the replica is identical to the metadata for the source table. Metadata includes column families, access control expressions (ACEs), and other attributes.
maprcli table create -path <path to the replica> -copymetafrom <path to the source table>
For example, to create the replica customers in the newyork cluster and use the metadata from the source table in the sanfrancisco cluster, you could use this command:
maprcli table create -path /mapr/newyork/customers -copymetafrom /mapr/sanfrancisco/customers
- Register the replica as a replica of the source table by running the maprcli table replica add command.
maprcli table replica add -path <path to the source table> -replica <path to the replica> -paused true
For example, to register the customers table in the newyork cluster as a replica of the customers table in the sanfrancisco cluster, you could use this command:
maprcli table replica add -path /mapr/sanfrancisco/customers -replica /mapr/newyork/customers -paused true
The -paused parameter ensures that replication does not start immediately after you register the source table as a source for this replica. You do this registration in step d.
- Verify that you specified the correct replica by running the maprcli table replica list command.
maprcli table replica list -path <path to the source table>
To verify that the customers table in the newyork cluster is a replica of the customers table in the sanfrancisco cluster, you could look at the output of this command:
maprcli table replica list -path /mapr/sanfrancisco/customers
- Authorize replication between the tables by defining the source table as the upstream table for the replica by running the maprcli table upstream add command. Defining the upstream table ensures that a table cannot replicate updates to a replica that has not accepted it as a source: replication depends on a two-way agreement between the owners of the two tables.
maprcli table upstream add -path <path to the replica> -upstream <path to the source table>
For example, to add the customers table in the sanfrancisco cluster as an upstream source for the customers table in the newyork cluster, you could use this command:
maprcli table upstream add -path /mapr/newyork/customers -upstream /mapr/sanfrancisco/customers
- Verify that you specified the correct source table by running the maprcli table upstream list command.
maprcli table upstream list -path <path to the replica>
To verify this in our example scenario, you could use this command:
maprcli table upstream list -path /mapr/newyork/customers
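- If you set -paused to true when adding the replica, load the source data into the replica (for example, by using the MapR-DB CopyTable utility) and then start replication by running the maprcli table replica resume command.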
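The manual registration steps (a through e) can be collected into a bash sketch. It is a minimal illustration, not an official script: manual_setup runs the steps in order, stops at the first failing command, and leaves replication paused so that the existing source data can be loaded first. The separate resume_replication helper then starts replication; the -path and -replica flags it passes to maprcli table replica resume are an assumption modeled on maprcli table replica add. The function names and the MAPRCLI dry-run variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run hook: set MAPRCLI="echo maprcli" to print commands instead of running them.
MAPRCLI=${MAPRCLI:-maprcli}

# manual_setup SOURCE REPLICA   (hypothetical helper name)
# Runs manual steps a-e; replication is left paused.
manual_setup() {
  [ "$#" -eq 2 ] || { echo "usage: manual_setup SOURCE REPLICA" >&2; return 2; }
  local source=$1 replica=$2
  # a. Create the replica with the source table's metadata (column families, ACEs, ...).
  $MAPRCLI table create -path "$replica" -copymetafrom "$source" || return 1
  # b. Register the replica on the source side, paused so no updates flow yet.
  $MAPRCLI table replica add -path "$source" -replica "$replica" -paused true || return 1
  # c. Print the source table's replicas so you can check the registration.
  $MAPRCLI table replica list -path "$source" || return 1
  # d. Authorize replication on the replica side (the two-way agreement).
  $MAPRCLI table upstream add -path "$replica" -upstream "$source" || return 1
  # e. Print the replica's upstream sources so you can check the authorization.
  $MAPRCLI table upstream list -path "$replica"
}

# resume_replication SOURCE REPLICA   (hypothetical helper name)
# Assumption: maprcli table replica resume takes -path (source) and -replica.
resume_replication() {
  [ "$#" -eq 2 ] || { echo "usage: resume_replication SOURCE REPLICA" >&2; return 2; }
  $MAPRCLI table replica resume -path "$1" -replica "$2"
}
```

In the example scenario you would run manual_setup /mapr/sanfrancisco/customers /mapr/newyork/customers, review the two list outputs, load the data, and then run resume_replication with the same two paths.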
What to do next
About this task
When you add a column family to a source table, the new column family is not automatically created at the replica. If the entire source table is being replicated (that is, the maprcli table replica add command did not specify column families or columns to replicate), updates to the new column family immediately start being replicated, and you do not need to carry out the next steps. Continue only if you are replicating a subset of column families and columns.
If you are replicating a subset of column families and columns, follow these steps to add a new column family to the replica:
Procedure
- Pause replication by running the maprcli table replica pause command.
- Add the new column family to the replica by running the maprcli table replica edit command.
- Copy the data for this column family from the source table into the replica by using the MapR-DB CopyTable utility. Use the -columns parameter to specify the name of the column family.
- Resume replication by running the maprcli table replica resume command.
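The four steps for adding a column family to a partially replicated table can be sketched in bash as follows. Several details are assumptions, not taken from this page: that maprcli table replica pause, edit, and resume take -path (the source table) and -replica; that -columns for replica edit takes the complete, updated list of replicated column families and columns; and that the CopyTable utility is invoked as mapr copytable with -src and -dst. The helper name add_replicated_column_family and the MAPRCLI and COPYTABLE override variables are hypothetical. If the copy fails, the function returns without resuming, so replication stays paused until you fix the problem.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run hooks: set MAPRCLI="echo maprcli" and COPYTABLE="echo mapr copytable".
MAPRCLI=${MAPRCLI:-maprcli}
COPYTABLE=${COPYTABLE:-mapr copytable}   # assumed CopyTable invocation

# add_replicated_column_family SOURCE REPLICA COLUMNS NEW_CF   (hypothetical helper name)
#   COLUMNS  full -columns list for the replica, including NEW_CF (assumption)
#   NEW_CF   the column family whose existing data must be copied
add_replicated_column_family() {
  [ "$#" -eq 4 ] || { echo "usage: add_replicated_column_family SOURCE REPLICA COLUMNS NEW_CF" >&2; return 2; }
  local source=$1 replica=$2 columns=$3 new_cf=$4
  # 1. Pause replication.
  $MAPRCLI table replica pause -path "$source" -replica "$replica" || return 1
  # 2. Add the new column family to the set replicated to this replica.
  $MAPRCLI table replica edit -path "$source" -replica "$replica" -columns "$columns" || return 1
  # 3. Copy the existing data for the new column family only.
  $COPYTABLE -src "$source" -dst "$replica" -columns "$new_cf" || return 1
  # 4. Resume replication.
  $MAPRCLI table replica resume -path "$source" -replica "$replica"
}
```

For example, if the newyork replica currently receives purchases and reviews:stars and a returns column family has been added to the source, you might run add_replicated_column_family /mapr/sanfrancisco/customers /mapr/newyork/customers purchases,reviews:stars,returns returns.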