Setting Up Master-Slave Replication of MapR Streams Streams
In this topology, you replicate in one direction. Replica streams can be in a remote MapR cluster or in the MapR cluster where their source streams are located.
Prerequisites
About this task
In this diagram, changes to the activity stream in the sanfrancisco cluster are replicated to the activity stream in the newyork cluster. The circles represent MapR gateways.
In this diagram, changes to the activity1 stream in the sanfrancisco cluster are replicated to the activity2 stream within the same cluster.
All updates from a source stream arrive at a replica stream after having been authenticated at a gateway. Therefore, the produceperm access control expression on the replica stream is irrelevant; gateways have the implicit authority to publish messages to topics in replica streams.
- You must replicate all of the topics that are in a stream. You cannot select only a subset of topics to replicate.
- The maximum number of replicas that a stream can replicate to is 64.
- The maximum number of upstream sources that a replica can accept data from is 64.
- In a Master-Slave setup, you cannot have two masters with the same topic name replicating to the same slave. It creates a conflict for that topic name. This is similar to Multi-Master replication where you must have separate topic names for Master1 (Cluster1) and Master2 (Cluster2).
- In many-to-one replication, topics with the same name should not be replicated to an aggregate replica.
For example, suppose that a company wants to replicate the activity stream in the sanfrancisco cluster and the activity stream in the newyork cluster to the activity stream in the singapore cluster. Both replicated streams contain a topic named topic_1. The offsets for messages in the sanfrancisco version of topic_1 might also be used for messages in the newyork version. Therefore, replication of both topics to the same stream could lead to conflicts in the replica stream.
To replicate the activity streams without such conflicts, it's safest to replicate each to a separate stream. In this diagram, the blue lines show the replication of the activity stream in the sanfrancisco cluster to the activity_sf stream in the singapore cluster, and the green lines show the replication of the activity stream in the newyork cluster to the activity_ny stream in the singapore cluster. Even though topic names are the same in both replicated streams, there are no conflicts.
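The separate-replica layout in this example can be sketched as a short shell script. This is only a dry-run sketch under stated assumptions: it prints commands rather than running them, it uses the stream paths from the example and the command forms shown in the Procedure section, and the names DEST_CLUSTER and plan_replica are hypothetical helpers, not part of maprcli.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would give each source stream its
# own replica in the singapore cluster. Nothing is sent to a cluster.
# DEST_CLUSTER and plan_replica are illustrative names (assumptions).

DEST_CLUSTER="singapore"

plan_replica() {
  # $1 = source cluster name, $2 = suffix for the replica stream name
  src="/mapr/$1/activity"
  dst="/mapr/$DEST_CLUSTER/activity_$2"
  # Create the replica, register it (paused), then authorize the source.
  echo "maprcli stream create -path $dst -copymetafrom $src"
  echo "maprcli stream replica add -path $src -replica $dst -paused true"
  echo "maprcli stream upstream add -path $dst -upstream $src"
}

# One replica per source stream, so identical topic names never meet.
plan_replica sanfrancisco sf
plan_replica newyork ny
```

Because each source writes to its own replica stream, topic_1 from sanfrancisco and topic_1 from newyork never share offsets. The verification, load, and resume steps are left to the procedure itself.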
Procedure
- Set up replication automatically by following these steps:
- Set up replication manually by following these steps:
- Create the replica manually with the maprcli stream create command. Use the -copymetafrom option to ensure that the metadata for the replica is identical to the metadata for the source stream.
maprcli stream create -path <path to the replica> -copymetafrom <path to the source stream>
For example, to create the replica activity in the newyork cluster and use the metadata from the source stream in the sanfrancisco cluster, you could use this command:
maprcli stream create -path /mapr/newyork/activity -copymetafrom /mapr/sanfrancisco/activity
- Register the replica as a replica of the source stream by running the maprcli stream replica add command.
maprcli stream replica add -path <path to the source stream> -replica <path to the replica> -paused true
For example, to register the activity stream in the newyork cluster as a replica of the activity stream in the sanfrancisco cluster, you could use this command:
maprcli stream replica add -path /mapr/sanfrancisco/activity -replica /mapr/newyork/activity -paused true
The -paused parameter ensures that replication does not start immediately after you register the source stream as a source for this replica. You do this registration in step 4.
- Verify that you specified the correct replica by running the maprcli stream replica list command.
maprcli stream replica list -path <path to the source stream>
To verify that the activity stream in the newyork cluster is a replica of the activity stream in the sanfrancisco cluster, you could look at the output of this command:
maprcli stream replica list -path /mapr/sanfrancisco/activity
- Authorize replication between the streams by defining the source stream as the upstream stream for the replica. To do so, run the maprcli stream upstream add command. Defining the upstream stream ensures that a stream cannot replicate updates to just any replica: replication depends on a two-way agreement between the owners of the two streams.
maprcli stream upstream add -path <path to the replica> -upstream <path to the source stream>
To add the activity stream in the sanfrancisco cluster as an upstream source for the activity stream in the newyork cluster:
maprcli stream upstream add -path /mapr/newyork/activity -upstream /mapr/sanfrancisco/activity
- Verify that you specified the correct source stream by running the maprcli stream upstream list command.
maprcli stream upstream list -path <path to the replica>
To verify this in our example scenario, you could use this command:
maprcli stream upstream list -path /mapr/newyork/activity
- Load the replica with data from the source stream by using the mapr copystream utility.
- Start replication with the maprcli stream replica resume command.
maprcli stream replica resume -path <path to the source stream> -replica <path to the replica>
For our example scenario, you could use this command:
maprcli stream replica resume -path /mapr/sanfrancisco/activity -replica /mapr/newyork/activity
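The manual steps above could be collected into one script along the following lines. This is a minimal sketch, not a tested procedure: the SRC, REPLICA, and DRY_RUN variables and the run and replicate_manually helpers are assumptions introduced for illustration, and by default the script only prints the commands. The mapr copystream call is left as a comment because its arguments are not shown in this section.

```shell
#!/bin/sh
# Sketch of the manual procedure as one script. SRC, REPLICA, DRY_RUN, run, and
# replicate_manually are illustrative names (assumptions), not part of maprcli.
set -e

SRC="${SRC:-/mapr/sanfrancisco/activity}"
REPLICA="${REPLICA:-/mapr/newyork/activity}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

run() {
  # Print the command in dry-run mode, otherwise execute it as given.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

replicate_manually() {
  # Create the replica with the source stream's metadata.
  run maprcli stream create -path "$REPLICA" -copymetafrom "$SRC"
  # Register the replica, paused so that nothing replicates yet.
  run maprcli stream replica add -path "$SRC" -replica "$REPLICA" -paused true
  run maprcli stream replica list -path "$SRC"
  # Authorize the source stream as an upstream of the replica.
  run maprcli stream upstream add -path "$REPLICA" -upstream "$SRC"
  run maprcli stream upstream list -path "$REPLICA"
  # Load existing data with the mapr copystream utility at this point
  # (omitted here: its arguments are not given in this section).
  # Start replication.
  run maprcli stream replica resume -path "$SRC" -replica "$REPLICA"
}

replicate_manually
```

Keeping the dry run as the default lets you review the exact paths before anything touches either cluster; setting DRY_RUN=0 would execute the same commands in the same order, stopping at the first failure because of set -e.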