Setting Up Master-Slave Replication of MapR Streams

In this topology, you replicate in one direction. Replica streams can be in a remote MapR cluster or in the MapR cluster where their source streams are located.

Prerequisites

About this task

In this diagram, changes to the activity stream in the sanfrancisco cluster are replicated to the activity stream in the newyork cluster. The circles represent MapR gateways.

In this diagram, changes to the activity1 stream in the sanfrancisco cluster are replicated to the activity2 stream within the same cluster.

All updates from a source stream arrive at a replica stream after having been authenticated at a gateway. Therefore, the produceperm access control expression on the replica stream is irrelevant; gateways have the implicit authority to publish messages to topics in replica streams.

Restrictions:
  • You must replicate all of the topics that are in a stream. You cannot select only a subset of topics to replicate.
  • The maximum number of replicas that a stream can replicate to is 64.
  • The maximum number of upstream sources that a replica can accept data from is 64.
  • In a Master-Slave setup, you cannot have two masters with the same topic name replicating to the same slave, because doing so creates a conflict for that topic name. This is similar to Multi-Master replication, where you must have separate topic names for Master1 (Cluster1) and Master2 (Cluster2).
  • In many-to-one replication, topics with the same name should not be replicated to an aggregate replica.

    For example, suppose that a company wants to replicate the activity stream in the sanfrancisco cluster and the activity stream in the newyork cluster to the activity stream in the singapore cluster. Both the replicated streams contain a topic named topic_1. The offsets for messages in the sanfrancisco version of topic_1 might also be used for messages in the newyork version. Therefore, replication of both topics to the same stream could lead to conflicts in the replica stream.

    To avoid such conflicts, it is safest to replicate each activity stream to a separate stream. In this diagram, the blue lines show the replication of the activity stream in the sanfrancisco cluster to the activity_sf stream in the singapore cluster, and the green lines show the replication of the activity stream in the newyork cluster to the activity_ny stream in the singapore cluster. Even though topic names are the same in both replicated streams, there are no conflicts.
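As a rough illustration of the many-to-one restriction, the following sketch checks two lists of topic names for overlap before both streams are replicated into one aggregate replica. The function name common_topics and the topic names other than topic_1 are hypothetical, and the lists are hard-coded here rather than read from a cluster.

```shell
#!/bin/sh
# Sketch: find topic names that appear in both source streams.
# Any name printed here would conflict if both streams were replicated
# to the same replica stream.

# common_topics LIST_A LIST_B
# Each argument is a space-separated list of topic names.
common_topics() {
    for a in $1; do
        for b in $2; do
            if [ "$a" = "$b" ]; then
                printf '%s\n' "$a"
            fi
        done
    done
    return 0
}

# Hypothetical topic lists for the two activity streams.
sf_topics="topic_1 topic_2"
ny_topics="topic_1 topic_3"

common_topics "$sf_topics" "$ny_topics"
```

If the check prints any names, replicating each source stream to its own replica stream (such as activity_sf and activity_ny) avoids the conflict.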

Procedure

For each source/replica pair, use one of these methods:
  • Set up replication automatically by following these steps:
    1. Log in to both the source and destination clusters.
    2. Run the command maprcli stream replica autosetup, which performs these steps:
      1. Create a stream on the destination cluster. If the replica already exists, the command terminates with an error message.
      2. Declare the new stream to be a replica of the source stream.
      3. Declare the source stream as an upstream source for the replica stream.
      4. Copy messages from the source stream to the replica stream.
      5. Start replication.

      Here is the syntax of the command:

      maprcli stream replica autosetup -path <path to source stream> \
        -replica <path to replica stream>
      NOTE: This command sets up asynchronous replication. If you want to set up synchronous replication or use any of the other optional parameters, see stream replica autosetup.

      For example, to set up replication from the activity stream in the sanfrancisco cluster to a new activity stream in the newyork cluster, you could use this command:

      maprcli stream replica autosetup -path /mapr/sanfrancisco/activity \
        -replica /mapr/newyork/activity

      To set up replication from the activityA stream in the sanfrancisco cluster to a new activityB stream in the same cluster, you could use this command:

      maprcli stream replica autosetup -path /mapr/sanfrancisco/activityA \
        -replica /mapr/sanfrancisco/activityB
  • Set up replication manually by following these steps:
    1. Create the replica manually with the maprcli stream create command. Use the -copymetafrom option to ensure that the metadata for the replica is identical to the metadata for the source stream.
      maprcli stream create -path <path to the replica> \
        -copymetafrom <path to the source stream>

      For example, to create the replica activity in the newyork cluster and use the metadata from the source stream in the sanfrancisco cluster, you could use this command:

      maprcli stream create -path /mapr/newyork/activity \
        -copymetafrom /mapr/sanfrancisco/activity
    2. Register the replica as a replica of the source stream by running the maprcli stream replica add command.
      maprcli stream replica add -path <path to the source stream> \
        -replica <path to the replica> -paused true

      For example, to register the activity stream in the newyork cluster as a replica of the activity stream in the sanfrancisco cluster, you could use this command:

      maprcli stream replica add -path /mapr/sanfrancisco/activity \
        -replica /mapr/newyork/activity -paused true

      The -paused parameter ensures that replication does not start as soon as you register the source stream as an upstream source for this replica, which you do in step 4.

    3. Verify that you specified the correct replica by running the maprcli stream replica list command.
      maprcli stream replica list -path <path to the source stream>

      To verify that the activity stream in the newyork cluster is a replica of the activity stream in the sanfrancisco cluster, you could look at the output of this command:

      maprcli stream replica list -path /mapr/sanfrancisco/activity
    4. Authorize replication between the streams by running the maprcli stream upstream add command, which defines the source stream as the upstream stream for the replica.

      Defining the upstream stream ensures that a stream cannot replicate updates to just any replica. Replication depends on a two-way agreement between the owners of the two streams.

      maprcli stream upstream add -path <path to the replica> \
        -upstream <path to the source stream>

      To add the activity stream in the sanfrancisco cluster as an upstream source for the activity stream in the newyork cluster:

      maprcli stream upstream add -path /mapr/newyork/activity \
        -upstream /mapr/sanfrancisco/activity
    5. Verify that you specified the correct source stream by running the maprcli stream upstream list command.
      maprcli stream upstream list -path <path to the replica>

      To verify this in our example scenario, you could use this command:

      maprcli stream upstream list -path /mapr/newyork/activity
    6. Load the replica with data from the source stream by using the mapr copystream utility.
    7. Start replication with the command maprcli stream replica resume.

      Here is the syntax of the command:

      maprcli stream replica resume -path <path to the source stream> \
        -replica <path to the replica>

      For our example scenario, you could use this command:

      maprcli stream replica resume -path /mapr/sanfrancisco/activity \
        -replica /mapr/newyork/activity
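Both methods in the procedure can be sketched as a dry-run script that only prints the maprcli commands for a given source/replica pair, so that they can be reviewed before anything is run. The function names autosetup_command and manual_commands are hypothetical helpers, not part of maprcli, and the sketch uses only the parameters shown in the steps above. The mapr copystream step is left as a comment because its options are not given here.

```shell
#!/bin/sh
# Dry-run sketch: prints maprcli commands instead of executing them.

# autosetup_command SOURCE REPLICA
# Prints the single command used for automatic setup.
autosetup_command() {
    printf 'maprcli stream replica autosetup -path %s -replica %s\n' "$1" "$2"
}

# manual_commands SOURCE REPLICA
# Prints the commands for manual setup, one per line, in step order.
manual_commands() {
    src=$1
    replica=$2
    # Step 1: create the replica with the source stream's metadata.
    printf 'maprcli stream create -path %s -copymetafrom %s\n' "$replica" "$src"
    # Step 2: register the replica, paused so replication does not start yet.
    printf 'maprcli stream replica add -path %s -replica %s -paused true\n' "$src" "$replica"
    # Step 3: verify the registered replica.
    printf 'maprcli stream replica list -path %s\n' "$src"
    # Step 4: authorize replication from the replica's side.
    printf 'maprcli stream upstream add -path %s -upstream %s\n' "$replica" "$src"
    # Step 5: verify the upstream source.
    printf 'maprcli stream upstream list -path %s\n' "$replica"
    # Step 6: load the replica with mapr copystream (not printed here,
    # because its options are not shown in this procedure).
    # Step 7: start replication.
    printf 'maprcli stream replica resume -path %s -replica %s\n' "$src" "$replica"
}

# Example scenario from this procedure.
autosetup_command /mapr/sanfrancisco/activity /mapr/newyork/activity
manual_commands /mapr/sanfrancisco/activity /mapr/newyork/activity
```

Printing rather than executing keeps the sketch safe to run on any machine; the printed commands still need to be run by a user who is logged in to both clusters.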