Setting Up Multi-Master Replication of MapR Streams

In this replication topology, there are two master-slave relationships, with each stream acting as both a master and a slave. Client applications update both streams, and each stream replicates updates to the other.

Prerequisites

  • Configure the MapR clusters that will be involved in your replication scenario. See Configuring MapR Clusters for Replication Between Streams.
  • Run the command maprcli stream info on each stream to verify that you have the adminperm permission.
  • Ensure that producers are not publishing messages to either stream.
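The permission check in the prerequisites can be scripted. The following is a minimal sketch, not an official procedure: the helper name show_adminperm is hypothetical, and it assumes that the output of maprcli stream info (requested here with the general -json option) contains a line that names adminperm. The function only prints that line; you still need to read the value to confirm that your user is covered.

```shell
# Print the adminperm setting for each stream that will take part in replication.
# Stream paths are passed as arguments.
show_adminperm() {
  for stream in "$@"; do
    echo "== $stream"
    # Assumption: the stream info output includes a line containing "adminperm".
    maprcli stream info -path "$stream" -json | grep -i adminperm \
      || echo "adminperm not found in output for $stream"
  done
}
```

For the streams used as examples on this page, the call would be: show_adminperm /mapr/sanfrancisco/activity /mapr/newyork/activity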

About this task

All updates from a source stream arrive at a replica after having been authenticated at a gateway.

In this diagram, the activity stream on the cluster sanfrancisco replicates updates to the activity stream in the cluster newyork. The latter stream in turn replicates updates to the former stream. MapR-DB tags each operation with the universally unique ID (UUID) that it has assigned the stream at which the operation originated. Therefore, operations are replicated only once and are not replicated back to the originating stream.

In this diagram, both streams are in a single cluster. Operations on stream activity1 are replicated to stream activity2 and vice versa.

Restrictions:

  • You must replicate all of the topics that are in a stream. You cannot select only a subset of topics to replicate.
  • The maximum number of replicas that a stream can replicate to is 64.
  • The maximum number of upstream sources that a replica can accept data from is 64.
  • Topic names must be unique across all of the streams in the topology. Messages in a topic are assigned sequential offsets, so if producers publish to a topic of the same name on both streams, the offsets for messages in one copy could conflict with the offsets for messages in the other copy. As a result, messages could be lost.

    In this diagram, producers on both streams are publishing messages to the same topic. This will not work reliably and should not be done.

    In this diagram, topic names are unique to each stream. Therefore, offsets for the messages published to the topics cannot conflict.
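One way to check the topic-name restriction before you start is to compare the topic names of the two streams. The sketch below is an illustration under assumptions: the helper name shared_topics is hypothetical, and it expects two plain-text files that each list one topic name per line. How you produce those files (for example, from the output of maprcli stream topic list) depends on the output format of your version and is not shown here.

```shell
# Print every topic name that appears in both lists.
# $1, $2: files containing one topic name per line.
shared_topics() {
  # De-duplicate each list first, so that a name is reported only when it
  # occurs in both files, not when it is repeated inside one file.
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```

If the function prints nothing, no topic name is shared and the offsets of the two streams cannot conflict. Any name that it prints should be renamed on one side before producers publish to both streams.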

Procedure

To set up multi-master replication of streams:
  1. Log in to both the source and destination clusters.
  2. Run the command maprcli stream replica autosetup, which performs these steps for you:
    1. Creates a stream on the destination cluster. If the replica already exists, the command terminates with an error message.
    2. Declares the new stream to be a replica of the source stream.
    3. Declares the source stream to be an upstream source for the replica stream.
    4. Copies messages from the source stream into the replica stream.
    5. Declares the source stream to be a replica of the new stream.
    6. Declares the new stream to be an upstream source for the source stream.
    7. Starts replication.

    Here is the syntax of the command:

    maprcli stream replica autosetup -path <path to source stream> \
    -replica <path to replica stream> -multimaster yes

    For example, to set up replication between the activity stream in the sanfrancisco cluster and a new activity stream in the newyork cluster, you could use this command:

    maprcli stream replica autosetup -path /mapr/sanfrancisco/activity \
    -replica /mapr/newyork/activity -multimaster yes

    This procedure sets up asynchronous replication, which is the default. To set up synchronous replication, add the optional -synchronous parameter, which accepts yes for synchronous replication and no for asynchronous replication.
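The command and its optional -synchronous parameter can be wrapped in a small script. This is a sketch under assumptions rather than a supported tool: the function names setup_multimaster and show_replication are hypothetical. The second function assumes that the maprcli stream replica list and maprcli stream upstream list commands are available in your version, as a way to inspect what autosetup recorded for a stream.

```shell
# Set up multi-master replication between a source stream and a new replica.
# $1: path to the source stream
# $2: path to the replica stream (autosetup fails if it already exists)
# $3: optional; yes for synchronous replication, no (the default) for asynchronous
setup_multimaster() {
  src=$1
  replica=$2
  sync=${3:-no}
  maprcli stream replica autosetup -path "$src" -replica "$replica" \
    -multimaster yes -synchronous "$sync"
}

# Show the replicas and the upstream sources registered for one stream.
# In a multi-master setup, each stream should list the other in both outputs.
show_replication() {
  maprcli stream replica list -path "$1"
  maprcli stream upstream list -path "$1"
}
```

For example, setup_multimaster /mapr/sanfrancisco/activity /mapr/newyork/activity yes requests synchronous replication between the two example streams, and show_replication /mapr/sanfrancisco/activity can then be run against each stream in turn.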