Stream Replication

You can replicate streams to other data-fabric clusters worldwide, or to other streams within a data-fabric cluster.

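Replicating HPE Ezmeral Data Fabric Streams streams is useful in many scenarios, such as keeping a backup of a stream, consolidating streams from several sites into one, and keeping two streams synchronized with each other.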

Basic Primary-Secondary Replication

For example, suppose that your company has a factory in Nagoya where sensors in the equipment track different metrics. The sensors are producers that publish messages to a stream named metrics. The applications that use the collected metrics read the messages from the stream and act as consumers. With replication, the factory can create the metrics stream in the nagoya cluster and maintain a backup of it in the nagoya_ha cluster.

This type of replication is called basic primary-secondary replication because replication runs in one direction only. The metrics stream in the nagoya_ha cluster is the replica, and the original metrics stream in the nagoya cluster is its upstream source. This type of replication is simple to set up with the command maprcli stream replica autosetup.
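To make the one-way data flow concrete, here is a minimal Python sketch that models a stream as an in-memory map from topic names to message lists, where a message's offset is its position in the list. The Stream class and the replicate_one_way function are hypothetical illustrations, not part of the HPE Ezmeral Data Fabric Streams API, and the gateways that actually perform replication are not modeled.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory model of a stream: topic name -> list of messages,
# where a message's offset is its index in the list. This is NOT the
# HPE Ezmeral Data Fabric Streams API; it only illustrates the data flow.
@dataclass
class Stream:
    cluster: str
    topics: dict[str, list[str]] = field(default_factory=dict)

    def produce(self, topic: str, message: str) -> int:
        """Append a message to a topic and return its offset."""
        log = self.topics.setdefault(topic, [])
        log.append(message)
        return len(log) - 1


def replicate_one_way(upstream: Stream, replica: Stream) -> None:
    """Copy every message the replica does not yet have, preserving offsets.

    Replication runs in one direction only: nothing in the replica is
    ever copied back to the upstream source.
    """
    for topic, log in upstream.topics.items():
        target = replica.topics.setdefault(topic, [])
        # Messages already replicated occupy offsets 0..len(target)-1.
        target.extend(log[len(target):])


primary = Stream("nagoya")
backup = Stream("nagoya_ha")

primary.produce("line_2", "temp=71.3")
primary.produce("line_2", "temp=71.9")
replicate_one_way(primary, backup)
print(backup.topics)  # {'line_2': ['temp=71.3', 'temp=71.9']}
```

In this model, writing a message directly to backup and calling replicate_one_way again leaves primary unchanged, which is what distinguishes this topology from multi-master replication.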

Suppose further that your company also has a factory in Kaesong that collects metrics from its equipment, analyzes the data, and replicates its own metrics stream to a backup.

Many-to-One Replication

Your company's headquarters is in San Francisco, and you want data analysts there to analyze all data company-wide. You can replicate the metrics streams from your two factories to a metrics stream in the sanfrancisco cluster. In this scenario, the replica is the metrics stream in the sanfrancisco cluster, and it has two upstream sources: the metrics streams of the two factories.

This type of replication, called many-to-one replication, requires that topic names be unique across all of the upstream streams, so that message offsets do not conflict. For example, suppose both factories have an assembly line named Line 2, and each factory's stream collects metrics from that line in a topic named line_2. Each factory's line_2 topic assigns offsets independently, so at some point the Nagoya factory and the Kaesong factory both replicate messages with the same offsets to the line_2 topic in the replica. Because offsets are replicated together with messages, messages from one factory can overwrite messages from the other.
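The following Python sketch models why duplicate topic names break many-to-one replication. It assumes, for illustration only, that the replica stores each message under its (topic, offset) pair; the function replicate_many_to_one and the sample messages are hypothetical and do not reflect how the product stores data internally.

```python
# Hypothetical model: each source topic is a list whose indexes are offsets.
# The replica stores messages keyed by (topic, offset), because offsets are
# replicated together with the messages.
def replicate_many_to_one(
    sources: dict[str, dict[str, list[str]]],
) -> dict[tuple[str, int], str]:
    replica: dict[tuple[str, int], str] = {}
    for topics in sources.values():
        for topic, log in topics.items():
            for offset, message in enumerate(log):
                # A later source writing the same (topic, offset) silently
                # overwrites the message an earlier source wrote there.
                replica[(topic, offset)] = message
    return replica


sources = {
    "nagoya": {"line_2": ["nagoya-0", "nagoya-1"]},
    "kaesong": {"line_2": ["kaesong-0", "kaesong-1"]},
}
replica = replicate_many_to_one(sources)
print(len(replica))            # 2, not 4: Kaesong's messages replaced Nagoya's
print(replica[("line_2", 0)])  # kaesong-0
```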

To avoid this type of problem, the sensors for Line 2 in the Nagoya factory might publish to a topic named line_2_nagoya, the sensors for Line 2 in the Kaesong factory might publish to a topic named line_2_kaesong, and so on. The consolidated stream in San Francisco would contain the topics line_2_nagoya and line_2_kaesong.
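A naming convention like this can be checked before you set up replication. The sketch below assumes that you already have the topic names of each upstream stream (how you list them is outside its scope); the helpers site_topic and find_topic_collisions, and the stream labels nagoya and kaesong, are hypothetical and only show the check itself.

```python
from collections import defaultdict


def site_topic(base: str, site: str) -> str:
    """Build a site-qualified topic name, e.g. line_2 + nagoya -> line_2_nagoya."""
    return f"{base}_{site}"


def find_topic_collisions(streams: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each topic name found in more than one stream to the sorted
    names of the streams that contain it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for stream_name, topics in streams.items():
        for topic in topics:
            owners[topic].append(stream_name)
    return {t: sorted(s) for t, s in owners.items() if len(s) > 1}


# Before renaming: both factories use line_2, so the topics collide.
print(find_topic_collisions({"nagoya": {"line_2"}, "kaesong": {"line_2"}}))
# {'line_2': ['kaesong', 'nagoya']}

# After adding a site suffix: no collisions.
renamed = {site: {site_topic("line_2", site)} for site in ("nagoya", "kaesong")}
print(find_topic_collisions(renamed))  # {}
```

An empty result means every topic name is unique across the streams, which is the condition that many-to-one replication requires.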

Multi-Master Replication

Another kind of replication that can be useful is multi-master replication. Use it when you need two streams that each send updates to and receive updates from the other. Each stream is both a replica and an upstream source, and HPE Ezmeral Data Fabric Streams keeps the two streams synchronized with each other. This type of replication is also simple to set up with the command maprcli stream replica autosetup.

As with many-to-one replication, the names of the topics in each stream must be unique across both streams, so that offsets for messages do not conflict.
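The sketch below models a multi-master pair as two in-memory maps from topic names to message lists, with each list index serving as the offset. It assumes that every topic is written in only one of the two streams, as the uniqueness rule requires; stream_a, stream_b, events_a, events_b, push, and sync are hypothetical names, and the model does not reflect how data-fabric gateways actually propagate updates.

```python
# Hypothetical model of multi-master replication between two streams.
# Each stream maps topic -> list of messages (list index = offset).
# Assumes every topic name is unique to one stream, so each topic has
# exactly one stream where producers write to it.
Stream = dict[str, list[str]]


def push(source: Stream, target: Stream) -> None:
    """Copy the messages the target is missing, preserving offsets."""
    for topic, log in source.items():
        copy = target.setdefault(topic, [])
        copy.extend(log[len(copy):])


def sync(a: Stream, b: Stream) -> None:
    """Each stream acts as both a replica and an upstream source of the other."""
    push(a, b)
    push(b, a)


stream_a: Stream = {"events_a": ["a1"]}
stream_b: Stream = {"events_b": ["b1", "b2"]}
sync(stream_a, stream_b)
print(stream_a == stream_b)  # True
```

If both streams wrote to a topic with the same name, push in this model would treat messages at offsets the target already holds as replicated and skip them, which is another way of seeing why the names must be unique.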

Updates are applied to replica streams by data-fabric gateways. See Gateways and Stream Replication for more information.