Mirroring Topics from a MapR Cluster to an Apache Kafka Cluster
You can use MirrorMaker to mirror data continuously from MapR streams in MapR clusters to Apache Kafka clusters.
Prerequisites
- Because this procedure requires that MirrorMaker be run from the MapR cluster, ensure that the mapr-kafka package is installed on the node that you choose to run MirrorMaker from.
- Configure the node as a MapR client.
- Ensure that the ID of the user who runs MirrorMaker has the consumeperm permission on the MapR stream.
About this task
After you start MirrorMaker, it launches a configurable number of consumer threads to read topics that are in a stream in a MapR cluster and a number of producers to write the messages from those topics into topics in an Apache Kafka cluster.
Before running MirrorMaker, you create a file that contains the required configuration parameters for the consumers that read from the stream in the MapR cluster. You also create a file that contains the required configuration parameters for the producers that publish to the Apache Kafka cluster. You point to these files in the MirrorMaker command.
You can specify either the topics to mirror or the topics not to mirror. In the former case, you use the whitelist parameter to provide a Java-style regular expression that matches the names of the topics that you want to mirror. In the latter case, you use the blacklist parameter to provide a Java-style regular expression that matches the names of the topics that you do not want to mirror.
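The selection logic can be sketched as follows. This is an illustration only, not MirrorMaker code: it uses Python's re module, which agrees with Java regular-expression syntax for simple patterns like these, and it assumes that the expression must match the entire topic name. It also applies the rule, described for the whitelist and blacklist parameters, that commas are read as the regex-choice symbol. The function name select_topics and the topic names are hypothetical.

```python
import re

def select_topics(topics, whitelist=None, blacklist=None):
    """Return the topics that would be mirrored; exactly one filter must be given."""
    if (whitelist is None) == (blacklist is None):
        raise ValueError("specify exactly one of whitelist or blacklist")
    raw = whitelist if whitelist is not None else blacklist
    # Commas are interpreted as the regex-choice symbol '|'.
    pattern = re.compile(raw.replace(",", "|"))
    if whitelist is not None:
        # Whitelist: keep only the topics whose whole name matches.
        return [t for t in topics if pattern.fullmatch(t)]
    # Blacklist: keep every topic whose whole name does not match.
    return [t for t in topics if not pattern.fullmatch(t)]

topics = ["na_west_orders", "na_west_returns", "na_east_orders"]
print(select_topics(topics, whitelist="na_west.*"))  # ['na_west_orders', 'na_west_returns']
print(select_topics(topics, blacklist="na_west.*"))  # ['na_east_orders']
```

Under the whole-name assumption, a pattern must account for every character of the topic name, which is why the sketch uses na_west.* to select names that begin with na_west.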
Procedure
-
Create a file that contains the required properties and values for consumers to use. When you run MirrorMaker, you point to this file by using the consumer.config parameter.
  Property: Description
  streams.record.strip.streampath: Set the value of this property to true. In messages that are written to MapR streams, the names of topics include the paths and names of the streams in which those topics are located. Apache Kafka needs only the names of the topics. When set to true, this property removes the path and name of the source stream from the topic names.
  streams.consumer.default.stream: Specifies the path and name of the stream that the topics will be mirrored from.
  group.id: A unique string that identifies the consumer group to which the consumers started by MirrorMaker belong.
-
Create a file that contains the required properties and values for producers to use. When you run MirrorMaker, you point to this file by using the producer.config parameter.
  Property: Description
  metadata.broker.list: Specifies where the producers can find one or more brokers to determine the leader for each topic. This list does not need to be the full set of brokers in your cluster, but it should include at least two in case the first broker is not available. You do not need to work out which broker is the leader for a topic (and partition): the producer connects to a listed broker, requests the metadata, and then connects to the correct broker.
  bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The producers make use of all servers irrespective of which servers are specified here for bootstrapping; this list affects only the initial hosts used to discover the full set of servers. Because these servers are used only for the initial connection to discover the full cluster membership (which may change dynamically), the list need not contain the full set of servers, although you may want more than one in case a server is down. The list takes the form host1:port1,host2:port2,...
  producer.type: Specifies whether messages are published asynchronously in batches or synchronously as data is received by producers. The values are async and sync.
  compression.codec: Specifies the compression codec for all messages that are generated by producers. The possible values are none, gzip, snappy, and lz4.
-
Run MirrorMaker with this command to start mirroring topics from MapR Streams to Apache Kafka:
Syntax
bin/kafka-run-class.sh kafka.tools.MirrorMaker --new.consumer --consumer.config <File that lists consumer properties and values> --num.streams <Number of consumer threads> --producer.config <File that lists producer properties and values> [--whitelist=<Java-style regular expression for specifying the topics to mirror>] [--blacklist=<Java-style regular expression for specifying the topics not to mirror>]
  Parameter: Description
  consumer.config: The path and name of the file that lists the consumer properties and their values.
  new.consumer: Specifies the use of consumers that are based on the Apache Kafka 0.9.0 API library.
  num.streams: The number of mirror consumer threads to create. If you start multiple MirrorMaker processes, consider the distribution of partitions on the source cluster. If the number of consumption streams per MirrorMaker process is too high, some of the mirroring threads will be idle because the consumer rebalancing algorithm leaves them without any partitions to consume.
  producer.config: The path and name of the file that lists the producer properties and their values.
  whitelist: A Java-style regular expression for specifying the topics to copy. Commas (',') are interpreted as the regex-choice symbol ('|'). If you use this parameter, do not use the blacklist parameter.
  blacklist: A Java-style regular expression for specifying the topics not to copy. Commas (',') are interpreted as the regex-choice symbol ('|'). If you use this parameter, do not use the whitelist parameter.
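The note on num.streams can be illustrated with a small sketch. It assumes a simple round-robin assignment of partitions to consumer threads, which is a simplification and not necessarily the strategy the consumers actually use. The point is only that each partition is read by one thread in the consumer group at a time, so threads beyond the partition count have nothing to read. The helpers assign_partitions and idle_threads are hypothetical.

```python
def assign_partitions(num_partitions, num_threads):
    """Distribute partition IDs over threads round-robin (a simplifying assumption)."""
    assignment = {thread: [] for thread in range(num_threads)}
    for partition in range(num_partitions):
        assignment[partition % num_threads].append(partition)
    return assignment

def idle_threads(assignment):
    """Return the threads that ended up owning no partitions."""
    return [thread for thread, parts in assignment.items() if not parts]

# Four source partitions, but six consumer threads requested with --num.streams.
assignment = assign_partitions(num_partitions=4, num_threads=6)
print(assignment)                # {0: [0], 1: [1], 2: [2], 3: [3], 4: [], 5: []}
print(idle_threads(assignment))  # [4, 5]
```

In this simplified model, setting num.streams higher than the number of partitions that a MirrorMaker process can own adds threads without adding throughput.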
Example
In this example, the file that lists the properties and values for the consumers that will read messages from the topics in MapR Streams is named consumers.props. It contains this list:
streams.record.strip.streampath=true
streams.consumer.default.stream=/myStream
group.id=cg1
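To make the effect of streams.record.strip.streampath concrete, here is a rough sketch of the name transformation. It assumes that a full topic name in MapR Streams takes the form <stream path>:<topic>, for example /myStream:na_west_orders. The helper strip_stream_path is hypothetical and is not part of any MapR or Kafka API.

```python
def strip_stream_path(full_topic_name):
    """Return only the topic part of '<stream path>:<topic>' (assumed format)."""
    # rpartition splits on the last colon; a name without a colon is returned unchanged.
    return full_topic_name.rpartition(":")[2]

print(strip_stream_path("/myStream:na_west_orders"))  # na_west_orders
```

With the property set to true, the Apache Kafka cluster receives topic names without the stream path and name, which is the form it expects.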
The file that lists the properties and values for the producers that will publish messages to topics in Apache Kafka is named producers.props. It contains this list:
metadata.broker.list=10.10.89.78:9092
bootstrap.servers=10.10.83.93:9092
producer.type=sync
compression.codec=none
The topics to mirror all have names that begin with na_west. When running the command, we can use "na_west.*" as the regular expression for the whitelist parameter. In a Java-style regular expression, .* matches any sequence of characters, so the expression matches every topic name that starts with na_west.
bin/kafka-run-class.sh kafka.tools.MirrorMaker --new.consumer \
  --consumer.config consumers.props --num.streams 2 --producer.config producers.props \
  --whitelist="na_west.*"
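If you prefer to script the example, the following Python sketch writes the two property files and assembles an equivalent command line, using the pattern na_west.* so that every topic name beginning with na_west is selected. It only builds and prints the argument list; it does not run MirrorMaker. The helper names write_props and mirror_maker_args are hypothetical, the files are written to a temporary directory for illustration, and the property values are the ones from this example.

```python
import shlex
import tempfile
from pathlib import Path

def write_props(path, props):
    """Write key=value pairs, one per line, in Java properties style."""
    Path(path).write_text("".join(f"{key}={value}\n" for key, value in props.items()))

def mirror_maker_args(consumer_config, producer_config, num_streams, whitelist):
    """Assemble the MirrorMaker invocation from the example as an argument list."""
    return [
        "bin/kafka-run-class.sh", "kafka.tools.MirrorMaker", "--new.consumer",
        "--consumer.config", str(consumer_config),
        "--num.streams", str(num_streams),
        "--producer.config", str(producer_config),
        f"--whitelist={whitelist}",
    ]

workdir = Path(tempfile.mkdtemp())  # illustration only; use your own location
consumer_file = workdir / "consumers.props"
producer_file = workdir / "producers.props"

write_props(consumer_file, {
    "streams.record.strip.streampath": "true",
    "streams.consumer.default.stream": "/myStream",
    "group.id": "cg1",
})
write_props(producer_file, {
    "metadata.broker.list": "10.10.89.78:9092",
    "bootstrap.servers": "10.10.83.93:9092",
    "producer.type": "sync",
    "compression.codec": "none",
})

args = mirror_maker_args(consumer_file, producer_file, num_streams=2, whitelist="na_west.*")
print(shlex.join(args))  # shell-quoted form of the command
```

Passing the argument list to a process launcher avoids shell quoting issues with the regular expression, because the pattern is handed to MirrorMaker as a single argument.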