MapR Streams Concepts
- Messages are key/value pairs, where keys are optional and have a use that is described later. The values contain the data payload, which can be text, images, video files, or any other type of data.
- Topics are logical collections of messages. For example, you might have an application that monitors the logs for mission-critical software. Your monitoring application could send informational messages to a topic named info, warning messages to a topic named warnings, and error messages to a topic named errors. Different downstream applications might monitor each topic.
- Partitions, which exist within topics, are parallel, ordered, immutable sequences of
messages that are continually appended to. Topics can contain multiple partitions, which
make topics scalable by spreading the load for a topic across multiple servers.
The downstream applications that read messages can read from multiple partitions within a topic for faster performance than would be possible if they read from a single partition per topic. Downstream applications can also scale by having separate instances read from separate partitions.
Messages are assigned offsets when published to partitions. Offsets are monotonically increasing and are local to partitions. The order of messages is preserved within individual partitions, but not across partitions.
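The offset behavior described above can be sketched with a toy in-memory model. This is only an illustration, not the MapR Streams API: the class `ToyTopic` and its methods are hypothetical, and starting offsets at 0 is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of a topic; hypothetical, not the MapR Streams API. */
final class ToyTopic {
    // Each partition is an append-only list; a message's offset is its index in that list.
    private final List<List<String>> partitions = new ArrayList<>();

    ToyTopic(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends a value to one partition and returns the offset assigned to it. */
    long publish(int partition, String value) {
        List<String> log = partitions.get(partition);
        log.add(value);
        return log.size() - 1;
    }

    /** Returns the messages of one partition from the given offset on, in publish order. */
    List<String> read(int partition, int fromOffset) {
        List<String> log = partitions.get(partition);
        return new ArrayList<>(log.subList(fromOffset, log.size()));
    }

    public static void main(String[] args) {
        ToyTopic warnings = new ToyTopic(2);
        System.out.println(warnings.publish(0, "disk 80% full")); // offset in partition 0
        System.out.println(warnings.publish(1, "high latency"));  // offset in partition 1
        System.out.println(warnings.publish(0, "disk 90% full")); // next offset in partition 0
        System.out.println(warnings.read(0, 0));                  // partition 0 in publish order
    }
}
```

Because each partition keeps its own counter, two messages in different partitions can carry the same offset, so an offset identifies a message only together with its partition.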
- Producers and Consumers
- Producers are data-generating applications that you create, such as sensors in
automobiles or activity loggers in servers. They produce messages and send them to a MapR Streams producer
client library. This client library buffers incoming messages and then sends them in
batches to the MapR Streams server. The
server publishes the messages to the topics that producers have specified.
Consumers are also applications that you create, such as analytics applications, reporting tools, or enterprise dashboards. They request unread messages from topics that they are interested in. A consumer client library sends the unread messages to the consumers, which then extract data from them.
You can write producers and consumers in Java. MapR Streams supports the core methods of the Apache Kafka 0.9.0 Java API.
You can also set the values of various configuration parameters to tune how MapR Streams interacts with each producer and consumer.
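The buffer-then-batch behavior of the producer client library can be sketched as follows. This is a minimal stand-in, not the real client library: `ToyBatchingClient`, its `batchSize` threshold, and the `Consumer` callback that plays the role of the server are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy stand-in for a producer client library; hypothetical, not the MapR Streams API. */
final class ToyBatchingClient {
    private final int batchSize;                 // hypothetical flush threshold
    private final Consumer<List<String>> server; // stands in for the call to the server
    private final List<String> buffer = new ArrayList<>();

    ToyBatchingClient(int batchSize, Consumer<List<String>> server) {
        this.batchSize = batchSize;
        this.server = server;
    }

    /** Buffers one message and sends a batch once the buffer is full. */
    void send(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as one batch, then clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        server.accept(new ArrayList<>(buffer)); // copy so the batch survives the clear
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> received = new ArrayList<>();
        ToyBatchingClient client = new ToyBatchingClient(3, received::add);
        for (int i = 1; i <= 7; i++) {
            client.send("reading-" + i);
        }
        client.flush(); // send the final partial batch
        System.out.println(received);
    }
}
```

The sketch flushes only on batch size. A real client library may use other triggers as well, which this model does not attempt to reproduce.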
- Consumers subscribe to topics. When a consumer subscribes to a topic or partition, it means that the consumer wants to receive messages from that topic or partition. For example, an analytics application might subscribe to rfids_productB and similar topics to track the movement of products from factories to distribution centers. A reporting tool might subscribe to meters_SW and similar topics to get a report of electricity usage in the different geographic regions that a power company services.
A subscription is the list of the topics that a consumer is subscribed to.
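To make the idea of a subscription and of "unread" messages concrete, here is a small sketch under simplifying assumptions: every topic has a single partition, topics live in an in-memory map, and the class `ToyConsumer` with its `subscribe` and `poll` methods is hypothetical rather than the actual consumer API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy consumer over single-partition in-memory topics; hypothetical, not the MapR Streams API. */
final class ToyConsumer {
    private final Map<String, List<String>> topics;                  // topic name -> messages
    private final Set<String> subscription = new LinkedHashSet<>();  // topics this consumer wants
    private final Map<String, Integer> nextOffset = new HashMap<>(); // read position per topic

    ToyConsumer(Map<String, List<String>> topics) {
        this.topics = topics;
    }

    void subscribe(String topic) {
        subscription.add(topic);
    }

    /** Returns the messages this consumer has not yet seen in its subscribed topics. */
    List<String> poll() {
        List<String> unread = new ArrayList<>();
        for (String topic : subscription) {
            List<String> log = topics.get(topic);
            if (log == null) {
                continue; // subscribed topic does not exist in this toy store
            }
            int from = nextOffset.getOrDefault(topic, 0);
            unread.addAll(log.subList(from, log.size()));
            nextOffset.put(topic, log.size()); // everything up to here is now read
        }
        return unread;
    }

    public static void main(String[] args) {
        Map<String, List<String>> topics = new HashMap<>();
        topics.put("rfids_productB", new ArrayList<>(List.of("left factory", "reached center")));
        topics.put("meters_SW", new ArrayList<>(List.of("42 kWh")));

        ToyConsumer analytics = new ToyConsumer(topics);
        analytics.subscribe("rfids_productB");
        System.out.println(analytics.poll()); // only messages from the subscribed topic
        System.out.println(analytics.poll()); // nothing new since the last poll
    }
}
```

Messages in meters_SW are never returned here because that topic is not part of the consumer's subscription.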
- A stream is a collection of topics that you can manage together in these ways:
- Set security policies that apply to all of the topics in that stream.
- Set a default number of partitions for each new topic that is created in the stream.
- Set a time-to-live for messages in every topic in the stream. Every message in every topic in a stream expires after a duration of time, unless you set the time-to-live to 0, meaning messages never expire.
A single volume can contain multiple streams, and therefore a large number of topics.
You can replicate streams to other streams in the same or different MapR clusters. For example, you can create a backup copy of a stream for producers and consumers to fail over to if the original stream goes offline.
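The time-to-live rule for streams, including the special value 0, can be expressed as a short check. This sketch models only the rule itself. The class `ToyTtl`, the use of seconds, and the choice that a message expires exactly when its age reaches the time-to-live are assumptions made for illustration; how the server actually removes expired messages is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a stream-level time-to-live rule; hypothetical, not the MapR Streams API. */
final class ToyTtl {
    /** A message value plus the time it was published (seconds on a hypothetical clock). */
    static final class Timed {
        final String value;
        final long publishedAt;

        Timed(String value, long publishedAt) {
            this.value = value;
            this.publishedAt = publishedAt;
        }
    }

    /** A time-to-live of 0 means the message never expires. */
    static boolean isExpired(Timed message, long ttlSeconds, long now) {
        if (ttlSeconds == 0) {
            return false;
        }
        // Assumption: a message expires once its age reaches the time-to-live.
        return now - message.publishedAt >= ttlSeconds;
    }

    /** Returns the values of the messages that are still live at time now. */
    static List<String> live(List<Timed> messages, long ttlSeconds, long now) {
        List<String> result = new ArrayList<>();
        for (Timed m : messages) {
            if (!isExpired(m, ttlSeconds, now)) {
                result.add(m.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Timed> topic = new ArrayList<>();
        topic.add(new Timed("old reading", 0));
        topic.add(new Timed("new reading", 500));
        System.out.println(live(topic, 600, 700)); // time-to-live of 600 seconds
        System.out.println(live(topic, 0, 700));   // time-to-live of 0: nothing expires
    }
}
```

Because the time-to-live is set on the stream, the same `ttlSeconds` value would apply to every topic in that stream in this model.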
- MapR Streams Server and Client Libraries
- The server manages streams, topics, and partitions and handles requests from the producer client library and the consumer client library.
- Producer client library
- This client-side library, which is part of the producer process, receives the messages that are sent by producers, buffers the messages, and sends them to the server, which then publishes the messages and sends acknowledgements to the client.
- Consumer client library
- This client-side library, which is part of the consumer process, receives requests from consumers to poll subscriptions for unread messages, reads messages from topic partitions, and sends messages to consumers.