Querying Messages in MapR Streams

You can write new Java applications, or modify existing ones, to query messages in MapR streams by using the methods of the Streams class.

Description of the Streams Class

By using the Streams class, your applications can query MapR streams directly. You do not first have to load the data from your streams into a separate cluster that is used only for analytics and reporting.

Each method in this class returns a DocumentStore object that contains the messages in the topics of a specified stream. The DocumentStore interface is part of the open-source OJAI API.

You can then use the DocumentStore.find() methods to query the messages that are in the DocumentStore object.

The logical schema of each message is as follows. Your analytics applications can run queries on these fields.
{
    "_id":<STRING>,
    "topic":<STRING>,
    "partition":<SHORT>,
    "offset":<LONG>,
    "timestamp":<LONG>,
    "producer":<VARCHAR>,
    "key":<BINARY>,
    "value":<VARBINARY>
}
Field       Description
_id         A STRING value that represents the ID of the topic in which the message is located.
topic       A STRING value that represents the name of the topic in which the message is located.
partition   A SHORT value that represents the index of the partition in the topic.
offset      A LONG value that represents the position of the message within a partition.
timestamp   A LONG value that represents the date and time at which the message was sent to the stream.
producer    A VARCHAR value that represents the value of the client.id configuration parameter for the producer that published the message. MapR Streams does not require a value for this configuration parameter, so the value for this field could be empty.
key         A BINARY value that represents the key of the message. MapR Streams does not require each message to have a key, so this value could be empty. The configuration parameter key.serializer for the producer that published the message specifies the means by which the key was serialized. Your application can deserialize the key by using the appropriate deserialization class in the org.apache.kafka.common.serialization package or a class that implements the Deserializer interface.
value       A VARBINARY value that represents the value of the message. The configuration parameter value.serializer for the producer that published the message specifies the means by which the value was serialized. Your application can deserialize the value by using the appropriate deserialization class in the org.apache.kafka.common.serialization package or a class that implements the Deserializer interface.
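
The following is a minimal sketch of what decoding the key, value, and timestamp fields can look like once the raw bytes have been read from a message. So that it runs without the Kafka client JARs, it uses only the Java standard library: decodeString mirrors the default behavior of Kafka's StringDeserializer (UTF-8), and decodeLong mirrors LongDeserializer (8 bytes, big-endian). The class name MessageDecoding and its method names are hypothetical, and treating timestamp as milliseconds since the Unix epoch is an assumption, not something stated above. In a real application you would normally call the deserializer class that matches the producer's key.serializer and value.serializer settings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

// Hypothetical helper; not part of the MapR Streams or OJAI APIs.
public class MessageDecoding {

    // Decodes bytes produced by a UTF-8 string serializer.
    // A null input (for example, a message without a key) yields null.
    public static String decodeString(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    // Decodes bytes produced by an 8-byte, big-endian long serializer.
    public static long decodeLong(byte[] raw) {
        if (raw == null || raw.length != Long.BYTES) {
            throw new IllegalArgumentException("Expected exactly 8 bytes");
        }
        return ByteBuffer.wrap(raw).getLong(); // ByteBuffer is big-endian by default
    }

    // Assumption: the timestamp field holds milliseconds since the Unix epoch.
    public static Instant toInstant(long timestamp) {
        return Instant.ofEpochMilli(timestamp);
    }

    public static void main(String[] args) {
        // Stand-ins for the "key" and "value" fields of one message.
        byte[] key = "sensor-7".getBytes(StandardCharsets.UTF_8);
        byte[] value = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();

        System.out.println(decodeString(key));   // sensor-7
        System.out.println(decodeLong(value));   // 42
        System.out.println(toInstant(0L));       // 1970-01-01T00:00:00Z
    }
}
```

If the producer used a different serializer (for example, a custom class that implements the Serializer interface), the decoding step has to change to match it. The bytes themselves carry no type information.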

Sample Application

For a sample application, see the mapr streamanalyzer utility.

Building Applications that Use the Streams Class

To compile and run a Java application that uses the Streams class, get the required JAR file either from the MapR Maven repository or from the MapR installation.

Using the JAR File from the MapR Maven Repository

  1. Add MapR's Maven repository to your pom.xml file, if it is not already added:
        <repositories>
          <repository>
            <id>mapr-releases</id>
            <url>http://repository.mapr.com/nexus/content/repositories/releases</url>
            <snapshots><enabled>true</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
          </repository>
        </repositories>
  2. Add a dependency on the mapr-streams artifact:
        <dependency>
          <groupId>com.mapr.streams</groupId>
          <artifactId>mapr-streams</artifactId>
          <version>5.1.0-mapr</version>
        </dependency>

Using JARs from the MapR Installation

  • Use the following command to compile applications:
    javac -cp `mapr classpath` <Application source files>
  • Use the following command to run applications:
    java -cp `mapr classpath`:. <Main class> <command-line arguments>
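
Before you run a larger application, it can help to confirm that the classpath actually exposes the Streams class. The following is a small sketch of such a check. The class name ClasspathCheck is hypothetical, and the fully qualified name com.mapr.streams.Streams is an assumption based on the Maven groupId shown above; pass a different name as the first argument if your installation differs.

```java
// Hypothetical sanity check; not part of the MapR distribution.
public class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    // The class is looked up without being initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the Streams class lives in the com.mapr.streams package.
        String className = args.length > 0 ? args[0] : "com.mapr.streams.Streams";
        String status = isOnClasspath(className) ? "found" : "NOT found";
        System.out.println(className + ": " + status);
    }
}
```

Compiled and run with the same commands as any other application, this would look like:
    javac ClasspathCheck.java
    java -cp `mapr classpath`:. ClasspathCheck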