Secondary Indexes

Beginning with MapR 6.0, MapR-DB JSON natively supports secondary indexes on fields in JSON tables. Indexes provide flexible, high-performance access to data stored in MapR-DB.

How Do I Get Started?

The following topics cover what you need to understand and use secondary indexes: conceptual information about indexes, how to decide which indexes to create, how to set up and use indexes, the maprcli commands used to create and maintain indexes, and how to query your data to leverage indexes. The information is organized by role.

Describes secondary index concepts, including use cases, types of indexes, types of queries that benefit from indexes, and how indexes are implemented.

Describes the overall workflow for using secondary indexes. This includes the roles of different users and the workflow steps involved.

Describes how to design secondary indexes to provide the most benefit to MapR-DB JSON queries.

Describes how to manage secondary indexes including creating, deleting, and listing indexes, setting up your cluster for querying, and troubleshooting.

Describes how to use the OJAI API library to query JSON tables, including special considerations related to secondary indexes.

Describes how to leverage indexes when issuing SQL queries with Drill.

Describes how to use the MapR-DB Shell to query JSON tables and view the contents of secondary indexes.

What are Secondary Indexes?

A secondary index (sometimes referred to in this documentation simply as an index) is a special table that stores a subset of document fields from a JSON table. The index orders its data on a set of fields, defined as the indexed fields. This is in contrast to the JSON table, which orders its data on the table's primary key (rowId or rowKey). If you have administrator privileges, you can create one or more indexes on each JSON table. After the indexes are created, applications can leverage them to accelerate query response times.
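The relationship between a table and its index can be pictured with a small toy model. The sketch below uses plain Python structures rather than MapR-DB itself, and the documents, field names, and the build_index helper are all made up for illustration. It shows only the ordering idea: the table is keyed by its primary key, while the index holds a subset of fields sorted on the indexed field.

```python
# Toy model only: plain Python structures stand in for a JSON table and its index.
# The documents, field names, and build_index helper are hypothetical.

# The "JSON table": documents keyed (and ordered) by primary key.
table = {
    "user001": {"name": "Ana", "city": "Lima", "age": 34},
    "user002": {"name": "Bo", "city": "Oslo", "age": 27},
    "user003": {"name": "Cy", "city": "Lima", "age": 41},
}


def build_index(table, indexed_field):
    """Return (indexed value, primary key) entries sorted on the indexed value."""
    entries = [
        (doc[indexed_field], key)
        for key, doc in table.items()
        if indexed_field in doc
    ]
    return sorted(entries)


city_index = build_index(table, "city")
print(city_index)
# [('Lima', 'user001'), ('Lima', 'user003'), ('Oslo', 'user002')]
```

Each index entry keeps the primary key, so a match found in the index can be traced back to the full document in the table.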

Secondary indexes make a wider range of queries on MapR-DB data efficient. They let queries access data through fields other than the primary key without scanning the whole table. As a result, MapR-DB supports a broader set of use cases. Applications that benefit include rich, interactive business applications and user-facing analytic applications. Secondary indexes also enable Business Intelligence tools and ad-hoc queries on operational datasets. See Uses for Secondary Indexes for more information.

Important: Secondary indexes can be created only on MapR-DB JSON tables.

Why Use Secondary Indexes?

With the ever-increasing amount of data stored in MapR-DB JSON, indexing that data becomes critical. Without indexes, queries unnecessarily scan large amounts of data from the underlying JSON table. Queries could potentially scan every document in the table, even if they contain conditions that limit the documents to select. Query performance suffers and resource bottlenecks become likely when queries rely on full table scans.

Without indexes, applications and query layers resort to limited interactivity to avoid performance concerns. Indexes remove this limit on application scale by reducing the number of documents that client applications read, even when querying large data sets. This reduces I/O and CPU costs, resulting in improved performance.
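The difference in documents read can be illustrated with a rough sketch. This is a toy in-memory comparison, not a measurement of MapR-DB: the data set, the "zip" field, and the full_scan and index_lookup helpers are assumptions made up for the example. It counts how many entries each approach has to examine to answer the same equality condition.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data set: 10,000 documents keyed by primary key, each with a "zip" field.
table = {
    f"doc{i:05d}": {"zip": 10000 + (i % 500), "payload": i}
    for i in range(10_000)
}


def full_scan(table, field, value):
    """Examine every document; return (matching keys, documents read)."""
    matches = [key for key, doc in table.items() if doc.get(field) == value]
    return matches, len(table)


# Toy index: (indexed value, primary key) pairs sorted on the indexed value.
zip_index = sorted((doc["zip"], key) for key, doc in table.items())
zip_values = [value for value, _ in zip_index]


def index_lookup(value):
    """Binary-search the sorted index; return (matching keys, index entries read)."""
    lo = bisect_left(zip_values, value)
    hi = bisect_right(zip_values, value)
    return [key for _, key in zip_index[lo:hi]], hi - lo


scan_matches, scan_read = full_scan(table, "zip", 10042)
index_matches, index_read = index_lookup(10042)
print(scan_read, index_read)  # 10000 20
```

Both approaches return the same 20 documents, but the scan touches all 10,000 while the sorted index touches only the matching entries.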

The functionality and benefits of indexing in MapR-DB are similar to those of indexes in relational databases. The difference is that MapR-DB indexes provide performance benefits at high scale, in combination with JSON flexibility on the query side and simplicity on the management side.

How Can I Use Secondary Indexes?

You can leverage MapR-DB secondary indexes by using either the OJAI API or MapR Drill.

OJAI is the business application development interface on MapR-DB. Typically, business applications require ultra-low latency and extremely high throughput. When you build an application using OJAI, filtering and sorting through the API can leverage secondary indexes to accelerate query response times.

Drill is the analytics SQL interface on MapR-DB. Drill is a distributed SQL query engine that provides interactive response time for operational analytics, Business Intelligence (BI) tools like Tableau, and ad-hoc queries on MapR-DB. With MapR Drill, SQL queries can also leverage secondary indexes to accelerate query response times.

Regardless of whether queries originate from OJAI or Drill SQL, each interface seamlessly selects the optimal indexes to use. You do not need to write explicit code or provide directives on which indexes to use. If an appropriate index exists for a query, MapR-DB leverages the index.
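To give a feel for what automatic index selection means, here is a deliberately simplified sketch. It is not how OJAI or Drill choose indexes, which this page does not describe. The choose_access_path function, the index names, and the "leading indexed fields" heuristic are all assumptions made for illustration.

```python
def choose_access_path(condition_fields, indexes):
    """Pick the index whose leading indexed fields best match the query condition.

    condition_fields: set of field names the query filters on.
    indexes: dict mapping a (hypothetical) index name to its ordered indexed fields.
    Falls back to a full table scan when no index starts with a filtered field.
    """
    best_name, best_covered = None, 0
    for name, indexed_fields in indexes.items():
        # Count how many leading indexed fields the condition constrains.
        covered = 0
        for field in indexed_fields:
            if field not in condition_fields:
                break
            covered += 1
        if covered > best_covered:
            best_name, best_covered = name, covered
    return best_name or "full table scan"


# Hypothetical indexes defined on one table.
indexes = {
    "idx_city": ["city"],
    "idx_city_age": ["city", "age"],
    "idx_zip": ["zip"],
}

print(choose_access_path({"city", "age"}, indexes))  # idx_city_age
print(choose_access_path({"name"}, indexes))         # full table scan
```

The caller only states the condition; the choice of index, or the fallback to a scan, happens without any directive in the query itself.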

For more information about the OJAI API, see the OJAI API Library.

For information about MapR Drill, see Apache Drill on MapR.

Additional Resources