Architecture

MapR-DB is an enterprise-grade, high-performance NoSQL (“Not Only SQL”) database management system. It adds real-time, operational analytics capabilities to big data applications. As a multi-model NoSQL database, it supports both JSON document and wide-column data models.

MapR-DB is implemented as a log-based database within the framework of MapR-FS. Like any log-based database, it must periodically write logged operations to disk. In MapR-DB, table regions (also called tablets) and the smaller structures within them are stored partially as B-trees, which, together with write-ahead log (WAL) files, make up log-structured merge (LSM) trees. The write-ahead logs for the smaller structures within regions are periodically restructured through rolling merge operations on the B-trees. Because MapR-DB performs these merges at a small scale, applications running against MapR-DB see no significant effect on latency while the merges take place.
Note: Apache HBase also uses the term regions. Apache HBase is deprecated.
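To make the log-structured merge idea more concrete, here is a minimal, purely conceptual sketch in Python. It is not MapR-DB's implementation or API: the class ToyLSMTree and its buffer_limit and max_runs parameters are hypothetical, and in-memory sorted lists stand in for the on-disk B-trees. It shows the general pattern: each write is first appended to a write-ahead log and buffered, a full buffer is flushed as a sorted run, and each merge combines only the two oldest runs so that every merge stays small.

```python
import bisect


class ToyLSMTree:
    """Conceptual log-structured merge tree (hypothetical; NOT MapR-DB internals).

    Sorted Python lists stand in for the on-disk B-trees, and the WAL is an
    in-memory list rather than a durable file.
    """

    def __init__(self, buffer_limit=4, max_runs=2):
        self.wal = []          # append-only log of (key, value) operations
        self.buffer = {}       # recent writes not yet flushed to a run
        self.runs = []         # sorted lists of (key, value), oldest first
        self.buffer_limit = buffer_limit
        self.max_runs = max_runs

    def put(self, key, value):
        self.wal.append((key, value))  # log the operation first
        self.buffer[key] = value       # then apply it to the buffer
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self):
        # Persist the buffer as a new sorted run; its WAL entries are now redundant.
        self.runs.append(sorted(self.buffer.items()))
        self.buffer.clear()
        self.wal.clear()
        if len(self.runs) > self.max_runs:
            self._merge_oldest()

    def _merge_oldest(self):
        # Small, incremental merge: only the two oldest runs are combined,
        # and the newer run wins when both contain the same key.
        older, newer = self.runs[0], self.runs[1]
        merged = dict(older)
        merged.update(newer)
        self.runs[0:2] = [sorted(merged.items())]

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for run in reversed(self.runs):  # search the newest run first
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

For example, after `t = ToyLSMTree(buffer_limit=2, max_runs=2)` and ten `put` calls, the tree never holds more than two runs, and a later `put` of an existing key shadows the older value on `get`.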

Why use MapR-DB?

  • Integrated analytics with SQL: Integration with Drill for MapR provides a low-latency distributed SQL-like query engine for large-scale datasets, including structured and semi-structured/nested data.
  • Operational analytics: MapR-DB can run in the same cluster as Apache™ Hadoop® and Apache Spark, so you can analyze or process live, interactive data immediately. Data silos can also be eliminated, which speeds the data-to-action cycle and enables a more efficient data architecture.
  • Global distribution of applications: Applications can access MapR-DB tables on a global scale.
  • Flexible data model: MapR-DB can be used as both a document database and a wide-column database. As a document database, it stores JSON documents in MapR-DB JSON tables. As a wide-column database, it stores binary data in MapR-DB binary tables.
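The difference between the two data models can be illustrated with plain Python data structures. This is a hypothetical sketch, not the MapR-DB or OJAI client API: the helper names, the cf1 column family, and the flattening of nested fields into dotted qualifiers are illustrative assumptions. In the document model, a record stays a single, possibly nested JSON document identified by its _id field; in the wide-column model, each value becomes a separate cell addressed by row key, column family, and column qualifier, with the value stored as bytes.

```python
import json


def to_json_document(row_key, fields):
    """Document model (illustrative): one self-describing, possibly nested
    JSON document per row, identified by its _id field."""
    return json.dumps({"_id": row_key, **fields}, sort_keys=True)


def to_wide_column_cells(row_key, fields, family="cf1"):
    """Wide-column model (illustrative): every value becomes one cell keyed by
    (row key, column family, column qualifier) and stored as bytes.
    Nested maps are flattened into dotted qualifiers; this is an assumption
    made for the example, not a MapR-DB rule."""
    cells = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for name, child in value.items():
                walk(f"{prefix}.{name}" if prefix else name, child)
        else:
            cells[(row_key, family, prefix)] = str(value).encode("utf-8")

    walk("", fields)
    return cells


record = {"name": "Ada", "address": {"city": "London"}}
print(to_json_document("user001", record))
print(to_wide_column_cells("user001", record))
```

The document form keeps the nesting and lets each document carry its own structure, while the wide-column form exposes every value as an independently addressable cell, which is what column-family design in binary tables works with.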

How is MapR-DB related to MapR-FS?

MapR-DB tables (both binary and JSON tables) are implemented directly in the MapR file system (MapR-FS), where they are created in logical units called volumes.

What Design Factors Are Important?

  • Rowkey Optimization: The design of a table's rowkeys affects how quickly client applications can access data and, if it leads to hotspotting (a disproportionate share of reads or writes landing on one region), overall database performance. The better the design, the faster the data access. See Table Rowkey Design for more information.
  • Column Family Optimization: Column families enable you to group related sets of data and restrict queries to a defined subset, leading to better performance. When you design a column family, think about what kinds of queries are going to be used the most often, and group your columns accordingly. See Column Families in JSON Tables and Column Families in Binary Tables for more information.
  • Replication Implementation: The design of table replication (beyond the automatic replication of table regions that occurs within a volume) depends on your desired outcome and the complexity of your environment. See Table Replication for more information.
  • Security Implementation: Security can be implemented at many different levels including for table replication, JSON documents, and general access. Determining what level and where is part of the architectural design. See Security on JSON Tables, Security and Replication, and Security.
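One widely used rowkey technique for avoiding hotspotting is salting, sketched below in Python. It assumes keys that increase monotonically (for example, timestamps or sequence numbers), which in a table partitioned into contiguous rowkey ranges would tend to land in the same region. The helper names salted_rowkey and scan_prefixes and the bucket count of 8 are hypothetical; this illustrates a general design technique, not a MapR-DB API.

```python
import zlib


def salted_rowkey(natural_key: str, buckets: int = 8) -> str:
    """Prefix a key with a stable bucket number (hypothetical helper).

    zlib.crc32 is used instead of Python's built-in hash() because hash() of a
    str is randomized per process, and a salt must be reproducible so that
    readers can recompute where a key lives.
    """
    bucket = zlib.crc32(natural_key.encode("utf-8")) % buckets
    return f"{bucket:02d}-{natural_key}"


def scan_prefixes(buckets: int = 8) -> list[str]:
    """A range scan over salted keys must fan out to every bucket prefix."""
    return [f"{b:02d}-" for b in range(buckets)]


# Sequential keys no longer share one contiguous key range.
for key in ["event-000001", "event-000002", "event-000003"]:
    print(salted_rowkey(key))
```

The design trade-off: salting spreads sequential writes across several key ranges, but a scan over a natural-key range must now query each bucket prefix and merge the results, so the bucket count should match how widely writes actually need to be spread.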