MapR 5.1 is at End of Life (EOL) and no longer supported. Please see the latest documentation. This documentation is not being updated.


MapR 5.1 Documentation

5.1 DB & Streams
This section covers the MapR-DB, MapR-Streams, and MapR-Gateways features. Each feature may contain both administrator and developer topics.
- MapR-DB
  - Architecture of MapR-DB
    MapR-DB's architecture gives it a large number of advantages over other NoSQL databases.
    - MapR-DB operates directly on the file system
      MapR-DB tables are implemented directly in the MapR file system (MapR-FS).
    - Tablets are stored in containers in MapR-FS
    - Containers are stored in volumes in MapR-FS
      MapR provides volumes as a way to organize data and manage cluster performance. A volume is a logical unit that allows you to apply policies to a set of files, directories, and tables. Volumes are used to enforce disk usage limits, set replication levels, define snapshots and mirrors, and establish ownership and accountability.
  - Performance Enhancements for SSD Based MapR-DB Deployments
  - MapR-DB as a Document Database
    MapR-DB supports OJAI documents as a native data store. OJAI documents are stored in a compact binary format, not as plain ASCII text.
  - MapR-DB as a Wide-Column Database
- MapR Streams
  MapR Streams brings integrated publish/subscribe messaging to the MapR Converged Data Platform.
- MapR Gateways
  A MapR gateway mediates one-way communication between a source MapR cluster and a destination cluster. MapR-DB binary tables and MapR Streams streams can be replicated.

MapR-DB operates directly on the file system

MapR-DB tables are implemented directly in the MapR file system (MapR-FS).

One resulting advantage is that MapR-DB has no intermediate layers to pass through when it operates on data. MapR-DB runs inside the MFS process, which reads from and writes to disks directly. In contrast, Apache HBase running on the Hadoop Distributed File System (HDFS) must communicate with the HDFS process, which in turn must communicate with the ext3 file system, which ultimately writes data to disks. The approach taken by MapR-DB eliminates these process hops, duplicate caching, and needless abstractions, and so streamlines I/O operations on your data.

Another advantage is the absence of compaction delays, which arise from I/O storms as logged operations are merged with structures on disk. MapR-DB, like several other NoSQL databases, is a log-based database, so logged operations must periodically be written to disk. In MapR-DB, tablets (called regions in Apache HBase) and the smaller structures within them are stored partially as B-trees, which together with write-ahead log (WAL) files make up log-structured merge trees. Write-ahead logs for the smaller structures within tablets are periodically restructured by rolling merge operations on the B-trees. Because MapR-DB performs these merges at small scales, applications running against MapR-DB see no significant effect on latency while the merges take place.
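
The idea of small-scale rolling merges can be illustrated with a toy model. The sketch below is not MapR-DB code and does not reflect its on-disk format. It assumes a hypothetical `Tablet` split into several `Segment` objects, each with its own in-memory write-ahead log and a sorted key list standing in for a B-tree. The class names, the `merge_threshold` parameter, and the checksum-based routing are all assumptions made for illustration. The point it shows is that each merge touches only one small segment's log, so no single merge has to rewrite the whole tablet.

```python
import bisect
import zlib


class Segment:
    """Toy stand-in for one of the smaller structures inside a tablet."""

    def __init__(self, merge_threshold):
        self.merge_threshold = merge_threshold
        self.wal = []           # logged (key, value) writes not yet merged
        self.sorted_keys = []   # sorted list standing in for the B-tree
        self.values = {}        # merged key -> value data
        self.merges = 0         # number of rolling merges performed

    def put(self, key, value):
        # Log the write first; merge once this segment's log is full.
        self.wal.append((key, value))
        if len(self.wal) >= self.merge_threshold:
            self.rolling_merge()

    def get(self, key):
        # The newest logged write wins over merged data.
        for logged_key, logged_value in reversed(self.wal):
            if logged_key == key:
                return logged_value
        return self.values.get(key)

    def rolling_merge(self):
        # Fold this segment's log into its sorted structure, then clear it.
        for key, value in self.wal:
            if key not in self.values:
                bisect.insort(self.sorted_keys, key)
            self.values[key] = value
        self.wal.clear()
        self.merges += 1


class Tablet:
    """Toy tablet that routes each key to one of several segments."""

    def __init__(self, num_segments=4, merge_threshold=3):
        self.segments = [Segment(merge_threshold) for _ in range(num_segments)]

    def _segment_for(self, key):
        # Deterministic routing by checksum (an assumption for illustration).
        index = zlib.crc32(key.encode("utf-8")) % len(self.segments)
        return self.segments[index]

    def put(self, key, value):
        self._segment_for(key).put(key, value)

    def get(self, key):
        return self._segment_for(key).get(key)


if __name__ == "__main__":
    tablet = Tablet()
    for i in range(20):
        tablet.put(f"row{i:03d}", i)
    tablet.put("row005", "updated")
    print(tablet.get("row005"))  # prints updated
    print([segment.merges for segment in tablet.segments])
```

In this model the cost of any one merge is bounded by `merge_threshold` entries, whatever the total size of the tablet. That is the property the paragraph above attributes to merging at small scales.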