Unique Features of the MapR Distribution

Administrators who are familiar with ordinary Apache Hadoop will appreciate the MapR distribution's real-time read/write storage layer. MapR APIs are 100% compliant with HDFS while eliminating the NameNode, which is a single point of failure in ordinary Hadoop. Furthermore, MapR uses raw disks and partitions without RAID or a Logical Volume Manager, which improves performance. Many Hadoop installation documents discuss considerations around HDFS and NameNodes; MapR Hadoop removes that guesswork, which makes it simpler to install.

The MapR Filesystem (MapR-FS) stores data in volumes, which are logical partitions of the filesystem. Each volume is made up of one or more data containers, which hold the files associated with the volume, and a metadata container that stores information about those files. Because each volume keeps its metadata in its own container, metadata is distributed among all nodes in the cluster, which makes MapR-FS extremely scalable and resilient. The Container Location Database (CLDB) service runs across multiple cluster nodes and provides a directory of container locations.
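The relationship between volumes, containers, and the CLDB directory can be pictured with a small toy model. This is only an illustrative sketch in Python, not MapR code: the names Volume, ContainerLocationDirectory, and nodes_for_volume, the integer container IDs, and the node names are all hypothetical, and real concerns such as replication and failover are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume: one metadata container plus one or more data containers."""
    name: str
    metadata_container: int                      # hypothetical container ID
    data_containers: list[int] = field(default_factory=list)

    def containers(self) -> list[int]:
        """All container IDs that belong to this volume."""
        return [self.metadata_container, *self.data_containers]


class ContainerLocationDirectory:
    """Toy stand-in for the CLDB: maps a container ID to the nodes holding it."""

    def __init__(self) -> None:
        self._locations: dict[int, list[str]] = {}

    def register(self, container_id: int, nodes: list[str]) -> None:
        """Record which nodes hold a container."""
        self._locations[container_id] = list(nodes)

    def locate(self, container_id: int) -> list[str]:
        """Return the nodes holding a container (empty list if unknown)."""
        return list(self._locations.get(container_id, []))


def nodes_for_volume(volume: Volume, directory: ContainerLocationDirectory) -> set[str]:
    """Collect every node that holds any container of the volume."""
    nodes: set[str] = set()
    for container_id in volume.containers():
        nodes.update(directory.locate(container_id))
    return nodes


# Hypothetical example: one volume whose containers are spread over three nodes.
directory = ContainerLocationDirectory()
directory.register(1, ["node-a"])            # metadata container
directory.register(2, ["node-b"])            # data container
directory.register(3, ["node-b", "node-c"])  # data container
projects = Volume(name="projects", metadata_container=1, data_containers=[2, 3])
```

In this model, nodes_for_volume(projects, directory) gathers the nodes behind a single volume, which illustrates why keeping metadata in the volume's own container spreads it across the cluster instead of concentrating it on one node.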

A process called Warden runs on all nodes to manage, monitor, and report on the services on each node. The MapR cluster uses Apache ZooKeeper to coordinate between services running across multiple nodes. ZooKeeper prevents service conflicts by enforcing a set of rules and conditions that determine which instance of each service is the master. Warden will not start any services unless ZooKeeper is reachable and more than half of the configured ZooKeeper nodes (a quorum) are live.
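The quorum condition described above is simple arithmetic: strictly more than half of the configured ZooKeeper nodes must be live. The following Python sketch illustrates that rule only. The function names has_quorum and warden_may_start_services are hypothetical and are not part of Warden or ZooKeeper.

```python
def has_quorum(configured: int, live: int) -> bool:
    """Return True when more than half of the configured ZooKeeper nodes are live."""
    if configured <= 0:
        raise ValueError("at least one ZooKeeper node must be configured")
    if not 0 <= live <= configured:
        raise ValueError("live must be between 0 and configured")
    # Strict majority: 2 of 3 passes, 2 of 4 does not.
    return live > configured // 2


def warden_may_start_services(zookeeper_reachable: bool, configured: int, live: int) -> bool:
    """Hypothetical gate combining both conditions: reachability and quorum."""
    return zookeeper_reachable and has_quorum(configured, live)
```

Under this rule a three-node ZooKeeper ensemble tolerates the loss of one node, and a five-node ensemble tolerates the loss of two. A four-node ensemble still tolerates only one failure, because two live nodes out of four are not more than half.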

MapR also provides native table storage, called MapR-DB. The MapR HBase Client is used to access table data via the open-standard Apache HBase API. MapR-DB simplifies and unifies administration for both structured table data and unstructured file data on a single cluster. If you plan to use MapR-DB exclusively for structured data, you do not need to install the Apache HBase Master or RegionServer. However, the Master and RegionServer services can be deployed on a MapR cluster if your applications require them, for example, during the migration period from Apache HBase to MapR-DB. The MapR HBase Client provides access to both Apache HBase tables and MapR-DB tables. MapR-DB is available in MapR's Converged Community Edition and in the Converged Enterprise Edition with the Hadoop and Database modules.

Licensing Choices

The MapR Hadoop distribution is licensed in tiers.

Table 1. License Choices
| License Edition | Modules | Description |
|---|---|---|
| Converged Community Edition | N/A | An unlimited, free, community-supported MapR edition with one free NFS Gateway. This edition includes Hadoop, MapR-DB, and MapR Streams. However, real-time global replication of MapR-DB tables or MapR Streams is not included. |
| Converged Enterprise Edition | Each of the modules in the Converged Enterprise Edition unlocks a portion of the total platform capabilities. | MapR edition that enables enterprise-class features such as high availability, multi-tenancy, and disaster recovery. |
| | Hadoop module | Enables enterprise-class features for analytic use cases, such as highly available NFS and support for Hadoop services like YARN and MapReduce. |
| | Database module | Enables enterprise-class features for the operational NoSQL database, such as MapR-DB JSON, binary tables, and real-time global database replication. |
| | Streams module | Enables enterprise-class features for publish/subscribe event streaming, such as MapR Streams and real-time global stream replication. |