MapR File System

The MapR Data Platform provides a unified data solution for structured data (tables) and unstructured data (files).

MapR File System (MapR-FS) is a random read-write distributed file system that allows applications to read and write directly to disk concurrently. The Hadoop Distributed File System (HDFS), by contrast, supports writes only as appends and can read only from closed files. Because HDFS is layered over the existing Linux file system, it incurs additional input/output (I/O) operations, which reduce cluster performance. MapR-FS also eliminates the NameNode, which is a single point of failure in other Hadoop distributions, and provides additional features for data management and high availability.
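To illustrate what random read-write access means in practice, the following Python sketch overwrites bytes in the middle of an existing file using ordinary POSIX calls (os.pwrite, available on Unix). HDFS's append-only model does not allow this operation. The sketch assumes the file is reachable through a POSIX path, for example a MapR-FS volume mounted through NFS or a POSIX client; a local temporary file stands in here, and the file name and helper functions are hypothetical.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at `offset` in an existing file (hypothetical helper)."""
    # O_RDWR without O_APPEND: the write lands at `offset`, not at end of file.
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A local temporary file stands in for a file on a mounted MapR-FS volume.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "records.dat")
    with open(path, "wb") as f:
        f.write(b"AAAAAAAAAA")
    overwrite_in_place(path, 3, b"BBB")
    print(read_at(path, 0, 10))  # b'AAABBBAAAA'
```

Because the write replaces existing bytes within the file, the file size stays the same.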

The MapR-FS storage layer is written in C/C++ rather than Java, so it is not affected by Java garbage-collection pauses. Its architecture is also designed to avoid lock contention.

The following list highlights some of the MapR-FS features:
Storage pools: A group of disks to which MapR-FS writes data.
Containers: An abstract entity that stores files and directories in MapR-FS. A container always belongs to exactly one volume and can hold namespace information, file chunks, or table chunks for that volume.
CLDB (Container Location Database): A service that tracks the location of every container.
Volumes: A management entity that stores and organizes containers. Volumes are used to distribute metadata, set permissions on data in the cluster, and back up data. A volume consists of a single name container and a number of data containers.
Direct Access NFS: Enables applications to read and write data directly in the cluster.
POSIX Clients: The loopbacknfs and FUSE-based POSIX clients connect to one or more MapR clusters and allow app servers, web servers, and applications to write data directly and securely to a MapR cluster.
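The relationships in the feature descriptions above (each container belongs to exactly one volume, each volume has one name container plus some number of data containers, and the CLDB tracks where every container lives) can be modeled as a small data structure. The following Python sketch is a toy model only: the class names, sequential container IDs, and node names are hypothetical and do not reflect MapR's actual implementation or API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Container:
    cid: int
    volume: str  # owning volume; each container belongs to exactly one volume
    kind: str    # "name" (namespace info) or "data" (file or table chunks)


@dataclass
class Volume:
    name: str
    name_container: Container
    data_containers: list[Container] = field(default_factory=list)


class ToyCLDB:
    """Hypothetical stand-in for the CLDB: records where every container lives."""

    def __init__(self) -> None:
        self._ids = count(1)  # hypothetical sequential container IDs
        self._locations: dict[int, list[str]] = {}

    def new_container(self, volume: str, kind: str, nodes: list[str]) -> Container:
        c = Container(next(self._ids), volume, kind)
        self._locations[c.cid] = list(nodes)  # copy so callers cannot mutate it
        return c

    def locate(self, cid: int) -> list[str]:
        return self._locations[cid]


def create_volume(cldb: ToyCLDB, name: str, n_data: int, nodes: list[str]) -> Volume:
    # One name container per volume, followed by n_data data containers.
    vol = Volume(name, cldb.new_container(name, "name", nodes))
    for _ in range(n_data):
        vol.data_containers.append(cldb.new_container(name, "data", nodes))
    return vol


cldb = ToyCLDB()
vol = create_volume(cldb, "projects", n_data=2, nodes=["node-a", "node-b"])
print(vol.name_container.cid, [c.cid for c in vol.data_containers])  # 1 [2, 3]
print(cldb.locate(2))  # ['node-a', 'node-b']
```

Keeping location lookups in one place (the toy CLDB) mirrors the description of the CLDB as the service that tracks every container, while volumes only hold references to their own containers.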