What's New in Version 5.1.0

MapR Streams

MapR Streams brings integrated publish/subscribe messaging to the MapR platform.

Producer applications publish messages to topics, which are logical collections of messages managed by MapR Streams. Consumer applications can then read those messages at their own pace. All messages published to MapR Streams are persisted, allowing future consumers to “catch up” on processing and analytics applications to process historical data.
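The publish, persist, and read-at-your-own-pace model described above can be illustrated with a small in-memory sketch. This is not the MapR Streams client API: the `TopicLog` and `Consumer` classes, their methods, and the `clicks` topic are hypothetical names chosen only to show how persisted messages and per-consumer offsets let a late consumer catch up.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for persisted topics (hypothetical, not a MapR API)."""

    def __init__(self):
        # Each topic is an append-only list; reading never removes messages.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset, max_messages=10):
        """Return up to max_messages starting at offset, without consuming them."""
        return self._topics[topic][offset:offset + max_messages]


class Consumer:
    """Tracks its own read position, so each consumer proceeds at its own pace."""

    def __init__(self, log, topic):
        self._log = log
        self._topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self._log.read(self._topic, self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = TopicLog()
for i in range(5):
    log.publish("clicks", f"event-{i}")

fast = Consumer(log, "clicks")
fast_batch = fast.poll()        # reads everything published so far

late = Consumer(log, "clicks")  # created after the messages were published
late_batch = late.poll(2)       # still starts from the first persisted message
```

Because the log keeps every message and each consumer only advances its own offset, the `late` consumer sees the same history as the `fast` one, just later and in smaller batches.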

In addition to reliably delivering messages to applications within a single data center, MapR Streams can continuously replicate data between multiple clusters, delivering messages globally. Like other MapR services, MapR Streams has a distributed, scale-out design, allowing it to scale to billions of messages per second, millions of topics, and millions of producer and consumer applications.

Topics in MapR Streams are grouped into streams, to which administrators can apply security, retention, and replication policies. Combined with MapR-FS and MapR-DB in the MapR platform, streams allow organizations to create a centralized, secure data lake that unifies files, database tables, and message topics.

MapR Streams is ideal for a variety of use cases, including:

Application event pipelines: Many types of applications generate event or log data that needs to be centrally stored and analyzed to gain insights about user activity or application performance. MapR Streams simplifies these pipelines by transporting events to a central location where they can undergo event-by-event transformation and analysis.
Database change capture: Most modern databases allow users to generate an event each time an entry is added or modified. These events can be produced to MapR Streams to keep systems like search indices and caches synchronized, as well as feed security or notification applications.
Internet of Things: The explosion in the number of smart devices and sensors has created many situations in which billions of data points are created by millions of geographically dispersed sensors. MapR Streams provides a reliable, global transport for these messages, allowing analytics to be done both at the source and at a central location.

MapR Streams also integrates with open source ecosystem projects. You can use Apache Flume, Spark, and Storm to produce, process, and consume MapR Streams messages. Integration with Spark is currently a beta feature.

Document Support in MapR-DB

MapR-DB now supports JSON documents as a native data store.

With this support, you can:

Store data that is hierarchical and nested, and that evolves over time.
Read and write individual document fields, subsets of fields, or whole documents from and to disk. To update individual fields or subsets of fields, there is no need to read entire documents, modify them, and then write the modified documents to disk.
Build Java applications with the MapR-DB JSON API, MapR's Java API for working with JSON documents in MapR-DB, and the Open JSON Application Interface (OJAI), an open-source Java API for easily managing complex, evolving, hierarchical JSON data. These two APIs together let you use many more data types than the standard types that JSON supports, make it easy to create complex queries, and require no connection or configuration objects for accessing JSON tables.
Filter query results within MapR-DB before results are returned to client applications.
Run client applications on Linux, OS X, and Windows systems.
Perform complex data analysis on your JSON data with Apache Drill or other analytical tools in real time without having to copy data to another cluster.
Scale your data to span thousands of nodes.
Control read and write access to single fields and subsets of fields within a JSON table by using access-control expressions (ACEs).
Control the disk layout of single fields and subdocuments within JSON tables.
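The idea of addressing individual fields and subsets of fields inside a nested document can be sketched with plain Python dictionaries. This is only an in-memory illustration, not the MapR-DB JSON API or OJAI: the helper names `set_field`, `get_field`, and `project`, the dotted-path notation, and the sample document are assumptions made for this example.

```python
def set_field(document, path, value):
    """Set one nested field addressed by a dotted path, creating parents as needed."""
    keys = path.split(".")
    node = document
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def get_field(document, path, default=None):
    """Read one nested field addressed by a dotted path."""
    node = document
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def project(document, paths):
    """Return a new document containing only the requested fields."""
    result = {}
    for path in paths:
        value = get_field(document, path)
        if value is not None:
            set_field(result, path, value)
    return result


# Hypothetical sample document with nested fields.
profile = {
    "_id": "user001",
    "name": {"first": "Ada", "last": "Lovelace"},
    "address": {"city": "London"},
}

# Update a single nested field; the rest of the document is left untouched.
set_field(profile, "address.zip", "W1")

# Read back only a subset of fields.
subset = project(profile, ["name.first", "address.zip"])
```

The sketch mutates one field in place and reads a projection of two fields, which mirrors the field-level reads and writes listed above without modeling how the data is laid out on disk.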

Security

Version 5.1 introduces several security features:

File Access Control Expressions (ACEs)

MapR now supports the use of Boolean expressions when setting permissions on files, directories, and whole volumes. These Access Control Expressions (ACEs) overcome inherent limitations of POSIX mode bits by allowing a much higher level of expressiveness when granting file permissions to users, groups, and roles. Additionally, whole-volume ACEs offer administrators a volume-level "filter" over file and directory permissions, ensuring that the data in a given volume is accessible only to specific individuals or groups of individuals. Whole-volume ACEs are especially relevant in multi-tenant environments.
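To make the idea of Boolean permission expressions and a volume-level filter concrete, here is a minimal evaluator sketch. The expression syntax it accepts (`u:`, `g:`, and `r:` prefixes for users, groups, and roles, with `&`, `|`, `!`, and parentheses, and `!` binding tighter than `&`, which binds tighter than `|`) is an assumption for illustration; consult the ACE documentation for the actual grammar. The function names, principals, and expressions are hypothetical.

```python
import re

# Tokens: parentheses, operators, or a principal such as u:alice, g:analysts, r:admin.
TOKEN = re.compile(r"\s*([()&|!]|[ugr]:[\w.-]+)")


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"unexpected text: {expression[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression, user, groups=(), roles=()):
    """Evaluate an access expression for one user (assumed syntax, see lead-in)."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side before combining
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token is None or token in (")", "&", "|"):
            raise ValueError(f"unexpected token: {token!r}")
        kind, name = token.split(":", 1)
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        return name in roles

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


def can_read(user, groups, roles, volume_ace, file_ace):
    # The whole-volume expression acts as a filter: both checks must pass.
    return (evaluate(volume_ace, user, groups, roles)
            and evaluate(file_ace, user, groups, roles))


volume_ace = "g:tenant_a"                          # hypothetical volume-level filter
file_ace = "(u:alice | g:analysts) & !u:mallory"   # hypothetical file-level expression

alice_ok = can_read("alice", {"tenant_a"}, set(), volume_ace, file_ace)
bob_ok = can_read("bob", {"tenant_b", "analysts"}, set(), volume_ace, file_ace)
mallory_ok = can_read("mallory", {"tenant_a", "analysts"}, set(), volume_ace, file_ace)
```

In this sketch `bob` satisfies the file-level expression but is rejected by the volume-level filter, which is the multi-tenant behavior the paragraph above describes, while `mallory` passes the volume filter and is excluded by the negation in the file-level expression.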

Impersonation C APIs for File System

MapR-FS now supports impersonation through the following native C APIs:

`hdfsConnectAsUser()`	Connect to a MapR-FS cluster as the specified user.
`hdfsConnectNewInstance()`	Connect to a MapR-FS cluster to get a new instance of the filesystem handle.
`hdfsConnectAsUserNewInstance()`	Connect to a MapR-FS cluster as the specified user to get a new instance of the filesystem handle.

For more information, see MapR-FS C APIs.

Improved Auditing Capabilities for MapR-FS and DB Operations

MapR now supports selective auditing of certain filesystem and table operations, allowing users to explicitly include those operations in, or exclude them from, the cluster’s audit logs. For the list of operations that support selective auditing, and for instructions on including or excluding operations, see the auditing documentation.

FUSE-Based POSIX Client

The new MapR FUSE-based POSIX client runs as a userspace process that connects to one or more MapR clusters, allowing app servers, web servers, and applications to read and write data directly and securely on those clusters as they would on a Linux filesystem. The POSIX client is now available in two performance tiers: basic and platinum.

Open JSON Application Interface (OJAI™)

OJAI is a general-purpose data access layer that sits on top of databases, file systems, and message streams. OJAI defines classes and interfaces for storing and accessing structured, semi-structured, and unstructured data. An OJAI implementation then specifies how the data is stored. The OJAI library provides a sample implementation.

The OJAI API is designed for scalable big-data processing, such as machine-learning workloads.

Myriad

MapR 5.1.0 includes support for Apache Myriad. Myriad enables the co-existence of Apache Hadoop and Apache Mesos on the same physical infrastructure. When Hadoop YARN runs as a Mesos framework, YARN applications and Mesos frameworks can run side by side while dynamically sharing cluster resources. MapR Warden supports resource allocation and management of Mesos services. The configure.sh script provides additional options for configuring Myriad.

MapR Myriad is based on Apache Myriad version 0.1. See the Apache Myriad open source documentation for more information.

Performance Enhancements for SSD-Based MapR-DB Deployments

For DB workloads on high-end servers, MapR has made several enhancements to increase its already industry-leading performance. These enhancements are enabled automatically on a fresh install on any server with SSD devices configured for MapR storage.

Optimized Failover

MapR runs a wide variety of concurrent applications in a highly available fashion. In the event of a node failure, MapR components detect and automatically recover from the failure while activities on other nodes in the cluster continue normally. This release includes several improvements that reduce the latency clients may experience during the recovery process, allowing fast failover.

YARN Enhancements

MapR 5.1.0 runs Hadoop 2.7.0 plus additional critical bug fixes that became available after Hadoop 2.7.0 was released.

Support for New Operating Systems

MapR 5.1.0 supports RHEL/CentOS 7.1 and Oracle Linux.