What's New in Version 6.0.0

The MapR 6.0.0 release adds new features to all components of the converged data platform and expands the MapR Monitoring tool. In addition, installing the MapR Converged Data Platform has been simplified.

Powerful Data Access on MapR-DB with Native Secondary Indexes

MapR-DB has new capabilities and performance improvements in this release. You can use the MapR-DB JSON document database to enable powerful data access for both operational and analytics applications. You can develop new types of applications that support complex user-interaction patterns. If you are a business user, you can run optimized, high-performance SQL queries using familiar business intelligence and analytics tools.

See this page to start using Secondary Indexes

Here is a summary of the key features added to MapR-DB for this release.

Native secondary-index support for MapR-DB JSON tables:

  • Secondary-index support is built into MapR-DB. No external indexing system is necessary. An administrator, or someone with administrator privileges, can create one or more indexes on each JSON table with a unique field. Once indexes are created, you can use them through the OJAI APIs for application development or through Drill for BI and analytics to improve query performance.
    • Support for single-field and composite indexes
    • Support for covering, non-covering, and hashed indexes
    • Support for standard data types in secondary indexes, including NULL, BOOLEAN, STRING, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP, and BINARY
    • All MapR enterprise capabilities, such as HA, snapshots, and security, apply to secondary indexes
    • Global and automatic index synchronization, with no application-developer intervention or additional coding
  • Application-level consistency through "Read Your Own Write" support, which lets your application see a change it makes as soon as the change takes effect on the index table
  • OJAI API and MapR Drill/SQL integrations that leverage indexes to increase query performance
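
To make the difference between covering and non-covering indexes concrete, the following is a minimal toy model in Python. It is only a sketch: plain dictionaries stand in for a JSON table and its indexes, and every name in it (build_index, query, the sample fields) is hypothetical rather than part of the MapR-DB or OJAI APIs. It shows that a query answered entirely from the fields stored in an index entry needs no reads against the primary table, while a non-covering index must fetch each matching document.

```python
# Toy model only: plain dicts stand in for a JSON table and its indexes.
# None of these names are MapR-DB or OJAI APIs.

table = {
    "u1": {"name": "Ana", "city": "Lima", "age": 31},
    "u2": {"name": "Bo", "city": "Oslo", "age": 45},
    "u3": {"name": "Cy", "city": "Lima", "age": 27},
}


def build_index(table, field, included=()):
    """Map each value of `field` to (row_id, included_fields) entries."""
    index = {}
    for row_id, doc in table.items():
        entry = (row_id, {f: doc[f] for f in included})
        index.setdefault(doc[field], []).append(entry)
    return index


def query(table, index, value, wanted):
    """Return the `wanted` fields for rows whose indexed field equals `value`.

    Also returns how many lookups had to go back to the primary table.
    """
    results, table_reads = [], 0
    for row_id, included in index.get(value, []):
        if all(f in included for f in wanted):
            # Covering case: the index entry alone answers the query.
            results.append({f: included[f] for f in wanted})
        else:
            # Non-covering case: fetch the full document from the table.
            table_reads += 1
            results.append({f: table[row_id][f] for f in wanted})
    return results, table_reads


non_covering = build_index(table, "city")
covering = build_index(table, "city", included=("name",))

rows_a, reads_a = query(table, non_covering, "Lima", ["name"])
rows_b, reads_b = query(table, covering, "Lima", ["name"])
```

Both queries return the same rows; only the number of reads against the primary table differs.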

Self Service Operational Analytics with Optimized MapR Drill and SQL Integration with MapR-DB

  • MapR Drill SQL queries seamlessly leverage MapR-DB secondary indexes to significantly improve query performance.
  • MapR Drill SQL queries avoid large scans when they are unnecessary.
  • MapR-DB provides a statistics and cost-based optimizer for index selection. It supports index intersection and a variety of other optimizations.
  • MapR-DB provides support for filter and sorting operators.

Rich Application Development with Apache OJAI enhancements

  • New Sort and Limit operators are available in Apache OJAI APIs
  • Obtain high performance with Apache OJAI API-based queries by leveraging MapR-DB secondary indexes
  • MapR-DB provides index support for Filter and Sorting operators

Change Data Capture API on MapR-DB

Built on the foundation of the global table replication features, the MapR-DB Change Data Capture (CDC) API provides a powerful and easy-to-use interface for integrating changes that arrive at a MapR-DB table with arbitrary external systems in real time. You can now build an application that consumes and processes MapR-DB table data changes, published as ‘change log’ streams, in real time and in a highly scalable way.

See this page to get started using Change Data Capture

CDC enables you to accomplish work such as:

  • Track changes to MapR-DB tables (inserts, updates, deletes) and perform real-time processing on the data.
  • Synchronize data in MapR-DB with a downstream search index (such as Elasticsearch or Solr), materialized views, or in-memory caches.

The CDC feature includes the following capabilities:

  • You can use ordered, “at-least-once” delivery to send a change data record stream to each registered change log.
  • The publish/subscribe model supports the pull model of change-data propagation.
  • An arbitrary external system can consume changes in MapR-DB tables globally
  • You can use CDC with both MapR-DB Binary and JSON tables.
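
Because delivery is ordered and at-least-once, a consumer may see the same change record more than once and is usually written to tolerate that. The following Python sketch illustrates one common way to do so. It is conceptual only: the record layout, the per-table sequence number, and the apply_changes function are assumptions made for illustration and are not the MapR-DB CDC API.

```python
# Conceptual sketch only: the record layout and names are hypothetical,
# not the MapR-DB CDC API.


def apply_changes(records, state=None, last_seq=None):
    """Apply ordered change records to a dict, skipping redelivered ones.

    Each record is (seq, op, key, value); `seq` is an assumed
    sequence number that increases with every change to the table.
    """
    state = {} if state is None else state
    last_seq = -1 if last_seq is None else last_seq
    for seq, op, key, value in records:
        if seq <= last_seq:
            continue  # duplicate from a redelivery; already applied
        if op in ("insert", "update"):
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        last_seq = seq
    return state, last_seq


# Records 1 and 2 arrive twice, as at-least-once delivery allows.
stream = [
    (0, "insert", "k1", {"qty": 1}),
    (1, "update", "k1", {"qty": 2}),
    (2, "insert", "k2", {"qty": 5}),
    (1, "update", "k1", {"qty": 2}),
    (2, "insert", "k2", {"qty": 5}),
    (3, "delete", "k2", None),
]

state, last_seq = apply_changes(stream)
```

Tracking the last applied sequence number lets the downstream copy (a cache, a search index, a materialized view) end up in the same state whether or not records were redelivered.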

Redesigned Administration Platform with Expanded Functionality for Managing and Monitoring Your Data

  • New Cluster Overview Dashboard - Get a single-pane-of-glass view of your cluster environment to determine cluster utilization, node and service health, and events that need attention.
  • Enhanced Alarm Management - Handle alarms efficiently with new categorization, easy access to information, and selective notifications.
  • Intuitive, easy data administration that includes a new Streams administrative interface - Navigate and manage all of your volume, table, and stream data elements in a consistent way. Each now has a dedicated interface.
  • Integration with Spyglass - Analyze metrics and log information from Spyglass right within MCS.

Replica Autosetup with Directcopy

The replica autosetup feature for tables and streams has a new, default directcopy option. Replica autosetup with directcopy uses gateways to perform all replication setup steps including the initial population of data into the replica table or replica stream.

Replica autosetup with directcopy provides the following benefits:
  • Replica autosetup operations do not block the client from submitting additional requests. When setting up replication, the process to copy source data to the replica can be time consuming. With directcopy, the client does not need to wait for the replica autosetup request to complete before submitting another request.
  • The source cluster retries replica autosetup operations in case of failure. The source cluster keeps track of the replica autosetup progress, which allows it to resume autosetup operations after an intermittent failure. If you choose not to use directcopy, user intervention is required if a failure occurs.
  • Throttling of copy table or copy stream operations is done by default. Throttling prevents the initial copy of data from the source to the replica table from consuming all cluster resources.
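
The value of tracking progress on the source side can be sketched in a few lines of Python. This is a conceptual illustration only, not how MapR implements directcopy: the FlakyReplica class, the copy_with_resume function, and the retry limit are all hypothetical. The sketch shows a copy that records how many rows have been written and, after an intermittent failure, retries from that position instead of starting over.

```python
# Conceptual sketch only: hypothetical names, not MapR internals.


class FlakyReplica:
    """Replica stand-in that fails exactly once, at a chosen write."""

    def __init__(self, fail_at):
        self.rows = []
        self.fail_at = fail_at
        self.failed = False

    def write(self, row):
        if len(self.rows) == self.fail_at and not self.failed:
            self.failed = True
            raise ConnectionError("intermittent failure")
        self.rows.append(row)


def copy_with_resume(source, replica, max_attempts=3):
    """Copy rows in order, resuming from the last recorded position."""
    progress = 0  # number of rows known to be copied
    attempts = 0  # number of failures seen so far
    while progress < len(source):
        try:
            replica.write(source[progress])
            progress += 1
        except ConnectionError:
            attempts += 1
            if attempts >= max_attempts:
                raise
            # Loop again and retry from `progress`, not from the beginning.
    return progress, attempts


source = ["r0", "r1", "r2", "r3"]
replica = FlakyReplica(fail_at=2)
copied, retries = copy_with_resume(source, replica)
```

After the single simulated failure, the copy completes with every row written once and in order.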

Replica autosetup uses the directcopy option by default when you choose to automatically set up replication in the MCS and when you use the maprcli to automatically set up replication.

Replica Autosetup for MapR-DB Tables

Replica Autosetup for Streams

Security Enhancements

Simplified Security. In MapR 6.0, it is easier to enable security. You can use the simple "Enable Security" check box in the installer to enable security for the core platform and the installed ecosystem components. Alternatively, you can use the configure.sh -secure command to enable security for the core and the ecosystem components.

Built-In Security

Cross-Cluster Security Script. The configure-crosscluster.sh utility, which you can use to set up cross-cluster security between two clusters, has been enhanced to configure clusters for secure communication.

Configuring Cross Cluster Security

Global Cloud Data Fabric Enablement

Cloud-Scale Multitenancy, OpenStack Manila Plugin, and File Migration Services are built into MapR Converged Data Platform 6.0.

To support hosting multiple organizations on the same data platform while ensuring intra-organization privacy and intra-organization policies, you can use the multitenancy and OpenStack Manila features.

Multitenancy for MFS

OpenStack Manila

Real-time, automatic movement of all types of files from edge to cloud is available using the File Migration Service. The service deploys to each edge site, watches for new files, and immediately transfers them to the cloud. Its intelligent use of the MapR metadata service ensures reliability and performance.

Migrating Files from MapR Edge Cluster to S3

MapR-FS Enhancements

The following enhancements are available in MapR-FS:

Enhancements to MapR-Drill

There are multiple new features for MapR-Drill. See the Drill release notes for details.

New Terminology: MapR Expansion Pack

MapR Expansion Pack (MEP) replaces all instances of MapR Ecosystem Pack (MEP) in MapR 5.2.x and 6.0 documentation. MEPs now contain more than just Hadoop Ecosystem components. They can contain connectors and developer APIs that provide common Hadoop Ecosystem interfaces (for example, Kafka Connect) to MapR components.

Some MapR user interfaces continue to use the term MapR Ecosystem Pack. These interfaces will be updated eventually.