What's New in EEP 3.0

Provides a summary of the new functionality in EEP 3.0.

EEP 3.0 provides a series of stability and security fixes for Spark and improves the speed of ETL and batch processing with a faster version of Hive.

New Features and Additions

HPE Ezmeral Data Fabric Database OJAI Connector for Apache Spark
The HPE Ezmeral Data Fabric Database OJAI Connector for Apache Spark is a new API that makes it easier to build real-time or batch pipelines between your data and HPE Ezmeral Data Fabric Database and leverage Spark within the pipeline. This feature includes:
  • Two new APIs that allow you to load data from an HPE Ezmeral Data Fabric Database JSON table to a Spark RDD or save a Spark RDD to an HPE Ezmeral Data Fabric Database JSON table
  • A custom partitioner that allows you to partition data for better performance
  • Data locality: when the connector reads data from HPE Ezmeral Data Fabric Database, it uses the data locality feature of HPE Ezmeral Data Fabric Database to spawn the Spark executors
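
The connector's own partitioner is not described in this summary. As a rough illustration of why partitioning by key can help, the following is a generic range-partitioning sketch in plain Python. It is not the connector's API: the function names, split keys, and rows are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def range_partition(key, split_keys):
    """Return the index of the key range that contains `key`.

    `split_keys` is a sorted list of boundary keys; n boundaries
    define n + 1 partitions. A key equal to a boundary falls into
    the partition that starts at that boundary.
    """
    return bisect_right(split_keys, key)


def partition_rows(rows, split_keys):
    """Group (key, value) rows by the partition their key falls into."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[range_partition(key, split_keys)].append((key, value))
    return dict(partitions)


# Hypothetical boundaries and rows, for illustration only.
SPLIT_KEYS = ["g", "p"]
ROWS = [("alice", 1), ("zoe", 2), ("henry", 3), ("bob", 4)]
PARTITIONS = partition_rows(ROWS, SPLIT_KEYS)
```

Rows whose keys fall in the same range end up together, so a writer handling one partition touches only one contiguous key range.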

For more information, see Understanding the HPE Ezmeral Data Fabric Database OJAI Connector for Spark.

HPE Ezmeral Data Fabric Database Binary Connector for Apache Spark
The new HPE Ezmeral Data Fabric Database Binary Connector for Apache Spark allows you to write applications that consume HBase binary tables and use them in Spark. Features include:
  • Writing directly to HBase HFiles for bulk insertion into HBase
  • Using Spark SQL to query tables that are represented in HBase

For more information, see HPE Ezmeral Data Fabric Database Binary Connector for Apache Spark.

HPE Ezmeral Data Fabric Streams C Applications (librdkafka)
As of MapR maintenance release 5.2.1, you can develop C applications for HPE Ezmeral Data Fabric Streams. The HPE Ezmeral Data Fabric Streams C Client is a distribution of librdkafka that integrates with MapR Streams.

For more information, see HPE Ezmeral Data Fabric Streams C Applications.

HPE Ezmeral Data Fabric Streams Python Applications
As of MapR 5.2.1, you can create Python applications for HPE Ezmeral Data Fabric Streams using the MapR Streams Python client. The Streams Python client is a binding for librdkafka and contains support for high-level consumers.

For more information, see HPE Ezmeral Data Fabric Streams Python Applications.

Key Upgrades

Apache Spark 2.1.0
Spark 2.1 in the MapR converged data platform brings improvements in enterprise-ready stability and security, including:
  • More than 1200 fixes on the Spark 2.x line
  • MapR-SASL support for encrypted Thrift-server connections
  • Scalable partition handling
  • Stable data-type APIs

For more information, see Apache Spark Feature Support.

Apache Hive 2.1.1
EEP 3.0 provides a faster version of Hive to improve the speed of data-processing tasks, to reduce latency for interactive queries, and to increase throughput for batch queries. Key improvements include:
  • 2x faster ETL through an enhanced cost-based optimizer (CBO), faster type conversions, and dynamic partition pruning
  • New HiveServer UI with new diagnostics and monitoring tools
  • Dynamically partitioned hash joins, which provide unsorted inputs in order to eliminate the sorting step
  • Vectorized query execution that greatly reduces the CPU usage for typical query operations, like scans, filters, aggregates, and joins
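
To show why a hash join can accept unsorted inputs, here is a minimal in-memory sketch in plain Python. It is a conceptual illustration only, not Hive's implementation, and it does not model how Hive distributes the join across tasks. The function name and the sample tables are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner-join two lists of dicts on the given key fields.

    Neither input needs to be sorted: the build side is loaded into
    a hash table, and each probe row looks up its matches directly.
    """
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data, deliberately left unsorted.
CUSTOMERS = [{"cust_id": 2, "name": "Bo"}, {"cust_id": 1, "name": "Al"}]
ORDERS = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
    {"order_id": 12, "cust_id": 2},
]
RESULT = hash_join(CUSTOMERS, ORDERS, "cust_id", "cust_id")
```

A sort-merge join would have to order both inputs on `cust_id` first; the hash lookup removes that step at the cost of holding the build side in memory.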

For more information, see Hive.

Apache Drill 1.10
Drill 1.10 continues the series of iterative releases and is another important milestone for Apache Drill. This release adds numerous enhancements to BI tool integration, end-to-end security, performance, and usability. Highlights of this release include:
  • Tableau native connectivity
  • Support for Kerberos and MapR-SASL authentication between the client and Drillbit
  • Support for the CREATE TEMPORARY TABLE AS (CTTAS) command
  • Ability to query data with Hue 3.12 (experimental only)
  • Improved compatibility with Hive/Spark-generated Parquet files

For more information, see the Drill Introduction.