Spark 3.3.1.0 - 2301 (EEP 9.1.0) Release Notes

This section provides reference information, including new features, patches, and known issues for Spark 3.3.1.0.

The notes below relate specifically to the Hewlett Packard Enterprise Distribution for Apache Hadoop. For more information, you may also want to consult the open-source Spark 3.3.1 Release Notes.

These release notes contain only Hewlett Packard Enterprise-specific information and are not necessarily cumulative. For information about how to use the release notes, see Ecosystem Component Release Notes.

Spark Version: 3.3.1.0
Release Date: January 2023
HPE Version Interoperability: See Component Versions for Released EEPs and EEP Components and OS Support.
Source on GitHub: https://github.com/mapr/spark
GitHub Release Tag: 3.3.1.0-eep-2301
Maven Artifacts: https://repository.mapr.com/maven/
Package Names: Navigate to https://package.ezmeral.hpe.com/releases/MEP/ and select your EEP and OS to view the list of package names.

Hive Support

  • Starting from Spark 3.1.2, Spark supports Hive 2.3.

Delta Lake Support

Spark 3.2.0 and later provides Delta Lake support on HPE Ezmeral Data Fabric. See Apache Spark Feature Support.

New in This Release

  • Updated Spark to version 3.3.1.0.
  • CVE fixes.
  • Bug fixes.

For a complete list of new features, see the open-source Spark 3.3.1 Release Notes.

Fixes

This HPE release includes the following new fixes since the latest Spark release. For details, refer to the commit log for this project in GitHub.

GitHub Commit Date (YYYY/MM/DD) Comment
176b5a2 2022/11/01 MapR [SPARK-1124] Text4Shell - CVE-2022-42889
e9c3f35 2022/11/01 MapR [SPARK-1116] manageSSLKeys.sh script uses hard-coded path '/home'
0a5b234 2022/11/01 MapR [SPARK-1115] Remove code duplicates in configure.sh
cc287bf 2022/11/01 MapR [SPARK-1094] Spark worker is started on unsecured port 8481 on slave node
c5bf37f 2022/11/01 MapR [SPARK-1105] Connection to STS fails on cluster with FIPS
cd1209d 2022/11/01 MapR [SPARK-1106] Regulate dependencies in dep-blacklist.txt via configure.sh
1dd1e3d 2022/11/01 MapR [SPARK-1108] Parallel jobs running causes errors with manageSSLKeys.sh
8a383e9 2022/11/01 MapR [SPARK-1103] Excessive logs for spark beeline
fb2d3f6 2022/11/01 MapR [SPARK-1097] Parallel jobs running under non mapr user causes errors with manageSSLKeys.sh
037d777 2022/11/01 MapR [SPARK-1087] Spark default log is info
db53510 2022/11/02 MapR [SPARK-1127] Backport Spark-3.3.1 to EEP
553b19e 2022/12/05 MapR [SPARK-1131] Update protobuf-java version to 3.21.9
458370e 2022/12/15 MapR [SPARK-988] Check log4j versions for Spark Simba ODBC and JDBC Drivers
93a0483 2022/12/18 MapR [SPARK-1134] Update Spark in EEP 9.1.0 to OJAI 3.2.0
086d91b 2022/12/23 MapR [SPARK-1137] SPARK-1081 fix for EEP-9.1.0
adf391c 2023/01/04 MapR [SPARK-1139] Update Spark in EEP 9.1.0 to Antlr Runtime version 4.9.3

Known Issues and Limitations

  • When you enable SSL in a mixed (FIPS and non-FIPS) configuration, Spark applications fail to run. To run Spark applications, set the spark.ssl.ui.enabled option to false in the spark-defaults.conf configuration file.

  • If you use Spark SQL with a Derby database without a Hive or Hive Metastore installation, a Java runtime exception occurs. For a workaround, see Apache Spark Feature Support.

  • Spark does not support log4j 1.2 logging on HPE Ezmeral Data Fabric.

  • SPARK-1099: A non-mapr user is unable to insert values into a Hive table by using Spark Thrift Server
    Symptoms:
    Start Spark Beeline as a non-mapr user and connect to Spark Thrift Server:
    !connect jdbc:hive2://<node1.cluster.com>:2304/default;ssl=true;auth=maprsasl
    Create a table and insert values into it:
    CREATE TABLE nonmaprctastest2 (key int);
    INSERT INTO TABLE nonmaprctastest2 VALUES 1, 2, 3;
    The following error occurs:
    Caused by: java.lang.RuntimeException: Cannot create staging directory: 'maprfs:/user/hive/warehouse/nonmaprctastest2/.hive-staging_hive_2022-08-23_11-38-31_177_3217175113512758641-4': User mapruser1(user id 5001) has been denied access to create .hive-staging_hive_2022-08-23_11-38-31_177_3217175113512758641-4
    Cause:
    In Hive 2.x, permissions for all table directories in the maprfs:///user/hive/warehouse/ directory are set to 777. In Hive 3.x, however, permissions for table directories are set to 755. In EEP, Spark Thrift Server creates the table as the user who started Spark Thrift Server. When Hive 3.x switches to a user other than the one who started Spark Thrift Server, that user can no longer perform write operations on the tables.
    Workaround:
    Choose one of the following workarounds:
    • After creating the Hive table, set the permissions of the table directory in maprfs:///user/hive/warehouse to 777.
    • After creating the Hive table, set the owner to the user who created the Hive table.
    • Use HiveServer2, which uses impersonation, instead of Spark Thrift Server.
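
The configuration change for the mixed FIPS and non-FIPS issue and the first two SPARK-1099 workarounds can be sketched as shell commands. This is a minimal sketch under stated assumptions, not an official procedure: the helper function names, the DRY_RUN switch, and the local ./spark-defaults.conf path are hypothetical, and the table directory and user name are taken from the SPARK-1099 example above. The hadoop fs commands are only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, DRY_RUN, and the local config path are assumptions.
# The table directory and user name come from the SPARK-1099 example.

# Print cluster commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Mixed FIPS/non-FIPS issue: write "key value" to a spark-defaults.conf file,
# replacing any existing whitespace-separated entry for the same key.
set_spark_property() {
  conf_file="$1"; key="$2"; value="$3"
  touch "$conf_file"
  # Keep every line whose first field is not the key, then append the new value.
  awk -v k="$key" '$1 != k' "$conf_file" > "$conf_file.tmp"
  printf '%s %s\n' "$key" "$value" >> "$conf_file.tmp"
  # Copy the contents back so the original file keeps its owner and mode.
  cat "$conf_file.tmp" > "$conf_file" && rm -f "$conf_file.tmp"
}

# SPARK-1099, workaround 1: open permissions on the new table directory.
open_table_dir() { run hadoop fs -chmod 777 "$1"; }

# SPARK-1099, workaround 2: change the owner of the table directory.
own_table_dir() { run hadoop fs -chown "$2" "$1"; }

# Example usage (hypothetical local file; point this at your real spark-defaults.conf).
set_spark_property ./spark-defaults.conf spark.ssl.ui.enabled false
open_table_dir maprfs:///user/hive/warehouse/nonmaprctastest2
# Alternatively:
# own_table_dir maprfs:///user/hive/warehouse/nonmaprctastest2 mapruser1
```

Printing the hadoop fs commands first makes it possible to review the target path before changing permissions on a shared warehouse directory. Run the permission or ownership change as a user who is allowed to modify the table directory.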

Resolved Issues

  • None.