Getting Started with Iceberg

Summarizes what you need to know to begin using Iceberg with HPE Ezmeral Data Fabric release 7.6.x.

Version Support

HPE Ezmeral Data Fabric 7.6.x has been tested with:

Other data-processing engines, such as open-source Spark, PrestoDB, and Flink, and other data-processing technologies, such as Snowflake, have not been tested.

Catalog Support

Catalogs manage the metadata for datasets and tables in Iceberg. You must specify the catalog when interacting with Iceberg tables through Spark. The following built-in catalogs have been tested for use with Data Fabric 7.6.x:
  • HiveCatalog
  • HadoopCatalog
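As a sketch of how these two catalogs map onto Spark properties (the catalog names `my_hive_cat` and `my_hadoop_cat` and the paths are illustrative placeholders, not names from this documentation):

```properties
# HiveCatalog: table metadata is tracked by a Hive Metastore.
# The metastore URI below is a placeholder.
spark.sql.catalog.my_hive_cat=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.my_hive_cat.type=hive
spark.sql.catalog.my_hive_cat.uri=thrift://metastore-host:9083

# HadoopCatalog: table metadata is tracked in a warehouse directory
# on the file system. The warehouse path below is a placeholder.
spark.sql.catalog.my_hadoop_cat=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.my_hadoop_cat.type=hadoop
spark.sql.catalog.my_hadoop_cat.warehouse=maprfs:///path/to/warehouse
```

With either configuration, tables in that catalog are addressed as `<catalog_name>.<namespace>.<table>` in Spark SQL.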

Spark Setup for Iceberg

Setting up Spark to use Iceberg is a two-step process:
  1. Add the org.apache.iceberg:iceberg-spark-runtime-<spark.version>_<scala.version>:<iceberg.version> JAR file to your application classpath. You can either copy the runtime JAR to the jars folder in your Spark directory, or add it directly to the application classpath by using the --packages or --jars option.
  2. Configure a catalog. For information about using catalogs with Iceberg, see Catalogs.

For examples, see the Spark and Iceberg Quickstart.
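Putting both steps together, a minimal launch sketch might look like the following. The Spark, Scala, and Iceberg versions and the catalog name `my_catalog` are placeholders; substitute the versions that match your installation:

```shell
# Pull the Iceberg Spark runtime from Maven (step 1) and configure a
# Hive-backed catalog (step 2). All versions and names are illustrative.
spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0 \
  --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.my_catalog.type=hive \
  --conf spark.sql.catalog.my_catalog.warehouse=/path/to/warehouse
```

If the runtime JAR is already in the jars folder of your Spark directory, the --packages option can be dropped and only the --conf options are needed.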

Configuring Your Spark Application

Consider adding the following parameters to your Spark application:
spark.sql.catalog.<catalog_name>=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.<catalog_name>.type=hive
spark.sql.catalog.<catalog_name>.warehouse=<path_to_your_warehouse>
spark.sql.legacy.pathOptionBehavior.enabled=true
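Once these parameters are in place, a quick way to verify the setup is a short Spark SQL session. This is a hedged sketch: the catalog name `my_catalog` and the namespace and table names are illustrative placeholders, and it assumes the parameters above are set (for example, in spark-defaults.conf):

```shell
# Create, populate, and read back an Iceberg table through the
# configured catalog. Catalog, namespace, and table names are placeholders.
spark-sql -e "
  CREATE TABLE my_catalog.db.sample (id BIGINT, data STRING) USING iceberg;
  INSERT INTO my_catalog.db.sample VALUES (1, 'a'), (2, 'b');
  SELECT * FROM my_catalog.db.sample;
"
```

If the catalog is configured correctly, the SELECT returns the two inserted rows; a failure at the CREATE TABLE step usually points to a catalog or warehouse misconfiguration.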