Iceberg Support

Describes support for Iceberg in HPE Ezmeral Data Fabric 7.6.x.

Apache Iceberg

Apache Iceberg is an open-source table format that simplifies the processing of huge data sets stored on a file system or in an object store. Iceberg brings the simplicity of SQL tables to huge data sets.

Iceberg has the following capabilities:
  • Iceberg tables are fast, safe, and scalable, and integrate easily with analytics engines such as Spark, PrestoDB, and Hive.
  • Iceberg supports Atomicity, Consistency, Isolation, and Durability (ACID) transactions.
  • Multiple analytics engines, such as Spark, PrestoDB, Hive, and Impala, can safely perform ACID transactions on the same table at the same time.
  • Iceberg supports schema evolution, hidden partitioning, partition layout evolution, and time travel, which reduce the risk of incorrect results when a table's schema or layout changes and let you query earlier table states.
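
As a rough illustration of the last capability, the following Python sketch builds one Spark SQL statement per feature, using syntax from the Apache Iceberg Spark integration. The table name demo.sales.orders, its columns, and the snapshot ID are hypothetical placeholders, and the helper function is not part of any Data Fabric or Iceberg API.

```python
from typing import Dict


def iceberg_feature_statements(table: str, snapshot_id: int) -> Dict[str, str]:
    """Return example Spark SQL statements keyed by the Iceberg feature they use.

    The table, column names, and snapshot ID are hypothetical placeholders.
    """
    return {
        # Hidden partitioning: partition on a transform of ts, so queries
        # filter on ts directly and never reference a partition column.
        "hidden_partitioning": (
            f"CREATE TABLE {table} (id BIGINT, amount DOUBLE, ts TIMESTAMP) "
            "USING iceberg PARTITIONED BY (days(ts))"
        ),
        # Schema evolution: add a column as a metadata-only change.
        "schema_evolution": f"ALTER TABLE {table} ADD COLUMNS (region STRING)",
        # Partition layout evolution: add a partition field for new data.
        "partition_evolution": (
            f"ALTER TABLE {table} ADD PARTITION FIELD bucket(16, id)"
        ),
        # Time travel: read the table as of an earlier snapshot.
        "time_travel": f"SELECT * FROM {table} VERSION AS OF {snapshot_id}",
    }


if __name__ == "__main__":
    for feature, sql in iceberg_feature_statements("demo.sales.orders", 42).items():
        print(f"{feature}: {sql}")
```

Each statement could be passed to spark.sql(...) on a Spark session that has the Iceberg runtime available. ADD PARTITION FIELD requires the Iceberg SQL extensions, and VERSION AS OF requires Spark 3.3 or later.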

For details, see the Apache Iceberg documentation.

Data Fabric and Iceberg

Starting with Data Fabric 7.6.x, you can perform the following operations in the HPE Ezmeral Data Fabric Object Store:
  • Create a schema for data in Avro, ORC, or Parquet format, and modify the schema if needed.
  • Create Iceberg tables using a specific schema and perform ACID transactions.
  • Create snapshots of a table and use them to query earlier table states (time travel).
  • Grant access permissions for an Iceberg table to different users.
  • Migrate data files, along with their metadata, into an Iceberg table.
  • Query an Iceberg table through Apache Spark.
  • Create an Iceberg table in an external S3 bucket and query it through the HPE Ezmeral Data Fabric Object Store.
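
The following Python sketch suggests how some of these operations might look from Apache Spark. It is an outline under stated assumptions, not the documented Data Fabric procedure: the catalog name, bucket, endpoint URL, table name, and source path are hypothetical, and the choice of an Iceberg Hadoop-type catalog over the s3a connector is an assumption. The configuration keys, the snapshots metadata table, and the add_files procedure come from the Apache Iceberg and Hadoop S3A documentation.

```python
from typing import Dict


def iceberg_spark_conf(catalog: str, warehouse: str, endpoint: str) -> Dict[str, str]:
    """Build Spark settings for an Iceberg catalog on an S3-compatible store.

    Assumption: a Hadoop-type Iceberg catalog; other catalog types need
    different settings. All argument values are placeholders.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
        # S3A connector settings for a non-AWS, S3-compatible endpoint.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def operation_statements(catalog: str, table: str, source_path: str) -> Dict[str, str]:
    """Return Spark SQL for listing snapshots and migrating Parquet files."""
    return {
        # List snapshots; a snapshot_id can then be used for time travel.
        "list_snapshots": (
            f"SELECT snapshot_id, committed_at, operation "
            f"FROM {catalog}.{table}.snapshots"
        ),
        # Register existing Parquet files in an Iceberg table without copying.
        "add_files": (
            f"CALL {catalog}.system.add_files("
            f"table => '{table}', "
            f"source_table => '`parquet`.`{source_path}`')"
        ),
    }


if __name__ == "__main__":
    conf = iceberg_spark_conf(
        "demo", "s3a://example-bucket/warehouse", "https://objectstore.example.com:9000"
    )
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

The settings can be supplied as --conf options to spark-submit or applied one by one with SparkSession.builder.config(key, value). Credentials are deliberately left out of the sketch and depend on how access to the Object Store is set up.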

With these features, you can build a reliable and scalable data lakehouse architecture.