I would like to present a series of concise comparisons regarding the internals of different table formats. While I have offered detailed insights into each format, I believe it's valuable to highlight both their commonalities and distinctions. These comparisons will emphasize objective facts about their functionalities, steering clear of personal judgments or opinions. My aim is to foster a constructive dialogue rather than engage in debates about which table format may be preferable.
Disclaimer: I understand that the debate over Open Table formats can be quite contentious. My intention with any discussion about them is to remain objective and help foster a deeper understanding of the technology involved. I truly appreciate the passion behind these discussions!
Last Updated: April 2026 — Reflects Apache Iceberg 1.10.x (Spec v2/v3), Delta Lake 4.1.x, Apache Hudi 1.1.x, and Apache Paimon 1.4.x.
- Definition: A data lakehouse combines the best elements of data lakes and data warehouses, enabling both cost-effective storage and efficient analytics. It offers a unified platform for various workloads like SQL analytics, data science, and machine learning.
- Key Characteristics:
- ACID Transactions: Supports reliable data modifications, ensuring data consistency.
- Schema Evolution: Allows flexible schema changes to adapt to evolving data requirements.
- Unified Governance: Provides tools for managing data access, security, and auditing.
- Scalable Storage: Leverages cost-effective storage solutions like cloud object stores.
- Open Formats: Uses open file formats like Parquet and ORC for data storage, avoiding vendor lock-in.
- Support for Streaming Data: Enables real-time data ingestion and processing.
- BI Support: Supports business intelligence (BI) and reporting tools.
- Interoperability: Modern lakehouse formats increasingly support cross-format compatibility through standards like UniForm (Delta Lake), Iceberg REST Catalog, and Paimon's Iceberg metadata compatibility layer.
- Data Files: Store the actual data in columnar formats like Parquet, ORC, or Avro.
- Metadata: Information describing the data files, their structure, and their lineage.
- Table Metadata: Contains table schema, partitioning information, and other table-level properties.
- Commit Logs: Transaction logs that record changes to the table, enabling features like time travel and data versioning.
- Snapshots: Represent a consistent view of the table at a specific point in time.
- Indexes: Enhance query performance by providing optimized data access paths (optional).
- Deletion Vectors: Compact row-level delete markers (bitmaps) that avoid rewriting entire data files, now supported across Iceberg (v3), Delta Lake (3.0+), and Paimon.
- Statistics Files: Columnar statistics and bloom filters stored alongside data (e.g., Iceberg Puffin files, Hudi metadata table) for accelerated query planning.
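The relationship between commit logs, snapshots, and data files can be sketched in a few lines. This is an illustrative model only, not any format's real API: each commit produces an immutable snapshot listing the table's live data files, and time travel is simply reading an older snapshot by id.

```python
# Minimal sketch of a table's commit log. Each commit records an
# immutable snapshot of the current set of data files; time travel
# reads an older snapshot instead of the latest one.

class Table:
    def __init__(self):
        self.snapshots = []  # the commit log: an ordered list of snapshots

    def commit(self, added, removed=frozenset()):
        current = self.snapshots[-1]["files"] if self.snapshots else frozenset()
        new_files = (current - set(removed)) | set(added)
        self.snapshots.append({"id": len(self.snapshots),
                               "files": frozenset(new_files)})
        return self.snapshots[-1]["id"]

    def files_at(self, snapshot_id=None):
        """Return the live file set as of a snapshot (defaults to latest)."""
        snap = self.snapshots[-1] if snapshot_id is None else self.snapshots[snapshot_id]
        return snap["files"]

t = Table()
t.commit(["data-0.parquet"])
t.commit(["data-1.parquet"])
t.commit([], removed={"data-0.parquet"})

assert t.files_at() == {"data-1.parquet"}
assert t.files_at(0) == {"data-0.parquet"}  # time travel to the first commit
```

Note that no snapshot is ever mutated: old file lists stay intact until cleanup expires them, which is what makes versioning and rollback cheap.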
- ACID Transactions: Ensure data consistency during concurrent reads and writes, preventing data corruption.
- Time Travel: Enables querying historical versions of the table for auditing, debugging, and data recovery.
- Schema Evolution: Simplifies schema changes without requiring full table rewrites, adapting to evolving data structures.
- Partitioning: Improves query performance by dividing data into smaller, more manageable segments based on specified columns.
- Hidden Partitioning: Iceberg and Paimon support partition transforms (e.g., `bucket`, `truncate`, `year`, `month`) that decouple physical layout from SQL queries, eliminating the need for users to include partition columns in WHERE clauses.
- Data Compaction: Optimizes data storage by merging small files into larger ones, improving read performance.
- Data Skipping: Reduces I/O by skipping irrelevant data files based on metadata information.
- Data Versioning: Provides a history of table changes, supporting rollback and reproducibility.
- Branching: Lets you create lightweight, isolated branches of a table without duplicating underlying data.
- Tagging: Named references to specific snapshots that make it easy to manage versions, collaborate, and maintain data consistency in large-scale environments.
- Liquid Clustering: Delta Lake's approach to adaptive, mutable data clustering that replaces static Hive-style partitioning with flexible `CLUSTER BY` semantics.
- Incremental Clustering: Paimon's z-order/Hilbert/order sorting during compaction to optimize data layout at low cost.
- Deletion Vectors: Row-level soft deletes stored as compact bitmaps, enabling fast DELETE/UPDATE operations without full file rewrites (Delta Lake 3.0+, Iceberg v3, Paimon MOW mode).
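The deletion vector mechanism is easy to demonstrate. The real formats store compressed RoaringBitmaps; the plain integer bitmask below is just a stand-in to show the idea: a DELETE flips bits in a small sidecar structure instead of rewriting the Parquet file, and readers filter deleted positions on the fly.

```python
# Hedged sketch of a deletion vector: a per-file bitmap marking deleted
# row positions, so DELETE/UPDATE never rewrites the data file itself.
# (Real implementations use compressed RoaringBitmaps, not a raw int.)

class DeletionVector:
    def __init__(self):
        self.bits = 0

    def delete(self, pos):
        self.bits |= 1 << pos           # mark row position as deleted

    def is_deleted(self, pos):
        return bool(self.bits >> pos & 1)

def scan(rows, dv):
    """A reader merges the data file with its deletion vector on the fly."""
    return [r for i, r in enumerate(rows) if not dv.is_deleted(i)]

rows = ["a", "b", "c", "d"]
dv = DeletionVector()
dv.delete(1)
dv.delete(3)
assert scan(rows, dv) == ["a", "c"]
```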
- Key Concepts:
- Catalog Integration: Supports multiple catalogs including Hive, Hadoop, Iceberg REST Catalog (OpenAPI standard), Apache Polaris (incubating), Project Nessie, Apache Gravitino, Unity Catalog, and cloud-managed catalogs.
- Iceberg REST Catalog: An OpenAPI-based standard that has become the de facto catalog interoperability layer, enabling cross-engine and cross-platform catalog federation.
- Partitioning: Supports hidden partitioning with transforms (`identity`, `bucket`, `truncate`, `year`, `month`, `day`, `hour`) that decouple physical layout from queries.
- Table Evolution: Supports adding, dropping, and renaming columns without data migration. v3 adds default column values and type widening.
- Data Management: Includes features for data cleanup, compaction, consistency checks, and snapshot expiration.
- Branches and Tags: Branching provides lightweight, isolated environments for data manipulation. Tags provide named snapshot references for versioning and data management.
- Views: SQL view support for defining logical views over Iceberg tables (added in 1.5.x).
- Spec v2 (Stable): Row-level deletes via position-delete files and equality-delete files.
- Spec v3 (Progressive): Extended types and capabilities including nanosecond timestamps (`timestamp_ns`, `timestamptz_ns`), multi-argument transforms, default column values, variant type, geometry/geography types, deletion vectors, and row lineage tracking.
- Puffin Statistics Files: Column-level statistics stored in a compact binary format for accelerated scan planning.
- Object Store File Layout: Optimized file layout for cloud object stores that avoids deep directory hierarchies.
- Table Path Re-Write: Introduced in 1.8.0, enables migration of tables between storage locations (e.g., HDFS to S3) while preserving full history and metadata.
- File Organization:
- Metadata Location: Stores table metadata in a metadata directory, which contains files like `metadata.json`, `snap-*.avro`, and `manifest-*.avro`.
- Data Files: Data is stored in Parquet, ORC, or Avro files within partition directories.
- Manifest Files: Track the data files belonging to each snapshot.
- Delete Files: Position-delete and equality-delete files (v2), deletion vectors (v3).
- Puffin Files: Statistics files for bloom filters and column-level NDV stats.
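The idea behind hidden partitioning can be sketched with a few transform functions. This is illustrative only: Iceberg's real `bucket` transform uses a 32-bit Murmur3 hash, and `zlib.crc32` below is just a deterministic stand-in. The point is that the table spec, not the user, derives partition values from source columns, so queries filter on the source columns and never mention partitions.

```python
import zlib

# Sketch of hidden partitioning: the spec maps source columns through
# transforms to derive partition values. (crc32 stands in for Iceberg's
# actual Murmur3-based bucket hash; the structure is what matters.)

def bucket(n):
    return lambda v: zlib.crc32(str(v).encode()) % n

def truncate(width):
    return lambda v: v[:width]

def month(ts):                       # ts given as "YYYY-MM-DD"
    return ts[:7]

partition_spec = {"order_id": bucket(16), "order_date": month}

row = {"order_id": 8812, "order_date": "2026-04-03"}
partition = {col: fn(row[col]) for col, fn in partition_spec.items()}

assert partition["order_date"] == "2026-04"
assert 0 <= partition["order_id"] < 16
assert truncate(4)("warehouse") == "ware"
```

A query like `WHERE order_date = '2026-04-03'` can then be pruned to the `2026-04` partition automatically, without the user ever writing a partition-column predicate.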
- Key Concepts:
- Table Types:
- Copy-on-Write (CoW): Data is immutable; updates rewrite entire data files. Simpler, more efficient for read-heavy workloads.
- Merge-on-Read (MoR): Updates are written as delta logs and merged during reads. More efficient for write-heavy workloads.
- Data Organization: Hudi organizes data into partitions, which contain data files. Metadata is stored in the `.hoodie` directory.
- Timeline: Hudi's timeline tracks all actions on the table, enabling time travel and incremental processing.
- Record-Level Index (RLI): O(1) record lookups via a record-key-to-file-group mapping maintained in the metadata table, eliminating full partition scans (1.0+).
- Non-Blocking Concurrency Control (NBCC): Writers no longer block each other; uses optimistic concurrency with conflict detection on the timeline, replacing file-level locking (1.0+).
- Functional Indexes: Indexes derived from column expressions (e.g., `UPPER(col)`, `date_format(ts)`) stored in the metadata table (1.0+).
- Secondary Indexes: Support for bloom filter, column stats, and bucket indexes as secondary lookup structures alongside the primary index (1.0+).
- Multi-Modal Index: Metadata table consolidates RLI, column stats, bloom filters, and partition stats into one unified structure (1.0+).
- Lake Cache: A caching layer for frequently accessed metadata and data blocks, reducing repeated object-store reads.
- Metaserver: A centralized metadata service that offloads metadata from the `.hoodie` directory into a scalable server-side store.
- File Organization:
- Base Path: The root directory where the Hudi table is stored (e.g., `/data/hudi_trips/`).
- Meta Path: The `.hoodie` directory containing table metadata, timeline, and logs.
- Partition Path: Subdirectories within the base path that organize data by partition keys (e.g., `/americas/brazil/sao_paulo/`).
- Metadata Table: An internal Hudi table (LSM-tree based) storing indexes, column stats, bloom filters, and record-level mappings.
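The Copy-on-Write versus Merge-on-Read trade-off comes down to where the merge work happens. The sketch below is a simplification (records reduced to key-value pairs, no real file I/O), but it shows the shape of each path: CoW pays at write time by producing a new base file, MoR pays at read time by merging base and delta logs.

```python
# Sketch contrasting Hudi's two table types. CoW rewrites the base file
# on every update batch; MoR appends update batches to delta logs and
# merges base + logs at query time, newest log winning.

def cow_update(base, updates):
    """CoW: materialize a brand-new base file with updates applied."""
    return {**base, **updates}

def mor_read(base, logs):
    """MoR: merge the base file with delta logs on the fly."""
    merged = dict(base)
    for log in logs:               # logs ordered oldest -> newest
        merged.update(log)
    return merged

base = {"k1": "v1", "k2": "v2"}

assert cow_update(base, {"k2": "v2b"}) == {"k1": "v1", "k2": "v2b"}
assert mor_read(base, [{"k2": "v2b"}, {"k3": "v3"}]) == \
    {"k1": "v1", "k2": "v2b", "k3": "v3"}
```

Compaction in MoR is then just running the merge offline and writing the result out as a new base file, which resets read cost.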
- Key Concepts:
- Table Types: Paimon supports Primary Key Tables (with LSM-tree structure for streaming updates) and Append-Only Tables (for batch analytics). Table modes include Merge-on-Read (MOR), Copy-on-Write (COW), and Merge-on-Write (MOW) with Deletion Vectors.
- Merge Engines: Deduplicate (keep last row), Partial Update (progressively complete records), Aggregation (aggregate values), First Row (keep earliest record).
- Incremental Queries: Supports querying changes between snapshot IDs or time ranges, useful for change data capture (CDC).
- Flink Integration: Paimon is deeply integrated with Apache Flink, making it suitable for stream processing applications. Also supports Spark, Hive, Trino, StarRocks, and Doris.
- CDC Ingestion: Native CDC support from MySQL, Kafka, MongoDB, Pulsar, PostgreSQL, and Flink CDC with schema evolution.
- Iceberg Metadata Compatibility: Paimon tables can generate Iceberg-compatible metadata, enabling Iceberg-aware engines (Trino, Spark, etc.) to read Paimon tables without code changes.
- REST Catalog: Paimon implements a REST Catalog API for cross-engine catalog federation.
- Deletion Vectors: Supported in MOW mode to avoid full file rewrites on point deletes/updates (`'deletion-vectors.enabled' = 'true'`).
- Incremental Clustering: Z-order, Hilbert-curve, and order sorting during compaction for multi-dimensional data skipping.
- Multimodal Data Lake: Blob storage for multimodal data (images, videos, audio), Vector Index (DiskANN) for ANN search, BTree Index for scalar lookups, and PyPaimon native Python SDK.
- File Organization:
- Changelog Files: Paimon materializes change data capture (CDC) files for commits, providing a record of data changes.
- Bucket Structure: Data is organized into buckets, each containing changelog and data files.
- Snapshots View: The `table$snapshots` system table provides information about available snapshots.
- Tag Management: Automatic tag creation on a schedule with retention policies (`'tag.automatic-creation' = 'process-time'`).
- Branches: Lightweight branches for isolated data manipulation.
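Paimon's merge engines can be illustrated as reduction functions applied when the LSM tree collapses rows sharing a primary key. This is a conceptual sketch, not Paimon's actual code: rows are plain dicts arriving oldest first, and each engine decides which fields survive.

```python
# Sketch of Paimon merge engines: functions that collapse all rows with
# the same primary key during an LSM merge. `rows` arrive oldest first.

def deduplicate(rows):
    return rows[-1]                  # keep only the last row

def first_row(rows):
    return rows[0]                   # keep only the earliest row

def partial_update(rows):
    out = {}
    for r in rows:                   # non-null fields progressively fill in
        out.update({k: v for k, v in r.items() if v is not None})
    return out

rows = [{"id": 1, "name": "ada", "city": None},
        {"id": 1, "name": None, "city": "paris"}]

assert deduplicate(rows) == {"id": 1, "name": None, "city": "paris"}
assert first_row(rows) == {"id": 1, "name": "ada", "city": None}
assert partial_update(rows) == {"id": 1, "name": "ada", "city": "paris"}
```

The Aggregation engine follows the same pattern, but replaces `update` with a per-column aggregate function (sum, max, etc.).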
- Key Concepts:
- Delta Log: A transaction log that tracks all changes to the table.
- ACID Transactions: Ensures data consistency with serializable isolation levels.
- Time Travel: Supports querying previous versions of the table.
- Schema Enforcement: Enforces a schema on write to ensure data quality.
- Change Data Feed: Provides a mechanism for capturing changes to the table.
- Deletion Vectors: Row-level soft deletes stored as RoaringBitmap files (`.bin`), enabling 10-100x faster DELETE/UPDATE operations without rewriting entire Parquet files. Enabled by default in Delta 4.x.
- Liquid Clustering: Replaces Hive-style `PARTITION BY` with flexible, mutable `CLUSTER BY` semantics. Cluster keys can be changed without rewriting data (`ALTER TABLE t CLUSTER BY (new_col)`). Applied incrementally via `OPTIMIZE`.
- UniForm (Universal Format): Generates Apache Iceberg v2 metadata alongside the Delta transaction log, enabling Iceberg-aware engines (Trino, Dremio, DuckDB, StarRocks, etc.) to read Delta tables natively without ETL.
- Variant Type: New `VARIANT` SQL data type for semi-structured JSON-like data based on the Apache Parquet Variant Shredding spec.
- Row Tracking: Assigns stable, globally unique Row IDs to every row, persisting through UPDATE operations for row-level change tracking and lineage.
- Type Widening: Allows widening column types without a full table rewrite (e.g., `INT` to `LONG`, `FLOAT` to `DOUBLE`, `DECIMAL(p,s)` to `DECIMAL(p2,s2)`).
- Coordinated Commits: Replaces filesystem-level atomic rename with an external commit coordinator for reliable multi-writer workloads on cloud storage.
- Delta Kernel: Java and Rust libraries for building Delta connectors without needing to understand protocol details. Powers the official Flink connector rewrite.
- V2 Checkpoints: New checkpoint format with Parquet + JSON sidecar for faster checkpoint writes and reads.
- In-Commit Timestamps: Monotonically increasing, storage-independent timestamps embedded in every commit for deterministic time travel on cloud storage.
- Server-Side Planning: Delegates scan planning to catalog servers following the Iceberg REST Catalog API (preview in 4.1.0).
- Catalog-Managed Tables: Support for Unity Catalog managed Delta tables with atomic CTAS and OAuth-based authentication (preview in 4.1.0).
- Table Features Model: Protocol writer version 7 enables individual feature flags (e.g., `deletionVectors`, `rowTracking`, `liquidClustering`) declared via `writerFeatures` in the protocol action.
- File Organization:
- Delta Files: JSON files that store the transaction log in `_delta_log/`.
- Checkpoint Files: Periodic snapshots of the table state to accelerate recovery. V2 checkpoints use Parquet + JSON sidecar.
- Data Files: Parquet files that store the actual data.
- Deletion Vector Files: `.bin` files containing RoaringBitmap data for soft deletes.
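The core of the Delta log is simple enough to sketch: each commit is a file of JSON actions, and the current table state is the result of folding `add` and `remove` actions in order. This is a reduced model (real logs also carry `metaData`, `protocol`, `txn`, and other action types, plus checkpoints that snapshot the fold).

```python
import json

# Sketch of replaying a Delta-style transaction log. Each commit line is
# a JSON action; folding add/remove in order yields the live file set.
# (Real logs include more action types and periodic checkpoints.)

commits = [
    '{"add": {"path": "part-000.parquet"}}',
    '{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}',
]

def replay(log_lines):
    files = set()
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return files

assert replay(commits) == {"part-001.parquet"}
```

Checkpoint files exist precisely because this fold gets expensive as the log grows: a checkpoint materializes the folded state so readers only replay commits after it.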
- Creating Tables: Each format provides specific syntax for creating tables, defining schema, and specifying partitioning. Example: `CREATE TABLE iceberg_catalog.db.order_h (...) PARTITIONED BY (year(Order_ts), st)`
- Inserting Data: Data can be inserted using SQL statements or by writing DataFrames from Spark or other processing engines. Example: `INSERT INTO iceberg_catalog.db6.order_h VALUES (...)`
- Querying Data: Standard SQL syntax is used to query data, with support for filtering, aggregation, and joins. Example: `SELECT * FROM iceberg_catalog.nyc.taxis_COW`
- Updating Data: Updates depend on table type. Copy-on-write involves rewriting files; merge-on-read uses delta logs.
- Deleting Data: Deletion can be physical (removing files) or logical (using deletion vectors).
- Time Travel: Querying data as of a specific version, tag, or timestamp. Example: `SELECT * FROM iceberg_catalog.db.movies VERSION AS OF 'tg_88'`
- Branching: Example: `ALTER TABLE iceberg_catalog.db.permits CREATE BRANCH etl_today`
- Tagging: Example: `ALTER TABLE iceberg_catalog.db.movies CREATE TAG tg_88 RETAIN 365 DAYS`
- Compaction: Process of merging smaller data files into larger ones to improve read performance.
- Cleanup: Removing obsolete data files and metadata to reclaim storage and maintain performance.
- Configuring Catalogs: Example: `spark.sql.catalog.iceberg_catalog.type=hadoop` and `spark.sql.catalog.iceberg_catalog.warehouse=s3://warehouse/path`
- Spark Session Extensions: Used to enable specific table format features within Spark. Example: `org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`
- Hive Metastore: A central repository for storing table metadata, commonly used with Hadoop-based systems. Configuration involves setting up the Hive Metastore URI and necessary dependencies.
- Cloud-Managed Catalogs: Serverless metadata catalog services offered by cloud providers. Integration typically requires configuring credentials and specifying the catalog endpoint.
- Hadoop Catalog: A simple catalog implementation that stores metadata on the Hadoop file system.
- Iceberg REST Catalog: An OpenAPI-based standard that has become the de facto interoperability layer for catalog federation across engines and platforms.
- Apache Polaris (Incubating): An open-source catalog service under the Apache Software Foundation (incubating since 2024), implementing the Iceberg REST Catalog spec. Provides fine-grained access control and cross-engine catalog management.
- Project Nessie: An open-source Git-like branching catalog for Iceberg tables, enabling branch/tag/merge workflows on table metadata.
- Apache Gravitino (Incubating): A unified metadata management framework supporting multiple table formats and catalog types.
- Unity Catalog: A unified governance solution for data and AI assets. Open-sourced under the Apache umbrella (incubating since 2024). Supports Delta Lake catalog-managed tables with atomic CTAS.
- Creating a Database: `spark.sql("CREATE DATABASE IF NOT EXISTS iceberg_catalog.nyc")`
- Showing Tables: `spark.sql("SHOW TABLES").show()`
- Setting Table Properties: `spark.sql("ALTER TABLE iceberg_catalog.db.taxis_COW SET TBLPROPERTIES ('write.wap.enabled' = 'true')")`
- Copying Data: Moving data from Hive tables to object storage in Parquet format.
- Example: Use your cloud provider's CLI or tools like `hadoop distcp` to copy data to object storage.
- Creating Iceberg Tables from Migrated Data: Defining Iceberg tables that point to the migrated data location.
- Ensuring the metadata is correctly configured to reflect the data layout.
- Table Path Re-Write (Iceberg 1.8.0+): Migrate tables between storage locations (e.g., HDFS to S3) while preserving full table history and metadata using the `rewrite_path` procedure.
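What a path re-write has to accomplish can be shown with a prefix substitution over the absolute paths recorded in table metadata. This is only a sketch of the idea; Iceberg's actual procedure operates over metadata files and manifests, not a flat list of strings.

```python
# Sketch of the core of a table path re-write: every absolute file path
# in the metadata gets its storage prefix swapped, while commit history
# stays intact. (Iceberg's real procedure rewrites metadata/manifest
# files; this only illustrates the prefix substitution.)

def rewrite_paths(metadata_paths, old_prefix, new_prefix):
    return [new_prefix + p[len(old_prefix):] if p.startswith(old_prefix) else p
            for p in metadata_paths]

paths = ["hdfs://nn/warehouse/db/t/data/f1.parquet",
         "hdfs://nn/warehouse/db/t/metadata/snap-1.avro"]
moved = rewrite_paths(paths, "hdfs://nn/warehouse", "s3://warehouse")

assert moved[0] == "s3://warehouse/db/t/data/f1.parquet"
assert moved[1] == "s3://warehouse/db/t/metadata/snap-1.avro"
```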
- Reading Streams: Reading data from streaming sources like Kafka or Pulsar using `spark.readStream`.
- Writing Streams: Writing streaming data to table formats using `df.writeStream`.
- Schema Definition: Defining schema for incoming data streams to ensure compatibility with the table format.
- CDC Ingestion: Paimon and Hudi offer native CDC support from MySQL, Kafka, MongoDB, PostgreSQL, and other sources with schema evolution.
- Flink Integration: Paimon and Iceberg provide deep Flink integration for streaming reads and writes. Delta Lake supports Flink via the Kernel-based connector.
- Access Control: Implementing access control policies to restrict data access based on user roles or permissions.
- Data Masking: Protecting sensitive data by masking or redacting information.
- Encryption: Encrypting data at rest and in transit to prevent unauthorized access.
- Catalog-Level Governance: Apache Polaris, Unity Catalog, and Gravitino provide centralized governance with fine-grained access control, audit logging, and policy enforcement across table formats.
- Partitioning Strategies: Selecting appropriate partition keys to optimize query performance. Considerations for cardinality, data skew, and query patterns.
- Hidden Partitioning: Iceberg and Paimon support partition transforms that eliminate the need for users to be aware of partition layout in queries.
- Liquid Clustering (Delta Lake): Adaptive, mutable clustering that replaces static partitioning with flexible `CLUSTER BY` semantics.
- Incremental Clustering (Paimon): Z-order, Hilbert-curve, and order sorting during compaction for optimized data layout.
- Data Skipping Techniques: Leveraging metadata information to skip irrelevant data files during query execution.
- File Indexes: Bloom filters, bitmap indexes, range bitmap indexes (Paimon), and Puffin statistics files (Iceberg) accelerate scan planning.
- Compaction and Vacuuming: Managing data files and metadata to maintain optimal storage and query performance. Regular compaction to consolidate small files. Vacuuming to remove obsolete data and metadata.
- File Formats: Using Parquet for its columnar storage, compression, and encoding schemes, optimized for analytical queries.
- Deletion Vectors: Avoiding full file rewrites for DELETE/UPDATE operations by using compact bitmap-based soft deletes.
- Manifest Rewriting (Iceberg): Consolidating and reorganizing manifest files to improve query planning time and partition pruning efficiency.
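Data skipping with min/max statistics is the common thread across all of these formats, and the mechanism is small enough to sketch. The stats layout below is hypothetical (each file carrying a single column's min/max); real formats track per-column stats in manifests, checkpoints, or metadata tables.

```python
# Sketch of metadata-based data skipping: each file carries min/max
# column stats, and the planner prunes files whose value range cannot
# match the predicate, before any data is read.

files = [
    {"path": "f1.parquet", "min": 0,   "max": 99},
    {"path": "f2.parquet", "min": 100, "max": 199},
    {"path": "f3.parquet", "min": 200, "max": 299},
]

def prune(files, lo, hi):
    """Keep only files whose [min, max] range overlaps [lo, hi]."""
    return [f["path"] for f in files if f["max"] >= lo and f["min"] <= hi]

# A predicate like `WHERE col BETWEEN 150 AND 250` skips f1 entirely.
assert prune(files, 150, 250) == ["f2.parquet", "f3.parquet"]
```

Clustering techniques (z-order, Hilbert curves, liquid clustering) exist to make these per-file ranges tight and non-overlapping, which is what makes the pruning effective.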
Modern open table formats are increasingly converging on interoperability:
- Delta UniForm: Delta Lake generates Iceberg v2 metadata alongside the Delta log, enabling Iceberg-aware engines to read Delta tables.
- Paimon Iceberg Metadata: Paimon generates Iceberg-compatible metadata, allowing Iceberg-aware engines to read Paimon tables.
- Iceberg REST Catalog Standard: An OpenAPI specification that has become the common catalog interoperability layer. Delta Lake 4.1.0 supports server-side planning via this API.
- Apache XTable (Incubating): A cross-table format conversion tool (formerly OneTable) enabling metadata translation between Iceberg, Delta Lake, and Hudi without data copying.
| Engine | Apache Iceberg | Delta Lake | Apache Hudi | Apache Paimon |
|---|---|---|---|---|
| Apache Spark 3.5 | Full (v1+v2) | Delta 3.x | Full (1.0+) | Full (1.4) |
| Apache Spark 4.0/4.1 | Full (v3 preview) | Delta 4.0/4.1 | Experimental | Experimental |
| Apache Flink 1.18-2.0 | Full | Kernel-based connector | Supported | Deep integration |
| Trino | Full (v2, v3 in progress) | Connector with DV support | Read support | Via Iceberg compat |
| Presto | Full | Connector | Read support | Via Iceberg compat |
| DuckDB | Native | Native `delta_scan()` | - | - |
| StarRocks | Native connector | - | Connector | Native connector |
| Doris | Native connector | - | Connector | Native connector |
| Hive | Supported | - | Supported | Supported |
| pandas | `pd.read_iceberg()` (3.0+) | `deltalake` Python package | - | PyPaimon SDK |
References:
- Apache Iceberg Spec v2 — Row-Level Deletes
- Apache Iceberg Spec v3 — Extended Types and Capabilities
- Apache Iceberg Releases (Latest: 1.10.1)
- Delta Lake Protocol Specification
- Delta Lake Releases (Latest: 4.1.0)
- Apache Hudi Releases (Latest: 1.1.1)
- Apache Paimon Documentation (Latest: 1.4)
- Table Formats TLA+ and Fizzbee
- ACID Transactions: A set of properties that guarantee reliable processing of database transactions (Atomicity, Consistency, Isolation, Durability).
- Base Path: The root directory where data is stored (e.g., in Apache Hudi).
- Branching: Allows users to create lightweight, independent branches of a table without duplicating the underlying data.
- Change Data Capture (CDC): The process of identifying and tracking changes to data in a database or data warehouse.
- Commit Log: A transaction log that records all changes to a table.
- Compaction: The process of merging smaller data files into larger ones to optimize storage and query performance.
- Copy-on-Write (CoW): A table type where updates rewrite entire data files.
- Data Catalog: A metadata management tool that stores table schemas, locations, and other metadata.
- Data Lakehouse: A data management architecture that combines the best elements of data lakes and data warehouses.
- Data Skipping: A technique to reduce I/O by skipping irrelevant data files based on metadata.
- Delta Lake: An open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. Latest: 4.1.0 (Feb 2026).
- Delta Kernel: Java and Rust libraries for building Delta Lake connectors without needing to track protocol version details.
- Deletion Vector: A compact bitmap data structure (RoaringBitmap) used to mark rows as deleted without physically removing them. Supported in Delta Lake, Iceberg v3, and Paimon.
- File Format: The format in which data is stored, such as Parquet, ORC, or Avro.
- Hadoop Catalog: A simple catalog implementation that stores metadata on the Hadoop file system.
- Hidden Partitioning: Partition transforms that decouple physical data layout from SQL queries, eliminating the need for partition-aware WHERE clauses.
- Hive Metastore: A central repository for storing metadata about Hive tables, schemas, and partitions.
- Apache Hudi: An open-source data lake platform that provides support for incremental data processing and data management. Latest: 1.1.1 (Dec 2025).
- Apache Iceberg: An open table format for large analytic datasets. Latest: 1.10.1 (Dec 2025), Spec v2 (stable) / v3 (progressive).
- Iceberg REST Catalog: An OpenAPI-based standard for catalog interoperability that has become the de facto cross-engine catalog protocol.
- Incremental Queries: The ability to query only the changes between snapshot IDs or time ranges.
- Liquid Clustering: Delta Lake's adaptive, mutable data clustering that replaces static Hive-style partitioning with flexible `CLUSTER BY` semantics.
- Merge-on-Read (MoR): A table type where updates are written as delta logs and merged during reads.
- Merge-on-Write (MoW): A table mode (Paimon) where updates use deletion vectors to mark old records and write new data files, balancing write and read performance.
- Apache Paimon: A streaming data lake platform that provides a unified architecture for batch and stream processing, with LSM-tree powered real-time updates. Latest: 1.4.0.
- Partitioning: The division of a table into smaller, more manageable parts based on column values.
- Apache Polaris (Incubating): An open-source catalog service implementing the Iceberg REST Catalog spec, under the Apache Software Foundation.
- Puffin Files: Iceberg statistics files containing bloom filters and column-level NDV (number of distinct values) stats for accelerated scan planning.
- Record-Level Index (RLI): A Hudi index providing O(1) record lookups by mapping each record to its file group in the metadata table.
- Schema Evolution: The ability to modify a table's schema without rewriting the entire table.
- Snapshot: A consistent view of a table at a specific point in time.
- Tagging: Creating named references to specific table snapshots, making it easy to manage versions, collaborate, and maintain data consistency in large-scale environments.
- Time Travel: The ability to query historical versions of a table.
- UniForm (Universal Format): Delta Lake's capability to generate Iceberg v2 metadata alongside the Delta transaction log for cross-format interoperability.
- Apache XTable (Incubating): A cross-table format conversion tool enabling metadata translation between Iceberg, Delta Lake, and Hudi.