Open Table Formats Explained

The metadata layer that separates a data swamp from a data lakehouse.

What Is an Open Table Format?

Imagine a massive library with millions of books scattered across thousands of rooms, with no card catalog, no numbering system, and no organization. Finding a specific book requires walking through every room and scanning every shelf. That is essentially what querying a data lake looks like without an open table format.

An Open Table Format is a metadata specification that sits between raw data files (typically Apache Parquet on cloud object storage) and the query engines that process them. It acts as the library's card catalog: an explicit, structured index that tells a query engine exactly which files contain the data it needs, so the engine never has to open a file that doesn't.

Definition: An Open Table Format is an open-source specification for organizing, versioning, and indexing data files in cloud object storage. It provides ACID transactions, schema enforcement, query statistics, and metadata tracking to turn a raw collection of files into a queryable, governed, transactionally consistent database table.

Why Open Table Formats Exist: The Hive Problem

The story of open table formats begins with Apache Hive and its metastore, which was the dominant way to organize big data for over a decade. The Hive model defined a table as a directory on a distributed file system. Partitions were subdirectories. When you queried a table, the engine listed the directory contents to find files.

This "directory-first" approach had five catastrophic failure modes that became increasingly obvious as data lakes scaled to petabytes on cloud object storage:

  1. Slow Directory Listing: Listing millions of files on Amazon S3 is extremely slow — S3 is not a file system. Queries began timing out just during the "planning" phase, before reading a single byte of data.
  2. No ACID Guarantees: If two Spark jobs wrote to the same directory simultaneously, files would be overwritten and corrupted. There was no "safe concurrent write" primitive.
  3. No Safe Deletes or Updates: Deleting a row from a Hive table required rewriting the entire partition — a massive, expensive operation for any table with significant data.
  4. Schema Drift: If a data producer changed a column name, every query against that table broke. There was no mechanism for backward-compatible schema evolution.
  5. No Time Travel: Once data was overwritten, it was gone. You couldn't query the table as it looked yesterday.

Netflix, Databricks, and Uber engineers independently arrived at the same conclusion: the solution requires moving from directory-level tracking to file-level tracking with explicit transactional metadata.

How Open Table Formats Work: The Core Mechanism

All open table formats — regardless of which one you choose — share the same foundational mechanism: they maintain an explicit, atomic log of every data file that belongs to a table.

When a writer wants to add data, it writes new files to object storage, then updates the metadata to atomically "register" those files as part of the table. Readers always see a consistent, committed snapshot — they never see partially-written data. When data is deleted, the files aren't immediately erased; instead, they are "de-registered" from the metadata, making them invisible to readers. Actual file deletion happens later during a garbage-collection sweep.
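The write-then-register flow above can be sketched as a toy commit protocol, with the atomic pointer swap modeled as a compare-and-swap on a version number. All names here (TinyCatalog, commit_append) are illustrative, not part of any real format:

```python
import threading

class TinyCatalog:
    """Maps a table name to a (version, metadata) pair, swapped atomically."""
    def __init__(self):
        self._lock = threading.Lock()
        self._tables = {}  # name -> (version, metadata)

    def current(self, name):
        return self._tables.get(name, (0, {"files": []}))

    def swap(self, name, expected_version, new_metadata):
        """Compare-and-swap: the commit lands only if no one committed first."""
        with self._lock:
            version, _ = self.current(name)
            if version != expected_version:
                return False  # another writer won the race; caller retries
            self._tables[name] = (version + 1, new_metadata)
            return True

def commit_append(catalog, table, new_files):
    while True:
        version, meta = catalog.current(table)
        # Step 1 (writing the data files to object storage) already happened;
        # step 2: build new metadata that registers the files.
        new_meta = {"files": meta["files"] + new_files}
        # Step 3: atomic pointer swap; retry from fresh metadata on conflict.
        if catalog.swap(table, version, new_meta):
            return version + 1

catalog = TinyCatalog()
commit_append(catalog, "sales", ["s3://bucket/data/f1.parquet"])
commit_append(catalog, "sales", ["s3://bucket/data/f2.parquet"])
print(catalog.current("sales"))
```

Until the swap succeeds, readers keep resolving the old version, which is why they never observe partially written data.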

      sequenceDiagram
        participant Writer
        participant Catalog
        participant Metadata
        participant S3

        Writer->>S3: 1. Write new Parquet files
        Note over S3: Files exist but are invisible to readers
        Writer->>Metadata: 2. Create new Manifest / Log entry
        Writer->>Catalog: 3. Atomic swap to new metadata pointer
        Note over Catalog: Atomic! Either succeeds or fails completely
        Catalog-->>Writer: Commit confirmed
        Note over S3,Catalog: All readers now see the new data instantly
      

The Three Major Open Table Formats

Apache Iceberg

Born at Netflix and donated to the Apache Software Foundation, Iceberg is built around a metadata tree. A root JSON file (managed by the Catalog) points to a snapshot, which points to a Manifest List, which points to Manifest Files, which explicitly list every data file along with column-level statistics (min/max values, null counts).
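To make the tree concrete, here is a rough sketch of that hierarchy as plain Python dicts, showing how column-level min/max stats let a planner skip files. The structure is a simplified stand-in for Iceberg's real JSON and Avro files:

```python
table_metadata = {                      # root metadata file (JSON)
    "current_snapshot": {
        "manifest_list": [              # points at manifest files (Avro)
            {
                "manifest": [           # each manifest lists data files + stats
                    {"path": "f1.parquet", "stats": {"amount": (0, 99)}},
                    {"path": "f2.parquet", "stats": {"amount": (100, 500)}},
                ]
            },
            {
                "manifest": [
                    {"path": "f3.parquet", "stats": {"amount": (501, 900)}},
                ]
            },
        ]
    }
}

def plan_scan(metadata, column, lo, hi):
    """Return only the data files whose min/max range overlaps [lo, hi]."""
    files = []
    for entry in metadata["current_snapshot"]["manifest_list"]:
        for f in entry["manifest"]:
            fmin, fmax = f["stats"][column]
            if fmax >= lo and fmin <= hi:   # ranges overlap => must read
                files.append(f["path"])
    return files

print(plan_scan(table_metadata, "amount", 200, 600))
```

A query filtering on `amount BETWEEN 200 AND 600` never touches f1.parquet, because the manifest already proves it cannot contain matching rows.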

Iceberg's defining architectural choices are:

  - ID-based schema evolution: columns are tracked by ID rather than name, so renames never break existing queries.
  - Hidden partitioning and partition evolution: the partition layout can change without rewriting the table.
  - Snapshot-based time travel: every commit produces an immutable, queryable snapshot.
  - Engine neutrality: the metadata is an open spec that Spark, Flink, Trino, and Dremio all read the same way.

Delta Lake

Born at Databricks, Delta Lake uses a transaction log stored in a _delta_log/ directory alongside the data. Every transaction appends a new JSON file to this log. Periodically, Delta computes a "checkpoint" (a Parquet file summarizing the entire history) to prevent the log from becoming too long.
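The log-replay idea can be sketched in a few lines: each committed version appends "add"/"remove" actions, and replaying them in order yields the current set of live files. This is heavily simplified; real Delta actions carry many more fields:

```python
import json

# Toy Delta-style transaction log: one JSON action per line.
log = [
    '{"add": {"path": "part-0000.parquet"}}',
    '{"add": {"path": "part-0001.parquet"}}',
    '{"remove": {"path": "part-0000.parquet"}}',  # de-registered, not erased
    '{"add": {"path": "part-0002.parquet"}}',
]

def replay(entries):
    """Reconstruct the set of live data files by replaying log actions in order."""
    live = set()
    for entry in entries:
        action = json.loads(entry)
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])
    return live

print(sorted(replay(log)))
```

Replaying only a prefix of the log (`replay(log[:n])`) is exactly how time travel to version n works, and a checkpoint is just this reconstructed state persisted so readers can skip replaying old entries.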

Delta's architectural choices:

  - A sequential transaction log: each commit appends an ordered JSON file to _delta_log/.
  - Parquet checkpoints that summarize the log so readers don't replay every commit.
  - Name-based schema evolution, strengthened by column mapping.
  - Deletion vectors for row-level deletes without rewriting data files.
  - The deepest integration with Apache Spark and the Databricks platform.

Apache Hudi

Born at Uber, Hudi uses a timeline stored in a .hoodie/ directory to track all actions on the table. Hudi's architecture is explicitly designed around primary keys and upsert workloads. It maintains a built-in index (using Bloom filters or HBase) so it can efficiently locate which file contains a specific record ID during an update.
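This sketch shows why a key index makes upserts cheap: instead of scanning every file for a record, the writer asks the index which file holds the key and touches only that file. The structures here are hypothetical stand-ins; real Hudi backs this with Bloom filters or HBase:

```python
# Data files modeled as dicts of record-key -> record.
files = {
    "file-A.parquet": {"user-1": {"city": "Lisbon"}, "user-2": {"city": "Oslo"}},
    "file-B.parquet": {"user-3": {"city": "Tokyo"}},
}
# The index maps each record key to the file that contains it.
key_index = {"user-1": "file-A.parquet", "user-2": "file-A.parquet",
             "user-3": "file-B.parquet"}

def upsert(key, record):
    target = key_index.get(key)
    if target is None:                 # insert: route the new key to a file
        target = "file-B.parquet"
        key_index[key] = target
    # Update only the one affected file. (Copy-on-Write would rewrite this
    # file; Merge-on-Read would log a delta to merge at query time.)
    files[target][key] = record

upsert("user-3", {"city": "Kyoto"})    # update: index points straight at file-B
upsert("user-4", {"city": "Berlin"})   # insert: new key registered in the index
print(files["file-B.parquet"])
```

Without the index, every upsert would have to probe every file to find where the key lives, which is exactly the cost Hudi's design avoids.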

Hudi's architectural choices:

  - A primary-key-first design: every record has a key, making upserts and deletes first-class operations.
  - Two table types: Copy-on-Write (rewrite files on update, fast reads) and Merge-on-Read (write deltas, merge at read time, fast writes).
  - A built-in record index (Bloom filters or HBase) to locate the file holding a given key.
  - A timeline of instants in the .hoodie/ directory recording every action on the table.

Feature Comparison Matrix

| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
| --- | --- | --- | --- |
| Origin | Netflix / Apache | Databricks / Linux Foundation | Uber / Apache |
| Metadata Model | Hierarchical tree (JSON + Avro) | Sequential log (JSON + Parquet) | Timeline (.hoodie dir) |
| Schema Evolution | Best-in-class (ID-based) | Strong (name-based + column mapping) | Good (Avro-based) |
| Partition Evolution | Yes (hidden partitioning) | No (requires full rewrite) | No (requires full rewrite) |
| Time Travel | Yes (snapshots) | Yes (log version history) | Yes (timeline) |
| Row-Level Deletes | Yes (position + equality deletes) | Yes (deletion vectors) | Yes (MoR) |
| Best Engine | Any (Dremio, Spark, Flink, Trino) | Apache Spark / Databricks | Apache Spark / Flink |
| Upsert Performance | Good | Good | Excellent (primary-key index) |
| Vendor Neutrality | Excellent | Good (open-source core) | Good |

Open Table Formats vs. File Formats

A common source of confusion is conflating "table formats" with "file formats." They are completely different layers of the stack:

  - File formats (Apache Parquet, ORC, Avro) define how rows and columns are encoded and compressed inside a single file.
  - Table formats (Iceberg, Delta Lake, Hudi) define how a collection of those files behaves as one transactional table.

An Apache Iceberg table typically stores its actual data as Parquet files. The Iceberg metadata layer (JSON + Avro manifests) is layered on top, tracking which Parquet files belong to the table.

The Role of Catalogs

Open table formats describe how metadata is structured, but they don't tell you how to find that metadata for a given table name. That's the Catalog's job. A catalog maps table names (like my_catalog.my_database.sales) to the root metadata file location on object storage.
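At its core, this resolution step is just a lookup from a qualified name to a storage path. A minimal sketch, with the table name and metadata path invented for illustration:

```python
# Toy catalog: maps fully-qualified table names to the location of the
# table's current root metadata file on object storage.
catalog = {
    "my_catalog.my_database.sales":
        "s3://warehouse/my_database/sales/metadata/v42.metadata.json",
}

def resolve_table(qualified_name):
    """Return the root metadata location an engine would fetch and parse."""
    return catalog[qualified_name]

print(resolve_table("my_catalog.my_database.sales"))
```

Because commits work by swapping this pointer to a new metadata file, the catalog is also the single place where atomicity is enforced across engines.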

The Iceberg REST Catalog specification provides a standard HTTP API for catalog operations, enabling true multi-engine interoperability. Implementations like Apache Polaris and Project Nessie fulfill this spec.

Choosing the Right Format for Your Workload

  - Choose Apache Iceberg when you need multi-engine access (Spark, Flink, Trino, Dremio), vendor neutrality, or the ability to evolve partitioning without rewriting the table.
  - Choose Delta Lake when your stack is centered on Apache Spark or Databricks and you want the tightest integration there.
  - Choose Apache Hudi when your workload is dominated by streaming upserts and record-level updates, such as CDC ingestion.

Conclusion

Open Table Formats are the foundational innovation that made the Data Lakehouse possible. By replacing Hive's chaotic directory-listing model with a precise, hierarchical, transactional metadata tree, they deliver ACID guarantees, schema evolution, time travel, and blazing query performance — all directly on cheap cloud object storage.

For most modern data teams starting fresh in 2026, Apache Iceberg is the pragmatic default. Its open specification, multi-engine support, and architectural elegance have earned it dominant adoption across the cloud ecosystem.