Apache Iceberg Snapshots, Time Travel & Rollbacks

How Iceberg gives every table a complete, queryable history.

Introduction: Why Snapshots Matter

One of the most powerful and underappreciated features of Apache Iceberg is its built-in snapshot system. Every time data is written to an Iceberg table — whether an insert, update, or delete — Iceberg creates a new snapshot. This snapshot is a complete, immutable record of exactly which files made up the table at that moment in time.

Because snapshots are additive (new ones are created, old ones preserved), Iceberg tables have a built-in time machine. You can query any table exactly as it appeared at any point in history. You can roll back a bad batch job in seconds. You can run audits against yesterday's data without maintaining separate backup copies.

This isn't a backup strategy bolted on top — it's a fundamental property of how Iceberg stores and tracks data.

Key Insight: In Iceberg, "deleting" data doesn't immediately erase any files. It creates a new snapshot where those files are no longer referenced. Old snapshots — and the files they reference — remain intact until explicitly expired. This is what makes time travel possible.

What is an Iceberg Snapshot?

An Iceberg snapshot is a named, immutable state of a table at a specific point in time. It is represented as a JSON entry inside the Table Metadata file, and contains:

  - snapshot-id — a unique identifier for the snapshot
  - parent-snapshot-id — the snapshot this one builds on
  - timestamp-ms — when the snapshot was committed, in epoch milliseconds
  - summary — the operation performed (append, overwrite, delete) plus counters such as files and records added or removed
  - manifest-list — a pointer to the manifest list file that indexes the data files making up this state
  - schema-id — the table schema in effect at commit time

Because each snapshot records its parent, a table's history forms a chain:

      graph LR
        S1["Snapshot 1
ID: 1001
ts: 2026-01-01
+ 1M rows inserted"] --> S2["Snapshot 2
ID: 1002
ts: 2026-01-15
+ 500K rows inserted"]
        S2 --> S3["Snapshot 3
ID: 1003
ts: 2026-01-30
50K rows deleted"]
        S3 --> S4["Snapshot 4 (CURRENT)
ID: 1004
ts: 2026-02-10
Schema change: new column"]

        style S4 fill:#dcfce7,stroke:#22c55e,stroke-width:2px
        style S1 fill:#f1f5f9,stroke:#94a3b8
        style S2 fill:#f1f5f9,stroke:#94a3b8
        style S3 fill:#f1f5f9,stroke:#94a3b8
      
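Concretely, a snapshot entry inside the metadata file's snapshots array might look like the sketch below. The field names follow the Iceberg table spec; the IDs, counts, and paths are invented for illustration:

```python
import json

# A representative snapshot entry, as it might appear in the "snapshots" array
# of an Iceberg table-metadata file. Field names follow the Iceberg table spec;
# the IDs, counts, and paths here are invented for illustration.
snapshot = {
    "snapshot-id": 1002,
    "parent-snapshot-id": 1001,
    "timestamp-ms": 1768435200000,  # commit time in epoch milliseconds
    "summary": {
        "operation": "append",      # append / overwrite / delete / replace
        "added-data-files": "4",
        "added-records": "500000",
    },
    "manifest-list": "s3://warehouse/sales/metadata/snap-1002.avro",
    "schema-id": 0,
}

print(json.dumps(snapshot, indent=2))
```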

How Time Travel Works Under the Hood

When a query engine executes a time travel query, the mechanism is straightforward because of how the metadata tree is structured:

  1. The engine reads the Table Metadata JSON file (retrieved from the Catalog).
  2. It scans the snapshots array inside the JSON to find the snapshot whose timestamp matches (or is closest before) the requested time.
  3. It follows that snapshot's Manifest List pointer to get the file index for that historical state.
  4. It reads only the data files listed in those manifests — which represent the table exactly as it was at that time.

No data copying, no backup restores, no special infrastructure. It's just metadata pointer navigation.
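The lookup in steps 1–3 can be sketched in a few lines. This is a toy model, not the Iceberg library: the metadata dict stands in for a parsed table-metadata JSON file, and the function resolves a requested timestamp to the snapshot the engine would read:

```python
# Toy model of time-travel snapshot resolution. "metadata" stands in for a
# parsed table-metadata JSON file; real engines do this via the Iceberg library.
metadata = {
    "current-snapshot-id": 1003,
    "snapshots": [
        {"snapshot-id": 1001, "timestamp-ms": 1, "manifest-list": "snap-1001.avro"},
        {"snapshot-id": 1002, "timestamp-ms": 2, "manifest-list": "snap-1002.avro"},
        {"snapshot-id": 1003, "timestamp-ms": 3, "manifest-list": "snap-1003.avro"},
    ],
}

def snapshot_as_of(metadata, ts_ms):
    """Return the snapshot whose commit time is closest at-or-before ts_ms."""
    eligible = [s for s in metadata["snapshots"] if s["timestamp-ms"] <= ts_ms]
    if not eligible:
        raise ValueError("no snapshot exists at or before the requested time")
    return max(eligible, key=lambda s: s["timestamp-ms"])

# A query "AS OF" time 2 resolves to snapshot 1002; the engine then follows
# that snapshot's manifest-list pointer to find the data files to scan.
snap = snapshot_as_of(metadata, 2)
print(snap["snapshot-id"], snap["manifest-list"])  # 1002 snap-1002.avro
```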

Time Travel SQL Syntax

Different query engines expose Iceberg's time travel through slightly different SQL syntax, but the underlying mechanism is identical.

Query by Timestamp

-- Spark SQL: query as of a specific timestamp
SELECT * FROM sales FOR SYSTEM_TIME AS OF '2026-01-15 00:00:00';

-- Trino syntax
SELECT * FROM sales FOR TIMESTAMP AS OF TIMESTAMP '2026-01-15 00:00:00 UTC';

Query by Snapshot ID

-- When you know the exact snapshot ID you want
SELECT * FROM sales FOR SYSTEM_VERSION AS OF 1002;

-- Spark shorthand, with a fully qualified table name
SELECT * FROM spark_catalog.db.sales VERSION AS OF 1002;

List All Snapshots

-- See the full history of a table
SELECT * FROM "my_catalog"."my_db"."sales$snapshots";

-- In Spark
SELECT * FROM spark_catalog.db.sales.snapshots;

Practical Use Cases for Time Travel

1. Auditing and Regulatory Compliance

Financial and healthcare organizations often need to prove exactly what data they held at a specific point in time for regulatory audits. Instead of maintaining expensive, frozen backup copies of data, Iceberg's snapshot system provides instant, queryable access to any historical state of the data.

2. Recovering from a Bad Data Pipeline

An ETL job runs at 2 AM, accidentally populates a column with incorrect values across 50 million rows, and your data engineers don't notice until 9 AM. With Iceberg, the fix takes seconds:

-- Step 1: Identify the snapshot before the bad run
SELECT snapshot_id, committed_at FROM "sales$snapshots" ORDER BY committed_at DESC;

-- Step 2: Roll back (Spark procedure)
CALL spark_catalog.system.rollback_to_snapshot('db.sales', 1001);

-- Step 3: Validate the data is restored, then re-run the corrected pipeline
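Under the hood, the rollback in step 2 is just another metadata operation: the table's current-snapshot pointer moves back to the chosen snapshot, while the bad snapshots stay in history until they are expired. A toy sketch (IDs invented):

```python
# Toy model of rollback_to_snapshot: point the table's current snapshot back at
# an earlier one. No data files move; newer snapshots remain until expired.
table = {
    "current-snapshot-id": 1004,               # the bad 2 AM commit
    "snapshot-ids": [1001, 1002, 1003, 1004],  # full retained history
}

def rollback_to_snapshot(table, snapshot_id):
    if snapshot_id not in table["snapshot-ids"]:
        raise ValueError(f"unknown snapshot {snapshot_id}")
    table["current-snapshot-id"] = snapshot_id  # metadata-only change

rollback_to_snapshot(table, 1001)
print(table["current-snapshot-id"], table["snapshot-ids"])
# 1001 [1001, 1002, 1003, 1004]
```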

3. Reproducing ML Model Training Data

When a machine learning model produces unexpected results, data scientists need to reproduce the exact dataset that was used for training. With Iceberg snapshots, they can query the training table as it existed on the exact date the training run happened — even months later.

4. "What Changed?" Incremental Analysis

Iceberg supports Incremental Reads — reading only the records appended between two snapshots. This is invaluable for pipelines that only want to process new data. Note that incremental reads currently cover append commits only, and engines expose them through read options rather than SQL; in Spark:

# Read only records appended after snapshot 1001, up to and including 1003
# (start-snapshot-id is exclusive, end-snapshot-id is inclusive)
df = spark.read.format("iceberg") \
    .option("start-snapshot-id", 1001) \
    .option("end-snapshot-id", 1003) \
    .load("db.sales")

Snapshot Retention and Expiration

Snapshots come with a tradeoff: because old files are retained until their referencing snapshots are expired, storage costs grow over time if snapshots are never cleaned up. Iceberg provides the expire_snapshots procedure to manage this.

-- Expire snapshots committed before a cutoff timestamp (Spark)
CALL spark_catalog.system.expire_snapshots(
  table => 'db.sales',
  older_than => TIMESTAMP '2026-05-08 00:00:00',
  retain_last => 5  -- always keep at least 5 snapshots regardless of age
);
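The retention rule the procedure applies can be modeled as a simple set operation: a snapshot is expired only if it is older than the cutoff AND not among the N most recent. A toy sketch, assuming snapshots are plain (id, timestamp) records:

```python
# Toy model of expire_snapshots retention logic: drop snapshots older than a
# cutoff, but always keep the retain_last most recent ones regardless of age.
def snapshots_to_expire(snapshots, older_than_ms, retain_last):
    by_age = sorted(snapshots, key=lambda s: s["timestamp-ms"], reverse=True)
    keep = {s["snapshot-id"] for s in by_age[:retain_last]}
    return [
        s for s in snapshots
        if s["timestamp-ms"] < older_than_ms and s["snapshot-id"] not in keep
    ]

# Ten commits at timestamps 100, 200, ..., 1000
history = [{"snapshot-id": i, "timestamp-ms": i * 100} for i in range(1, 11)]

expired = snapshots_to_expire(history, older_than_ms=650, retain_last=5)
print([s["snapshot-id"] for s in expired])  # [1, 2, 3, 4, 5]
# Snapshot 6 is older than the cutoff but survives: it is in the last 5.
```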

By default, expire_snapshots also deletes the data files that are no longer referenced by any surviving snapshot, reclaiming their storage. Separately, files that were never committed to any snapshot at all — for example, data written by a failed or abandoned job — are "orphan files"; the remove_orphan_files procedure scans the table location and deletes them from object storage.
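Conceptually, both cleanups reduce to a set difference between what exists in storage and what the surviving metadata still references. A toy sketch with plain sets (paths invented):

```python
# Toy model of unreferenced-file cleanup: the files safe to delete are those
# present in object storage but referenced by no surviving snapshot's manifests.
files_in_storage = {
    "data/f1.parquet", "data/f2.parquet", "data/f3.parquet", "data/f4.parquet",
}

# Union of data files referenced by every snapshot that survives expiration
referenced = {"data/f2.parquet", "data/f4.parquet"}

deletable = sorted(files_in_storage - referenced)
print(deletable)  # ['data/f1.parquet', 'data/f3.parquet']
```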

Dremio automates this entire lifecycle — snapshot expiration and orphan file cleanup run automatically on a configurable schedule, removing the operational burden from data engineers.

Snapshot Isolation: How Concurrent Readers and Writers Coexist

A critical property of Iceberg snapshots is that they enable Snapshot Isolation — the same isolation level used by production-grade relational databases. This means:

  - Readers always see one consistent snapshot for the entire duration of a query, never a half-written mix of old and new files.
  - Writers never block readers: new data files are written off to the side and become visible only when their snapshot is committed.
  - Commits are atomic: the catalog swaps the table's metadata pointer in a single operation, so a snapshot is either fully visible or not visible at all.

This is fundamentally different from a file system, where there is no concept of "snapshot isolation" — reads and writes on the same directory can interfere with each other at any time.
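The atomic commit that makes this possible can be sketched as a compare-and-swap on the catalog's metadata pointer. This is a toy model, not a real catalog API: a writer's commit succeeds only if the pointer still references the metadata version it started from; otherwise it must refresh and retry:

```python
import threading

# Toy model of an Iceberg catalog commit: a writer atomically swaps the table's
# metadata pointer only if it still points at the version the writer started
# from (compare-and-swap). A losing writer must refresh and retry; readers keep
# using whatever pointer they resolved, so they see one consistent snapshot.
class Catalog:
    def __init__(self, pointer):
        self._pointer = pointer
        self._lock = threading.Lock()

    def current(self):
        return self._pointer

    def commit(self, expected, new):
        """Swap the pointer to `new` only if it is still `expected`."""
        with self._lock:
            if self._pointer != expected:
                return False  # someone committed first; caller must retry
            self._pointer = new
            return True

catalog = Catalog("metadata/v1.json")
base = catalog.current()
print(catalog.commit(base, "metadata/v2.json"))   # True: first writer wins
print(catalog.commit(base, "metadata/v2b.json"))  # False: stale writer retries
```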

Branching and Tagging (Iceberg v2+)

Iceberg's snapshot model was extended in recent versions to support Git-like branches and tags. A branch is a named, mutable reference that advances independently of the main table history — useful for write-audit-publish workflows, where data is committed to a staging branch, validated, and only then merged into main. A tag is a named, immutable reference pinned to a single snapshot, convenient for marking states like "end-of-quarter" for later audits. Table-level branches and tags live in the table metadata itself, while Project Nessie extends the same idea to catalog-level branching across many tables at once.
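The refs model mirrors Git: a small map from names to snapshot IDs, where branch entries may move and tag entries may not. A toy sketch (names and IDs invented):

```python
# Toy model of Iceberg table refs: branches and tags are both named pointers to
# snapshot IDs stored in table metadata; branches may advance, tags may not.
refs = {
    "main":        {"type": "branch", "snapshot-id": 1004},
    "etl_staging": {"type": "branch", "snapshot-id": 1004},
    "q4-close":    {"type": "tag",    "snapshot-id": 1002},  # frozen audit point
}

def advance(refs, name, new_snapshot_id):
    if refs[name]["type"] != "branch":
        raise ValueError(f"{name} is a tag; tags are immutable")
    refs[name]["snapshot-id"] = new_snapshot_id

# The staging branch moves ahead while main is untouched
advance(refs, "etl_staging", 1005)
print(refs["etl_staging"]["snapshot-id"], refs["main"]["snapshot-id"])  # 1005 1004
```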

Conclusion

Apache Iceberg's snapshot system is one of the most practically valuable features in the modern data stack. It transforms every table into a fully versioned, auditable, time-travelable dataset — with zero additional infrastructure beyond what Iceberg already maintains for query performance.

The ability to roll back a bad pipeline in seconds, reproduce exact historical datasets for ML, and give auditors instant access to point-in-time data makes Iceberg's snapshot model an essential capability for any serious data lakehouse deployment.