Introduction: Why Snapshots Matter
One of the most powerful and underappreciated features of Apache Iceberg is its built-in snapshot system. Every time data is written to an Iceberg table — whether an insert, update, or delete — Iceberg creates a new snapshot. This snapshot is a complete, immutable record of exactly which files made up the table at that moment in time.
Because snapshots are additive (new ones are created, old ones preserved), Iceberg tables have a built-in time machine. You can query any table exactly as it appeared at any point in history. You can roll back a bad batch job in seconds. You can run audits against yesterday's data without maintaining separate backup copies.
This isn't a backup strategy bolted on top — it's a fundamental property of how Iceberg stores and tracks data.
What is an Iceberg Snapshot?
An Iceberg snapshot is an immutable record of a table's state at a specific point in time. It is represented as a JSON entry inside the Table Metadata file and contains:
- Snapshot ID: A unique long integer (e.g., 8744736658020583554) identifying this specific version.
- Timestamp (ms): The Unix epoch millisecond at which the snapshot was committed.
- Manifest List location: A pointer to the Avro Manifest List file that indexes all the data files belonging to this snapshot.
- Summary: Statistics about the operation — how many records were added, deleted, or how many files changed.
- Parent Snapshot ID: The ID of the previous snapshot, forming a linked chain of history.
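Concretely, a snapshot entry in the metadata JSON looks like the sketch below. The field names follow the Iceberg table spec; the values and storage path are illustrative:

{
  "snapshot-id": 8744736658020583554,
  "parent-snapshot-id": 8026599693893684032,
  "timestamp-ms": 1768435200000,
  "summary": {
    "operation": "append",
    "added-data-files": "42",
    "added-records": "500000"
  },
  "manifest-list": "s3://warehouse/db/sales/metadata/snap-8744736658020583554.avro",
  "schema-id": 0
}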
graph LR
    S1["Snapshot 1<br/>ID: 1001<br/>ts: 2026-01-01<br/>+ 1M rows inserted"] --> S2["Snapshot 2<br/>ID: 1002<br/>ts: 2026-01-15<br/>+ 500K rows inserted"]
    S2 --> S3["Snapshot 3<br/>ID: 1003<br/>ts: 2026-01-30<br/>50K rows deleted"]
    S3 --> S4["Snapshot 4 (CURRENT)<br/>ID: 1004<br/>ts: 2026-02-10<br/>Schema change: new column"]
    style S4 fill:#dcfce7,stroke:#22c55e,stroke-width:2px
    style S1 fill:#f1f5f9,stroke:#94a3b8
    style S2 fill:#f1f5f9,stroke:#94a3b8
    style S3 fill:#f1f5f9,stroke:#94a3b8
How Time Travel Works Under the Hood
When a query engine executes a time travel query, the mechanism is straightforward because of how the metadata tree is structured:
- The engine reads the Table Metadata JSON file (retrieved from the Catalog).
- It scans the snapshots array inside the JSON to find the snapshot whose timestamp matches (or is closest before) the requested time.
- It follows that snapshot's Manifest List pointer to get the file index for that historical state.
- It reads only the data files listed in those manifests, which represent the table exactly as it was at that time.
No data copying, no backup restores, no special infrastructure. It's just metadata pointer navigation.
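In fact, the timestamp lookup the engine performs can be expressed directly against the snapshots metadata table. A minimal sketch in Spark SQL, reusing the db.sales example table from the sections below:

-- Find the latest snapshot committed at or before the requested time
SELECT snapshot_id, committed_at, manifest_list
FROM spark_catalog.db.sales.snapshots
WHERE committed_at <= TIMESTAMP '2026-01-15 00:00:00'
ORDER BY committed_at DESC
LIMIT 1;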
Time Travel SQL Syntax
Different query engines expose Iceberg's time travel through slightly different SQL syntax, but the underlying mechanism is identical.
Query by Timestamp
-- Dremio / Spark SQL: query as of a specific timestamp
SELECT * FROM sales FOR SYSTEM_TIME AS OF '2026-01-15 00:00:00';

-- Trino syntax
SELECT * FROM sales FOR TIMESTAMP AS OF TIMESTAMP '2026-01-15 00:00:00 UTC';
Query by Snapshot ID
-- When you know the exact snapshot ID you want
SELECT * FROM sales FOR SYSTEM_VERSION AS OF 1002;

-- Spark SQL syntax
SELECT * FROM spark_catalog.db.sales VERSION AS OF 1002;
List All Snapshots
-- See the full history of a table (Dremio)
SELECT * FROM "my_catalog"."my_db"."sales$snapshots";

-- In Spark
SELECT * FROM spark_catalog.db.sales.snapshots;
Practical Use Cases for Time Travel
1. Auditing and Regulatory Compliance
Financial and healthcare organizations often need to prove exactly what data they held at a specific point in time for regulatory audits. Instead of maintaining expensive, frozen backup copies of data, Iceberg's snapshot system provides instant, queryable access to any historical state of the data.
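For example, an auditor asking what account balances you held at the end of Q1 becomes a single point-in-time query. The accounts table and its columns here are hypothetical:

-- Point-in-time audit query (Dremio / Spark SQL syntax)
SELECT customer_id, balance
FROM accounts FOR SYSTEM_TIME AS OF '2026-03-31 23:59:59';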
2. Recovering from a Bad Data Pipeline
An ETL job runs at 2 AM, accidentally populates a column with incorrect values across 50 million rows, and your data engineers don't notice until 9 AM. With Iceberg, the fix takes seconds:
-- Step 1: Identify the snapshot before the bad run
SELECT snapshot_id, committed_at FROM "sales$snapshots" ORDER BY committed_at DESC;
-- Step 2: Roll back (Spark procedure)
CALL spark_catalog.system.rollback_to_snapshot('db.sales', 1001);
-- Step 3: Validate the data is restored, then re-run the corrected pipeline
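Step 3 can be as simple as a sanity query against whatever the bad job corrupted. A sketch, assuming the 2 AM run wrote negative values into a hypothetical discount_pct column:

-- Should return 0 after the rollback
SELECT count(*) AS bad_rows
FROM sales
WHERE discount_pct < 0;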
3. Reproducing ML Model Training Data
When a machine learning model produces unexpected results, data scientists need to reproduce the exact dataset that was used for training. With Iceberg snapshots, they can query the training table as it existed on the exact date the training run happened — even months later.
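If the training run happened at 2 AM on 2026-01-15, reproducing its input is one query. The training_features table is hypothetical:

-- Rebuild the exact training dataset, even months later
SELECT *
FROM training_features FOR SYSTEM_TIME AS OF '2026-01-15 02:00:00';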
4. "What Changed?" Incremental Analysis
Iceberg supports Incremental Reads: reading only the records added between two snapshots. This is invaluable for streaming pipelines that only want to process new data, though the exact SQL syntax varies by engine.
-- Read only records added between snapshot 1001 and 1003
SELECT * FROM sales
START VERSION AS OF 1001
END VERSION AS OF 1003;
Snapshot Retention and Expiration
Snapshots come with a tradeoff: because old files are retained until their referencing snapshots are expired, storage costs grow over time if snapshots are never cleaned up. Iceberg provides the expire_snapshots procedure to manage this.
-- Expire snapshots older than 7 days (Spark)
CALL spark_catalog.system.expire_snapshots(
  table => 'db.sales',
  older_than => TIMESTAMP '2026-05-08 00:00:00',
  retain_last => 5  -- always keep at least 5 snapshots regardless of age
);
Expiring snapshots also deletes the data files that only those expired snapshots referenced. Separately, files left behind by failed or uncommitted writes ("orphan files") are never referenced by any snapshot at all; the remove_orphan_files procedure scans for them and deletes them from object storage to reclaim that space.
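A minimal sketch of that cleanup in Spark (the cutoff timestamp is illustrative; files newer than it are left alone as a safety margin):

-- Delete files not referenced by any table metadata (Spark)
CALL spark_catalog.system.remove_orphan_files(
  table => 'db.sales',
  older_than => TIMESTAMP '2026-05-08 00:00:00'
);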
Dremio automates this entire lifecycle — snapshot expiration and orphan file cleanup run automatically on a configurable schedule, removing the operational burden from data engineers.
Snapshot Isolation: How Concurrent Readers and Writers Coexist
A critical property of Iceberg snapshots is that they enable Snapshot Isolation — the same isolation level used by production-grade relational databases. This means:
- A reader that begins scanning a table always reads from a single consistent snapshot. Even if a writer commits new data mid-query, the reader sees the old snapshot and is never surprised by partially-written data.
- Multiple writers can work on different parts of the table simultaneously. Each writer commits a new snapshot atomically. If two writers try to commit conflicting changes (e.g., both updating the same set of rows), Iceberg's Optimistic Concurrency Control detects the conflict and forces one writer to retry.
This is fundamentally different from a file system, where there is no concept of "snapshot isolation" — reads and writes on the same directory can interfere with each other at any time.
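One practical pattern this enables: pin a multi-statement report to a single snapshot so that every query in the report sees the same state, even if writers commit in between. A sketch in Spark SQL:

-- 1. Capture the table's current snapshot ID
SELECT snapshot_id
FROM spark_catalog.db.sales.snapshots
ORDER BY committed_at DESC
LIMIT 1;

-- 2. Suppose it returned 1004: run every report query pinned to it
SELECT count(*) FROM spark_catalog.db.sales VERSION AS OF 1004;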
Branching and Tagging (Iceberg v2+)
Iceberg's snapshot model was extended in recent versions to support Git-like branches and tags at the table level; Project Nessie extends the same idea to catalog-level versioning across many tables:
- Tags are named pointers to a specific snapshot, equivalent to a Git tag. Use them to mark important milestones: end-of-q1-2026, pre-migration-backup.
- Branches are named, mutable pointers to a snapshot that can receive new commits. A data engineer can create a dev branch, run experimental transformations against it, validate the results, and then merge the branch to main, just like code review (see the sketch below).
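Iceberg's Spark SQL extensions expose both through DDL. A minimal sketch against the db.sales example (the snapshot ID and branch name are illustrative):

-- Mark a milestone snapshot with a tag
ALTER TABLE spark_catalog.db.sales CREATE TAG `end-of-q1-2026` AS OF VERSION 1004;

-- Create an experimental branch and query it
ALTER TABLE spark_catalog.db.sales CREATE BRANCH dev;
SELECT * FROM spark_catalog.db.sales VERSION AS OF 'dev';

-- Once validated, fast-forward main to the dev branch's head
CALL spark_catalog.system.fast_forward('db.sales', 'main', 'dev');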
Conclusion
Apache Iceberg's snapshot system is one of the most practically valuable features in the modern data stack. It transforms every table into a fully versioned, auditable, time-travelable dataset — with zero additional infrastructure beyond what Iceberg already maintains for query performance.
The ability to roll back a bad pipeline in seconds, reproduce exact historical datasets for ML, and give auditors instant access to point-in-time data makes Iceberg's snapshot model an essential capability for any serious data lakehouse deployment.