How does Nessie branching work?

In Nessie, a branch is a named pointer to a specific commit in the catalog's transaction history. Each commit records an atomic change to the catalog state, such as creating a table, changing a table's schema, or registering a new table snapshot after data is written. A branch can be created from any commit, which allows isolated experimentation. When the work on a branch is ready, it can be merged back into main, which applies all of its commits atomically.
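The pointer model can be made concrete with a minimal in-memory sketch in Python. It is a hypothetical toy, not Nessie's implementation or client API. `ToyCatalog`, `Commit`, and the metadata paths are invented for illustration. Each commit stores the full table-to-metadata mapping plus a link to its parent, and a branch is nothing more than a name pointing at a commit. The merge shown covers only the simple fast-forward case, where main has not moved since the branch was created.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """One catalog commit (hypothetical toy type, not Nessie's)."""
    parent: Optional["Commit"]
    tables: Dict[str, str]  # full catalog state: table name -> metadata pointer
    message: str


class ToyCatalog:
    """In-memory sketch: branches are just names pointing at commits."""

    def __init__(self) -> None:
        self.branches: Dict[str, Commit] = {"main": Commit(None, {}, "init")}

    def commit(self, branch: str, changes: Dict[str, str], message: str) -> Commit:
        head = self.branches[branch]
        new = Commit(head, {**head.tables, **changes}, message)
        self.branches[branch] = new  # only this branch's pointer moves
        return new

    def create_branch(self, name: str, start: Commit) -> None:
        # A new branch may start from any existing commit, not only main's head.
        self.branches[name] = start

    def tables(self, branch: str) -> Dict[str, str]:
        return dict(self.branches[branch].tables)

    def fast_forward(self, source: str, target: str) -> None:
        # Allowed only if the target's head is an ancestor of the source's head.
        node: Optional[Commit] = self.branches[source]
        while node is not None and node is not self.branches[target]:
            node = node.parent
        if node is None:
            raise ValueError(f"{target} has diverged from {source}")
        # One pointer update publishes every commit from the branch at once.
        self.branches[target] = self.branches[source]


catalog = ToyCatalog()
catalog.commit("main", {"sales": "sales/v1.metadata.json"}, "create sales")
catalog.create_branch("etl_dev", catalog.branches["main"])
catalog.commit("etl_dev", {"sales": "sales/v2.metadata.json"}, "rewrite sales")
catalog.commit("etl_dev", {"customers": "customers/v1.metadata.json"}, "add customers")

print("main before merge:", catalog.tables("main"))  # still only sales v1
catalog.fast_forward("etl_dev", "main")
print("main after merge: ", catalog.tables("main"))
```

Because the branch head already contains every change, a fast-forward merge is a single pointer assignment, which is why all of the branch's commits become visible on main together. Merges where main has also moved on require conflict checks, which this toy omits.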

What is Nessie's relationship to Dremio?

Project Nessie was created and is actively developed by Dremio. It powers Dremio Open Catalog as the underlying catalog engine. Nessie is also available as a standalone open-source catalog that any Iceberg-compatible engine can connect to via the Iceberg REST Catalog API.

Project Nessie: The Definitive Guide

Q: What is Project Nessie?

Project Nessie is an open-source catalog for Apache Iceberg (and Delta Lake) that implements Git-like version control semantics for table metadata. It supports creating branches, committing changes, merging branches, and tagging specific states — enabling isolated data experimentation, reproducible ML datasets, and atomic cross-table commits.

What Is Project Nessie?

Project Nessie is an open-source, transactional catalog for Apache Iceberg tables that adds Git-like version control semantics to catalog metadata management. Developed by Dremio and available as an open-source standalone service, Nessie tracks catalog changes (table creates, schema modifications, snapshot updates) as commits on a transaction log, and allows those commits to be organized into branches and tags — exactly like Git organizes code commits.

Nessie implements the Iceberg REST Catalog specification, meaning any Iceberg-compatible engine can connect to it as a drop-in catalog. The branching and tagging features are expressed through Nessie-specific extensions to the REST API or through the Nessie CLI/UI — they are optional capabilities layered on top of the standard catalog operations.

Git-Like Branching for Data

Nessie's branching model mirrors Git's design:

Main Branch

The production state of the catalog. Tables on main are the authoritative, production-ready versions. Production engines typically read from and write to main, which is Nessie's default branch.

Feature Branches

Data engineers create branches for experimental transformations, schema changes, or new table additions. All changes on a feature branch are isolated — they do not affect main or any other branch. Engines connected to the feature branch see only that branch's catalog state.

Merge

When a feature branch is validated (data quality checks pass, queries produce correct results), it is merged into main, atomically applying all catalog changes from the branch to production. If the same table was changed both on the branch and on main since the branch was created, Nessie reports a merge conflict instead of silently overwriting one side's changes.
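Conflict detection of this kind can be illustrated with a three-way comparison against the commit where the branch split off. The sketch below is a simplified, hypothetical model in Python, not Nessie's actual merge algorithm, whose rules are richer. Catalog states are plain dictionaries from table name to a metadata pointer (invented values), and any table changed on both sides since the common ancestor is treated as a conflict.

```python
from typing import Dict, Set

State = Dict[str, str]  # hypothetical: table name -> metadata pointer


def changed_tables(base: State, head: State) -> Set[str]:
    """Tables created, updated, or dropped between base and head."""
    return {t for t in set(base) | set(head) if base.get(t) != head.get(t)}


def three_way_merge(base: State, target: State, source: State) -> State:
    """Apply source's changes onto target, refusing if both touched a table."""
    source_changes = changed_tables(base, source)
    conflicts = source_changes & changed_tables(base, target)
    if conflicts:
        raise ValueError(f"merge conflict on: {sorted(conflicts)}")
    merged = dict(target)
    for table in source_changes:
        if table in source:
            merged[table] = source[table]  # created or updated on the branch
        else:
            merged.pop(table, None)  # dropped on the branch
    return merged


base = {"orders": "orders/v1", "customers": "customers/v1"}
main_head = {"orders": "orders/v2", "customers": "customers/v1"}  # main changed orders
branch_head = {"orders": "orders/v1", "customers": "customers/v2"}  # branch changed customers
clean_result = three_way_merge(base, main_head, branch_head)
print(clean_result)

conflict_message = None
try:
    # This branch also rewrote orders, which main changed independently.
    three_way_merge(base, main_head, {"orders": "orders/v3", "customers": "customers/v1"})
except ValueError as err:
    conflict_message = str(err)
print(conflict_message)
```

Treating any table touched on both sides as a conflict is deliberately conservative: it surfaces the collision to a person instead of letting one branch silently overwrite the other.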

Figure 1: Nessie branching, showing isolated feature branches, an atomic merge to main, and tagged releases.

Nessie Tags for ML Reproducibility

One of Nessie's most valuable capabilities for ML teams is catalog tagging. A Nessie tag is a named pointer to a specific catalog commit, like a Git release tag. Unlike a branch, a tag does not move when new commits are made. When an ML model is trained on a specific version of training data, a tag can record the exact catalog state at training time.

Three months later, when the model needs to be retrained or debugged, the team reads the catalog at the tagged commit and sees exactly the same tables, schemas, and data as at training time, regardless of how many changes have been made since. Because the tag points at existing table metadata rather than at copies, this makes model training reproducible over time without duplicating data, as long as the files the tagged commit references are not removed by snapshot expiration or garbage collection.
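The difference between a branch and a tag can be sketched with a small Python model. Everything here (the `ToyRefStore` class, the tag name `model-v1-training`, the metadata strings) is hypothetical. It only illustrates that a commit advances the branch pointer while a tag keeps pointing at the commit it was created on. The stored values stand in for Iceberg metadata pointers, so reading an old tag returns old pointers rather than copies of data.

```python
from typing import Dict, List


class ToyRefStore:
    """Hypothetical in-memory model of branches and tags (not Nessie's API)."""

    def __init__(self) -> None:
        self.commits: List[Dict[str, str]] = [{}]  # commit id = list index
        self.branches: Dict[str, int] = {"main": 0}
        self.tags: Dict[str, int] = {}

    def commit(self, branch: str, changes: Dict[str, str]) -> int:
        state = {**self.commits[self.branches[branch]], **changes}
        self.commits.append(state)
        self.branches[branch] = len(self.commits) - 1  # branches advance
        return self.branches[branch]

    def create_tag(self, tag: str, branch: str) -> None:
        if tag in self.tags:
            raise ValueError(f"tag {tag!r} already exists")
        self.tags[tag] = self.branches[branch]  # tags stay put on later commits

    def read(self, ref: str) -> Dict[str, str]:
        commit_id = self.tags[ref] if ref in self.tags else self.branches[ref]
        return dict(self.commits[commit_id])


store = ToyRefStore()
store.commit("main", {"features": "features/v1", "labels": "labels/v1"})
store.create_tag("model-v1-training", "main")  # record the state used for training

# Later changes land on main; the tag is unaffected.
store.commit("main", {"features": "features/v2"})
store.commit("main", {"labels": "labels/v2"})

training_view = store.read("model-v1-training")
current_view = store.read("main")
print(training_view)
print(current_view)
```

The toy refuses to overwrite an existing tag name, which mirrors the convention of treating tags as fixed reference points for training runs and audits.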

Tags also enable point-in-time compliance snapshots: a tag created at quarter-end records the exact data state used for financial reporting, giving auditors a fixed, named reference point without any data copying.

Nessie and Dremio Open Catalog

Project Nessie is the catalog engine behind Dremio Open Catalog. When using Dremio Cloud or Dremio's self-hosted platform, the Open Catalog UI and API expose Nessie's branching and tagging capabilities through an enterprise-grade interface — including branch creation and management, tag creation, branch comparison (showing which tables differ between branches), and merge workflows with conflict detection.
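Conceptually, a branch comparison reduces to diffing the table-to-metadata mappings of two references. The Python function below is a hypothetical illustration of that idea on plain dictionaries with invented values. It is not how Dremio's UI or Nessie's API computes or returns differences.

```python
from typing import Dict, List, Tuple


def compare_refs(from_state: Dict[str, str], to_state: Dict[str, str]) -> List[Tuple[str, str]]:
    """List tables that differ between two catalog states, sorted by name."""
    differences = []
    for table in sorted(set(from_state) | set(to_state)):
        before, after = from_state.get(table), to_state.get(table)
        if before == after:
            continue  # same metadata pointer on both references
        if before is None:
            differences.append((table, "added"))
        elif after is None:
            differences.append((table, "removed"))
        else:
            differences.append((table, "changed"))
    return differences


main_state = {"orders": "orders/v2", "customers": "customers/v1", "legacy": "legacy/v1"}
dev_state = {"orders": "orders/v3", "customers": "customers/v1", "returns": "returns/v1"}
for table, status in compare_refs(main_state, dev_state):
    print(f"{table}: {status}")
```

Comparing metadata pointers rather than table contents keeps the check cheap: an unchanged table resolves to the same pointer on both references, so only tables that were actually modified show up.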

Organizations using Dremio Cloud get Nessie's capabilities in a fully managed, enterprise-supported form. Organizations that prefer a self-hosted, open-source catalog can deploy Nessie independently and connect any Iceberg engine to it.

Figure 2: Nessie powers Dremio Open Catalog, with branching and tagging accessible through an enterprise UI.

Summary

Project Nessie brings version control, one of the most important practices in software development, to lakehouse data management. Its Git-like branching and tagging model enables isolated data experimentation, atomic cross-table commits, reproducible ML datasets, and point-in-time audit snapshots, all on top of open Apache Iceberg tables without data duplication. As the engine behind Dremio Open Catalog, Nessie's capabilities are available in both open-source and enterprise-managed forms.