What Is Project Nessie?
Project Nessie is an open-source, transactional catalog for Apache Iceberg tables that adds Git-like version control to catalog metadata. Developed by Dremio and deployable as a standalone open-source service, Nessie records catalog changes (table creation, schema changes, snapshot updates) as commits in a commit log and lets you organize those commits into branches and tags, much as Git organizes code commits.
Nessie implements the Iceberg REST Catalog specification, so any engine that supports Iceberg's REST catalog can connect to it as a drop-in catalog. Branching and tagging are exposed through Nessie-specific extensions and APIs or through the Nessie CLI and UI. They are optional capabilities layered on top of the standard catalog operations.
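To make the commit-log idea concrete, here is a minimal, hypothetical Python model. As a simplification, it assumes each commit stores the full map from table name to the location of that table's current Iceberg metadata file. Nessie itself records per-commit operations and uses its own ID scheme, so `make_commit`, `log`, and the `s3://lake/...` paths are illustrative only. What the sketch shows is the core behavior: every schema change or snapshot update becomes a new commit pointing at new metadata, while older commits stay intact.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass(frozen=True)
class Commit:
    commit_hash: str
    parent: Optional["Commit"]
    message: str
    # Table name (e.g. "sales.orders") -> location of its current Iceberg metadata file.
    tables: Dict[str, str]


def make_commit(parent: Optional[Commit], message: str,
                changes: Dict[str, Optional[str]]) -> Commit:
    """Return a new commit on top of `parent`; earlier commits are never modified."""
    tables = dict(parent.tables) if parent else {}
    for key, location in changes.items():
        if location is None:
            tables.pop(key, None)   # table dropped
        else:
            tables[key] = location  # table created, or pointed at newer metadata
    # Derive the ID from the parent's ID plus this commit's content,
    # so a commit ID identifies the whole history leading up to it.
    payload = json.dumps(
        {"parent": parent.commit_hash if parent else None,
         "message": message, "tables": tables},
        sort_keys=True,
    )
    commit_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return Commit(commit_hash, parent, message, tables)


def log(head: Optional[Commit]) -> Iterator[Commit]:
    """Walk from the newest commit back to the first, like `git log`."""
    while head is not None:
        yield head
        head = head.parent


# Hypothetical table name and metadata paths, for illustration only.
c1 = make_commit(None, "create sales.orders", {"sales.orders": "s3://lake/orders/v1.metadata.json"})
c2 = make_commit(c1, "add column discount", {"sales.orders": "s3://lake/orders/v2.metadata.json"})
c3 = make_commit(c2, "drop sales.orders", {"sales.orders": None})

for c in log(c3):
    print(c.commit_hash, c.message, c.tables)
```

Because each commit keeps a reference to its parent, reading the catalog "as of" any earlier commit is just a matter of starting from that commit instead of the newest one.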
Git-Like Branching for Data
Nessie's branching model mirrors Git's design:
Main Branch
The production state of the catalog. Tables on main are the authoritative, production-ready versions. Clients that do not specify a branch or tag use the server's default branch, which is main unless the deployment is configured otherwise.
Feature Branches
Data engineers create branches for experimental transformations, schema changes, or new tables. Creating a branch copies no data: the new branch starts out pointing at the same commit as the branch it was created from. Changes committed to a feature branch are isolated and do not affect main or any other branch, and engines connected to the feature branch see only that branch's catalog state.
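The isolation described above follows from treating a branch as nothing more than a named pointer to a commit. The following is a hypothetical in-memory sketch of that idea, not Nessie's API: `ToyCatalog`, its methods, and the table and file names are invented for illustration, and commits are numbered with a simple counter rather than hashed.

```python
import itertools
from typing import Dict, Optional, Tuple


class ToyCatalog:
    """Hypothetical in-memory model: a branch is just a named pointer to a commit."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        root = next(self._ids)
        # commit id -> (parent commit id, table name -> metadata location)
        self.commits: Dict[int, Tuple[Optional[int], Dict[str, str]]] = {root: (None, {})}
        self.branches: Dict[str, int] = {"main": root}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        # A new pointer to the same commit: no tables or data files are copied.
        self.branches[name] = self.branches[from_branch]

    def tables(self, branch: str) -> Dict[str, str]:
        """Catalog state as seen by an engine connected to `branch`."""
        return dict(self.commits[self.branches[branch]][1])

    def commit(self, branch: str, changes: Dict[str, str]) -> int:
        head = self.branches[branch]
        tables = {**self.commits[head][1], **changes}
        new_id = next(self._ids)
        self.commits[new_id] = (head, tables)
        self.branches[branch] = new_id  # only this branch moves; others are unaffected
        return new_id


cat = ToyCatalog()
cat.commit("main", {"sales.orders": "orders/v1.metadata.json"})
cat.create_branch("etl_experiment")
cat.commit("etl_experiment", {
    "sales.orders": "orders/v2.metadata.json",             # schema change
    "sales.orders_enriched": "enriched/v1.metadata.json",  # new table
})
print(cat.tables("main"))  # {'sales.orders': 'orders/v1.metadata.json'}
print(cat.tables("etl_experiment"))
```

Committing to `etl_experiment` advances only that branch's pointer, which is why main still reports the original `sales.orders` metadata.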
Merge
When a feature branch has been validated (data quality checks pass, queries return correct results), it is merged into main, which applies all of the branch's catalog changes to production in a single atomic operation. If main and the feature branch have both changed the same table since the branch was created, Nessie reports a merge conflict instead of silently overwriting either change.
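The conflict rule above can be approximated as a three-way merge over table-to-metadata maps. This is a simplified, hypothetical model rather than Nessie's implementation: it treats any table changed on both the source and the target since their common ancestor as a conflict and rejects the entire merge. Nessie's actual merge works on commit histories and has additional behavior that this sketch omits. `merge`, `changed_keys`, and `MergeConflict` are illustrative names.

```python
from typing import Dict, Iterable, Set

Tables = Dict[str, str]  # table name -> Iceberg metadata location


class MergeConflict(Exception):
    def __init__(self, keys: Iterable[str]) -> None:
        self.keys: Set[str] = set(keys)
        super().__init__("conflicting changes on: " + ", ".join(sorted(self.keys)))


def changed_keys(base: Tables, head: Tables) -> Set[str]:
    """Tables added, updated, or dropped on `head` since the common ancestor `base`."""
    return {k for k in base.keys() | head.keys() if base.get(k) != head.get(k)}


def merge(base: Tables, source: Tables, target: Tables) -> Tables:
    """Apply every change from `source` onto `target`, or nothing at all."""
    source_changes = changed_keys(base, source)
    conflicts = source_changes & changed_keys(base, target)
    if conflicts:
        # Both sides touched the same table: refuse rather than overwrite target's change.
        raise MergeConflict(conflicts)
    merged = dict(target)
    for key in sorted(source_changes):
        if key in source:
            merged[key] = source[key]
        else:
            merged.pop(key, None)  # table was dropped on the source branch
    return merged


base = {"sales.orders": "orders/v1"}  # state when the feature branch was created
feature = {"sales.orders": "orders/v2", "sales.returns": "returns/v1"}
main = {"sales.orders": "orders/v1", "sales.customers": "customers/v1"}
print(merge(base, feature, main))

main_with_hotfix = {"sales.orders": "orders/v1-hotfix"}
try:
    merge(base, feature, main_with_hotfix)
except MergeConflict as err:
    print(err)  # conflicting changes on: sales.orders
```

Raising before building the merged map is what makes the sketch all-or-nothing: either every source change lands on the target, or none does.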

Nessie Tags for ML Reproducibility
One of Nessie's most valuable capabilities for ML teams is catalog tagging. A Nessie tag is a named pointer to a specific catalog commit, similar to a Git release tag: unlike a branch, it does not advance when new commits are made. When an ML model is trained on a particular version of the training data, a Nessie tag can record the exact catalog state at training time.
Three months later, when the model needs to be retrained or debugged, the team can read the catalog at that tag and see the same tables, schemas, and data as at training time, regardless of how much the data has changed since, as long as the data and metadata files referenced by the tagged commit have not been deleted by cleanup jobs. This makes model training reproducible over time without duplicating data.
Tags also enable point-in-time compliance snapshots: a tag created at quarter-end records the exact data state used for financial reporting, giving auditors a fixed reference point without copying any data.
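The difference between a moving branch and a fixed tag, whether the tag marks a training run or a quarter-end close, can be shown with another small, hypothetical model. The class, tag name, and table name below are invented for illustration. The model also omits Nessie features such as explicitly reassigning or deleting a tag, and it assumes the files behind every commit are retained.

```python
import itertools
from typing import Dict


class RefStore:
    """Hypothetical toy model: branches advance on commit, tags stay where they were created."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.commits: Dict[int, Dict[str, str]] = {0: {}}  # commit id -> table map; 0 = empty root
        self.branches: Dict[str, int] = {"main": 0}
        self.tags: Dict[str, int] = {}

    def commit(self, branch: str, changes: Dict[str, str]) -> int:
        if branch not in self.branches:
            raise ValueError(f"{branch!r} is not a branch; tags cannot receive commits")
        new_id = next(self._ids)
        self.commits[new_id] = {**self.commits[self.branches[branch]], **changes}
        self.branches[branch] = new_id
        return new_id

    def create_tag(self, tag: str, from_branch: str = "main") -> None:
        if tag in self.tags:
            raise ValueError(f"tag {tag!r} already exists")
        self.tags[tag] = self.branches[from_branch]  # pin the branch's current commit

    def tables_at(self, ref: str) -> Dict[str, str]:
        commit_id = self.tags[ref] if ref in self.tags else self.branches[ref]
        return dict(self.commits[commit_id])


store = RefStore()
store.commit("main", {"ml.churn_features": "features/v1.metadata.json"})
store.create_tag("churn-model-training-2024-06")  # hypothetical tag recorded with the trained model
store.commit("main", {"ml.churn_features": "features/v2.metadata.json"})  # data keeps evolving

print(store.tables_at("churn-model-training-2024-06"))  # {'ml.churn_features': 'features/v1.metadata.json'}
print(store.tables_at("main"))                          # {'ml.churn_features': 'features/v2.metadata.json'}
```

Storing the tag name next to the model artifact (or in the quarter-end report) is what lets someone later resolve exactly which table versions were used.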
Nessie and Dremio Open Catalog
Project Nessie is the catalog engine behind Dremio Open Catalog. When using Dremio Cloud or Dremio's self-hosted platform, the Open Catalog UI and API expose Nessie's branching and tagging capabilities through an enterprise-grade interface — including branch creation and management, tag creation, branch comparison (showing which tables differ between branches), and merge workflows with conflict detection.
Organizations using Dremio Cloud get Nessie's capabilities in a fully managed, enterprise-supported form. Organizations that prefer a self-hosted, open-source catalog can deploy Nessie independently and connect any Iceberg engine to it.

Summary
Project Nessie brings version control, one of the core practices of software development, to lakehouse data management. Its Git-like branching and tagging model enables isolated data experimentation, atomic multi-table commits, reproducible ML datasets, and point-in-time audit snapshots, all on top of open Apache Iceberg tables and without duplicating data. Because Nessie also powers Dremio Open Catalog, its capabilities are available in both open-source and enterprise-managed forms.