What Is Unity Catalog?

Unity Catalog is Databricks' unified governance and metadata catalog for the Databricks Lakehouse Platform. Introduced in 2021, it replaced workspace-level Hive Metastores with a centralized, account-level catalog that spans all Databricks workspaces — providing consistent access control, lineage, and discoverability across the entire organization's data and AI assets.

Unity Catalog uses a three-level namespace hierarchy, catalog.schema.table, compared with the two-level database.table of the Hive Metastore (HMS). The extra level lets organizations organize data by environment (prod, dev), domain (sales, marketing), and subject area, giving a more flexible taxonomy than the traditional HMS structure.
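
To make the difference between the two naming schemes concrete, here is a minimal Python sketch that parses dotted table names. The TableName class and parse_table_name helper are hypothetical names invented for this illustration, not part of any Databricks library, and the choice of hive_metastore as the default catalog for two-level names is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableName:
    """A fully qualified three-level name: catalog.schema.table."""

    catalog: str
    schema: str
    table: str

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_table_name(name: str, default_catalog: str = "hive_metastore") -> TableName:
    """Parse a dotted name into its three levels.

    Two-level, HMS-style names (database.table) are placed under
    default_catalog. Simplification: identifiers are assumed to contain
    no dots and no backtick quoting.
    """
    parts = name.split(".")
    if len(parts) == 3:
        return TableName(*parts)
    if len(parts) == 2:
        return TableName(default_catalog, *parts)
    raise ValueError(f"expected schema.table or catalog.schema.table, got {name!r}")


# Three-level name: the catalog can encode the environment.
prod = parse_table_name("prod.sales.orders")
# Two-level legacy name: falls back to the assumed default catalog.
legacy = parse_table_name("sales.orders")
```

With this layout, prod.sales.orders and dev.sales.orders can coexist as separate tables, which a two-level database.table scheme cannot express without encoding the environment into the database name.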

In 2024, Databricks made a significant strategic move by open-sourcing Unity Catalog under Apache License 2.0. The open-source version (OSS Unity Catalog) can be deployed independently of Databricks and exposes an Iceberg REST Catalog API, enabling non-Databricks engines to connect.
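
As a rough illustration of what "exposes an Iceberg REST Catalog API" means for a client, the sketch below builds two request URLs using route shapes from the Iceberg REST specification (/v1/config and /v1/{prefix}/namespaces/{namespace}/tables). The base URI is an assumption about a local OSS Unity Catalog server, and the prefix value shown is hypothetical; in practice a client would take the prefix from whatever the server's config endpoint returns. No network call is made.

```python
from urllib.parse import quote

# Assumption: a locally running OSS Unity Catalog server. Adjust host, port,
# and path to match your own deployment.
BASE_URI = "http://localhost:8080/api/2.1/unity-catalog/iceberg"


def config_url(base_uri: str) -> str:
    """URL of the config endpoint a client calls first (GET /v1/config)."""
    return f"{base_uri.rstrip('/')}/v1/config"


def tables_url(base_uri: str, prefix: str, namespace: str) -> str:
    """URL that lists the tables in one namespace.

    Route shape: GET /v1/{prefix}/namespaces/{namespace}/tables.
    Simplification: a single-level namespace is assumed.
    """
    root = base_uri.rstrip("/")
    return (
        f"{root}/v1/{prefix.strip('/')}"
        f"/namespaces/{quote(namespace, safe='')}/tables"
    )


# Hypothetical prefix and namespace, for illustration only.
example_config = config_url(BASE_URI)
example_tables = tables_url(BASE_URI, "unity", "default")
```

Any engine that speaks this protocol can be pointed at the same base URI, which is what lets non-Databricks engines read the catalog.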

Unity Catalog Governance Features

Unity Catalog provides comprehensive governance across all Databricks assets:

  • Centralized RBAC: Account-level user management with workspace inheritance. Privileges are granted on catalogs, schemas, and tables, with column- and row-level restrictions layered on through masks and filters
  • Column-level security: Data masking policies applied transparently to specific columns for specific user groups
  • Row-level security: Dynamic row filters applied based on the querying user's identity
  • Data lineage: Automatic tracking of how tables are created and modified: which upstream tables each table depends on, and which downstream tables depend on it
  • Audit logging: All data access events logged for compliance and security investigation
  • Data discovery: Search-based discovery of tables, columns, and AI assets across the account
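
The column-level and row-level security items in the list above can be pictured with a small conceptual model. The sketch below is plain Python, not Unity Catalog's own syntax or API: the group names, the mask_email and region_filter functions, and the sample rows are all hypothetical. It only shows the idea that the same query returns different rows and different column values depending on who is asking.

```python
from typing import Dict, Iterable, List, Set

Row = Dict[str, object]


def mask_email(value: str, groups: Set[str]) -> str:
    """Column mask: only members of 'pii_readers' see the raw value."""
    if "pii_readers" in groups:
        return value
    # Everyone else sees the domain but not the local part.
    return "***@" + value.split("@", 1)[-1]


def region_filter(row: Row, groups: Set[str]) -> bool:
    """Row filter: admins see every row, others only their region's rows."""
    return "admins" in groups or f"region_{row['region']}" in groups


def query(rows: Iterable[Row], groups: Set[str]) -> List[Row]:
    """Apply the row filter first, then the column mask, for one caller."""
    visible = [r for r in rows if region_filter(r, groups)]
    return [{**r, "email": mask_email(str(r["email"]), groups)} for r in visible]


# Hypothetical sample data.
ORDERS: List[Row] = [
    {"id": 1, "region": "emea", "email": "ana@example.com"},
    {"id": 2, "region": "apac", "email": "li@example.com"},
]

emea_analyst_view = query(ORDERS, {"region_emea"})
admin_view = query(ORDERS, {"admins", "pii_readers"})
```

Here an EMEA analyst sees only the EMEA row with a masked email, while a caller in both admins and pii_readers sees every row unmasked. The caller never changes the query itself, which is what "applied transparently" refers to.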

Figure 1: Unity Catalog's governance scope, centralized across all workspaces and all asset types.
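
Lineage, as described in the feature list, amounts to a directed graph of table dependencies that can be walked in either direction. The following is a minimal sketch of that idea only; the LineageGraph class and the table names are hypothetical, and this is not how Unity Catalog stores or exposes lineage.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


class LineageGraph:
    """Directed graph of (source_table -> derived_table) edges."""

    def __init__(self, edges: Iterable[Tuple[str, str]]) -> None:
        self._down: Dict[str, Set[str]] = defaultdict(set)
        self._up: Dict[str, Set[str]] = defaultdict(set)
        for source, target in edges:
            self._down[source].add(target)
            self._up[target].add(source)

    @staticmethod
    def _walk(start: str, adjacency: Dict[str, Set[str]]) -> Set[str]:
        """Breadth-first traversal collecting every reachable table."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, table: str) -> Set[str]:
        """All tables this table depends on, directly or transitively."""
        return self._walk(table, self._up)

    def downstream(self, table: str) -> Set[str]:
        """All tables that depend on this table, directly or transitively."""
        return self._walk(table, self._down)


# Hypothetical pipeline: two raw tables feed a silver table, which feeds gold.
graph = LineageGraph([
    ("prod.raw.orders", "prod.silver.orders"),
    ("prod.raw.customers", "prod.silver.orders"),
    ("prod.silver.orders", "prod.gold.daily_revenue"),
])
```

Calling graph.downstream("prod.raw.orders") answers the impact-analysis question (what breaks if this table changes), while graph.upstream("prod.gold.daily_revenue") answers the provenance question (where this data came from).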

Unity Catalog vs. Apache Polaris

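| Dimension          | Unity Catalog                   | Apache Polaris      |
|--------------------|---------------------------------|---------------------|
| Governance scope   | Tables, models, notebooks, files | Iceberg tables only |
| Lineage tracking   | Yes (automatic)                 | No                  |
| Open source        | Yes (Apache 2.0, 2024)          | Yes (ASF)           |
| Engine dependency  | Optimized for Databricks        | Vendor-neutral      |
| Git-like branching | No                              | No                  |
| REST Catalog API   | Yes (OSS version)               | Yes                 |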

Unity Catalog is the right choice for organizations fully committed to the Databricks platform that want comprehensive cross-asset governance. Apache Polaris and Project Nessie are better choices for organizations prioritizing vendor neutrality and multi-cloud portability.

Figure 2: Unity Catalog vs. Apache Polaris, comparing governance scope and vendor dependency trade-offs.

Summary

Unity Catalog is among the most comprehensive governance catalogs in the commercial lakehouse ecosystem, spanning tables, ML models, notebooks, and dashboards with fine-grained access control, lineage, and discovery in a single system. Its 2024 open-source release and Iceberg REST API support moved it into the broader ecosystem, making it a viable catalog for multi-engine environments. For pure Databricks shops, Unity Catalog is the natural choice. For organizations building on open standards and multi-engine lakehouses, Apache Polaris or Project Nessie provide stronger vendor neutrality.