How is Apache Gravitino different from Apache Polaris?

Polaris is an Iceberg-specific REST catalog. Gravitino is a federated metadata layer that connects to multiple catalog types — including Iceberg REST catalogs, Hive Metastores, JDBC databases, and message queues. Gravitino provides a unified namespace and governance model across all of them, while Polaris focuses on first-class Iceberg catalog capabilities.

What is Gravitino's relationship to the Apache Software Foundation?

Apache Gravitino entered the Apache Software Foundation Incubator in 2024, donated by Datastrato. It has since graduated from incubation to a top-level Apache project.

Apache Gravitino: The Definitive Guide

Q: What is Apache Gravitino?

Apache Gravitino (originally developed by Datastrato) is an open-source federated metadata service that provides a unified metadata API across multiple catalog types, including Iceberg, Hive Metastore, JDBC databases, and Kafka. It acts as a catalog federation layer, presenting a single namespace across disparate catalog backends.

What Is Apache Gravitino?

Apache Gravitino is an open-source, federated metadata service and universal catalog API that connects to multiple heterogeneous catalog backends — Apache Iceberg REST catalogs, Hive Metastore, JDBC relational databases, Apache Kafka, and others — and presents them through a single, unified metadata API and namespace.

The problem Gravitino solves is multi-catalog fragmentation. Large organizations typically have data spread across many catalog systems: legacy Hive Metastore (HMS) tables, Iceberg REST catalog tables, relational database schemas, and streaming topics. Without a federation layer, every engine must be configured separately for each catalog, and there is no unified view of all available data assets, no consistent access control across catalog types, and no cross-catalog lineage.

Gravitino provides a single API surface that federates all these catalogs: engines and users can discover and query data from any connected catalog through Gravitino's unified namespace, and access control policies can be applied consistently across all catalog types.
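To make the "single API surface" idea concrete, here is a minimal sketch of how two different backends might be registered through one REST endpoint. The endpoint path, the payload field names, the `type` and `provider` values, the property keys, the host names, and the port are all assumptions made for illustration and should be checked against the Gravitino REST API reference. The helper only builds requests; it never sends them.

```python
import json
from urllib.request import Request

# Assumed server address; adjust to the actual deployment.
BASE_URL = "http://localhost:8090"


def build_create_catalog_request(base_url, metalake, name, catalog_type, provider, properties):
    """Build (but do not send) a request that registers one backend catalog.

    The URL layout and JSON field names are assumptions, not confirmed API details.
    """
    payload = {
        "name": name,
        "type": catalog_type,
        "provider": provider,
        "comment": "",
        "properties": properties,
    }
    return Request(
        url=f"{base_url}/api/metalakes/{metalake}/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two very different backends, registered through the same call shape.
# "demo_lake", "legacy", "events" and the host names are hypothetical.
pending_requests = [
    build_create_catalog_request(
        BASE_URL, "demo_lake", "legacy", "RELATIONAL", "hive",
        {"metastore.uris": "thrift://hms.example.internal:9083"},
    ),
    build_create_catalog_request(
        BASE_URL, "demo_lake", "events", "MESSAGING", "kafka",
        {"bootstrap.servers": "kafka.example.internal:9092"},
    ),
]

for req in pending_requests:
    print(req.get_method(), req.full_url, json.loads(req.data)["provider"])
```

The point of the sketch is the shape, not the specifics: the caller talks to one service with one request format, and only the `provider` and `properties` differ per backend.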

Gravitino's Federated Catalog Architecture

Gravitino operates with a plugin-based catalog connector model:

Gravitino Server: The central metadata service exposing Gravitino's unified REST API
Catalog Connectors (plugins): Each connector implements the protocol to communicate with a specific catalog type — Iceberg REST, Hive Thrift, JDBC, Kafka, etc.
Metalake: Gravitino's top-level namespace concept — a logical container for multiple catalogs belonging to an organization or project
Unified Namespace: Tables from all connected catalogs are addressable via a unified metalake.catalog.schema.table path
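The components above can be sketched as a small model: a four-part identifier is parsed, and the (metalake, catalog) pair selects which connector type would handle the request. This is an illustrative sketch only, not Gravitino's implementation. The class, the registry, and all names in it (`demo_lake`, `lakehouse`, `legacy`, `orders_db`) are hypothetical, and the parser assumes that name parts do not themselves contain dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableIdent:
    """A table addressed as metalake.catalog.schema.table."""
    metalake: str
    catalog: str
    schema: str
    table: str

    @classmethod
    def parse(cls, path: str) -> "TableIdent":
        parts = path.split(".")
        # Require exactly four non-empty parts.
        if len(parts) != 4 or not all(parts):
            raise ValueError(f"expected metalake.catalog.schema.table, got {path!r}")
        return cls(*parts)


# Hypothetical registry: which connector type backs each registered catalog.
CATALOG_CONNECTORS = {
    ("demo_lake", "lakehouse"): "iceberg-rest",
    ("demo_lake", "legacy"): "hive",
    ("demo_lake", "orders_db"): "jdbc",
}


def resolve_connector(path: str) -> str:
    """Return the connector type that would serve the given table path."""
    ident = TableIdent.parse(path)
    key = (ident.metalake, ident.catalog)
    if key not in CATALOG_CONNECTORS:
        raise LookupError(f"no catalog {ident.catalog!r} in metalake {ident.metalake!r}")
    return CATALOG_CONNECTORS[key]


print(resolve_connector("demo_lake.legacy.sales.orders"))
print(resolve_connector("demo_lake.lakehouse.analytics.page_views"))
```

Callers only ever see the unified path. Which protocol is spoken to the backend (Thrift, JDBC, Iceberg REST) is decided behind the lookup.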

Figure 1: Gravitino's federated architecture, connecting multiple catalogs under a unified metadata API.

Gravitino and Trino Integration

One of Gravitino's most practical integrations is with Trino. Gravitino provides a Trino connector that allows Trino to discover and query tables from all Gravitino-connected catalogs through Gravitino's unified namespace — without configuring each catalog separately in Trino's catalog configuration files.

This simplifies Trino configuration in multi-catalog environments: instead of maintaining a separate Trino catalog configuration file for each catalog backend, a single Gravitino connector configuration gives Trino access to all catalogs registered in Gravitino. Catalogs added to Gravitino later become discoverable by Trino without further Trino reconfiguration.
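As a rough illustration of what "a single Gravitino connector configuration" could look like, the sketch below renders one Trino catalog properties file from a dictionary. The property keys (`connector.name`, `gravitino.uri`, `gravitino.metalake`), the server address, and the metalake name `demo_lake` are assumptions made for the example and should be verified against the Gravitino Trino connector documentation.

```python
from typing import Mapping

# Assumed connector settings; the keys and values are illustrative, not confirmed.
GRAVITINO_CONNECTOR = {
    "connector.name": "gravitino",
    "gravitino.uri": "http://localhost:8090",
    "gravitino.metalake": "demo_lake",
}


def render_properties(props: Mapping[str, str]) -> str:
    """Render key=value lines in the style of a Trino catalog properties file."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


# The rendered text would be saved as one file in Trino's catalog directory,
# for example etc/catalog/gravitino.properties (path is an assumption).
print(render_properties(GRAVITINO_CONNECTOR), end="")
```

Under these assumptions, this one file replaces the per-backend files. Which backends Trino can see is then determined by what is registered in the metalake rather than by Trino's own configuration.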

Gravitino Access Control

Gravitino adds a unified access control layer that spans all connected catalogs. Access policies defined in Gravitino apply across catalog types — a user granted read access to a specific schema in Gravitino's namespace has that access enforced regardless of whether the underlying catalog is Iceberg REST or HMS. This provides consistent governance without requiring separate policy management in each catalog system.
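The idea that a grant on a schema applies regardless of the backend type can be sketched with a toy policy store keyed on the unified namespace path. This is not Gravitino's authorization model or API. The class, the privilege names, and the users and paths below are hypothetical, and the check assumes four-part table paths whose parts contain no dots.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class PolicyStore:
    """Toy store of schema-level grants keyed on (user, metalake.catalog.schema)."""
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def grant(self, user: str, schema_path: str, privilege: str) -> None:
        self.grants.setdefault((user, schema_path), set()).add(privilege)

    def is_allowed(self, user: str, table_path: str, privilege: str) -> bool:
        # Drop the table name to get the enclosing schema path.
        schema_path = table_path.rsplit(".", 1)[0]
        return privilege in self.grants.get((user, schema_path), set())


store = PolicyStore()
# One grant on a Hive-backed schema and one on an Iceberg-backed schema;
# the policy store does not know or care which backend serves each catalog.
store.grant("alice", "demo_lake.legacy.sales", "read")
store.grant("alice", "demo_lake.lakehouse.analytics", "read")

print(store.is_allowed("alice", "demo_lake.legacy.sales.orders", "read"))
print(store.is_allowed("alice", "demo_lake.legacy.sales.orders", "write"))
print(store.is_allowed("bob", "demo_lake.lakehouse.analytics.page_views", "read"))
```

Because the check is expressed purely in terms of namespace paths, the same policy logic covers every catalog type that appears in the namespace.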

Summary

Apache Gravitino addresses a real pain point in large-scale data organizations: the proliferation of disconnected catalog systems. By federating multiple catalogs behind a unified metadata API and governance model, Gravitino reduces the operational complexity of multi-catalog environments and provides consistent access control across catalog types. For organizations with legacy HMS deployments, Iceberg REST catalogs, and relational databases that need a unified view of all data assets, Gravitino is the federation layer that makes the data lakehouse vision of unified data access practical.
