What Is AWS Glue Data Catalog?
AWS Glue Data Catalog is Amazon Web Services' serverless, managed metadata catalog: the central repository for table schemas, partition metadata, and data locations for data stored in Amazon S3. It serves as the shared catalog for the AWS analytics ecosystem. Amazon Athena reads table metadata from Glue, Amazon EMR (Spark/Hive) uses Glue in place of a Hive Metastore, Amazon Redshift Spectrum queries Glue-registered tables in S3, and AWS Lake Formation uses Glue as its metadata backbone for fine-grained access control.
Glue Data Catalog began as a Hive Metastore (HMS)-compatible catalog. It does not expose the HMS Thrift endpoint itself; instead, Hive and Spark reach it through a client that implements the Hive metastore client interface on top of the Glue API. In practice this lets Glue stand in for a self-hosted HMS, with serverless scaling and AWS-managed availability. Over time, AWS has added native Apache Iceberg support and an Iceberg REST Catalog API, positioning Glue as the managed catalog for the AWS-based open lakehouse.
Glue and Apache Iceberg
AWS Glue supports Apache Iceberg in two modes:
Iceberg Tables in Glue (Backend Mode)
Spark jobs running on Amazon EMR can write Iceberg tables using Glue as the catalog backend. Glue stores the current metadata file location for each Iceberg table; the full Iceberg metadata tree (manifest list, manifests, data files) lives in S3. This is the most common EMR + Iceberg pattern on AWS.
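As a minimal sketch of this pattern, the Python below builds the Spark properties that Iceberg's AWS integration documents for a Glue-backed catalog, and reads the metadata pointer out of a Glue `GetTable` response. The catalog name `glue`, the bucket, and the sample response are hypothetical. The `table_type` and `metadata_location` parameter keys are assumed to be the ones Iceberg's `GlueCatalog` writes, so check them against your Iceberg version.

```python
from typing import Any, Dict, Optional


def glue_backend_spark_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Spark properties that register an Iceberg catalog backed by Glue."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def iceberg_metadata_location(get_table_response: Dict[str, Any]) -> Optional[str]:
    """Return the Iceberg metadata pointer stored on a Glue table.

    Returns None when the table is not marked as an Iceberg table.
    """
    params = get_table_response.get("Table", {}).get("Parameters", {})
    if params.get("table_type", "").upper() != "ICEBERG":
        return None
    return params.get("metadata_location")


# Hypothetical response, shaped like boto3's glue.get_table(...) output.
sample_response = {
    "Table": {
        "Name": "orders",
        "DatabaseName": "sales",
        "Parameters": {
            "table_type": "ICEBERG",
            "metadata_location":
                "s3://example-bucket/warehouse/sales/orders/metadata/00003-abc.metadata.json",
        },
    }
}

conf = glue_backend_spark_conf("glue", "s3://example-bucket/warehouse")
pointer = iceberg_metadata_location(sample_response)
```

Against a real account, the response would come from `boto3.client("glue").get_table(DatabaseName="sales", Name="orders")`. Everything below the pointer (manifest list, manifests, data files) is read from S3, not from Glue.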
Iceberg REST Catalog API
Glue now exposes an Iceberg REST Catalog API endpoint, allowing any Iceberg REST catalog client to connect to Glue directly, without Hive Metastore client configuration. This enables Spark, Trino, Flink, and other engines to use Glue as a REST catalog in their standard Iceberg catalog configuration.
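A REST-mode configuration might look like the sketch below, which assembles Spark properties for Iceberg's REST catalog with SigV4 request signing. The endpoint form `https://glue.<region>.amazonaws.com/iceberg`, the use of the AWS account ID as the `warehouse` value, and the catalog name `glue_rest` are assumptions to confirm against current AWS documentation. The account ID shown is a placeholder.

```python
from typing import Dict, List


def glue_rest_spark_conf(catalog: str, region: str, account_id: str) -> Dict[str, str]:
    """Spark properties for reaching Glue through the Iceberg REST protocol."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumed endpoint shape for the Glue Iceberg REST endpoint.
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Assumption: the Glue catalog ID (the AWS account ID) is the warehouse.
        f"{prefix}.warehouse": account_id,
        # Sign requests with AWS SigV4 instead of using a bearer token.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
    }


def as_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten a property map into repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


rest_conf = glue_rest_spark_conf("glue_rest", "us-east-1", "123456789012")
submit_args = as_spark_submit_args(rest_conf)
```

The same key/value pairs, minus the `spark.sql.catalog.<name>` prefix, are the generic Iceberg REST catalog properties, so other engines should accept equivalent settings in their own configuration format.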

Glue and AWS Lake Formation
AWS Lake Formation is AWS's data lake governance service, built on top of Glue Data Catalog. Lake Formation adds fine-grained access control to Glue-registered tables: column-level permissions, row-level filters, tag-based access policies, and cross-account data sharing.
For organizations building a governed lakehouse on AWS, the Glue + Lake Formation combination provides the access control layer without requiring a separate catalog deployment. Lake Formation permissions are attached to Glue catalog resources rather than to an individual engine, so Lake Formation-integrated engines such as Athena and EMR that query Glue-registered Iceberg tables are subject to the same policies.
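To make column-level permissions concrete, the sketch below builds the arguments for a Lake Formation `GrantPermissions` request that grants `SELECT` on selected columns of one table. The helper name, role ARN, database, table, and column names are all hypothetical. The dictionary shape follows the `grant_permissions` call in boto3's `lakeformation` client.

```python
from typing import Any, Dict, Optional, Sequence


def column_select_grant(
    principal_arn: str,
    database: str,
    table: str,
    columns: Sequence[str],
    catalog_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build kwargs for a column-level SELECT grant on a Glue table."""
    if not columns:
        raise ValueError("at least one column is required for a column-level grant")
    resource: Dict[str, Any] = {
        "DatabaseName": database,
        "Name": table,
        "ColumnNames": list(columns),
    }
    if catalog_id is not None:
        # Only needed when the table lives in another account's catalog.
        resource["CatalogId"] = catalog_id
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"TableWithColumns": resource},
        "Permissions": ["SELECT"],
    }


# Hypothetical analyst role that may read two non-sensitive columns.
grant_kwargs = column_select_grant(
    principal_arn="arn:aws:iam::123456789012:role/analyst",
    database="sales",
    table="orders",
    columns=["order_id", "order_total"],
)
```

With credentials configured, the grant would be applied with `boto3.client("lakeformation").grant_permissions(**grant_kwargs)`. Building the arguments separately keeps the policy reviewable before anything is sent to AWS.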
Glue vs. Polaris and Nessie
| Dimension | AWS Glue | Apache Polaris | Project Nessie |
|---|---|---|---|
| Deployment | Managed AWS service | Self-hosted or managed | Self-hosted (open source) |
| Open source | No (proprietary) | Yes (ASF project) | Yes (Apache License 2.0) |
| Cloud portability | AWS only | Any cloud | Any cloud |
| Git-like branching | No | No | Yes |
| REST Catalog API | Yes | Yes | Yes |
| Lake Formation integration | Native | No | No |

Summary
AWS Glue Data Catalog is the practical default catalog for AWS-based data lakehouses. Its native integration with Athena, EMR, Redshift Spectrum, and Lake Formation makes it the path of least resistance for AWS-centric organizations. For organizations prioritizing cloud portability, open governance, or Git-like data versioning, Apache Polaris and Project Nessie are the open alternatives. Many AWS lakehouse deployments start with Glue and evaluate a move to an open catalog only when multi-cloud or portability requirements emerge.