Does AWS Glue support Apache Iceberg?

Yes. AWS Glue Data Catalog supports Apache Iceberg tables in two ways: as an Iceberg catalog backend, where Glue stores the pointer to each table's current metadata file (for example, for tables written by Spark on Amazon EMR), and through the Iceberg REST Catalog API, which lets any Iceberg REST catalog client create and query Iceberg tables in Glue. Amazon Athena engine version 3 can query Iceberg tables registered in Glue natively.

Is AWS Glue Data Catalog open source?

No. AWS Glue Data Catalog is a proprietary, AWS-managed service. It offers Hive Metastore (HMS) compatibility and an Iceberg REST Catalog API, but the catalog itself runs on AWS infrastructure. This creates some vendor dependency: catalog state is stored in AWS, and migrating to another catalog requires re-registering tables there.

AWS Glue Data Catalog: The Definitive Guide

Q: What is AWS Glue Data Catalog?

AWS Glue Data Catalog is Amazon Web Services' serverless, managed metadata catalog for data stored on AWS. It stores table schemas, partition information, and storage locations, and now supports Apache Iceberg tables via the Iceberg REST Catalog API. It integrates natively with Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and AWS Lake Formation.

What Is AWS Glue Data Catalog?

AWS Glue Data Catalog is Amazon Web Services' serverless, managed metadata catalog: the central repository for table schemas, partition metadata, and data locations for data stored in Amazon S3. It serves as the catalog for the AWS analytics ecosystem. Amazon Athena reads table metadata from Glue, Amazon EMR (Spark/Hive) uses Glue as its Hive Metastore replacement, Amazon Redshift Spectrum queries Glue-registered tables in S3, and AWS Lake Formation uses Glue as its metadata backbone for fine-grained access control.
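To make "table schemas, partition metadata, and data locations" concrete, here is a minimal sketch that pulls those three pieces out of a Glue `GetTable` response. The response shape follows the Glue API (`Table`, `StorageDescriptor`, `PartitionKeys`); the database, table, column names, and S3 path are hypothetical, and the sample is trimmed to the fields the function reads.

```python
from typing import Any


def summarize_glue_table(response: dict[str, Any]) -> dict[str, Any]:
    """Extract schema, partition keys, and location from a Glue GetTable response."""
    table = response["Table"]
    descriptor = table.get("StorageDescriptor", {})
    return {
        "name": table["Name"],
        "location": descriptor.get("Location"),
        "columns": [(c["Name"], c.get("Type")) for c in descriptor.get("Columns", [])],
        "partition_keys": [k["Name"] for k in table.get("PartitionKeys", [])],
    }


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "Table": {
        "Name": "orders",
        "DatabaseName": "sales",
        "StorageDescriptor": {
            "Location": "s3://example-bucket/warehouse/sales/orders",
            "Columns": [
                {"Name": "order_id", "Type": "bigint"},
                {"Name": "amount", "Type": "double"},
            ],
        },
        "PartitionKeys": [{"Name": "order_date", "Type": "date"}],
    }
}

summary = summarize_glue_table(sample_response)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Against a real catalog, the same function could be fed the result of `boto3.client("glue").get_table(DatabaseName="sales", Name="orders")`, assuming boto3 is installed and credentials are configured.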

Glue Data Catalog began as a Hive Metastore (HMS)-compatible catalog. Engines such as Hive and Spark on EMR reach it through an HMS-compatible client rather than a self-managed metastore server, which makes it a practical replacement for self-hosted HMS with serverless scaling and AWS-managed availability. Over time, AWS has added native Apache Iceberg support and an Iceberg REST Catalog API, positioning Glue as the managed catalog for the AWS-based open lakehouse.

Glue and Apache Iceberg

AWS Glue supports Apache Iceberg in two modes:

Iceberg Tables in Glue (Backend Mode)

Spark jobs running on Amazon EMR can write Iceberg tables using Glue as the catalog backend. Glue stores the current metadata file location for each Iceberg table; the full Iceberg metadata tree (manifest list, manifests, data files) lives in S3. This is the most common EMR + Iceberg pattern on AWS.
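As a rough illustration of backend mode, the sketch below assembles the Spark properties that point an Iceberg catalog at Glue. The class names (`SparkCatalog`, `GlueCatalog`, `S3FileIO`) come from the Apache Iceberg AWS integration; the catalog name `glue_cat` and the warehouse path are assumptions chosen for the example, and the exact set of properties your Iceberg and EMR versions expect should be checked against their documentation.

```python
def glue_backend_conf(catalog: str, warehouse: str) -> dict[str, str]:
    """Spark properties for an Iceberg catalog backed by Glue (backend mode)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg SQL extensions such as MERGE INTO and CALL procedures.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Glue holds the pointer to each table's current metadata file.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Metadata and data files themselves are read and written in S3.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def to_spark_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten a property dict into repeated --conf key=value arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical catalog name and bucket.
backend_conf = glue_backend_conf("glue_cat", "s3://example-bucket/warehouse")
print(" ".join(to_spark_submit_args(backend_conf)))
```

With these properties set, a Spark job would address tables as `glue_cat.<database>.<table>`.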

Iceberg REST Catalog API

Glue now exposes an Iceberg REST Catalog API endpoint, allowing any Iceberg REST catalog client to connect to Glue directly, without Hive Metastore client configuration. This enables Spark, Trino, Flink, and other engines to use Glue as a REST catalog in their standard Iceberg catalog configuration.
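For comparison, a REST-mode configuration might look like the sketch below. It uses Iceberg's generic REST catalog properties (`type=rest`, `uri`, `rest.sigv4-enabled`, `rest.signing-name`, `rest.signing-region`). The endpoint pattern `https://glue.<region>.amazonaws.com/iceberg` and the use of the AWS account ID as the `warehouse` value reflect my understanding of the AWS documentation and should be verified there; the catalog name, region, and account ID are placeholders.

```python
def glue_rest_conf(catalog: str, region: str, account_id: str) -> dict[str, str]:
    """Spark properties for reaching Glue through the Iceberg REST Catalog API."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumed endpoint pattern; confirm against the AWS Glue documentation.
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Assumption: the warehouse identifies the Glue catalog by account ID.
        f"{prefix}.warehouse": account_id,
        # Requests to AWS endpoints are signed with SigV4 for the glue service.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
    }


# Placeholder catalog name, region, and account ID.
rest_conf = glue_rest_conf("glue_rest", "us-east-1", "111122223333")
for key, value in rest_conf.items():
    print(f"{key}={value}")
```

The notable difference from backend mode is that no Glue-specific catalog class appears: the engine only needs a generic REST catalog client plus request signing.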

Figure 1: Glue Data Catalog in the AWS lakehouse, with HMS-compatible and Iceberg REST access for all engines.

Glue and AWS Lake Formation

AWS Lake Formation is AWS's data lake governance service, built on top of Glue Data Catalog. Lake Formation adds fine-grained access control to Glue-registered tables: column-level permissions, row-level filters, tag-based access policies, and cross-account data sharing.

For organizations building a governed lakehouse on AWS, the Glue + Lake Formation combination provides the access control layer without requiring a separate catalog deployment. Lake Formation permissions are attached to Glue catalog resources, so integrated engines such as Athena and EMR have their access to Glue-registered Iceberg tables governed by the same policies, whichever of those engines makes the request.
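A column-level permission of the kind described above can be sketched as a Lake Formation `GrantPermissions` request. The request keys (`Principal`, `Resource.TableWithColumns`, `Permissions`) follow the Lake Formation API; the role ARN, database, table, and column names are hypothetical.

```python
from typing import Any


def column_select_grant(
    principal_arn: str, database: str, table: str, columns: list[str]
) -> dict[str, Any]:
    """Build a GrantPermissions request allowing SELECT on specific columns only."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical analyst role that may read two columns of sales.orders.
grant_request = column_select_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",
    "sales",
    "orders",
    ["order_id", "order_date"],
)
print(grant_request)
```

With boto3 installed and credentials configured, the request could be submitted as `boto3.client("lakeformation").grant_permissions(**grant_request)`. Columns left out of `ColumnNames` stay hidden from that principal.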

Glue vs. Polaris and Nessie

Dimension	AWS Glue	Apache Polaris	Project Nessie
Deployment	Managed AWS service	Self-hosted or managed	Self-hosted (open source)
Open source	No (proprietary)	Yes (ASF project)	Yes (Apache 2.0 license)
Cloud portability	AWS only	Any cloud	Any cloud
Git-like branching	No	No	Yes
REST Catalog API	Yes	Yes	Yes
Lake Formation integration	Native	No	No

Figure 2: Glue vs. open catalog alternatives, contrasting AWS-native governance with cloud-portable open standards.

Summary

AWS Glue Data Catalog is the practical default catalog for AWS-based data lakehouses. Its native integration with Athena, EMR, Redshift Spectrum, and Lake Formation makes it the path of least resistance for AWS-centric organizations. For organizations prioritizing cloud portability, open governance, or Git-like data versioning, Apache Polaris and Project Nessie are the open alternatives. A common path is to start with Glue and evaluate migration to an open catalog as multi-cloud or portability requirements emerge.