What Is the Hive Metastore?

The Hive Metastore (HMS) is the centralized metadata store for the Apache Hive ecosystem: the catalog that holds table schemas, partition metadata, and storage locations for tables in Hadoop data lakes. Originally developed as part of Apache Hive, HMS became the de facto standard catalog for the wider Hadoop ecosystem. MapReduce, Spark, Presto/Trino, Impala, and most cloud data lake services can all connect to it.

HMS stores its metadata in a relational database (most commonly MySQL or PostgreSQL) and exposes a Thrift RPC API for client access. Engines connect by configuring the metastore's Thrift endpoint, a thrift://host:port URI that conventionally uses port 9083. HMS then answers requests for table schemas, partition lists, and file locations by querying its relational backend.
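As a rough illustration of what "configuring the Thrift endpoint" looks like for one engine, the sketch below builds the Spark settings that point a session at a metastore. The helper name hms_spark_conf and the host metastore.example.internal are made up for this example; the two property names are standard Spark/Hive settings, and 9083 is only the conventional default port.

```python
def hms_spark_conf(host: str, port: int = 9083) -> dict:
    """Build Spark settings that point a session at a Hive Metastore.

    Hypothetical helper for illustration; only the property names are real.
    """
    if not host:
        raise ValueError("host must be non-empty")
    return {
        # Use the Hive-backed catalog instead of Spark's in-memory catalog.
        "spark.sql.catalogImplementation": "hive",
        # Thrift endpoint of the metastore service.
        "spark.hadoop.hive.metastore.uris": f"thrift://{host}:{port}",
    }


# Render the settings as spark-submit style flags (placeholder host name).
conf = hms_spark_conf("metastore.example.internal")
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

Other engines use their own property names for the same idea, so treat this as one example of the pattern rather than a universal recipe.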

At its peak (2015–2020), the Hive Metastore was the catalog for essentially every enterprise data lake built on Hadoop or cloud object storage. Many organizations have years of institutional investment in HMS and thousands of tables registered there. Migration away from HMS to modern REST catalogs is a significant, ongoing industry effort.

HMS Limitations for Apache Iceberg

While Iceberg can use HMS as a catalog (via the HiveCatalog), HMS was designed for Hive tables — not Iceberg's snapshot-based metadata model. Key limitations:

  • No REST API: HMS uses Thrift, not HTTP REST. Every engine needs Thrift client libraries and HMS-specific configuration — no plug-and-play interoperability.
  • No credential vending: HMS returns storage paths but not storage credentials. Engines need separate, broad, long-lived credentials to access data files.
  • No RBAC at catalog level: HMS has no built-in role-based access control that is enforced uniformly across engines. In practice, security is layered on top via Apache Ranger or cloud IAM policies applied to storage paths.
  • Partition scalability limit: HMS stores every partition as a row in a relational DB. Tables with millions of partitions cause HMS performance degradation — a well-known scaling bottleneck that Iceberg's manifest-based metadata avoids entirely.
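  • Not designed for atomic commits: HMS was not built around atomic, table-level metadata commits. The Iceberg HiveCatalog has to emulate them by taking an HMS lock and swapping the table's metadata pointer, which becomes a point of contention and failure under high write concurrency.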
Figure 1: HMS limitations vs modern Iceberg REST catalogs — the migration path for legacy data lakes.
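
To make the Thrift-versus-REST contrast concrete, here is a minimal sketch of how an Iceberg catalog might be declared in Spark for each backend. The function name, the catalog names legacy and lake, and both URIs are placeholders invented for this example. The type and uri properties are standard Iceberg catalog options. The X-Iceberg-Access-Delegation header is how clients commonly request vended credentials from a REST catalog, but whether a given server honors it is an assumption to verify against that server's documentation.

```python
def iceberg_catalog_conf(name, catalog_type, uri, vend_credentials=False):
    """Build Spark properties for an Iceberg catalog backed by HMS or REST.

    Hypothetical helper; catalog_type is "hive" (Thrift) or "rest" (HTTP).
    """
    if catalog_type not in ("hive", "rest"):
        raise ValueError(f"unsupported catalog type: {catalog_type!r}")
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": catalog_type,
        f"{prefix}.uri": uri,
    }
    if vend_credentials:
        # HMS only returns paths, so there is nothing to vend over Thrift.
        if catalog_type != "rest":
            raise ValueError("credential vending requires a REST catalog")
        # Assumption: the REST server supports access delegation.
        conf[f"{prefix}.header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    return conf


# Both catalogs can be declared in the same session (placeholder endpoints).
session_conf = {}
session_conf.update(iceberg_catalog_conf("legacy", "hive", "thrift://metastore.example.internal:9083"))
session_conf.update(iceberg_catalog_conf("lake", "rest", "https://catalog.example.com/api", vend_credentials=True))
for key in sorted(session_conf):
    print(key, "=", session_conf[key])
```

With the HMS-backed catalog, the engine still needs its own storage credentials configured separately; with the REST catalog, the credentials can arrive with the table response when the server supports it.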

HMS in the Migration Era

For organizations migrating from Hadoop to the cloud lakehouse, HMS often remains the catalog during transition. The common migration pattern:

  1. Continue running HMS for existing Hive tables
  2. Register new Apache Iceberg tables in an Iceberg REST catalog (Polaris, Nessie, Glue)
  3. Migrate Hive tables to Iceberg format in-place (using Iceberg's migrate procedure)
  4. Re-register migrated tables in the Iceberg REST catalog
  5. Decommission HMS when all tables are migrated
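
Steps 3 and 4 above can be sketched as the Spark SQL calls they typically boil down to. The helper names, the table sales.orders, the catalog name lake, and the metadata file path are all hypothetical. The migrate and register_table procedures are part of Iceberg's Spark integration, but the exact arguments you need (and the real metadata file location, which you would look up rather than guess) depend on your setup.

```python
def _check(value: str) -> str:
    """Reject values that would break out of a single-quoted SQL literal."""
    if not value or "'" in value:
        raise ValueError(f"unsafe or empty value: {value!r}")
    return value


def migrate_sql(hms_catalog: str, table: str) -> str:
    """Step 3: convert a Hive table to Iceberg in place via the HMS-backed catalog."""
    return f"CALL {_check(hms_catalog)}.system.migrate('{_check(table)}')"


def register_sql(rest_catalog: str, table: str, metadata_file: str) -> str:
    """Step 4: register the migrated table's current metadata file in the REST catalog."""
    return (
        f"CALL {_check(rest_catalog)}.system.register_table("
        f"table => '{_check(table)}', metadata_file => '{_check(metadata_file)}')"
    )


# Placeholder names and path, for illustration only.
plan = [
    migrate_sql("spark_catalog", "sales.orders"),
    register_sql(
        "lake",
        "sales.orders",
        "s3://example-bucket/warehouse/sales/orders/metadata/00001-example.metadata.json",
    ),
]
for statement in plan:
    print(statement)
```

Generating the statements rather than executing them keeps the sketch engine-independent; in a real migration each statement would be run through spark.sql and validated on a test table first.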

Iceberg's HiveCatalog supports this migration: Iceberg tables that have been migrated but are still registered in HMS remain readable, and engines can use both HMS (for legacy Hive tables) and a REST catalog (for new Iceberg tables) at the same time during the migration period.

AWS Glue as HMS Replacement

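On AWS, AWS Glue Data Catalog is the managed, serverless replacement for self-hosted HMS. Glue is Hive Metastore-compatible at the client level rather than at the wire level: it does not expose a Thrift endpoint itself, but AWS provides a drop-in metastore client for EMR, and engines such as Spark, Trino, and Athena ship native Glue connectors. On top of that compatibility it adds cloud-native features: serverless scaling, IAM integration, and native S3 integration. Many AWS Hadoop migrations replace self-hosted HMS with Glue Data Catalog as an intermediate step before moving to a full Iceberg REST catalog.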
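
For Iceberg tables specifically, engines usually reach Glue through Iceberg's own Glue catalog implementation. The sketch below shows the Spark properties that typically involves; the helper name, the catalog name glue, and the bucket are placeholders, and the class names are the ones shipped in Iceberg's AWS module as far as I know, so check them against the Iceberg version you run.

```python
def glue_iceberg_conf(name: str, warehouse: str) -> dict:
    """Build Spark properties for an Iceberg catalog backed by AWS Glue.

    Hypothetical helper; warehouse is the S3 prefix where table data lives.
    """
    if not warehouse.startswith("s3://"):
        raise ValueError("warehouse is expected to be an s3:// location")
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Talk to Glue's API directly instead of a Thrift metastore.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.warehouse": warehouse,
        # Read and write data files through Iceberg's S3 file IO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


# Placeholder bucket name, for illustration only.
for key, value in sorted(glue_iceberg_conf("glue", "s3://example-bucket/warehouse").items()):
    print(f"{key}={value}")
```

Note that no endpoint URI appears here: the client resolves Glue through the AWS SDK's region and IAM configuration.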

Figure 2: Migration path — from self-hosted HMS through Glue to Iceberg REST catalog.

Summary

The Hive Metastore is the legacy catalog of the Hadoop era: battle-tested and widely deployed, but poorly matched to the requirements of the modern open data lakehouse. Its lack of a REST API, credential vending, and scalable, atomic metadata commits helped drive the development of the Iceberg REST Catalog specification and the modern catalog ecosystem (Apache Polaris, Project Nessie, the AWS Glue Iceberg REST endpoint). For most organizations, replacing HMS with a modern REST catalog is one of the highest-impact upgrades in the lakehouse modernization journey.