What Is OpenMetadata?

OpenMetadata is an open-source metadata management and data catalog platform that provides a unified, API-first foundation for data discovery, lineage tracking, quality monitoring, collaboration, and governance across the modern data stack. It was designed from the ground up for cloud-native architectures, with native integrations for Apache Iceberg, dbt, Apache Spark, Apache Airflow, and over 80 data sources and services.

OpenMetadata consolidates capabilities that previously required three separate tools: a data catalog (for discovery and documentation), a lineage platform (for data flow tracking), and a quality monitoring tool (for automated quality checks and scoring). Its unified metadata API allows programmatic integration with other tools in the data stack, which lets it serve as the metadata backbone for a modern lakehouse.
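As a rough illustration of what programmatic integration can look like, the sketch below builds an HTTP request for one table's metadata using only the Python standard library. The helper names build_table_request and fetch_table are hypothetical. The endpoint path, the fields parameter, the bearer-token header, and the localhost:8585 address are assumptions about a typical deployment and should be checked against the REST API reference for the version in use.

```python
import json
import urllib.parse
import urllib.request


def build_table_request(base_url, fqn, token, fields=("columns", "tags")):
    """Build (but do not send) a GET request for one table's metadata.

    Assumption: tables are addressable by fully qualified name under
    /api/v1/tables/name/<fqn>; verify this against your server's API docs.
    """
    # The fully qualified name is a single path segment, so encode it fully.
    path = "/api/v1/tables/name/" + urllib.parse.quote(fqn, safe="")
    query = urllib.parse.urlencode({"fields": ",".join(fields)})
    url = base_url.rstrip("/") + path + "?" + query
    return urllib.request.Request(
        url,
        headers={
            "Authorization": "Bearer " + token,  # placeholder JWT
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_table(base_url, fqn, token):
    """Send the request and decode the JSON body (needs a reachable server)."""
    request = build_table_request(base_url, fqn, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect or unit-test without a running server.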

Core Capabilities

  • Data Discovery: Full-text search across all registered assets with intelligent filters (type, domain, owner, tier, tags). Similarity recommendations suggest related assets based on schema and usage patterns.
  • Automated Metadata Ingestion: Scheduled connectors automatically ingest schema metadata from Iceberg catalogs, Spark job metadata from Spark History Server, dbt model definitions from manifest.json, and query logs from various engines.
  • Data Lineage: Lineage is parsed automatically from SQL query logs, dbt manifests, and pipeline metadata. Column-level lineage is available for dbt models and for parsed query logs.
  • Data Quality: Native integration with dbt tests and Great Expectations. Quality scores computed from test results are displayed alongside table metadata, and alerts notify asset owners when quality degrades.
  • Collaboration: Inline comments and discussions on any metadata entity, activity feeds showing recent changes, and automated notifications for schema changes and quality alerts.
Figure 1: OpenMetadata unified platform — catalog, lineage, quality, and collaboration in one system.
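To make the idea of lineage parsed from SQL query logs concrete, here is a deliberately naive sketch that extracts table-level edges from a single write statement with regular expressions. It is not OpenMetadata's parser. The function name table_lineage and the table names in the usage note are hypothetical, and a real implementation would use a full SQL parser to handle CTEs, quoted identifiers, subqueries, and dialect differences.

```python
import re

# Target of INSERT INTO, INSERT OVERWRITE [TABLE], or CREATE [OR REPLACE] TABLE.
_TARGET = re.compile(
    r"\b(?:insert\s+(?:into|overwrite)(?:\s+table)?"
    r"|create\s+(?:or\s+replace\s+)?table)\s+([\w.]+)",
    re.IGNORECASE,
)
# Any table name that directly follows FROM or JOIN.
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql):
    """Return sorted (source_table, target_table) edges for one statement."""
    target = _TARGET.search(sql)
    if target is None:
        return []  # read-only query: nothing is written, so no edges
    target_name = target.group(1)
    sources = {match.group(1) for match in _SOURCE.finditer(sql)}
    sources.discard(target_name)  # ignore self-references
    return sorted((source, target_name) for source in sources)
```

Calling table_lineage on a statement such as "INSERT INTO analytics.daily_orders SELECT ... FROM raw.orders o JOIN raw.customers c ON ..." yields one edge per source table, each pointing at analytics.daily_orders.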

OpenMetadata and Iceberg Integration

OpenMetadata integrates with the Apache Iceberg lakehouse through multiple connectors:

  • Iceberg catalog connectors: Ingest table schemas, partition specs, and column statistics from Glue Data Catalog, custom REST catalog endpoints, and Hive Metastore (for Iceberg-over-Hive tables).
  • dbt integration: Ingest dbt model definitions, test results, and table-level lineage from dbt manifest.json, the most complete source of transformation lineage for dbt-based Iceberg pipelines.
  • Airflow integration: Ingest DAG metadata and pipeline lineage from Apache Airflow, showing which DAGs produce which Iceberg tables.
  • Query log lineage: Parse SQL query logs from query engines to extract runtime lineage, that is, which tables each query execution reads and writes.
Figure 2: OpenMetadata Iceberg integration — catalog, dbt, and Airflow connectors for complete metadata.
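The dbt connector's lineage source can be pictured with a short sketch that reads model dependencies out of a manifest. It relies on the nodes, resource_type, and depends_on.nodes keys of dbt's manifest.json. The function name model_lineage and the tiny hand-written sample manifest are hypothetical, and this is an illustration of the idea rather than the connector's actual code.

```python
import json


def model_lineage(manifest):
    """Return sorted (upstream_id, model_id) edges for every dbt model."""
    edges = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        for upstream in node.get("depends_on", {}).get("nodes", []):
            edges.append((upstream, unique_id))
    return sorted(edges)


# Hypothetical fragment; a real manifest is read from target/manifest.json.
sample = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.daily_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    },
    "test.shop.not_null_daily_orders_id": {
      "resource_type": "test",
      "depends_on": {"nodes": ["model.shop.daily_orders"]}
    }
  }
}
""")

for upstream, model in model_lineage(sample):
    print(upstream, "->", model)
```

Each edge points from an upstream source or model to the model built from it, which is the table-level shape a catalog needs in order to draw a lineage graph.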

Summary

OpenMetadata is an open-source data governance platform for the cloud-native data lakehouse. It combines catalog, lineage, quality, and collaboration in one API-first, actively maintained platform with broad connector coverage. For organizations building new cloud lakehouses or migrating from Hadoop (and seeking an alternative to Apache Atlas), it provides a metadata management foundation that makes data assets discoverable, trustworthy, and governable across the analytics stack.