What Is the Medallion Architecture?
The Medallion Architecture (also called the Bronze-Silver-Gold architecture or multi-hop architecture) is a widely used data engineering design pattern for organizing data pipelines in a data lakehouse. It divides data storage into three progressively refined quality tiers, with each tier feeding the next:
- Bronze: Raw data as ingested — exact copies of source records, often in append-only Iceberg tables, with no modifications
- Silver: Cleansed, validated, and conformed data — business rules applied, duplicates removed, schema standardized, CDC events merged into current state
- Gold: Business-ready analytical datasets — pre-aggregated metrics, dimensional models, domain-specific datasets tailored to specific business teams and use cases
The Medallion Architecture's core value is progressive data quality enhancement: raw data flows through transformations that incrementally increase trustworthiness and analytical readiness. Each tier has clear ownership, quality expectations, and access patterns — enabling data teams to reason clearly about which data to use for each purpose.
Bronze Layer: Raw Ingestion
The Bronze layer stores raw data exactly as received from source systems — no transformations, no cleaning, no deduplication. Its purpose is completeness and auditability: every record that arrived from every source is preserved in Bronze, regardless of quality.
Bronze tables are typically implemented as append-only Iceberg tables partitioned by ingestion date. A streaming engine such as Apache Flink or Spark Structured Streaming continuously appends incoming Kafka events or CDC records to Bronze. Schema evolution is common in Bronze because sources add new fields over time. Iceberg's schema evolution lets new columns be added without rewriting existing data or breaking existing queries.
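The following minimal in-memory Python sketch makes the Bronze contract concrete. `BronzeTable` is a hypothetical toy class, not an Iceberg API. It appends exact copies of records and groups them into an ingestion-date partition. When a source adds a field, it widens the schema, and older rows read the new column as `None`. Iceberg readers see roughly the same behavior after a column is added.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


class BronzeTable:
    """Hypothetical in-memory stand-in for an append-only Bronze table.

    Not an Iceberg API: it only models the Bronze contract (append exact
    copies, partition by ingestion date, widen the schema as sources change).
    """

    def __init__(self) -> None:
        self.partitions: dict[date, list[dict]] = defaultdict(list)
        self.columns: list[str] = []  # grows as sources add fields

    def append(self, records: list[dict], ingest_date: date) -> None:
        for record in records:
            # Additive schema evolution: register any column seen for the first time.
            for col in record:
                if col not in self.columns:
                    self.columns.append(col)
            # Store a copy of the record as received: no cleaning, no deduplication.
            self.partitions[ingest_date].append(dict(record))

    def scan(self, ingest_date: date | None = None) -> list[dict]:
        """Read rows projected onto the current schema; absent columns read as None."""
        dates = sorted(self.partitions) if ingest_date is None else [ingest_date]
        return [
            {col: row.get(col) for col in self.columns}
            for d in dates
            for row in self.partitions.get(d, [])
        ]
```

Duplicates and malformed values pass through untouched by design. Cleaning them is deferred to Silver.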

Silver Layer: Cleansing and Conforming
The Silver layer transforms raw Bronze data into clean, validated, business-conforming records. Typical Silver transformations include:
- Deduplication: removing duplicate events
- Null handling: imputing values, or rejecting records whose critical fields are null
- Data type standardization: parsing dates and normalizing strings
- Business rule validation: for example, rejecting orders with negative quantities
- CDC state management: using MERGE INTO to apply INSERT/UPDATE/DELETE events to current-state records
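The cleansing steps can be sketched as a single pass over raw rows. This illustrative Python sketch assumes hypothetical field names (`event_id`, `order_id`, `order_date`, `quantity`, `region`) and hypothetical rules. In practice these steps usually run in Spark or Flink against Bronze tables. Rows that fail a check go to a rejected list rather than being silently dropped, so they stay available for inspection.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical schema: rows missing any of these are rejected.
CRITICAL_FIELDS = ("order_id", "order_date", "quantity")


def cleanse_orders(raw: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (clean, rejected) rows; field names and rules are illustrative."""
    clean: list[dict] = []
    rejected: list[dict] = []
    seen: set = set()
    for row in raw:
        # Deduplication: keep only the first occurrence of each event_id.
        event_id = row.get("event_id")
        if event_id is not None:
            if event_id in seen:
                continue
            seen.add(event_id)
        # Null handling (reject): critical fields must be present.
        if any(row.get(f) is None for f in CRITICAL_FIELDS):
            rejected.append(row)
            continue
        try:
            # Type standardization: parse dates and integers, trim and lowercase strings.
            parsed = {
                "order_id": str(row["order_id"]).strip(),
                "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
                "quantity": int(row["quantity"]),
                # Null handling (impute): a missing region defaults to "unknown".
                "region": (row.get("region") or "unknown").strip().lower(),
            }
        except (AttributeError, TypeError, ValueError):
            rejected.append(row)  # unparseable values are rejected, not guessed
            continue
        # Business rule validation: negative quantities are invalid.
        if parsed["quantity"] < 0:
            rejected.append(row)
            continue
        clean.append(parsed)
    return clean, rejected
```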
Silver tables use Iceberg's MERGE INTO for CDC pipelines. Each run applies Bronze change events to keep the table in step with the current state of the source operational system. Frequent MERGE INTO commits produce many small data files, plus delete files in merge-on-read mode. For that reason Silver tables are compacted regularly, which rewrites those files into fewer, larger Parquet data files.
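The state management that MERGE INTO performs for CDC can be approximated in plain Python. This sketch assumes a hypothetical event shape: an operation (`op`), a primary key (`key`), a monotonically increasing sequence number (`seq`), and the new row image (`row`). Real CDC formats such as Debezium's differ. The sketch first collapses a batch to the latest event per key, then upserts or deletes.

```python
from __future__ import annotations


def apply_cdc(current: dict[str, dict], events: list[dict]) -> dict[str, dict]:
    """Apply INSERT/UPDATE/DELETE events to a current-state table keyed by
    primary key, approximating a MERGE INTO with WHEN MATCHED / WHEN NOT
    MATCHED clauses. The event shape is an assumption for illustration."""
    state = {k: dict(v) for k, v in current.items()}  # leave the input untouched
    # Collapse the batch to the latest event per key (highest seq), since one
    # batch can contain several changes to the same row.
    latest: dict[str, dict] = {}
    for event in events:
        prev = latest.get(event["key"])
        if prev is None or event["seq"] > prev["seq"]:
            latest[event["key"]] = event
    for key, event in latest.items():
        if event["op"] == "DELETE":
            state.pop(key, None)  # matched row is deleted; unmatched delete is a no-op
        elif event["op"] in ("INSERT", "UPDATE"):
            state[key] = dict(event["row"])  # update if matched, insert otherwise
        else:
            raise ValueError(f"unknown CDC op: {event['op']!r}")
    return state
```

The per-key collapse is more than a convenience. Engines such as Spark typically reject a MERGE whose source matches the same target row more than once, so CDC batches are usually reduced to the latest change per key before merging.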
Gold Layer: Business-Ready Analytics
The Gold layer is the final transformation tier. Business analysts and BI tools query it, and Dremio's semantic layer is built on top of it. Gold tables are domain-specific, pre-aggregated datasets optimized for the access patterns of specific business teams:
- Sales Gold: Revenue metrics by region, product, and time with pre-computed growth rates
- Marketing Gold: Campaign performance datasets with attribution metrics
- Operations Gold: Order fulfillment KPIs and SLA tracking datasets
- Finance Gold: P&L datasets with account hierarchy aggregations
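The following sketch shows the kind of pre-aggregation a Sales Gold table holds. It rolls Silver order rows up to revenue per region and month, with a month-over-month growth rate. The column names (`region`, `order_date`, `quantity`, `unit_price`) are hypothetical, and dates are assumed to be ISO `YYYY-MM-DD` strings. In a lakehouse this would normally be a scheduled SQL or Spark job rather than Python.

```python
from __future__ import annotations

from collections import defaultdict


def monthly_revenue_with_growth(orders: list[dict]) -> list[dict]:
    """Aggregate order rows into a Sales Gold shape: revenue per (region, month)
    plus month-over-month growth. Column names are hypothetical."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for o in orders:
        month = o["order_date"][:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(o["region"], month)] += o["quantity"] * o["unit_price"]

    result = []
    prev_by_region: dict[str, float] = {}
    # Sorting (region, "YYYY-MM") tuples yields each region's months in order.
    for region, month in sorted(totals):
        revenue = totals[(region, month)]
        prev = prev_by_region.get(region)
        # Growth vs. the previous month present in the data for this region;
        # None for a region's first month or when the previous revenue was zero.
        growth = (revenue - prev) / prev if prev else None
        result.append({"region": region, "month": month,
                       "revenue": revenue, "mom_growth": growth})
        prev_by_region[region] = revenue
    return result
```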
Gold tables are often Z-ordered by their most common filter dimensions (date, region, product category). Dremio Reflections are typically defined on top of them to accelerate BI queries, frequently to sub-second response times.
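Z-ordering interleaves the bits of several sort keys, so rows that are close in all of those dimensions land near each other in the file layout. That tightens per-file min/max statistics and improves file pruning. The following Python sketch shows the idea for two columns, first mapping values to dense integer ranks. It is a conceptual illustration, not the exact encoding Iceberg or Dremio uses, and `interleave_bits` and `zorder_sort` are hypothetical helpers.

```python
from __future__ import annotations


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) key for two non-negative ints: bit i of x moves to
    position 2*i and bit i of y to position 2*i + 1. `bits` must cover the
    largest input value."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def zorder_sort(rows: list[dict], col_a: str, col_b: str) -> list[dict]:
    """Order rows by the Z-order key of two columns (hypothetical helper)."""
    # Map each column's distinct values to dense integer ranks so that strings
    # and dates can be interleaved like integers.
    rank_a = {v: i for i, v in enumerate(sorted({r[col_a] for r in rows}))}
    rank_b = {v: i for i, v in enumerate(sorted({r[col_b] for r in rows}))}
    bits = max(len(rank_a), len(rank_b)).bit_length()
    return sorted(
        rows,
        key=lambda r: interleave_bits(rank_a[r[col_a]], rank_b[r[col_b]], bits),
    )
```

A plain lexicographic sort by `(day, region)` keeps each day contiguous but spreads a single region across every file. The interleaved key gives up some locality on each column in exchange for reasonable locality on both, which helps when queries filter on either dimension.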

Summary
The Medallion Architecture is a practical and widely adopted framework for organizing data lakehouse pipelines. Its Bronze-Silver-Gold tier structure gives data engineering teams clear quality expectations, ownership boundaries, and transformation patterns. Implemented on Apache Iceberg, each layer benefits from several Iceberg features:
- ACID transactions: safe concurrent writes
- Schema evolution: Bronze absorbs source schema changes
- MERGE INTO: Silver CDC pipelines
- Partition evolution: Gold tables can change their partitioning as data grows

With Dremio Reflections on Gold tables and a semantic layer on top, the Medallion Architecture delivers trusted, business-ready analytics to users across the organization.