Why should Bronze tables be append-only?

Append-only Bronze tables preserve the complete historical record of all received data, including duplicates, errors, and schema variations that may need to be investigated or re-processed. Immutability makes Bronze a reliable audit trail: if a Silver transformation has a bug, you can re-process from Bronze without re-extracting from the source. In-place updates or deletes would overwrite the record of what was actually received and destroy this audit capability.

How should Bronze tables be partitioned?

Bronze tables are typically partitioned by ingestion date or ingestion hour (for example, an ingestion_date or ingestion_timestamp_hour column): the time the data arrived in the lakehouse, not the event time. This keeps monitoring queries such as 'what was ingested today?' cheap, and lets Silver transformations read only the Bronze partitions added since the last run.
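
The incremental read described above can be sketched in plain Python. This is a minimal illustration, not a real Iceberg API: the function name new_partitions, the watermark argument last_processed, and the sample partition dates are all hypothetical.

```python
from datetime import date

def new_partitions(available, last_processed):
    """Return ingestion-date partitions strictly newer than the watermark, oldest first.

    available: iterable of partition dates present in the Bronze table (hypothetical input).
    last_processed: the newest partition the previous Silver run consumed, or None.
    """
    if last_processed is None:
        # First run: every partition is new.
        return sorted(available)
    return sorted(d for d in available if d > last_processed)

# Hypothetical partition values for a Bronze table partitioned by ingestion date.
partitions = [date(2024, 5, 1), date(2024, 5, 2), date(2024, 5, 3)]

# The previous Silver run finished the 2024-05-01 partition.
to_read = new_partitions(partitions, last_processed=date(2024, 5, 1))
print(to_read)
```

A date-level watermark assumes the last processed partition was complete when it was read. If a daily partition can still be receiving rows, a finer-grained watermark such as the ingestion timestamp is the safer choice.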

Bronze Layer: The Definitive Guide for Apache Iceberg Lakehouse

Q: What is the Bronze Layer?

The Bronze Layer (also called the raw layer or ingestion layer) is the first tier of the Medallion Architecture. It stores raw data exactly as received from source systems — no cleaning, no deduplication, no transformation. Bronze tables are typically append-only Apache Iceberg tables partitioned by ingestion date, preserving a complete historical record of all source data received.

What Is the Bronze Layer?

The Bronze Layer is the first tier of the Medallion Architecture — the raw data landing zone where every record from every source system is stored exactly as received, without any transformation, cleaning, or deduplication. Bronze tables are the lakehouse's permanent, immutable record of all data ever ingested.

The Bronze Layer's design principle is: preserve everything, transform nothing. This immutability serves three purposes. It creates a complete audit trail for compliance, proving exactly what data was received and when. It enables re-processing: if a Silver transformation has a bug, Silver can be rebuilt from Bronze without re-extracting from the source. And it allows historical errors to be corrected after the fact by re-running corrected transformation logic over older Bronze records.
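
To make the re-processing argument concrete, here is a small sketch in plain Python. The rows, the column names, and the two transformation functions (silver_v1, silver_v2) are invented for illustration; the point is only that the raw Bronze rows stay untouched while the Silver logic is corrected and re-run.

```python
# Bronze rows are kept exactly as received, so a Silver bug never corrupts them.
bronze = [
    {"order_id": 1, "amount": "19.99"},
    {"order_id": 2, "amount": "5,00"},  # the source sent a comma decimal separator
]

def silver_v1(row):
    """Buggy transformation: only understands '.' as the decimal separator."""
    try:
        amount = float(row["amount"])
    except ValueError:
        amount = None  # the value is silently lost in Silver
    return {"order_id": row["order_id"], "amount": amount}

def silver_v2(row):
    """Corrected transformation: also accepts ',' as the decimal separator."""
    return {"order_id": row["order_id"], "amount": float(row["amount"].replace(",", "."))}

silver_before = [silver_v1(r) for r in bronze]
# The fix is applied by rebuilding Silver from Bronze; nothing is re-extracted.
silver_after = [silver_v2(r) for r in bronze]

print(silver_before)
print(silver_after)
```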

Bronze Table Design

Bronze tables are designed for ingestion performance and auditability, not analytical query performance:

Partition by ingestion date: Partition on _ingest_date (the date the record arrived, not the event date) — enables efficient 'show me today's new records' queries by Silver jobs without scanning historical partitions
Append-only: Only INSERT operations; no UPDATE or DELETE. New records are appended; even CDC DELETE events are appended as records with op_type='D'
Full source fidelity: All source columns preserved, including columns that may seem redundant or erroneous — data engineering is not responsible for source quality at this layer
Metadata columns added: _ingest_timestamp, _source_system, _pipeline_run_id — operational metadata for monitoring and debugging
Avro or Parquet: Typically Parquet for schema enforcement, though some teams use Avro for schema flexibility at the Bronze layer
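
The design rules above can be sketched without any lakehouse engine. In this minimal example a Python list stands in for the append-only table, and the helper names (to_bronze_row, append_batch), the source system name, and the run ID are hypothetical. The metadata column names and the op_type='D' convention follow the list above.

```python
from datetime import datetime, timezone

def to_bronze_row(source_row, source_system, pipeline_run_id, now=None):
    """Copy a source record unchanged and add operational metadata columns."""
    now = now or datetime.now(timezone.utc)
    row = dict(source_row)  # full source fidelity: every source column is kept as-is
    row["_ingest_timestamp"] = now.isoformat()
    row["_ingest_date"] = now.date().isoformat()  # partition value: arrival date, not event date
    row["_source_system"] = source_system
    row["_pipeline_run_id"] = pipeline_run_id
    return row

def append_batch(table, rows):
    """Append-only write: rows are only ever added, never updated or removed."""
    table.extend(rows)

bronze_table = []  # stand-in for an append-only Bronze table

# Hypothetical CDC events: an insert followed by a delete of the same customer.
cdc_events = [
    {"customer_id": 42, "name": "Ada", "op_type": "I"},
    {"customer_id": 42, "name": "Ada", "op_type": "D"},
]
arrived_at = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)
batch = [to_bronze_row(e, "crm", "run-001", now=arrived_at) for e in cdc_events]
append_batch(bronze_table, batch)

# The delete is stored as one more record; the earlier insert is still present.
print(len(bronze_table), [r["op_type"] for r in bronze_table])
```

Resolving the insert and delete into the current state of customer 42 is left to the Silver layer, which is what keeps Bronze a faithful log of what arrived.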

Figure 1: Bronze table design (append-only, ingestion-date partitioned, full-fidelity source records).

Bronze Layer Monitoring

The Bronze Layer requires specific monitoring to ensure ingestion pipeline health:

Freshness monitoring: How old is the newest record in each Bronze table? A 2-hour gap when freshness should be 5 minutes indicates a pipeline failure.
Volume monitoring: Are row counts per ingestion batch within expected ranges? A batch loading 1,000 rows when it typically loads 100,000 indicates a source issue or pipeline filter bug.
Schema drift detection: Have source columns changed type or disappeared? Bronze schema changes can break Silver transformations that reference specific columns.

Data observability platforms such as Monte Carlo and Elementary can run these Bronze-layer checks, for example as freshness and volume checks on the Bronze Iceberg tables.
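
For teams that want to see what the three checks compute, here is a minimal plain-Python sketch. It is not how any observability platform implements them: the function names, the 50% volume tolerance, and the sample schemas are assumptions chosen for illustration. The freshness and volume numbers reuse the examples above.

```python
from datetime import datetime, timedelta, timezone

def is_stale(newest_ingest_ts, now, max_lag):
    """Freshness: True if the newest Bronze record is older than the allowed lag."""
    return now - newest_ingest_ts > max_lag

def volume_out_of_range(row_count, expected, tolerance=0.5):
    """Volume: True if the batch deviates from the expected count by more than
    the tolerance fraction (0.5 is an arbitrary illustrative threshold)."""
    return abs(row_count - expected) > tolerance * expected

def schema_drift(expected, observed):
    """Schema drift: compare two {column: type} mappings."""
    common = set(expected) & set(observed)
    return {
        "missing": sorted(set(expected) - set(observed)),
        "added": sorted(set(observed) - set(expected)),
        "retyped": sorted(c for c in common if expected[c] != observed[c]),
    }

now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)

# A 2-hour gap against a 5-minute freshness target.
stale = is_stale(now - timedelta(hours=2), now, max_lag=timedelta(minutes=5))

# A batch of 1,000 rows where 100,000 is typical.
low_volume = volume_out_of_range(1_000, expected=100_000)

# Hypothetical schemas: 'email' disappeared, 'phone' appeared, 'amount' changed type.
drift = schema_drift(
    {"id": "bigint", "amount": "double", "email": "string"},
    {"id": "bigint", "amount": "string", "phone": "string"},
)

print(stale, low_volume, drift)
```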

Figure 2: Bronze layer monitoring (freshness, volume, and schema drift checks for ingestion health).

Summary

The Bronze Layer is the bedrock of the Medallion Architecture — the immutable, complete record of all data the lakehouse has ever received. By designing Bronze tables as append-only, ingestion-date-partitioned Apache Iceberg tables with full source fidelity, data teams create a reliable foundation that enables safe re-processing, complete audit trails, and the debugging capability to trace any Silver or Gold data quality issue back to the exact source records that caused it. The Bronze Layer's value is proportional to its faithfulness to the source — the closer to raw, the more valuable for downstream re-processing and compliance.

