What Is an Upsert?
An upsert (a portmanteau of update and insert) is a database write operation that combines two behaviors in a single statement: if a record matching the specified key already exists in the target table, update it with the new values; if no matching record exists, insert it as a new row. Upsert is the fundamental operation for maintaining current-state tables from change data streams.
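The same insert-or-update semantics can be sketched with SQLite's INSERT ... ON CONFLICT clause (an upsert variant available in Python's standard library; the table and column names here are illustrative, not part of any Iceberg schema):

```python
import sqlite3

# In-memory table keyed by order_id; a minimal sketch of upsert semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")

def upsert(order_id, status):
    # If a row with this order_id exists, update its status;
    # otherwise insert a new row. One statement, both behaviors.
    conn.execute(
        "INSERT INTO orders (order_id, status) VALUES (?, ?) "
        "ON CONFLICT(order_id) DO UPDATE SET status = excluded.status",
        (order_id, status),
    )

upsert(1, "pending")   # no match -> insert
upsert(1, "shipped")   # match -> update
print(conn.execute("SELECT order_id, status FROM orders").fetchall())
# -> [(1, 'shipped')]
```

Running both calls leaves exactly one row per key, which is the defining property of an upsert-maintained table.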
In the data lakehouse, upsert is the operation that makes Silver tables valuable: where Bronze tables hold a raw historical log, a Silver table maintained via upsert holds a continuously updated current state that mirrors the source operational database. A Silver orders table maintained via upsert always shows the current status of each order (not a log of all status changes), so analysts can ask 'how many orders are currently pending?' rather than 'how many order_status = pending events have ever occurred?'
MERGE INTO: Iceberg's Upsert Implementation
Apache Iceberg implements upsert through the SQL-standard MERGE INTO statement (standardized in SQL:2003):
MERGE INTO silver.orders AS target
USING bronze.orders_cdc AS source
ON target.order_id = source.order_id
WHEN MATCHED AND source.op_type = 'D'
THEN DELETE
WHEN MATCHED AND source.op_type IN ('U', 'R')
THEN UPDATE SET
target.status = source.status,
target.updated_at = source.updated_at
WHEN NOT MATCHED AND source.op_type = 'I'
THEN INSERT (order_id, status, created_at, updated_at)
VALUES (source.order_id, source.status, source.created_at, source.updated_at);

This MERGE INTO statement processes a batch of CDC events (inserts, updates, and deletes) from the Bronze CDC table and applies them atomically to the Silver current-state table, maintaining a correct, current-state representation of the source orders table.
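The branching logic of the MERGE INTO above can be sketched in plain Python, with a dict keyed by order_id standing in for the Silver table and each CDC event carrying an op_type of 'I', 'U', 'R', or 'D' (the event fields mirror the SQL example; the dict is only a model, not how Iceberg stores data):

```python
silver = {}  # order_id -> row dict (stands in for the Silver current-state table)

def apply_cdc_batch(events):
    for e in events:
        key = e["order_id"]
        if e["op_type"] == "D":
            # WHEN MATCHED AND op_type = 'D' THEN DELETE
            silver.pop(key, None)
        elif e["op_type"] in ("U", "R"):
            # WHEN MATCHED AND op_type IN ('U', 'R') THEN UPDATE SET ...
            if key in silver:
                silver[key].update(status=e["status"], updated_at=e["updated_at"])
        elif e["op_type"] == "I":
            # WHEN NOT MATCHED AND op_type = 'I' THEN INSERT ...
            if key not in silver:
                silver[key] = {"status": e["status"],
                               "created_at": e["created_at"],
                               "updated_at": e["updated_at"]}

apply_cdc_batch([
    {"op_type": "I", "order_id": 1, "status": "pending", "created_at": "t0", "updated_at": "t0"},
    {"op_type": "U", "order_id": 1, "status": "shipped", "created_at": "t0", "updated_at": "t1"},
])
print(silver[1]["status"])  # -> shipped
```

After the batch, the table holds one row per key reflecting the latest event, exactly the current-state guarantee the SQL statement provides.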

CoW vs MoR Upsert Strategies
Apache Iceberg supports two physical write strategies for upsert operations:
Copy-on-Write (CoW)
When a MERGE INTO updates a row, Iceberg rewrites the entire Parquet data file containing that row, incorporating the updated value into a new file and removing the old file from the new table snapshot. Read queries require no merge logic, because every data file contains only current values. Best for tables with low update frequency but high read volume.
Merge-on-Read (MoR)
When a MERGE INTO updates a row, Iceberg writes a small delete file (for CDC workloads, typically an equality delete that marks the old row as deleted by its key) and a new data file containing the updated row. Old data files are not rewritten. Read queries must merge data files with delete files to produce current values. Best for high-frequency CDC streams where rewriting large files on every update is impractical. Compaction periodically merges delete files back into data files to maintain read performance.
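The read-time merge can be sketched under simplifying assumptions: data "files" are lists of rows, an equality delete "file" is a set of keys, and the delete set is assumed to apply only to the data file written before it (real Iceberg scopes deletes by sequence number; this model collapses that to two files):

```python
data_files = [
    # original file, left in place (not rewritten)
    [{"order_id": 1, "status": "pending"}, {"order_id": 2, "status": "pending"}],
    # new file produced by the update
    [{"order_id": 1, "status": "shipped"}],
]
equality_deletes = {1}  # keys deleted from files written before the delete file

def read_current(data_files, equality_deletes):
    # Mask rows in the older file whose key appears in the delete set,
    # then append the newer file's rows to yield current values.
    current = [r for r in data_files[0] if r["order_id"] not in equality_deletes]
    current += data_files[1]
    return current

print(read_current(data_files, equality_deletes))
# -> [{'order_id': 2, 'status': 'pending'}, {'order_id': 1, 'status': 'shipped'}]
```

The extra filtering work on every read is the cost MoR pays for cheap writes; compaction eliminates it by folding the delete files back into rewritten data files.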

Summary
Upsert via MERGE INTO is the cornerstone DML operation for maintaining current-state Silver tables in the data lakehouse. By combining CDC events with Iceberg's MERGE INTO (implemented as either Copy-on-Write or Merge-on-Read depending on update frequency), data teams create Silver tables that continuously reflect the present state of source operational databases — enabling analysts to ask current-state questions of lakehouse data with the same confidence as querying the operational system directly, but with the scale, governance, and analytical richness of the Apache Iceberg lakehouse.