What Is dbt?
dbt (data build tool) is an open-source framework that enables data engineers and analysts to define data transformation logic as SQL SELECT statements — called models — that dbt compiles into DDL/DML and executes on a connected data platform. dbt handles the boilerplate of table/view creation, dependency ordering, incremental update logic, test execution, and documentation generation — allowing practitioners to focus on transformation logic rather than infrastructure plumbing.
In the data lakehouse context, dbt is the standard tool for the Transform step of the ELT pattern: raw data is loaded into Bronze Iceberg tables by ingestion pipelines (Flink, Airbyte, Spark batch jobs), and dbt models define the transformations that produce Silver (cleansed) and Gold (aggregated) Iceberg tables. dbt models are version-controlled SQL files — bringing software engineering practices (version control, CI/CD, peer review, automated testing) to data transformation.
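To make the dependency-ordering idea concrete, here is a minimal Python sketch of how a build order could be derived from ref() calls between models. It is not dbt's actual implementation: the model names (bronze_orders, silver_orders, gold_daily_revenue), the regular expression, and the helper functions are all hypothetical, and real dbt resolves references through its Jinja compiler and adapter-specific DDL rather than a regex.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files: model name -> SELECT statement with dbt-style ref() calls.
MODELS = {
    "gold_daily_revenue": (
        "select order_date, sum(amount) as revenue "
        "from {{ ref('silver_orders') }} group by order_date"
    ),
    "silver_orders": (
        "select order_id, order_date, amount "
        "from {{ ref('bronze_orders') }} where amount is not null"
    ),
    "bronze_orders": "select * from raw.orders",
}

# Simplified stand-in for Jinja parsing: matches {{ ref('model_name') }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def dependencies(sql: str) -> set[str]:
    """Return the names of the models this SELECT statement refers to."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one is built after the models it references."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_model(name: str, sql: str) -> str:
    """Swap each ref() for a table name and wrap the SELECT in CREATE TABLE AS."""
    body = REF_PATTERN.sub(lambda match: match.group(1), sql)
    return f"create table {name} as {body}"


if __name__ == "__main__":
    for model_name in build_order(MODELS):
        print(compile_model(model_name, MODELS[model_name]))
```

The point of the sketch is that the author writes only SELECT statements; the build order and the surrounding DDL are derived from the references between them.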
dbt Model Types and Iceberg Materialization
dbt provides four core model materializations, each mapping to different Iceberg operations (recent dbt versions add a fifth, materialized_view, on adapters that support it):
- view: Creates an Iceberg view (or a virtual dataset, VDS, in Dremio). No data is stored; the query runs against the base tables each time the view is read.
- table: Rebuilds the full Iceberg table on each run, rewriting all data from scratch. Suitable for small models; expensive for large tables.
- incremental: On the first run, creates the full table. On subsequent runs, processes only new or changed records and applies them to the existing table, typically with MERGE INTO or INSERT OVERWRITE on the affected partitions, depending on the configured incremental strategy. This is the most important materialization for Silver and Gold lakehouse tables.
- ephemeral: Not built on the data platform at all. dbt inlines the model as a CTE into each downstream model that references it, which suits lightweight intermediate logic.
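The incremental pattern can be illustrated with a small, self-contained Python sketch. It assumes SQLite as a stand-in for the lakehouse engine (SQLite's upsert plays the role of MERGE INTO), and the table names, the updated_at high-water-mark column, and the run_incremental and demo helpers are all hypothetical. It shows the general idea only, not the SQL that any dbt adapter actually generates.

```python
import sqlite3


def run_incremental(conn: sqlite3.Connection) -> int:
    """Merge only the source rows newer than the target's high-water mark."""
    conn.execute(
        "create table if not exists silver_orders ("
        "order_id integer primary key, amount real, updated_at text)"
    )
    # High-water mark: the latest timestamp already present in the target.
    # On the first run the target is empty, so every source row qualifies.
    (watermark,) = conn.execute(
        "select coalesce(max(updated_at), '') from silver_orders"
    ).fetchone()
    (pending,) = conn.execute(
        "select count(*) from bronze_orders where updated_at > ?", (watermark,)
    ).fetchone()
    # Upsert: insert unseen keys, update keys that already exist in the target.
    conn.execute(
        """
        insert into silver_orders (order_id, amount, updated_at)
        select order_id, amount, updated_at
        from bronze_orders
        where updated_at > ?
        on conflict(order_id) do update set
            amount = excluded.amount,
            updated_at = excluded.updated_at
        """,
        (watermark,),
    )
    conn.commit()
    return pending


def demo():
    """Run two incremental loads and return what each run processed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table bronze_orders (order_id integer, amount real, updated_at text)"
    )
    conn.executemany(
        "insert into bronze_orders values (?, ?, ?)",
        [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02")],
    )
    first = run_incremental(conn)  # first run: loads everything
    conn.executemany(
        "insert into bronze_orders values (?, ?, ?)",
        [(2, 25.0, "2024-01-03"), (3, 30.0, "2024-01-03")],
    )
    second = run_incremental(conn)  # later run: only rows past the watermark
    rows = conn.execute("select * from silver_orders order by order_id").fetchall()
    return first, second, rows


if __name__ == "__main__":
    print(demo())
```

In this sketch the second run reads only the two rows added after the first run: order 2 is updated in place and order 3 is inserted, while order 1 is never touched again.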

dbt Testing and Documentation for Lakehouse Quality
dbt's built-in testing framework is one of its most valuable features for lakehouse data quality governance:
- Built-in tests: not_null (column has no null values), unique (column has no duplicate values), accepted_values (column contains only expected categorical values), relationships (foreign key integrity between tables)
- Custom tests: SQL-based tests for domain-specific business rules (revenue_amount > 0, order_date <= current_date). Each test is a query that selects the rows violating the rule, so it passes when the query returns no rows.
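The tests above all reduce to one pattern: a query that returns the offending rows. The Python sketch below illustrates that pattern, assuming SQLite as a stand-in engine. The orders and customers tables, the status values, and the CHECKS, run_checks, and make_db names are hypothetical; the queries only approximate what dbt's built-in tests generate.

```python
import sqlite3

# Each check is a query that returns the rows violating a rule.
# A check passes when its query returns no rows.
CHECKS = {
    "not_null_order_id": "select * from orders where order_id is null",
    "unique_order_id": (
        "select order_id from orders where order_id is not null "
        "group by order_id having count(*) > 1"
    ),
    "accepted_values_status": (
        "select * from orders where status not in ('placed', 'shipped', 'returned')"
    ),
    "relationships_customer_id": (
        "select o.* from orders o "
        "left join customers c on o.customer_id = c.customer_id "
        "where o.customer_id is not null and c.customer_id is null"
    ),
    # Custom business rule from the text: revenue_amount > 0.
    "positive_revenue": "select * from orders where not (revenue_amount > 0)",
}


def run_checks(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the number of failing rows per check (0 means the check passed)."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in CHECKS.items()}


def make_db(order_rows) -> sqlite3.Connection:
    """Build an in-memory database with two known customers and the given orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customers (customer_id integer)")
    conn.execute(
        "create table orders ("
        "order_id integer, customer_id integer, status text, revenue_amount real)"
    )
    conn.executemany("insert into customers values (?)", [(1,), (2,)])
    conn.executemany("insert into orders values (?, ?, ?, ?)", order_rows)
    return conn


if __name__ == "__main__":
    dirty = make_db(
        [
            (100, 1, "placed", 50.0),
            (101, 2, "shipped", 20.0),
            (101, 2, "shipped", 20.0),  # duplicate order_id
            (102, 9, "lost", -5.0),     # unknown customer, bad status, bad revenue
            (None, 1, "placed", 10.0),  # missing order_id
        ]
    )
    for check_name, failures in run_checks(dirty).items():
        print(f"{check_name}: {'PASS' if failures == 0 else 'FAIL'} ({failures})")
```

Because every check shares the same shape, adding a new business rule means writing one more query that selects the violating rows.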
dbt's documentation auto-generation produces a browsable, searchable website showing all models, their SQL logic, column descriptions, associated tests, and an interactive lineage DAG showing how models depend on each other. The artifacts behind this site (manifest.json and catalog.json) can be ingested by OpenMetadata and DataHub for catalog-level lineage.

Summary
dbt is the SQL transformation standard for the modern data lakehouse: it provides version-controlled, tested, and documented transformation models that produce Silver and Gold Iceberg tables from Bronze raw data. Through adapters for Dremio (dbt-dremio), Spark (dbt-spark), and Trino (dbt-trino), dbt integrates with the major lakehouse execution engines, bringing software engineering discipline to data transformation and making data quality testing a first-class part of the ELT pipeline.