The Five-Layer Lakehouse Architecture
The lakehouse architecture is organized into five functional layers, each providing specific capabilities:
Layer 1: Storage
All data lives in cloud object storage (S3, ADLS, GCS) as Parquet data files plus Iceberg metadata — JSON table-metadata files and Avro manifest files. Storage scales effectively without limit, is always available, and is billed separately from compute.
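To make the storage layer concrete, here is a toy illustration of the layout of a single Iceberg table under an object-store prefix, using a local temporary directory to stand in for `s3://bucket/warehouse/...`. The file names are representative placeholders, not files produced by Iceberg itself.

```python
# Toy illustration: the on-disk layout of one Iceberg table, with a local
# directory standing in for an object-store prefix. File names are
# representative placeholders, not generated by Iceberg.
from pathlib import Path
import tempfile

warehouse = Path(tempfile.mkdtemp()) / "warehouse" / "sales" / "orders"

# Layer 1 stores two kinds of objects side by side:
#   data/      -> columnar Parquet data files
#   metadata/  -> JSON table metadata and Avro manifests (managed by Layer 2)
(warehouse / "data").mkdir(parents=True)
(warehouse / "metadata").mkdir()
(warehouse / "data" / "00000-0-orders.parquet").touch()
(warehouse / "metadata" / "v1.metadata.json").touch()
(warehouse / "metadata" / "snap-123-manifest-list.avro").touch()

layout = sorted(p.relative_to(warehouse).as_posix()
                for p in warehouse.rglob("*") if p.is_file())
print(layout)
```

Because both data and metadata are just objects in cheap storage, any engine that understands the Iceberg format can read the table without a proprietary storage service in the middle.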
Layer 2: Table Format
Apache Iceberg organizes data files into tables with ACID transactions, schema enforcement, partition management, snapshot isolation, and time travel — providing database semantics on top of raw object storage.
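The snapshot and time-travel semantics above can be sketched with a toy model in plain Python (this models the ideas, not the real Iceberg library): each commit produces an immutable snapshot listing the data files that make up the table at that point in time.

```python
# Minimal sketch of Iceberg-style snapshots and time travel, modeled with
# plain Python objects (NOT the real Iceberg library).
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # immutable set of data files in this table version

@dataclass
class Table:
    snapshots: list = field(default_factory=list)

    def commit(self, data_files):
        # Append-only metadata: prior snapshots are never mutated, which is
        # what makes snapshot isolation and time travel possible.
        snap = Snapshot(len(self.snapshots) + 1, tuple(data_files))
        self.snapshots.append(snap)
        return snap

    def current(self):
        return self.snapshots[-1]

    def as_of(self, snapshot_id):
        # Time travel: read the table exactly as it was at a past commit.
        return next(s for s in self.snapshots if s.snapshot_id == snapshot_id)

orders = Table()
orders.commit(["data/a.parquet"])
orders.commit(["data/a.parquet", "data/b.parquet"])

print(orders.current().data_files)  # latest version
print(orders.as_of(1).data_files)   # the table as of snapshot 1
```

Readers pin a snapshot and see a consistent view even while writers commit new ones — the essence of snapshot isolation on top of immutable files.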
Layer 3: Catalog
An Iceberg REST Catalog (Apache Polaris, Project Nessie, AWS Glue) tracks each table's current metadata location and enforces role-based access control (RBAC) — the shared metadata service all engines connect through.
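An illustrative sketch of what a catalog does: map a table identifier to its current metadata file, and apply a role check before handing that pointer out. Real REST catalogs expose this over the Iceberg REST API; the class and method names below are made up for illustration.

```python
# Toy catalog: a mapping from table identifier to the table's current
# metadata file, plus an RBAC check on reads. Names are illustrative,
# not the API of Polaris, Nessie, or Glue.
class Catalog:
    def __init__(self):
        self._tables = {}  # "namespace.table" -> metadata file location
        self._grants = {}  # "namespace.table" -> roles allowed to read

    def register(self, identifier, metadata_location, readers):
        self._tables[identifier] = metadata_location
        self._grants[identifier] = set(readers)

    def load_table(self, identifier, role):
        # Every engine resolves tables through this one service, so the
        # RBAC decision is enforced consistently across Spark, Trino, etc.
        if role not in self._grants.get(identifier, set()):
            raise PermissionError(f"role {role!r} cannot read {identifier}")
        return self._tables[identifier]

catalog = Catalog()
catalog.register(
    "sales.orders",
    "s3://warehouse/sales/orders/metadata/v7.metadata.json",
    readers={"analyst", "etl"},
)

print(catalog.load_table("sales.orders", role="analyst"))
```

The key point is the single point of resolution: because every engine asks the same catalog for the metadata pointer, access policy lives in one place rather than in each engine.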
Layer 4: Compute
Decoupled query engines each handle specific workloads: Dremio for BI analytics and semantic layer, Spark for batch ETL, Flink for streaming ingestion, Trino for federated SQL.
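As one concrete example of an engine plugging into the shared catalog, here is a sketch of the Spark session settings typically used to attach Spark to an Iceberg REST catalog. The catalog name (`lake`), URI, and warehouse path are placeholder assumptions; the property keys themselves are the standard Iceberg/Spark configuration settings.

```python
# Sketch: Spark configuration for an Iceberg REST catalog. The catalog
# name "lake", the URI, and the warehouse path are placeholders.
spark_conf = {
    # Load the Iceberg SQL extensions so Spark understands Iceberg DDL/DML.
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    # Register a catalog named "lake" backed by the Iceberg REST protocol.
    "spark.sql.catalog.lake": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lake.type": "rest",
    "spark.sql.catalog.lake.uri": "https://catalog.example.com/api/catalog",
    "spark.sql.catalog.lake.warehouse": "s3://my-bucket/warehouse",
}
```

With settings like these applied to the SparkSession builder, a table registered in the catalog is addressable as `lake.sales.orders` — and the same table is equally reachable from Flink, Trino, or Dremio through the same catalog, which is the point of decoupled compute.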
Layer 5: Semantic & Governance
A semantic layer (Dremio virtual datasets, or VDSs), governance (RBAC, lineage, data quality), and a data catalog (discovery and documentation) make the lakehouse usable and trustworthy for the entire organization.

Lakehouse vs Data Warehouse vs Data Lake
| Dimension | Data Warehouse | Data Lake | Data Lakehouse |
|---|---|---|---|
| Storage | Proprietary | Object storage | Object storage |
| File format | Proprietary | Any (Parquet, CSV) | Open (Iceberg + Parquet) |
| ACID transactions | Yes | No | Yes (Iceberg) |
| Schema enforcement | Yes | No | Yes (Iceberg) |
| Multi-engine support | No | Partial | Yes (open format) |
| Storage cost | High | Low | Low |
| ML/AI support | Limited | Good | Excellent |

Summary
The data lakehouse architecture converges the best properties of data warehouses (ACID transactions, governance, performance) and data lakes (open formats, low cost, scalability) into a unified, open, multi-engine platform. Built on Apache Iceberg tables in cloud object storage, governed by open Iceberg REST catalogs, and accessed by decoupled compute engines including Dremio, the lakehouse is the dominant enterprise data architecture of 2025 and beyond — delivering the analytical capabilities of the warehouse at the economics and flexibility of the open data lake.