The Five-Layer Lakehouse Architecture

The lakehouse architecture is organized into five functional layers, each providing specific capabilities:

Layer 1: Storage

All data lives in cloud object storage (S3, ADLS, GCS) as Parquet data files, with Iceberg metadata stored alongside them as JSON table-metadata files and Avro manifests. Storage capacity grows on demand with no servers to manage, and it is billed separately from compute.
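To make the storage layer concrete, here is a minimal sketch that sorts object keys into the roles they usually play in an Iceberg table. The table prefix, file names, and the `classify_object` helper are hypothetical; the naming patterns follow common Iceberg conventions (Parquet under `data/`, JSON and Avro under `metadata/`) and are assumptions, not something a reader should rely on for a particular deployment.

```python
from pathlib import PurePosixPath


def classify_object(key: str) -> str:
    """Guess the role of an object from its key, using common Iceberg naming."""
    name = PurePosixPath(key).name
    if name.endswith(".parquet"):
        return "data file"
    if name.endswith(".metadata.json"):
        return "table metadata"
    if name.endswith(".avro"):
        # Assumption: manifest lists are conventionally named snap-<id>-...avro.
        return "manifest list" if name.startswith("snap-") else "manifest"
    return "other"


# Hypothetical object keys under a made-up table prefix.
example_keys = [
    "warehouse/sales/orders/data/region=eu/part-00000.parquet",
    "warehouse/sales/orders/metadata/00001-abc.metadata.json",
    "warehouse/sales/orders/metadata/snap-123-1-abc.avro",
    "warehouse/sales/orders/metadata/abc-m0.avro",
]
layout = {key: classify_object(key) for key in example_keys}
```

Everything in `layout` is an ordinary object in a bucket; no storage-side service interprets these files, which is why the table format and catalog layers are needed on top.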

Layer 2: Table Format

Apache Iceberg organizes data files into tables with ACID transactions, schema enforcement, partition management, snapshot isolation, and time travel — providing database semantics on top of raw object storage.
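Snapshot isolation and time travel can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not Iceberg's implementation: `ToyTable`, `Snapshot`, `commit_append`, and `scan` are hypothetical names, and each snapshot here simply holds the full list of data files visible at that point.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: Tuple[str, ...]  # immutable view of the table at commit time


@dataclass
class ToyTable:
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit_append(self, new_files: List[str]) -> Snapshot:
        """Create a new snapshot; earlier snapshots are never modified."""
        current = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, current + tuple(new_files))
        self.snapshots.append(snap)
        return snap

    def scan(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        """Read the latest snapshot, or an older one for time travel."""
        if not self.snapshots:
            return ()
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(f"unknown snapshot {snapshot_id}")


table = ToyTable()
first = table.commit_append(["a.parquet"])
table.commit_append(["b.parquet"])
as_of_first = table.scan(first.snapshot_id)  # time travel to the first commit
latest = table.scan()
```

Because a snapshot is immutable, a reader that pinned `first` keeps seeing the same files while writers add newer snapshots, which is the essence of snapshot isolation.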

Layer 3: Catalog

An Iceberg REST Catalog (Apache Polaris, Project Nessie, AWS Glue) tracks the location of each table's current metadata and enforces role-based access control (RBAC). It is the shared metadata service that all engines connect through.
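The catalog's two jobs, tracking the current metadata location and checking access, can be sketched with a toy in-memory class. `ToyCatalog` and its methods are hypothetical and do not mirror the API of Polaris, Nessie, or Glue; the compare-and-swap commit is an assumption about how a catalog can keep concurrent writers from overwriting each other.

```python
from typing import Dict, Optional, Set, Tuple


class ToyCatalog:
    def __init__(self) -> None:
        self._pointers: Dict[str, str] = {}  # table name -> metadata location
        self._grants: Set[Tuple[str, str, str]] = set()  # (role, table, privilege)

    def grant(self, role: str, table: str, privilege: str) -> None:
        self._grants.add((role, table, privilege))

    def _require(self, role: str, table: str, privilege: str) -> None:
        if (role, table, privilege) not in self._grants:
            raise PermissionError(f"{role} lacks {privilege} on {table}")

    def load_table(self, role: str, table: str) -> Optional[str]:
        """Return the current metadata location if the role may read the table."""
        self._require(role, table, "read")
        return self._pointers.get(table)

    def commit(self, role: str, table: str,
               expected: Optional[str], new: str) -> bool:
        """Swap the pointer only if it still equals what the writer last saw."""
        self._require(role, table, "write")
        if self._pointers.get(table) != expected:
            return False  # another writer committed first; caller must retry
        self._pointers[table] = new
        return True


catalog = ToyCatalog()
catalog.grant("etl", "sales.orders", "read")
catalog.grant("etl", "sales.orders", "write")
catalog.grant("analyst", "sales.orders", "read")

first_commit = catalog.commit("etl", "sales.orders", None, "metadata/00000.metadata.json")
stale_commit = catalog.commit("etl", "sales.orders", None, "metadata/00001.metadata.json")
current = catalog.load_table("analyst", "sales.orders")
```

The second commit fails because it was prepared against an outdated pointer, and the `analyst` role can read the table but would be rejected on any attempt to commit.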

Layer 4: Compute

Decoupled query engines each handle a specific workload: Dremio for BI analytics and the semantic layer, Spark for batch ETL, Flink for streaming ingestion, and Trino for federated SQL.
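The division of labor described above can be written down as a simple routing table. This is only an illustration of the mapping in the text; the workload keys and the `pick_engine` helper are hypothetical, and real deployments choose engines on many more criteria.

```python
# Workload-to-engine mapping taken from the description of the compute layer.
ENGINE_BY_WORKLOAD = {
    "bi_analytics": "Dremio",
    "semantic_layer": "Dremio",
    "batch_etl": "Spark",
    "streaming_ingestion": "Flink",
    "federated_sql": "Trino",
}


def pick_engine(workload: str) -> str:
    """Return the engine assigned to a workload, or fail loudly if unknown."""
    try:
        return ENGINE_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"no engine registered for workload {workload!r}") from None


engine_for_etl = pick_engine("batch_etl")
```

Because every engine reads the same Iceberg tables through the same catalog, changing an entry in this mapping does not require moving or copying data.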

Layer 5: Semantic & Governance

A semantic layer (Dremio virtual datasets, or VDSs), governance (RBAC, lineage, data quality), and a data catalog (discovery and documentation) make the lakehouse usable and trustworthy for the entire organization.
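One way to picture how a semantic layer and lineage fit together is to treat each virtual dataset as a named query over other datasets and walk those references back to physical tables. The sketch below is a toy model, not Dremio's API: `VirtualDataset`, `upstream_tables`, and all dataset names are hypothetical, and it assumes view definitions contain no cycles.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class VirtualDataset:
    name: str
    sql: str
    sources: Tuple[str, ...]  # datasets this view reads from


def upstream_tables(name: str, views: Dict[str, VirtualDataset]) -> Set[str]:
    """Resolve a dataset to the physical tables it ultimately depends on."""
    view = views.get(name)
    if view is None:
        return {name}  # not a view, so treat it as a physical table
    tables: Set[str] = set()
    for source in view.sources:
        tables |= upstream_tables(source, views)
    return tables


views = {
    "clean_orders": VirtualDataset(
        "clean_orders",
        "SELECT * FROM lake.sales.orders WHERE status <> 'cancelled'",
        ("lake.sales.orders",),
    ),
    "revenue_by_region": VirtualDataset(
        "revenue_by_region",
        "SELECT r.name, SUM(o.amount) AS revenue "
        "FROM clean_orders o JOIN lake.sales.regions r ON o.region_id = r.id "
        "GROUP BY r.name",
        ("clean_orders", "lake.sales.regions"),
    ),
}
lineage = upstream_tables("revenue_by_region", views)
```

The resulting set answers a typical governance question: which physical tables would affect this business-facing dataset if they changed.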

Figure 1: The five-layer lakehouse architecture (storage, table format, catalog, compute, and semantic and governance).

Lakehouse vs Data Warehouse vs Data Lake

| Dimension | Data Warehouse | Data Lake | Data Lakehouse |
|---|---|---|---|
| Storage | Proprietary | Object storage | Object storage |
| File format | Proprietary | Any (Parquet, CSV) | Open (Iceberg + Parquet) |
| ACID transactions | Yes | No | Yes (Iceberg) |
| Schema enforcement | Yes | No | Yes (Iceberg) |
| Multi-engine support | No | Partial | Yes (open format) |
| Storage cost | High | Low | Low |
| ML/AI support | Limited | Good | Excellent |
Figure 2: Architecture comparison of the data warehouse, data lake, and data lakehouse on key dimensions.

Summary

The data lakehouse architecture combines the strengths of data warehouses (ACID transactions, governance, performance) with those of data lakes (open formats, low cost, scalability) in a single open, multi-engine platform. It is built on Apache Iceberg tables in cloud object storage, governed through open Iceberg REST Catalogs, and accessed by decoupled compute engines including Dremio. The lakehouse is the dominant enterprise data architecture of 2025 and beyond, providing the analytical capabilities of the warehouse at the economics and flexibility of the open data lake.