# DataLakehouse101.com

> DataLakehouse101.com is the definitive knowledge hub for the modern data lakehouse: 100 authoritative, expert-written guides on Apache Iceberg, Dremio, open table formats, catalog technologies, governance, analytics, and AI-native data architectures. Written by Alex Merced, VP of Developer Relations at Dremio and author of multiple books on data lakehouse architecture.

## About

- [Home](https://datalakehouse101.com/): Introduction to the data lakehouse, key concepts, and navigation to all resources
- [Knowledge Base](https://datalakehouse101.com/knowledge/): Index of all 100 definitive guides on data lakehouse terms

## Deep Dive & Cornerstone Guides

- [Agentic Analytics & the Semantic Layer: The Complete Guide](https://datalakehouse101.com/agentic-analytics/): A complete guide to Agentic Analytics and the Semantic Layer. Learn how AI agents query enterprise data, why a semantic layer is essential, and how to build one on Apache Iceberg and Dremio.
- [What is an Agentic Lakehouse? Reference Architecture & Guide](https://datalakehouse101.com/agentic-lakehouse/): Discover the Agentic Lakehouse architecture. Learn how semantic layers, governed access, and Apache Iceberg create a trusted foundation for autonomous AI agents.
- [Apache Iceberg Explained: The Open Table Format Standard](https://datalakehouse101.com/apache-iceberg/): A comprehensive guide to Apache Iceberg. Learn what Iceberg is, why it was created, its core abstraction, and how it enables the modern data lakehouse.
- [Apache Iceberg Architecture: The Definitive Technical Guide](https://datalakehouse101.com/apache-iceberg-architecture/): A 3,000-word deep dive into the Apache Iceberg architecture. Understand the metadata tree, manifest lists, snapshots, commit flows, and how Iceberg enables ACID transactions on the data lake.
- [Iceberg REST Catalog, Apache Polaris & Catalog Interoperability Guide](https://datalakehouse101.com/apache-iceberg-rest-catalog/): A complete guide to the Iceberg REST Catalog spec, Apache Polaris, Project Nessie, and catalog interoperability. Learn how to choose and configure an Iceberg catalog for your lakehouse.
- [Apache Iceberg Schema Evolution & Hidden Partitioning: Complete Guide](https://datalakehouse101.com/apache-iceberg-schema-evolution/): A complete guide to Apache Iceberg schema evolution and hidden partitioning. Learn how to safely add, drop, and rename columns and change partition strategies without rewriting data.
- [Apache Iceberg Snapshots, Time Travel & Rollbacks: Complete Guide](https://datalakehouse101.com/apache-iceberg-snapshots-and-time-travel/): Master Apache Iceberg snapshots. Learn how time travel, point-in-time queries, and rollbacks work under the hood using Iceberg's snapshot isolation model.
- [Apache Iceberg vs Delta Lake vs Apache Hudi: The 2026 Guide](https://datalakehouse101.com/apache-iceberg-vs-delta-lake-vs-hudi/): A deep, unbiased technical comparison of the top open table formats: Apache Iceberg, Delta Lake, and Apache Hudi. Learn their architectural differences and which to choose.
- [What is a Data Lakehouse? The Definitive Guide](https://datalakehouse101.com/data-lakehouse/): The definitive guide to the Data Lakehouse architecture. Learn how it combines data lake flexibility with data warehouse performance, and why it has become the standard for modern analytics.
- [Data Lakehouse vs. Data Lake vs. Data Warehouse: 2026 Comparison](https://datalakehouse101.com/data-lakehouse-vs-data-lake-vs-data-warehouse/): Understand the evolution of enterprise data architecture. Compare Data Warehouses, Data Lakes, and Data Lakehouses on cost, performance, governance, and AI capabilities.
- [Lakehouse for AI Agents: Architecture & Implementation Guide](https://datalakehouse101.com/lakehouse-for-ai-agents/): Learn how to build a data lakehouse that safely serves autonomous AI agents. Covers architecture, MCP integration, governed access, semantic context, and workload isolation patterns.
- [Open Table Formats Explained: Iceberg, Delta Lake & Hudi](https://datalakehouse101.com/open-table-formats/): A comprehensive guide to open table formats. Learn what they are, why they exist, how Apache Iceberg, Delta Lake, and Apache Hudi work, and which to choose for your data lakehouse.

## Knowledge Base Glossary

- [ACID Transactions in the Data Lakehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/acid-transactions.html): Learn what ACID transactions are, why they matter in the data lakehouse, and how Apache Iceberg implements Atomicity, Consistency, Isolation, and Durability on cloud object storage.
- [Agentic Lakehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/agentic-lakehouse.html): Learn what the Agentic Lakehouse is, how AI agents autonomously discover and query Apache Iceberg data via Dremio's MCP server, and why it is the next evolution of enterprise analytics.
- [AI Semantic Layer in Dremio: The Definitive Guide](https://datalakehouse101.com/knowledge/ai-semantic-layer.html): Learn what Dremio's AI Semantic Layer is, how it enables natural language data access for AI agents via MCP, and why it is the bridge between enterprise data and agentic AI.
- [Amazon S3 for Data Lakehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/amazon-s3.html): Learn how Amazon S3 serves as the object storage foundation for data lakehouses on AWS, including S3 storage classes, strong consistency, and integration with Apache Iceberg.
- [Apache Arrow: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-arrow.html): Learn what Apache Arrow is, how its in-memory columnar format powers vectorized query engines, and why Arrow Flight SQL is the high-performance data access protocol for the lakehouse.
- [Apache Atlas: The Definitive Guide for Data Governance](https://datalakehouse101.com/knowledge/apache-atlas.html): Learn what Apache Atlas is, how it provides metadata management and governance for the Hadoop and lakehouse ecosystem, and how it integrates with Apache Ranger for comprehensive governance.
- [Apache Avro: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-avro.html): Learn what Apache Avro is, how its schema-embedded row format is used for streaming and Iceberg metadata, and how it differs from Parquet and ORC in the data lakehouse.
- [Apache Flink: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/apache-flink.html): Learn what Apache Flink is, how it enables real-time streaming ingestion into Apache Iceberg tables, and why Flink is the engine of choice for CDC and event-driven lakehouse pipelines.
- [Apache Gravitino: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-gravitino.html): Learn what Apache Gravitino is, how it acts as a universal metadata layer federating multiple catalogs, and its role in multi-catalog lakehouse governance.
- [Apache Hudi: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-hudi.html): Learn what Apache Hudi is, how its incremental processing model works, how it compares to Apache Iceberg and Delta Lake, and when to use Hudi in your data lakehouse.
- [Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-iceberg.html): Learn what Apache Iceberg is, how its metadata architecture works, and why it is the industry-standard open table format for the modern data lakehouse in 2025.
- [Apache Kafka: The Definitive Guide for Data Lakehouse Ingestion](https://datalakehouse101.com/knowledge/apache-kafka.html): Learn how Apache Kafka enables streaming data ingestion into the lakehouse, its role in CDC pipelines, and how Kafka connects operational systems to Apache Iceberg tables.
- [Apache ORC: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-orc.html): Learn what Apache ORC is, how its columnar format compares to Parquet, and its role in the Hive ecosystem and modern Apache Iceberg data lakehouses.
- [Apache Parquet: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-parquet.html): Learn what Apache Parquet is, how its columnar format enables efficient analytics, and why it is the default file format for Apache Iceberg and the data lakehouse.
- [Apache Polaris: The Definitive Guide](https://datalakehouse101.com/knowledge/apache-polaris.html): Learn what Apache Polaris is, how this ASF-governed Iceberg REST Catalog enables multi-engine interoperability, and why it is the open standard for lakehouse catalog governance.
- [Apache Ranger: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/apache-ranger.html): Learn what Apache Ranger is, how it provides centralized security policy management for the Hadoop and lakehouse ecosystem, and its integration with Apache Iceberg and Hive.
- [Apache Spark: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/apache-spark.html): Learn what Apache Spark is, how it powers Iceberg ETL and ML workloads, and when to use Spark vs Dremio, Trino, or Flink in your data lakehouse architecture.
- [Automated Table Optimization: The Definitive Guide](https://datalakehouse101.com/knowledge/automated-table-optimization.html): Learn what automated table optimization is, how Dremio and Apache Iceberg automate compaction, Z-ordering, and snapshot expiry, and why it eliminates manual table maintenance.
- [Autonomous Reflections in Dremio: The Definitive Guide](https://datalakehouse101.com/knowledge/autonomous-reflections.html): Learn how Dremio Autonomous Reflections automatically analyze query patterns and create, update, and drop Reflections to optimize lakehouse performance without manual tuning.
- [AWS Glue Data Catalog: The Definitive Guide](https://datalakehouse101.com/knowledge/aws-glue-catalog.html): Learn what AWS Glue Data Catalog is, how it supports Apache Iceberg as a serverless managed catalog on AWS, and how it compares to Polaris and Nessie for the open lakehouse.
- [Azure ADLS: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/azure-adls.html): Learn what Azure Data Lake Storage Gen2 is, how it serves as the object storage foundation for Azure-based Apache Iceberg lakehouses, and its key features for enterprise data.
- [Batch Processing: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/batch-processing.html): Learn what batch processing is, how Apache Spark handles large-scale batch ETL into Apache Iceberg, and when to choose batch over streaming for lakehouse pipelines.
- [Bronze Layer: The Definitive Guide for Apache Iceberg Lakehouse](https://datalakehouse101.com/knowledge/bronze-layer.html): Learn what the Bronze Layer is in the Medallion Architecture, how it stores raw ingested data in Apache Iceberg, and best practices for Bronze table design.
- [Catalog Interoperability: The Definitive Guide](https://datalakehouse101.com/knowledge/catalog-interoperability.html): Learn what catalog interoperability means in the data lakehouse, how the Iceberg REST Catalog specification enables it, and why it is essential for the open, multi-engine lakehouse.
- [Change Data Capture (CDC): The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/change-data-capture.html): Learn what Change Data Capture is, how Debezium and Apache Flink stream CDC events into Apache Iceberg tables, and how CDC enables near-real-time lakehouse data freshness.
- [Column-Level Security: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/column-level-security.html): Learn what column-level security is, how data masking and column visibility controls protect sensitive data in Apache Iceberg and Dremio, and why it enables PII governance.
- [Column Pruning: The Definitive Guide](https://datalakehouse101.com/knowledge/column-pruning.html): Learn what column pruning is, how it works with Apache Parquet's columnar storage, and why it dramatically reduces I/O for analytical queries in the data lakehouse.
- [Compaction in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/compaction.html): Learn what compaction is in Apache Iceberg, why it is essential for lakehouse performance, how Copy-on-Write and Merge-on-Read compaction work, and how Dremio automates it.
- [Copy-on-Write (CoW) in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/copy-on-write.html): Learn what Copy-on-Write means in Apache Iceberg, when to use CoW vs Merge-on-Read, and how CoW updates and deletes work for optimal read performance in the data lakehouse.
- [Data Catalog: The Definitive Guide](https://datalakehouse101.com/knowledge/data-catalog.html): Learn what a data catalog is, how it differs from a technical metadata store, and why business data catalogs are essential for self-service analytics in the modern data lakehouse.
- [Data Engineering: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/data-engineering.html): Learn what data engineering is, the key skills and tools data engineers use to build Apache Iceberg lakehouse pipelines, and the modern data engineering career path.
- [Data Fabric: The Definitive Guide](https://datalakehouse101.com/knowledge/data-fabric.html): Learn what a data fabric is, how it provides a unified data management layer across distributed data sources, and how it relates to the data lakehouse and data mesh architectures.
- [Data Governance in the Data Lakehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/data-governance.html): Learn what data governance means in the data lakehouse, how access control, lineage, quality, and catalog management implement it, and why governance is essential for enterprise lakehouses.
- [Data Ingestion: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/data-ingestion.html): Learn what data ingestion is, the key ingestion patterns for Apache Iceberg lakehouses, and how to choose between streaming, batch, and federated ingestion approaches.
- [Data Lake: The Definitive Guide](https://datalakehouse101.com/knowledge/data-lake.html): Learn what a data lake is, how it works, its strengths and weaknesses, and how it evolved into the modern data lakehouse with Apache Iceberg.
- [Data Lakehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/data-lakehouse.html): Learn what a data lakehouse is, how it works, why it replaced the data warehouse for modern analytics, and how to build one with Apache Iceberg and Dremio.
- [Data Lineage: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/data-lineage.html): Learn what data lineage is, how it tracks data flow through lakehouse pipelines, and why lineage is essential for governance, debugging, and regulatory compliance.
- [Data Mesh: The Definitive Guide](https://datalakehouse101.com/knowledge/data-mesh.html): Learn what Data Mesh is, its four core principles, how it differs from centralized data platforms, and how to implement Data Mesh with Apache Iceberg and Dremio.
- [Data Observability: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/data-observability.html): Learn what data observability is, how it monitors freshness, volume, schema, and distribution in Apache Iceberg pipelines, and why it is essential for reliable lakehouse operations.
- [Data Quality: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/data-quality.html): Learn what data quality means in the data lakehouse, the key data quality dimensions, and how tools like Great Expectations and dbt tests enforce quality in Iceberg pipelines.
- [Data Science on the Data Lakehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/data-science-lakehouse.html): Learn how data scientists use Apache Iceberg tables, PyIceberg, and Dremio's AI Semantic Layer for ML feature engineering, model training, and AI-ready data access.
- [Data Skipping: The Definitive Guide](https://datalakehouse101.com/knowledge/data-skipping.html): Learn what data skipping is in Apache Iceberg, how file-level statistics enable skipping irrelevant data files, and why data skipping is critical for lakehouse query performance.
- [Data Warehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/data-warehouse.html): Learn what a data warehouse is, how it works, its architecture patterns, key limitations, and how it compares to the modern data lakehouse built on Apache Iceberg.
- [dbt (Data Build Tool): The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/dbt.html): Learn what dbt is, how it implements SQL-based ELT transformations on Apache Iceberg tables, and why dbt-dremio and dbt-spark are key tools in the modern lakehouse pipeline.
- [Decoupled Storage and Compute: The Definitive Guide](https://datalakehouse101.com/knowledge/decoupled-storage-compute.html): Learn what decoupled storage and compute means in the data lakehouse, how Apache Iceberg on object storage enables it, and why it is the key economic advantage over data warehouses.
- [Delta Lake: The Definitive Guide](https://datalakehouse101.com/knowledge/delta-lake.html): Learn what Delta Lake is, how it works, how it compares to Apache Iceberg, and when to choose Delta Lake for your data lakehouse in 2025.
- [Dremio Cloud: The Definitive Guide](https://datalakehouse101.com/knowledge/dremio-cloud.html): Learn what Dremio Cloud is, how its serverless lakehouse platform works on AWS and Azure, and how it compares to Snowflake and Databricks for open lakehouse analytics.
- [Dremio Open Catalog: The Definitive Guide](https://datalakehouse101.com/knowledge/dremio-open-catalog.html): Learn what Dremio Open Catalog is, how it implements the Iceberg REST Catalog spec with Git-like Nessie branching, and why it enables true multi-engine lakehouse interoperability.
- [Dremio Intelligent Query Engine: The Definitive Guide](https://datalakehouse101.com/knowledge/dremio-query-engine.html): Learn how Dremio's Intelligent Query Engine uses Apache Arrow vectorized execution, Reflection acceleration, and adaptive optimization to deliver sub-second queries on Apache Iceberg data.
- [Dremio Reflections: The Definitive Guide](https://datalakehouse101.com/knowledge/dremio-reflections.html): Learn what Dremio Reflections are, how raw and aggregation Reflections work, and how they deliver sub-second BI performance on Apache Iceberg data without query rewrites.
- [Dremio: The Definitive Guide](https://datalakehouse101.com/knowledge/dremio.html): Learn what Dremio is, how its intelligent query engine works, and why it is the leading data lakehouse platform for Apache Iceberg, semantic layers, and self-service analytics.
- [ELT (Extract, Load, Transform): The Definitive Guide](https://datalakehouse101.com/knowledge/elt.html): Learn what ELT is, how Extract Load Transform differs from ETL, why ELT is the dominant pattern in the modern data lakehouse, and how to implement it with Apache Iceberg and dbt.
- [ETL: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/etl.html): Learn what ETL is, how it differs from ELT in the data lakehouse, and how Apache Spark and Flink implement modern ETL pipelines into Apache Iceberg tables.
- [Feature Store: The Definitive Guide for Data Lakehouse ML](https://datalakehouse101.com/knowledge/feature-store.html): Learn what a feature store is, how Apache Iceberg tables serve as open feature stores, and how feature stores bridge ML engineering and data lakehouse pipelines.
- [Gold Layer: The Definitive Guide for Apache Iceberg Lakehouse](https://datalakehouse101.com/knowledge/gold-layer.html): Learn what the Gold Layer is in the Medallion Architecture, how it delivers pre-aggregated business metrics on Apache Iceberg, and best practices for Gold table design and optimization.
- [Google Cloud Storage (GCS): The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/google-cloud-storage.html): Learn how Google Cloud Storage serves as the object storage foundation for GCP-based Apache Iceberg data lakehouses and its key features for lakehouse architecture.
- [Apache Hadoop HDFS: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/hdfs.html): Learn what Apache Hadoop HDFS is, its role as the original data lake storage layer, and why modern lakehouses are migrating from HDFS to cloud object storage.
- [Hidden Partitioning in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/hidden-partitioning.html): Learn how Apache Iceberg hidden partitioning works, why it eliminates the need to write partition-aware queries, and how it enables automatic partition pruning for any query engine.
- [Hive Metastore: The Definitive Guide](https://datalakehouse101.com/knowledge/hive-metastore.html): Learn what the Hive Metastore is, how it stores Hadoop table metadata, its limitations for Apache Iceberg, and why modern lakehouses are migrating to REST catalog alternatives.
- [Iceberg Manifest Files: The Definitive Guide](https://datalakehouse101.com/knowledge/iceberg-manifest-files.html): Learn what Apache Iceberg manifest files are, how they store file-level statistics for data skipping, and how the manifest tree enables efficient query planning at petabyte scale.
- [Iceberg REST Catalog: The Definitive Guide](https://datalakehouse101.com/knowledge/iceberg-rest-catalog.html): Learn what the Apache Iceberg REST Catalog specification is, how it enables multi-engine catalog interoperability, and which implementations power modern data lakehouses.
- [Iceberg Snapshots: The Definitive Guide](https://datalakehouse101.com/knowledge/iceberg-snapshots.html): Learn what Apache Iceberg snapshots are, how they enable ACID transactions and time travel, and how Dremio leverages the snapshot model for reliable lakehouse analytics.
- [Lakehouse Architecture: The Definitive Guide](https://datalakehouse101.com/knowledge/lakehouse-architecture.html): Learn what the data lakehouse architecture is, its key layers and components, and how Apache Iceberg, object storage, and query engines combine into the complete lakehouse stack.
- [Lakehouse Federation: The Definitive Guide](https://datalakehouse101.com/knowledge/lakehouse-federation.html): Learn what lakehouse federation is, how engines like Dremio and Trino query across Iceberg tables and external databases simultaneously, and why federation eliminates data silos.
- [Medallion Architecture: The Definitive Guide](https://datalakehouse101.com/knowledge/medallion-architecture.html): Learn what the Medallion Architecture is, how Bronze, Silver, and Gold layers organize data lakehouse pipelines, and best practices for implementing it with Apache Iceberg.
- [Merge-on-Read (MoR) in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/merge-on-read.html): Learn what Merge-on-Read means in Apache Iceberg, how delete files work, when to use MoR vs Copy-on-Write, and how MoR enables high-frequency CDC updates in the data lakehouse.
- [Metadata Management: The Definitive Guide](https://datalakehouse101.com/knowledge/metadata-management.html): Learn what metadata management is, the types of metadata in the data lakehouse, and how managing technical, business, and operational metadata enables governance and performance.
- [MinIO: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/minio.html): Learn what MinIO is, how its S3-compatible object storage enables on-premises and hybrid data lakehouses, and how to use MinIO with Apache Iceberg for local development.
- [MLflow: The Definitive Guide for Data Lakehouse ML](https://datalakehouse101.com/knowledge/mlflow.html): Learn what MLflow is, how it tracks ML experiments on Apache Iceberg features, and why it is the standard open-source ML lifecycle management platform for lakehouse data science.
- [MCP (Model Context Protocol): The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/model-context-protocol.html): Learn what the Model Context Protocol (MCP) is, how Dremio's MCP server enables AI agents to autonomously query the lakehouse, and why MCP is the standard for AI tool integration.
- [Multi-Engine Interoperability: The Definitive Guide](https://datalakehouse101.com/knowledge/multi-engine-interoperability.html): Learn what multi-engine interoperability means in the data lakehouse, how Apache Iceberg and the REST Catalog spec enable it, and why it protects against vendor lock-in.
- [Nessie Branching and Tagging: The Definitive Guide](https://datalakehouse101.com/knowledge/nessie-branching-tagging.html): Learn how Project Nessie's Git-like branching and tagging enables isolated data experimentation, reproducible ML experiments, and zero-risk schema migrations on Apache Iceberg tables.
- [Object Storage: The Definitive Guide](https://datalakehouse101.com/knowledge/object-storage.html): Learn what object storage is, why S3-compatible object storage is the foundation of the data lakehouse, and how it enables decoupled storage and compute at petabyte scale.
- [Open Lakehouse: The Definitive Guide](https://datalakehouse101.com/knowledge/open-lakehouse.html): Learn what the Open Lakehouse is, how open standards (Apache Iceberg, Parquet, Iceberg REST Catalog) prevent vendor lock-in, and why openness is the key principle for future-proof data architecture.
- [Open Table Format: The Definitive Guide](https://datalakehouse101.com/knowledge/open-table-format.html): Learn what an open table format is, how Apache Iceberg, Delta Lake, and Apache Hudi compare, and why open table formats are the foundation of the modern data lakehouse.
- [OpenMetadata: The Definitive Guide for Data Lakehouse Governance](https://datalakehouse101.com/knowledge/openmetadata.html): Learn what OpenMetadata is, how it provides unified data discovery, lineage, and quality for Apache Iceberg lakehouses, and why it is the modern open-source catalog platform.
- [Partition Evolution in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/partition-evolution.html): Learn how Apache Iceberg partition evolution works, why it solves the static partitioning problem, and how to change partition schemes without rewriting data.
- [Physical Datasets in Dremio: The Definitive Guide](https://datalakehouse101.com/knowledge/physical-datasets.html): Learn what Physical Datasets are in Dremio, how they register data sources including Apache Iceberg tables and federated sources, and how they serve as the foundation for Virtual Datasets.
- [Predicate Pushdown: The Definitive Guide](https://datalakehouse101.com/knowledge/predicate-pushdown.html): Learn what predicate pushdown is, how it works in Apache Iceberg and Parquet, and why it is one of the most important query optimization techniques in the data lakehouse.
- [Presto: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/presto.html): Learn what Presto is, how it compares to Trino, and its role in distributed SQL analytics across Apache Iceberg and heterogeneous data sources in the modern lakehouse.
- [Project Nessie: The Definitive Guide](https://datalakehouse101.com/knowledge/project-nessie.html): Learn what Project Nessie is, how its Git-like branching and tagging works for Apache Iceberg tables, and why it powers Dremio Open Catalog for data engineering workflows.
- [Query Optimization: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/query-optimization.html): Learn what query optimization is, the key optimization techniques used in Apache Iceberg and Dremio, and how to improve query performance in the data lakehouse.
- [Role-Based Access Control (RBAC): The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/rbac.html): Learn what RBAC is in the data lakehouse, how it is implemented in Apache Polaris and Dremio, and why role-based access is the governance standard for enterprise Iceberg deployments.
- [Real-Time Analytics: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/real-time-analytics.html): Learn what real-time analytics means in the data lakehouse, how streaming ingestion and query engines like Dremio enable near-real-time dashboards on Apache Iceberg data.
- [Row-Level Deletes in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/row-level-deletes.html): Learn how row-level deletes work in Apache Iceberg V2, the difference between positional and equality delete files, and practical use cases like GDPR erasure and CDC upserts.
- [Schema Evolution in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/schema-evolution.html): Learn how Apache Iceberg schema evolution works, what changes are safe vs breaking, and how to evolve table schemas without data rewrites in your data lakehouse.
- [Self-Service Analytics: The Definitive Guide](https://datalakehouse101.com/knowledge/self-service-analytics.html): Learn what self-service analytics is, how the data lakehouse enables it at scale, and why the semantic layer, data catalog, and governed access are essential for true self-service.
- [Semantic Layer: The Definitive Guide](https://datalakehouse101.com/knowledge/semantic-layer.html): Learn what a semantic layer is, how it translates raw data into business-friendly metrics, and why it is the critical bridge between data engineers and business analysts in the lakehouse.
- [Silver Layer: The Definitive Guide for Apache Iceberg Lakehouse](https://datalakehouse101.com/knowledge/silver-layer.html): Learn what the Silver Layer is in the Medallion Architecture, how it cleanses and conforms Bronze data in Apache Iceberg, and best practices for Silver table design and maintenance.
- [SQL Analytics: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/sql-analytics.html): Learn how SQL analytics powers the data lakehouse, the key SQL patterns for Iceberg queries, and how engines like Dremio deliver ANSI SQL analytics on petabyte-scale data.
- [Stream Processing: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/stream-processing.html): Learn what stream processing is, how Apache Flink enables real-time stream processing into Apache Iceberg tables, and the difference between streaming and batch processing.
- [Time Travel in Apache Iceberg: The Definitive Guide](https://datalakehouse101.com/knowledge/time-travel.html): Learn how Apache Iceberg time travel works, how to query historical snapshots by timestamp or snapshot ID, and practical use cases for time travel in the data lakehouse.
- [Trino: The Definitive Guide for Data Lakehouse](https://datalakehouse101.com/knowledge/trino.html): Learn what Trino is, how its federated SQL query engine works across Apache Iceberg and other data sources, and when to use Trino in your data lakehouse architecture.
- [Unity Catalog: The Definitive Guide](https://datalakehouse101.com/knowledge/unity-catalog.html): Learn what Databricks Unity Catalog is, how it provides unified governance for the Databricks Lakehouse, its open-source release, and how it compares to Apache Polaris.
- [Upsert: The Definitive Guide for Apache Iceberg](https://datalakehouse101.com/knowledge/upsert.html): Learn what upsert is, how Apache Iceberg's MERGE INTO implements upsert semantics, and why upsert is essential for CDC pipelines maintaining current-state Silver tables.
- [Vectorized Query Execution: The Definitive Guide](https://datalakehouse101.com/knowledge/vectorized-execution.html): Learn what vectorized query execution is, how it uses Apache Arrow and SIMD instructions to accelerate analytical queries, and why it is the performance foundation of modern lakehouse engines.
- [Virtual Datasets in Dremio: The Definitive Guide](https://datalakehouse101.com/knowledge/virtual-datasets.html): Learn what Dremio Virtual Datasets are, how they create a semantic layer above raw Iceberg data, and how they enable self-service analytics without exposing raw table complexity.
- [Z-Ordering: The Definitive Guide for Apache Iceberg](https://datalakehouse101.com/knowledge/z-ordering.html): Learn what Z-Ordering is in Apache Iceberg, how it clusters data to improve data skipping, and how to use OPTIMIZE with Z-ORDER BY for lakehouse query performance.

## Author

- [Alex Merced](https://alexmerced.com): VP Developer Relations at Dremio, author of multiple books on data lakehouse, Apache Iceberg, and Agentic AI
- [Substack Newsletter](https://amdatalakehouse.substack.com): Weekly updates on open lakehouse OSS and Agentic AI

## Books by Alex Merced

- [Architecting an Apache Iceberg Lakehouse](https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/): Manning Publications, the definitive book on building enterprise lakehouses with Apache Iceberg
- [The Open Source Lakehouse](https://www.amazon.com/Open-Source-Lakehouse-Architecting-Analytical/dp/B0GW595MVL/): Comprehensive guide to open lakehouse architecture
- [The 2026 Guide to Lakehouses, Apache Iceberg and Agentic AI](https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands-ebook/dp/B0GQL4QNRT/): Current-year handbook on lakehouses and AI
- [Apache Iceberg and Agentic AI](https://www.amazon.com/Apache-Iceberg-Agentic-Connecting-Structured/dp/B0GW2WF4PX/): Connecting structured data to AI agents via Iceberg