Analytics & BI
Analytics & BI
AI Semantic Layer (Dremio)
Learn what Dremio's AI Semantic Layer is, how it enables natural language data access for AI agents …
Analytics & BI
Automated Table Optimization
Learn what automated table optimization is, how Dremio and Apache Iceberg automate compaction, Z-ord…
Analytics & BI
Data Skipping
Learn what data skipping is in Apache Iceberg, how file-level statistics enable skipping irrelevant …
Analytics & BI
Lakehouse Federation
Learn what lakehouse federation is, how engines like Dremio and Trino query across Iceberg tables an…
Analytics & BI
Multi-Engine Interoperability
Learn what multi-engine interoperability means in the data lakehouse, how Apache Iceberg and the RES…
Analytics & BI
Query Optimization
Learn what query optimization is, the key optimization techniques used in Apache Iceberg and Dremio,…
Analytics & BI
Self-Service Analytics
Learn what self-service analytics is, how the data lakehouse enables it at scale, and why the semant…
Analytics & BI
Semantic Layer
Learn what a semantic layer is, how it translates raw data into business-friendly metrics, and why i…
Analytics & BI
SQL Analytics
Learn how SQL analytics powers the data lakehouse, the key SQL patterns for Iceberg queries, and how…
Architecture Patterns
Architecture Patterns
Agentic Lakehouse
Learn what the Agentic Lakehouse is, how AI agents autonomously discover and query Apache Iceberg da…
Architecture Patterns
Bronze Layer
Learn what the Bronze Layer is in the Medallion Architecture, how it stores raw ingested data in Apa…
Architecture Patterns
Data Engineering
Learn what data engineering is, the key skills and tools data engineers use to build Apache Iceberg …
Architecture Patterns
Data Science on the Lakehouse
Learn how data scientists use Apache Iceberg tables, PyIceberg, and Dremio's AI Semantic Layer for M…
Architecture Patterns
ETL (Extract, Transform, Load)
Learn what ETL is, how it differs from ELT in the data lakehouse, and how Apache Spark and Flink imp…
Architecture Patterns
Feature Store
Learn what a feature store is, how Apache Iceberg tables serve as open feature stores, and how featu…
Architecture Patterns
Gold Layer
Learn what the Gold Layer is in the Medallion Architecture, how it delivers pre-aggregated business …
Architecture Patterns
Lakehouse Architecture
Learn what the data lakehouse architecture is, its key layers and components, and how Apache Iceberg…
Architecture Patterns
Medallion Architecture
Learn what the Medallion Architecture is, how Bronze, Silver, and Gold layers organize data lakehous…
Architecture Patterns
MLflow
Learn what MLflow is, how it tracks ML experiments on Apache Iceberg features, and why it is the sta…
Architecture Patterns
MCP (Model Context Protocol)
Learn what the Model Context Protocol (MCP) is, how Dremio's MCP server enables AI agents to autonom…
Architecture Patterns
Nessie Branching and Tagging
Learn how Project Nessie's Git-like branching and tagging enables isolated data experimentation, rep…
Architecture Patterns
Open Lakehouse
Learn what the Open Lakehouse is, how open standards (Apache Iceberg, Parquet, Iceberg REST Catalog)…
Architecture Patterns
Silver Layer
Learn what the Silver Layer is in the Medallion Architecture, how it cleanses and conforms Bronze da…
Catalogs & Metadata
Catalogs & Metadata
Apache Gravitino
Learn what Apache Gravitino is, how it acts as a universal metadata layer federating multiple catalo…
Catalogs & Metadata
Apache Polaris
Learn what Apache Polaris is, how this ASF-governed Iceberg REST Catalog enables multi-engine intero…
Catalogs & Metadata
AWS Glue Data Catalog
Learn what AWS Glue Data Catalog is, how it supports Apache Iceberg as a serverless managed catalog …
Catalogs & Metadata
Catalog Interoperability
Learn what catalog interoperability means in the data lakehouse, how the Iceberg REST Catalog specif…
Catalogs & Metadata
Data Catalog
Learn what a data catalog is, how it differs from a technical metadata store, and why business data …
Catalogs & Metadata
Hive Metastore
Learn what the Hive Metastore is, how it stores Hadoop table metadata, its limitations for Apache Ic…
Catalogs & Metadata
Metadata Management
Learn what metadata management is, the types of metadata in the data lakehouse, and how managing tec…
Catalogs & Metadata
Project Nessie
Learn what Project Nessie is, how its Git-like branching and tagging works for Apache Iceberg tables…
Catalogs & Metadata
Unity Catalog
Learn what Databricks Unity Catalog is, how it provides unified governance for the Databricks Lakeho…
Core Concepts
Core Concepts
ACID Transactions
Learn what ACID transactions are, why they matter in the data lakehouse, and how Apache Iceberg impl…
Core Concepts
Data Lake
Learn what a data lake is, how it works, its strengths and weaknesses, and how it evolved into the m…
Core Concepts
Data Lakehouse
Learn what a data lakehouse is, how it works, why it replaced the data warehouse for modern analytic…
Core Concepts
Data Mesh
Learn what Data Mesh is, its four core principles, how it differs from centralized data platforms, a…
Core Concepts
Data Warehouse
Learn what a data warehouse is, how it works, its architecture patterns, key limitations, and how it…
Core Concepts
Decoupled Storage and Compute
Learn what decoupled storage and compute means in the data lakehouse, how Apache Iceberg on object s…
Core Concepts
ELT (Extract, Load, Transform)
Learn what ELT is, how Extract Load Transform differs from ETL, why ELT is the dominant pattern in t…
Core Concepts
Open Table Format
Learn what an open table format is, how Apache Iceberg, Delta Lake, and Apache Hudi compare, and why…
File Formats & Storage
File Formats & Storage
Amazon S3
Learn how Amazon S3 serves as the object storage foundation for data lakehouses on AWS, including S3…
File Formats & Storage
Apache Arrow
Learn what Apache Arrow is, how its in-memory columnar format powers vectorized query engines, and w…
File Formats & Storage
Apache Avro
Learn what Apache Avro is, how its schema-embedded row format is used for streaming and Iceberg meta…
File Formats & Storage
Apache ORC
Learn what Apache ORC is, how its columnar format compares to Parquet, and its role in the Hive ecos…
File Formats & Storage
Apache Parquet
Learn what Apache Parquet is, how its columnar format enables efficient analytics, and why it is the…
File Formats & Storage
Azure Data Lake Storage (ADLS)
Learn what Azure Data Lake Storage Gen2 is, how it serves as the object storage foundation for Azure…
File Formats & Storage
Google Cloud Storage (GCS)
Learn how Google Cloud Storage serves as the object storage foundation for GCP-based Apache Iceberg …
File Formats & Storage
Apache Hadoop HDFS
Learn what Apache Hadoop HDFS is, its role as the original data lake storage layer, and why modern l…
File Formats & Storage
MinIO
Learn what MinIO is, how its S3-compatible object storage enables on-premises and hybrid data lakeho…
File Formats & Storage
Object Storage
Learn what object storage is, why S3-compatible object storage is the foundation of the data lakehou…
Governance & Quality
Governance & Quality
Apache Atlas
Learn what Apache Atlas is, how it provides metadata management and governance for the Hadoop and la…
Governance & Quality
Apache Ranger
Learn what Apache Ranger is, how it provides centralized security policy management for the Hadoop a…
Governance & Quality
Column-Level Security
Learn what column-level security is, how data masking and column visibility controls protect sensiti…
Governance & Quality
Data Fabric
Learn what a data fabric is, how it provides a unified data management layer across distributed data…
Governance & Quality
Data Lineage
Learn what data lineage is, how it tracks data flow through lakehouse pipelines, and why lineage is …
Governance & Quality
Data Observability
Learn what data observability is, how it monitors freshness, volume, schema, and distribution in Apa…
Governance & Quality
Data Quality
Learn what data quality means in the data lakehouse, the key data quality dimensions, and how tools …
Governance & Quality
OpenMetadata
Learn what OpenMetadata is, how it provides unified data discovery, lineage, and quality for Apache …
Governance & Quality
Role-Based Access Control (RBAC)
Learn what RBAC is in the data lakehouse, how it is implemented in Apache Polaris and Dremio, and wh…
Ingestion & Streaming
Ingestion & Streaming
Batch Processing
Learn what batch processing is, how Apache Spark handles large-scale batch ETL into Apache Iceberg, …
Ingestion & Streaming
Data Ingestion
Learn what data ingestion is, the key ingestion patterns for Apache Iceberg lakehouses, and how to c…
Ingestion & Streaming
dbt (Data Build Tool)
Learn what dbt is, how it implements SQL-based ELT transformations on Apache Iceberg tables, and why…
Ingestion & Streaming
Real-Time Analytics
Learn what real-time analytics means in the data lakehouse, how streaming ingestion and query engine…
Ingestion & Streaming
Stream Processing
Learn what stream processing is, how Apache Flink enables real-time stream processing into Apache Ic…
Ingestion & Streaming
Upsert
Learn what upsert is, how Apache Iceberg's MERGE INTO implements upsert semantics, and why upsert is…
Query Engines & Platforms
Query Engines & Platforms
Apache Flink
Learn what Apache Flink is, how it enables real-time streaming ingestion into Apache Iceberg tables,…
Query Engines & Platforms
Apache Spark
Learn what Apache Spark is, how it powers Iceberg ETL and ML workloads, and when to use Spark vs Dre…
Query Engines & Platforms
Autonomous Reflections
Learn how Dremio Autonomous Reflections automatically analyze query patterns and create, update, and…
Query Engines & Platforms
Column Pruning
Learn what column pruning is, how it works with Apache Parquet's columnar storage, and why it dramat…
Query Engines & Platforms
Dremio Cloud
Learn what Dremio Cloud is, how its serverless lakehouse platform works on AWS and Azure, and how it…
Query Engines & Platforms
Dremio Open Catalog
Learn what Dremio Open Catalog is, how it implements the Iceberg REST Catalog spec with Git-like Nes…
Query Engines & Platforms
Dremio Intelligent Query Engine
Learn how Dremio's Intelligent Query Engine uses Apache Arrow vectorized execution, Reflection accel…
Query Engines & Platforms
Dremio Reflections
Learn what Dremio Reflections are, how raw and aggregation Reflections work, and how they deliver su…
Query Engines & Platforms
Dremio
Learn what Dremio is, how its intelligent query engine works, and why it is the leading data lakehou…
Query Engines & Platforms
Physical Datasets (Dremio)
Learn what Physical Datasets are in Dremio, how they register data sources including Apache Iceberg …
Query Engines & Platforms
Predicate Pushdown
Learn what predicate pushdown is, how it works in Apache Iceberg and Parquet, and why it is one of t…
Query Engines & Platforms
Presto
Learn what Presto is, how it compares to Trino, and its role in distributed SQL analytics across Apa…
Query Engines & Platforms
Trino
Learn what Trino is, how its federated SQL query engine works across Apache Iceberg and other data s…
Query Engines & Platforms
Vectorized Query Execution
Learn what vectorized query execution is, how it uses Apache Arrow and SIMD instructions to accelera…
Query Engines & Platforms
Virtual Datasets (Dremio)
Learn what Dremio Virtual Datasets are, how they create a semantic layer above raw Iceberg data, and…
Table Formats
Table Formats
Apache Hudi
Learn what Apache Hudi is, how its incremental processing model works, how it compares to Apache Ice…
Table Formats
Apache Iceberg
Learn what Apache Iceberg is, how its metadata architecture works, and why it is the industry-standa…
Table Formats
Compaction
Learn what compaction is in Apache Iceberg, why it is essential for lakehouse performance, how Copy-…
Table Formats
Copy-on-Write (CoW)
Learn what Copy-on-Write means in Apache Iceberg, when to use CoW vs Merge-on-Read, and how CoW upda…
Table Formats
Delta Lake
Learn what Delta Lake is, how it works, how it compares to Apache Iceberg, and when to choose Delta …
Table Formats
Hidden Partitioning
Learn how Apache Iceberg hidden partitioning works, why it eliminates the need to write partition-aw…
Table Formats
Iceberg Manifest Files
Learn what Apache Iceberg manifest files are, how they store file-level statistics for data skipping…
Table Formats
Iceberg REST Catalog
Learn what the Apache Iceberg REST Catalog specification is, how it enables multi-engine catalog int…
Table Formats
Iceberg Snapshots
Learn what Apache Iceberg snapshots are, how they enable ACID transactions and time travel, and how …
Table Formats
Merge-on-Read (MoR)
Learn what Merge-on-Read means in Apache Iceberg, how delete files work, when to use MoR vs Copy-on-…
Table Formats
Partition Evolution
Learn how Apache Iceberg partition evolution works, why it solves the static partitioning problem, a…
Table Formats
Row-Level Deletes
Learn how row-level deletes work in Apache Iceberg V2, the difference between positional and equalit…
Table Formats
Schema Evolution
Learn how Apache Iceberg schema evolution works, what changes are safe vs breaking, and how to evolv…
Table Formats
Time Travel
Learn how Apache Iceberg time travel works, how to query historical snapshots by timestamp or snapsh…
Table Formats
Z-Ordering (Data Sorting)
Learn what Z-Ordering is in Apache Iceberg, how it clusters data to improve data skipping, and how t…