What Are Virtual Datasets?
Virtual Datasets (VDS) are Dremio's primary semantic layer building block. A VDS is a named, saved SQL query that defines a clean, business-friendly transformation of one or more underlying data sources — Iceberg tables, other VDSs, or federated sources. It appears as a named table in Dremio's catalog, queryable by any BI tool or SQL client exactly like a physical table.
The key property of a VDS is that it is virtual: no data is materialized or moved. The VDS definition is a SQL query that executes against the underlying sources at runtime, so a VDS queried directly reflects the current state of the underlying data, without any refresh lag.
VDSs enable the decoupling of the physical data model (how data is stored in Iceberg) from the analytical data model (what business users see). Data engineers create and maintain VDSs that apply business logic, join tables, rename columns to business terminology, and filter sensitive data — while BI analysts query the clean, consistent, business-friendly views without needing to understand the raw Iceberg table structure.
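As a rough illustration of the "virtual" property, the sketch below uses a plain SQLite view as a stand-in for a VDS. This is an assumption made for the sake of a runnable example: it is not Dremio itself, and the table and column names (raw_orders, cust_nm, and so on) are hypothetical. The view stores only its query, so a row added to the underlying table shows up on the next read, and consumers see business-friendly column names instead of the raw ones.

```python
import sqlite3

# SQLite views stand in for Dremio VDSs; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (ord_id INTEGER, cust_nm TEXT, amt_usd REAL)")
conn.execute("INSERT INTO raw_orders VALUES (1, 'Acme', 120.0)")

# The view stores only the query text, never the rows themselves.
conn.execute("""
    CREATE VIEW orders AS
    SELECT ord_id  AS order_id,
           cust_nm AS customer_name,
           amt_usd AS amount_usd
    FROM raw_orders
""")

def count_orders(connection):
    """Count rows through the view, exactly as a BI client would."""
    return connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

before = count_orders(conn)
conn.execute("INSERT INTO raw_orders VALUES (2, 'Globex', 75.5)")
after = count_orders(conn)  # the new row is visible without any refresh step

# Column names exposed to consumers come from the view, not the raw table.
view_columns = [d[0] for d in conn.execute("SELECT * FROM orders").description]
print(before, after, view_columns)
```

The raw table can later be renamed or restructured by changing only the view definition, which is the decoupling described above.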
VDS as a Semantic Layer
Virtual Datasets form the foundation of Dremio's semantic layer. A well-designed VDS hierarchy typically has multiple levels:
Foundation VDS (Base Layer)
Direct, light-touch views of Iceberg tables that add column aliases, apply basic data type casts, and filter PII columns. These VDSs provide a clean, governed starting point that hides raw storage complexity.
Business Logic VDS (Middle Layer)
VDSs that join foundation VDSs, apply business calculations (revenue = quantity × unit_price), implement business rules (exclude cancelled orders), and expose business metrics (LTV, churn rate, conversion rate).
BI-Ready VDS (Presentation Layer)
VDSs optimized for specific BI tools or business domains — a 'Sales Dashboard VDS' that pre-joins all tables needed for the sales dashboard, named with business-friendly column names exactly as the BI tool expects them.
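The three layers can be sketched as a chain of views, each reading only from the layer below it. As before, SQLite views are used as an assumed stand-in for VDSs, and every table, view, and column name is hypothetical. The business rules shown (revenue as quantity times unit price, cancelled orders excluded) are the ones named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw storage with terse names and a PII column (hypothetical schema).
    CREATE TABLE raw_orders (
        ord_id INTEGER, cust_email TEXT, qty INTEGER, unit_prc REAL, status TEXT
    );
    INSERT INTO raw_orders VALUES
        (1, 'a@example.com', 2, 10.0, 'shipped'),
        (2, 'b@example.com', 1, 99.0, 'cancelled'),
        (3, 'c@example.com', 5,  4.0, 'shipped');

    -- Foundation layer: aliases and casts; the PII column is left out.
    CREATE VIEW foundation_orders AS
    SELECT ord_id AS order_id,
           qty AS quantity,
           CAST(unit_prc AS REAL) AS unit_price,
           status
    FROM raw_orders;

    -- Business logic layer: calculations and rules on top of the foundation.
    CREATE VIEW business_orders AS
    SELECT order_id,
           quantity * unit_price AS revenue
    FROM foundation_orders
    WHERE status <> 'cancelled';

    -- Presentation layer: shaped and named for one dashboard.
    CREATE VIEW sales_dashboard AS
    SELECT COUNT(*)     AS "Order Count",
           SUM(revenue) AS "Total Revenue"
    FROM business_orders;
""")

dashboard_row = conn.execute("SELECT * FROM sales_dashboard").fetchone()
foundation_cols = [
    d[0] for d in conn.execute("SELECT * FROM foundation_orders").description
]
print(dashboard_row, foundation_cols)
```

Because each layer depends only on the one beneath it, a change to the revenue rule is made once in the middle layer and every presentation view picks it up.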

Reflections on Virtual Datasets
One of the most powerful Dremio patterns is combining VDSs with Reflections. A Reflection defined on a VDS pre-computes the VDS's transformation and stores the result as an optimized Iceberg table, which is kept up to date according to the Reflection's refresh policy. When queries arrive against the VDS, Dremio's optimizer can route them to the Reflection rather than re-executing the transformation at runtime.
This enables a pattern where complex VDS transformations (multi-table joins, window functions, business metric calculations) are pre-computed once and reused across all subsequent queries — combining the semantic clarity of VDSs with the performance of materialized views, with zero query rewrite required.
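The substitution idea can be emulated in a few lines. This is a toy model under stated assumptions, not Dremio's optimizer or its Reflection API: SQLite stands in for the engine, the materializations registry and the refresh and query helpers are hypothetical, and routing is a simple dictionary lookup rather than cost-based plan matching. What it does show is that callers keep naming the view while the engine decides where the rows come from, and that a precomputed copy is only as fresh as its last refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, region TEXT, quantity INTEGER, unit_price REAL);
    INSERT INTO raw_orders VALUES (1, 'EU', 2, 10.0), (2, 'EU', 1, 5.0), (3, 'US', 4, 2.5);

    CREATE VIEW revenue_by_region AS
    SELECT region, SUM(quantity * unit_price) AS revenue
    FROM raw_orders
    GROUP BY region;
""")

# Hypothetical registry: view name -> table holding its precomputed result.
materializations = {}

def refresh(view_name):
    """Run the view's query once and store the rows in a physical table."""
    table = f"{view_name}__materialized"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} AS SELECT * FROM {view_name}")
    materializations[view_name] = table

def query(view_name):
    """Callers always name the view; the substitution happens in here."""
    source = materializations.get(view_name, view_name)
    rows = conn.execute(
        f"SELECT region, revenue FROM {source} ORDER BY region"
    ).fetchall()
    return source, rows

source_before, rows_before = query("revenue_by_region")  # runs the aggregation
refresh("revenue_by_region")
source_after, rows_after = query("revenue_by_region")    # reads stored rows

# A row written after the refresh is invisible until the next refresh.
conn.execute("INSERT INTO raw_orders VALUES (4, 'US', 1, 100.0)")
_, rows_stale = query("revenue_by_region")
refresh("revenue_by_region")
_, rows_fresh = query("revenue_by_region")
print(source_before, source_after, rows_stale, rows_fresh)
```

The calling code is identical before and after the materialization exists, which is the "no query rewrite" property. The stale read in the middle is the trade-off this emulation makes for speed.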

Summary
Virtual Datasets are the semantic layer foundation of Dremio's lakehouse platform. By decoupling the physical Iceberg data model from the analytical data model that business users see, VDSs enable self-service analytics: analysts work with clean, governed, business-friendly views while data engineers maintain the underlying transformations. Combined with Reflections, VDSs can provide both semantic clarity and interactive query performance, a combination that helps the open data lakehouse compete with proprietary BI-optimized cloud warehouses.