What Is SQL Analytics in the Lakehouse?

SQL analytics is the practice of using Structured Query Language to query, transform, and analyze data in the data lakehouse. SQL is the universal interface for data — it is the language that business analysts, data engineers, data scientists, and AI agents all use to interact with Apache Iceberg tables, regardless of which underlying engine executes the query.

Every major lakehouse query engine provides a SQL interface: Dremio offers ANSI SQL with extensions for Iceberg management and Reflection-accelerated analytics; Spark SQL provides Python and Scala-integrated SQL for batch ETL; Trino delivers distributed ANSI SQL for federated queries; Flink SQL provides streaming SQL for real-time pipeline definition. SQL's universality means analytical skills transfer across engines — a DML pattern learned in Dremio works in Trino.

Key SQL Patterns for Lakehouse Analytics

The most important SQL patterns for lakehouse analytics:

Window Functions

Essential for time-series analysis, ranking, and cumulative calculations:
SELECT customer_id, order_date, revenue, SUM(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS cumulative_ltv FROM gold.orders;

CTEs (Common Table Expressions)

Named subqueries that improve complex query readability and reusability:
WITH active_customers AS (SELECT customer_id FROM silver.customers WHERE status = 'active'), revenue AS (SELECT customer_id, SUM(order_value) as ltv FROM gold.orders GROUP BY 1) SELECT a.customer_id, r.ltv FROM active_customers a JOIN revenue r ON a.customer_id = r.customer_id;

Iceberg Time Travel SQL

SELECT * FROM silver.orders FOR TIMESTAMP AS OF TIMESTAMP '2026-01-01 00:00:00';

MERGE INTO for Upserts

Efficient CDC application to current-state Silver tables without full table rewrites.

SQL Analytics Lakehouse Patterns diagram
Figure 1: Key SQL patterns — window functions, CTEs, MERGE INTO, and time travel for lakehouse analytics.

Dremio SQL Extensions

Dremio extends ANSI SQL with lakehouse-specific capabilities:

  • OPTIMIZE TABLE: Triggers compaction and Z-order layout optimization on Iceberg tables
  • ALTER TABLE ... REFRESH REFLECTIONS: Forces Reflection refresh outside the scheduled window
  • CREATE VDS AS: Creates a Virtual Dataset from a SQL SELECT statement, adding it to the semantic layer
  • AI_COMPLETE, AI_CLASSIFY, AI_GENERATE: AI functions for LLM-powered transformations directly in SQL queries
  • AT BRANCH/TAG: Iceberg time travel via Nessie branch or tag name rather than timestamp
Dremio SQL Extensions for Lakehouse diagram
Figure 2: Dremio SQL extensions — Iceberg management, Reflection control, and AI SQL functions.

Summary

SQL analytics is the universal interface of the data lakehouse — the common language connecting data consumers of every type (analysts, engineers, AI agents) to Apache Iceberg data through any compliant engine. Understanding key SQL patterns — window functions, CTEs, MERGE INTO, time travel — combined with engine-specific extensions like Dremio's AI functions and VDS creation, gives data teams the complete SQL toolkit for lakehouse analytics at any scale.