What Is the Agentic Lakehouse?

The Agentic Lakehouse is the architectural evolution of the data lakehouse from a human-centric analytics platform to an AI-native data platform where autonomous AI agents are first-class data consumers. In the Agentic Lakehouse, AI agents don't just assist human analysts — they autonomously discover business data, form SQL queries, execute analyses, interpret results, and produce actionable insights without requiring a human to write SQL or provide data engineering support.

The Agentic Lakehouse is enabled by three components working together: the AI Semantic Layer (providing rich business context that AI models can interpret), the MCP server (exposing data access as AI-callable tools), and governed access control (ensuring agents access only data their principal is authorized for). Together, these create a data platform where any LLM-powered agent can become an autonomous data analyst.

The AI Agent Data Access Flow

How an AI agent answers a business question using the Agentic Lakehouse:

  1. Question received: 'What are the top 5 performing product categories by revenue this quarter vs last quarter?'
  2. List datasets: Agent calls MCP list_datasets tool, discovers gold.sales_metrics with description 'Pre-computed revenue metrics by product category and time period'
  3. Get schema: Agent calls get_schema on sales_metrics, learns columns include category, quarter, revenue, yoy_growth_rate with business descriptions
  4. Form query: Agent formulates SQL: SELECT category, revenue, yoy_growth_rate FROM gold.sales_metrics WHERE quarter IN ('2026-Q1', '2025-Q4') ORDER BY revenue DESC LIMIT 5
  5. Execute and return: Agent calls execute_query, receives results, formats answer with business context — all autonomously, in seconds
Figure 1: AI agent data access flow — MCP tool calls from question to governed query to business answer.
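The five steps above can be sketched in code. The tool names (list_datasets, get_schema, execute_query) come from the flow described in this article, but the in-memory server below is a stand-in for illustration only — it is not Dremio's MCP server API, and the canned result row is invented:

```python
class MockMCPServer:
    """Illustrative stand-in for an MCP server exposing lakehouse tools."""

    DATASETS = {
        "gold.sales_metrics": {
            "description": "Pre-computed revenue metrics by product category and time period",
            "schema": {
                "category": "Product category name",
                "quarter": "Fiscal quarter, e.g. 2026-Q1",
                "revenue": "Total revenue in USD",
                "yoy_growth_rate": "Year-over-year revenue growth rate",
            },
        }
    }

    def list_datasets(self):
        # Step 2: the agent discovers datasets with business descriptions.
        return [{"name": n, "description": d["description"]}
                for n, d in self.DATASETS.items()]

    def get_schema(self, name):
        # Step 3: the agent learns column names and their descriptions.
        return self.DATASETS[name]["schema"]

    def execute_query(self, sql):
        # Step 5: a real server would run the SQL against the lakehouse
        # engine; here we return a canned row to show the response shape.
        return [{"category": "Electronics", "revenue": 1_200_000,
                 "yoy_growth_rate": 0.12}]


server = MockMCPServer()
datasets = server.list_datasets()                 # step 2: list datasets
schema = server.get_schema("gold.sales_metrics")  # step 3: get schema
sql = ("SELECT category, revenue, yoy_growth_rate "
       "FROM gold.sales_metrics "
       "WHERE quarter IN ('2026-Q1', '2025-Q4') "
       "ORDER BY revenue DESC LIMIT 5")           # step 4: form query
rows = server.execute_query(sql)                  # step 5: execute
print(rows[0]["category"])
```

The key point is that every step is an ordinary tool call: any LLM runtime that can invoke MCP tools can drive this loop without bespoke integration code.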

Governance in the Agentic Lakehouse

AI agents accessing enterprise data must respect the same governance policies as human users. In the Agentic Lakehouse, this is achieved through:

  • Agent principals: Each AI agent is assigned a principal identity in the catalog (Polaris or Dremio) with specific role assignments — an agent serving the sales team has sales data access, but not HR or financial data access
  • Catalog-layer enforcement: The MCP server calls Dremio or the Iceberg REST catalog with the agent's credentials — access control is enforced at the catalog, not at the application layer
  • Column masking: Agents receive masked values for sensitive columns according to their role's masking policies — PII is protected from agent access using the same policies that protect human analyst access
  • Audit logging: All agent query executions are logged to the same audit trail as human queries — enabling compliance review of AI agent data access
Figure 2: Governance in the Agentic Lakehouse — agent principals, catalog enforcement, masking, and audit.
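As a minimal sketch of the column-masking bullet above: the role names, policy table, and mask format below are invented for illustration — in the Agentic Lakehouse, the real enforcement happens at the catalog layer (Polaris or Dremio), not in application code like this:

```python
# Hypothetical role-to-masked-columns policy table (illustrative only;
# actual policies live in the catalog and apply to agents and humans alike).
MASKING_POLICIES = {
    "sales_agent": {"customer_email"},  # PII masked for this agent principal
    "finance_agent": set(),             # no masked columns for this role
}


def apply_masking(role, row):
    """Return a copy of the row with the role's masked columns redacted."""
    masked = dict(row)
    for col in MASKING_POLICIES.get(role, set()):
        if col in masked:
            masked[col] = "***MASKED***"
    return masked


row = {"category": "Electronics", "customer_email": "jane@example.com"}
print(apply_masking("sales_agent", row))
```

Because the same policy applies regardless of whether the principal is a human analyst or an AI agent, an agent serving the sales team sees exactly the redacted view a sales analyst would see.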

Summary

The Agentic Lakehouse represents the most significant evolution of the enterprise data platform since the introduction of cloud storage. By combining Apache Iceberg's open, governed data foundation with Dremio's AI Semantic Layer and MCP server, organizations create a data platform where AI agents become autonomous analytical contributors — answering business questions, generating insights, and preparing ML features without human SQL assistance. The Agentic Lakehouse is not a future architecture — it is the current direction of Dremio's platform development and the destination that every enterprise data lakehouse investment is building toward.