What Is the Agentic Lakehouse?

The Agentic Lakehouse is the architectural evolution of the data lakehouse from a human-centric analytics platform to an AI-native data platform where autonomous AI agents are first-class data consumers. In the Agentic Lakehouse, AI agents don't just assist human analysts — they autonomously discover business data, form SQL queries, execute analyses, interpret results, and produce actionable insights without requiring a human to write SQL or provide data engineering support.

The Agentic Lakehouse is enabled by three components working together: the AI Semantic Layer (providing rich business context that AI models can interpret), the MCP server (exposing data access as AI-callable tools), and governed access control (ensuring agents access only data their principal is authorized for). Together, these create a data platform where any LLM-powered agent can become an autonomous data analyst.

The AI Agent Data Access Flow

How an AI agent answers a business question using the Agentic Lakehouse:

  1. Question received: 'What are the top 5 performing product categories by revenue this quarter vs last quarter?'
  2. List datasets: Agent calls MCP list_datasets tool, discovers gold.sales_metrics with description 'Pre-computed revenue metrics by product category and time period'
  3. Get schema: Agent calls get_schema on sales_metrics, learns columns include category, quarter, revenue, yoy_growth_rate with business descriptions
  4. Form query: Agent formulates SQL: SELECT category, revenue, yoy_growth_rate FROM gold.sales_metrics WHERE quarter IN ('2026-Q1', '2025-Q4') ORDER BY revenue DESC LIMIT 5
  5. Execute and return: Agent calls execute_query, receives results, formats answer with business context — all autonomously, in seconds
Figure 1: AI agent data access flow — MCP tool calls from question to governed query to business answer.
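The five steps above can be sketched in code. The tool names (list_datasets, get_schema, execute_query) come from the flow described in this article, but the in-memory server below is a stand-in for illustration only — it is not Dremio's MCP server API, and the canned result row is invented:

```python
class MockMCPServer:
    """Illustrative stand-in for an MCP server exposing lakehouse tools."""

    DATASETS = {
        "gold.sales_metrics": {
            "description": "Pre-computed revenue metrics by product category and time period",
            "schema": {
                "category": "Product category name",
                "quarter": "Fiscal quarter, e.g. 2026-Q1",
                "revenue": "Total revenue in USD",
                "yoy_growth_rate": "Year-over-year revenue growth rate",
            },
        }
    }

    def list_datasets(self):
        # Step 2: the agent discovers datasets with business descriptions.
        return [{"name": n, "description": d["description"]}
                for n, d in self.DATASETS.items()]

    def get_schema(self, name):
        # Step 3: the agent learns column names and their descriptions.
        return self.DATASETS[name]["schema"]

    def execute_query(self, sql):
        # Step 5: a real server would run the SQL against the lakehouse
        # engine; here we return a canned row to show the response shape.
        return [{"category": "Electronics", "revenue": 1_200_000,
                 "yoy_growth_rate": 0.12}]


server = MockMCPServer()
datasets = server.list_datasets()                 # step 2: list datasets
schema = server.get_schema("gold.sales_metrics")  # step 3: get schema
sql = ("SELECT category, revenue, yoy_growth_rate "
       "FROM gold.sales_metrics "
       "WHERE quarter IN ('2026-Q1', '2025-Q4') "
       "ORDER BY revenue DESC LIMIT 5")           # step 4: form query
rows = server.execute_query(sql)                  # step 5: execute
print(rows[0]["category"])
```

The key point is that every step is an ordinary tool call: any LLM runtime that can invoke MCP tools can drive this loop without bespoke integration code.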

Governance in the Agentic Lakehouse

AI agents accessing enterprise data must respect the same governance policies as human users. In the Agentic Lakehouse, this is achieved through:

  • Agent principals: Each AI agent is assigned a principal identity in the catalog (Polaris or Dremio) with specific role assignments — an agent serving the sales team has sales data access, but not HR or financial data access
  • Catalog-layer enforcement: The MCP server calls Dremio or the Iceberg REST catalog with the agent's credentials — access control is enforced at the catalog, not at the application layer
  • Column masking: Agents receive masked values for sensitive columns according to their role's masking policies — PII is protected from agent access using the same policies that protect human analyst access
  • Audit logging: All agent query executions are logged to the same audit trail as human queries — enabling compliance review of AI agent data access
Figure 2: Governance in the Agentic Lakehouse — agent principals, catalog enforcement, masking, and audit.
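As a minimal sketch of the column-masking bullet above: the role names, policy table, and mask format below are invented for illustration — in the Agentic Lakehouse, the real enforcement happens at the catalog layer (Polaris or Dremio), not in application code like this:

```python
# Hypothetical role-to-masked-columns policy table (illustrative only;
# actual policies live in the catalog and apply to agents and humans alike).
MASKING_POLICIES = {
    "sales_agent": {"customer_email"},  # PII masked for this agent principal
    "finance_agent": set(),             # no masked columns for this role
}


def apply_masking(role, row):
    """Return a copy of the row with the role's masked columns redacted."""
    masked = dict(row)
    for col in MASKING_POLICIES.get(role, set()):
        if col in masked:
            masked[col] = "***MASKED***"
    return masked


row = {"category": "Electronics", "customer_email": "jane@example.com"}
print(apply_masking("sales_agent", row))
```

Because the same policy applies regardless of whether the principal is a human analyst or an AI agent, an agent serving the sales team sees exactly the redacted view a sales analyst would see.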

Summary

The Agentic Lakehouse represents the most significant evolution of the enterprise data platform since the introduction of cloud storage. By combining Apache Iceberg's open, governed data foundation with Dremio's AI Semantic Layer and MCP server, organizations create a data platform where AI agents become autonomous analytical contributors — answering business questions, generating insights, and preparing ML features without human SQL assistance. The Agentic Lakehouse is not a future architecture — it is the current direction of Dremio's platform development and the destination that every enterprise data lakehouse investment is building toward.