Presto is an open-source, distributed SQL query engine originally developed at Facebook (now Meta) for interactive analytics across heterogeneous data sources at petabyte scale. It is now governed by the Presto Foundation (a Linux Foundation project) and is distinct from Trino, the fork that began as PrestoSQL in 2019 and was renamed Trino in December 2020.

What is the difference between Presto and Trino?

Presto and Trino share a common codebase origin but have diverged since the fork (started as PrestoSQL in 2019, renamed Trino in 2020). Presto (Presto Foundation) is maintained primarily by Meta engineers and is optimized for Meta's scale. Trino has a larger open-source community, more frequent releases, and stronger Iceberg REST catalog support. Both support Apache Iceberg.

Who uses Presto in production?

Presto is used at massive scale by Meta (Facebook/Instagram/WhatsApp analytics), Uber (trip and driver analytics), Twitter/X, Airbnb, and other large internet companies. These organizations run Presto clusters with thousands of nodes querying petabytes of data.

Presto: The Definitive Guide for Data Lakehouse

What Is Presto?

Presto is an open-source, distributed SQL query engine originally developed at Facebook (now Meta) in 2012 and open-sourced in 2013. It was designed to enable interactive SQL analytics across Meta's massive data infrastructure (petabytes of data in HDFS, Hive tables, MySQL databases, and proprietary stores), aiming for sub-minute latency on typical analytical queries even as data volumes grow.

Presto's architectural innovation was the MPP (massively parallel processing) SQL execution model applied to federated data sources: a single Presto cluster with a Coordinator and hundreds of Workers can query simultaneously across HDFS, S3, relational databases, and other sources using standard SQL, without requiring data to be centralized first.
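To make the federated model concrete, here is a minimal sketch of what such a query can look like. Presto addresses tables as catalog.schema.table, so one statement can join tables from different sources. The catalog names (hive, mysql), the schemas, tables, and columns below are all hypothetical, and the helper functions only build the SQL text; they do not connect to a cluster.

```python
# Hypothetical catalog, schema, and table names; substitute whatever
# catalogs your own cluster is configured with.
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified catalog.schema.table name."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql() -> str:
    """Build one SQL statement that joins a Hive table with a MySQL table."""
    events = qualified("hive", "web", "page_views")  # files on HDFS or S3
    users = qualified("mysql", "crm", "users")       # rows in a MySQL database
    return (
        "SELECT u.country, count(*) AS views\n"
        f"FROM {events} AS e\n"
        f"JOIN {users} AS u ON e.user_id = u.id\n"
        "WHERE e.view_date >= DATE '2025-01-01'\n"
        "GROUP BY u.country\n"
        "ORDER BY views DESC"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The resulting text could be submitted through any Presto client. The point of the sketch is only that both sources appear in the same FROM clause, with no prior copy of the MySQL data into the lake.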

In 2019, the Presto open-source community split into two projects: Presto (now governed by the Presto Foundation and maintained primarily by Meta engineers) and PrestoSQL (maintained by an independent community and renamed Trino in December 2020). Both projects share the original Presto architecture but have diverged in implementation details and ecosystem direction.

Presto Architecture

Presto's architecture follows the MPP model and has influenced many later distributed SQL engines:

Coordinator: Receives SQL queries, parses and plans them using the Presto planner and cost-based optimizer, then distributes execution stages to Workers
Workers: Execute plan fragments in parallel, reading from data sources via Connectors and passing intermediate data to other Workers through exchanges (shuffles)
Connectors: Plugin interfaces that translate SQL operations into source-specific reads, for example Iceberg, Hive, JDBC, and Kafka
Memory management: When spilling is enabled, Presto can spill intermediate data to disk for memory-intensive operations (large aggregations, sorts), allowing some queries to complete that would otherwise exceed available cluster memory
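The division of labor described above can be illustrated with a toy, single-process simulation of a GROUP BY count. This is a sketch of the general MPP pattern (splits, partial aggregation, hash exchange, final aggregation), not Presto's actual implementation; every function name here is made up for illustration.

```python
from collections import Counter, defaultdict


def make_splits(rows, num_splits):
    """Coordinator role: divide the input rows into roughly equal splits."""
    return [rows[i::num_splits] for i in range(num_splits)]


def partial_aggregate(split):
    """Worker role, first stage: count rows per key within one split."""
    return Counter(key for key, _ in split)


def exchange(partials, num_workers):
    """Exchange: route every partial count for a key to one worker by hash."""
    buckets = [defaultdict(list) for _ in range(num_workers)]
    for partial in partials:
        for key, count in partial.items():
            buckets[hash(key) % num_workers][key].append(count)
    return buckets


def final_aggregate(bucket):
    """Worker role, second stage: sum the partial counts for each key."""
    return {key: sum(counts) for key, counts in bucket.items()}


def run_query(rows, num_workers=3):
    """Simulate SELECT key, count(*) ... GROUP BY key across workers."""
    splits = make_splits(rows, num_workers)
    partials = [partial_aggregate(split) for split in splits]
    result = {}
    for bucket in exchange(partials, num_workers):
        result.update(final_aggregate(bucket))
    return result


if __name__ == "__main__":
    rows = [("US", 1), ("DE", 2), ("US", 3), ("IN", 4), ("DE", 5), ("US", 6)]
    print(run_query(rows))
```

Because each key is routed to exactly one bucket in the exchange step, the final stage can sum partial counts independently per worker, which is what lets the work scale out across nodes.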

Figure 1: Presto's MPP architecture. The Coordinator distributes SQL execution across Worker nodes.

Presto and Apache Iceberg

Presto supports Apache Iceberg through the Iceberg connector. Presto's Iceberg support covers: reading Iceberg V1 and V2 tables, partition pruning via hidden partitioning, time travel queries, and basic write operations. Presto's Iceberg catalog support includes Hive Metastore and REST catalogs.
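As a rough illustration of partition pruning with hidden partitioning, the toy model below assumes a table partitioned by a day transform of a timestamp column. A filter on the raw timestamp is mapped through the same transform, and only files in matching partitions are kept. The file list and function names are hypothetical, and real Iceberg metadata (manifests, partition specs) is considerably richer than this.

```python
from datetime import date, datetime


def day_transform(ts: datetime) -> date:
    """Partition transform: map a timestamp to its calendar day."""
    return ts.date()


# Hypothetical file listing: (partition value, data file path).
DATA_FILES = [
    (date(2025, 1, 1), "data/a.parquet"),
    (date(2025, 1, 2), "data/b.parquet"),
    (date(2025, 1, 3), "data/c.parquet"),
]


def prune(files, ts_from: datetime, ts_to: datetime):
    """Keep files whose day partition may hold rows in [ts_from, ts_to]."""
    lo, hi = day_transform(ts_from), day_transform(ts_to)
    return [path for day, path in files if lo <= day <= hi]


if __name__ == "__main__":
    # The filter is written against the timestamp, not a partition column.
    print(prune(DATA_FILES, datetime(2025, 1, 2, 8), datetime(2025, 1, 2, 20)))
```

The query author filters only on the timestamp column. The engine derives the partition range from the table's transform, which is why the partitioning is called hidden.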

Compared to Trino, Presto's Iceberg connector is somewhat less feature-complete for writing (particularly for complex V2 operations), though both engines provide production-grade Iceberg reads. For organizations primarily using Presto as a read engine against Iceberg tables (written by Spark or Flink), the difference is minimal.

Presto vs. Trino: Which to Choose?

For organizations starting fresh in 2025, the guidance is:

Choose Trino if: you want the most active open-source community, the fastest release cadence, the strongest Iceberg REST catalog support, or you are using Starburst Galaxy as a managed service
Choose Presto if: your organization already runs Presto at scale and the migration cost outweighs the benefits, or if you are at Meta-scale and need Presto's spill-to-disk and native Meta optimizations
Consider Amazon Athena if: you are on AWS and want serverless Trino/Presto-compatible SQL without any cluster management

Figure 2: Presto vs. Trino selection criteria for modern lakehouse deployments.

Summary

Presto pioneered distributed federated SQL analytics and remains a production-grade engine at some of the world's largest scale deployments. For most new data lakehouse projects in 2025, Trino is the preferred choice given its more active community and stronger Iceberg ecosystem. For organizations already running Presto or operating at Meta-equivalent scale, Presto's battle-tested architecture and performance remain compelling. Both engines are part of a healthy lakehouse ecosystem alongside Dremio (BI analytics) and Spark (batch ETL).

