What Is Amazon S3?

Amazon S3 (Simple Storage Service) is AWS's flagship object storage service, launched in 2006 as one of the first cloud services. It provides effectively unlimited storage capacity at low cost and is designed for 99.999999999% (11 nines) of object durability through automatic replication across multiple Availability Zones. S3 is the most widely used object storage service in the world and the dominant storage backend for AWS-based data lakehouses.

For Apache Iceberg lakehouses on AWS, S3 is where everything lives: Parquet data files, Avro manifest files, manifest lists, table metadata JSON files, and Iceberg catalog state. Query engines (Dremio, Spark, Trino) access S3 through the S3 API — reading and writing objects using the same universal interface.
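To make that layout concrete, here is a small sketch of how an Iceberg table's objects are typically keyed under an S3 prefix. The bucket and table names are hypothetical; the `metadata/` and `data/` split follows Iceberg's standard table layout convention.

```python
# Sketch of an Iceberg table's object layout under an S3 prefix.
# Bucket, database, and table names are hypothetical.
TABLE_LOCATION = "s3://analytics-lake/warehouse/sales/orders"

def metadata_key(version: int, uuid: str) -> str:
    """Key for a table-metadata JSON file (a new one per commit)."""
    return f"{TABLE_LOCATION}/metadata/{version:05d}-{uuid}.metadata.json"

def data_key(partition: str, filename: str) -> str:
    """Key for a Parquet data file inside a partition directory."""
    return f"{TABLE_LOCATION}/data/{partition}/{filename}"

print(metadata_key(3, "9b2d"))
# s3://analytics-lake/warehouse/sales/orders/metadata/00003-9b2d.metadata.json
print(data_key("order_date=2024-01-15", "part-00000.parquet"))
```

Because metadata files are never overwritten, each commit simply PUTs new objects; engines discover the current state through the catalog, not by scanning keys.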

S3 Storage Classes for Lakehouse Data

S3 offers multiple storage classes with different cost/access-time trade-offs — enabling lakehouse cost optimization based on data access patterns:

  • S3 Standard: Default class for frequently accessed data. Millisecond access time. Highest cost per GB. Best for Bronze/Silver/Gold layer tables actively queried.
  • S3 Intelligent-Tiering: Automatically moves objects between access tiers based on usage patterns. No retrieval fee, though a small per-object monitoring charge applies. Best for data with unpredictable access patterns (older partitions that are occasionally queried).
  • S3 Standard-IA (Infrequent Access): Lower storage cost, retrieval fee per GB, and a 30-day minimum storage duration. Best for data accessed less than once per month.
  • S3 Glacier Instant Retrieval: Very low storage cost, millisecond retrieval, higher retrieval fee. Best for data retained for compliance but rarely queried.
Figure 1: S3 storage class tiering for lakehouse data — optimize costs based on partition access frequency.
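This tiering can be automated with S3 lifecycle rules rather than manual moves. Below is an illustrative lifecycle configuration in the shape expected by boto3's `put_bucket_lifecycle_configuration`; the bucket name, prefix, and day thresholds are assumptions to be tuned to your partition access patterns.

```python
# Illustrative lifecycle configuration tiering older lakehouse partitions
# down the storage classes. Prefix and day thresholds are hypothetical.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-old-partitions",
            "Status": "Enabled",
            "Filter": {"Prefix": "warehouse/sales/orders/data/"},
            "Transitions": [
                # After 90 days, let S3 manage tiering automatically.
                {"Days": 90, "StorageClass": "INTELLIGENT_TIERING"},
                # After a year, compliance-retained data moves to
                # Glacier Instant Retrieval.
                {"Days": 365, "StorageClass": "GLACIER_IR"},
            ],
        }
    ]
}

# Applying it requires boto3 and AWS credentials:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="analytics-lake", LifecycleConfiguration=lifecycle_config)
```

Note that lifecycle rules operate on object age, not query frequency; Intelligent-Tiering is the safer default when access patterns are unknown.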

S3 Strong Consistency and Iceberg

Amazon S3 achieved strong read-after-write consistency for all GET, PUT, LIST, and DELETE operations in December 2020. This was a critical milestone for Iceberg on S3: previously, Iceberg implementations had to work around S3's eventual consistency with mechanisms like DynamoDB lock tables. With strong consistency, Iceberg's native atomic commit mechanism (writing metadata and updating the catalog pointer) works correctly on S3 without any workarounds.

S3 strong consistency means: immediately after an Iceberg commit writes a new metadata file to S3, any subsequent GET of that key will return the new file. LIST operations immediately reflect new objects. This is the consistency model Iceberg's optimistic concurrency control requires.
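The commit flow this enables can be sketched in a few lines. Metadata files are immutable S3 objects (plain PUTs of new keys), so only the catalog's pointer swap must be atomic; the in-memory dict below is a stand-in for a real catalog such as Glue, Polaris, or Nessie, and the key names are hypothetical.

```python
# Minimal sketch of Iceberg's optimistic-concurrency commit on S3.
# The dict stands in for a real catalog (Glue, Polaris, Nessie).
catalog = {"db.orders": "metadata/00002-abc.metadata.json"}

def commit(table: str, expected: str, new_metadata: str) -> bool:
    """Compare-and-swap the table's current-metadata pointer.

    Returns False if another writer committed first; the caller must
    then re-read the latest metadata and retry on top of it.
    """
    if catalog[table] != expected:
        return False  # a conflicting commit won the race
    catalog[table] = new_metadata
    return True

ok = commit("db.orders",
            "metadata/00002-abc.metadata.json",
            "metadata/00003-def.metadata.json")
print(ok, catalog["db.orders"])
```

Strong consistency guarantees that once the pointer references the new metadata file, every engine that GETs that key sees the committed version immediately.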

S3 Access Control for Lakehouses

S3 provides multiple access control mechanisms for securing lakehouse data:

  • IAM Policies: Identity-based policies granting specific AWS principals (IAM users, roles) access to specific S3 buckets and object prefixes
  • Bucket Policies: Resource-based policies attached to S3 buckets, defining who can access which objects
  • S3 Block Public Access: Account and bucket-level settings preventing any public access regardless of individual object ACLs
  • AWS Lake Formation: Fine-grained table and column level access control layered on top of Glue-cataloged S3 data
  • Credential Vending: Iceberg REST catalogs (Polaris, Nessie, Glue) return short-lived STS credentials scoped to specific S3 prefixes — engines get exactly the permissions needed for specific table access
Figure 2: S3 security layers for lakehouse data — IAM, bucket policies, Lake Formation, and credential vending.
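Credential vending in particular works by attaching a prefix-scoped session policy when the catalog calls STS AssumeRole on the engine's behalf. The sketch below builds such a policy; the bucket and prefix are hypothetical, and the effective permissions are the intersection of this session policy and the assumed role's own policy.

```python
import json

# Illustrative session policy a REST catalog might attach when vending
# short-lived credentials for a single Iceberg table. Bucket and prefix
# names are hypothetical.
def table_scoped_policy(bucket: str, table_prefix: str) -> str:
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object access only under the table's own prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{table_prefix}/*",
            },
            {
                # Listing restricted to the same prefix.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{table_prefix}/*"}},
            },
        ],
    })

policy = table_scoped_policy("analytics-lake", "warehouse/sales/orders")
# A catalog would pass this as the Policy parameter of sts.assume_role(...)
```

Because the credentials expire and are scoped to one table's prefix, a compromised engine session cannot read neighboring tables in the same bucket.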

Summary

Amazon S3 is the default storage layer for AWS-based data lakehouses, combining unlimited scale, 11-nine durability, strong consistency, cost-optimized storage tiers, and the universal S3 API that every lakehouse engine supports. For organizations building on Apache Iceberg on AWS, S3 is the natural and optimal storage foundation — directly enabling the decoupled storage-and-compute architecture that makes the open lakehouse economically superior to proprietary cloud warehouses.