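Apache ORC (Optimized Row Columnar) is an open-source, columnar storage file format developed for the Apache Hive ecosystem. Like Parquet, ORC stores data column-by-column with statistics for predicate pushdown. ORC is the native format for Hive ACID tables and carries lightweight built-in indexes (min/max statistics at the file, stripe, and row-group level) plus optional per-column bloom filters. Recent Parquet versions offer comparable page indexes and bloom filters, so the practical difference lies mostly in defaults and engine support.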

Does Apache Iceberg support ORC?

Yes. Apache Iceberg supports ORC as a data file format alongside Parquet and Avro. Tables can be created with ORC format using the 'write.format.default=orc' table property. ORC support in Iceberg is primarily useful for organizations migrating from Hive ORC tables to Iceberg without converting file formats.
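
As a rough illustration of the table property mentioned above, the sketch below builds a Spark SQL CREATE TABLE statement that sets 'write.format.default' to 'orc'. The helper name create_orc_table_sql and the table name demo.db.events are hypothetical, and the statement assumes a Spark session with an Iceberg catalog already configured.

```python
from __future__ import annotations


def create_orc_table_sql(table: str, columns: dict[str, str]) -> str:
    """Build a Spark SQL statement for an Iceberg table whose default write format is ORC."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table} ({cols}) "
        "USING iceberg "
        "TBLPROPERTIES ('write.format.default'='orc')"
    )


# Hypothetical catalog, namespace, and table names (assumptions for illustration).
sql = create_orc_table_sql("demo.db.events", {"id": "bigint", "payload": "string"})
print(sql)

# With an Iceberg-enabled Spark session available, the statement could be run as:
# spark.sql(sql)
```

The helper only assembles the statement text, so it can be inspected or logged before anything is executed against a catalog.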

Should I use ORC or Parquet for new Iceberg tables?

For new Iceberg tables, Parquet is the recommended default. Parquet has broader query engine support, is more heavily optimized in most engines' Iceberg readers, and has stronger ecosystem tooling. ORC is best when migrating existing Hive ORC tables to Iceberg format to avoid the cost of file conversion.

Apache ORC: The Definitive Guide

What Is Apache ORC?

Apache ORC (Optimized Row Columnar) is an open-source, columnar storage file format created at Hortonworks in 2013 specifically to address performance limitations of the original Hive RCFile format. ORC became the standard file format for Apache Hive — especially for Hive ACID tables (which require ORC for row-level insert/update/delete support in Hive) — and was widely adopted in the Hadoop ecosystem.

ORC, like Apache Parquet, stores data column-by-column with column-level statistics and compression. Both are columnar formats optimized for analytical workloads; they share the core design philosophy but differ in implementation details — particularly in their index structures, ACID capabilities, and ecosystem support.
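
To make the idea of column-level statistics concrete, here is a minimal, format-agnostic sketch in plain Python. It is not the ORC or Parquet implementation; the names ColumnChunk, to_columnar, and scan_greater_than are invented for illustration. Rows are split into stripes, each column is stored separately with its min and max, and a scan skips any stripe whose maximum cannot satisfy the predicate.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """Values of one column within one stripe, plus min/max statistics."""
    values: list
    minimum: object
    maximum: object


def to_columnar(rows, stripe_size):
    """Split row dicts into stripes; each stripe holds one chunk per column."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        batch = rows[start:start + stripe_size]
        stripe = {}
        for name in batch[0]:
            column = [row[name] for row in batch]
            stripe[name] = ColumnChunk(column, min(column), max(column))
        stripes.append(stripe)
    return stripes


def scan_greater_than(stripes, column, threshold):
    """Return values > threshold and the number of stripes skipped via statistics."""
    matches, skipped = [], 0
    for stripe in stripes:
        chunk = stripe[column]
        if chunk.maximum <= threshold:
            # No value in this stripe can match, so its data is never read.
            skipped += 1
            continue
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, skipped


rows = [{"id": i, "amount": i * 10} for i in range(1, 9)]
stripes = to_columnar(rows, stripe_size=4)
matches, skipped = scan_greater_than(stripes, "amount", 40)
print(matches, skipped)  # [50, 60, 70, 80] 1
```

The first stripe holds amounts 10 through 40, so its maximum rules it out for the predicate amount > 40 and only the second stripe is scanned.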

ORC vs. Parquet

Feature	Apache ORC	Apache Parquet
Column storage	Yes	Yes
Per-column stats	Yes (count/min/max/sum/has-null)	Yes (min/max/null count)
Bloom filter index	Built-in (opt-in per column)	Supported (opt-in per column)
Nested types	Yes	Yes (stronger ecosystem)
Hive ACID support	Yes (native)	No
Iceberg support	Yes (non-default)	Yes (default)
Ecosystem breadth	Hive, Spark, Trino	Universal
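
The bloom filter row in the table can be illustrated with a small sketch. This is a generic bloom filter written for illustration, not the hash scheme or layout that ORC or Parquet actually use; the class BloomFilter and the helper row_groups_to_read are hypothetical names. A reader keeps one filter per row group and only reads groups whose filter says the value might be present.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def row_groups_to_read(filters, value):
    """Indexes of row groups that cannot be ruled out for an equality predicate."""
    return [i for i, f in enumerate(filters) if f.might_contain(value)]


# Two row groups of user ids (made-up sample data).
row_groups = [["alice", "bob"], ["carol", "dave"]]
filters = []
for group in row_groups:
    bloom = BloomFilter()
    for user in group:
        bloom.add(user)
    filters.append(bloom)

candidates = row_groups_to_read(filters, "carol")
print(candidates)
```

A filter can report a value as possibly present when it is not, so the candidate list may include extra row groups, but it never omits the group that really contains the value.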

ORC in Apache Iceberg

Apache Iceberg supports ORC as an alternative data file format. Tables configured with ORC format use ORC files for data storage while retaining all of Iceberg's metadata features: snapshots, time travel, schema evolution, and partition evolution all work identically with ORC files. Iceberg populates the per-file metrics in its manifests (value counts, null counts, and lower/upper bounds) from ORC file statistics, just as it does for Parquet.

ORC support in Iceberg is primarily a migration enabler: organizations with existing Hive ORC tables can migrate to Iceberg format (gaining snapshots, ACID, and schema evolution) without converting existing ORC data files to Parquet. New files can be written as Parquet while historical files remain ORC. Because Iceberg records the format of each data file in the table metadata, a single table can hold both formats and still be read correctly.
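
The mixed-format behaviour can be pictured with a small model. This is a simplified sketch, not Iceberg's reader code: DataFile, READERS, and plan_scan are hypothetical names, the paths are made up, and the only detail taken from Iceberg itself is that each data file entry carries its own file format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Simplified stand-in for a data file entry tracked in table metadata."""
    path: str
    file_format: str  # "ORC", "PARQUET", or "AVRO"
    record_count: int


# Hypothetical reader names, one per supported format.
READERS = {"ORC": "orc_reader", "PARQUET": "parquet_reader", "AVRO": "avro_reader"}


def plan_scan(files):
    """Pair each data file with the reader matching its recorded format."""
    plan = []
    for data_file in files:
        fmt = data_file.file_format.upper()
        if fmt not in READERS:
            raise ValueError(f"unsupported file format: {data_file.file_format}")
        plan.append((READERS[fmt], data_file.path))
    return plan


# Historical files migrated from Hive stay ORC; newer writes are Parquet.
files = [
    DataFile("warehouse/events/part-0001.orc", "ORC", 1000),
    DataFile("warehouse/events/part-0002.orc", "ORC", 1200),
    DataFile("warehouse/events/part-0003.parquet", "PARQUET", 900),
]

plan = plan_scan(files)
format_counts = Counter(f.file_format for f in files)
print(plan)
print(format_counts)
```

In Spark SQL, switching the format used for new writes is typically done with ALTER TABLE ... SET TBLPROPERTIES ('write.format.default'='parquet'); files already in the table are left as they are.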

Figure 2: ORC in Iceberg. Migration path for Hive ORC tables to Iceberg without file format conversion.

Summary

Apache ORC is the columnar format of the Hadoop/Hive era — highly optimized, with built-in bloom filter indexes and native Hive ACID support. In the modern data lakehouse, Apache Parquet is the dominant format due to its broader engine support and Iceberg ecosystem maturity. ORC remains relevant for Hive legacy migrations and environments where Hive ACID semantics are still required. For new Iceberg tables, Parquet is the clear default choice.
