What Is Apache ORC?
Apache ORC (Optimized Row Columnar) is an open-source columnar storage file format created at Hortonworks in 2013 to address the performance limitations of Hive's original RCFile format. ORC became the standard file format for Apache Hive, especially for Hive ACID tables: full ACID tables, which support row-level update and delete, must be stored as ORC. The format was widely adopted across the Hadoop ecosystem.
ORC, like Apache Parquet, stores data column-by-column with column-level statistics and compression. Both are columnar formats optimized for analytical workloads; they share the core design philosophy but differ in implementation details — particularly in their index structures, ACID capabilities, and ecosystem support.
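To make the role of column-level statistics concrete, here is a minimal pure-Python sketch of how a reader can use per-column min/max values to skip whole chunks of a file. The `Stripe` class and `scan_greater_than` function are hypothetical names for illustration, not part of any ORC library, and real files store this information in a binary footer and index rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    """Hypothetical stand-in for one chunk of a columnar file."""
    columns: dict  # column name -> list of values
    stats: dict = field(init=False)  # column name -> (min, max)

    def __post_init__(self):
        # A real writer records statistics like these while writing the file.
        self.stats = {
            name: (min(values), max(values))
            for name, values in self.columns.items()
        }


def scan_greater_than(stripes, column, threshold):
    """Return values of `column` above `threshold` and the number of stripes skipped."""
    matches, skipped = [], 0
    for stripe in stripes:
        _, col_max = stripe.stats[column]
        if col_max <= threshold:
            # No value in this stripe can match, so its data is never read.
            skipped += 1
            continue
        matches.extend(v for v in stripe.columns[column] if v > threshold)
    return matches, skipped


stripes = [
    Stripe({"order_id": [1, 2, 3], "amount": [10, 25, 40]}),
    Stripe({"order_id": [4, 5, 6], "amount": [55, 70, 90]}),
    Stripe({"order_id": [7, 8, 9], "amount": [15, 20, 30]}),
]

matches, skipped = scan_greater_than(stripes, "amount", 50)
print(matches, skipped)  # [55, 70, 90] 2
```

Only the `amount` column of the middle stripe is actually scanned. The same idea applies in both ORC and Parquet, although the granularity and on-disk layout of the statistics differ between the two formats.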
ORC vs. Parquet
| Feature | Apache ORC | Apache Parquet |
|---|---|---|
| Column storage | Yes | Yes |
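| Per-column stats | Yes (count, min/max, sum, null presence) | Yes (min/max, null count) |
| Bloom filter index | Yes (opt-in per column via `orc.bloom.filter.columns`) | Yes (opt-in per column; requires a recent writer) |
| Nested types | Yes | Yes (more widely supported across engines) |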
| Hive ACID support | Yes (native) | No |
| Iceberg support | Yes (non-default) | Yes (default) |
| Ecosystem breadth | Hive, Spark, Trino, Flink | Nearly all engines and data libraries |

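The bloom filter row in the table deserves a short illustration. Min/max statistics help little for equality lookups on unsorted, high-cardinality columns, because almost every chunk's range contains the searched value. A bloom filter answers "definitely absent" or "possibly present" per chunk instead. The sketch below is a toy model under stated assumptions: the `BloomFilter` class, its SHA-256 based hashing, and the sample values are all illustrative and do not reproduce the hash functions or encoding that ORC or Parquet actually use.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def candidate_groups(filters, key):
    """Return indexes of row groups that may contain `key`; the rest can be skipped."""
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(key)]


# Hypothetical row groups of a high-cardinality column such as a user ID.
row_groups = [
    ["user-17", "user-903", "user-42"],
    ["user-5", "user-388", "user-1024"],
]

filters = []
for values in row_groups:
    bloom = BloomFilter()
    for value in values:
        bloom.add(value)
    filters.append(bloom)

# Row group 1 is always among the candidates because it really holds the key.
print(candidate_groups(filters, "user-388"))
```

A row group that is missing from the result is guaranteed not to contain the key, so the reader can skip it. A row group that appears in the result still has to be read and filtered, since the filter may report a false positive.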
ORC in Apache Iceberg
Apache Iceberg supports ORC as an alternative data file format, selected by setting the `write.format.default` table property to `orc`. Such tables store their data in ORC files while retaining all of Iceberg's metadata features: snapshots, time travel, schema evolution, and partition evolution behave the same way as they do with Parquet. Iceberg also fills the per-file column metrics in its manifests (value counts, null counts, lower and upper bounds) from ORC file statistics, just as it does for Parquet.
ORC support in Iceberg is primarily a migration enabler: organizations with existing Hive ORC tables can migrate to Iceberg (gaining snapshots, ACID, and schema evolution) without converting their existing ORC data files to Parquet. New files can be written as Parquet while historical files remain ORC. Because each manifest entry records the format of its data file, Iceberg reads such mixed-format tables correctly.
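As a rough sketch of what this looks like in practice, the helpers below build the Spark SQL statements that create an Iceberg table writing ORC and later switch new writes to Parquet. The `write.format.default` table property is Iceberg's own; the function names and the table name `lake.db.events` are hypothetical. Running the statements is assumed to require a Spark session with an Iceberg catalog already configured, which is not shown here.

```python
VALID_FORMATS = {"parquet", "orc", "avro"}


def create_table_sql(table, columns, file_format="orc"):
    """Build a CREATE TABLE statement for an Iceberg table with a chosen write format."""
    if file_format not in VALID_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} ({cols}) USING iceberg "
        f"TBLPROPERTIES ('write.format.default' = '{file_format}')"
    )


def switch_write_format_sql(table, file_format):
    """Build an ALTER TABLE statement that changes the format of future writes only."""
    if file_format not in VALID_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('write.format.default' = '{file_format}')"
    )


# Hypothetical table and schema, used only for illustration.
create_stmt = create_table_sql(
    "lake.db.events", [("id", "bigint"), ("payload", "string")]
)
switch_stmt = switch_write_format_sql("lake.db.events", "parquet")

print(create_stmt)
print(switch_stmt)
# With a configured session, each statement would be passed to spark.sql(...).
```

Changing the property does not rewrite existing data files. ORC files written earlier stay in place and remain readable alongside the newer Parquet files.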

Summary
Apache ORC is the columnar format of the Hadoop/Hive era: it offers rich column statistics, optional bloom filter indexes, and native Hive ACID support. In the modern data lakehouse, Apache Parquet is the dominant format because of its broader engine support and its status as Iceberg's default. ORC remains relevant for legacy Hive migrations and for environments that still require Hive ACID semantics. For new Iceberg tables, Parquet is the default choice.