What Is Apache ORC?

Apache ORC (Optimized Row Columnar) is an open-source, columnar storage file format created at Hortonworks in 2013 to address the performance limitations of Hive's original RCFile format. ORC became the standard file format for Apache Hive and was widely adopted across the Hadoop ecosystem. It is also the format that Hive ACID tables require for row-level insert, update, and delete support.

ORC, like Apache Parquet, stores data column-by-column with column-level statistics and compression. Both are columnar formats optimized for analytical workloads; they share the core design philosophy but differ in implementation details — particularly in their index structures, ACID capabilities, and ecosystem support.
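To make the role of column-level statistics concrete, here is a minimal sketch of how a reader can skip whole chunks of a column using stored min/max values. It is a toy in-memory model, not ORC's or Parquet's actual on-disk layout; the names `ColumnChunk`, `build_stripes`, and `scan_greater_than` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnChunk:
    """One column's values within a stripe, plus the statistics kept for it."""
    values: List[int]
    minimum: int
    maximum: int


def build_stripes(column: List[int], rows_per_stripe: int) -> List[ColumnChunk]:
    """Split a column into fixed-size chunks and record min/max for each chunk."""
    stripes = []
    for start in range(0, len(column), rows_per_stripe):
        chunk = column[start:start + rows_per_stripe]
        stripes.append(ColumnChunk(chunk, min(chunk), max(chunk)))
    return stripes


def scan_greater_than(stripes: List[ColumnChunk], threshold: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # If even the largest value fails the predicate, skip the whole stripe.
        if stripe.maximum <= threshold:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


# Hypothetical sample data: three stripes of four rows each.
order_totals = [5, 8, 3, 9, 41, 47, 44, 40, 12, 15, 11, 18]
stripes = build_stripes(order_totals, rows_per_stripe=4)
matches, stripes_read = scan_greater_than(stripes, threshold=30)
print(matches, stripes_read)
```

With this sample data only one of the three stripes has to be read, because the statistics of the other two rule them out before any values are decoded.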

ORC vs. Parquet

| Feature | Apache ORC | Apache Parquet |
| --- | --- | --- |
| Column storage | Yes | Yes |
| Per-column stats | Yes (min/max/sum/bloom) | Yes (min/max/null count) |
| Bloom filter index | Built-in | Requires separate config |
| Nested types | Yes | Yes (stronger ecosystem) |
| Hive ACID support | Yes (native) | No |
| Iceberg support | Yes (non-default) | Yes (default) |
| Ecosystem breadth | Hive, Spark, Trino | Universal |
Figure 1: ORC vs Parquet — feature comparison for lakehouse file format selection.
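The bloom filter row in the comparison is easier to reason about with a small example. The sketch below is a toy Bloom filter written for illustration only: it uses SHA-256 from the standard library and a Python integer as the bit array, which is an assumption of this example and not ORC's actual hash function or on-disk encoding. The class name `ToyBloomFilter` is hypothetical.

```python
import hashlib
from typing import Iterator


class ToyBloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value: str) -> Iterator[int]:
        # Derive several bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical usage: one filter per chunk of a customer_id column.
chunk_filter = ToyBloomFilter()
for customer_id in ["c-1001", "c-1002", "c-1003"]:
    chunk_filter.add(customer_id)

print(chunk_filter.might_contain("c-1002"))
```

A reader evaluating an equality predicate can skip any chunk whose filter answers "definitely absent", which helps in cases where min/max statistics are too coarse, such as high-cardinality identifier columns.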

ORC in Apache Iceberg

Apache Iceberg supports ORC as an alternative data file format. Tables configured for ORC store their data in ORC files while retaining all of Iceberg's metadata features: snapshots, time travel, schema evolution, and partition evolution work the same way regardless of the data file format. Iceberg populates the per-file column statistics in its manifests (such as value counts, null counts, and lower and upper bounds) from ORC files just as it does from Parquet files.
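As a sketch of how a table might be configured for ORC, the helper below builds a Spark SQL `CREATE TABLE` statement that sets Iceberg's `write.format.default` table property. The helper `iceberg_create_table_sql`, the table name `demo.db.events`, and the column list are hypothetical; the example only constructs the statement text and assumes it would be run in a Spark session with an Iceberg catalog already configured.

```python
from typing import Dict


def iceberg_create_table_sql(table: str, columns: Dict[str, str],
                             file_format: str = "parquet") -> str:
    """Build a CREATE TABLE statement that sets the default write format."""
    allowed = {"parquet", "orc", "avro"}
    if file_format not in allowed:
        raise ValueError(f"unsupported file format: {file_format!r}")
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table} ({column_list}) USING iceberg "
        f"TBLPROPERTIES ('write.format.default'='{file_format}')"
    )


# Hypothetical table and columns, for illustration only.
ddl = iceberg_create_table_sql(
    "demo.db.events",
    {"id": "bigint", "payload": "string"},
    file_format="orc",
)
print(ddl)
```

Leaving the property unset keeps the Parquet default, which is why ORC appears as "non-default" in the comparison above.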

ORC support in Iceberg is primarily a migration enabler: organizations with existing Hive ORC tables can migrate to the Iceberg table format (gaining snapshots, ACID, and schema evolution) without converting existing ORC data files to Parquet. New files can be written as Parquet while historical files remain ORC. Because Iceberg's metadata records the format of each data file, a single table can contain both and still be read correctly.

Figure 2: ORC in Iceberg — migration path for Hive ORC tables to Iceberg without file format conversion.
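The mixed-format behavior described above can be sketched with a toy model of manifest entries. Iceberg's manifests do record a file format for every data file, but everything else here is a simplification: the `DataFileEntry` class, the `plan_scan` function, and the file paths are hypothetical and do not reflect Iceberg's actual classes or scan planning.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataFileEntry:
    """A simplified stand-in for one data file tracked by a manifest."""
    path: str
    file_format: str  # e.g. "orc" or "parquet"
    record_count: int


def plan_scan(entries: List[DataFileEntry]) -> Dict[str, List[str]]:
    """Group data file paths by the reader each one needs."""
    plan: Dict[str, List[str]] = {}
    for entry in entries:
        plan.setdefault(entry.file_format, []).append(entry.path)
    return plan


# Hypothetical table state after migration: older ORC files plus a new Parquet file.
entries = [
    DataFileEntry("warehouse/events/part-0001.orc", "orc", 1000),
    DataFileEntry("warehouse/events/part-0002.orc", "orc", 1200),
    DataFileEntry("warehouse/events/part-0003.parquet", "parquet", 900),
]
plan = plan_scan(entries)
total_rows = sum(entry.record_count for entry in entries)
print(plan, total_rows)
```

The point of the sketch is that the format is a per-file attribute rather than a table-wide one, so a query sees one logical table while each file is opened with the matching reader.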

Summary

Apache ORC is the columnar format of the Hadoop/Hive era — highly optimized, with built-in bloom filter indexes and native Hive ACID support. In the modern data lakehouse, Apache Parquet is the dominant format due to its broader engine support and Iceberg ecosystem maturity. ORC remains relevant for Hive legacy migrations and environments where Hive ACID semantics are still required. For new Iceberg tables, Parquet is the clear default choice.