What Is Automated Table Optimization?
Automated table optimization is a management capability that keeps Apache Iceberg tables performant without manual intervention. It automatically schedules and executes the maintenance operations that keep tables in good shape (right-sized files, clustered data layout, clean metadata), so data engineers do not have to schedule and monitor each maintenance job themselves.
Without automation, Iceberg table maintenance is a significant operational burden: data engineers must monitor each table's file count, schedule compaction jobs in Airflow, run Z-order optimization jobs periodically, track snapshot ages and run expiry procedures, and clean up orphan files from failed commits. For organizations with dozens or hundreds of Iceberg tables, manual maintenance is unsustainable. Tables drift into poor performance states, and engineers can end up spending more time on maintenance than on analytics.
The Four Optimization Operations
- Compaction: Merges many small Parquet files (created by streaming writes, frequent MERGE INTO, or small batch writes) into optimally sized files (128 MB–1 GB). This reduces the number of files a query must open, improving scan efficiency and reducing S3 API costs from excessive LIST and GET operations.
- Z-Order Optimization: Re-sorts data files by multiple clustering columns to narrow per-file statistics and maximize data skipping. It should run after significant data accumulation or on a weekly schedule for heavily queried tables.
- Snapshot Expiry: Removes snapshot metadata beyond the retention window (typically 7–30 days). Old snapshots accumulate in the metadata.json file and manifest lists, increasing metadata read overhead for every query.
- Orphan File Removal: Deletes data files in the table's storage prefix that are not referenced by any active snapshot. Such files are left behind by failed write operations, aborted transactions, or manual file operations.
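
As a rough illustration of how an automation layer might decide which of these four operations is due, here is a minimal Python sketch. The `TableStats` fields, the `due_operations` function, and every threshold are hypothetical names and values chosen for this example; they are not Iceberg or Dremio defaults, and a real service would likely use richer signals.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumptions, not Iceberg or Dremio defaults).
MAX_SMALL_FILES = 100       # hypothetical compaction trigger
ZORDER_INTERVAL_DAYS = 7    # "weekly schedule" for heavily queried tables
RETENTION_DAYS = 7          # lower end of the typical 7-30 day window


@dataclass
class TableStats:
    """Hypothetical health metrics collected for one table."""
    small_file_count: int               # data files below the target size
    oldest_snapshot_age_days: float     # age of the oldest retained snapshot
    days_since_zorder: float            # time since the last Z-order rewrite
    failed_commits_since_cleanup: int   # failed writes since the last orphan cleanup


def due_operations(stats: TableStats) -> list[str]:
    """Return the maintenance operations that look due for this table."""
    due = []
    if stats.small_file_count > MAX_SMALL_FILES:
        due.append("compaction")
    if stats.days_since_zorder >= ZORDER_INTERVAL_DAYS:
        due.append("z_order")
    if stats.oldest_snapshot_age_days > RETENTION_DAYS:
        due.append("snapshot_expiry")
    if stats.failed_commits_since_cleanup > 0:
        due.append("orphan_file_removal")
    return due


stats = TableStats(small_file_count=450, oldest_snapshot_age_days=12.0,
                   days_since_zorder=3.0, failed_commits_since_cleanup=0)
print(due_operations(stats))  # ['compaction', 'snapshot_expiry']
```

Keeping the decision separate from the execution makes it easy to tune thresholds per table without touching the jobs that actually run the maintenance.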

Automation with Dremio and Apache Spark
Dremio provides managed automatic optimization for tables in Dremio Open Catalog. It is enabled per table with a DDL statement:
ALTER TABLE catalog.schema.orders
SET TBLPROPERTIES (
'optimize.target-file-size-mb' = '256',
'optimize.auto-clean-orphan-files' = 'true',
'optimize.snapshot-expire-days' = '7'
);
Dremio's background optimization service monitors table write activity and triggers the appropriate maintenance operations automatically. For self-managed Iceberg tables on Spark, Apache Airflow DAGs can schedule Spark procedures:
-- Spark compaction
CALL catalog.system.rewrite_data_files('schema.orders');
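-- Snapshot expiry: the cutoff must be a concrete timestamp literal.
-- The value below is a placeholder for "7 days before the run" and should be computed by the scheduler.
CALL catalog.system.expire_snapshots('schema.orders', TIMESTAMP '2024-01-01 00:00:00.000');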
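
Because the cutoff passed to `expire_snapshots` has to be a concrete timestamp, a scheduler typically computes it at run time. The Python sketch below shows one way an Airflow task might build the procedure calls before handing them to Spark. The helper names `cutoff_literal` and `maintenance_statements` are hypothetical, the retention values are illustrative, and the sketch assumes the Spark session time zone is UTC.

```python
from datetime import datetime, timedelta, timezone


def cutoff_literal(now: datetime, days: int) -> str:
    """Format the instant `days` before `now` as a Spark SQL TIMESTAMP literal."""
    cutoff = now - timedelta(days=days)
    return f"TIMESTAMP '{cutoff:%Y-%m-%d %H:%M:%S}'"


def maintenance_statements(catalog: str, table: str, now: datetime,
                           snapshot_days: int = 7,
                           orphan_days: int = 3) -> list[str]:
    """Build Iceberg Spark procedure calls for one table (hypothetical helper).

    The retention defaults are assumptions for this example.
    """
    return [
        # Compaction with the procedure's default settings
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Snapshot expiry with a cutoff computed at run time
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
        f"older_than => {cutoff_literal(now, snapshot_days)})",
        # Orphan file removal, limited to files older than the cutoff
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}', "
        f"older_than => {cutoff_literal(now, orphan_days)})",
    ]


run_time = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
for statement in maintenance_statements("catalog", "schema.orders", run_time):
    print(statement)
```

In a DAG, each generated statement could be passed to whatever operator submits SQL to the Spark cluster; passing `now` in explicitly keeps the helper deterministic and easy to test.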

Summary
Automated table optimization is the operational maturity milestone that transforms Iceberg table management from a labor-intensive engineering task into a self-maintaining system. By automating compaction, Z-ordering, snapshot expiry, and orphan file cleanup, organizations maintain consistently high query performance without dedicating engineering time to table maintenance. Dremio's managed automatic optimization delivers this capability for tables in Dremio Open Catalog, while Apache Airflow combined with Spark procedures provides the self-managed equivalent for broader Iceberg deployments.