What Is Z-Ordering?
Z-ordering (also called Z-order clustering or multi-dimensional clustering) is a data layout optimization technique that physically organizes rows in data files to maximize the effectiveness of data skipping for multi-dimensional query filters. Unlike simple sorting by a single column (which is only optimal for queries filtering on that column), Z-ordering produces a data layout that is efficient for queries filtering on any combination of the clustered columns.
The technique is named for the Z-shaped path that the Z-order curve (also known as the Morton curve) traces through two-dimensional integer coordinates. Applied to table data, the curve interleaves the bits of multiple column values into a single composite sort key, so that rows with similar values across all dimensions tend to cluster together in sorted order.
In practical terms: if a table is Z-ordered by (customer_id, product_category), data files tend to contain rows with similar customer_id values AND similar product_category values. A query filtering on customer_id = 'abc' alone can skip many files, as can a query filtering on product_category = 'Electronics' alone. A query filtering on both can skip even more. All three query patterns benefit from data skipping, whereas a single-column sort mainly helps filters on the sort column.
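The bit interleaving can be sketched in a few lines of Python. This is a minimal illustration for two non-negative integer columns, not any engine's implementation; the function name `interleave_bits` and the 16-bit width are assumptions made for the example. Real engines first have to map strings, timestamps, and other types to order-preserving bit strings, which is not shown here.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) value of two non-negative integers.

    Bit i of x lands at position 2*i and bit i of y at position 2*i + 1,
    so both inputs contribute equally to the high-order bits of the key.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Walk a 4x4 grid in Z-order: sort the cells by their interleaved key.
cells = [(x, y) for y in range(4) for x in range(4)]
z_path = sorted(cells, key=lambda c: interleave_bits(c[0], c[1]))
print(z_path[:8])
# [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]
```

The first four cells form one 2x2 block and the next four form the neighbouring block, which is the Z shape the curve is named for: consecutive keys stay close in both dimensions.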
How Z-Ordering Improves Data Skipping
Data skipping in Apache Iceberg works through per-file column statistics: the table's manifest files record lower and upper bounds (min and max values) for the columns of each data file. At query time, the engine uses these statistics to skip files whose min-max range does not overlap the query's filter predicates.
For data skipping to be effective, each file must contain rows with similar values for the queried columns, so that its min-max ranges are narrow and exclude most filter values. A poor data layout (rows assigned to files at random) produces wide min-max ranges for every file, so few or no files can be skipped.
Z-ordering narrows per-file min-max ranges for all clustered dimensions at once. A file in a table Z-ordered by (region, product_id) contains rows from a limited range of regions and a limited range of product IDs, so its min-max statistics are tight. A query for a specific region AND product_id can skip every file whose region range or product_id range excludes the queried value.
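The min-max check described above can be sketched as follows. This is a simplified model rather than Iceberg's planner: the `FileStats` class, the `can_skip` helper, the file names, and the bounds are all hypothetical, and only equality predicates are handled.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file lower/upper bounds for each column (hypothetical structure)."""
    path: str
    lower: dict
    upper: dict


def can_skip(stats: FileStats, equals: dict) -> bool:
    """True if any equality predicate falls outside the file's min-max range."""
    return any(
        not (stats.lower[col] <= value <= stats.upper[col])
        for col, value in equals.items()
    )


# Made-up bounds for three files of a table clustered by (region, product_id).
files = [
    FileStats("f1.parquet",
              {"region": "APAC", "product_id": 100},
              {"region": "APAC", "product_id": 199}),
    FileStats("f2.parquet",
              {"region": "APAC", "product_id": 200},
              {"region": "EMEA", "product_id": 299}),
    FileStats("f3.parquet",
              {"region": "EMEA", "product_id": 100},
              {"region": "LATAM", "product_id": 180}),
]

query = {"region": "EMEA", "product_id": 150}
to_scan = [f.path for f in files if not can_skip(f, query)]
print(to_scan)  # ['f3.parquet']
```

Here f1 is skipped on its region range and f2 on its product_id range, so only f3 is read. A file is skipped as soon as one predicate falls outside its bounds, which is why tight ranges on every clustered column matter.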

Z-Ordering vs. Single-Column Sorting
| Dimension | Single-Column Sort | Z-Ordering |
|---|---|---|
| Best query filter | Sort column only | Any clustered column or combination |
| File skipping for filter on col A | Excellent (if sorted on A) | Good |
| File skipping for filter on col B | Poor (if sorted on A) | Good |
| File skipping for filter on A AND B | Good for A, poor for B | Excellent |
| Best for | Predictable, single-column query patterns | Ad-hoc analytics with variable filters |
Z-ordering is most valuable for tables used in exploratory analytics where query patterns vary — analysts filter on different column combinations depending on the question they are answering. For operational dashboards with fixed, predictable filters, simple sorting on the primary filter column is often sufficient and has lower computational overhead.
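The trade-off in the table can be reproduced with a toy simulation. The sketch below assumes a synthetic table of 4,096 rows covering every (a, b) pair for 0 ≤ a, b < 64, written as 64 files of 64 rows each; the helper names are hypothetical and the numbers describe only this uniform grid, not real workloads.

```python
def z_key(a: int, b: int, bits: int = 6) -> int:
    """Interleave the low `bits` bits of a and b into one sort key."""
    z = 0
    for i in range(bits):
        z |= ((a >> i) & 1) << (2 * i)
        z |= ((b >> i) & 1) << (2 * i + 1)
    return z


def write_files(rows, rows_per_file=64):
    """Chunk ordered rows into files and record (min_a, max_a, min_b, max_b)."""
    stats = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        a_vals = [a for a, _ in chunk]
        b_vals = [b for _, b in chunk]
        stats.append((min(a_vals), max(a_vals), min(b_vals), max(b_vals)))
    return stats


def files_scanned(stats, a=None, b=None):
    """Count files whose min-max ranges overlap the equality filters."""
    count = 0
    for min_a, max_a, min_b, max_b in stats:
        if a is not None and not (min_a <= a <= max_a):
            continue
        if b is not None and not (min_b <= b <= max_b):
            continue
        count += 1
    return count


rows = [(a, b) for a in range(64) for b in range(64)]  # 4,096 rows
sorted_by_a = write_files(sorted(rows))                 # sort on a, then b
z_ordered = write_files(sorted(rows, key=lambda r: z_key(*r)))

for name, layout in [("sort on a", sorted_by_a), ("z-order", z_ordered)]:
    print(name,
          files_scanned(layout, a=5),
          files_scanned(layout, b=5),
          files_scanned(layout, a=5, b=5))
```

On this grid the sorted layout scans 1 file for a filter on a, all 64 files for a filter on b, and 1 file for both; the Z-ordered layout scans 8, 8, and 1. Z-ordering gives up some precision on the lead column in exchange for useful skipping on the second column. The combined filter ties here only because the data is an exact grid, where the sort column alone pins down a single file.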
Z-Ordering in Practice with Dremio
Dremio applies Z-ordering as part of its Automated Table Optimization feature. When Z-ordering is configured for an Iceberg table, Dremio's background optimization process applies the Z-curve algorithm to the table's data files during compaction runs, rewriting files with optimally clustered row ordering.
Configuring Z-ordering in Dremio: ALTER TABLE my_table CLUSTER BY (region, product_category, customer_segment). After configuration, Dremio's optimizer automatically maintains the clustering during future compaction cycles. Dremio can also combine Z-ordering with file size optimization — ensuring that Z-ordered files are also optimally sized, not just optimally ordered.

Z-Ordering Best Practices
Applying Z-ordering effectively requires careful column selection:
- Cluster by 2–4 high-impact columns. Z-ordering effectiveness diminishes with more than 4 columns — the Z-curve space becomes too high-dimensional for effective clustering. Choose the 2–4 columns most commonly used in query filters.
- Prioritize high-cardinality filter columns. Z-ordering is most effective for columns with many distinct values (product_id, customer_id, transaction_id). Low-cardinality columns (status, region) benefit more from partitioning than Z-ordering.
- Apply Z-ordering at the Gold layer. The Gold layer tables queried by BI tools benefit most from Z-ordering — the query patterns are known, and the read performance improvement directly impacts user-facing dashboard latency.
- Combine with partitioning. Z-ordering works within partitions. A table partitioned by days(event_date) and Z-ordered by (customer_id, product_id) within partitions benefits from partition pruning for date filters AND Z-order skipping for customer/product filters.
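The diminishing returns from adding clustered columns can be estimated with a back-of-the-envelope model. The sketch below is an idealization and not a measurement: it assumes uniform, independent columns and files that tile the key space as an even grid with one axis per clustered column. The function name and the 4,096-file table are made up for illustration.

```python
def files_scanned_single_filter(total_files: int, num_columns: int) -> float:
    """Idealized count of files a one-column equality filter must scan.

    Assumes the files form a num_columns-dimensional grid with equal side
    lengths, so a filter on one column selects a single slab of that grid.
    """
    side = total_files ** (1.0 / num_columns)  # files along each dimension
    return total_files / side


for k in range(1, 7):
    scanned = files_scanned_single_filter(4096, k)
    print(f"{k} columns: ~{scanned:.0f} of 4096 files ({scanned / 4096:.1%})")
```

Under these assumptions, a single-column filter scans about 64 of 4,096 files with two clustered columns, about 512 with four, and about 1,024 with six. Each added column widens the range every file covers in the other dimensions, which is the reasoning behind keeping the clustering key short.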
Summary
Z-ordering is one of the most effective data layout optimizations for tables with diverse, multi-dimensional query patterns. By co-locating rows with similar values across multiple clustered dimensions, it enables aggressive data skipping for filters on any combination of the clustered columns, which makes it well suited to the ad-hoc analytical workloads of the Gold layer.
Combined with hidden partitioning (for coarse-grained file elimination) and file size compaction (for optimal I/O), Z-ordering is the final layer of a comprehensive data layout optimization strategy for Apache Iceberg tables. Dremio's automated table optimization applies all three layers transparently, maintaining optimal table performance without manual DBA intervention.