What Is MLflow?
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Originally developed at Databricks and now a Linux Foundation project, MLflow provides four core capabilities: Tracking (logging parameters, metrics, and artifacts for each experiment run), Projects (packaging ML code for reproducibility), Models (packaging models in a standardized format for deployment), and Registry (managing model versions and stage transitions from development to production).
In the data lakehouse context, MLflow is the ML governance layer that sits alongside Iceberg's data governance — while Apache Iceberg governs the data assets (tables, schemas, versions), MLflow governs the ML artifacts (experiments, model versions, deployment stages) that are produced from that data.
MLflow Experiment Tracking with Iceberg
A common MLflow + Iceberg integration pattern for reproducible ML experiments is to log the Iceberg snapshot ID of the training table as a parameter of each run:
import mlflow
import mlflow.sklearn
from pyiceberg.catalog import load_catalog

with mlflow.start_run(run_name='customer-churn-v3'):
    # Log the Iceberg data version used for training
    catalog = load_catalog('polaris', uri='...')
    table = catalog.load_table('gold.customer_features')
    # current_snapshot() returns None for a table that has no snapshots yet
    snapshot_id = table.current_snapshot().snapshot_id
    mlflow.log_param('training_snapshot_id', snapshot_id)
    mlflow.log_param('training_date', '2026-05-14')
    # Train model... (`model` below is the fitted estimator)
    mlflow.log_metric('auc', 0.89)
    mlflow.sklearn.log_model(model, 'churn_model')

By logging the Iceberg snapshot_id alongside model parameters, each MLflow run records a lineage link between the model and the exact data version used for training. The training set can then be re-read through Iceberg time travel for as long as that snapshot is retained. Full reproducibility also depends on pinning the training code and environment.
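
To reproduce a run later, the logged snapshot ID can be read back from MLflow and passed to an Iceberg time-travel scan. The following is a minimal sketch rather than a definitive implementation. The helper names `snapshot_id_from_params` and `load_training_rows` are hypothetical, and the sketch assumes the parameter key `training_snapshot_id` logged in the example above and a snapshot that has not been expired from the table.

```python
from typing import Any, Mapping


def snapshot_id_from_params(params: Mapping[str, str],
                            key: str = "training_snapshot_id") -> int:
    """Parse the Iceberg snapshot ID that a run logged as an MLflow param.

    MLflow stores param values as strings, so the ID is converted back to int.
    """
    if key not in params:
        raise KeyError(f"run has no '{key}' param; data version is unknown")
    return int(params[key])


def load_training_rows(table: Any, params: Mapping[str, str]) -> Any:
    """Scan `table` as of the snapshot recorded in `params`.

    `table` is expected to be a pyiceberg Table; any object exposing
    scan(snapshot_id=...) works, which keeps the helper easy to test.
    """
    snapshot_id = snapshot_id_from_params(params)
    return table.scan(snapshot_id=snapshot_id).to_pandas()


# Usage sketch (assumes a reachable catalog and tracking server; run_id is
# the ID of the run you want to reproduce):
#
#   import mlflow
#   from pyiceberg.catalog import load_catalog
#
#   params = mlflow.get_run(run_id).data.params
#   table = load_catalog('polaris', uri='...').load_table('gold.customer_features')
#   df = load_training_rows(table, params)
```

Taking the table as an argument, instead of loading it inside the helper, keeps catalog configuration in one place and lets the parsing logic be checked without a live catalog.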

MLflow Model Registry
The MLflow Model Registry provides governance for production ML deployments:
- Version management: Each time a model (sklearn, PyTorch, Spark ML) is registered under a given name, the registry assigns it a new, incrementing version number
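- Stage transitions: Models progress through stages: None → Staging → Production → Archived. A stage changes only through an explicit API call or UI action, which guards against accidental promotion to production. Note that MLflow 2.9 and later deprecate stages in favor of model version aliases and tags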
- Annotations: Model versions are annotated with descriptions, data lineage (which Iceberg snapshot trained the model), validation metrics, and approval status
- Deployment integration: MLflow Models can be served as REST APIs using the mlflow models serve command, or deployed to cloud ML serving platforms (Amazon SageMaker, Azure ML, Databricks Model Serving)
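
The registry workflow above can be sketched with the MLflow client API. This is an illustrative sketch under stated assumptions, not a prescribed workflow. The function name `annotate_and_stage`, the tag keys, and the `min_auc` threshold are hypothetical, and the `client` argument is expected to be an `mlflow.tracking.MlflowClient`. It calls `transition_model_version_stage` because the text describes stages; newer MLflow releases deprecate that call in favor of `set_registered_model_alias`.

```python
from typing import Any


def annotate_and_stage(client: Any, name: str, version: str,
                       snapshot_id: int, auc: float,
                       min_auc: float = 0.85) -> bool:
    """Attach lineage and validation tags to a model version, then move it
    to Staging only if the validation metric clears the threshold.

    Returns True if the version was promoted. `min_auc` is an illustrative
    threshold, not a recommended value.
    """
    # Record which Iceberg snapshot trained this version (tag values are strings).
    client.set_model_version_tag(name, version, "training_snapshot_id", str(snapshot_id))
    client.set_model_version_tag(name, version, "validation_auc", str(auc))
    client.update_model_version(
        name=name,
        version=version,
        description=f"Trained on Iceberg snapshot {snapshot_id}",
    )

    if auc < min_auc:
        client.set_model_version_tag(name, version, "approval_status", "rejected")
        return False

    client.set_model_version_tag(name, version, "approval_status", "approved")
    # Promotion is an explicit call; nothing reaches Staging implicitly.
    client.transition_model_version_stage(name=name, version=version, stage="Staging")
    return True


# Usage sketch (assumes a tracking server with a registry; "customer-churn"
# is a hypothetical registered model name):
#
#   import mlflow
#   from mlflow.tracking import MlflowClient
#
#   mv = mlflow.register_model(f"runs:/{run_id}/churn_model", "customer-churn")
#   annotate_and_stage(MlflowClient(), mv.name, mv.version, snapshot_id, auc=0.89)
```

Storing the snapshot ID as a model version tag means the data lineage travels with the registered version itself, so it can be inspected without going back to the original run.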

Summary
MLflow is the ML governance companion to Apache Iceberg in the open lakehouse ML stack. While Iceberg governs data assets, MLflow governs ML artifacts — tracking experiments, versioning models, and managing production deployments. The combination of Iceberg feature tables + Nessie tags for data versioning + MLflow for experiment tracking creates a complete, reproducible, governed ML platform on the open lakehouse — without requiring proprietary ML platforms that create additional vendor dependencies alongside the data platform.