Track experiments and models with MLflow

Article
11/08/2024

Tracking is the process of saving relevant information about experiments. In this article, you learn how to use MLflow for tracking experiments and runs in Azure Machine Learning workspaces.

Some methods available in the MLflow API might not be available when connected to Azure Machine Learning. For details about supported and unsupported operations, see Support matrix for querying runs and experiments. You can also learn about the supported MLflow functionalities in Azure Machine Learning from the article MLflow and Azure Machine Learning.

Note

To track experiments running on Azure Databricks, see Track Azure Databricks ML experiments with MLflow and Azure Machine Learning.

Prerequisites

Have an Azure subscription with the Azure Machine Learning.
To run Azure CLI and Python commands, install Azure CLI v2 and the Azure Machine Learning SDK v2 for Python. The ml extension for Azure CLI installs automatically the first time you run an Azure Machine Learning CLI command.

Install the MLflow SDK mlflow package and the Azure Machine Learning azureml-mlflow plugin for MLflow as follows:
```
pip install mlflow azureml-mlflow
```
Tip

You can use the mlflow-skinny package, which is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. This package is recommended for users who primarily need the MLflow tracking and logging capabilities without importing the full suite of features, including deployments.
Create an Azure Machine Learning workspace. To create a workspace, see Create resources you need to get started. Review the access permissions you need to perform your MLflow operations in your workspace.
To do remote tracking, or track experiments running outside Azure Machine Learning, configure MLflow to point to the tracking URI of your Azure Machine Learning workspace. For more information on how to connect MLflow to your workspace, see Configure MLflow for Azure Machine Learning.

Configure the experiment

MLflow organizes information in experiments and runs. Runs are called jobs in Azure Machine Learning. By default, runs log to an automatically created experiment named Default, but you can configure which experiment to track.

Notebooks
Jobs

For interactive training, such as in a Jupyter notebook, use the MLflow command mlflow.set_experiment(). For example, the following code snippet configures an experiment:

experiment_name = 'hello-world-example'
mlflow.set_experiment(experiment_name)

To submit jobs by using the Azure Machine Learning CLI or SDK, set the experiment name by using the experiment_name property of the job. You don't have to configure the experiment name in your training script.

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
tags:
  hello: world
display_name: hello-world-example
experiment_name: hello-world-example
description: |

Configure the run

Azure Machine Learning tracks training jobs in what MLflow calls runs. Use runs to capture all the processing that your job performs.

Notebooks
Jobs

When you work interactively, MLflow starts tracking your training routine as soon as you log information that requires an active run. For instance, if Mlflow's autologging functionality is enabled, MLflow tracking starts when you log a metric or parameter, or start a training cycle.

However, it's usually helpful to start the run explicitly, especially if you want to capture the total time for your experiment in the Duration field. To start the run explicitly, use mlflow.start_run().

Whether you start the run manually or not, you eventually need to stop the run, so that MLflow knows that your experiment run is done and can mark the run's status as Completed. To stop a run, use mlflow.end_run().

The following code starts a run manually and ends it at the end of the notebook:

mlflow.start_run()

# Your code

mlflow.end_run()

It's best to start runs manually so you don't forget to end them. You can use the context manager paradigm to help you remember to end the run.

with mlflow.start_run() as run:
    # Your code

When you start a new run with mlflow.start_run(), it can be useful to specify the run_name parameter, which later translates to the name of the run in the Azure Machine Learning user interface. This practice helps you identify the run more quickly.

with mlflow.start_run(run_name="hello-world-example") as run:
    # Your code

Azure Machine Learning jobs allow you to submit long-running training or inference routines as isolated and reproducible executions.

Create a training routine that has tracking

When you work with jobs, you typically place all your training logic as files inside a folder, such as src. One of the files is a Python file with your training code entry point.

In your training routine, you can use the MLflow SDK to track any metric, parameter, artifacts, or models. For examples, see Log metrics, parameters, and files with MLflow.

The following example shows a hello_world.py training routine that adds logging:

# imports
import os
import mlflow

from random import random

# define functions
def main():
    mlflow.log_param("hello_param", "world")
    mlflow.log_metric("hello_metric", random())
    os.system(f"echo 'hello world' > helloworld.txt")
    mlflow.log_artifact("helloworld.txt")


# run functions
if __name__ == "__main__":
    # run main function
    main()

The previous code example doesn't use mlflow.start_run() but if used, MLflow reuses the current active run. Therefore, you don't need to remove the mlflow.start_run() line if you migrate code to Azure Machine Learning.

Ensure your job's environment has MLflow installed

All Azure Machine Learning curated environments already have MLflow installed. However, if you use a custom environment, create a conda.yaml file that has the dependencies you need, and reference the environment in your job.

channels:
- conda-forge
dependencies:
- python=3.8.5
- pip
- pip:
  - mlflow
  - azureml-mlflow
  - fastparquet
  - cloudpickle==1.6.0
  - colorama==0.4.4
  - dask==2.30.0

Configure the job name

Use the Azure Machine Learning jobs parameter display_name to configure the name of the run.

Use the display_name property to configure the job.

Azure CLI
Python SDK

To configure the job, create a YAML file with your job definition in a job.yml file outside of the src directory.

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
tags:
  hello: world
display_name: hello-world-example
experiment_name: hello-world-example
description: |

from azure.ai.ml import command, Environment

command_job = command(
    code="src",
    command="echo "hello world",
    environment=Environment(image="library/python:latest"),
    compute="cpu-cluster",
    display_name="hello-world-example"
)

Make sure not to use mlflow.start_run(run_name="") inside your training routine.

Submit the job

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the Azure Machine Learning artifacts you create. Connect to the Azure Machine Learning workspace.

Azure CLI
Python SDK

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

Import the required libraries:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

Configure workspace details and get a handle to the workspace:

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

Open your terminal and use the following code to submit the job. Jobs that use MLflow and run on Azure Machine Learning automatically log any tracking information to the workspace.
- Azure CLI
- Python SDK
Use the Azure Machine Learning CLI to submit your job.
```
az ml job create -f job.yml --web
```
Use the Python SDK to submit your job.
```
returned_job = ml_client.jobs.create_or_update(command_job)
returned_job.studio_url
```
Monitor the job progress in Azure Machine Learning studio.

Enable MLflow autologging

You can log metrics, parameters, and files with MLflow manually, and you can also rely on MLflow's automatic logging capability. Each machine learning framework supported by MLflow determines what to track automatically for you.

To enable automatic logging, insert the following code before your training code:

mlflow.autolog()

View metrics and artifacts in your workspace

The metrics and artifacts from MLflow logging are tracked in your workspace. You can view and access them in Azure Machine Learning studio or access them programatically via the MLflow SDK.

To view metrics and artifacts in the studio:

On the Jobs page in your workspace, select the experiment name.
On the experiment details page, select the Metrics tab.
Select logged metrics to render charts on the right side. You can customize the charts by applying smoothing, changing the color, or plotting multiple metrics on a single graph. You can also resize and rearrange the layout.
Once you create your desired view, save it for future use and share it with your teammates by using a direct link.

To access or query metrics, parameters, and artifacts programatically via the MLflow SDK, use mlflow.get_run().

import mlflow

run = mlflow.get_run("<RUN_ID>")

metrics = run.data.metrics
params = run.data.params
tags = run.data.tags

print(metrics, params, tags)

Tip

The preceding example returns only the last value of a given metric. To retrieve all the values of a given metric, use the mlflow.get_metric_history method. For more information on retrieving metrics values, see Get params and metrics from a run.

To download artifacts you logged, such as files and models, use mlflow.artifacts.download_artifacts().

mlflow.artifacts.download_artifacts(run_id="<RUN_ID>", artifact_path="helloworld.txt")

For more information about how to retrieve or compare information from experiments and runs in Azure Machine Learning by using MLflow, see Query & compare experiments and runs with MLflow.