APPLIES TO: Azure CLI ml extension v2 (current)
The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json.
Note
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension.
You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| $schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, you can invoke schema and resource completions if you include $schema at the top of your file. | | |
| type | const | Required. The type of job. | sweep | sweep |
| name | string | Name of the job. Must be unique across all jobs in the workspace. If omitted, Azure Machine Learning autogenerates a GUID for the name. | | |
| display_name | string | Display name of the job in the studio UI. Can be non-unique within the workspace. If omitted, Azure Machine Learning autogenerates a human-readable adjective-noun identifier for the display name. | | |
| experiment_name | string | Organize the job under the experiment name. The run record of each job is organized under the corresponding experiment in the "Experiments" tab of the studio. If omitted, Azure Machine Learning defaults experiment_name to the name of the working directory where the job was created. | | |
| description | string | Description of the job. | | |
| tags | object | Dictionary of tags for the job. | | |
| sampling_algorithm | object | Required. The hyperparameter sampling algorithm to use over the search_space. One of RandomSamplingAlgorithm, GridSamplingAlgorithm, or BayesianSamplingAlgorithm. | | |
| search_space | object | Required. Dictionary of the hyperparameter search space. The hyperparameter name is the key, and the value is the parameter expression. Hyperparameters can be referenced in the trial.command with the ${{ search_space.<hyperparameter> }} expression. | | |
| search_space.<hyperparameter> | object | Visit Parameter expressions for the set of possible expressions to use. | | |
| objective.primary_metric | string | Required. The name of the primary metric reported by each trial job. The metric must be logged in the user's training script, using mlflow.log_metric() with the same metric name. | | |
| objective.goal | string | Required. The optimization goal of the objective.primary_metric. | maximize, minimize | |
| early_termination | object | The early termination policy to use. A trial job is canceled when the criteria of the specified policy are met. If omitted, no early termination policy is applied. One of BanditPolicy, MedianStoppingPolicy, or TruncationSelectionPolicy. | | |
| limits | object | Limits for the sweep job. See Attributes of the limits key. | | |
| compute | string | Required. Name of the compute target on which to execute the job, with the azureml:<compute_name> syntax. | | |
| trial | object | Required. The job template for each trial. Each trial job is provided with a different combination of hyperparameter values that the system samples from the search_space. Visit Attributes of the trial key. | | |
| inputs | object | Dictionary of inputs to the job. The key is a name for the input within the context of the job and the value is the input value. Inputs can be referenced in the command using the ${{ inputs.<input_name> }} expression. | | |
| inputs.<input_name> | number, integer, boolean, string, or object | One of a literal value (of type number, integer, boolean, or string) or an object that contains a job input data specification. | | |
| outputs | object | Dictionary of output configurations of the job. The key is a name for the output within the context of the job and the value is the output configuration. Outputs can be referenced in the command using the ${{ outputs.<output_name> }} expression. | | |
| outputs.<output_name> | object | You can leave the object empty. In that case, the output defaults to the uri_folder type, and Azure Machine Learning generates an output location for it. All files are written to the output directory through a read-write mount. To specify a different mode for the output, provide an object that contains the job output specification. | | |
| identity | object | The identity used for data access. One of UserIdentityConfiguration, ManagedIdentityConfiguration, or None. With UserIdentityConfiguration, the identity of the job submitter is used to access input data and write results to the output folder. Otherwise, the managed identity of the compute target is used. | | |
Sampling algorithms
RandomSamplingAlgorithm
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. The type of sampling algorithm. | random | |
| seed | integer | A random seed to use to initialize the random number generation. If omitted, the default seed value is null. | | |
| rule | string | The type of random sampling to use. The default, random, uses simple uniform random sampling, while sobol uses the Sobol quasi-random sequence. | random, sobol | random |
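As a minimal sketch of the object form described in the table above, the fragment below selects Sobol sampling with a fixed seed. The seed value is an arbitrary illustration, not a recommended setting.

```yaml
# Illustrative fragment of a sweep job file (seed value is arbitrary).
sampling_algorithm:
  type: random   # required const for RandomSamplingAlgorithm
  rule: sobol    # quasi-random Sobol sequence instead of the default "random"
  seed: 42       # fixes the random number generation for repeatable sampling
```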
GridSamplingAlgorithm
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The sampling algorithm type. | grid |
BayesianSamplingAlgorithm
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The sampling algorithm type. | bayesian |
Early termination policies
BanditPolicy
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. The policy type. | bandit | |
| slack_factor | number | The ratio used to calculate the allowed distance from the best performing trial. One of slack_factor or slack_amount is required. | | |
| slack_amount | number | The absolute distance allowed from the best performing trial. One of slack_factor or slack_amount is required. | | |
| evaluation_interval | integer | The frequency for applying the policy. | | 1 |
| delay_evaluation | integer | The number of intervals for which to delay the first policy evaluation. If specified, the policy applies on every multiple of evaluation_interval that is greater than or equal to delay_evaluation. | | 0 |
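A hedged sketch of a bandit policy using the keys above; the numeric values are illustrative only. Following the delay_evaluation rule in the table, with evaluation_interval set to 2 and delay_evaluation set to 6, the policy would first apply at interval 6 and then at intervals 8, 10, and so on.

```yaml
# Illustrative fragment; values are examples, not recommendations.
early_termination:
  type: bandit
  slack_factor: 0.2        # use either slack_factor or slack_amount
  evaluation_interval: 2   # apply the policy every 2 intervals
  delay_evaluation: 6      # skip policy evaluation before interval 6
```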
MedianStoppingPolicy
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. The policy type. | median_stopping | |
| evaluation_interval | integer | The frequency for applying the policy. | | 1 |
| delay_evaluation | integer | The number of intervals for which to delay the first policy evaluation. If specified, the policy applies on every multiple of evaluation_interval that is greater than or equal to delay_evaluation. | | 0 |
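For comparison, a minimal median stopping configuration might look like the following sketch. The delay of 5 intervals is an assumed value chosen for illustration.

```yaml
# Illustrative fragment; values are examples, not recommendations.
early_termination:
  type: median_stopping
  evaluation_interval: 1   # the default, shown explicitly
  delay_evaluation: 5      # first policy evaluation at interval 5
```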
TruncationSelectionPolicy
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. The policy type. | truncation_selection | |
| truncation_percentage | integer | Required. The percentage of trial jobs to cancel at each evaluation interval. | | |
| evaluation_interval | integer | The frequency for applying the policy. | | 1 |
| delay_evaluation | integer | The number of intervals for which to delay the first policy evaluation. If specified, the policy applies on every multiple of evaluation_interval that is greater than or equal to delay_evaluation. | | 0 |
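A sketch of a truncation selection policy, assuming an illustrative truncation_percentage of 20. Per the table, that percentage of trial jobs would be canceled at each evaluation interval once the delay has passed.

```yaml
# Illustrative fragment; values are examples, not recommendations.
early_termination:
  type: truncation_selection
  truncation_percentage: 20   # required; percentage of trials canceled per evaluation
  evaluation_interval: 1
  delay_evaluation: 5
```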
Parameter expressions
Choice
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The expression type. | choice |
| values | array | Required. The list of discrete values from which to choose. | |
Randint
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The expression type. | randint |
| upper | integer | Required. The exclusive upper bound for the range of integers. | |
Qlognormal, qnormal
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The expression type. | qlognormal, qnormal |
| mu | number | Required. The mean of the normal distribution. | |
| sigma | number | Required. The standard deviation of the normal distribution. | |
| q | integer | Required. The smoothing factor. | |
Qloguniform, quniform
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The expression type. | qloguniform, quniform |
| min_value | number | Required. The minimum value in the range (inclusive). | |
| max_value | number | Required. The maximum value in the range (inclusive). | |
| q | integer | Required. The smoothing factor. | |
Lognormal, normal
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The expression type. | lognormal, normal |
| mu | number | Required. The mean of the normal distribution. | |
| sigma | number | Required. The standard deviation of the normal distribution. | |
Loguniform
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The expression type. | loguniform |
| min_value | number | Required. The minimum value in the range is exp(min_value) (inclusive). | |
| max_value | number | Required. The maximum value in the range is exp(max_value) (inclusive). | |
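Because loguniform bounds are given as exponents, a short worked sketch may help. The hyperparameter name learning_rate is hypothetical. With the values below, sampled values would fall between exp(-6.9), roughly 0.001, and exp(-2.3), roughly 0.1.

```yaml
# Illustrative fragment; "learning_rate" is a hypothetical hyperparameter name.
search_space:
  learning_rate:
    type: loguniform
    min_value: -6.9   # lower bound of sampled values: exp(-6.9), roughly 0.001
    max_value: -2.3   # upper bound of sampled values: exp(-2.3), roughly 0.1
```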
Uniform
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. The expression type. | uniform |
| min_value | number | Required. The minimum value in the range (inclusive). | |
| max_value | number | Required. The maximum value in the range (inclusive). | |
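The expression types above can be mixed within a single search_space. The following sketch combines several of them. All hyperparameter names and values are hypothetical and chosen only to show the shape of each expression.

```yaml
# Illustrative fragment; every hyperparameter name and value here is hypothetical.
search_space:
  batch_size:
    type: choice
    values: [16, 32, 64]   # discrete set of candidates
  num_layers:
    type: randint
    upper: 8               # exclusive upper bound
  dropout:
    type: uniform
    min_value: 0.1         # inclusive
    max_value: 0.5         # inclusive
  hidden_units:
    type: quniform
    min_value: 32
    max_value: 256
    q: 32                  # smoothing factor
  init_scale:
    type: normal
    mu: 0.0                # mean
    sigma: 1.0             # standard deviation
```

Each key can then be passed to the training script through the trial command with the ${{ search_space.<hyperparameter> }} expression.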
Attributes of the limits key
| Key | Type | Description | Default value |
| --- | --- | --- | --- |
| max_total_trials | integer | The maximum number of trial jobs. | 1000 |
| max_concurrent_trials | integer | The maximum number of trial jobs that can run concurrently. | Defaults to max_total_trials. |
| timeout | integer | The maximum time in seconds that the entire sweep job is allowed to run. Once this limit is reached, the system cancels the sweep job, including all of its trials. | 5184000 |
| trial_timeout | integer | The maximum time in seconds each trial job is allowed to run. Once this limit is reached, the system cancels the trial. | |
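A sketch of a limits block that sets all four keys; the numbers are illustrative. Both timeouts are expressed in seconds, so the values below correspond to two hours for the whole sweep and 30 minutes for each trial.

```yaml
# Illustrative fragment; values are examples, not recommendations.
limits:
  max_total_trials: 20
  max_concurrent_trials: 4
  timeout: 7200         # seconds for the entire sweep job (2 hours)
  trial_timeout: 1800   # seconds for each trial job (30 minutes)
```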
Attributes of the trial key
| Key | Type | Description | Default value |
| --- | --- | --- | --- |
| command | string | Required. The command to execute. | |
| code | string | Local path to the source code directory to be uploaded and used for the job. | |
| environment | string or object | Required. The environment to use for the job. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema. Exclude the name and version properties because inline environments don't support them. | |
| environment_variables | object | Dictionary of environment variable name-value pairs to set on the process where the command is executed. | |
| distribution | object | The distribution configuration for distributed training scenarios. One of MpiConfiguration, PyTorchConfiguration, or TensorFlowConfiguration. | |
| resources.instance_count | integer | The number of nodes to use for the job. | 1 |
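The following sketch shows how these trial attributes might fit together. The script name train.py, the environment reference my-training-env:1, the LOG_LEVEL variable, and the learning_rate hyperparameter are all hypothetical placeholders.

```yaml
# Illustrative fragment; script, environment, and variable names are hypothetical.
trial:
  code: src
  command: >-
    python train.py
    --learning-rate ${{search_space.learning_rate}}
  environment: azureml:my-training-env:1   # azureml:<environment-name>:<environment-version>
  environment_variables:
    LOG_LEVEL: "info"
  resources:
    instance_count: 2   # number of nodes; defaults to 1 when omitted
```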
Distribution configurations
MpiConfiguration
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. Distribution type. | mpi |
| process_count_per_instance | integer | Required. The number of processes per node to launch for the job. | |
PyTorchConfiguration
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. Distribution type. | pytorch | |
| process_count_per_instance | integer | The number of processes per node to launch for the job. | | 1 |
TensorFlowConfiguration
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | const | Required. Distribution type. | tensorflow | |
| worker_count | integer | The number of workers to launch for the job. | | Defaults to resources.instance_count. |
| parameter_server_count | integer | The number of parameter servers to launch for the job. | | 0 |
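A distribution block sits under the trial key alongside resources. The sketch below assumes a PyTorch job on two nodes with four processes per node; the counts are illustrative, and the other trial keys are omitted for brevity.

```yaml
# Illustrative fragment; command, code, and environment are omitted here.
trial:
  distribution:
    type: pytorch
    process_count_per_instance: 4   # processes launched on each node
  resources:
    instance_count: 2               # number of nodes
```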
Job inputs
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | string | The type of job input. Specify uri_file for input data that points to a single file source, or uri_folder for input data that points to a folder source. For more information, visit Learn more about data access. | uri_file, uri_folder, mltable, mlflow_model | uri_folder |
| path | string | The path to the data to use as input. This value can be specified in a few ways: (1) A local path to the data source file or folder, for example, path: ./iris.csv. The data uploads during job submission. (2) A URI of a cloud path to the file or folder to use as the input. Supported URI types are azureml, https, wasbs, abfss, and adl. For more information about use of the azureml:// URI format, visit Core yaml syntax. (3) An existing registered Azure Machine Learning data asset to use as the input. To reference a registered data asset, use the azureml:<data_name>:<data_version> syntax, or azureml:<data_name>@latest to reference the latest version of that data asset. For example, path: azureml:cifar10-data:1 or path: azureml:cifar10-data@latest. | | |
| mode | string | How the data is delivered to the compute target. For read-only mount (ro_mount), the data is consumed as a mount path. A folder is mounted as a folder and a file is mounted as a file. Azure Machine Learning resolves the input to the mount path. For download mode, the data is downloaded to the compute target. Azure Machine Learning resolves the input to the downloaded path. To pass only the URL of the storage location of the data artifact or artifacts, instead of mounting or downloading the data itself, use the direct mode. This passes in the URL of the storage location as the job input. In this case, you're fully responsible for handling credentials to access the storage. | ro_mount, download, direct | ro_mount |
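As a sketch of the input forms described above, the fragment below declares one data input and one literal input. The input names training_data and epochs are hypothetical; the data asset reference reuses the cifar10-data example from the path description.

```yaml
# Illustrative fragment; input names are hypothetical.
inputs:
  training_data:
    type: uri_folder
    path: azureml:cifar10-data@latest   # latest version of a registered data asset
    mode: download                      # copy to the compute target instead of ro_mount
  epochs: 10                            # literal value input
```

In the trial command, these would be referenced as ${{inputs.training_data}} and ${{inputs.epochs}}.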
Job outputs
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| type | string | The job output type. For the default uri_folder type, the output corresponds to a folder. | uri_file, uri_folder, mltable, mlflow_model | uri_folder |
| mode | string | Mode of the delivery of the output file or files to the destination storage. For the read-write mount mode (rw_mount), the output directory is a mounted directory. For the upload mode, all files written are uploaded at the end of the job. | rw_mount, upload | rw_mount |
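A minimal sketch of an outputs block, assuming hypothetical output names. The first output switches to upload mode; the second is left as an empty object so that it falls back to the defaults described for outputs.<output_name>.

```yaml
# Illustrative fragment; output names are hypothetical.
outputs:
  model_dir:
    type: uri_folder
    mode: upload   # files are uploaded at the end of the job
  scratch: {}      # empty object: uri_folder type with a read-write mount by default
```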
Identity configurations
UserIdentityConfiguration
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. Identity type. | user_identity |
ManagedIdentityConfiguration
| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| type | const | Required. Identity type. | managed or managed_identity |
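As a brief sketch, a job that should access data with the submitter's own identity would set the top-level identity key as follows.

```yaml
# Illustrative fragment of a sweep job file.
identity:
  type: user_identity   # use "managed" or "managed_identity" for the compute's managed identity
```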
You can use the az ml job command to manage Azure Machine Learning jobs.
Examples
Visit the examples GitHub repository for examples. Several are shown here:
YAML: hello sweep
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
trial:
  command: >-
    python hello-sweep.py
    --A ${{inputs.A}}
    --B ${{search_space.B}}
    --C ${{search_space.C}}
  code: src
  environment: azureml://registries/azureml/environments/sklearn-1.0/labels/latest
inputs:
  A: 0.5
sampling_algorithm: random
search_space:
  B:
    type: choice
    values: ["hello", "world", "hello_world"]
  C:
    type: uniform
    min_value: 0.1
    max_value: 1.0
objective:
  goal: minimize
  primary_metric: random_metric
limits:
  max_total_trials: 4
  max_concurrent_trials: 2
  timeout: 3600
display_name: hello-sweep-example
experiment_name: hello-sweep-example
description: Hello sweep job example.
YAML: basic Python model hyperparameter tuning
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
trial:
  code: src
  command: >-
    python main.py
    --iris-csv ${{inputs.iris_csv}}
    --C ${{search_space.C}}
    --kernel ${{search_space.kernel}}
    --coef0 ${{search_space.coef0}}
  environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
inputs:
  iris_csv:
    type: uri_file
    path: wasbs://datasets@azuremlexamples.blob.core.chinacloudapi.cn/iris.csv
compute: azureml:cpu-cluster
sampling_algorithm: random
search_space:
  C:
    type: uniform
    min_value: 0.5
    max_value: 0.9
  kernel:
    type: choice
    values: ["rbf", "linear", "poly"]
  coef0:
    type: uniform
    min_value: 0.1
    max_value: 1
objective:
  goal: minimize
  primary_metric: training_f1_score
limits:
  max_total_trials: 20
  max_concurrent_trials: 10
  timeout: 7200
display_name: sklearn-iris-sweep-example
experiment_name: sklearn-iris-sweep-example
description: Sweep hyperparameters for training a scikit-learn SVM on the Iris dataset.