使用自动化机器学习训练计算机视觉模型的数据架构 (v1)

适用于:Python SDK azureml v1

重要

本文中的一些 Azure CLI 命令使用适用于 Azure 机器学习的 azure-cli-ml 或 v1 扩展。 对 v1 扩展的支持将于 2025 年 9 月 30 日结束。 在该日期之前,你将能够安装和使用 v1 扩展。

建议在 2025 年 9 月 30 日之前转换为 ml 或 v2 扩展。 有关 v2 扩展的详细信息,请参阅 Azure ML CLI 扩展和 Python SDK v2

重要

此功能目前处于公开预览状态。 此预览版不附带服务级别协议。 某些功能可能不受支持或者受限。 有关详细信息,请参阅适用于 Azure 预览版的补充使用条款

了解如何设置 JSONL 文件格式,以便在训练和推理期间在计算机视觉任务的自动化 ML 实验中使用数据。

用于训练的数据架构

Azure 机器学习的图像 AutoML 要求以 JSONL(JSON 行)格式准备输入图像数据。 本部分介绍多类图像分类、多标签图像分类、对象检测和实例分段的输入数据格式或架构。 我们还将提供最终训练或验证 JSON 行文件的示例。

图像分类(二进制/多类)

每个 JSON 行中的输入数据格式/架构:

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":"class_name",
}
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 图像类型(支持 Pillow 库中所有可用的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif","bmp", "tif", "tiff"}
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer
"400px" or 400
height 图像的高度
Optional, String or Positive Integer
"200px" or 200
label 图像的类/标签
Required, String
"cat"

多类图像分类的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details":{"format": "jpg", "width": "400px", "height": "258px"}, "label": "can"}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "397px", "height": "296px"}, "label": "milk_bottle"}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "1024px", "height": "768px"}, "label": "water_bottle"}

多类图像分类的图像示例。

多标签图像分类

下面是每个 JSON 行中用于图像分类的输入数据格式/架构示例。

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      "class_name_1",
      "class_name_2",
      "class_name_3",
      "...",
      "class_name_n"
        
   ]
}
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 图像类型(支持 Pillow 库中所有可用的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer
"400px" or 400
height 图像的高度
Optional, String or Positive Integer
"200px" or 200
label 图像中的类/标签列表
Required, List of Strings
["cat","dog"]

多标签图像分类的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details":{"format": "jpg", "width": "400px", "height": "258px"}, "label": ["can"]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "397px", "height": "296px"}, "label": ["can","milk_bottle"]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "1024px", "height": "768px"}, "label": ["carton","milk_bottle","water_bottle"]}

多标签图像分类的图像示例。

对象检测

下面是用于对象检测的示例 JSONL 文件。

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      {
         "label":"class_name_1",
         "topX":"xmin/width",
         "topY":"ymin/height",
         "bottomX":"xmax/width",
         "bottomY":"ymax/height",
         "isCrowd":"isCrowd"
      },
      {
         "label":"class_name_2",
         "topX":"xmin/width",
         "topY":"ymin/height",
         "bottomX":"xmax/width",
         "bottomY":"ymax/height",
         "isCrowd":"isCrowd"
      },
      "..."
   ]
}

其中:

  • xmin = 边界框左上角的 x 坐标
  • ymin = 边界框左上角的 y 坐标
  • xmax = 边界框右下角的 x 坐标
  • ymax = 边界框右下角的 y 坐标
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 图像类型(支持 Pillow 库中提供的所有图像格式。但对于 YOLO,仅支持 opencv 允许的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer
"499px" or 499
height 图像的高度
Optional, String or Positive Integer
"665px" or 665
label(外部键) 边界框列表,其中每个框都是其左上方和右下方坐标的 label, topX, topY, bottomX, bottomY, isCrowd 字典
Required, List of dictionaries
[{"label": "cat", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}]
label(内部键) 边界框中对象的类/标签
Required, String
"cat"
topX 边界框左上角的 x 坐标与图像宽度的比率
Required, Float in the range [0,1]
0.260
topY 边界框左上角的 y 坐标与图像高度的比率
Required, Float in the range [0,1]
0.406
bottomX 边界框右下角的 x 坐标与图像宽度的比率
Required, Float in the range [0,1]
0.735
bottomY 边界框右下角的 y 坐标与图像高度的比率
Required, Float in the range [0,1]
0.701
isCrowd 指示边界框是否围绕对象群。 如果设置了此特殊标志,我们在计算指标时将跳过此特定边界框。
Optional, Bool
0

用于对象检测的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "can", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "topX": 0.172, "topY": 0.153, "bottomX": 0.432, "bottomY": 0.659, "isCrowd": 0}, {"label": "milk_bottle", "topX": 0.300, "topY": 0.566, "bottomX": 0.891, "bottomY": 0.735, "isCrowd": 0}]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "topX": 0.0180, "topY": 0.297, "bottomX": 0.380, "bottomY": 0.836, "isCrowd": 0}, {"label": "milk_bottle", "topX": 0.454, "topY": 0.348, "bottomX": 0.613, "bottomY": 0.683, "isCrowd": 0}, {"label": "water_bottle", "topX": 0.667, "topY": 0.279, "bottomX": 0.841, "bottomY": 0.615, "isCrowd": 0}]}

物体检测的图像示例。

实例分段

对于实例分段,自动化 ML 仅支持多边形作为输入和输出,不支持掩码。

下面是实例分段的示例 JSONL 文件。

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      {
         "label":"class_name",
         "isCrowd":"isCrowd",
         "polygon":[["x1", "y1", "x2", "y2", "x3", "y3", "...", "xn", "yn"]]
      }
   ]
}
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 映像类型
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff" }
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer
"499px" or 499
height 图像的高度
Optional, String or Positive Integer
"665px" or 665
label(外部键) 掩码列表,其中每个掩码都是 label, isCrowd, polygon coordinates 的字典
Required, List of dictionaries
[{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689,
0.562, 0.681,
0.559, 0.686]]}]
label(内部键) 掩码中对象的类/标签
Required, String
"cat"
isCrowd 指示掩码是否围绕对象群
Optional, Bool
0
polygon 对象的多边形坐标
Required, List of list for multiple segments of the same instance. Float values in the range [0,1]
[[0.577, 0.689, 0.567, 0.689, 0.559, 0.686]]

实例分段的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689, 0.567, 0.689, 0.559, 0.686, 0.380, 0.593, 0.304, 0.555, 0.294, 0.545, 0.290, 0.534, 0.274, 0.512, 0.2705, 0.496, 0.270, 0.478, 0.284, 0.453, 0.308, 0.432, 0.326, 0.423, 0.356, 0.415, 0.418, 0.417, 0.635, 0.493, 0.683, 0.507, 0.701, 0.518, 0.709, 0.528, 0.713, 0.545, 0.719, 0.554, 0.719, 0.579, 0.713, 0.597, 0.697, 0.621, 0.695, 0.629, 0.631, 0.678, 0.619, 0.683, 0.595, 0.683, 0.577, 0.689]]}]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "isCrowd": 0, "polygon": [[0.240, 0.65, 0.234, 0.654, 0.230, 0.647, 0.210, 0.512, 0.202, 0.403, 0.182, 0.267, 0.184, 0.243, 0.180, 0.166, 0.186, 0.159, 0.198, 0.156, 0.396, 0.162, 0.408, 0.169, 0.406, 0.217, 0.414, 0.249, 0.422, 0.262, 0.422, 0.569, 0.342, 0.569, 0.334, 0.572, 0.320, 0.585, 0.308, 0.624, 0.306, 0.648, 0.240, 0.657]]}, {"label": "milk_bottle",  "isCrowd": 0, "polygon": [[0.675, 0.732, 0.635, 0.731, 0.621, 0.725, 0.573, 0.717, 0.516, 0.717, 0.505, 0.720, 0.462, 0.722, 0.438, 0.719, 0.396, 0.719, 0.358, 0.714, 0.334, 0.714, 0.322, 0.711, 0.312, 0.701, 0.306, 0.687, 0.304, 0.663, 0.308, 0.630, 0.320, 0.596, 0.32, 0.588, 0.326, 0.579]]}]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "water_bottle", "isCrowd": 0, "polygon": [[0.334, 0.626, 0.304, 0.621, 0.254, 0.603, 0.164, 0.605, 0.158, 0.602, 0.146, 0.602, 0.142, 0.608, 0.094, 0.612, 0.084, 0.599, 0.080, 0.585, 0.080, 0.539, 0.082, 0.536, 0.092, 0.533, 0.126, 0.530, 0.132, 0.533, 0.144, 0.533, 0.162, 0.525, 0.172, 0.525, 0.186, 0.521, 0.196, 0.521 ]]}, {"label": "milk_bottle", "isCrowd": 0, "polygon": [[0.392, 0.773, 0.380, 0.732, 0.379, 0.767, 0.367, 0.755, 0.362, 0.735, 0.362, 0.714, 0.352, 0.644, 0.352, 0.611, 0.362, 0.597, 0.40, 0.593, 0.444,  0.494, 0.588, 0.515, 0.585, 0.621, 0.588, 0.671, 0.582, 0.713, 0.572, 0.753 ]]}]}

实例分段的图像示例。

用于推理的数据格式

在本部分中,我们将记录在使用部署的模型时进行预测所需的输入数据格式。 可以接受内容类型为 application/octet-stream 的任何上述图像格式。

输入格式

下面是使用特定于任务的模型终结点对任何任务生成预测所需的输入格式。 部署模型后,我们可以使用以下代码段来获取所有任务的预测。

# input image for inference
sample_image = './test_image.jpg'
# load image data
data = open(sample_image, 'rb').read()
# set the content type
headers = {'Content-Type': 'application/octet-stream'}
# if authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'
# make the request and display the response
response = requests.post(scoring_uri, data, headers=headers)

输出格式

根据任务类型,对模型终结点进行的预测遵循不同的结构。 本部分将探讨多类、多标签图像分类、对象检测和实例分段任务的输出数据格式。

图像分类

图像分类的终结点返回数据集中的所有标签及其在输入图像中的概率分数,格式如下:

{
   "filename":"/tmp/tmppjr4et28",
   "probs":[
      2.098e-06,
      4.783e-08,
      0.999,
      8.637e-06
   ],
   "labels":[
      "can",
      "carton",
      "milk_bottle",
      "water_bottle"
   ]
}

多标签图像分类

对于多标签图像分类,模型终结点返回标签及其概率。

{
   "filename":"/tmp/tmpsdzxlmlm",
   "probs":[
      0.997,
      0.960,
      0.982,
      0.025
   ],
   "labels":[
      "can",
      "carton",
      "milk_bottle",
      "water_bottle"
   ]
}

对象检测

对象检测模型返回多个框,其中包含缩放后的左上角和右下角坐标,以及框标签和置信度分数。

{
   "filename":"/tmp/tmpdkg2wkdy",
   "boxes":[
      {
         "box":{
            "topX":0.224,
            "topY":0.285,
            "bottomX":0.399,
            "bottomY":0.620
         },
         "label":"milk_bottle",
         "score":0.937
      },
      {
         "box":{
            "topX":0.664,
            "topY":0.484,
            "bottomX":0.959,
            "bottomY":0.812
         },
         "label":"can",
         "score":0.891
      },
      {
         "box":{
            "topX":0.423,
            "topY":0.253,
            "bottomX":0.632,
            "bottomY":0.725
         },
         "label":"water_bottle",
         "score":0.876
      }
   ]
}

实例分段

在实例分段中,输出包含多个框,其中包含缩放后的左上角和右下角坐标、标签、置信度和多边形(非掩码)。 此处,多边形值与我们在“架构”部分中讨论的格式相同。

{
   "filename":"/tmp/tmpi8604s0h",
   "boxes":[
      {
         "box":{
            "topX":0.679,
            "topY":0.491,
            "bottomX":0.926,
            "bottomY":0.810
         },
         "label":"can",
         "score":0.992,
         "polygon":[
            [
               0.82, 0.811, 0.771, 0.810, 0.758, 0.805, 0.741, 0.797, 0.735, 0.791, 0.718, 0.785, 0.715, 0.778, 0.706, 0.775, 0.696, 0.758, 0.695, 0.717, 0.698, 0.567, 0.705, 0.552, 0.706, 0.540, 0.725, 0.520, 0.735, 0.505, 0.745, 0.502, 0.755, 0.493
            ]
         ]
      },
      {
         "box":{
            "topX":0.220,
            "topY":0.298,
            "bottomX":0.397,
            "bottomY":0.601
         },
         "label":"milk_bottle",
         "score":0.989,
         "polygon":[
            [
               0.365, 0.602, 0.273, 0.602, 0.26, 0.595, 0.263, 0.588, 0.251, 0.546, 0.248, 0.501, 0.25, 0.485, 0.246, 0.478, 0.245, 0.463, 0.233, 0.442, 0.231, 0.43, 0.226, 0.423, 0.226, 0.408, 0.234, 0.385, 0.241, 0.371, 0.238, 0.345, 0.234, 0.335, 0.233, 0.325, 0.24, 0.305, 0.586, 0.38, 0.592, 0.375, 0.598, 0.365
            ]
         ]
      },
      {
         "box":{
            "topX":0.433,
            "topY":0.280,
            "bottomX":0.621,
            "bottomY":0.679
         },
         "label":"water_bottle",
         "score":0.988,
         "polygon":[
            [
               0.576, 0.680, 0.501, 0.680, 0.475, 0.675, 0.460, 0.625, 0.445, 0.630, 0.443, 0.572, 0.440, 0.560, 0.435, 0.515, 0.431, 0.501, 0.431, 0.433, 0.433, 0.426, 0.445, 0.417, 0.456, 0.407, 0.465, 0.381, 0.468, 0.327, 0.471, 0.318
            ]
         ]
      }
   ]
}

注意

本文中使用的图像来自 Fridge Objects 数据集、版权所有 © Azure Corporation,可在 MIT 许可证下的 computervision-recipes/01_training_introduction.ipynb 中获得。

后续步骤