此内容适用于:
v3.1 (GA)
功能
文档智能支持更复杂的模块化分析功能。 使用加载项功能扩展结果,以包含从文档中提取的更多功能。 某些加载项功能会产生额外费用。 根据文档提取方案,可以启用和禁用这些可选功能。 若要启用某个功能,请将关联的功能名称添加到 features
查询字符串属性。 可以通过提供逗号分隔的功能列表,在请求中启用多个附加功能。 以下附加功能适用于 2023-07-31 (GA)
及更高版本。
版本可用性
加载项功能 |
加载项/免费 |
2024-11-30 (GA) |
2023-07-31 (正式发布) |
2022-08-31 (正式发布) |
v2.1 (GA) |
字体属性提取 |
附加功能 |
✔️ |
✔️ |
不适用 |
不适用 |
公式提取 |
附加功能 |
✔️ |
✔️ |
不适用 |
不适用 |
高分辨率提取 |
附加功能 |
✔️ |
✔️ |
不适用 |
不适用 |
条形码提取 |
免费 |
✔️ |
✔️ |
不适用 |
不适用 |
语言检测 |
免费 |
✔️ |
✔️ |
不适用 |
不适用 |
键值对 |
免费 |
✔️ |
不适用 |
不适用 |
不适用 |
查询字段 |
附加功能* |
✔️ |
不适用 |
不适用 |
n/a |
可搜索的 pdf |
加载项** |
✔️ |
不适用 |
不适用 |
n/a |
✱ 附加功能 - 查询字段的定价与其他附加功能不同。 有关详细信息,请参阅定价。
**加载项 -“可搜索 pdf”仅作为加载项功能适用于读取模型。
✱ 当前不支持 Microsoft Office 文件。
从大型文档(如工程图纸)中识别小文本是一项挑战。 文本通常与其他图形元素混合在一起,并且具有不同的字体、大小和方向。 此外,文本可以分解为单独的部分或与其他符号连接。 文档智能现在支持使用 ocr.highResolution
功能从这些类型的文档中提取内容。 通过启用此附加功能,可以提高从 A1/A2/A3 文档中提取内容的质量。
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution
# Analyze a document at a URL:
url = "(https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-layout", document_url=url, features=[AnalysisFeature.OCR_HIGH_RESOLUTION] # Specify which add-on capabilities to enable.
)
result = poller.result()
# [START analyze_with_highres]
if any([style.is_handwritten for style in result.styles]):
print("Document contains handwritten content")
else:
print("Document does not contain handwritten content")
for page in result.pages:
print(f"----Analyzing layout from page #{page.page_number}----")
print(
f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}"
)
for line_idx, line in enumerate(page.lines):
words = line.get_words()
print(
f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
f"within bounding polygon '{format_polygon(line.polygon)}'"
)
for word in words:
print(
f"......Word '{word.content}' has a confidence of {word.confidence}"
)
for selection_mark in page.selection_marks:
print(
f"Selection mark is '{selection_mark.state}' within bounding polygon "
f"'{format_polygon(selection_mark.polygon)}' and has a confidence of {selection_mark.confidence}"
)
for table_idx, table in enumerate(result.tables):
print(
f"Table # {table_idx} has {table.row_count} rows and "
f"{table.column_count} columns"
)
for region in table.bounding_regions:
print(
f"Table # {table_idx} location on page: {region.page_number} is {format_polygon(region.polygon)}"
)
for cell in table.cells:
print(
f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
)
for region in cell.bounding_regions:
print(
f"...content on page {region.page_number} is within bounding polygon '{format_polygon(region.polygon)}'"
)
"styles": [true],
"pages": [
{
"page_number": 1,
"width": 1000,
"height": 800,
"unit": "px",
"lines": [
{
"line_idx": 1,
"content": "This",
"polygon": [10, 20, 30, 40],
"words": [
{
"content": "This",
"confidence": 0.98
}
]
}
],
"selection_marks": [
{
"state": "selected",
"polygon": [50, 60, 70, 80],
"confidence": 0.91
}
]
}
],
"tables": [
{
"table_idx": 1,
"row_count": 3,
"column_count": 4,
"bounding_regions": [
{
"page_number": 1,
"polygon": [100, 200, 300, 400]
}
],
"cells": [
{
"row_index": 1,
"column_index": 1,
"content": "Content 1",
"bounding_regions": [
{
"page_number": 1,
"polygon": [110, 210, 310, 410]
}
]
}
]
}
]
ocr.formula
功能将 formulas
集合中所有已识别的公式(如数学公式)提取为 content
下的顶级对象。 在 content
内,检测到的公式表示为 :formula:
。 此集合中的每个条目表示一个公式,该公式类型为 inline
或 display
,其 LaTeX 表示形式 value
及其 polygon
坐标。 最初,公式显示在每页的末尾。
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-layout", document_url=url, features=[AnalysisFeature.FORMULAS] # Specify which add-on capabilities to enable
)
result = poller.result()
# [START analyze_formulas]
for page in result.pages:
print(f"----Formulas detected from page #{page.page_number}----")
inline_formulas = [f for f in page.formulas if f.kind == "inline"]
display_formulas = [f for f in page.formulas if f.kind == "display"]
print(f"Detected {len(inline_formulas)} inline formulas.")
for formula_idx, formula in enumerate(inline_formulas):
print(f"- Inline #{formula_idx}: {formula.value}")
print(f" Confidence: {formula.confidence}")
print(f" Bounding regions: {format_polygon(formula.polygon)}")
print(f"\nDetected {len(display_formulas)} display formulas.")
for formula_idx, formula in enumerate(display_formulas):
print(f"- Display #{formula_idx}: {formula.value}")
print(f" Confidence: {formula.confidence}")
print(f" Bounding regions: {format_polygon(formula.polygon)}")
"content": ":formula:",
"pages": [
{
"pageNumber": 1,
"formulas": [
{
"kind": "inline",
"value": "\\frac { \\partial a } { \\partial b }",
"polygon": [...],
"span": {...},
"confidence": 0.99
},
{
"kind": "display",
"value": "y = a \\times b + a \\times c",
"polygon": [...],
"span": {...},
"confidence": 0.99
}
]
}
]
ocr.font
功能将 styles
集合中提取的文本的所有字体属性提取为 content
下的顶级对象。 每个样式对象都会指定一个字体属性、适用的文本范围及其相应的置信度分数。 现有样式属性扩展了更多字体属性,例如文本字体的 similarFontFamily
、斜体和正常等样式的 fontStyle
、粗体或正常样式的 fontWeight
、文本颜色的 color
和文本边界框颜色的 backgroundColor
。
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-layout", document_url=url, features=[AnalysisFeature.STYLE_FONT] # Specify which add-on capabilities to enable.
)
result = poller.result()
# [START analyze_fonts]
# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list) # e.g., 'Arial, sans-serif
font_styles = defaultdict(list) # e.g, 'italic'
font_weights = defaultdict(list) # e.g., 'bold'
font_colors = defaultdict(list) # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list) # in '#rrggbb' hexadecimal format
if any([style.is_handwritten for style in result.styles]):
print("Document contains handwritten content")
else:
print("Document does not contain handwritten content")
print("\n----Fonts styles detected in the document----")
# Iterate over the styles and group them by their font attributes.
for style in result.styles:
if style.similar_font_family:
similar_font_families[style.similar_font_family].append(style)
if style.font_style:
font_styles[style.font_style].append(style)
if style.font_weight:
font_weights[style.font_weight].append(style)
if style.color:
font_colors[style.color].append(style)
if style.background_color:
font_background_colors[style.background_color].append(style)
print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
print(f"- Font family: '{font_family}'")
print(f" Text: '{get_styled_text(styles, result.content)}'")
print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
print(f"- Font style: '{font_style}'")
print(f" Text: '{get_styled_text(styles, result.content)}'")
print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
print(f"- Font weight: '{font_weight}'")
print(f" Text: '{get_styled_text(styles, result.content)}'")
print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
print(f"- Font color: '{font_color}'")
print(f" Text: '{get_styled_text(styles, result.content)}'")
print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
print(f"- Font background color: '{font_background_color}'")
print(f" Text: '{get_styled_text(styles, result.content)}'")
"content": "Foo bar",
"styles": [
{
"similarFontFamily": "Arial, sans-serif",
"spans": [ { "offset": 0, "length": 3 } ],
"confidence": 0.98
},
{
"similarFontFamily": "Times New Roman, serif",
"spans": [ { "offset": 4, "length": 3 } ],
"confidence": 0.98
},
{
"fontStyle": "italic",
"spans": [ { "offset": 1, "length": 2 } ],
"confidence": 0.98
},
{
"fontWeight": "bold",
"spans": [ { "offset": 2, "length": 3 } ],
"confidence": 0.98
},
{
"color": "#FF0000",
"spans": [ { "offset": 4, "length": 2 } ],
"confidence": 0.98
},
{
"backgroundColor": "#00FF00",
"spans": [ { "offset": 5, "length": 2 } ],
"confidence": 0.98
}
]
ocr.barcode
功能将 barcodes
集合中所有已识别的条形码提取为 content
下的顶级对象。 在 content
内,检测到的条形码表示为 :barcode:
。 此集合中的每个条目都表示一个条形码,包括条形码类型(表示为 kind
)和嵌入的条形码内容(表示为 value
)及其 polygon
坐标。 最初,条形码显示在每页的末尾。 confidence
硬编码为 1。
支持的条形码类型
条形码类型 |
示例 |
QR Code |
|
Code 39 |
|
Code 93 |
|
Code 128 |
|
UPC (UPC-A & UPC-E) |
|
PDF417 |
|
EAN-8 |
|
EAN-13 |
|
Codabar |
|
Databar |
|
展开的 Databar |
|
ITF |
|
Data Matrix |
|
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-layout", document_url=url, features=[AnalysisFeature.BARCODES] # Specify which add-on capabilities to enable.
)
result = poller.result()
# [START analyze_barcodes]
# Iterate over extracted barcodes on each page.
for page in result.pages:
print(f"----Barcodes detected from page #{page.page_number}----")
print(f"Detected {len(page.barcodes)} barcodes:")
for barcode_idx, barcode in enumerate(page.barcodes):
print(f"- Barcode #{barcode_idx}: {barcode.value}")
print(f" Kind: {barcode.kind}")
print(f" Confidence: {barcode.confidence}")
print(f" Bounding regions: {format_polygon(barcode.polygon)}")
----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
Kind: QRCode
Confidence: 0.95
Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
Kind: QRCode
Confidence: 0.98
Bounding regions: [50.5, 60.5, 70.5, 80.5]
语言检测
将 languages
功能添加到 analyzeResult
请求可以预测每个文本行所检测到的主要语言,以及 analyzeResult
下 languages
集合中的 confidence
。
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages
# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-layout", document_url=url, features=[AnalysisFeature.LANGUAGES] # Specify which add-on capabilities to enable.
)
result = poller.result()
# [START analyze_languages]
print("----Languages detected in the document----")
print(f"Detected {len(result.languages)} languages:")
for lang_idx, lang in enumerate(result.languages):
print(f"- Language #{lang_idx}: locale '{lang.locale}'")
print(f" Confidence: {lang.confidence}")
print(f" Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'")
"languages": [
{
"spans": [
{
"offset": 0,
"length": 131
}
],
"locale": "en",
"confidence": 0.7
},
]
后续步骤