Named Entity Recognition cognitive skill (v2)

Article
2023-12-29

The Named Entity Recognition skill (v2) extracts named entities from text. Available entities include the types person, location and organization.

Important

Named entity recognition skill (v2) (Microsoft.Skills.Text.NamedEntityRecognitionSkill) is now discontinued replaced by Microsoft.Skills.Text.V3.EntityRecognitionSkill. Follow the recommendations in Deprecated Azure AI Search skills to migrate to a supported skill.

Note

As you expand scope by increasing the frequency of processing, adding more documents, or adding more AI algorithms, you will need to attach a billable Azure AI services resource. Charges accrue when calling APIs in Azure AI services, and for image extraction as part of the document-cracking stage in Azure AI Search. There are no charges for text extraction from documents. Execution of built-in skills is charged at the existing Azure AI services Standard Pay-in-Advance Offer price.

Image extraction is an extra charge metered by Azure AI Search, as described on the pricing page. Text extraction is free.

@odata.type

Microsoft.Skills.Text.NamedEntityRecognitionSkill

Data limits

The maximum size of a record should be 50,000 characters as measured by String.Length. If you need to break up your data before sending it to the key phrase extractor, consider using the Text Split skill. If you do use a text split skill, set the page length to 5000 for the best performance.

Skill parameters

Parameters are case-sensitive.

Parameter name	Description
categories	Array of categories that should be extracted. Possible category types: `"Person"`, `"Location"`, `"Organization"`. If no category is provided, all types are returned.
defaultLanguageCode	Language code of the input text. The following languages are supported: `de, en, es, fr, it`
minimumPrecision	A number between 0 and 1. If the precision is lower than this value, the entity is not returned. The default is 0.

Skill inputs

Input name	Description
languageCode	Optional. Default is `"en"`.
text	The text to analyze.

Skill outputs

Output name	Description
persons	An array of strings where each string represents the name of a person.
locations	An array of strings where each string represents a location.
organizations	An array of strings where each string represents an organization.
entities	An array of complex types. Each complex type includes the following fields: category (`"person"`, `"organization"`, or `"location"`) value (the actual entity name) offset (The location where it was found in the text) confidence (A value between 0 and 1 that represents that confidence that the value is an actual entity)

Sample definition

  {
    "@odata.type": "#Microsoft.Skills.Text.NamedEntityRecognitionSkill",
    "categories": [ "Person", "Location", "Organization"],
    "defaultLanguageCode": "en",
    "inputs": [
      {
        "name": "text",
        "source": "/document/content"
      }
    ],
    "outputs": [
      {
        "name": "persons",
        "targetName": "people"
      }
    ]
  }

Sample input

{
    "values": [
      {
        "recordId": "1",
        "data":
           {
             "text": "This is the loan application for Joe Romero, a Azure employee who was born in Chile and who then moved to Australia… Ana Smith is provided as a reference.",
             "languageCode": "en"
           }
      }
    ]
}

Sample output

{
  "values": [
    {
      "recordId": "1",
      "data" : 
      {
        "persons": [ "Joe Romero", "Ana Smith"],
        "locations": ["Chile", "Australia"],
        "organizations":["Microsoft"],
        "entities":  
        [
          {
            "category":"person",
            "value": "Joe Romero",
            "offset": 33,
            "confidence": 0.87
          },
          {
            "category":"person",
            "value": "Ana Smith",
            "offset": 124,
            "confidence": 0.87
          },
          {
            "category":"location",
            "value": "Chile",
            "offset": 88,
            "confidence": 0.99
          },
          {
            "category":"location",
            "value": "Australia",
            "offset": 112,
            "confidence": 0.99
          },
          {
            "category":"organization",
            "value": "Microsoft",
            "offset": 54,
            "confidence": 0.99
          }
        ]
      }
    }
  ]
}

Warning cases

If the language code for the document is unsupported, a warning is returned and no entities are extracted.