System classifications supported in Data Map

This article lists the supported system classifications in Microsoft Purview Data Map. To learn more about classification, see Data classification in Data Map.

Microsoft Purview classifies data by using regular expressions (RegEx), Bloom filters, and machine learning models. The following lists describe the format, pattern, and keywords for the system classifications that Microsoft Purview defines. Each classification name is prefixed by MICROSOFT.

Note

Microsoft Purview can classify both structured data (CSV, TSV, JSON, SQL tables, and so on) and unstructured data (DOC, PDF, TXT, and so on). However, some classifications apply only to structured data. Microsoft Purview doesn't apply the following classifications to unstructured data: City Name, Country Name, Date Of Birth, Email, Ethnic Group, GeoLocation, Person Name, U.S. Phone Number, U.S. States, U.S. ZipCode.

Note

Minimum match threshold: the minimum percentage of data values in a column that the scanner must match for the classification to be applied. For system classifications, the minimum match threshold is set at 60% and can't be changed. For custom classifications, this value is configurable.
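
As a rough illustration of how a minimum match threshold behaves, the following Python sketch applies a 60% cutoff to the values of one column. The function name `column_matches`, the email-like pattern, and the choice to ignore empty cells are assumptions made for this example; they don't describe the Purview scanner's actual implementation.

```python
import re

# Threshold stated in the note above for system classifications.
MIN_MATCH_PERCENT = 60


def column_matches(values, matcher, threshold_percent=MIN_MATCH_PERCENT):
    """Return True when the share of matching values reaches the threshold."""
    # Assumption: empty cells are ignored; the article doesn't say how they count.
    non_empty = [str(v) for v in values if v is not None and str(v).strip()]
    if not non_empty:
        return False
    hits = sum(1 for v in non_empty if matcher(v))
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return hits * 100 >= threshold_percent * len(non_empty)


# Hypothetical matcher: a deliberately simple email-like pattern.
looks_like_email = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+").fullmatch

column = ["a@contoso.com", "b@contoso.com", "c@contoso.com", "n/a", "unknown"]
print(column_matches(column, looks_like_email))  # True: 3 of 5 values match (60%)
```

With one fewer matching value (2 of 5, or 40%), the same call returns False and the column would not receive the classification.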

Bloom Filter based classifications

World Cities, Country

The City and Country classifiers identify data based on full names as well as short codes.
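
A Bloom filter is a compact, probabilistic set: a lookup never misses an item that was added, but it can occasionally report an item that was never added. The Python sketch below shows the general idea with a hypothetical six-entry country dictionary that mixes full names and short codes. The class, the hashing scheme, the sizes, and the `normalize` helper are all assumptions for illustration, not the Purview implementation.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def normalize(value):
    # Assumption: matching ignores case and surrounding whitespace.
    return value.strip().casefold()


# Hypothetical sample dictionary with full names and short codes.
countries = BloomFilter()
for name in ["France", "FR", "FRA", "Japan", "JP", "JPN"]:
    countries.add(normalize(name))

print(normalize("france") in countries)  # True: added items are always found
print(normalize(" JPN ") in countries)   # True
```

The appeal of this structure is that a very large dictionary of names can be checked in constant time with a small, fixed amount of memory, at the cost of a tunable false-positive rate.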

Keywords

Keywords for City
  • burg
  • city
  • cities
  • city names
  • cosmopolis
  • metropolis
  • municipality
  • place
  • town
Keywords for Country
  • country
  • countries
  • country names
  • nation
  • nationality

Machine Learning based classifications

Note

Machine learning based classifiers are only supported for structured data like tabular or columnar data sources.

Person's Name

The Person Name machine learning model has been trained on global datasets of names in the English language. Microsoft Purview classifies full names stored in a single column as well as first and last names stored in separate columns.


Person's Address

The Person's Address classification detects a full address stored in a single column that contains the following elements: house number, street name, city, state, country/region, and zip code. The classifier uses a machine learning model trained on a global address dataset in the English language.

Supported formats

Currently, the address model supports the following formats in the same column:

  • number, street, city
  • name, street, pincode or zipcode
  • number, street, area, pincode or zipcode
  • street, city, pincode or zipcode
  • landmark, city

Person's Gender

The Person's Gender machine learning model has been trained on US Census data and other public data sources in the English language. It supports classifying more than 50 genders out of the box.

Keywords

  • sex
  • gender
  • sexual
  • orientation

Person's Age

The Person's Age machine learning model detects the age of an individual specified in various formats. The qualifiers for days, months, and years must be in the English language.

Keywords

  • age
  • ages

Supported formats

  • {%y} y, {%m} m
  • {%y} years {%m} months
  • {%y} years and {%m} months
  • {%y} years {%w} weeks
  • {%y} years and {%w} weeks
  • {%y} y, {%d} d
  • {%y} y, {%w} w
  • {%y} years, {%d} days
  • {%y} years and {%d} days
  • {%y} years, {%m} months and {%d} days
  • {%m} months and {%d} days
  • {%y} yr
  • {%y}.{%yd} yr
  • {%y} years
  • {%y} years old
  • {%y}.{%yd} years
  • age {%y}
  • {%y} to {%y2}
  • {%y} to {%y2} yrs
  • {%y} years to {%y2} years
  • {%m} months to {%y} years
  • {%m} m to {%y} years
  • {%y}-{%y2} yrs
  • {%y}-{%y2}
  • {%y} - {%y2}
  • {%y}+
  • {%m}-{%m2} mos
  • {%y} and over
  • {%y} and under
  • below {%y}
  • above {%y}
  • month {%m}
  • week {%w}
  • {%y}

Unsupported formats

  • {%y}y {%m}m
  • {%y}y {%d}d
  • {%y}y {%w}w
  • {%y}.{%m}
  • {%y}.{%yd}
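
To help read the placeholder notation in the lists above, the following Python sketch converts a few of the supported templates into regular expressions and tests sample strings against them. This is only a reading aid: the actual classifier is a machine learning model, not a set of regular expressions. The helper names and the assumption that every placeholder stands for a run of digits are hypothetical.

```python
import re

# Placeholders such as {%y}, {%m}, or {%y2}; assumed to stand for whole numbers.
PLACEHOLDER = re.compile(r"\{%[a-z0-9]+\}")


def template_to_regex(template):
    """Turn a template such as '{%y} years and {%m} months' into a regex."""
    parts = []
    last = 0
    for match in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[last:match.start()]))  # literal text
        parts.append(r"\d+")                                   # the number
        last = match.end()
    parts.append(re.escape(template[last:]))
    return re.compile("".join(parts), re.IGNORECASE)


# A handful of the supported templates, copied from the list above.
SUPPORTED = [template_to_regex(t) for t in (
    "{%y} y, {%m} m",
    "{%y} years and {%m} months",
    "{%y} years old",
    "{%y}-{%y2} yrs",
    "{%y} and over",
)]


def fits_supported_template(text):
    return any(rx.fullmatch(text.strip()) for rx in SUPPORTED)


print(fits_supported_template("42 years old"))  # True
print(fits_supported_template("5 y, 3 m"))      # True
print(fits_supported_template("5y 3m"))         # False: listed as unsupported
```

The last call shows the practical difference between the two lists: "5 y, 3 m" has spaces and a comma between the numbers and their qualifiers, while the compact form "5y 3m" doesn't fit any of the sampled supported templates.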

RegEx Classifications

ABA routing number

Format

Nine digits that can be in a formatted or unformatted pattern.

Pattern

  • two digits in the ranges 00-12, 21-32, 61-72, or 80
  • two digits
  • an optional hyphen
  • four digits
  • an optional hyphen
  • a digit

Checksum

Yes
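
The article states that a checksum applies but doesn't say which one. The following Python sketch combines the pattern above with the standard ABA routing number checksum (weights 3, 7, 1 repeated across the nine digits, with the weighted sum divisible by 10). The regular expression and the function name `is_valid_aba` are assumptions for illustration, not the classifier's actual definition.

```python
import re

# Follows the pattern bullets: a two-digit prefix in 00-12, 21-32, 61-72, or 80,
# then two digits, an optional hyphen, four digits, an optional hyphen, one digit.
ABA_PATTERN = re.compile(
    r"(?:0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{2}-?\d{4}-?\d"
)

ABA_WEIGHTS = [3, 7, 1, 3, 7, 1, 3, 7, 1]


def is_valid_aba(candidate):
    """Check the pattern first, then the weighted checksum."""
    if not ABA_PATTERN.fullmatch(candidate):
        return False
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    return sum(d * w for d, w in zip(digits, ABA_WEIGHTS)) % 10 == 0


print(is_valid_aba("021000021"))    # True: weighted sum is 30
print(is_valid_aba("0210-0002-1"))  # True: same digits, formatted
print(is_valid_aba("021000022"))    # False: checksum fails
print(is_valid_aba("991000021"))    # False: prefix 99 is outside the ranges
```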

Keywords

Keyword_aba_routing
  • aba number
  • aba#
  • aba
  • abarouting#
  • abaroutingnumber
  • americanbankassociationrouting#
  • americanbankassociationroutingnumber
  • bankrouting#
  • bankroutingnumber
  • routing #
  • routing no
  • routing number
  • routing transit number
  • routing#
  • RTN

China resident identity card (PRC) number

Format

18 digits

Pattern

18 digits:

  • six digits that are an address code
  • eight digits in the form YYYYMMDD, which are the date of birth
  • three digits that are an order code
  • one digit that is a check digit

Checksum

Yes
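
The article doesn't spell out the checksum. The following Python sketch assumes the check character defined for the resident identity number (ISO 7064 MOD 11-2), under which the last character can be X as well as a digit. Accepting X, validating the birth date as a calendar date, and the function name `is_valid_prc_id` are assumptions for this example, not the classifier's actual logic.

```python
import re
from datetime import datetime

# Address code (6), date of birth YYYYMMDD (8), order code (3), check character (1).
ID_PATTERN = re.compile(r"(\d{6})(\d{8})(\d{3})([\dXx])")

WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"  # indexed by (weighted sum mod 11)


def is_valid_prc_id(candidate):
    match = ID_PATTERN.fullmatch(candidate)
    if not match:
        return False
    _address, birth, _order, check = match.groups()
    try:
        datetime.strptime(birth, "%Y%m%d")  # reject impossible dates
    except ValueError:
        return False
    total = sum(int(d) * w for d, w in zip(candidate[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == check.upper()


print(is_valid_prc_id("110105194912310011"))  # True: weighted sum 165, 165 % 11 == 0
print(is_valid_prc_id("11010519491231002X"))  # True: weighted sum 167, 167 % 11 == 2
print(is_valid_prc_id("110105194912310012"))  # False: wrong check character
print(is_valid_prc_id("110105194913310011"))  # False: month 13 is not a valid date
```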

Keywords

Keyword_china_resident_id

  • Resident Identity Card
  • PRC
  • National Identification Card
  • 身份证
  • 居民 身份证
  • 居民身份证
  • 鉴定
  • 身分證
  • 居民 身份證
  • 鑑定

Credit card number

Format

14 to 16 digits that can be formatted or unformatted (dddddddddddddddd) and that must pass the Luhn test.

Pattern

Detects cards from all major brands worldwide, including Visa, MasterCard, Discover Card, JCB, American Express, gift cards, Diners Club cards, RuPay, and China UnionPay.

Checksum

Yes, the Luhn checksum
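
The Luhn test can be sketched in a few lines of Python. Starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the total must be divisible by 10. The regular expression below, including the choice of spaces and hyphens as allowed separators, and the function names are assumptions for illustration; the article doesn't list the exact delimiters the classifier accepts.

```python
import re

# 14 to 16 digits, optionally separated by single spaces or hyphens (assumed).
CARD_PATTERN = re.compile(r"\d(?:[ -]?\d){13,15}")


def passes_luhn(digits):
    """Return True when a string of digits passes the Luhn checksum."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        value = int(ch)
        if index % 2 == 1:      # every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def looks_like_card_number(candidate):
    if not CARD_PATTERN.fullmatch(candidate):
        return False
    digits = re.sub(r"[ -]", "", candidate)
    return passes_luhn(digits)


# Widely published test numbers, not real accounts.
print(looks_like_card_number("4111111111111111"))     # True: Luhn total is 30
print(looks_like_card_number("4111 1111 1111 1111"))  # True: formatted variant
print(looks_like_card_number("30569309025904"))       # True: 14 digits, total is 50
print(looks_like_card_number("4111111111111112"))     # False: Luhn total is 31
```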

Keywords

Keyword_cc_verification
  • card verification
  • card identification number
  • cvn
  • cid
  • cvc2
  • cvv2
  • pin block
  • security code
  • security number
  • security no
  • issue number
  • issue no
  • cryptogramme
  • numéro de sécurité
  • numero de securite
  • kreditkartenprüfnummer
  • kreditkartenprufnummer
  • prüfziffer
  • prufziffer
  • sicherheits Kode
  • sicherheitscode
  • sicherheitsnummer
  • verfalldatum
  • codice di verifica
  • cod. sicurezza
  • cod sicurezza
  • n autorizzazione
  • código
  • codigo
  • cod. seg
  • cod seg
  • código de segurança
  • codigo de seguranca
  • codigo de segurança
  • código de seguranca
  • cód. segurança
  • cod. seguranca
  • cod. segurança
  • cód. seguranca
  • cód segurança
  • cod seguranca
  • cod segurança
  • cód seguranca
  • número de verificação
  • numero de verificacao
  • ablauf
  • gültig bis
  • gültigkeitsdatum
  • gultig bis
  • gultigkeitsdatum
  • scadenza
  • data scad
  • fecha de expiracion
  • fecha de venc
  • vencimiento
  • válido hasta
  • valido hasta
  • vto
  • data de expiração
  • data de expiracao
  • data em que expira
  • validade
  • valor
  • vencimento
  • transaction
  • transaction number
  • reference number
  • セキュリティコード
  • セキュリティ コード
  • セキュリティナンバー
  • セキュリティ ナンバー
  • セキュリティ番号
Keyword_cc_name
  • amex
  • american express
  • americanexpress
  • americano espresso
  • Visa
  • mastercard
  • master card
  • mc
  • mastercards
  • master cards
  • diner's Club
  • diners club
  • dinersclub
  • discover
  • discover card
  • discovercard
  • discover cards
  • JCB
  • BrandSmart
  • japanese card bureau
  • carte blanche
  • carteblanche
  • credit card
  • cc#
  • cc#:
  • expiration date
  • exp date
  • expiry date
  • date d’expiration
  • date d'exp
  • date expiration
  • bank card
  • bankcard
  • card number
  • card num
  • cardnumber
  • cardnumbers
  • card numbers
  • creditcard
  • credit cards
  • creditcards
  • ccn
  • card holder
  • cardholder
  • card holders
  • cardholders
  • check card
  • checkcard
  • check cards
  • checkcards
  • debit card
  • debitcard
  • debit cards
  • debitcards
  • atm card
  • atmcard
  • atm cards
  • atmcards
  • enroute
  • en route
  • card type
  • Cardmember Acct
  • cardmember account
  • Cardno
  • Corporate Card
  • Corporate cards
  • Type of card
  • card account number
  • card member account
  • Cardmember Acct.
  • card no.
  • card no
  • carte bancaire
  • carte de crédit
  • carte de credit
  • numéro de carte
  • numero de carte
  • nº de la carte
  • nº de carte
  • kreditkarte
  • karte
  • karteninhaber
  • karteninhabers
  • kreditkarteninhaber
  • kreditkarteninstitut
  • kreditkartentyp
  • eigentümername
  • kartennr
  • kartennummer
  • kreditkartennummer
  • kreditkarten-nummer
  • carta di credito
  • carta credito
  • n. carta
  • n carta
  • nr. carta
  • nr carta
  • numero carta
  • numero della carta
  • numero di carta
  • tarjeta credito
  • tarjeta de credito
  • tarjeta crédito
  • tarjeta de crédito
  • tarjeta de atm
  • tarjeta atm
  • tarjeta debito
  • tarjeta de debito
  • tarjeta débito
  • tarjeta de débito
  • nº de tarjeta
  • no. de tarjeta
  • no de tarjeta
  • numero de tarjeta
  • número de tarjeta
  • tarjeta no
  • tarjetahabiente
  • cartão de crédito
  • cartão de credito
  • cartao de crédito
  • cartao de credito
  • cartão de débito
  • cartao de débito
  • cartão de debito
  • cartao de debito
  • débito automático
  • debito automatico
  • número do cartão
  • numero do cartão
  • número do cartao
  • numero do cartao
  • número de cartão
  • numero de cartão
  • número de cartao
  • numero de cartao
  • nº do cartão
  • nº do cartao
  • nº. do cartão
  • no do cartão
  • no do cartao
  • no. do cartão
  • no. do cartao
  • rupay
  • union pay
  • unionpay
  • diner's
  • diners
  • クレジットカード番号
  • クレジットカードナンバー
  • クレジットカード#
  • クレジットカード
  • クレジット
  • クレカ
  • カード番号
  • カードナンバー
  • カード#
  • アメックス
  • アメリカンエクスプレス
  • アメリカン エクスプレス
  • Visaカード
  • Visa カード
  • マスターカード
  • マスター カード
  • マスター
  • ダイナースクラブ
  • ダイナース クラブ
  • ダイナース
  • 有効期限
  • 期限
  • キャッシュカード
  • キャッシュ カード
  • カード名義人
  • カードの名義人
  • カードの名義
  • デビット カード
  • デビットカード
  • 中国银联
  • 银联