This article lists the supported system classifications in Microsoft Purview Data Map. To learn more about classification, see Data classification in Data Map.
Microsoft Purview classifies data by using RegEx, Bloom Filter and Machine Learning models. The following lists describe the format, pattern, and keywords for the Microsoft Purview defined system classifications. Each classification name is prefixed by MICROSOFT.
Note
Microsoft Purview can classify both structured data (CSV, TSV, JSON, SQL tables, and so on) and unstructured data (DOC, PDF, TXT, and so on). However, some classifications apply only to structured data. Microsoft Purview doesn't apply the following classifications to unstructured data: City Name, Country Name, Date Of Birth, Email, Ethnic Group, GeoLocation, Person Name, U.S. Phone Number, U.S. States, U.S. ZipCode.
Note
Minimum match threshold: the minimum percentage of data values in a column that the scanner must match for the classification to be applied. For system classifications, the minimum match threshold is set at 60% and can't be changed. For custom classifications, this value is configurable.
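As an illustration of the threshold rule, the following minimal sketch checks whether enough values in a column match a classifier. The names `meets_threshold` and `matcher` are hypothetical and are not part of Microsoft Purview. How the scanner samples a column or treats empty values isn't stated here, so the sketch simply counts every value it is given.

```python
# Fixed threshold for system classifications, per the note above.
SYSTEM_THRESHOLD_PERCENT = 60

def meets_threshold(values, matcher, threshold_percent=SYSTEM_THRESHOLD_PERCENT):
    """Return True if at least threshold_percent of values satisfy matcher.

    `matcher` is a hypothetical callable that returns True for a matching value.
    Integer arithmetic avoids floating-point rounding at the 60% boundary.
    """
    values = list(values)
    if not values:
        return False
    matched = sum(1 for value in values if matcher(value))
    return matched * 100 >= threshold_percent * len(values)
```

For example, a column where three of five values match sits exactly at 60% and would be classified under this sketch, while two of five would not.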
Bloom Filter based classifications
World Cities, Country
The City and Country classifiers identify data by full names as well as short codes.
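For readers unfamiliar with the technique, here is a minimal sketch of a generic Bloom filter used for name lookups. It is not Microsoft Purview's implementation: the class name, bit-array size, number of hashes, use of SHA-256, and case-insensitive normalization are all assumptions made for illustration. A Bloom filter can report false positives but never false negatives for items that were added.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Assumed normalization: trim whitespace and ignore case.
        normalized = item.strip().lower()
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{normalized}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[position] for position in self._positions(item))

countries = BloomFilter()
for name in ("France", "FR", "Japan", "JP"):
    countries.add(name)
```

A lookup such as `countries.might_contain("france")` then answers with "possibly present" for both the full name and the short code that were added.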
Keywords
Keywords for City
- burg
- city
- cities
- city names
- cosmopolis
- metropolis
- municipality
- place
- town
Keywords for Country
- country
- countries
- country names
- nation
- nationality
Machine Learning based classifications
Note
Machine learning-based classifiers are supported only for structured data, such as tabular or columnar data sources.
Person's Name
The Person's Name machine learning model has been trained on global datasets of names in the English language. Microsoft Purview classifies full names stored in a single column as well as first and last names stored in separate columns.
Person's Address
The Person's Address classification detects a full address stored in a single column that contains the following elements: house number, street name, city, state, country/region, and zip code. The Person's Address classifier uses a machine learning model trained on a global address dataset in the English language.
Supported formats
Currently the address model supports the following formats in the same column:
- number, street, city
- name, street, pincode or zipcode
- number, street, area, pincode or zipcode
- street, city, pincode or zipcode
- landmark, city
Person's Gender
The Person's Gender machine learning model has been trained on US Census data and other public data sources in the English language. It supports classifying 50+ genders out of the box.
Keywords
- sex
- gender
- sexual
- orientation
Person's Age
The Person's Age machine learning model detects the age of an individual specified in a variety of formats. The qualifiers for days, months, and years must be in English.
Keywords
- age
- ages
Supported formats
- {%y} y, {%m} m
- {%y} years {%m} months
- {%y} years and {%m} months
- {%y} years {%w} weeks
- {%y} years and {%w} weeks
- {%y} y, {%d} d
- {%y} y, {%w} w
- {%y} years, {%d} days
- {%y} years and {%d} days
- {%y} years, {%m} months and {%d} days
- {%m} months and {%d} days
- {%y} yr
- {%y}.{%yd} yr
- {%y} years
- {%y} years old
- {%y}.{%yd} years
- age {%y}
- {%y} to {%y2}
- {%y} to {%y2} yrs
- {%y} years to {%y2} years
- {%m} months to {%y} years
- {%m} m to {%y} years
- {%y}-{%y2} yrs
- {%y}-{%y2}
- {%y} - {%y2}
- {%y}+
- {%m}-{%m2} mos
- {%y} and over
- {%y} and under
- below {%y}
- above {%y}
- month {%m}
- week {%w}
- {%y}
Unsupported formats
- {%y}y {%m}m
- {%y}y {%d}d
- {%y}y {%w}w
- {%y}.{%m}
- {%y}.{%yd}
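To make the placeholder notation in the lists above concrete, the following sketch expands a template into a regular expression. It is a regex stand-in for illustration only and not the machine learning model itself. The helper names are hypothetical, and it assumes every placeholder such as `{%y}` or `{%m}` stands for one or more digits.

```python
import re

# Matches placeholders such as {%y}, {%m}, {%w}, {%d}, {%y2}, {%yd}.
PLACEHOLDER = re.compile(r"\{%\w+\}")

def template_to_regex(template):
    """Turn a template like '{%y} y, {%m} m' into a compiled regex.

    Assumption: each placeholder is replaced by one or more digits;
    all other characters must match literally.
    """
    literal_parts = PLACEHOLDER.split(template)
    return re.compile(r"\d+".join(re.escape(part) for part in literal_parts))

# A small, hand-picked subset of the supported templates listed above.
SUPPORTED_TEMPLATES = [
    "{%y} y, {%m} m",
    "{%y} years and {%m} months",
    "{%y} years old",
    "{%y}-{%y2} yrs",
    "{%y}+",
    "age {%y}",
]
SUPPORTED_PATTERNS = [template_to_regex(t) for t in SUPPORTED_TEMPLATES]

def matches_supported_template(value):
    """Check a value against the subset of supported templates."""
    text = value.strip()
    return any(pattern.fullmatch(text) for pattern in SUPPORTED_PATTERNS)
```

Under this sketch, `"5 y, 3 m"` matches a supported template, while `"5y 3m"` (the unsupported `{%y}y {%m}m` form) does not, because the space and comma in the template are treated as literal characters.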
RegEx Classifications
ABA routing number
Format
Nine digits that can be in a formatted or unformatted pattern.
Pattern
- two digits in the ranges 00-12, 21-32, 61-72, or 80
- two digits
- an optional hyphen
- four digits
- an optional hyphen
- a digit
Checksum
Yes
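The pattern and checksum above can be sketched as follows. The regular expression follows the listed pattern. The checksum is assumed to be the standard ABA check (weights 3, 7, 1 repeated across the nine digits, with the total divisible by 10), since this section doesn't spell out which checksum is used. The function name is hypothetical.

```python
import re

# Prefix ranges 00-12, 21-32, 61-72, or 80, then two digits,
# an optional hyphen, four digits, an optional hyphen, and a digit.
ABA_PATTERN = re.compile(
    r"(?:0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)"
    r"\d{2}-?\d{4}-?\d"
)

def is_valid_aba(value):
    """Return True if value fits the pattern and passes the assumed checksum."""
    text = value.strip()
    if not ABA_PATTERN.fullmatch(text):
        return False
    d = [int(ch) for ch in text if ch.isdigit()]
    # Standard ABA checksum: 3, 7, 1 weights repeated over the nine digits.
    total = (
        3 * (d[0] + d[3] + d[6])
        + 7 * (d[1] + d[4] + d[7])
        + (d[2] + d[5] + d[8])
    )
    return total % 10 == 0
```

Both the unformatted value `021000021` and the formatted value `0210-0002-1` pass this check, while a value with a prefix outside the listed ranges is rejected before the checksum is computed.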
Keywords
Keyword_aba_routing
- aba number
- aba#
- aba
- abarouting#
- abaroutingnumber
- americanbankassociationrouting#
- americanbankassociationroutingnumber
- bankrouting#
- bankroutingnumber
- routing #
- routing no
- routing number
- routing transit number
- routing#
- RTN
China resident identity card (PRC) number
Format
18 digits
Pattern
18 digits:
- six digits that are an address code
- eight digits in the form YYYYMMDD, which are the date of birth
- three digits that are an order code
- one digit that is a check digit
Checksum
Yes
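A minimal sketch of the structure and check digit described above, assuming the check follows the national standard for these numbers (weighted sum modulo 11). Note one assumption that goes beyond the text: the standard allows the final check character to be `X`, so the sketch accepts it in addition to a digit. The function name is hypothetical, and the address code is not validated against a region list.

```python
import re
from datetime import datetime

# Weights for the first 17 digits and the check-character lookup table.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"

def is_valid_prc_id(value):
    """Validate layout, birth date, and check character of an 18-character ID."""
    if not re.fullmatch(r"\d{17}[\dXx]", value):
        return False
    # Characters 7-14 must form a real calendar date in YYYYMMDD form.
    try:
        datetime.strptime(value[6:14], "%Y%m%d")
    except ValueError:
        return False
    total = sum(int(digit) * weight for digit, weight in zip(value[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == value[17].upper()
```

The sample values used to exercise this sketch are synthetic and chosen only because they satisfy the arithmetic.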
Keywords
Keyword_china_resident_id
- Resident Identity Card
- PRC
- National Identification Card
- 身份证
- 居民 身份证
- 居民身份证
- 鉴定
- 身分證
- 居民 身份證
- 鑑定
Credit card number
Format
14 to 16 digits that can be formatted or unformatted (dddddddddddddddd) and that must pass the Luhn test.
Pattern
Detects cards from all major brands worldwide, including Visa, MasterCard, Discover Card, JCB, American Express, gift cards, Diners Club cards, RuPay, and China UnionPay.
Checksum
Yes, the Luhn checksum
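The Luhn test can be sketched as follows. The length check uses the 14 to 16 digit range from the format above. Treating spaces and hyphens as the only allowed separators is an assumption, because the exact formatted variants aren't listed here, and the function names are hypothetical.

```python
import re

def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, ch in enumerate(reversed(digits)):
        n = int(ch)
        if index % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9  # same as adding the two digits of the product
        total += n
    return total % 10 == 0

def looks_like_card_number(value):
    """Strip assumed separators, check the length, then apply the Luhn test."""
    digits = re.sub(r"[ -]", "", value.strip())
    if not re.fullmatch(r"\d{14,16}", digits):
        return False
    return luhn_valid(digits)
```

Widely published test numbers such as `4111111111111111` pass this check in both unformatted and hyphenated form, while changing the last digit makes the checksum fail.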
Keywords
Keyword_cc_verification
- card verification
- card identification number
- cvn
- cid
- cvc2
- cvv2
- pin block
- security code
- security number
- security no
- issue number
- issue no
- cryptogramme
- numéro de sécurité
- numero de securite
- kreditkartenprüfnummer
- kreditkartenprufnummer
- prüfziffer
- prufziffer
- sicherheits Kode
- sicherheitscode
- sicherheitsnummer
- verfalldatum
- codice di verifica
- cod. sicurezza
- cod sicurezza
- n autorizzazione
- código
- codigo
- cod. seg
- cod seg
- código de segurança
- codigo de seguranca
- codigo de segurança
- código de seguranca
- cód. segurança
- cod. seguranca
- cod. segurança
- cód. seguranca
- cód segurança
- cod seguranca
- cod segurança
- cód seguranca
- número de verificação
- numero de verificacao
- ablauf
- gültig bis
- gültigkeitsdatum
- gultig bis
- gultigkeitsdatum
- scadenza
- data scad
- fecha de expiracion
- fecha de venc
- vencimiento
- válido hasta
- valido hasta
- vto
- data de expiração
- data de expiracao
- data em que expira
- validade
- valor
- vencimento
- transaction
- transaction number
- reference number
- セキュリティコード
- セキュリティ コード
- セキュリティナンバー
- セキュリティ ナンバー
- セキュリティ番号
Keyword_cc_name
- amex
- american express
- americanexpress
- americano espresso
- Visa
- mastercard
- master card
- mc
- mastercards
- master cards
- diner's Club
- diners club
- dinersclub
- discover
- discover card
- discovercard
- discover cards
- JCB
- BrandSmart
- japanese card bureau
- carte blanche
- carteblanche
- credit card
- cc#
- cc#:
- expiration date
- exp date
- expiry date
- date d’expiration
- date d'exp
- date expiration
- bank card
- bankcard
- card number
- card num
- cardnumber
- cardnumbers
- card numbers
- creditcard
- credit cards
- creditcards
- ccn
- card holder
- cardholder
- card holders
- cardholders
- check card
- checkcard
- check cards
- checkcards
- debit card
- debitcard
- debit cards
- debitcards
- atm card
- atmcard
- atm cards
- atmcards
- enroute
- en route
- card type
- Cardmember Acct
- cardmember account
- Cardno
- Corporate Card
- Corporate cards
- Type of card
- card account number
- card member account
- Cardmember Acct.
- card no.
- card no
- carte bancaire
- carte de crédit
- carte de credit
- numéro de carte
- numero de carte
- nº de la carte
- nº de carte
- kreditkarte
- karte
- karteninhaber
- karteninhabers
- kreditkarteninhaber
- kreditkarteninstitut
- kreditkartentyp
- eigentümername
- kartennr
- kartennummer
- kreditkartennummer
- kreditkarten-nummer
- carta di credito
- carta credito
- n. carta
- n carta
- nr. carta
- nr carta
- numero carta
- numero della carta
- numero di carta
- tarjeta credito
- tarjeta de credito
- tarjeta crédito
- tarjeta de crédito
- tarjeta de atm
- tarjeta atm
- tarjeta debito
- tarjeta de debito
- tarjeta débito
- tarjeta de débito
- nº de tarjeta
- no. de tarjeta
- no de tarjeta
- numero de tarjeta
- número de tarjeta
- tarjeta no
- tarjetahabiente
- cartão de crédito
- cartão de credito
- cartao de crédito
- cartao de credito
- cartão de débito
- cartao de débito
- cartão de debito
- cartao de debito
- débito automático
- debito automatico
- número do cartão
- numero do cartão
- número do cartao
- numero do cartao
- número de cartão
- numero de cartão
- número de cartao
- numero de cartao
- nº do cartão
- nº do cartao
- nº. do cartão
- no do cartão
- no do cartao
- no. do cartão
- no. do cartao
- rupay
- union pay
- unionpay
- diner's
- diners
- クレジットカード番号
- クレジットカードナンバー
- クレジットカード#
- クレジットカード
- クレジット
- クレカ
- カード番号
- カードナンバー
- カード#
- アメックス
- アメリカンエクスプレス
- アメリカン エクスプレス
- Visaカード
- Visa カード
- マスターカード
- マスター カード
- マスター
- ダイナースクラブ
- ダイナース クラブ
- ダイナース
- 有効期限
- 期限
- キャッシュカード
- キャッシュ カード
- カード名義人
- カードの名義人
- カードの名義
- デビット カード
- デビットカード
- 中国银联
- 银联