Examples of "simple" search queries in Azure AI Search

Article
2024-12-18

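In Azure AI Search, the simple query syntax invokes the default query parser for full-text search. The parser is fast and handles common scenarios, including full-text search, filtered and faceted search, and prefix search. This article uses examples to illustrate simple syntax usage in a Search Documents (REST API) request.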

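Note

An alternative query syntax is Lucene, which supports more complex query structures such as fuzzy and wildcard search. For more information, see the examples of full Lucene search syntax.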

Hotels sample index

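The following queries are based on hotels-sample-index, which you can create by following the instructions in Quickstart: Create a search index in the Azure portal.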

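The example queries are expressed using the REST API and POST requests. You can paste and run them in a REST client. Alternatively, use the JSON view of Search Explorer in the Azure portal, where you can paste the query examples shown in this article.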

Request headers must have the following values:

Key	Value
Content-Type	application/json
api-key	`<your-search-service-api-key>`, either a query key or an admin key

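The URI must include your search service endpoint with the index name, the docs collection, the search command, and the API version, similar to the following example: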

https://{{service-name}}.search.azure.cn/indexes/hotels-sample-index/docs/search?api-version=2024-07-01

The request body should be well-formed JSON:

{
    "search": "*",
    "queryType": "simple",
    "select": "HotelId, HotelName, Category, Tags, Description",
    "count": true
}

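search set to "*" is an unspecified query, equivalent to a null or empty search. It isn't especially useful, but it's the simplest search you can run, and it shows all retrievable fields in the index, with all values.
queryType set to "simple" is the default and can be omitted, but it's included here to emphasize that the query examples in this article are expressed in the simple syntax.
select set to a comma-delimited list of fields is used for search result composition, limiting the results to just the fields that are useful in context.
count returns the number of documents that match the search criteria. On an empty search string, the count is all documents in the index (50 in hotels-sample-index).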
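
As a rough illustration of how the headers, URI, and body described above fit together, the following Python sketch assembles the request with the standard library only. The service name `my-search-service` and the helper `build_search_request` are hypothetical placeholders, not part of the service or any SDK, and the sketch builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own service name and key.
SERVICE_NAME = "my-search-service"
API_KEY = "<your-search-service-api-key>"
API_VERSION = "2024-07-01"
INDEX_NAME = "hotels-sample-index"


def build_search_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) a Search Documents POST request."""
    url = (
        f"https://{SERVICE_NAME}.search.azure.cn/indexes/{INDEX_NAME}"
        f"/docs/search?api-version={API_VERSION}"
    )
    headers = {"Content-Type": "application/json", "api-key": API_KEY}
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_search_request({
    "search": "*",
    "queryType": "simple",
    "select": "HotelId, HotelName, Category, Tags, Description",
    "count": True,
})
print(request.get_method(), request.full_url)
```

With a real service name and key in place, passing the request to `urllib.request.urlopen` and parsing the response with `json.load` would be one way to run it.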

Example 1: Full-text search

Full-text search can be any number of standalone terms or quote-enclosed phrases, with or without boolean operators.

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "pool spa +airport",
    "searchMode": "any",
    "queryType": "simple",
    "select": "HotelId, HotelName, Category, Description",
    "count": true
}

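Keyword searches composed of important terms or phrases tend to work best. String fields undergo text analysis during indexing and querying, which drops nonessential words such as "the", "and", and "it". To see how a query string is tokenized in the index, pass the string in an Analyze Text call to the index.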

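The searchMode parameter controls precision and recall. If you want more recall, use the default value "any", which returns a result if any part of the query string is matched. If you favor precision, where all parts of the query string must be matched, change searchMode to "all". Try the preceding query both ways to see how searchMode changes the outcome.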

The response to the "pool spa +airport" query should look similar to the following example.

"@odata.count": 4,
"value": [
{
    "@search.score": 6.090657,
    "HotelId": "12",
    "HotelName": "Winter Panorama Resort",
    "Description": "Plenty of great skiing, outdoor ice skating, sleigh rides, tubing and snow biking. Yoga, group exercise classes and outdoor hockey are available year-round, plus numerous options for shopping as well as great spa services. Newly-renovated with large rooms, free 24-hr airport shuttle & a new restaurant. Rooms/suites offer mini-fridges & 49-inch HDTVs.",
    "Category": "Resort and Spa"
},
{
    "@search.score": 4.314683,
    "HotelId": "21",
    "HotelName": "Good Business Hotel",
    "Description": "1 Mile from the airport. Free WiFi, Outdoor Pool, Complimentary Airport Shuttle, 6 miles from Lake Lanier & 10 miles from downtown. Our business center includes printers, a copy machine, fax, and a work area.",
    "Category": "Suite"
},
{
    "@search.score": 3.575948,
    "HotelId": "27",
    "HotelName": "Starlight Suites",
    "Description": "Complimentary Airport Shuttle & WiFi. Book Now and save - Spacious All Suite Hotel, Indoor Outdoor Pool, Fitness Center, Florida Green certified, Complimentary Coffee, HDTV",
    "Category": "Suite"
},
{
    "@search.score": 2.6926985,
    "HotelId": "25",
    "HotelName": "Waterfront Scottish Inn",
    "Description": "Newly Redesigned Rooms & airport shuttle. Minutes from the airport, enjoy lakeside amenities, a resort-style pool & stylish new guestrooms with Internet TVs.",
    "Category": "Suite"
}
]

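Notice the search score in the response. This is the relevance score of the match. By default, a search service returns the top 50 matches based on this score.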

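A uniform score of "1.0" occurs when there's no ranking, either because the search wasn't a full-text search or because no criteria were provided. For example, in an empty search (search=*), rows come back in arbitrary order. When you include actual criteria, the search scores become meaningful values.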

Example 2: Look up by ID

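After returning search results, a logical next step is to provide a details page that includes more fields from the document. This example shows how to return a single document by using Get Document and passing in the document ID.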

GET /indexes/hotels-sample-index/docs/41?api-version=2024-07-01

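All documents have a unique identifier. If you're using the Azure portal, select the index from the Indexes tab, and then look at the field definitions to determine which field is the key. In the REST API, the Get Index call returns the index definition in the response body.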

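The response to the preceding query consists of the document whose key is 41. Any field marked as "retrievable" in the index definition can be returned in search results and rendered in your app.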

{
    "HotelId": "41",
    "HotelName": "Windy Ocean Motel",
    "Description": "Oceanfront hotel overlooking the beach features rooms with a private balcony and 2 indoor and outdoor pools. Inspired by the natural beauty of the island, each room includes an original painting of local scenes by the owner. Rooms include a mini fridge, Keurig coffee maker, and flatscreen TV. Various shops and art entertainment are on the boardwalk, just steps away.",
    "Description_fr": "Cet hôtel en bord de mer donnant sur la plage propose des chambres dotées d'un balcon privé et de 2 piscines intérieure et extérieure. Inspiré par la beauté naturelle de l'île, chaque chambre comprend une peinture originale de scènes locales par le propriétaire. Les chambres comprennent un mini-réfrigérateur, une cafetière Keurig et une télévision à écran plat. Divers magasins et divertissements artistiques se trouvent sur la promenade, à quelques pas.",
    "Category": "Suite",
    "Tags": [
    "pool",
    "air conditioning",
    "bar"
    ],
    "ParkingIncluded": true,
    "LastRenovationDate": "2021-05-10T00:00:00Z",
    "Rating": 3.5,
    "Location": {
    "type": "Point",
    "coordinates": [
        -157.846817,
        21.295841
    ],
    "crs": {
        "type": "name",
        "properties": {
        "name": "EPSG:4326"
        }
    }
    },
    "Address": {
    "StreetAddress": "1450 Ala Moana Blvd 2238 Ala Moana Ctr",
    "City": "Honolulu",
    "StateProvince": "HI",
    "PostalCode": "96814",
    "Country": "USA"
    }
}
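
The lookup-by-ID request in this example can be sketched in Python as follows. The helper `build_lookup_url` and the service name are hypothetical. The sketch only builds the URL, percent-encoding the key in case it contains characters that aren't safe in a path segment.

```python
from urllib.parse import quote


def build_lookup_url(service_name: str, index_name: str, key: str,
                     api_version: str = "2024-07-01") -> str:
    """Build the GET URL that retrieves one document by its key."""
    # Percent-encode the key so that characters such as '/' or spaces
    # can't alter the path structure.
    encoded_key = quote(str(key), safe="")
    return (
        f"https://{service_name}.search.azure.cn/indexes/{index_name}"
        f"/docs/{encoded_key}?api-version={api_version}"
    )


# "my-search-service" is a placeholder service name.
print(build_lookup_url("my-search-service", "hotels-sample-index", "41"))
```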

Example 3: Filter on text

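Filter syntax is an OData expression that you can use by itself or together with search. When both are used in the same request, filter is applied first to the entire index, and then search runs over the filtered results. Because filters reduce the set of documents that the search query has to process, they can be a useful technique for improving query performance.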

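Filters can be defined on any field marked as filterable in the index definition. For hotels-sample-index, filterable fields include Category, Tags, ParkingIncluded, Rating, and most Address fields.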

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "art tours",
    "queryType": "simple",
    "filter": "Category eq 'Boutique'",
    "searchFields": "HotelName,Description,Category",
    "select": "HotelId,HotelName,Description,Category",
    "count": true
}

The response to the preceding query is scoped to hotels categorized as "Boutique" that include the terms "art" or "tours". In this case, there's just one match.

"value": [
{
    "@search.score": 1.2814453,
    "HotelId": "2",
    "HotelName": "Old Century Hotel",
    "Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts. The hotel also regularly hosts events like wine tastings, beer dinners, and live music.",
    "Category": "Boutique"
}
]
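
When the filter value comes from user input, it needs to be quoted as an OData string literal, in which an embedded single quote is escaped by doubling it. The following Python sketch shows one way to build the equality filter used in this example. The helper names `odata_string` and `eq_filter` are hypothetical.

```python
import json


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def eq_filter(field: str, value: str) -> str:
    """Build a '<field> eq <literal>' filter expression."""
    return f"{field} eq {odata_string(value)}"


body = {
    "search": "art tours",
    "queryType": "simple",
    "filter": eq_filter("Category", "Boutique"),
    "searchFields": "HotelName,Description,Category",
    "select": "HotelId,HotelName,Description,Category",
    "count": True,
}
print(json.dumps(body, indent=4))
```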

Example 4: Filter functions

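Filter expressions can include the search.ismatch and search.ismatchscoring functions, which let you build a search query within the filter. This filter expression uses a wildcard on "free" to select amenities such as free wifi, free parking, and so forth.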

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
  {
    "search": "",
    "filter": "search.ismatch('free*', 'Tags', 'full', 'any')",
    "select": "HotelName, Tags, Description",
    "count": true
  }

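The response to the preceding query matches 27 hotels that offer free amenities. Notice that the search score is a uniform "1" throughout the results. This is because the search expression is null or empty, so the results are verbatim filter matches with no full-text search. Relevance scores are returned only for full-text search. If you're using filters without search, make sure you have enough sortable fields to control the ordering of results.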

  "@odata.count": 27,
  "value": [
    {
      "@search.score": 1,
      "HotelName": "Country Residence Hotel",
      "Description": "All of the suites feature full-sized kitchens stocked with cookware, separate living and sleeping areas and sofa beds. Some of the larger rooms have fireplaces and patios or balconies. Experience real country hospitality in the heart of bustling Nashville. The most vibrant music scene in the world is just outside your front door.",
      "Tags": [
        "laundry service",
        "restaurant",
        "free parking"
      ]
    },
    {
      "@search.score": 1,
      "HotelName": "Downtown Mix Hotel",
      "Description": "Mix and mingle in the heart of the city. Shop and dine, mix and mingle in the heart of downtown, where fab lake views unite with a cheeky design.",
      "Tags": [
        "air conditioning",
        "laundry service",
        "free wifi"
      ]
    },
    {
      "@search.score": 1,
      "HotelName": "Starlight Suites",
      "Description": "Complimentary Airport Shuttle & WiFi. Book Now and save - Spacious All Suite Hotel, Indoor Outdoor Pool, Fitness Center, Florida Green certified, Complimentary Coffee, HDTV",
      "Tags": [
        "pool",
        "coffee in lobby",
        "free wifi"
      ]
    },
. . .
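
The search.ismatch call in this example takes four string arguments: the query, the fields to search, the query type, and the search mode. A minimal Python sketch for composing that expression follows. The helper `ismatch_filter` is hypothetical and simply formats the arguments in the order shown in the example.

```python
import json


def odata_literal(text: str) -> str:
    """Quote a string as an OData literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def ismatch_filter(query: str, fields: str,
                   query_type: str = "full", search_mode: str = "any") -> str:
    """Format a search.ismatch(...) filter expression."""
    args = ", ".join(
        odata_literal(arg) for arg in (query, fields, query_type, search_mode)
    )
    return f"search.ismatch({args})"


body = {
    "search": "",
    "filter": ismatch_filter("free*", "Tags"),
    "select": "HotelName, Tags, Description",
    "count": True,
}
print(json.dumps(body, indent=4))
```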

Example 5: Range filters

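Range filtering is supported through filter expressions for any data type. The following examples illustrate numeric and string ranges. Data types matter in range filters: they work best when numeric data is in numeric fields and string data is in string fields. Numeric data in string fields isn't suitable for ranges because numeric strings aren't comparable.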

The following query is a numeric range. In hotels-sample-index, the only filterable numeric field is Rating.

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "*",
    "filter": "Rating ge 2 and Rating lt 4",
    "select": "HotelId, HotelName, Rating",
    "orderby": "Rating desc",
    "count": true
}

The response for this query should look similar to the following example, trimmed for brevity.

"@odata.count": 27,
"value": [
{
    "@search.score": 1,
    "HotelId": "22",
    "HotelName": "Lion's Den Inn",
    "Rating": 3.9
},
{
    "@search.score": 1,
    "HotelId": "25",
    "HotelName": "Waterfront Scottish Inn",
    "Rating": 3.8
},
{
    "@search.score": 1,
    "HotelId": "2",
    "HotelName": "Old Century Hotel",
    "Rating": 3.6
},
...

The next query is a range filter over a string field (Address/StateProvince):

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "*",
    "filter": "Address/StateProvince ge 'A*' and Address/StateProvince lt 'D*'",
    "select": "HotelId, HotelName, Address/StateProvince",
    "count": true
}

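The response for this query should look similar to the following example, trimmed for brevity. In this example, you can't sort by StateProvince because the field isn't attributed as "sortable" in the index definition.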

{
  "@odata.count": 9,
  "value": [
    {
      "@search.score": 1,
      "HotelId": "39",
      "HotelName": "White Mountain Lodge & Suites",
      "Address": {
        "StateProvince": "CO"
      }
    },
    {
      "@search.score": 1,
      "HotelId": "9",
      "HotelName": "Smile Up Hotel",
      "Address": {
        "StateProvince": "CA "
      }
    },
    {
      "@search.score": 1,
      "HotelId": "7",
      "HotelName": "Roach Motel",
      "Address": {
        "StateProvince": "CA "
      }
    },
    {
      "@search.score": 1,
      "HotelId": "34",
      "HotelName": "Lakefront Captain Inn",
      "Address": {
        "StateProvince": "CT"
      }
    },
    {
      "@search.score": 1,
      "HotelId": "37",
      "HotelName": "Campus Commander Hotel",
      "Address": {
        "StateProvince": "CA "
      }
    },
. . .
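
Both range filters in this example follow the same pattern: an inclusive lower bound (ge) combined with an exclusive upper bound (lt). As a sketch, a small hypothetical helper can format the bounds according to their Python type, leaving numbers bare and quoting strings.

```python
def odata_value(value) -> str:
    """Format a Python value as an OData literal (bool, number, or string)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def range_filter(field: str, lower, upper) -> str:
    """Build '<field> ge <lower> and <field> lt <upper>'."""
    return (
        f"{field} ge {odata_value(lower)} and "
        f"{field} lt {odata_value(upper)}"
    )


print(range_filter("Rating", 2, 4))
print(range_filter("Address/StateProvince", "A*", "D*"))
```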

Example 6: Geospatial search

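The hotels-sample index includes a Location field with latitude and longitude coordinates. This example uses the geo.distance function to filter on documents within the circumference of a starting point, out to an arbitrary distance (in kilometers) that you provide. You can adjust the last value in the query (10) to reduce or enlarge the surface area of the query.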

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "*",
    "filter": "geo.distance(Location, geography'POINT(-122.335114 47.612839)') le 10",
    "select": "HotelId, HotelName, Address/City, Address/StateProvince",
    "count": true
}

The response for this query returns all hotels within 10 kilometers of the coordinates provided:

{
  "@odata.count": 3,
  "value": [
    {
      "@search.score": 1,
      "HotelId": "45",
      "HotelName": "Happy Lake Resort & Restaurant",
      "Address": {
        "City": "Seattle",
        "StateProvince": "WA"
      }
    },
    {
      "@search.score": 1,
      "HotelId": "24",
      "HotelName": "Uptown Chic Hotel",
      "Address": {
        "City": "Seattle",
        "StateProvince": "WA"
      }
    },
    {
      "@search.score": 1,
      "HotelId": "16",
      "HotelName": "Double Sanctuary Resort",
      "Address": {
        "City": "Seattle",
        "StateProvince": "WA"
      }
    }
  ]
}
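
Note that the POINT literal in the filter lists longitude first, then latitude. The following Python sketch builds the same geo.distance filter and rejects out-of-range coordinates, which catches the most common mistake of swapping the two values. The helper `geo_distance_filter` is hypothetical.

```python
def geo_distance_filter(field: str, longitude: float, latitude: float,
                        max_km: float) -> str:
    """Build a filter matching documents within max_km of a point."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    # The POINT literal takes longitude first, then latitude.
    point = f"geography'POINT({longitude} {latitude})'"
    return f"geo.distance({field}, {point}) le {max_km}"


print(geo_distance_filter("Location", -122.335114, 47.612839, 10))
```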

Example 7: Booleans with searchMode

The simple syntax supports boolean operators in character form: + (AND), | (OR), and - (NOT). Boolean search behaves as you might expect, with a few noteworthy exceptions.

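In boolean searches, consider adding the searchMode parameter as a mechanism for influencing precision and recall. Valid values are "searchMode": "any", which favors recall (a document that satisfies any of the criteria is considered a match), and "searchMode": "all", which favors precision (all criteria must be matched in a document).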

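In the context of a boolean search, the default "searchMode": "any" can be confusing if you're stacking a query with multiple operators and getting broader instead of narrower results. This is particularly true with NOT, where the results include all documents "not containing" a specific term or phrase.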

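The following example provides an illustration. The query looks for matches on "restaurant" that exclude the phrase "air conditioning". If you run the following query with searchMode (any), 43 documents are returned: those containing the term "restaurant", plus all documents that don't contain the phrase "air conditioning".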

Notice that there's no space between the boolean operator (-) and the phrase "air conditioning". The quotation marks are escaped (\").

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "restaurant -\"air conditioning\"",
    "searchMode": "any",
    "searchFields": "Tags",
    "select": "HotelId, HotelName, Tags",
    "count": true
}

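Changing to "searchMode": "all" enforces a cumulative effect on the criteria and returns a smaller result set (7 matches), consisting of documents that contain the term "restaurant" minus those that contain the phrase "air conditioning".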

The response for this query would now look similar to the following example, trimmed for brevity.

{
  "@odata.count": 14,
  "value": [
    {
      "@search.score": 3.1383743,
      "HotelId": "18",
      "HotelName": "Ocean Water Resort & Spa",
      "Tags": [
        "view",
        "pool",
        "restaurant"
      ]
    },
    {
      "@search.score": 2.028083,
      "HotelId": "22",
      "HotelName": "Lion's Den Inn",
      "Tags": [
        "laundry service",
        "free wifi",
        "restaurant"
      ]
    },
    {
      "@search.score": 2.028083,
      "HotelId": "34",
      "HotelName": "Lakefront Captain Inn",
      "Tags": [
        "restaurant",
        "laundry service",
        "coffee in lobby"
      ]
    },
...
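
If you build the request body in code, the JSON serializer takes care of escaping the quotation marks around the phrase. The following Python sketch composes the query from this example for both searchMode values so that the two result counts can be compared. The helper names are hypothetical, and the sketch assumes the excluded phrase itself contains no quotation marks.

```python
import json


def exclude_phrase(term: str, excluded_phrase: str) -> str:
    """Combine a term with a negated phrase: no space after the '-'."""
    return f'{term} -"{excluded_phrase}"'


def build_boolean_body(search_mode: str) -> dict:
    """Build the request body for the given searchMode ('any' or 'all')."""
    return {
        "search": exclude_phrase("restaurant", "air conditioning"),
        "searchMode": search_mode,
        "searchFields": "Tags",
        "select": "HotelId, HotelName, Tags",
        "count": True,
    }


for mode in ("any", "all"):
    # json.dumps escapes the inner quotation marks as \" automatically.
    print(json.dumps(build_boolean_body(mode)))
```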

Example 8: Paging results

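Previous examples showed parameters that affect search result composition, including select, which determines which fields are in a result, sort orders, and how to include a count of all matches. This example continues with result composition in the form of paging parameters, which let you determine the number of results in any given page.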

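By default, a search service returns the top 50 matches. To control the number of matches in each page, use top to define the size of the batch, and then use skip to pick up subsequent batches.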

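The following example uses a filter and sort order on the Rating field (Rating is both filterable and sortable) because it's easier to see the effects of paging on sorted results. In a regular full-text search query, the top matches are ranked and paged by @search.score.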

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "*",
    "filter": "Rating gt 4",
    "select": "HotelName, Rating",
    "orderby": "Rating desc",
    "top": 5,
    "count": true
}

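The query finds 21 matching documents, but because top is specified, the response returns just the top five matches, with ratings starting at 4.9 and ending at 4.7 with "Lakeside B&B".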

To get the next five documents, skip the first batch:

POST /indexes/hotels-sample-index/docs/search?api-version=2024-07-01
{
    "search": "*",
    "filter": "Rating gt 4",
    "select": "HotelName, Rating",
    "orderby": "Rating desc",
    "top": 5,
    "skip": 5,
    "count": true
}

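The response for the second batch skips the first five matches and returns the next five, starting with "Pull'r Inn Motel". To continue with more batches, keep top at 5 and increment skip by 5 on each new request (skip=5, skip=10, skip=15, and so forth).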

{
  "@odata.count": 21,
  "value": [
    {
      "@search.score": 1,
      "HotelName": "Head Wind Resort",
      "Rating": 4.7
    },
    {
      "@search.score": 1,
      "HotelName": "Sublime Palace Hotel",
      "Rating": 4.6
    },
    {
      "@search.score": 1,
      "HotelName": "City Skyline Antiquity Hotel",
      "Rating": 4.5
    },
    {
      "@search.score": 1,
      "HotelName": "Nordick's Valley Motel",
      "Rating": 4.5
    },
    {
      "@search.score": 1,
      "HotelName": "Winter Panorama Resort",
      "Rating": 4.5
    }
  ]
}
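
The top and skip pattern described in this example can be sketched as a generator that yields one request body per page. The helper `page_bodies` is hypothetical. The total of 21 is the @odata.count from this example, which in practice you would read from the first response.

```python
from typing import Iterator


def page_bodies(base_body: dict, total: int, page_size: int) -> Iterator[dict]:
    """Yield one request body per page, advancing skip by page_size."""
    for skip in range(0, total, page_size):
        yield {**base_body, "top": page_size, "skip": skip}


base = {
    "search": "*",
    "filter": "Rating gt 4",
    "select": "HotelName, Rating",
    "orderby": "Rating desc",
    "count": True,
}

pages = list(page_bodies(base, total=21, page_size=5))
print([page["skip"] for page in pages])  # [0, 5, 10, 15, 20]
```

The last page (skip=20) holds only the single remaining match, because 21 isn't a multiple of the page size.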

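Now that you've had some practice with the basic query syntax, try specifying queries in code. The following link covers how to set up search queries using the Azure SDKs.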

Quickstart: Full-text search using the Azure SDKs

You can find more syntax references, query architecture, and examples at the following links:

通过