reduce operator

Article
10/03/2024

Applies to: ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel

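Groups a set of strings together based on value similarity.

For each such group, the operator returns a pattern, a count, and a representative. The pattern best describes the group, with the * character acting as a wildcard. The count is the number of values in the group, and the representative is one of the original values in the group.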

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Learn more about syntax conventions.

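Parameters

Name	Type	Required	Description
Expr	`string`	✔️	The value by which to reduce.
Threshold	`real`		A value between 0 and 1 that determines the minimum fraction of rows that must match the grouping criteria to trigger a reduction operation. The default value is 0.1. A small threshold is recommended for large inputs. With a smaller threshold, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold requires less similarity, resulting in more groups that are less similar. See Examples.
Characters	`string`		A list of characters that separate terms. The default is every non-ASCII numeric character. For examples, see Behavior of Characters parameter.
ReduceKind	`string`		The only valid value is `source`. If `source` is specified, the operator appends the `Pattern` column to the existing rows in the table instead of aggregating by `Pattern`.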
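To illustrate the ReduceKind parameter, the following sketch reuses the synthetic MyText column from the examples in this article and adds `kind=source`. Per the description above, each input row should be kept and gain a `Pattern` column rather than being aggregated. The row count of 10 is an arbitrary choice for illustration, and because the query uses rand(), the values differ from run to run.

```kusto
// Generate 10 synthetic strings such as "MachineLearningX4".
range x from 1 to 10 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
// kind=source keeps the input rows and appends the Pattern column to each one.
| reduce kind=source by MyText with threshold=0.001, characters="X"
```

This form can be useful for checking which pattern a specific row was assigned to before switching to the aggregated output.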

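Returns

A table with one row per group and three columns: Pattern, Count, and Representative. The Pattern best describes the group, with the * character acting as a wildcard, or placeholder for an arbitrary insertion string. The Count is the number of values in the group, and the Representative is one of the original values in the group.

For example, the result of reduce by city might include:

Pattern	Count	Representative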
San *	5182	San Bernard
Saint *	2846	Saint Lucy
Moscow	3726	Moscow
* -on- *	2730	One -on- One
Paris	2716	Paris

Examples

Small threshold

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.001, characters="X"

Output

Pattern	Count	Representative
MachineLearning*	1000	MachineLearningX4

Large threshold

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.9, characters="X"

Output

Pattern	Count	Representative
MachineLearning*	177	MachineLearningX9
MachineLearning*	102	MachineLearningX0
MachineLearning*	106	MachineLearningX1
MachineLearning*	96	MachineLearningX6
MachineLearning*	110	MachineLearningX4
MachineLearning*	100	MachineLearningX3
MachineLearning*	99	MachineLearningX8
MachineLearning*	104	MachineLearningX7
MachineLearning*	106	MachineLearningX2
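To compare thresholds without reading each output table row by row, one option is to summarize the reduce output into a single row. The sketch below assumes the output columns are named Pattern, Count, and Representative, as in the tables in this article. Because the query uses rand(), the exact figures vary from run to run.

```kusto
range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.9, characters="X"
// Collapse the reduce output: how many groups were produced, and how many values they cover.
| summarize Groups = count(), TotalValues = sum(Count)
```

Changing the threshold to 0.001 and rerunning the query gives the corresponding figures for the small-threshold case.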

Behavior of Characters parameter

If the Characters parameter is unspecified, then every non-ASCII numeric character becomes a term separator.

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

Pattern	Count	Representative
others	10

However, if you specify "Z" as a separator, each value in str is treated as two terms, foo and tostring(x):

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

Output

Pattern	Count	Representative
foo*	10	fooZ1

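Apply `reduce` to sanitized input

The following example shows how to apply the reduce operator to "sanitized" input, in which GUIDs in the column being reduced are replaced before the reduction.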

// Start with a few records from the Trace table.
Trace | take 10000
// We will reduce the Text column which includes random GUIDs.
// As random GUIDs interfere with the reduce operation, replace them all
// by the string "GUID".
| extend Text=replace_regex(Text, @"[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}", @"GUID")
// Now perform the reduce. In case there are other "quasi-random" identifiers with embedded '-'
// or '_' characters in them, treat these as non-term-breakers.
| reduce by Text with characters="-_"
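The sanitizing step can also be tried on its own, without access to a Trace table. The following sketch uses a small inline datatable with made-up log lines (the row text and GUID values are hypothetical) and applies the same replace_regex call as above. After the replacement, the three rows should collapse to the same text, which is what lets reduce treat them as similar.

```kusto
// Hypothetical sample rows; only the embedded GUID differs between them.
datatable(Text:string)
[
    "Request 0f8fad5b-d9cb-469f-a165-70867728950e completed",
    "Request 7c9e6679-7425-40de-944b-e07fc1f90ae7 completed",
    "Request 3f2504e0-4f89-11d3-9a0c-0305e82c3301 completed"
]
// Replace every GUID with the literal string "GUID", as in the example above.
| extend Text = replace_regex(Text, @"[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}", @"GUID")
// List the distinct sanitized values to confirm the GUIDs no longer differentiate the rows.
| distinct Text
```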

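Related content

autocluster

Note

The implementation of the reduce operator is largely based on the paper A Data Clustering Algorithm for Mining Patterns From Event Logs, by Risto Vaarandi.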
