scan operator
Applies to: ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel
Scans data, matches, and builds sequences based on the predicates.
Matching records are determined according to predicates defined in the operator’s steps. A predicate can depend on the state that is generated by previous steps. The output for the matching record is determined by the input record and assignments defined in the operator's steps.
Syntax
T | scan
[ with_match_id
=
MatchIdColumnName ] [ declare
(
ColumnDeclarations )
] with
(
StepDefinitions )
ColumnDeclarations syntax
ColumnName :
ColumnType[=
DefaultValue ] [,
... ]
StepDefinition syntax
step
StepName [ output
= all
| last
| none
] :
Condition [ =>
Column =
Assignment [,
... ] ] ;
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string |
✔️ | The input tabular source. |
MatchIdColumnName | string |
The name of a column of type long that is appended to the output as part of the scan execution. Indicates the 0-based index of the match for the record. |
|
ColumnDeclarations | string |
Declares an extension to the schema of T. These columns are assigned values in the steps. If not assigned, the DefaultValue is returned. Unless otherwise specified, DefaultValue is null . |
|
StepName | string |
✔️ | Used to reference values in the state of scan for conditions and assignments. The step name must be unique. |
Condition | string |
✔️ | An expression that evaluates to true or false that defines which records from the input match the step. A record matches the step when the condition is true with the step’s state or with the previous step’s state. |
Assignment | string |
A scalar expression that is assigned to the corresponding column when a record matches a step. | |
output |
string |
Controls the output logic of the step on repeated matches. all outputs all records matching the step, last outputs only the last record in a series of repeating matches for the step, and none doesn't output records matching the step. The default is all . |
Returns
A record for each match of a record from the input to a step. The schema of the output is the schema of the source extended with the column in the declare
clause.
Scan logic
scan
goes over the serialized input data, record by record, comparing each record against each step’s condition while taking into account the current state of each step.
State
The underlying state of the scan
operator can be thought of as a table with a row for each step
. Each step maintains its own state with the latest values of the columns and declared variables from all of the previous steps and the current step. If relevant, it also holds the match ID for the ongoing sequence.
If a scan operator has n steps named s_1, s_2, ..., s_n then step s_k would have k records in its state corresponding to s_1, s_2, ..., s_k. The StepName.ColumnName format is used to reference a value in the state. For instance, s_2.col1
would reference column col1
that belongs to step s_2 in the state of s_k. For a detailed example, see the scan logic walkthrough.
The state starts empty and updates whenever a scanned input record matches a step. When the state of the current step is nonempty, the step is referred to as having an active sequence.
Matching logic
Each input record is evaluated against all of the steps in reverse order, from the last step to the first. When a record r is evaluated against some step s_k, the following logic is applied:
Check 1: If the state of the previous step (s_k-1) is nonempty, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:
- The state of s_k is cleared.
- The state of s_k-1 is promoted to become the state of s_k.
- The assignments of s_k are calculated and extend r.
- The extended r is added to the output and to the state of s_k.
Note
If Check 1 results in a match, Check 2 is disregarded, and r moves on to be evaluated against s_k-1.
Check 2: If the state of s_k has an active sequence or s_k is the first step, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:
- The assignments of s_k are calculated and extend r.
- The values that represent s_k in the state of s_k are replaced with the values of the extended r.
- If s_k is defined as
output=all
, the extended r is added to the output. - If s_k is the first step, a new sequence begins and the match ID increases by
1
. This only affects the output whenwith_match_id
is used.
Once the checks for s_k are complete, r moves on to be evaluated against s_k-1.
For a detailed example of this logic, see the scan logic walkthrough.
Examples
Cumulative sum
Calculate the cumulative sum for an input column. The result of this example is equivalent to using row_cumsum().
range x from 1 to 5 step 1
| scan declare (cumulative_x:long=0) with
(
step s1: true => cumulative_x = x + s1.cumulative_x;
)
Output
x | cumulative_x |
---|---|
1 | 1 |
2 | 3 |
3 | 6 |
4 | 10 |
5 | 15 |
Cumulative sum on multiple columns with a reset condition
Calculate the cumulative sum for two input columns, reset the sum value to the current record value whenever the cumulative sum reached 10 or more.
range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with
(
step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x),
cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)
Output
x | y | cumulative_x | cumulative_y |
---|---|---|---|
1 | 2 | 1 | 2 |
2 | 4 | 3 | 6 |
3 | 6 | 6 | 12 |
4 | 8 | 10 | 8 |
5 | 10 | 5 | 18 |
Fill forward a column
Fill forward a string column. Each empty value is assigned the last seen nonempty value.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "",
2m, "B",
3m, "",
4m, "",
6m, "C",
8m, "",
11m, "D",
12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with
(
step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)
Output
Ts | Event | Event_filled |
---|---|---|
00:00:00 | A | A |
00:01:00 | A | |
00:02:00 | B | B |
00:03:00 | B | |
00:04:00 | B | |
00:06:00 | C | C |
00:08:00 | C | |
00:11:00 | D | D |
00:12:00 | D |
Sessions tagging
Divide the input into sessions: a session ends 30 minutes after the first event of the session, after which a new session starts. Note the use of with_match_id
flag, which assigns a unique value for each distinct match (session) of scan. Also note the special use of two steps in this example, inSession
has true
as condition so it captures and outputs all the records from the input while endSession
captures records that happen more than 30m from the sessionStart
value for the current match. The endSession
step has output=none
meaning it doesn't produce output records. The endSession
step is used to advance the state of the current match from inSession
to endSession
, allowing a new match (session) to begin, starting from the current record.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "A",
2m, "B",
3m, "D",
32m, "B",
36m, "C",
38m, "D",
41m, "E",
75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with
(
step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
step endSession output=none: Ts - inSession.sessionStart > 30m;
)
Output
Ts | Event | sessionStart | session_id |
---|---|---|---|
00:00:00 | A | 00:00:00 | 0 |
00:01:00 | A | 00:00:00 | 0 |
00:02:00 | B | 00:00:00 | 0 |
00:03:00 | D | 00:00:00 | 0 |
00:32:00 | B | 00:32:00 | 1 |
00:36:00 | C | 00:32:00 | 1 |
00:38:00 | D | 00:32:00 | 1 |
00:41:00 | E | 00:32:00 | 1 |
01:15:00 | A | 01:15:00 | 2 |
Events between Start and Stop
Find all sequences of events between the event Start
and the event Stop
that occur within 5 minutes. Assign a match ID for each sequence.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "Start",
2m, "B",
3m, "D",
4m, "Stop",
6m, "C",
8m, "Start",
11m, "E",
12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with
(
step s1: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)
Output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
Calculate a custom funnel of events
Calculate a funnel completion of the sequence Hail
-> Tornado
-> Thunderstorm Wind
by State
with custom thresholds on the times between the events (Tornado
within 1h
and Thunderstorm Wind
within 2h
). This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.
StormEvents
| partition hint.strategy=native by State
(
sort by StartTime asc
| scan with
(
step hail: EventType == "Hail";
step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
)
)
| summarize dcount(State) by EventType
Output
EventType | dcount_State |
---|---|
Hail | 50 |
Tornado | 34 |
Thunderstorm Wind | 32 |
Scan logic walkthrough
This section demonstrates the scan logic using a step-by-step walkthrough of the Events between start and stop example:
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "Start",
2m, "B",
3m, "D",
4m, "Stop",
6m, "C",
8m, "Start",
11m, "E",
12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with
(
step s1: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)
The state
Think of the state of the scan
operator as a table with a row for each step, in which each step has its own state. This state contains the latest values of the columns and declared variables from all of the previous steps and the current step. To learn more, see State.
For this example, the state can be represented with the following table:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 |
The "X" indicates that a specific field is irrelevant for that step.
The matching logic
This section follows the matching logic through each record of the Events
table, explaining the transformation of the state and output at each step.
Note
An input record is evaluated against the steps in reverse order, from the last step (s3
) to the first step (s1
).
Record 1
Ts | Event |
---|---|
0m | "A" |
Record evaluation at each step:
s3
: Check 1 isn't passed because the state ofs2
is empty, and Check 2 isn't passed becauses3
lacks an active sequence.s2
: Check 1 isn't passed because the state ofs1
is empty, and Check 2 isn't passed becauses2
lacks an active sequence.s1
: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition ofEvent == "Start"
. Record 1 is discarded without affecting the state or output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 |
Record 2
Ts | Event |
---|---|
1m | "Start" |
Record evaluation at each step:
s3
: Check 1 isn't passed because the state ofs2
is empty, and Check 2 isn't passed becauses3
lacks an active sequence.s2
: Check 1 isn't passed because the state ofs1
is empty, and Check 2 isn't passed becauses2
lacks an active sequence.s1
: Check 1 is irrelevant because there's no previous step. Check 2 is passed because the record meets the condition ofEvent == "Start"
. This match initiates a new sequence, and them_id
is assigned. Record 2 and itsm_id
(0
) are added to the state and the output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | 0 | 00:01:00 | "Start" | X | X | X | X |
s2 | X | X | |||||
s3 |
Record 3
Ts | Event |
---|---|
2m | "B" |
Record evaluation at each step:
s3
: Check 1 isn't passed because the state ofs2
is empty, and Check 2 isn't passed becauses3
lacks an active sequence.s2
: Check 1 is passed because the state ofs1
is nonempty and the record meets the condition ofTs - s1.Ts < 5m
. This match causes the state ofs1
to be cleared and the sequence ins1
to be promoted tos2
. Record 3 and itsm_id
(0
) are added to the state and the output.s1
: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition ofEvent == "Start"
.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | 0 | 00:01:00 | "Start" | 00:02:00 | "B" | X | X |
s3 |
Record 4
Ts | Event |
---|---|
3m | "D" |
Record evaluation at each step:
s3
: Check 1 isn't passed because the record doesn't meet the condition ofEvent == "Stop"
, and Check 2 isn't passed becauses3
lacks an active sequence.s2
: Check 1 isn't passed because the state ofs1
is empty. it passes Check 2 because it meets the condition ofTs - s1.Ts < 5m
. Record 4 and itsm_id
(0
) are added to the state and the output. The values from this record overwrite the previous state values fors2.Ts
ands2.Event
.s1
: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition ofEvent == "Start"
.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | 0 | 00:01:00 | "Start" | 00:03:00 | "D" | X | X |
s3 |
Record 5
Ts | Event |
---|---|
4m | "Stop" |
Record evaluation at each step:
s3
: Check 1 is passed becauses2
is nonempty and it meets thes3
condition ofEvent == "Stop"
. This match causes the state ofs2
to be cleared and the sequence ins2
to be promoted tos3
. Record 5 and itsm_id
(0
) are added to the state and the output.s2
: Check 1 isn't passed because the state ofs1
is empty, and Check 2 isn't passed becauses2
lacks an active sequence.s1
: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition ofEvent == "Start"
.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 | 0 | 00:01:00 | "Start" | 00:03:00 | "D" | 00:04:00 | "Stop" |
Record 6
Ts | Event |
---|---|
6m | "C" |
Record evaluation at each step:
s3
: Check 1 isn't passed because the state ofs2
is empty, and Check 2 isn't passed becauses3
doesn't meet thes3
condition ofEvent == "Stop"
.s2
: Check 1 isn't passed because the state ofs1
is empty, and Check 2 isn't passed becauses2
lacks an active sequence.s1
: Check 1 isn't passed because there's no previous step, and Check 2 isn't passed because it doesn't meet the condition ofEvent == "Start"
. Record 6 is discarded without affecting the state or output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 | 0 | 00:01:00 | "Start" | 00:03:00 | "D" | 00:04:00 | "Stop" |
Record 7
Ts | Event |
---|---|
8m | "Start" |
Record evaluation at each step:
s3
: Check 1 isn't passed because the state ofs2
is empty, and Check 2 isn't passed because it doesn't meet the condition ofEvent == "Stop"
.s2
: Check 1 isn't passed because the state ofs1
is empty, and Check 2 isn't passed becauses2
lacks an active sequence.s1
: Check 1 isn't passed because there's no previous step. it passes Check 2 because it meets the condition ofEvent == "Start"
. This match initiates a new sequence ins1
with a newm_id
. Record 7 and itsm_id
(1
) are added to the state and the output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | 1 | 00:08:00 | "Start" | X | X | X | X |
s2 | X | X | |||||
s3 | 0 | 00:01:00 | "Start" | 00:03:00 | "D" | 00:04:00 | "Stop" |
Note
There are now two active sequences in the state.
Record 8
Ts | Event |
---|---|
11m | "E" |
Record evaluation at each step:
s3
: Check 1 isn't passed because the state ofs2
is empty, and Check 2 isn't passed because it doesn't meet thes3
condition ofEvent == "Stop"
.s2
: Check 1 is passed because the state ofs1
is nonempty and the record meets the condition ofTs - s1.Ts < 5m
. This match causes the state ofs1
to be cleared and the sequence ins1
to be promoted tos2
. Record 8 and itsm_id
(1
) are added to the state and the output.s1
: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition ofEvent == "Start"
.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | 1 | 00:08:00 | "Start" | 00:11:00 | "E" | X | X |
s3 | 0 | 00:01:00 | "Start" | 00:03:00 | "D" | 00:04:00 | "Stop" |
Record 9
Ts | Event |
---|---|
12m | "Stop" |
Record evaluation at each step:
s3
: Check 1 is passed becauses2
is nonempty and it meets thes3
condition ofEvent == "Stop"
. This match causes the state ofs2
to be cleared and the sequence ins2
to be promoted tos3
. Record 9 and itsm_id
(1
) are added to the state and the output.s2
: Check 1 isn't passed because the state ofs1
is empty, and Check 2 isn't passed becauses2
lacks an active sequence.s1
: Check 1 isn't passed because there's no previous step. it passes Check 2 because it meets the condition ofEvent == "Start"
. This match initiates a new sequence ins1
with a newm_id
.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | X | X | X | X | |||
s2 | X | X | |||||
s3 | 1 | 00:08:00 | "Start" | 00:11:00 | "E" | 00:12:00 | "Stop" |
Final output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |