sample operator
Applies to: ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel
Returns up to the specified number of random rows from the input table.
Note
sample
is geared for speed rather than even distribution of values. Specifically, it means that it will not produce 'fair' results if used after operators that union 2 datasets of different sizes (such as aunion
orjoin
operators). It's recommended to usesample
right after the table reference and filters.sample
is a non-deterministic operator, and will return different result set each time it is evaluated during the query. For example, the following query yields two different rows (even if one would expect to return the same row twice).
Syntax
T | sample
NumberOfRows
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string |
✔️ | The input tabular expression. |
NumberOfRows | int, long, or real | ✔️ | The number of rows to return. You can specify any numeric expression. |
Examples
let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)
Output
x |
---|
83 |
3 |
To ensure that in example above _sample
is calculated once, one can use materialize() function:
let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)
Output
x |
---|
34 |
34 |
To sample a certain percentage of your data (rather than a specified number of rows), you can use
StormEvents | where rand() < 0.1
To sample keys rather than rows (for example - sample 10 Ids and get all rows for these Ids) you can use sample-distinct
in combination with the in
operator.
let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)