top-nested operator

Applies to: ✅ Azure Data ExplorerAzure MonitorMicrosoft Sentinel

The top-nested operator performs hierarchical aggregation and value selection.

Imagine you have a table with sales information like regions, salespeople, and amounts sold. The top-nested operator can help you answer complex questions, such as "What are the top five regions by sales, and who are the top three salespeople in each of those regions?"

The source data is partitioned based on the criteria set in the first top-nested clause, such as region. Next, the operator picks the top records in each partition using an aggregation, such as adding sales amounts. Each subsequent top-nested clause refines the partitions created by the previous clause, creating a hierarchy of more precise groups.

The result is a table with two columns per clause. One column holds the partitioning values, such as region, while the other column holds the outcomes of the aggregation calculation, like the sum of sales.

Syntax

T | top-nested [ N ] of Expr [with others = ConstExpr] by Aggregation [asc | desc] [,
  top-nested ... ]

Learn more about syntax conventions.

Parameters

Name Type Required Description
T string ✔️ The input tabular expression.
N int The number of top values to be returned for this hierarchy level. If omitted, all distinct values are returned.
Expr string ✔️ An expression over the input record indicating which value to return for this hierarchy level. Typically, it refers to a column from T or involves a calculation like bin() on a column. Optionally, set an output column name as Name = Expr.
ConstExpr string If specified, for each hierarchy level, one record is added with the value that is the aggregation over all records that didn't make it to the top.
Aggregation string The aggregation function applied to records with the same Expr value. The result determines the top records. See Supported aggregation functions. Optionally, set an output column name as Name = Aggregation.

Supported aggregation functions

The following aggregation functions are supported:

Note

Any algebraic combination of the aggregations is also supported.

Returns

A table with two columns for each clause. One column contains unique values computed using Expr, and the other column shows the results obtained from the Aggregation calculation.

Using the with others clause

Using the top-nested operator with with others adds the ability to see your top content contextualized in a wider data set. Evaluating your data in this way is valuable when rendering the data visually.

Include data from other columns

Only columns specified as a top-nested clause Expr are displayed in the output table.

To include all values of a column at a specific level:

  1. Don't specify the value of N.
  2. Use the column name as the value of Expr.
  3. Use Ignore=max(1) as the value of Aggregation.
  4. Remove the unnecessary Ignore column with project-away.

For an example, see Most recent events per state with other column data.

Performance considerations

The number of records can grow exponentially with the number of top-nested clauses, and record growth is even faster if the N parameter is not specified. This operator can consume a considerable amount of resources.

If the aggregation distribution is irregular, limit the number of distinct values to return by specifying N. Then, use the with others = ConstExpr clause to get a sense of the weight of all other cases.

Examples

Top damaged states, event types, and end locations by property damage

The following query partitions the StormEvents table by the State column and calculates the total property damage for each state. The query selects the top two states with the largest amount of property damage. Within these top two states, the query groups the data by EventType and selects the top three event types with the most damage. Then the query groups the data by EndLocation and selects the EndLocation with the highest damage. Only one EndLocation value appears in the results, possibly due to the large nature of the storm events or not documenting the end location.

StormEvents  // Data source.
| top-nested 2 of State by sum(DamageProperty),       // Top 2 States by total damaged property.
  top-nested 3 of EventType by sum(DamageProperty),   // Top 3 EventType by total damaged property for each State.
  top-nested 1 of EndLocation by sum(DamageProperty)  // Top 1 EndLocation by total damaged property for each EventType and State.
| project State, EventType, EndLocation, StateTotalDamage = aggregated_State, EventTypeTotalDamage = aggregated_EventType, EndLocationDamage = aggregated_EndLocation

Output

State EventType EndLocation StateTotalDamage EventTypeTotalDamage EndLocationDamage
CALIFORNIA Wildfire 1445937600 1326315000 1326315000
CALIFORNIA HighWind 1445937600 61320000 61320000
CALIFORNIA DebrisFlow 1445937600 48000000 48000000
OKLAHOMA IceStorm 915470300 826000000 826000000
OKLAHOMA WinterStorm 915470300 40027000 40027000
OKLAHOMA Flood COMMERCE 915470300 21485000 20000000

Top five states with property damage with others grouped

The following example uses the top-nested operator to identify the top five states with the most property damage and uses the with others clause to group damaged property for all other states. It then visualizes damaged property for the top five states and all other states as a piechart using the render command.

StormEvents
| top-nested 5 of State with others="OtherStates" by sum(DamageProperty)
| render piechart  

Output

Screenshot of the top five states with the most property damaged, and all other states grouped separately rendered as a pie-chart.

Most recent events per state with other column data

The following query retrieves the two most recent events for each US state with relevant event details. It uses max(1) within certain columns to propagate data without using the top-nested selection logic. The generated Ignore aggregation columns are removed using project-away.

StormEvents
| top-nested of State by Ignore0=max(1),                  // Partition the data by each unique value of state.
  top-nested 2 of StartTime by Ignore1=max(StartTime),    // Get the 2 most recent events in each state.
  top-nested of EndTime by Ignore2=max(1),                // Append the EndTime for each event.
  top-nested of EpisodeId by Ignore3=max(1)               // Append the EpisodeId for each event.
| project-away Ignore*                                    // Remove the unnecessary aggregation columns.
| order by State asc, StartTime desc                      // Sort results alphabetically and chronologically.

Latest records per identity with other column data

The following top-nested example extracts the latest records per identity and builds on the concepts introduced in the previous example. The first top-nested clause partitions the data by distinct values of id using Ignore0=max(1) as a placeholder. For each id, it identifies the two most recent records based on the timestamp. Other information is appended using a top-nested operator without specifying a count and using Ignore2=max(1) as a placeholder. Finally, unnecessary aggregation columns are removed using the project-away operator.

datatable(id: string, timestamp: datetime, otherInformation: string) // Create a source datatable.
[
    "Barak", datetime(2015-01-01), "1",
    "Barak", datetime(2016-01-01), "2",
    "Barak", datetime(2017-01-20), "3",
    "Donald", datetime(2017-01-20), "4",
    "Donald", datetime(2017-01-18), "5",
    "Donald", datetime(2017-01-19), "6"
]
| top-nested of id by Ignore0=max(1),                     // Partition the data by each unique value of id.
  top-nested 2 of timestamp by Ignore1=max(timestamp),    // Get the 2 most recent events for each state.
  top-nested of otherInformation by Ignore2=max(1)        // Append otherInformation for each event.
| project-away Ignore0, Ignore1, Ignore2                  // Remove the unnecessary aggregation columns.

Output

id timestamp otherInformation
Barak 2016-01-01T00:00:00Z 2
Donald 2017-01-19T00:00:00Z 6
Barak 2017-01-20T00:00:00Z 3
Donald 2017-01-20T00:00:00Z 4