hll_merge() (aggregation function)
Applies to: ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel
Merges HLL results across the group into a single HLL value.
Note
You can't merge hll values that were created using different accuracy values. For more information, see hll().
Note
This function is used in conjunction with the summarize operator.
For more information, see the underlying algorithm (HyperLogLog) and estimation accuracy.
Important
The results of hll(), hll_if(), and hll_merge() can be stored and later retrieved. For example, you may want to create a daily unique users summary, which can then be used to calculate weekly counts. However, the precise binary representation of these results may change over time. There's no guarantee that these functions will produce identical results for identical inputs, and therefore we don't advise relying on them.
Syntax
hll_merge
(
hll)
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
hll | string |
✔️ | The column name containing HLL values to merge. |
Returns
The function returns the merged HLL values of hll across the group.
Tip
Use the dcount_hll function to calculate the dcount
from hll() and hll_merge() aggregation functions.
Example
The following example shows HLL results across a group merged into a single HLL value.
StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
Output
The results show only the first five results in the array.
hllMerged |
---|
[[1024,14],["-6903255281122589438","-7413697181929588220","-2396604341988936699","5824198135224880646","-6257421034880415225", ...],[]] |
Estimation accuracy
This function uses a variant of the HyperLogLog (HLL) algorithm, which does a stochastic estimation of set cardinality. The algorithm provides a "knob" that can be used to balance accuracy and execution time per memory size:
Accuracy | Error (%) | Entry count |
---|---|---|
0 | 1.6 | 212 |
1 | 0.8 | 214 |
2 | 0.4 | 216 |
3 | 0.28 | 217 |
4 | 0.2 | 218 |
Note
The "entry count" column is the number of 1-byte counters in the HLL implementation.
The algorithm includes some provisions for doing a perfect count (zero error), if the set cardinality is small enough:
- When the accuracy level is
1
, 1000 values are returned - When the accuracy level is
2
, 8000 values are returned
The error bound is probabilistic, not a theoretical bound. The value is the standard deviation of error distribution (the sigma), and 99.7% of the estimations will have a relative error of under 3 x sigma.
The following image shows the probability distribution function of the relative estimation error, in percentages, for all supported accuracy settings: