Materialized views
Applies to: ✅ Azure Data Explorer
Materialized views expose an aggregation query over a source table, or over another materialized view.
Materialized views always return an up-to-date result of the aggregation query (always fresh). Querying a materialized view is more performant than running the aggregation directly over the source table.
Note
- To decide whether materialized views are suitable for you, review the materialized views use cases.
- Materialized views have some limitations. Before working with the feature, review the performance considerations.
- Consider using update policies where appropriate. For more information, see Materialized views vs. update policies.
- Monitor the health of your materialized views based on the recommendations in Monitor materialized views.
Why use materialized views?
By investing resources (data storage, background CPU cycles) for materialized views of commonly used aggregations, you get the following benefits:
Performance improvement: Querying a materialized view commonly performs better than querying the source table for the same aggregation function(s).
Freshness: A materialized view query always returns the most up-to-date results, independent of when materialization last took place. The query combines the materialized part of the view with the records in the source table, which haven't yet been materialized (the
delta
part), always providing the most up-to-date results.Cost reduction: Querying a materialized view consumes less resources than doing the aggregation over the source table. Retention policy of source table can be reduced if only aggregation is required. This setup reduces hot cache costs for the source table.
For example use cases, see Materialized view use cases.
How materialized views work
A materialized view is made of two components:
- A materialized part - a table holding aggregated records from the source table, which have already been processed. This table always holds a single record per the aggregation's group-by combination.
- A delta - the newly ingested records in the source table that haven't yet been processed.
Querying the materialized view combines the materialized part with the delta part, providing an up-to-date result of the aggregation query. The offline materialization process ingests new records from the delta to the materialized table, and updates existing records. If the intersection between the delta and the materialized part is large, and many records require updates, this might have a negative impact on the materialization process. See monitor materialized views on how to troubleshoot such situations.
Materialized views queries
There are 2 ways to query a materialized view:
Query the entire view: when you query the materialized view by its name, similarly to querying a table, the materialized view query combines the materialized part of the view with the records in the source table that haven't been materialized yet (the
delta
).- Querying the materialized view always returns the most up-to-date results, based on all records ingested to the source table. For more information about the materialized vs. non-materialized parts in materialized view, see how materialized views work.
- This option might not perform best as it needs to materialize the
delta
part during query time. Performance in this case depends on the view's age and the filters applied in the query. The materialized view query optimizer section includes possible ways to improve query performance when querying the entire view.
Query the materialized part only: another way of querying the view is by using the
materialized_view()
function. This option supports querying only the materialized part of the view, while specifying the max latency the user is willing to tolerate.- This option isn't guaranteed to return the most up-to-date records, but it should always be more performant than querying the entire view.
- This function is useful for scenarios in which you're willing to sacrifice some freshness for performance, for example for telemetry dashboards.
Tip
Queries over the materialized part only always perform better than querying the entire view. Always use the materialized_view()
function when applicable for your use case.
Materialized views participate in cross-cluster or cross-database queries, but aren't included in wildcard unions or searches.
- The following examples all include materialized views by the name
ViewName
:
cluster('cluster1').database('db').ViewName cluster('cluster1').database('*').ViewName database('*').ViewName database('DB*').ViewName database('*').materialized_view('ViewName') database('DB*').materialized_view('ViewName')
- The following examples do not include records from materialized views:
cluster('cluster1').database('db').* database('*').View* search in (*) search *
- The following examples all include materialized views by the name
Materialized view query optimizer
When querying the entire view, the materialized part is combined with the delta
during query time. This includes aggregating the delta
and joining it with the materialized part.
- Querying the entire view performs better if the query includes filters on the group by keys of the materialized view query. See more tips about how to create your materialized view, based on your query pattern, in the
.create materialized-view
performance tips section. - The query optimizer chooses summarize/join strategies that are expected to improve query performance. For example, the decision on whether to shuffle the query is based on number of records in
delta
part. The following client request properties provide some control over the optimizations applied. You can test these properties with your materialized view queries and evaluate their impact on queries performance.
Client request property name | Type | Description |
---|---|---|
materialized_view_query_optimization_costbased_enabled |
bool |
If set to false , disables summarize/join optimizations in materialized view queries. Uses default strategies. Default is true . |
materialized_view_shuffle |
dynamic |
Force shuffling of the materialized view query, and (optionally) provide specific keys to shuffle by. See examples below. |
ingestion_time()
function in the context of materialized views
ingestion_time() function returns null values, when used in the context of a materialized view, if querying the entire view. When querying the materialized part of the view, the return value depends on the type of materialized view:
- In materialized views which include a single
arg_max()
/arg_min()
/take_any()
aggregation, theingestion_time()
is equal to theingestion_time()
of the corresponding record in the source table. - In all other materialized views, the value of
ingestion_time()
is approximately the time of materialization (see how materialized views work).
Examples
Query the entire view. The most recent records in source table are included:
ViewName
Query the materialized part of the view only, regardless of when it was last materialized.
materialized_view("ViewName")
Query the entire view, and provide a "hint" to use
shuffle
strategy. The most recent records in source table are included:- Example #1: shuffle based on the
Id
column (similarly to usinghint.shufflekey=Id
):
set materialized_view_shuffle = dynamic([{"Name" : "ViewName", "Keys" : [ "Id" ] }]); ViewName
- Example #2: shuffle based on all keys (similarly to using
hint.strategy=shuffle
):
set materialized_view_shuffle = dynamic([{"Name" : "ViewName" }]); ViewName
- Example #1: shuffle based on the
Performance considerations
The main contributors that can impact a materialized view health are:
- Cluster resources: Like any other process running on the cluster, materialized views consume resources (CPU, memory) from the cluster. If the cluster is overloaded, adding materialized views to it may cause a degradation in the cluster's performance. Monitor your cluster's health using cluster health metrics. Optimized autoscale currently doesn't take materialized views health under consideration as part of autoscale rules.
- The materialization process is limited by the amount of memory and CPU it can consume. These limits are defined, and can be changed, in the materialized views workload group.
Overlap with materialized data: During materialization, all new records ingested to the source table since the last materialization (the delta) are processed and materialized into the view. The higher the intersection between new records and already materialized records is, the worse the performance of the materialized view will be. A materialized view works best if the number of records being updated (for example, in
arg_max
view) is a small subset of the source table. If all or most of the materialized view records need to be updated in every materialization cycle, then the materialized view might not perform well.Ingestion rate: There are no hard-coded limits on the data volume or ingestion rate in the source table of the materialized view. However, the recommended ingestion rate for materialized views is no more than 1-2GB/sec. Higher ingestion rates may still perform well. Performance depends on database size, available resources, and amount of intersection with existing data.
- Number of materialized views in cluster: The above considerations apply to each individual materialized view defined in the cluster. Each view consumes its own resources, and many views compete with each other on available resources. While there are no hard-coded limits to the number of materialized views in a cluster, the cluster may not be able to handle all materialized views, when there are many defined. The capacity policy can be adjusted if there is more than a single materialized view in the cluster. Increase the value of
ClusterMinimumConcurrentOperations
in the policy to run more materialized views concurrently.
- Materialized view definition: The materialized view definition must be defined according to query best practices for best query performance. For more information, see create command performance tips.
Materialized view over materialized view
A materialized view can be created over another materialized view if the source materialized view is a deduplication view. Specifically, the aggregation of the source materialized view must be take_any(*)
in order to deduplicate source records. The second materialized view can use any supported aggregation functions. For specific information on how to create a materialized view over a materialized view, see .create materialized-view
command.
Tip
When querying a materialized view that is defined over another materialized view, we recommend querying the materialized part only using the materialized_view()
function. Querying the entire view is not performant when both views aren't fully materialized. For more information, see materialized views queries.