.create materialized-view
Applies to: ✅ Azure Data Explorer
A materialized view is an aggregation query over a source table. It represents a single `summarize` statement.
There are two possible ways to create a materialized view, as noted by the backfill option in the command:
Create the materialized view from now onward:
- The materialized view is created empty. It includes only records ingested after view creation. Creation of this kind returns immediately, and the view is immediately available for query.
Create the materialized view based on existing records in the source table:
- See Backfill a materialized view.
- Creation might take a long while to complete, depending on the number of records in the source table. The view won't be available for queries until backfill is complete.
- When you're using this option, the create command must be `async`. You can monitor execution by using the `.show operations` command.
- You can cancel the backfill process by using the `.cancel operation` command.
Important
On large source tables, the backfill option might take a long time to complete. If this process transiently fails while running, it won't be automatically retried. You must then re-execute the create command. For more information, see Backfill a materialized view.
Permissions
This command requires Database Admin permissions. The creator of the materialized view becomes the admin of it.
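For reference, granting the required role is a single management command. The following is a minimal sketch; the database name and principal are placeholders, not values from this article:

```kusto
// Grant Database Admin to a principal (placeholder database and user).
.add database MyDatabase admins ('aaduser=user@contoso.com') 'Allows creating materialized views'
```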
Syntax
`.create [async] [ifnotexists] materialized-view [with (PropertyName = PropertyValue, ...)] MaterializedViewName on table SourceTableName { Query }`
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
*PropertyName*, *PropertyValue* | `string` | | List of properties in the form of name and value pairs, from the list of supported properties. |
*MaterializedViewName* | `string` | ✔️ | Name of the materialized view. The view name can't conflict with table or function names in the same database and must adhere to the identifier naming rules. |
*SourceTableName* | `string` | ✔️ | Name of the source table on which the view is defined. |
*Query* | `string` | ✔️ | Query definition of the materialized view. For more information and limitations, see the Query parameter section. |
Note
If the materialized view already exists:
- If the `ifnotexists` flag is specified, the command is ignored. No change is applied, even if the new definition doesn't match the existing definition.
- If the `ifnotexists` flag isn't specified, an error is returned.
- To alter an existing materialized view, use the .alter materialized-view command.
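For example, the following sketch uses the `ifnotexists` flag with an illustrative view name; if `ArgMaxUser` already exists, the command returns without changing the existing definition:

```kusto
.create ifnotexists materialized-view ArgMaxUser on table T
{
    T
    | summarize arg_max(Timestamp, *) by User
}
```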
Supported properties
The following properties are supported in the `with (PropertyName = PropertyValue)` clause. All properties are optional.
Name | Type | Description |
---|---|---|
`backfill` | `bool` | Whether to create the view based on all records currently in *SourceTable* (`true`), or to create it from now on (`false`). Default is `false`. For more information, see Backfill a materialized view. |
`effectiveDateTime` | `datetime` | Relevant only when you're using `backfill`. If it's set, creation backfills only with records ingested after the datetime. `backfill` must also be set to `true`. This property expects a datetime literal; for example, `effectiveDateTime=datetime(2019-05-01)`. |
`updateExtentsCreationTime` | `bool` | Relevant only when you're using `backfill`. If it's set to `true`, extent creation time is assigned based on the datetime group-by key during the backfill process. For more information, see Backfill a materialized view. |
`lookback` | `timespan` | Valid only for `arg_max`/`arg_min`/`take_any` materialized views. It limits the period of time in which duplicates are expected. For example, if a lookback of 6 hours is specified on an `arg_max` view, the deduplication between newly ingested records and existing ones will take into consideration only records that were ingested up to 6 hours ago. Lookback is relative to `ingestion_time()`. If the materialized view query doesn't preserve the `ingestion_time()` value, lookback can't be defined on the view. See materialized views limitations and known issues. Defining the lookback period incorrectly might lead to duplicates in the materialized view. For example, if a record for a specific key is ingested 10 hours after a record for the same key was ingested, and the lookback is set to 6 hours, that key will be a duplicate in the view. The lookback period is applied during both materialization time and query time. |
`autoUpdateSchema` | `bool` | Whether to automatically update the view on source table changes. Default is `false`. This option is valid only for views of type `arg_max(Timestamp, *)`/`arg_min(Timestamp, *)`/`take_any(*)` (only when the column's argument is `*`). If this option is set to `true`, changes to the source table will be automatically reflected in the materialized view. |
`dimensionTables` | `array` | A dynamic argument that includes an array of dimension tables in the view. See Query parameter. |
`folder` | `string` | The materialized view's folder. |
`docString` | `string` | A string that documents the materialized view. |
`allowMaterializedViewsWithoutRowLevelSecurity` | `bool` | Allows creating a materialized view over a table with a row-level security policy enabled. |
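As an illustration of the `with` clause, the following sketch combines several of these properties on a hypothetical `arg_max(Timestamp, *)` view; the view, table, and column names are assumptions, not taken from this article:

```kusto
.create materialized-view with (
    autoUpdateSchema=true,
    lookback=6h,
    folder="Views",
    docString="Latest record per device"
) LatestPerDevice on table T
{
    // arg_max(Timestamp, *) keeps the latest record per DeviceId,
    // which is what makes autoUpdateSchema and lookback applicable here.
    T
    | summarize arg_max(Timestamp, *) by DeviceId
}
```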
Warning
- The system will automatically disable a materialized view if changes to the source table of the materialized view, or changes in data, lead to incompatibility between the materialized view query and the expected materialized view schema.
- To avoid this error, the materialized view query must be deterministic. For example, the bag_unpack or pivot plugin results in a non-deterministic schema.
- When you're using an `arg_max(Timestamp, *)` aggregation and `autoUpdateSchema` is false, changes to the source table can also lead to schema mismatches. Avoid this failure by defining the view query as `arg_max(Timestamp, Column1, Column2, ...)`, or by using the `autoUpdateSchema` option.
- Using `autoUpdateSchema` might lead to irreversible data loss when columns in the source table are dropped.
- Monitor automatic disabling of materialized views by using the MaterializedViewResult metric.
- After you fix incompatibility issues, you should explicitly re-enable the view by using the enable materialized view command.
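For reference, re-enabling a view after fixing the incompatibility is a single management command; a minimal sketch with an illustrative view name:

```kusto
.enable materialized-view CustomerUsage
```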
Create materialized view over materialized view
You can create a materialized view over another materialized view only when the source materialized view is a `take_any(*)` aggregation (deduplication). For more information, see Materialized view over materialized view and Examples.
Syntax:
`.create [async] [ifnotexists] materialized-view [with (PropertyName = PropertyValue, ...)] MaterializedViewName on materialized-view SourceMaterializedViewName { Query }`
Parameters:
Name | Type | Required | Description |
---|---|---|---|
*PropertyName*, *PropertyValue* | `string` | | List of properties in the form of name and value pairs, from the list of supported properties. |
*MaterializedViewName* | `string` | ✔️ | Name of the materialized view. The view name can't conflict with table or function names in the same database and must adhere to the identifier naming rules. |
*SourceMaterializedViewName* | `string` | ✔️ | Name of the source materialized view on which the view is defined. |
*Query* | `string` | ✔️ | Query definition of the materialized view. |
Examples
Create an empty `arg_max` view that will materialize only records ingested from now on:

.create materialized-view ArgMax on table T { T | summarize arg_max(Timestamp, *) by User }

Create a materialized view for daily aggregates with the backfill option, by using `async`:

.create async materialized-view with (backfill=true, docString="Customer telemetry") CustomerUsage on table T { T | extend Day = bin(Timestamp, 1d) | summarize count(), dcount(User), max(Duration) by Customer, Day }

Create a materialized view with `backfill` and `effectiveDateTime`. The view is created based on records from the datetime only.

.create async materialized-view with (backfill=true, effectiveDateTime=datetime(2019-01-01)) CustomerUsage on table T { T | extend Day = bin(Timestamp, 1d) | summarize count(), dcount(User), max(Duration) by Customer, Day }

Create a materialized view that deduplicates the source table, based on the `EventId` column, by using a lookback of 6 hours. Records will be deduplicated against only records ingested 6 hours before current records.

.create materialized-view with (lookback=6h) DeduplicatedTable on table T { T | summarize take_any(*) by EventId }

Create a downsampling materialized view that's based on the previous `DeduplicatedTable` materialized view:

.create materialized-view DailyUsage on materialized-view DeduplicatedTable { DeduplicatedTable | summarize count(), dcount(User) by Day=bin(Timestamp, 1d) }

The definition can include additional operators before the `summarize` statement, as long as `summarize` is the last one:

.create materialized-view CustomerUsage on table T { T | where Customer in ("Customer1", "Customer2", "CustomerN") | parse Url with "https://contoso.com/" Api "/" * | extend Month = startofmonth(Timestamp) | summarize count(), dcount(User), max(Duration) by Customer, Api, Month }

Here are materialized views that join with a dimension table:

.create materialized-view with (dimensionTables = dynamic(["DimUsers"])) EnrichedArgMax on table T { T | lookup DimUsers on User | summarize arg_max(Timestamp, *) by User }

.create materialized-view with (dimensionTables = dynamic(["DimUsers"])) EnrichedArgMax on table T { DimUsers | project User, Age, Address | join kind=rightouter hint.strategy=broadcast T on User | summarize arg_max(Timestamp, *) by User }
Remarks
Query parameter
The following rules limit the query used in the materialized view Query parameter:
- The query should reference a single fact table that is the source of the materialized view. It should include a single `summarize` operator, and one or more aggregation functions aggregated by one or more group-by expressions. The `summarize` operator must always be the last operator in the query.
- A materialized view that includes only a single `arg_max`/`arg_min`/`take_any` aggregation might perform better than a materialized view that includes these aggregations along with other aggregations (such as `count`/`dcount`/`avg`). This is because some optimizations are relevant only to these kinds of materialized views. They don't apply when the view includes mixed aggregation functions (where mixed means both `arg_max`/`arg_min`/`take_any` and other aggregations in the same view).
- The query shouldn't include any operators that depend on `now()`. For example, the query shouldn't have `where Timestamp > ago(5d)`. Use the retention policy on the materialized view to limit the period of time that the view covers.
- The following operators are not supported in the materialized view query: `sort`, `top-nested`, `top`, `partition`, `serialize`.
- Composite aggregations are not supported in the definition of the materialized view. For instance, instead of using `SourceTableName | summarize Result=sum(Column1)/sum(Column2) by Id`, define the materialized view as `SourceTableName | summarize a=sum(Column1), b=sum(Column2) by Id`. During view query time, run `MaterializedViewName | project Id, Result=a/b`. The required output of the view, including the calculated column (`a/b`), can be encapsulated in a stored function, as shown in the sketch after this list. Access the stored function instead of accessing the materialized view directly.
- Cross-cluster and cross-database queries aren't supported.
- References to `external_table()` and `externaldata` aren't supported.
- The materialized view query can't include any callouts that require impersonation. Specifically, all query connectivity plugins that use impersonation aren't allowed.
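For example, here's a minimal sketch of such a wrapper function; `DailySums` and its columns are illustrative, assuming a view defined with `a=sum(Column1), b=sum(Column2) by Id`:

```kusto
// Stored function that exposes the composite result while hiding the view's raw columns.
.create-or-alter function with (docString="Composite result over DailySums") DailySumsRatio()
{
    DailySums
    | project Id, Result = a / b
}
```

Callers then query `DailySumsRatio()` instead of accessing the materialized view directly.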
In addition to the source table of the view, the query is allowed to reference one or more dimension tables. Dimension tables must be explicitly called out in the view properties. It's important to understand the following behavior when you're joining with dimension tables:
- Records in the view's source table (the fact table) are materialized only once. Updates to the dimension tables don't have any impact on records that have already been processed from the fact table.
- A different ingestion latency between the fact table and the dimension table might affect the view results:
  - Example: A view definition includes an inner join with a dimension table. At the time of materialization, the dimension record hadn't been ingested yet, but the matching record was already ingested into the fact table. This record will be dropped from the view and never processed again.
  - Similarly, if the join is an outer join, the record from the fact table will be processed and added to the view with a null value for the dimension table columns. Records that have already been added (with null values) to the view won't be processed again. Their values, in columns from the dimension table, will remain null.
Supported aggregation functions
The following aggregation functions are supported: `count`, `countif`, `dcount`, `dcountif`, `min`, `max`, `avg`, `avgif`, `sum`, `sumif`, `arg_max`, `arg_min`, `take_any`, `take_anyif`, `hll`, `make_set`, `make_list`, `make_bag`, `percentile`, `percentiles`, `tdigest`.
Performance tips
Use a datetime group-by key: Materialized views that have a datetime column as one of their group-by keys are more efficient than those that don't. The reason is that some optimizations can be applied only when there's a datetime group-by key. If adding a datetime group-by key doesn't change the semantics of your aggregation, we recommend that you add it. You can do this only if the datetime column is immutable for each unique entity.
For example, in the following aggregation:

SourceTable | summarize take_any(*) by EventId

If `EventId` always has the same `Timestamp` value, and therefore adding `Timestamp` doesn't change the semantics of the aggregation, it's better to define the view as:

SourceTable | summarize take_any(*) by EventId, Timestamp
Tip
Late-arriving data in a datetime group-by key can have a negative impact on the materialized view's performance. For example, assume that a materialized view uses `bin(Timestamp, 1d)` as one of its group-by keys, and newly ingested records to the source table have old `Timestamp` values. These records might negatively affect the materialized view.

If you expect late-arriving records to be ingested to the source table, adjust the caching policy of the materialized view accordingly (see the sketch after this tip). For example, if records with a Timestamp of six months ago are expected to be ingested to the source table, the materialization process will need to scan the materialized view for the previous six months. If this period is in cold cache, materialization will experience cache misses, which will have a negative impact on the performance of the view.

If such late-arriving records are not expected, we recommend that in the materialized view query, you either filter out these records or normalize their timestamp values to the current time.
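For example, extending the view's hot cache to cover six months of expected late arrivals might look like the following sketch; the view name and period are assumptions:

```kusto
// Keep six months of the materialized view in hot cache so backfill of late records avoids cache misses.
.alter materialized-view CustomerUsage policy caching hot = 180d
```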
Define a lookback period: If applicable to your scenario, adding a `lookback` property can significantly improve query performance. For details, see Supported properties.

Add columns frequently used for filtering as group-by keys: Materialized view queries are optimized when they're filtered by one of the materialized view's group-by keys. If you know that your query pattern will often filter by a column that's immutable for a unique entity in the materialized view, include it in the materialized view's group-by keys.

For example, a materialized view exposes `arg_max` by a `ResourceId` value that will often be filtered by `SubscriptionId`. Assuming that a `ResourceId` value always belongs to the same `SubscriptionId` value, define the materialized view query as:

.create materialized-view ArgMaxResourceId on table FactResources { FactResources | summarize arg_max(Timestamp, *) by SubscriptionId, ResourceId }
The preceding definition is preferable over the following:
.create materialized-view ArgMaxResourceId on table FactResources { FactResources | summarize arg_max(Timestamp, *) by ResourceId }
Use update policies where appropriate: The materialized view can include transformations, normalizations, and lookups in dimension tables. However, we recommend that you move these operations to an update policy. Leave only the aggregation for the materialized view.
For example, it's better to define the following update policy:
.alter-merge table Target policy update @'[{"IsEnabled": true, "Source": "SourceTable", "Query": "SourceTable | extend ResourceId = strcat('subscriptions/', toupper(SubscriptionId), '/', resourceId) | lookup DimResources on ResourceId | mv-expand Events", "IsTransactional": false}]'
And define the following materialized view:
.create materialized-view Usage on table Events { Target | summarize count() by ResourceId }
The alternative, of including the update policy's logic as part of the materialized view query, might perform worse and is therefore not recommended:
.create materialized-view Usage on table SourceTable { SourceTable | extend ResourceId = strcat('subscriptions/', toupper(SubscriptionId), '/', resourceId) | lookup DimResources on ResourceId | mv-expand Events | summarize count() by ResourceId }
Tip
If you require the best query time performance, but you can tolerate some data latency, use the materialized_view() function.
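For example, the following sketch queries only the already-materialized part of a hypothetical `CustomerUsage` view, tolerating up to 10 minutes of data latency; the view and column names are assumptions:

```kusto
// Read only the part of the view that was already materialized (up to 10 minutes stale).
materialized_view("CustomerUsage", 10m)
| where Customer == "Customer1"
| summarize TotalEvents = sum(count_) by Day
```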
Backfill a materialized view
When you're creating a materialized view by using the `backfill` property, the materialized view will be created based on the records available in the source table. Or it will be created based on a subset of those records, if you use `effectiveDateTime`.

Behind the scenes, the backfill process splits the data to backfill into multiple batches and executes several ingest operations to backfill the view. The process might take a long while to complete when the number of records in the source table is large. The process duration depends on database size. Track the progress of the backfill by using the `.show operations` command.
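For example, a check along these lines shows the state of the backfill, where the GUID is the operation ID returned by the async create command:

```kusto
// Inspect the backfill operation's progress and state.
.show operations c4b29441-4873-4e36-8310-c631c35c916e
| project OperationId, State, Status, Duration
```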
Transient failures that occur as part of the backfill process are retried. If all retries are exhausted, the command will fail and require a manual re-execution of the create command.
We don't recommend that you use backfill when the number of records in the source table exceeds *number-of-nodes* x 200 million (sometimes even less, depending on the complexity of the query). An alternative is the backfill by move extents option.
Using the backfill option is not supported for data in a cold cache. Increase the hot cache period, if necessary, for the duration of the view creation. This might require scale-out.
If you experience failures in view creation, try changing these properties:
- `MaxSourceRecordsForSingleIngest`: By default, the number of source records in each ingest operation during backfill is 2 million per node. You can change this default by setting this property to the desired number of records. (The value is the total number of records in each ingest operation.) Decreasing this value can be helpful when creation fails on memory limits or query timeouts. Increasing this value can speed up view creation, assuming that the database can execute the aggregation function on more records than the default.
- `Concurrency`: The ingest operations, running as part of the backfill process, run concurrently. By default, concurrency is `min(number_of_nodes * 2, 5)`. You can set this property to increase or decrease concurrency. We recommend increasing this value only if the database's CPU is low, because the increase can significantly affect the database's CPU consumption.
For example, the following command will backfill the materialized view from `2020-01-01`. The maximum number of records in each ingest operation is 3 million. The command will execute the ingest operations with a concurrency of `2`.
.create async materialized-view with (
backfill=true,
effectiveDateTime=datetime(2020-01-01),
MaxSourceRecordsForSingleIngest=3000000,
Concurrency=2
)
CustomerUsage on table T
{
T
| summarize count(), dcount(User), max(Duration) by Customer, bin(Timestamp, 1d)
}
If the materialized view includes a datetime group-by key, the backfill process supports overriding the extent creation time based on the datetime column. This can be useful, for example, if you want older records to be dropped before recent ones, because the retention policy is based on the extent creation time. The `updateExtentsCreationTime` property is only supported if the view includes a datetime group-by key that uses the `bin()` function. For example, the following backfill will assign creation time based on the `Timestamp` group-by key:
.create async materialized-view with (
backfill=true,
updateExtentsCreationTime=true
)
CustomerUsage on table T
{
T
| summarize count() by Customer, bin(Timestamp, 1d)
}
Backfill by move extents
The option of backfilling by move extents backfills the materialized view based on an existing table, which isn't necessarily the source table of the materialized view. You achieve the backfill by moving extents from the specified table into the underlying materialized view table. This process implies that:
- The data in the specified table should have the same schema as the materialized view schema.
- Records in the specified table are moved to the view as is. They're assumed to be deduped based on the definition of the materialized view.
For example, if the materialized view has the following aggregation:
T | summarize arg_max(Timestamp, *) by EventId
Then the records in the source table for the move extents operation should already be deduplicated by `EventId`.
Because the operation uses .move extents, the records will be removed from the specified table during the backfill (moved, not copied).
Backfill by move extents is not supported for all aggregation functions supported in materialized views. It will fail for aggregations such as `avg()` and `dcount()`, in which the underlying data stored in the view is different from the aggregation itself.
The materialized view is backfilled only based on the specified table. Materialization of records in the source table of the view will start from view creation time, by default.
If the source table of the materialized view is continuously ingesting data, creating the view by using move extents might result in some data loss. This is because records ingested into the source table, in the short time between the time of preparing the table to backfill from and the time that the view is created, won't be included in the materialized view. To handle this scenario, you can set the `source_ingestion_time_from` property to the start time of the materialized view over the source table.
Use cases
The option of backfilling by move extents can be useful in two main scenarios:
- When you already have a table that includes the deduplicated source data for the materialized view, and you don't need these records in this table after view creation because you're using only the materialized view.
- When the source table of the materialized view is very large, and backfilling the view based on the source table doesn't work well because of the limitations mentioned earlier. In this case, you can orchestrate the backfill process yourself into a temporary table by using ingest from query commands, as in the sketch after this list. When the temporary table includes all records for the backfill, create the materialized view based on that table. You can also use one of the recommended orchestration tools.
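A minimal sketch of that orchestration, assuming the deduplicated baseline fits in a single ingest-from-query command (for very large sources, you'd typically split this into several commands by time range):

```kusto
// Build the deduplicated baseline into a table with ingest from query.
.set async DeduplicatedTable <|
    T
    | summarize arg_max(Timestamp, *) by EventId
```

Once `DeduplicatedTable` is complete, create the view with `move_extents_from=DeduplicatedTable`, as shown in the following examples.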
Examples:
In the following example, table `DeduplicatedTable` includes a single record per `EventId` instance and will be used as the baseline for the materialized view. Only records in `T` that are ingested after the view creation time will be included in the materialized view.

.create async materialized-view with (move_extents_from=DeduplicatedTable) MV on table T { T | summarize arg_max(Timestamp, *) by EventId }
If the `effectiveDateTime` property is specified along with the `move_extents_from` property, only extents in `DeduplicatedTable` whose `MaxCreatedOn` value is greater than `effectiveDateTime` are included in the backfill (moved to the materialized view):

.create async materialized-view with (move_extents_from=DeduplicatedTable, effectiveDateTime=datetime(2019-01-01)) MV on table T { T | summarize arg_max(Timestamp, *) by EventId }
The following example demonstrates the use of the `source_ingestion_time_from` property in the option of backfilling by move extents. Using both `source_ingestion_time_from` and `move_extents_from` indicates that the materialized view is backfilled from two sources:

- The `move_extents_from` table: `DeduplicatedTable` in the following example. This table should include all historical data to backfill. You can optionally use the `effectiveDateTime` property to include only extents in `DeduplicatedTable` whose `MaxCreatedOn` value is greater than `effectiveDateTime`.
- The source table of the materialized view: `T` in the following example. Backfill from this table includes only records whose `ingestion_time()` value is greater than `source_ingestion_time_from`.

The `source_ingestion_time_from` property should be used only to handle the possible data loss in the short time between preparing the table to backfill from (`DeduplicatedTable`) and the time that the view is created. Don't set this property too far in the past. That would start the materialized view with a significant lag, which might be hard to catch up with.
In the following example, assume that the current time is `2020-01-01 03:00`. Table `DeduplicatedTable` is a deduplicated table of `T`. It includes all historical data, deduplicated until `2020-01-01 00:00`. The `create` command uses `DeduplicatedTable` for backfilling the materialized view by using move extents. The `create` command also includes all records in `T` that were ingested since `2020-01-01`.

.create async materialized-view with (move_extents_from=DeduplicatedTable, source_ingestion_time_from=datetime(2020-01-01)) MV on table T { T | summarize arg_max(Timestamp, *) by EventId }
Cancel materialized view creation
You can cancel the process of materialized view creation when you're using the backfill option. This action is useful when creation is taking too long and you want to stop it while it's running.
Warning
The materialized view can't be restored after you run this command.
The creation process can't be canceled immediately. The cancel command signals materialization to stop, and the creation periodically checks if a cancel was requested. The cancel command waits for a maximum period of 10 minutes until the materialized view creation process is canceled, and it reports back if cancellation was successful.
Even if the cancellation doesn't succeed within 10 minutes, and the cancel command reports failure, the materialized view will probably cancel itself later in the creation process. The `.show operations` command indicates whether the operation was canceled.
If the operation is no longer in progress when the `.cancel operation` command is issued, the command will report an error saying so.
Syntax
`.cancel operation` *operationId*
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
*operationId* | `guid` | ✔️ | The operation ID returned from the `.create async materialized-view` command. |
Output
Name | Type | Description |
---|---|---|
OperationId | `guid` | The operation ID of the `.create materialized-view` command. |
Operation | `string` | The type of operation. |
StartedOn | `datetime` | The start time of the create operation. |
CancellationState | `string` | One of: `Canceled successfully` (creation was canceled), `Cancellation failed` (wait for cancellation timed out), `Unknown` (view creation is no longer running but wasn't canceled by this operation). |
ReasonPhrase | `string` | The reason why cancellation wasn't successful. |
Examples
.cancel operation c4b29441-4873-4e36-8310-c631c35c916e
OperationId | Operation | StartedOn | CancellationState | ReasonPhrase |
---|---|---|---|---|
c4b29441-4873-4e36-8310-c631c35c916e | MaterializedViewCreateOrAlter | 2020-05-08 19:45:03.9184142 | Canceled successfully | |
If the cancellation hasn't finished within 10 minutes, `CancellationState` will indicate failure. Creation might still be canceled later in the process, as described earlier.