Ingest JSON formatted sample data into Azure Data Explorer

This article shows you how to ingest JSON formatted data into an Azure Data Explorer database. You'll start with simple examples of raw and mapped JSON, continue to multi-lined JSON, and then tackle more complex JSON schemas containing arrays and dictionaries. The examples detail the process of ingesting JSON formatted data using Kusto Query Language (KQL), C#, or Python.

Note

We don't recommend using .ingest management commands in production scenarios. Instead, use a data connector or programmatically ingest data using one of the Kusto client libraries.

Prerequisites

  • A Microsoft account or a Microsoft Entra user identity. An Azure subscription isn't required.
  • An Azure Data Explorer cluster and database. Create a cluster and database.

The JSON format

Azure Data Explorer supports two JSON file formats:

  • json: Line separated JSON. Each line in the input data has exactly one JSON record. This format supports parsing of comments and single-quoted properties. For more information, see JSON Lines.
  • multijson: Multi-lined JSON. The parser ignores the line separators and reads a record from the previous position to the end of a valid JSON record.

Note

When ingesting using the get data experience, the default format is multijson, which can handle multiline JSON records and arrays of JSON records. When a parsing error is encountered, the entire file is discarded. To ignore invalid JSON records, select the "Ignore data format errors" option, which switches the format to json (JSON Lines).

If you're using the JSON Lines format (json), lines that don't represent a valid JSON record are skipped during parsing.
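
The difference between the two formats can be illustrated with a minimal Python sketch. It only imitates the behavior described above using the standard json module and isn't the service's parser: the function names are hypothetical, and Python's parser doesn't accept comments or single-quoted properties the way the json format does.

```python
import json

def parse_json_lines(text):
    """Imitate the json format: one record per line, invalid lines skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a line that isn't a valid JSON record is skipped
    return records

def parse_multijson(text):
    """Imitate the multijson format: read records by JSON structure."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # raw_decode doesn't skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Reads from the previous position to the end of one valid record.
        # A parsing error raises and aborts the whole input.
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)
    return records
```

With this sketch, an indented (multi-lined) record yields no records when read line by line, because no single line is valid JSON on its own, while the structure-based reader returns it whole.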

Ingest and map JSON formatted data

Ingesting JSON formatted data requires you to specify the format using an ingestion property. It also requires a mapping, which maps each JSON source entry to its target column. When ingesting data, use the IngestionMapping property with either its ingestionMappingReference ingestion property (for a pre-defined mapping) or its IngestionMappings property. This article uses the ingestionMappingReference ingestion property, which refers to a mapping that is pre-defined on the table used for ingestion. The examples below start by ingesting JSON records as raw data into a single-column table, and then use a mapping to ingest each property into its mapped column.

Simple JSON example

The following example is a simple JSON, with a flat structure. The data has temperature and humidity information, collected by several devices. Each record is marked with an ID and timestamp.

{
    "timestamp": "2019-05-02 15:23:50.0369439",
    "deviceId": "2945c8aa-f13e-4c48-4473-b81440bb5ca2",
    "messageId": "7f316225-839a-4593-92b5-1812949279b3",
    "temperature": 31.0301639051317,
    "humidity": 62.0791099602725
}

Ingest raw JSON records

In this example, you ingest JSON records as raw data into a single-column table. Data manipulation, using queries and an update policy, is done after the data is ingested.

Use Kusto Query Language to ingest data in a raw JSON format.

  1. Sign in to https://dataexplorer.azure.cn.

  2. Select Add cluster.

  3. In the Add cluster dialog box, enter your cluster URL in the form https://<ClusterName>.<Region>.kusto.chinacloudapi.cn/, then select Add.

  4. Paste in the following command, and select Run to create the table.

    .create table RawEvents (Event: dynamic)
    

    This command creates a table with a single Event column of the dynamic data type.

  5. Create the JSON mapping.

    .create table RawEvents ingestion json mapping 'RawEventMapping' '[{"column":"Event","Properties":{"path":"$"}}]'
    

    This command creates a mapping, and maps the JSON root path $ to the Event column.

  6. Ingest data into the RawEvents table.

    .ingest into table RawEvents ('https://kustosamplefiles.blob.core.chinacloudapi.cn/jsonsamplefiles/simple.json') with '{"format":"json", "ingestionMappingReference":"RawEventMapping"}'
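
Writing the mapping JSON by hand inside a quoted command is easy to get wrong. As a minimal sketch, the mapping command text can be generated from a column-to-path dictionary in Python. The helper name build_json_mapping_command is hypothetical and isn't part of any Kusto client library; it only builds the command string and assumes that names and paths contain no single quotes.

```python
import json

def build_json_mapping_command(table, mapping_name, column_paths):
    """Build a '.create table ... ingestion json mapping' command string.

    column_paths maps each target column to its JSON path.
    Hypothetical helper: no escaping of single quotes is attempted.
    """
    mapping = [
        {"column": column, "Properties": {"path": path}}
        for column, path in column_paths.items()
    ]
    # Compact separators reproduce the layout used in this article.
    mapping_json = json.dumps(mapping, separators=(",", ":"))
    return (
        f".create table {table} ingestion json mapping "
        f"'{mapping_name}' '{mapping_json}'"
    )

# Map the JSON root path $ to the Event column.
print(build_json_mapping_command("RawEvents", "RawEventMapping", {"Event": "$"}))
```

The printed text is the same mapping command shown in step 5 and can be pasted into the query window.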
    

Ingest mapped JSON records

In this example, you ingest JSON records so that each JSON property is mapped to a single column in the table.

  1. Create a new table, with a similar schema to the JSON input data. We'll use this table for all the following examples and ingest commands.

    .create table Events (Time: datetime, Device: string, MessageId: string, Temperature: double, Humidity: double)
    
  2. Create the JSON mapping.

    .create table Events ingestion json mapping 'FlatEventMapping' '[{"column":"Time","Properties":{"path":"$.timestamp"}},{"column":"Device","Properties":{"path":"$.deviceId"}},{"column":"MessageId","Properties":{"path":"$.messageId"}},{"column":"Temperature","Properties":{"path":"$.temperature"}},{"column":"Humidity","Properties":{"path":"$.humidity"}}]'
    

    In this mapping, as defined by the table schema, the timestamp entries are ingested into the Time column as datetime values.

  3. Ingest data into the Events table.

    .ingest into table Events ('https://kustosamplefiles.blob.core.chinacloudapi.cn/jsonsamplefiles/simple.json') with '{"format":"json", "ingestionMappingReference":"FlatEventMapping"}'
    

    The file 'simple.json' has a few line-separated JSON records. The format is json, and the mapping used in the ingest command is the FlatEventMapping you created.
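
To make the effect of FlatEventMapping concrete, the following Python sketch applies the same mapping JSON to the sample record and produces one row keyed by column name. It's an illustration only: the function name is hypothetical, it handles only top-level paths of the form $.name, it assumes a missing property becomes an empty value, and it doesn't model the type conversion that the table schema performs (for example, Time to datetime).

```python
import json

# The same mapping JSON used in the FlatEventMapping command.
FLAT_EVENT_MAPPING = json.loads(
    '[{"column":"Time","Properties":{"path":"$.timestamp"}},'
    '{"column":"Device","Properties":{"path":"$.deviceId"}},'
    '{"column":"MessageId","Properties":{"path":"$.messageId"}},'
    '{"column":"Temperature","Properties":{"path":"$.temperature"}},'
    '{"column":"Humidity","Properties":{"path":"$.humidity"}}]'
)

def apply_flat_mapping(record, mapping):
    """Return a column-to-value row for one JSON record (sketch only)."""
    row = {}
    for entry in mapping:
        path = entry["Properties"]["path"]
        if not path.startswith("$."):
            raise ValueError(f"unsupported path in this sketch: {path}")
        # Assumption: a property missing from the record yields None.
        row[entry["column"]] = record.get(path[2:])
    return row

sample = {
    "timestamp": "2019-05-02 15:23:50.0369439",
    "deviceId": "2945c8aa-f13e-4c48-4473-b81440bb5ca2",
    "messageId": "7f316225-839a-4593-92b5-1812949279b3",
    "temperature": 31.0301639051317,
    "humidity": 62.0791099602725,
}
row = apply_flat_mapping(sample, FLAT_EVENT_MAPPING)
print(list(row))  # column names, in mapping order
```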

Ingest multi-lined JSON records

In this example, you ingest multi-lined JSON records. Each JSON property is mapped to a single column in the table. The file 'multilined.json' has a few indented JSON records. The multijson format tells the parser to read records by their JSON structure rather than line by line.

Ingest data into the Events table.

.ingest into table Events ('https://kustosamplefiles.blob.core.chinacloudapi.cn/jsonsamplefiles/multilined.json') with '{"format":"multijson", "ingestionMappingReference":"FlatEventMapping"}'

Ingest JSON records containing arrays

A JSON array is an ordered collection of values. Ingestion of a JSON array is done by using an update policy. The JSON is ingested as-is into an intermediate table (RawEvents in this example). The update policy then runs a pre-defined function on the intermediate table and ingests the results into the target table. We'll ingest data with the following structure:

{
    "records":
    [
        {
            "timestamp": "2019-05-02 15:23:50.0000000",
            "deviceId": "ddbc1bf5-096f-42c0-a771-bc3dca77ac71",
            "messageId": "7f316225-839a-4593-92b5-1812949279b3",
            "temperature": 31.0301639051317,
            "humidity": 62.0791099602725
        },
        {
            "timestamp": "2019-05-02 15:23:51.0000000",
            "deviceId": "ddbc1bf5-096f-42c0-a771-bc3dca77ac71",
            "messageId": "57de2821-7581-40e4-861e-ea3bde102364",
            "temperature": 33.7529423105311,
            "humidity": 75.4787976739364
        }
    ]
}

  1. Create an update policy function that uses the mv-expand operator to expand the collection of records so that each value in the collection receives a separate row. We'll use table RawEvents as a source table and Events as a target table.

    .create function EventRecordsExpand() {
        RawEvents
        | mv-expand records = Event.records
        | project
            Time = todatetime(records["timestamp"]),
            Device = tostring(records["deviceId"]),
            MessageId = tostring(records["messageId"]),
            Temperature = todouble(records["temperature"]),
            Humidity = todouble(records["humidity"])
    }
    
  2. The schema returned by the function must match the schema of the target table. Use the getschema operator to review the schema.

    EventRecordsExpand() | getschema
    
  3. Add the update policy to the target table. This policy automatically runs the query on any newly ingested data in the RawEvents intermediate table and ingests the results into the Events table. To avoid persisting data in the intermediate table, define a zero-retention policy on it.

    .alter table Events policy update @'[{"Source": "RawEvents", "Query": "EventRecordsExpand()", "IsEnabled": "True"}]'
    
  4. Ingest data into the RawEvents table.

    .ingest into table RawEvents ('https://kustosamplefiles.blob.core.chinacloudapi.cn/jsonsamplefiles/array.json') with '{"format":"multijson", "ingestionMappingReference":"RawEventMapping"}'
    
  5. Review data in the Events table.

    Events
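
As a rough mental model of what EventRecordsExpand() does, the following Python sketch expands the records array of each raw event into one row per element, with the same column names as the Events table. It isn't how the service evaluates mv-expand: the function name is hypothetical, values are left unconverted, and events with a missing or empty records array are simply skipped, which is an assumption of this sketch rather than documented operator behavior.

```python
def expand_event_records(raw_events):
    """Return one row per element of each event's 'records' array (sketch)."""
    rows = []
    for event in raw_events:
        # Assumption: events without a 'records' array produce no rows.
        for record in event.get("records", []):
            rows.append({
                "Time": record.get("timestamp"),
                "Device": record.get("deviceId"),
                "MessageId": record.get("messageId"),
                "Temperature": record.get("temperature"),
                "Humidity": record.get("humidity"),
            })
    return rows

raw_events = [{
    "records": [
        {
            "timestamp": "2019-05-02 15:23:50.0000000",
            "deviceId": "ddbc1bf5-096f-42c0-a771-bc3dca77ac71",
            "messageId": "7f316225-839a-4593-92b5-1812949279b3",
            "temperature": 31.0301639051317,
            "humidity": 62.0791099602725,
        },
        {
            "timestamp": "2019-05-02 15:23:51.0000000",
            "deviceId": "ddbc1bf5-096f-42c0-a771-bc3dca77ac71",
            "messageId": "57de2821-7581-40e4-861e-ea3bde102364",
            "temperature": 33.7529423105311,
            "humidity": 75.4787976739364,
        },
    ]
}]
rows = expand_event_records(raw_events)
print(len(rows))  # one row per array element
```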