创建可使用排队引入功能获取数据的应用
本文内容
适用于:✅Azure 数据资源管理器
Kusto 能够通过批处理管理器优化和批处理引入的数据,从而处理大量数据引入。 批处理管理器会在引入的数据到达其目标表之前聚合数据,从而实现更高效的处理和改进的性能。 批处理通常以 1 GB 的原始数据、1000 个单独文件或默认 5 分钟的处理时间为单位完成。 可以在数据库和表级别更新批处理策略,这通常可缩短批处理时间并减少延迟。 有关引入批处理的详细信息,请参阅 IngestionBatching 策略 和 以编程方式更改表级别引入批处理策略 。
注意
批处理还会考虑各种因素,例如目标数据库和表、运行引入的用户以及与引入关联的各种属性(例如特殊标记)。
在本文中,学习如何:
先决条件
开始之前
注意
以下示例假定引入数据的列与目标表的架构之间存在简单匹配。
如果引入的数据与表架构不匹配,则必须使用引入映射使数据的列与表架构保持一致。
将文件排队以供引入和查询结果
在首选 IDE 或文本编辑器中,使用适合首选语言的约定创建名为“基本引入 ”的项目或文件。 将 stormevent.csv 文件放置在与应用相同的位置。
注意
在以下示例中,会使用两个客户端,一个用于查询群集,另一个用于将数据引入群集。 对于客户端库支持的语言,由于两个客户端共享同一个用户提示验证器,因此会生成单个用户提示,而不是每个客户端一个用户提示。
添加以下代码:
创建连接到群集并打印 MyStormEvents 表中的行数的客户端应用。 你将使用此计数作为基线,以便与每个引入方法后的行数进行比较。 将 <your_cluster_uri>
和 <your_database>
占位符分别替换为群集 URI 和数据库名称。
using Kusto.Data;
using Kusto.Data.Net.Client;
namespace BatchIngest {
class BatchIngest {
static void Main(string[] args) {
string clusterUri = "<your_cluster_uri>";
var clusterKcsb = new KustoConnectionStringBuilder(clusterUri)
.WithAadUserPromptAuthentication();
using (var kustoClient = KustoClientFactory.CreateCslQueryProvider(clusterKcsb)) {
string database = "<your_database>";
string table = "MyStormEvents";
string query = table + " | count";
using (var response = kustoClient.ExecuteQuery(database, query, null)) {
Console.WriteLine("\nNumber of rows in " + table + " BEFORE ingestion:");
PrintResultsAsValueList(response);
}
}
}
static void PrintResultsAsValueList(IDataReader response) {
string value;
while (response.Read()) {
for (int i = 0; i < response.FieldCount; i++) {
value = "";
if (response.GetDataTypeName(i) == "Int32")
value = response.GetInt32(i).ToString();
else if (response.GetDataTypeName(i) == "Int64")
value = response.GetInt64(i).ToString();
else if (response.GetDataTypeName(i) == "DateTime")
value = response.GetDateTime(i).ToString();
else if (response.GetDataTypeName(i) == "Object")
value = response.GetValue(i).ToString() ?? "{}";
else
value = response.GetString(i);
Console.WriteLine("\t{0} - {1}", response.GetName(i), value ?? "None");
}
}
}
}
from azure.identity import InteractiveBrowserCredential
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
def main():
credentials = InteractiveBrowserCredential()
cluster_uri = "<your_cluster_uri>"
cluster_kcsb = KustoConnectionStringBuilder.with_azure_token_credential(cluster_uri, credentials)
with KustoClient(cluster_kcsb) as kusto_client:
database = "<your_database>"
table = "MyStormEvents"
query = table + " | count"
response = kusto_client.execute_query(database, query)
print("\nNumber of rows in " + table + " BEFORE ingestion:")
print_result_as_value_list(response)
def print_result_as_value_list(response):
cols = (col.column_name for col in response.primary_results[0].columns)
for row in response.primary_results[0]:
for col in cols:
print("\t", col, "-", row[col])
if __name__ == "__main__":
main()
import { Client as KustoClient, KustoConnectionStringBuilder } from "azure-kusto-data";
import { InteractiveBrowserCredential } from "@azure/identity";
async function main() {
const credentials = new InteractiveBrowserCredential();
const clusterUri = "<your_cluster_uri>";
const clusterKcsb = KustoConnectionStringBuilder.withAadUserPromptAuthentication(clusterUri, credentials);
const kustoClient = new Client(clusterKcsb);
const database = "<your_database>";
const table = "MyStormEvents";
const query = table + " | count";
let response = await kustoClient.execute(database, query);
console.log("\nNumber of rows in " + table + " BEFORE ingestion:");
printResultAsValueList(response);
}
function printResultsAsValueList(response) {
let cols = response.primaryResults[0].columns;
for (row of response.primaryResults[0].rows()) {
for (col of cols)
console.log("\t", col.name, "-", row.getValueAt(col.ordinal) != null ? row.getValueAt(col.ordinal).toString() : "None")
}
}
main();
注意
对于 Node.js 应用,请使用 InteractiveBrowserCredentialNodeOptions
而不是 InteractiveBrowserCredentialInBrowserOptions
。
注意
Java SDK 目前不支持两个客户端共享同一个用户提示验证器,从而导致每个客户端出现用户提示。
import com.microsoft.azure.kusto.data.Client;
import com.microsoft.azure.kusto.data.ClientFactory;
import com.microsoft.azure.kusto.data.KustoOperationResult;
import com.microsoft.azure.kusto.data.KustoResultSetTable;
import com.microsoft.azure.kusto.data.KustoResultColumn;
import com.microsoft.azure.kusto.data.auth.ConnectionStringBuilder;
public class BatchIngestion {
public static void main(String[] args) throws Exception {
String clusterUri = "<your_cluster_uri>";
ConnectionStringBuilder clusterKcsb = ConnectionStringBuilder.createWithUserPrompt(clusterUri);
try (Client kustoClient = ClientFactory.createClient(clusterKcsb)) {
String database = "<your_database>";
String table = "MyStormEvents";
String query = table + " | count";
KustoOperationResult results = kustoClient.execute(database, query);
KustoResultSetTable primaryResults = results.getPrimaryResults();
System.out.println("\nNumber of rows in " + table + " BEFORE ingestion:");
printResultAsValueList(primaryResults);
}
}
public static void printResultsAsValueList(KustoResultSetTable results) {
while (results.next()) {
KustoResultColumn[] columns = results.getColumns();
for (int i = 0; i < columns.length; i++) {
System.out.println("\t" + columns[i].getColumnName() + " - " + (results.getObject(i) == null ? "None" : results.getString(i)));
}
}
}
}
在可能的情况下,使用与群集 URI 共享相同的身份验证凭据创建定义数据引入 URI 的连接字符串生成器对象。 将 <your_ingestion_uri>
占位符替换为数据引入 URI。
using Kusto.Data.Common;
using Kusto.Ingest;
using System.Data;
string ingestUri = "<your_ingestion_uri>";
var ingestKcsb = new KustoConnectionStringBuilder(ingestUri)
.WithAadUserPromptAuthentication();
from azure.kusto.data import DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties
ingest_uri = "<your_ingestion_uri>"
ingest_kcsb = KustoConnectionStringBuilder.with_azure_token_credential(ingest_uri, credentials)
import { IngestClient, IngestionProperties, DataFormat } from "azure-kusto-ingest";
const ingestUri = "<your_ingestion_uri>";
const ingestKcsb = KustoConnectionStringBuilder.withTokenCredential(ingestUri, credentials);
import com.microsoft.azure.kusto.ingest.IngestClientFactory;
import com.microsoft.azure.kusto.ingest.IngestionProperties;
import com.microsoft.azure.kusto.ingest.IngestionProperties.DataFormat;
import com.microsoft.azure.kusto.ingest.QueuedIngestClient;
String ingestUri = "<your_ingestion_uri>";
ConnectionStringBuilder ingestKcsb = ConnectionStringBuilder.createWithUserPrompt(ingestUri);
通过将 stormevent.csv 文件添加到批处理队列来引入该文件。 你将使用以下对象和属性:
QueuedIngestClient 用于创建引入客户端。
IngestionProperties 用于设置引入属性。
DataFormat 用于将文件格式指定为 CSV 。
ignore_first_record 使用以下逻辑指定是否忽略 CSV 中的第一行和类似的文件类型:
True :忽略第一行。 使用此选项可从表格文本数据中删除标题行。
False :将第一行作为常规行引入。
注意
引入支持的最大文件大小为 6 GB。 建议引入 100 MB 到 1 GB 的文件。
using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(ingestKcsb)) {
string filePath = Path.Combine(Directory.GetCurrentDirectory(), "stormevents.csv");
Console.WriteLine("\nIngesting data from file: \n\t " + filePath);
var ingestProps = new KustoIngestionProperties(database, table) {
Format = DataSourceFormat.csv,
AdditionalProperties = new Dictionary<string, string>() {{ "ignoreFirstRecord", "True" }}
};
_= ingestClient.IngestFromStorageAsync(filePath, ingestProps).Result;
}
import os
with QueuedIngestClient(ingest_kcsb) as ingest_client:
file_path = os.path.join(os.path.dirname(__file__), "stormevents.csv")
print("\nIngesting data from file: \n\t " + file_path)
ingest_props = IngestionProperties(database, table, DataFormat.CSV, ignore_first_record=True)
ingest_client.ingest_from_file(file_path, ingest_props)
import path from 'path';
const ingestClient = new IngestClient(ingestKcsb);
const filePath = path.join(__dirname, "stormevents.csv");
console.log("\nIngesting data from file: \n\t " + filePath);
const ingestProps = new IngestionProperties({
database: database,
table: table,
format: DataFormat.CSV,
ignoreFirstRecord: true
});
await ingestClient.ingestFromFile(filePath, ingestProps);
import com.microsoft.azure.kusto.ingest.source.FileSourceInfo;
try (QueuedIngestClient ingestClient = IngestClientFactory.createClient(ingestKcsb)) {
FileSourceInfo fileSourceInfo = new FileSourceInfo(System.getProperty("user.dir") + "\\stormevents.csv", 0);
System.out.println("\nIngesting data from file: \n\t " + fileSourceInfo.toString());
IngestionProperties ingestProps = new IngestionProperties(database, table);
ingestProps.setDataFormat(DataFormat.CSV);
ingestProps.setIgnoreFirstRecord(true);
ingestClient.ingestFromFile(fileSourceInfo, ingestProps);
}
引入文件后查询表中的行数,并显示引入的最后一行。
注意
为了让引入完成,请等待 30 秒,再查询表。 对于 C#,请等待 60 秒,以便有时间以异步方式将文件添加到引入队列。
Console.WriteLine("\nWaiting 60 seconds for ingestion to complete ...");
Thread.Sleep(TimeSpan.FromSeconds(60));
using (var response = kustoClient.ExecuteQuery(database, query, null)) {
Console.WriteLine("\nNumber of rows in " + table + " AFTER ingesting the file:");
PrintResultsAsValueList(response);
}
query = table + " | top 1 by ingestion_time()";
using (var response = kustoClient.ExecuteQuery(database, query, null)) {
Console.WriteLine("\nLast ingested row:");
PrintResultsAsValueList(response);
}
# Add this to the imports at the top of the file
import time
# Add this to the main method
print("\nWaiting 30 seconds for ingestion to complete ...")
time.sleep(30)
response = kusto_client.execute_query(database, query)
print("\nNumber of rows in " + table + " AFTER ingesting the file:")
print_result_as_value_list(response)
query = table + " | top 1 by ingestion_time()"
response = kusto_client.execute_query(database, query)
print("\nLast ingested row:")
print_result_as_value_list(response)
console.log("\nWaiting 30 seconds for ingestion to complete ...");
await sleep(30000);
response = await kustoClient.execute(database, query);
console.log("\nNumber of rows in " + table + " AFTER ingesting the file:");
printResultsAsValueList(response);
query = table + " | top 1 by ingestion_time()"
response = await kustoClient.execute(database, query);
console.log("\nLast ingested row:");
printResultsAsValueList(response);
// Add the sleep function after the main method
function sleep(time) {
return new Promise(resolve => setTimeout(resolve, time));
}
System.out.println("\nWaiting 30 seconds for ingestion to complete ...");
Thread.sleep(30000);
response = kustoClient.execute(database, query);
primaryResults = response.getPrimaryResults();
System.out.println("\nNumber of rows in " + table + " AFTER ingesting the file:");
printResultsAsValueList(primaryResults);
query = table + " | top 1 by ingestion_time()";
response = kustoClient.execute(database, query);
primaryResults = response.getPrimaryResults();
System.out.println("\nLast ingested row:");
printResultsAsValueList(primaryResults);
完整代码应如下所示:
using Kusto.Data;
using Kusto.Data.Net.Client;
using Kusto.Data.Common;
using Kusto.Ingest;
using System.Data;
namespace BatchIngest {
class BatchIngest {
static void Main(string[] args) {
string clusterUri = "<your_cluster_uri>";
var clusterKcsb = new KustoConnectionStringBuilder(clusterUri)
.WithAadUserPromptAuthentication();
string ingestUri = "<your_ingestion_uri>";
var ingestKcsb = new KustoConnectionStringBuilder(ingestUri)
.WithAadUserPromptAuthentication();
using (var kustoClient = KustoClientFactory.CreateCslQueryProvider(clusterKcsb))
using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(ingestKcsb)) {
string database = "<your_database>";
string table = "MyStormEvents";
string filePath = Path.Combine(Directory.GetCurrentDirectory(), "stormevents.csv");
string query = table + " | count";
using (var response = kustoClient.ExecuteQuery(database, query, null)) {
Console.WriteLine("\nNumber of rows in " + table + " BEFORE ingestion:");
PrintResultsAsValueList(response);
}
Console.WriteLine("\nIngesting data from file: \n\t " + filePath);
var ingestProps = new KustoIngestionProperties(database, table) {
Format = DataSourceFormat.csv,
AdditionalProperties = new Dictionary<string, string>() {{ "ignoreFirstRecord", "True" }}
};
_= ingestClient.IngestFromStorageAsync(filePath, ingestProps).Result;
Console.WriteLine("\nWaiting 60 seconds for ingestion to complete ...");
Thread.Sleep(TimeSpan.FromSeconds(60));
using (var response = kustoClient.ExecuteQuery(database, query, null)) {
Console.WriteLine("\nNumber of rows in " + table + " AFTER ingesting the file:");
PrintResultsAsValueList(response);
}
query = table + " | top 1 by ingestion_time()";
using (var response = kustoClient.ExecuteQuery(database, query, null))
{
Console.WriteLine("\nLast ingested row:");
PrintResultsAsValueList(response);
}
}
}
static void PrintResultsAsValueList(IDataReader response) {
while (response.Read()) {
for (int i = 0; i < response.FieldCount; i++) {
if (response.GetDataTypeName(i) == "Int64")
Console.WriteLine("\t{0} - {1}", response.GetName(i), response.IsDBNull(i) ? "None" : response.GetInt64(i));
else
Console.WriteLine("\t{0} - {1}", response.GetName(i), response.IsDBNull(i) ? "None" : response.GetString(i));
}
}
}
}
}
import os
import time
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder, DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties
from azure.identity import InteractiveBrowserCredential
def main():
credentials = InteractiveBrowserCredential()
cluster_uri = "<your_cluster_uri>"
cluster_kcsb = KustoConnectionStringBuilder.with_azure_token_credential(cluster_uri, credentials)
ingest_uri = "<your_ingestion_uri>"
ingest_kcsb = KustoConnectionStringBuilder.with_azure_token_credential(ingest_uri, credentials)
with KustoClient(cluster_kcsb) as kusto_client:
with QueuedIngestClient(ingest_kcsb) as ingest_client:
database = "<your_database>"
table = "MyStormEvents"
file_path = os.path.join(os.path.dirname(__file__), "stormevents.csv")
query = table + " | count"
response = kusto_client.execute_query(database, query)
print("\nNumber of rows in " + table + " BEFORE ingestion:")
print_result_as_value_list(response)
print("\nIngesting data from file: \n\t " + file_path)
ingest_props = IngestionProperties(database, table, DataFormat.CSV, ignore_first_record=True)
ingest_client.ingest_from_file(file_path, ingest_props)
print("\nWaiting 30 seconds for ingestion to complete ...")
time.sleep(30)
response = kusto_client.execute_query(database, query)
print("\nNumber of rows in " + table + " AFTER ingesting the file:")
print_result_as_value_list(response)
query = table + " | top 1 by ingestion_time()"
response = kusto_client.execute_query(database, query)
print("\nLast ingested row:")
print_result_as_value_list(response)
def print_result_as_value_list(response):
cols = (col.column_name for col in response.primary_results[0].columns)
for row in response.primary_results[0]:
for col in cols:
print("\t", col, "-", row[col])
if __name__ == "__main__":
main()
import path from 'path';
import { Client, KustoConnectionStringBuilder } from "azure-kusto-data";
import { IngestClient, IngestionProperties, DataFormat } from "azure-kusto-ingest";
import { InteractiveBrowserCredential } from "@azure/identity";
async function main() {
const credentials = new InteractiveBrowserCredential();
const clusterUri = "<your_cluster_uri>";
const clusterKcsb = KustoConnectionStringBuilder.withTokenCredential(clusterUri, credentials);
const ingestUri = "<your_ingestion_uri>";
const ingestKcsb = KustoConnectionStringBuilder.withTokenCredential(ingestUri, credentials);
const kustoClient = new Client(clusterKcsb);
const ingestClient = new IngestClient(ingestKcsb);
const database = "<your_database>";
const table = "MyStormEventsJS";
const filePath = path.join(__dirname, "stormevents.csv");
const query = table + ` | count`;
let response = await kustoClient.execute(database, query);
printResultsAsValueList(response);
console.log("\nIngesting data from file: \n\t " + filePath);
const ingestProps = new IngestionProperties({
database: database,
table: table,
format: DataFormat.CSV,
ignoreFirstRecord: true
});
await ingestClient.ingestFromFile(filePath, ingestProps);
console.log("\nWaiting 30 seconds for ingestion to complete ...");
await sleep(30000);
response = await kustoClient.execute(database, queryCount);
console.log("\nNumber of rows in " + table + " AFTER ingesting the file:");
printResultsAsValueList(response);
query = table + " | top 1 by ingestion_time()"
response = await kustoClient.execute(database, query);
console.log("\nLast ingested row:");
printResultsAsValueList(response);
}
function sleep(time) {
return new Promise(resolve => setTimeout(resolve, time));
}
function printResultsAsValueList(response) {
let cols = response.primaryResults[0].columns;
for (row of response.primaryResults[0].rows()) {
for (col of cols)
console.log("\t", col.name, "-", row.getValueAt(col.ordinal) != null ? row.getValueAt(col.ordinal).toString() : "None")
}
}
main();
import com.microsoft.azure.kusto.data.Client;
import com.microsoft.azure.kusto.data.ClientFactory;
import com.microsoft.azure.kusto.data.KustoOperationResult;
import com.microsoft.azure.kusto.data.KustoResultSetTable;
import com.microsoft.azure.kusto.data.KustoResultColumn;
import com.microsoft.azure.kusto.data.auth.ConnectionStringBuilder;
import com.microsoft.azure.kusto.ingest.IngestClientFactory;
import com.microsoft.azure.kusto.ingest.IngestionProperties;
import com.microsoft.azure.kusto.ingest.IngestionProperties.DataFormat;
import com.microsoft.azure.kusto.ingest.QueuedIngestClient;
import com.microsoft.azure.kusto.ingest.source.FileSourceInfo;
public class BatchIngestion {
public static void main(String[] args) throws Exception {
String clusterUri = "<your_cluster_uri>";
ConnectionStringBuilder clusterKcsb = ConnectionStringBuilder.createWithUserPrompt(clusterUri);
String ingestUri = "<your_ingestion_uri>";
ConnectionStringBuilder ingestKcsb = ConnectionStringBuilder.createWithUserPrompt(ingestUri);
try (Client kustoClient = ClientFactory.createClient(clusterKcsb);
QueuedIngestClient ingestClient = IngestClientFactory.createClient(ingestKcsb)) {
String database = "<your_database>";
String table = "MyStormEvents";
FileSourceInfo fileSourceInfo = new FileSourceInfo(System.getProperty("user.dir") + "\\stormevents.csv", 0);
String query = table + " | count";
KustoOperationResult results = kustoClient.execute(database, query);
KustoResultSetTable primaryResults = results.getPrimaryResults();
System.out.println("\nNumber of rows in " + table + " BEFORE ingestion:");
printResultAsValueList(primaryResults);
System.out.println("\nIngesting data from file: \n\t " + fileSourceInfo.toString());
IngestionProperties ingestProps = new IngestionProperties(database, table);
ingestProps.setDataFormat(DataFormat.CSV);
ingestProps.setIgnoreFirstRecord(true);
ingestClient.ingestFromFile(fileSourceInfo, ingestProps);
System.out.println("\nWaiting 30 seconds for ingestion to complete ...");
Thread.sleep(30000);
response = kustoClient.execute(database, query);
primaryResults = response.getPrimaryResults();
System.out.println("\nNumber of rows in " + table + " AFTER ingesting the file:");
printResultsAsValueList(primaryResults);
query = table + " | top 1 by ingestion_time()";
response = kustoClient.execute(database, query);
primaryResults = response.getPrimaryResults();
System.out.println("\nLast ingested row:");
printResultsAsValueList(primaryResults);
}
}
public static void printResultsAsValueList(KustoResultSetTable results) {
while (results.next()) {
KustoResultColumn[] columns = results.getColumns();
for (int i = 0; i < columns.length; i++) {
System.out.println("\t" + columns[i].getColumnName() + " - " + (results.getObject(i) == null ? "None" : results.getString(i)));
}
}
}
}
运行应用
在命令行界面中,使用以下命令运行应用:
# Change directory to the folder that contains the management commands project
dotnet run .
python basic_ingestion.py
在 Node.js 环境中:
node basic-ingestion.js
在浏览器环境中,使用适当的命令来运行应用。 例如,对于 Vite-React,请使用以下命令:
npm run dev
mvn install exec:java -Dexec.mainClass="<groupId>.BatchIngestion"
你应会看到如下所示的结果:
Number of rows in MyStormEvents BEFORE ingestion:
Count - 0
Ingesting data from file:
C:\MyApp\stormevents.csv
Waiting 30 seconds for ingestion to complete
Number of rows in MyStormEvents AFTER ingesting the file:
Count - 1000
Last ingested row:
StartTime - 2018-01-26 00:00:00+00:00
EndTime - 2018-01-27 14:00:00+00:00
State - MEXICO
DamageProperty - 0
DamageCrops - 0
Source - Unknown
StormSummary - {}
将内存中数据排队以用于引入和查询结果
可以通过创建包含数据的流,然后排队进行引入,以从内存中引入数据。
例如,可以修改应用以替换 从文件引入 代码,如下所示:
将流描述符包添加到文件顶部的导入。
import io
from azure.kusto.ingest import StreamDescriptor
import { Readable } from "stream";
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import com.microsoft.azure.kusto.ingest.source.StreamSourceInfo;
添加包含要引入的数据的内存中字符串。
string singleLine = "2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,\"{}\"";
var stringStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(singleLine));
single_line = '2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,"{}"'
string_stream = io.StringIO(single_line)
const singleLine = '2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,"{}"';
const stringStream = new Readable();
stringStream.push(singleLine);
stringStream.push(null);
String singleLine = "2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,\"{}\"";
InputStream stream = new ByteArrayInputStream(StandardCharsets.UTF_8.encode(singleLine).array());
StreamSourceInfo streamSourceInfo = new StreamSourceInfo(stream);
将引入属性设置为不忽略第一条记录,因为内存中字符串没有标题行。
ingestProps.AdditionalProperties = new Dictionary<string, string>() {{ "ignoreFirstRecord", "False" }};
ingest_props = IngestionProperties(database, table, DataFormat.CSV, ignore_first_record=False)
ingestProps.ignoreFirstRecord = false;
ingestProps.setIgnoreFirstRecord(false);
通过将内存中数据添加到批处理队列来引入数据。 如果可能,请提供原始数据的大小。
_= ingestClient.IngestFromStreamAsync(stringStream, ingestProps, new StreamSourceOptions {Size = stringStream.Length}).Result;
stream_descriptor = StreamDescriptor(string_stream, is_compressed=False, size=len(single_line))
ingest_client.ingest_from_stream(stream_descriptor, ingest_props)
stringStream.size = singleLine.length;
await ingestClient.ingestFromStream(stringStream, ingestProps);
ingestClient.ingestFromStream(streamSourceInfo, ingestProps);
更新代码的概述应如下所示:
using Kusto.Data;
using Kusto.Data.Net.Client;
using Kusto.Data.Common;
using Kusto.Ingest;
using System.Data;
namespace BatchIngest {
class BatchIngest {
static void Main(string[] args) {
...
string singleLine = "2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,\"{}\"";
var stringStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(singleLine));
using (var kustoClient = KustoClientFactory.CreateCslQueryProvider(clusterKcsb))
using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(ingestKcsb)) {
string database = "<your_database>";
string table = "MyStormEvents";
...
Console.WriteLine("\nIngesting data from memory:");
ingestProps.AdditionalProperties = new Dictionary<string, string>() {{ "ignoreFirstRecord", "False" }};
_= ingestClient.IngestFromStreamAsync(stringStream, ingestProps, new StreamSourceOptions {Size = stringStream.Length}).Result;
...
}
}
static void PrintResultsAsValueList(IDataReader response) {
...
}
}
}
import io
import time
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder, DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties, StreamDescriptor
from azure.identity import InteractiveBrowserCredential
def main():
...
single_line = '2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,"{}"'
string_stream = io.StringIO(single_line)
with KustoClient(cluster_kcsb) as kusto_client:
with QueuedIngestClient(ingest_kcsb) as ingest_client:
database = "<your_database>"
table = "MyStormEvents"
...
print("\nIngesting data from memory:")
ingest_props = IngestionProperties(database, table, DataFormat.CSV, ignore_first_record=False)
stream_descriptor = StreamDescriptor(string_stream, is_compressed=False, size=len(single_line))
ingest_client.ingest_from_stream(stream_descriptor, ingest_props)
...
def print_result_as_value_list(response):
...
if __name__ == "__main__":
main()
import path from 'path';
import { Client, KustoConnectionStringBuilder } from "azure-kusto-data";
import { IngestClient, IngestionProperties, DataFormat } from "azure-kusto-ingest";
import { InteractiveBrowserCredential } from "@azure/identity";
import { Readable } from "stream";
async function main() {
...
const singleLine = "2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,\"{}\"";
const stringStream = Readable.from(singleLine);
stringStream.push(singleLine);
stringStream.push(null);
const kustoClient = new Client(clusterKcsb);
const ingestClient = new IngestClient(ingestKcsb);
const database = "<your_database>"
const table = "MyStormEvents"
...
console.log("\nIngesting data from memory:");
stringStream.size = singleLine.length;
ingestProps.ignoreFirstRecord = false;
await ingestClient.ingestFromStream(stringStream, ingestProps);
...
}
function sleep(time) {
...
}
function printResultsAsValueList(response) {
...
}
main();
import com.microsoft.azure.kusto.data.Client;
import com.microsoft.azure.kusto.data.ClientFactory;
import com.microsoft.azure.kusto.data.KustoOperationResult;
import com.microsoft.azure.kusto.data.KustoResultSetTable;
import com.microsoft.azure.kusto.data.KustoResultColumn;
import com.microsoft.azure.kusto.data.auth.ConnectionStringBuilder;
import com.microsoft.azure.kusto.ingest.IngestClientFactory;
import com.microsoft.azure.kusto.ingest.IngestionProperties;
import com.microsoft.azure.kusto.ingest.IngestionProperties.DataFormat;
import com.microsoft.azure.kusto.ingest.QueuedIngestClient;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import com.microsoft.azure.kusto.ingest.source.StreamSourceInfo;
public class BatchIngestion {
public static void main(String[] args) throws Exception {
...
String singleLine = "2018-01-26 00:00:00.0000000,2018-01-27 14:00:00.0000000,MEXICO,0,0,Unknown,\"{}\"";
InputStream stream = new ByteArrayInputStream(StandardCharsets.UTF_8.encode(singleLine).array());
StreamSourceInfo streamSourceInfo = new StreamSourceInfo(stream);
try (Client kustoClient = ClientFactory.createClient(clusterKcsb);
QueuedIngestClient ingestClient = IngestClientFactory.createClient(ingestKcsb)) {
String database = "<your_database>";
String table = "MyStormEvents";
...
System.out.println("\nIngesting data from memory:");
ingestProps.setIgnoreFirstRecord(false);
ingestClient.ingestFromStream(streamSourceInfo, ingestProps);
...
}
}
public static void printResultsAsValueList(KustoResultSetTable results) {
...
}
}
运行应用时,应看到类似于以下所示的结果。 请注意,引入后,表中的行数增加了 1。
Number of rows in MyStormEvents BEFORE ingestion:
Count - 1000
Ingesting data from memory:
Waiting 30 seconds for ingestion to complete ...
Number of rows in MyStormEvents AFTER ingesting from memory:
Count - 1001
Last ingested row:
StartTime - 2018-01-26 00:00:00+00:00
EndTime - 2018-01-27 14:00:00+00:00
State - MEXICO
DamageProperty - 0
DamageCrops - 0
Source - Unknown
StormSummary - {}
将 blob 排队以用于引入和查询结果
可以从 Azure 存储 Blob、Azure Data Lake 文件和 Amazon S3 文件引入数据。
例如,可以通过将 从内存引入 代码替换为以下内容,修改应用:
首先将 stormevent.csv 文件上传到存储帐户,并生成具有读取权限的 URI,例如,使用 Azure Blob 的 SAS 令牌 。
将 Blob 描述符包添加到文件顶部的导入。
from azure.kusto.ingest import BlobDescriptor
No additional packages are required.
import com.microsoft.azure.kusto.ingest.source.BlobSourceInfo;
使用 Blob URI 创建 Blob 描述符,设置引入属性,然后从 Blob 引入数据。 将 <your_blob_uri>
占位符替换为 blob URI。
string blobUri = "<your_blob_uri>";
ingestProps.AdditionalProperties = new Dictionary<string, string>() { { "ignoreFirstRecord", "True" } };
_= ingestClient.IngestFromStorageAsync(blobUri, ingestProps).Result;
blob_uri = "<your_blob_uri>"
ingest_props = IngestionProperties(database, table, DataFormat.CSV, ignore_first_record=True)
blob_descriptor = BlobDescriptor(blob_uri)
ingest_client.ingest_from_blob(blob_descriptor, ingest_props)
const blobUri = "<your_blob_uri>";
ingestProps.ignoreFirstRecord = true;
await ingestClient.ingestFromBlob(blobUri, ingestProps);
String blobUri = "<your_blob_uri>";
ingestProps.setIgnoreFirstRecord(true);
BlobSourceInfo blobSourceInfo = new BlobSourceInfo(blobUri, 100);
ingestClient.ingestFromBlob(blobSourceInfo, ingestProps);
更新代码的概述应如下所示:
using Kusto.Data;
using Kusto.Data.Net.Client;
using Kusto.Data.Common;
using Kusto.Ingest;
using System.Data;
namespace BatchIngest {
class BatchIngest {
static void Main(string[] args) {
...
string blobUri = "<your_blob_uri>";
using (var kustoClient = KustoClientFactory.CreateCslQueryProvider(clusterKcsb))
using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(ingestKcsb)) {
string database = "<your_database>";
string table = "MyStormEvents";
...
Console.WriteLine("\nIngesting data from memory:");
ingestProps.AdditionalProperties = new Dictionary<string, string>() { { "ignoreFirstRecord", "True" } };
_=_ ingestClient.IngestFromStorageAsync(blobUri, ingestProps).Result;
...
}
}
static void PrintResultsAsValueList(IDataReader response) {
...
}
}
}
import time
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder, DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties, BlobDescriptor
from azure.identity import InteractiveBrowserCredential
def main():
...
blob_uri = "<your_blob_uri>"
with KustoClient(cluster_kcsb) as kusto_client:
with QueuedIngestClient(ingest_kcsb) as ingest_client:
database = "<your_database>"
table = "MyStormEvents"
...
print("\nIngesting data from a blob:")
blob_descriptor = BlobDescriptor(blob_uri)
ingest_props = IngestionProperties(database, table, DataFormat.CSV, ignore_first_record=True)
ingest_client.ingest_from_blob(blob_descriptor, ingest_props)
...
def print_result_as_value_list(response):
...
if __name__ == "__main__":
main()
import path from 'path';
import { Client, KustoConnectionStringBuilder } from "azure-kusto-data";
import { IngestClient, IngestionProperties, DataFormat } from "azure-kusto-ingest";
import { InteractiveBrowserCredential } from "@azure/identity";
import { Readable } from "stream";
async function main() {
...
const blobUri = "<your_blob_uri>";
const kustoClient = new Client(clusterKcsb);
const ingestClient = new IngestClient(ingestKcsb);
const database = "<your_database>"
const table = "MyStormEvents"
...
console.log("\nIngesting data from a blob:");
ingestProps.ignoreFirstRecord = true;
await ingestClient.ingestFromBlob(blobUri, ingestProps);
...
}
function sleep(time) {
...
}
function printResultsAsValueList(response) {
...
}
main();
import com.microsoft.azure.kusto.data.Client;
import com.microsoft.azure.kusto.data.ClientFactory;
import com.microsoft.azure.kusto.data.KustoOperationResult;
import com.microsoft.azure.kusto.data.KustoResultSetTable;
import com.microsoft.azure.kusto.data.KustoResultColumn;
import com.microsoft.azure.kusto.data.auth.ConnectionStringBuilder;
import com.microsoft.azure.kusto.ingest.IngestClientFactory;
import com.microsoft.azure.kusto.ingest.IngestionProperties;
import com.microsoft.azure.kusto.ingest.IngestionProperties.DataFormat;
import com.microsoft.azure.kusto.ingest.QueuedIngestClient;
import com.microsoft.azure.kusto.ingest.source.BlobSourceInfo;
public class BatchIngestion {
public static void main(String[] args) throws Exception {
...
String blobUri = "<your_blob_uri>";
try (Client kustoClient = ClientFactory.createClient(clusterKcsb);
QueuedIngestClient ingestClient = IngestClientFactory.createClient(ingestKcsb)) {
String database = "<your_database>";
String table = "MyStormEvents";
...
System.out.println("\nIngesting data from a blob:");
ingestProps.setIgnoreFirstRecord(true);
BlobSourceInfo blobSourceInfo = new BlobSourceInfo(blobUri, 100);
ingestClient.ingestFromBlob(blobSourceInfo, ingestProps);
...
}
}
public static void printResultsAsValueList(KustoResultSetTable results) {
...
}
}
运行应用时,应看到类似于以下所示的结果。 请注意,引入后,表中的行数增加了 1000。
Number of rows in MyStormEvents BEFORE ingestion:
Count - 1001
Ingesting data from a blob:
Waiting 30 seconds for ingestion to complete ...
Number of rows in MyStormEvents AFTER ingesting from a blob:
Count - 2001
Last ingested row:
StartTime - 2018-01-26 00:00:00+00:00
EndTime - 2018-01-27 14:00:00+00:00
State - MEXICO
DamageProperty - 0
DamageCrops - 0
Source - Unknown
StormSummary - {}
下一步