Streaming on Azure Databricks
You can use Azure Databricks for near real-time data ingestion, processing, machine learning, and AI for streaming data.
Azure Databricks offers numerous optimizations for streaming and incremental processing, including the following:
- Delta Live Tables provides declarative syntax for incremental processing. See What is Delta Live Tables?.
- Auto Loader simplifies incremental ingestion from cloud object storage. See What is Auto Loader?.
- Unity Catalog adds data governance to streaming workloads. See Using Unity Catalog with Structured Streaming.
Delta Lake provides the storage layer for these integrations. See Delta table streaming reads and writes.
- Learn the basics of near real-time and incremental processing with Structured Streaming on Azure Databricks.
- Managing the intermediate state information of stateful Structured Streaming queries can help prevent unexpected latency and production problems.
- Find recommendations for configuring production incremental processing workloads with Structured Streaming on Azure Databricks to fulfill latency and cost requirements for real-time or batch applications.
- Learn how to monitor Structured Streaming applications on Azure Databricks.
- Learn how to use Unity Catalog together with Structured Streaming on Azure Databricks.
- Learn how to use Delta Lake tables as streaming sources and sinks.
- See examples of using Spark Structured Streaming with Cassandra, Azure Synapse Analytics, Python notebooks, and Scala notebooks on Azure Databricks.
Azure Databricks has specific features for working with semi-structured data fields contained in Avro, protocol buffers, and JSON data payloads.
Additional resources
The Apache Spark Structured Streaming Programming Guide provides more information about Structured Streaming concepts and APIs.
For reference information about Structured Streaming, Databricks recommends the following Apache Spark API references: