Azure Databricks provides dedicated primitives for manipulating arrays in Apache Spark SQL. These make working with arrays easier and more concise, and they remove much of the boilerplate code otherwise required. The primitives revolve around two functional programming constructs: higher-order functions and anonymous (lambda) functions. Together, these let you define functions that manipulate arrays in SQL. A higher-order function takes an array and defines how the array is processed and what the result of the computation will be. It delegates the processing of each item in the array to a lambda function.
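The division of labor between a higher-order function and its lambda can be sketched in plain Python. This is only an analogy, not Spark code: the helper names `transform`, `array_filter`, and `aggregate` are hypothetical stand-ins written for this illustration, and the comments show the Spark SQL expression each call is meant to mirror, assuming an array column named `values`.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def transform(values: List[T], fn: Callable[[T], U]) -> List[U]:
    """Apply fn to every element and return a new list of the results."""
    return [fn(v) for v in values]


def array_filter(values: List[T], pred: Callable[[T], bool]) -> List[T]:
    """Keep only the elements for which pred returns True."""
    return [v for v in values if pred(v)]


def aggregate(values: List[T], start: U, merge: Callable[[U, T], U]) -> U:
    """Fold the elements into a single value, starting from start."""
    return reduce(merge, values, start)


values = [1, 2, 3]

# Spark SQL: transform(values, x -> x + 1)
incremented = transform(values, lambda x: x + 1)

# Spark SQL: filter(values, x -> x % 2 = 0)
evens = array_filter(values, lambda x: x % 2 == 0)

# Spark SQL: aggregate(values, 0, (acc, x) -> acc + x)
total = aggregate(values, 0, lambda acc, x: acc + x)

print(incremented, evens, total)
```

In each case the higher-order function decides how the array is traversed and what shape the result takes (a new array or a single value), while the lambda supplies only the per-element logic.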
Introduction to higher-order functions notebook
Higher-order functions tutorial Python notebook
Apache Spark built-in functions
Apache Spark has built-in functions for manipulating complex types (for example, array types), including higher-order functions.
The following notebook illustrates Apache Spark built-in functions.