This section includes examples showing how to train machine learning models on Azure Databricks using many popular open-source libraries.
You can also use AutoML, which automatically prepares a dataset for model training, runs a set of trials using open-source libraries such as scikit-learn and XGBoost, and creates a Python notebook with the source code for each trial. You can then review, reproduce, and modify that code.
Machine learning examples
Package | Notebook(s) | Features |
---|---|---|
scikit-learn | Machine learning tutorial | Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
scikit-learn | End-to-end example | Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost |
MLlib | MLlib examples | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer |
xgboost | XGBoost examples | Python, PySpark, and Scala; single-node workloads and distributed training |
Hyperparameter tuning examples
For general information about hyperparameter tuning in Azure Databricks, see Hyperparameter tuning.
Package | Notebook | Features |
---|---|---|
Optuna | Get started with Optuna | Optuna, distributed Optuna, scikit-learn, MLflow |
Hyperopt | Distributed Hyperopt | Distributed Hyperopt, scikit-learn, MLflow |
Hyperopt | Compare models | Use distributed Hyperopt to search the hyperparameter space for different model types simultaneously |
Hyperopt | Distributed training algorithms and hyperopt | Hyperopt, MLlib |
Hyperopt | Hyperopt best practices | Best practices for datasets of different sizes |
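
The notebooks listed above rely on Optuna or Hyperopt. As a rough, library-free illustration of what a tuning loop does, the sketch below runs a plain random search using only the Python standard library. It does not use the Optuna or Hyperopt APIs, and everything in it is an assumption made for illustration: the `objective` function is a made-up stand-in for a validation loss, and the parameter names `learning_rate` and `max_depth` and their ranges are hypothetical.

```python
import random


def objective(params):
    """Hypothetical stand-in for a validation loss (lower is better).

    In a real workflow this would train a model with `params` and
    return a metric measured on held-out data.
    """
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 6) ** 2


def sample_params(rng):
    """Draw one candidate configuration from an assumed search space."""
    return {
        "learning_rate": rng.uniform(0.001, 0.5),
        "max_depth": rng.randint(2, 12),
    }


def random_search(objective, n_trials, seed=0):
    """Evaluate `n_trials` random configurations and return the best one."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    trials = []
    for _ in range(n_trials):
        params = sample_params(rng)
        trials.append({"params": params, "loss": objective(params)})
    best = min(trials, key=lambda t: t["loss"])
    return best, trials


best, trials = random_search(objective, n_trials=50)
print("best params:", best["params"])
print("best loss:", best["loss"])
```

Libraries such as Optuna and Hyperopt replace the uniform sampling here with adaptive search strategies and can distribute trials across a cluster; the overall shape of the loop (sample, evaluate, keep the best) is the same.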