Model training examples

2024-09-09

This section includes examples showing how to train machine learning models on Azure Databricks using many popular open-source libraries.

You can also use AutoML, which automatically prepares a dataset for model training, performs a set of trials using open-source libraries such as scikit-learn and XGBoost, and creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.
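
To picture what AutoML automates, the loop below trains several candidate configurations, scores each one on held-out data, records its settings so the trial can be reproduced, and reports the best one. This is a minimal, dependency-free sketch of the idea only. It does not use or imitate the AutoML API, and the toy dataset, the two model families (a majority-class baseline and k-nearest neighbors), and accuracy as the metric are all illustrative assumptions.

```python
import random
from collections import Counter


def make_dataset(n, seed):
    """Hypothetical toy data: one feature x in [0, 1); label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:  # flip some labels so no model is perfect
            y = 1 - y
        rows.append((x, y))
    return rows


def train_majority(train):
    """Baseline model: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]

    def predict(x):
        return label

    return predict


def train_knn(train, k):
    """k-nearest neighbors on a single feature: majority vote among the k closest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]

    return predict


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


def run_trials(train, valid):
    """Train every candidate configuration, score it on validation data, and keep its settings."""
    candidates = [("majority", {})] + [("knn", {"k": k}) for k in (1, 5, 15)]
    builders = {"majority": train_majority, "knn": train_knn}
    results = []
    for family, params in candidates:
        model = builders[family](train, **params)
        results.append({"family": family, "params": params, "accuracy": accuracy(model, valid)})
    best = max(results, key=lambda r: r["accuracy"])
    return results, best


if __name__ == "__main__":
    results, best = run_trials(make_dataset(200, seed=0), make_dataset(100, seed=1))
    for r in results:
        print(r)
    print("best:", best)
```

In AutoML itself, each trial corresponds to a generated notebook rather than a dictionary entry, and the candidate models come from libraries such as scikit-learn and XGBoost.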

Machine learning examples

Package	Notebook(s)	Features
scikit-learn	Machine learning tutorial	Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow
scikit-learn	End-to-end example	Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost
MLlib	MLlib examples	Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer
XGBoost	XGBoost examples	Python, PySpark, and Scala; single-node workloads and distributed training
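
Several rows above (decision trees, GBT regression, XGBoost) rely on tree models, which are built by repeatedly choosing the split that best separates the labels. The following sketch shows that single step for binary classification on one numeric feature: it fits a depth-1 tree (a decision stump) by minimizing weighted Gini impurity. It is illustrative only, written in plain Python with hypothetical function names, and does not reflect how MLlib or XGBoost find splits. Both libraries can discretize continuous features into bins, and XGBoost scores splits with gradient statistics rather than Gini impurity.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def majority_label(labels):
    """Most common 0/1 label; ties go to 1."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label) that minimizes weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct sorted x values.
    This brute-force version rebuilds both partitions for every candidate (O(n^2)).
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, left, right = best
    # Each leaf predicts the majority label of the training rows that reach it.
    return threshold, majority_label(left), majority_label(right)


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


if __name__ == "__main__":
    stump = fit_stump([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])
    print(stump, predict_stump(stump, 4), predict_stump(stump, 6))
```

A full decision tree applies the same search recursively to each side of the chosen split, and gradient-boosted trees fit a sequence of such trees to the residual errors of the previous ones.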

Hyperparameter tuning examples

For general information about hyperparameter tuning in Azure Databricks, see Hyperparameter tuning.

Package	Notebook	Features
Optuna	Get started with Optuna	Optuna, distributed Optuna, scikit-learn, MLflow
Hyperopt	Distributed hyperopt	Distributed Hyperopt, scikit-learn, MLflow
Hyperopt	Compare models	Use distributed Hyperopt to search the hyperparameter space for different model types simultaneously
Hyperopt	Distributed training algorithms and hyperopt	Hyperopt, MLlib
Hyperopt	Hyperopt best practices	Best practices for datasets of different sizes
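
The "Compare models" notebook searches a single hyperparameter space that covers several model types at once. The idea can be sketched without any libraries as random search over a conditional space, where the model type is itself a hyperparameter and each type has its own parameters. The model names, ranges, and `toy_loss` function below are hypothetical stand-ins for training and validating real models. Hyperopt expresses such a space with `hp.choice` and can search it with `fmin` using TPE (`tpe.suggest`), which adapts its sampling to earlier results instead of sampling uniformly as this sketch does.

```python
import math
import random


def sample_config(rng):
    """Draw one point from a hypothetical conditional search space."""
    model_type = rng.choice(["svm", "random_forest"])
    if model_type == "svm":
        # Sample the regularization strength C log-uniformly between 1e-3 and 1e3.
        return {"model_type": "svm", "C": 10 ** rng.uniform(-3, 3)}
    return {"model_type": "random_forest", "max_depth": rng.randint(2, 12)}


def toy_loss(config):
    """Hypothetical stand-in for 'train the model and return validation loss' (lower is better)."""
    if config["model_type"] == "svm":
        # Smallest loss (0.2) at C = 1, i.e. log10(C) = 0.
        return 0.2 + 0.05 * math.log10(config["C"]) ** 2
    # Smallest loss (0.15) at max_depth = 6.
    return 0.15 + 0.01 * (config["max_depth"] - 6) ** 2


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one plus the full history."""
    rng = random.Random(seed)
    best_config, best_loss = None, math.inf
    history = []
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)
        history.append((config, loss))
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss, history


if __name__ == "__main__":
    best_config, best_loss, _ = random_search(toy_loss, n_trials=200)
    print(best_config, best_loss)
```

Because each trial is independent, random search parallelizes trivially; distributing trials across a cluster is the same idea that the distributed Hyperopt and Optuna notebooks apply at scale.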
