用于机器学习的 Databricks Runtime 7.3 LTS (EoS)
注意
对此 Databricks Runtime 版本的支持已结束。 有关终止支持日期,请参阅终止支持历史记录。 有关所有受支持的 Databricks Runtime 版本,请参阅 Databricks Runtime 发行说明版本和兼容性。
Databricks 于 2020 年 9 月发布此版本。 2020 年 10 月为其声明了长期支持 (LTS)。
用于机器学习的 Databricks Runtime 7.3 LTS 基于 Databricks Runtime 7.3 LTS (EoS),为机器学习和数据科学提供了随时可用的环境。 Databricks Runtime ML 包含许多常用的机器学习库,包括 TensorFlow、PyTorch 和 XGBoost。 它还支持使用 Horovod 进行分布式深度学习训练。
有关详细信息,包括有关如何创建 Databricks Runtime ML 群集的说明,请参阅 Databricks 上的 AI 和机器学习。
有关从 Databricks Runtime 6.x 进行迁移的帮助信息,请参阅 Databricks Runtime 7.x 迁移指南 (EoS)。
新增功能和主要更改
适用于机器学习的 Databricks Runtime 7.3 LTS 是基于 Databricks Runtime 7.3 LTS 构建的。 有关 Databricks Runtime 7.3 LTS 中的新增功能(包括 Apache Spark MLlib 和 SparkR)的信息,请参阅 Databricks Runtime 7.3 LTS (EoS) 发行说明。
Databricks Runtime ML Python 环境的主要更改
辅助角色上的 Conda 激活
以前,使用 %conda
更新笔记本环境时,不会在辅助角色 Python 进程上激活新环境。 如果 PySpark UDF 函数调用第三方函数,而该函数使用在 Conda 环境中安装的资源,则会导致问题。 此限制不再存在。
此外还应该查看 Databricks Runtime 7.3 LTS (EoS) 中 Databricks Runtime Python 环境的主要更改。 如需查看已安装的 Python 包及其版本的完整列表,请参阅 Python 库。
升级的 Python 包
- mlflow 1.9.1 -> 1.11.0
- tensorflow 2.2.0 -> 2.3.0
- tensorboard 2.2.2 -> 2.3.0
- pytorch 1.5.1 -> 1.6.0
- torchvision 0.6.1 -> 0.7.0
- petastorm 0.9.2 -> 0.9.5
系统环境
适用于机器学习的 Databricks Runtime 7.3 LTS 中的系统环境与 Databricks Runtime 7.3 LTS 不同,如下所示:
- DBUtils:Databricks Runtime ML 不包含库实用工具 (dbutils.library)(旧版)。
可改用
%pip
和%conda
命令。 请参阅作用域为笔记本的 Python 库。 - 对于 GPU 群集,Databricks Runtime ML 包含以下 NVIDIA GPU 库:
- CUDA 10.1 Update 2
- cuDNN 7.6.5
- NCCL 2.7.3
- TensorRT 6.0.1
库
以下部分列出了适用于机器学习的 Databricks Runtime 7.3 LTS 中包含的库,这些库与 Databricks Runtime 7.3 LTS 中包含的库不同。
本节内容:
顶层库
适用于机器学习的 Databricks Runtime 7.3 LTS 包含以下顶层库:
- GraphFrames
- Horovod 和 HorovodRunner
- MLflow
- PyTorch
- spark-tensorflow-connector
- TensorFlow
- TensorBoard
Python 库
适用于机器学习的 Databricks Runtime 7.3 LTS 使用 Conda 进行 Python 包管理,并且包含许多常用的 ML 包。
除了在以下部分的 Conda 环境中指定的包外,适用于机器学习的 Databricks Runtime 7.3 LTS 还会安装以下包:
- hyperopt 0.2.4.db2
- sparkdl 2.1.0-db1
CPU 群集上的 Python 库
name: databricks-ml
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- absl-py=0.9.0=py37_0
- asn1crypto=1.3.0=py37_1
- astor=0.8.0=py37_0
- backcall=0.1.0=py37_0
- backports=1.0=py_2
- bcrypt=3.2.0=py37h7b6447c_0
- blas=1.0=mkl
- blinker=1.4=py37_0
- boto3=1.12.0=py_0
- botocore=1.15.0=py_0
- c-ares=1.15.0=h7b6447c_1001
- ca-certificates=2020.7.22=0
- cachetools=4.1.1=py_0
- certifi=2020.6.20=pyhd3eb1b0_3 # (updated from py37_0 in June 15, 2021 maintenance update)
- cffi=1.14.0=py37he30daa8_1 # (updated from py37h2e261b9_0 in June 15, 2021 maintenance update)
- chardet=3.0.4=py37_1003
- click=7.0=py37_0
- cloudpickle=1.3.0=py_0
- configparser=3.7.4=py37_0
- cpuonly=1.0=0
- cryptography=2.8=py37h1ba5d50_0
- cycler=0.10.0=py37_0
- cython=0.29.15=py37he6710b0_0
- decorator=4.4.1=py_0
- dill=0.3.1.1=py37_1
- docutils=0.15.2=py37_0
- entrypoints=0.3=py37_0
- flask=1.1.1=py_1
- freetype=2.9.1=h8a8886c_1
- future=0.18.2=py37_1
- gast=0.3.3=py_0
- gitdb=4.0.5=py_0
- gitpython=3.1.0=py_0
- google-auth=1.11.2=py_0
- google-auth-oauthlib=0.4.1=py_2
- google-pasta=0.2.0=py_0
- grpcio=1.27.2=py37hf8bcb03_0
- gunicorn=20.0.4=py37_0
- h5py=2.10.0=py37h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- icu=58.2=he6710b0_3
- idna=2.8=py37_0
- intel-openmp=2020.0=166
- ipykernel=5.1.4=py37h39e3cac_0
- ipython=7.12.0=py37h5ca1d4c_0
- ipython_genutils=0.2.0=py37_0
- isodate=0.6.0=py_1
- itsdangerous=1.1.0=py37_0
- jedi=0.14.1=py37_0
- jinja2=2.11.1=py_0
- jmespath=0.10.0=py_0
- joblib=0.14.1=py_0
- jpeg=9b=h024ee3a_2
- jupyter_client=5.3.4=py37_0
- jupyter_core=4.6.1=py37_0
- kiwisolver=1.1.0=py37he6710b0_0
- krb5=1.17.1=h173b8e3_0 # (updated from 1.16.4 in June 15, 2021 maintenance update)
- ld_impl_linux-64=2.33.1=h53a641e_7
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.3=he6710b0_2 # (updated from 3.2.1 in June 15, 2021 maintenance update)
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libpng=1.6.37=hbc83047_0
- libpq=12.2=h20c2e04_0 # (updated from 11.2 in June 15, 2021 maintenance update)
- libprotobuf=3.11.4=hd408876_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.1.0=h2733197_0
- lightgbm=2.3.0=py37he6710b0_0
- lz4-c=1.8.1.2=h14c3975_0
- mako=1.1.2=py_0
- markdown=3.1.1=py37_0
- markupsafe=1.1.1=py37h14c3975_1
- matplotlib-base=3.1.3=py37hef1b27d_0
- mkl=2020.0=166
- mkl-service=2.3.0=py37he904b0f_0
- mkl_fft=1.0.15=py37ha843d7b_0
- mkl_random=1.1.0=py37hd6b4f25_0
- ncurses=6.2=he6710b0_1
- networkx=2.4=py_1
- ninja=1.10.0=py37hfd86e86_0
- nltk=3.4.5=py37_0
- numpy=1.18.1=py37h4f9e942_0
- numpy-base=1.18.1=py37hde5b4d6_1
- oauthlib=3.1.0=py_0
- olefile=0.46=py37_0
- openssl=1.1.1k=h27cfd23_0 # (updated from 1.1.1g in June 15, 2021 maintenance update)
- packaging=20.1=py_0
- pandas=1.0.1=py37h0573a6f_0
- paramiko=2.7.1=py_0
- parso=0.5.2=py_0
- patsy=0.5.1=py37_0
- pexpect=4.8.0=py37_1
- pickleshare=0.7.5=py37_1001
- pillow=7.0.0=py37hb39fc2d_0
- pip=20.0.2=py37_3
- plotly=4.9.0=py_0
- prompt_toolkit=3.0.3=py_0
- protobuf=3.11.4=py37he6710b0_0
- psutil=5.6.7=py37h7b6447c_0
- psycopg2=2.8.6=py37h3c74f83_1 # (updated from 2.8.4 in June 15, 2021 maintenance update)
- ptyprocess=0.6.0=py37_0
- pyasn1=0.4.8=py_0
- pyasn1-modules=0.2.7=py_0
- pycparser=2.19=py37_0
- pygments=2.5.2=py_0
- pyjwt=1.7.1=py37_0
- pynacl=1.3.0=py37h7b6447c_0
- pyodbc=4.0.30=py37he6710b0_0
- pyopenssl=19.1.0=py_1
- pyparsing=2.4.6=py_0
- pysocks=1.7.1=py37_1
- python=3.7.10=hdb3f193_0 # (updated from 3.7.6 in June 15, 2021 maintenance update)
- python-dateutil=2.8.1=py_0
- python-editor=1.0.4=py_0
- pytorch=1.6.0=py3.7_cpu_0
- pytz=2019.3=py_0
- pyzmq=18.1.1=py37he6710b0_0
- readline=8.1=h27cfd23_0 # (updated from 7.0 in June 15, 2021 maintenance update)
- requests=2.22.0=py37_1
- requests-oauthlib=1.3.0=py_0
- retrying=1.3.3=py37_2
- rsa=4.0=py_0
- s3transfer=0.3.3=py37_1
- scikit-learn=0.22.1=py37hd81dba3_0
- scipy=1.4.1=py37h0b6359f_0
- setuptools=45.2.0=py37_0
- simplejson=3.17.0=py37h7b6447c_0
- six=1.14.0=py37_0
- smmap=3.0.4=py_0
- sqlite=3.35.4=hdfb4753_0 # (updated from 3.31.1 in June 15, 2021 maintenance update)
- sqlparse=0.3.0=py_0
- statsmodels=0.11.0=py37h7b6447c_0
- tabulate=0.8.3=py37_0
- tk=8.6.10=hbc83047_0 # (updated from 8.6.8 in June 15, 2021 maintenance update)
- torchvision=0.7.0=py37_cpu
- tornado=6.0.3=py37h7b6447c_3
- tqdm=4.42.1=py_0
- traitlets=4.3.3=py37_0
- unixodbc=2.3.7=h14c3975_0
- urllib3=1.25.8=py37_0
- wcwidth=0.1.8=py_0
- websocket-client=0.56.0=py37_0
- werkzeug=1.0.0=py_0
- wheel=0.34.2=py37_0
- wrapt=1.11.2=py37h7b6447c_0
- xz=5.2.5=h7b6447c_0 # (updated from 5.2.4 in June 15, 2021 maintenance update)
- zeromq=4.3.1=he6710b0_3
- zlib=1.2.11=h7b6447c_3
- zstd=1.3.7=h0b5b093_0
- pip:
- astunparse==1.6.3
- azure-core==1.8.0
- azure-storage-blob==12.4.0
- databricks-cli==0.11.0
- diskcache==5.0.2
- docker==4.3.1
- gorilla==0.3.0
- horovod==0.19.5
- joblibspark==0.2.0
- keras-preprocessing==1.1.2
- koalas==1.2.0
- mleap==0.16.1
- mlflow==1.11.0
- msrest==0.6.18
- opt-einsum==3.3.0
- petastorm==0.9.5
- pyarrow==1.0.1
- pyyaml==5.3.1
- querystring-parser==1.2.4
- seaborn==0.10.0
- spark-tensorflow-distributor==0.1.0
- tensorboard==2.3.0
- tensorboard-plugin-wit==1.7.0
- tensorflow-cpu==2.3.0
- tensorflow-estimator==2.3.0
- termcolor==1.1.0
- xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml
GPU 群集上的 Python 库
name: databricks-ml-gpu
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- absl-py=0.9.0=py37_0
- asn1crypto=1.3.0=py37_1
- astor=0.8.0=py37_0
- backcall=0.1.0=py37_0
- backports=1.0=py_2
- bcrypt=3.2.0=py37h7b6447c_0
- blas=1.0=mkl
- blinker=1.4=py37_0
- boto3=1.12.0=py_0
- botocore=1.15.0=py_0
- c-ares=1.15.0=h7b6447c_1001
- ca-certificates=2020.7.22=0
- cachetools=4.1.1=py_0
- certifi=2020.6.20=pyhd3eb1b0_3 # (updated from py37_0 in June 15, 2021 maintenance update)
- cffi=1.14.0=py37he30daa8_1 # (updated from py37h2e261b9_0 in June 15, 2021 maintenance update)
- chardet=3.0.4=py37_1003
- click=7.0=py37_0
- cloudpickle=1.3.0=py_0
- configparser=3.7.4=py37_0
- cryptography=2.8=py37h1ba5d50_0
- cudatoolkit=10.1.243=h6bb024c_0
- cycler=0.10.0=py37_0
- cython=0.29.15=py37he6710b0_0
- decorator=4.4.1=py_0
- dill=0.3.1.1=py37_1
- docutils=0.15.2=py37_0
- entrypoints=0.3=py37_0
- flask=1.1.1=py_1
- freetype=2.9.1=h8a8886c_1
- future=0.18.2=py37_1
- gast=0.3.3=py_0
- gitdb=4.0.5=py_0
- gitpython=3.1.0=py_0
- google-auth=1.11.2=py_0
- google-auth-oauthlib=0.4.1=py_2
- google-pasta=0.2.0=py_0
- grpcio=1.27.2=py37hf8bcb03_0
- gunicorn=20.0.4=py37_0
- h5py=2.10.0=py37h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- icu=58.2=he6710b0_3
- idna=2.8=py37_0
- intel-openmp=2020.0=166
- ipykernel=5.1.4=py37h39e3cac_0
- ipython=7.12.0=py37h5ca1d4c_0
- ipython_genutils=0.2.0=py37_0
- isodate=0.6.0=py_1
- itsdangerous=1.1.0=py37_0
- jedi=0.14.1=py37_0
- jinja2=2.11.1=py_0
- jmespath=0.10.0=py_0
- joblib=0.14.1=py_0
- jpeg=9b=h024ee3a_2
- jupyter_client=5.3.4=py37_0
- jupyter_core=4.6.1=py37_0
- kiwisolver=1.1.0=py37he6710b0_0
- krb5=1.16.4=h173b8e3_0 # (updated from 1.16.4 in June 15, 2021 maintenance update)
- ld_impl_linux-64=2.33.1=h53a641e_7
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.3=he6710b0_2 # (updated from 3.2.1 in June 15, 2021 maintenance update)
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libpng=1.6.37=hbc83047_0
- libpq=12.2=h20c2e04_0 # (updated from 11.2 in June 15, 2021 maintenance update)
- libprotobuf=3.11.4=hd408876_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.1.0=h2733197_0
- lightgbm=2.3.0=py37he6710b0_0
- lz4-c=1.8.1.2=h14c3975_0
- mako=1.1.2=py_0
- markdown=3.1.1=py37_0
- markupsafe=1.1.1=py37h14c3975_1
- matplotlib-base=3.1.3=py37hef1b27d_0
- mkl=2020.0=166
- mkl-service=2.3.0=py37he904b0f_0
- mkl_fft=1.0.15=py37ha843d7b_0
- mkl_random=1.1.0=py37hd6b4f25_0
- ncurses=6.2=he6710b0_1
- networkx=2.4=py_1
- ninja=1.10.0=py37hfd86e86_0
- nltk=3.4.5=py37_0
- numpy=1.18.1=py37h4f9e942_0
- numpy-base=1.18.1=py37hde5b4d6_1
- oauthlib=3.1.0=py_0
- olefile=0.46=py37_0
- openssl=1.1.1k=h27cfd23_0 # (updated from 1.1.1g in June 15, 2021 maintenance update)
- packaging=20.1=py_0
- pandas=1.0.1=py37h0573a6f_0
- paramiko=2.7.1=py_0
- parso=0.5.2=py_0
- patsy=0.5.1=py37_0
- pexpect=4.8.0=py37_1
- pickleshare=0.7.5=py37_1001
- pillow=7.0.0=py37hb39fc2d_0
- pip=20.0.2=py37_3
- plotly=4.9.0=py_0
- prompt_toolkit=3.0.3=py_0
- protobuf=3.11.4=py37he6710b0_0
- psutil=5.6.7=py37h7b6447c_0
- psycopg2=2.8.6=py37h3c74f83_1 # (updated from 2.8.4 in June 15, 2021 maintenance update)
- ptyprocess=0.6.0=py37_0
- pyasn1=0.4.8=py_0
- pyasn1-modules=0.2.7=py_0
- pycparser=2.19=py37_0
- pygments=2.5.2=py_0
- pyjwt=1.7.1=py37_0
- pynacl=1.3.0=py37h7b6447c_0
- pyodbc=4.0.30=py37he6710b0_0
- pyopenssl=19.1.0=py_1
- pyparsing=2.4.6=py_0
- pysocks=1.7.1=py37_1
- python=3.7.10=hdb3f193_0 # (updated from 3.7.6 in June 15, 2021 maintenance update)
- python-dateutil=2.8.1=py_0
- python-editor=1.0.4=py_0
- pytorch=1.6.0=py3.7_cuda10.1.243_cudnn7.6.3_0
- pytz=2019.3=py_0
- pyzmq=18.1.1=py37he6710b0_0
- readline=8.1=h27cfd23_0 # (updated from 7.0 in June 15, 2021 maintenance update)
- requests=2.22.0=py37_1
- requests-oauthlib=1.3.0=py_0
- retrying=1.3.3=py37_2
- rsa=4.0=py_0
- s3transfer=0.3.3=py37_1
- scikit-learn=0.22.1=py37hd81dba3_0
- scipy=1.4.1=py37h0b6359f_0
- setuptools=45.2.0=py37_0
- simplejson=3.17.0=py37h7b6447c_0
- six=1.14.0=py37_0
- smmap=3.0.4=py_0
- sqlite=3.35.4=hdfb4753_0 # (updated from 3.31.1 in June 15, 2021 maintenance update)
- sqlparse=0.3.0=py_0
- statsmodels=0.11.0=py37h7b6447c_0
- tabulate=0.8.3=py37_0
- tk=8.6.10=hbc83047_0 # (updated from 8.6.8 in June 15, 2021 maintenance update)
- torchvision=0.7.0=py37_cu101
- tornado=6.0.3=py37h7b6447c_3
- tqdm=4.42.1=py_0
- traitlets=4.3.3=py37_0
- unixodbc=2.3.7=h14c3975_0
- urllib3=1.25.8=py37_0
- wcwidth=0.1.8=py_0
- websocket-client=0.56.0=py37_0
- werkzeug=1.0.0=py_0
- wheel=0.34.2=py37_0
- wrapt=1.11.2=py37h7b6447c_0
- xz=5.2.5=h7b6447c_0 # (updated from 5.2.4 in June 15, 2021 maintenance update)
- zeromq=4.3.1=he6710b0_3
- zlib=1.2.11=h7b6447c_3
- zstd=1.3.7=h0b5b093_0
- pip:
- astunparse==1.6.3
- azure-core==1.8.0
- azure-storage-blob==12.4.0
- databricks-cli==0.11.0
- diskcache==5.0.2
- docker==4.3.1
- gorilla==0.3.0
- horovod==0.19.5
- joblibspark==0.2.0
- keras-preprocessing==1.1.2
- koalas==1.2.0
- mleap==0.16.1
- mlflow==1.11.0
- msrest==0.6.18
- opt-einsum==3.3.0
- petastorm==0.9.5
- pyarrow==1.0.1
- pyyaml==5.3.1
- querystring-parser==1.2.4
- seaborn==0.10.0
- spark-tensorflow-distributor==0.1.0
- tensorboard==2.3.0
- tensorboard-plugin-wit==1.7.0
- tensorflow==2.3.0
- tensorflow-estimator==2.3.0
- termcolor==1.1.0
- xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml-gpu
包含 Python 模块的 Spark 包
Spark 包 | Python 模块 | 版本 |
---|---|---|
graphframes | graphframes | 0.8.0-db2-spark3.0 |
R 库
R 库与 Databricks Runtime 7.3 LTS 中的 R 库完全相同。
Java 库和 Scala 库(Scala 2.12 群集)
除了 Databricks Runtime 7.3 LTS 中的 Java 库和 Scala 库之外,适用于机器学习的 Databricks Runtime 7.3 LTS 还包含以下 JAR:
组 ID | 项目 ID | 版本 |
---|---|---|
com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
ml.combust.mleap | mleap-databricks-runtime_2.12 | 0.17.3-4882dc3 |
ml.dmlc | xgboost4j-spark_2.12 | 1.0.0 |
ml.dmlc | xgboost4j_2.12 | 1.0.0 |
org.mlflow | mlflow-client | 1.11.0 |
org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |