Installing TensorFlow-GPU 1.13.1 on Ubuntu 18.04 with an NVIDIA 2080 (installing the 2080 Ti graphics driver on Ubuntu 18.04)

25-04-28 1

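This article covers installing TensorFlow-GPU 1.13.1 on Ubuntu 18.04 with an NVIDIA 2080, along with installing the 2080 Ti graphics driver on Ubuntu 18.04. It also covers several related topics: choosing a GPU with tensorflow-directml (or using multiple GPUs), the difference between the NVIDIA P100 GPU and Committed NVIDIA P100 GPU quota names in GCP, GPU Mounter (a Kubernetes plugin that supports GPU hot-mounting), and an overview of the GPU memory hierarchy.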

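Contents:

Installing TensorFlow-GPU 1.13.1 on Ubuntu 18.04 with an NVIDIA 2080 (installing the 2080 Ti graphics driver on Ubuntu 18.04)
Choosing a GPU with tensorflow-directml, or using multiple GPUs
The difference between the NVIDIA P100 GPU and Committed NVIDIA P100 GPU quota names in GCP
GPU Mounter - a Kubernetes plugin that supports GPU hot-mounting
An overview of the GPU memory hierarchy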

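Installing TensorFlow-GPU 1.13.1 on Ubuntu 18.04 with an NVIDIA 2080 (installing the 2080 Ti graphics driver on Ubuntu 18.04)

Installing tensorflow-gpu 1.13.1 on Ubuntu 18.04 with an NVIDIA 2080

Official documentation

Make sure the versions of all components match one another:
https://tensorflow.google.cn/install/source

For other details, see

Installing TensorFlow-GPU on Ubuntu 16.04 with an NVIDIA 1080 Ti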

Environment

OS: Ubuntu 18.04.2 desktop
GPU: NVIDIA GeForce RTX 2080
GPU driver: NVIDIA-Linux-x86_64-410.72.run
CUDA: cuda_10.0.130_410.48_linux
cuDNN:
- libcudnn7_7.5.0.56-1+cuda10.0_amd64
- libcudnn7-dev_7.5.0.56-1+cuda10.0_amd64
- libcudnn7-doc_7.5.0.56-1+cuda10.0_amd64
tensorflow-gpu: 1.13.1
When choosing versions, do not install the very latest releases; step back one or two stable versions, and check that the components are compatible with one another.
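The version matching described above can be written down as a small check. The following is only a sketch: the table is a hypothetical, hand-maintained dictionary that contains just the combination used in this article, and the function name is illustrative. Further rows would have to be copied from the official compatibility page.

```python
# Assumption: a hand-maintained table; only the combination used in this
# article is listed. Add rows from the official compatibility page as needed.
TESTED = {"1.13.1": {"cuda": "10.0", "cudnn_major": "7"}}


def mismatches(tf_version, cuda_version, cudnn_version):
    """Return a list of readable problems (empty if the versions line up)."""
    wanted = TESTED.get(tf_version)
    if wanted is None:
        return ["no table entry for tensorflow-gpu " + tf_version]
    problems = []
    # CUDA is compared on major.minor only, e.g. "10.0.130" -> "10.0".
    cuda_major_minor = ".".join(cuda_version.split(".")[:2])
    if cuda_major_minor != wanted["cuda"]:
        problems.append("CUDA %s found, %s expected" % (cuda_version, wanted["cuda"]))
    # cuDNN is compared on the major version only.
    if cudnn_version.split(".")[0] != wanted["cudnn_major"]:
        problems.append("cuDNN %s found, major version %s expected"
                        % (cudnn_version, wanted["cudnn_major"]))
    return problems


# The setup from this article, followed by a deliberately mismatched CUDA version.
print(mismatches("1.13.1", "10.0.130", "7.5.0.56"))
print(mismatches("1.13.1", "10.1.105", "7.5.0.56"))
```

The first call returns an empty list; the second reports the CUDA mismatch.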

Check the NVIDIA driver

~$ nvidia-smi
Mon Mar 25 23:16:33 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:03:00.0 Off |                  N/A |
| 24%   40C    P0     1W / 225W |      0MiB /  7949MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
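If the driver version is needed in a script, it can be read from the header line of this output. The sketch below parses a sample line copied from the output above; the function name is illustrative, and on a real machine the text would come from running nvidia-smi rather than from a constant.

```python
import re

# Header line copied from the nvidia-smi output shown above.
SAMPLE = "| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |"


def driver_version(smi_output):
    """Extract the driver version string, or None if it is not present."""
    match = re.search(r"Driver Version:\s*([0-9.]+)", smi_output)
    return match.group(1) if match else None


print(driver_version(SAMPLE))
```

On a live system the input could be obtained with subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE, universal_newlines=True).stdout.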

Install CUDA

/data/tools/GeForce-RTX-2080$ sudo sh cuda_10.0.130_410.48_linux.run
-----------------
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: y

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: n

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: n

Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/netc ]:

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Installing the CUDA Samples in /home/netc ...
copying samples to /home/netc/NVIDIA_CUDA-10.0_Samples Now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /home/netc

Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_13131.log
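The installer summary asks for two environment settings. A minimal sketch of a check for them follows, assuming the default toolkit location from the log above; the function name is illustrative. It does not cover the alternative of registering the library directory in /etc/ld.so.conf.

```python
import os

CUDA_HOME = "/usr/local/cuda-10.0"  # default toolkit location from the log above


def missing_entries(environ):
    """List the variables that lack the directory the installer asks for."""
    wanted = {
        "PATH": CUDA_HOME + "/bin",
        "LD_LIBRARY_PATH": CUDA_HOME + "/lib64",
    }
    missing = []
    for var, directory in wanted.items():
        # Linux path lists are colon-separated.
        entries = environ.get(var, "").split(":")
        if directory not in entries:
            missing.append("%s lacks %s" % (var, directory))
    return missing


for problem in missing_entries(os.environ):
    print(problem)
```

If either line is printed, the usual fix is to append the corresponding export to a shell startup file such as ~/.bashrc.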

Check the CUDA version

/data/tools/GeForce-RTX-2080$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
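The release number in the last line of the nvcc output is the value that has to match what TensorFlow expects. A small sketch for extracting it from that line follows; the function name is illustrative.

```python
import re

# Last line of the nvcc -V output shown above.
SAMPLE = "Cuda compilation tools, release 10.0, V10.0.130"


def cuda_release(nvcc_output):
    """Return the major.minor CUDA release, or None if the line is absent."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


print(cuda_release(SAMPLE))
```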

Upgrade pip3

~/cudnn_samples_v7/mnistCUDNN$ sudo pip3 install --upgrade pip
The directory '/home/netc/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/netc/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting pip
  Downloading http://mirrors.aliyun.com/pypi/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl (1.4MB)
    100% |████████████████████████████████| 1.4MB 4.0MB/s
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Successfully installed pip-19.0.3

Install tensorflow-gpu

~/cudnn_samples_v7/mnistCUDNN$ sudo pip3 install --index-url https://mirrors.aliyun.com/pypi/simple tensorflow-gpu
The directory '/home/netc/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Collecting tensorflow-gpu
  Downloading https://mirrors.aliyun.com/pypi/packages/7b/b1/0ad4ae02e17ddd62109cd54c291e311c4b5fd09b4d0678d3d6ce4159b0f0/tensorflow_gpu-1.13.1-cp36-cp36m-manylinux1_x86_64.whl (345.2MB)
    100% |████████████████████████████████| 345.2MB 4.4MB/s
Collecting absl-py>=0.1.6 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/da/3f/9b0355080b81b15ba6a9ffcf1f5ea39e307a2778b2f2dc8694724e8abd5b/absl-py-0.7.1.tar.gz (99kB)
    100% |████████████████████████████████| 102kB 4.7MB/s
Collecting astor>=0.6.0 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/35/6b/11530768cac581a12952a2aad00e1526b89d242d0b9f59534ef6e6a1752f/astor-0.7.1-py2.py3-none-any.whl
Collecting numpy>=1.13.3 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/35/d5/4f8410ac303e690144f0a0603c4b8fd3b986feb2749c435f7cdbb288f17e/numpy-1.16.2-cp36-cp36m-manylinux1_x86_64.whl (17.3MB)
    100% |████████████████████████████████| 17.3MB 4.3MB/s
Collecting keras-applications>=1.0.6 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/90/85/64c82949765cfb246bbdaf5aca2d55f400f792655927a017710a78445def/Keras_Applications-1.0.7-py2.py3-none-any.whl (51kB)
    100% |████████████████████████████████| 61kB 7.2MB/s
Collecting gast>=0.2.0 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Collecting tensorboard<1.14.0,>=1.13.0 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/0f/39/bdd75b08a6fba41f098b6cb091b9e8c7a80e1b4d679a581a0ccd17b10373/tensorboard-1.13.1-py3-none-any.whl (3.2MB)
    100% |████████████████████████████████| 3.2MB 4.2MB/s
Collecting termcolor>=1.1.0 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
Requirement already satisfied: wheel>=0.26 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (0.30.0)
Collecting grpcio>=1.8.6 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/f4/dc/5503d89e530988eb7a1aed337dcb456ef8150f7c06132233bd9e41ec0215/grpcio-1.19.0-cp36-cp36m-manylinux1_x86_64.whl (10.8MB)
    100% |████████████████████████████████| 10.8MB 4.1MB/s
Collecting tensorflow-estimator<1.14.0rc0,>=1.13.0 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/bb/48/13f49fc3fa0fdf916aa1419013bb8f2ad09674c275b4046d5ee669a46873/tensorflow_estimator-1.13.0-py2.py3-none-any.whl (367kB)
    100% |████████████████████████████████| 368kB 9.9MB/s
Collecting protobuf>=3.6.1 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/c5/60/ca38e967360212ddbb004141a70f5f6d47296e1fba37964d8ac6cb631921/protobuf-3.7.0-cp36-cp36m-manylinux1_x86_64.whl (1.2MB)
    100% |████████████████████████████████| 1.2MB 3.9MB/s
Requirement already satisfied: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.11.0)
Collecting keras-preprocessing>=1.0.5 (from tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/c0/bf/0315ef6a9fd3fc2346e85b0ff1f5f83ca17073f2c31ac719ab2e4da0d4a3/Keras_Preprocessing-1.0.9-py2.py3-none-any.whl (59kB)
    100% |████████████████████████████████| 61kB 4.8MB/s
Collecting h5py (from keras-applications>=1.0.6->tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/30/99/d7d4fbf2d02bb30fb76179911a250074b55b852d34e98dd452a9f394ac06/h5py-2.9.0-cp36-cp36m-manylinux1_x86_64.whl (2.8MB)
    100% |████████████████████████████████| 2.8MB 4.1MB/s
Collecting markdown>=2.6.8 (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/7a/6b/5600647404ba15545ec37d2f7f58844d690baf2f81f3a60b862e48f29287/Markdown-3.0.1-py2.py3-none-any.whl (89kB)
    100% |████████████████████████████████| 92kB 4.4MB/s
Collecting werkzeug>=0.11.15 (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/24/4d/2fc4e872fbaaf44cc3fd5a9cd42fda7e57c031f08e28c9f35689e8b43198/Werkzeug-0.15.1-py2.py3-none-any.whl (328kB)
    100% |████████████████████████████████| 337kB 4.4MB/s
Collecting mock>=2.0.0 (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/e6/35/f187bdf23be87092bd0f1200d43d23076cee4d0dec109f195173fd3ebc79/mock-2.0.0-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 4.9MB/s
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from protobuf>=3.6.1->tensorflow-gpu) (39.0.1)
Collecting pbr>=0.11 (from mock>=2.0.0->tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu)
  Downloading https://mirrors.aliyun.com/pypi/packages/14/09/12fe9a14237a6b7e0ba3a8d6fcf254bf4b10ec56a0185f73d651145e9222/pbr-5.1.3-py2.py3-none-any.whl (107kB)
    100% |████████████████████████████████| 112kB 4.4MB/s
Installing collected packages: absl-py, astor, numpy, h5py, keras-applications, gast, protobuf, markdown, werkzeug, grpcio, tensorboard, termcolor, pbr, mock, tensorflow-estimator, keras-preprocessing, tensorflow-gpu
  Running setup.py install for absl-py ... done
  Running setup.py install for gast ... done
  Found existing installation: protobuf 3.0.0
    Uninstalling protobuf-3.0.0:
      Successfully uninstalled protobuf-3.0.0
  Running setup.py install for termcolor ... done
Successfully installed absl-py-0.7.1 astor-0.7.1 gast-0.2.2 grpcio-1.19.0 h5py-2.9.0 keras-applications-1.0.7 keras-preprocessing-1.0.9 markdown-3.0.1 mock-2.0.0 numpy-1.16.2 pbr-5.1.3 protobuf-3.7.0 tensorboard-1.13.1 tensorflow-estimator-1.13.0 tensorflow-gpu-1.13.1 termcolor-1.1.0 werkzeug-0.15.1
~/cudnn_samples_v7/mnistCUDNN$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-03-25 23:32:23.967770: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-25 23:32:23.968691: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2ce8960 executing computations on platform CUDA. Devices:
2019-03-25 23:32:23.968749: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-03-25 23:32:23.992261: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200065000 Hz
2019-03-25 23:32:23.994027: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x33acc10 executing computations on platform Host. Devices:
2019-03-25 23:32:23.994073: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-25 23:32:23.994507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.8
pciBusID: 0000:03:00.0
totalMemory: 7.76GiB freeMemory: 7.62GiB
2019-03-25 23:32:23.994558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-25 23:32:23.995840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-25 23:32:23.995878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-25 23:32:23.995900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-25 23:32:23.996310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7413 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:03:00.0, compute capability: 7.5)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

Troubleshooting

Error when running import tensorflow:

ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Cause:
The TensorFlow version does not match the installed CUDA version; this TensorFlow build requires CUDA 10.0.
Version compatibility table: https://tensorflow.google.cn/install/source

Check the CUDA version

cat /usr/local/cuda/version.txt

Check the cuDNN version

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
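The grep above prints three #define lines from the header. The sketch below assembles them into a single version string; the sample text is an assumption that mirrors what a cuDNN 7.5.0 header would contain, and the function name is illustrative.

```python
import re

# Assumed sample: the three defines as they would appear for cuDNN 7.5.0.
SAMPLE_HEADER = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
"""


def cudnn_version(header_text):
    """Join the three version defines into 'major.minor.patch', or return None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


print(cudnn_version(SAMPLE_HEADER))
```

To use it on a real installation, pass the contents of the cudnn.h file read with open(...).read().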

Choosing a GPU with tensorflow-directml, or using multiple GPUs

Question: choosing a GPU with tensorflow-directml, or using multiple GPUs

I am training a model with TensorFlow on a Windows PC, but training is very slow, so I am trying to configure TensorFlow to use the GPU. I installed tensorflow-directml (in a conda environment with Python 3.6) because my GPU is an AMD Radeon. With this simple code

import tensorflow as tf
tf.test.is_gpu_available()

I get this output

2021-05-14 11:02:30.113880: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-05-14 11:02:30.121580: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library C:\Users\v.rocca\anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python/directml.adbd007a01a52364381a1c71ebb6fa1b2389c88d.dll
2021-05-14 11:02:30.765470: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:249] DirectML device enumeration: found 2 compatible adapters.
2021-05-14 11:02:30.984834: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:185] DirectML: creating device on adapter 0 (Radeon (TM) 530)
2021-05-14 11:02:31.150992: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library Kernel32.dll
2021-05-14 11:02:31.174716: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:185] DirectML: creating device on adapter 1 (Intel(R) UHD Graphics 620)
True

So TensorFlow uses the Intel integrated GPU instead of the Radeon GPU. If I disable the Intel GPU in the hardware manager, the output shows the correct GPU

2021-05-14 10:47:09.171568: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-05-14 10:47:09.176828: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library C:\Users\v.rocca\anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python/directml.adbd007a01a52364381a1c71ebb6fa1b2389c88d.dll
2021-05-14 10:47:09.421265: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:249] DirectML device enumeration: found 1 compatible adapters.
2021-05-14 10:47:09.626567: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:185] DirectML: creating device on adapter 0 (Radeon (TM) 530)

I don't want to disable the Intel GPU every time, so here is my question: is it possible to choose which GPU to use? Or is it possible to use both GPUs at the same time? Thanks.

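Solution

From Microsoft:

import tensorflow as tf

gpu_config = tf.GPUOptions()
# Index of the adapter to expose, given as a string. The value appears to
# follow the adapter numbering in the DirectML enumeration log, so adjust it
# to the adapter you want.
gpu_config.visible_device_list = "1"

session = tf.Session(config=tf.ConfigProto(gpu_options=gpu_config))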
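Rather than hard-coding the index, it can be looked up by adapter name from the enumeration log. This sketch assumes the log lines have the form "DirectML: creating device on adapter N (name)", which is a back-translation of the output in the question; both function names are illustrative.

```python
import re

# Assumed log format, shortened from the output in the question.
LOG = """\
I tensorflow/...dml_device_cache.cc:185] DirectML: creating device on adapter 0 (Radeon (TM) 530)
I tensorflow/...dml_device_cache.cc:185] DirectML: creating device on adapter 1 (Intel(R) UHD Graphics 620)
"""


def adapters(log_text):
    """Map adapter index -> adapter name for every 'creating device' line."""
    pattern = r"creating device on adapter (\d+) \((.*)\)\s*$"
    found = re.findall(pattern, log_text, flags=re.MULTILINE)
    return {int(index): name for index, name in found}


def pick(log_text, keyword):
    """Return the index (as a string) of the first adapter whose name contains keyword."""
    for index, name in sorted(adapters(log_text).items()):
        if keyword.lower() in name.lower():
            return str(index)
    return None


print(adapters(LOG))
print(pick(LOG, "radeon"))
```

The string returned by pick could then be assigned to gpu_config.visible_device_list.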

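The difference between the NVIDIA P100 GPU and Committed NVIDIA P100 GPU quota names in GCP

Question: the difference between the NVIDIA P100 GPU and Committed NVIDIA P100 GPU quota names in GCP

I am currently trying to raise the quota limit for NVIDIA P100 GPUs in GKE. When I filter the quotas by limit name, I get two options: NVIDIA P100 GPUs and Committed NVIDIA P100 GPUs. What is the difference between the two?

Solution

As the names suggest:

NVIDIA P100 GPUs: the quota of GPUs you can use in your project (and attach to GCE instances). You pay only while a GPU is attached to a running GCE instance.
Committed NVIDIA P100 GPUs: the quota of GPUs you can commit to (reserve) in your project. You pay for these GPUs even when they are unused or not attached to a VM, but you receive a discount.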

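GPU Mounter - a Kubernetes plugin that supports GPU hot-mounting

Preface

GPU Mounter is a Kubernetes plugin that dynamically adjusts the GPU resources available to a running Pod. It is open source on GitHub [1] and offers:

Dynamic adjustment of the GPU resources available to a Pod
Compatibility with the Kubernetes scheduler
No intrusive modifications
A REST API
One-step deployment

Below I share my understanding of GPU containerization and GPU mounting, and explain why GPU hot-mounting is needed.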

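1. GPU containerization and GPU mounting

GPU mounting is easy to understand: it attaches GPU resources to a container or Pod so that the applications inside can use them. Containerization is sweeping through every field, and deep learning is no exception. The major cloud providers have launched their own deep learning platforms (Alibaba PAI, Tencent TI-ONE, and Baidu BML in China; AWS SageMaker abroad), and deep learning researchers increasingly build their local training environments with Docker containers. At the time of writing, the tensorflow image on DockerHub has been pulled more than 10M times and the pytorch image more than 1M times, which shows how far containerization has spread.

Any discussion of containerized deep learning has to address GPU mounting, and Docker, Kubernetes, and Nvidia have all contributed a great deal here:

Nvidia contributed nvidia-docker, nvidia-container-runtime, k8s-device-plugin, and more, which make Nvidia GPUs usable under Docker and Kubernetes
Docker has natively supported the --gpus flag, which hooks into nvidia-container-runtime, since version 19.03
Kubernetes has provided the Device Plugin interface since version 1.8, decoupling native Nvidia GPU support from its source code

Thanks to this work, using a GPU in Docker or Kubernetes takes nothing more than a --gpus flag or an nvidia.com/gpu resource field.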
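To make the nvidia.com/gpu resource field concrete, the sketch below builds a minimal Pod manifest as a Python dictionary and prints it as JSON, a format kubectl also accepts. The Pod name, the image tag, and the helper function are illustrative assumptions, not taken from the GPU Mounter project.

```python
import json


def gpu_pod(name, image, gpus):
    """Build a minimal Pod manifest that requests `gpus` Nvidia GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                # The device plugin exposes GPUs under this resource name.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Hypothetical name and image, for illustration only.
pod = gpu_pod("tf-train", "tensorflow/tensorflow:1.13.1-gpu-py3", 1)
print(json.dumps(pod, indent=2))
```

Note that the GPU count is fixed when the Pod is created, which is exactly the limitation the next section discusses.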

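2. Shortcomings of current GPU mounting approaches

Current GPU containerization still has one gap: the GPU resources available to a running container or Pod cannot be adjusted dynamically. The GPU resources must be fixed when the container starts. To change them afterwards, the only option is to stop the container, reconfigure it, and start it again.

Perhaps because of this limitation, none of the major deep learning cloud platforms currently supports adjusting the GPU resources of a running instance.

As for why Docker and Kubernetes have not solved this, my understanding is that a container or Pod is generally expected to be stateless and immutable: its configuration should not change after it starts, and if a change is needed, a new container should be started from an image that meets the requirement. For general-purpose container workloads this view is sound, but for deep learning platforms I think it is debatable. Deep learning applications usually have complex dependencies, so it is hard to build one standard, plug-and-play image that suits everyone. For security reasons, platforms generally allow only the generic images they provide, so users have to break immutability and install assorted complex dependencies inside the running container. Containers on a deep learning platform should therefore be regarded as stateful.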

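3. What is GPU hot-mounting, and why is it needed?

GPU hot-mounting means adjusting the GPU resources of a running container: GPUs can be added to or removed from it without pausing or restarting the container.

This scenario is actually very common on deep learning cloud platforms. Consider the basic workflow of a user on such a platform.

After starting an instance, the user still has to download and import datasets and install other complex dependencies on top of the base image the platform provides. When the dataset is large or the code dependencies are complex, this can take a long time. Because GPU resources cannot be mounted after the environment is ready, the user has to request them at the moment the instance starts. While the environment is being prepared the GPU sits idle, so the user pays expensive GPU fees and the platform's overall resource utilization drops.

With GPU hot-mounting, the workflow above can be improved by attaching the GPU only after the environment is ready (shown as a figure in the original article).

GPU idle time is then clearly reduced by a large margin.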

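4. GPU Mounter - a Kubernetes plugin that supports GPU hot-mounting

For the reasons above, I open-sourced a Kubernetes plugin that supports hot-mounting of GPU resources.

With GPU hot-mounting, the workflow described above can be optimized accordingly (shown as a figure in the original article).

For deployment and usage details, see the README of the GitHub repository [2].

If you find it valuable, please give it a star so that more people can see it. Issues and PRs that help me improve the project are also welcome.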

References

[1] GitHub: https://github.com/pokerfaceSad/GPUMounter

[2] GitHub repository: https://github.com/pokerfaceSad/GPUMounter

Original article: https://zhuanlan.zhihu.com/p/338251170


This article was shared from the WeChat official account 云原生实验室 (cloud_native_yang).

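An overview of the GPU memory hierarchy

The GPU memory hierarchy

Author: Xiao Pu (小普), PhD candidate at the Institute of Chemistry, Chinese Academy of Sciences

Research topic: development and application of parallel software for computer simulation

Email: yaopu2019@126.com (questions and discussion are welcome)

Abstract

GPU memory comes in several kinds that differ in speed and capacity, and understanding them matters a great deal for performance tuning. This article covers the following:

1) the types of memory; 2) querying the memory sizes of your own device; 3) memory access speed; 4) how the levels of storage relate to one another; 5) usage notes, along with the strengths and weaknesses of each kind of memory.

Main text

GPU architecture diagram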

① Register memory

Advantage: the fastest memory of all!

Disadvantage: limited in quantity

Usage: ordinary variables defined inside a __global__ or __device__ function are register variables.

Example:

//kernel.cu
__global__ void register_test()
{
    int a = 1;       // ordinary local variables are held in registers
    double b = 2.0;
}

//main.cu
int main()
{
    int nBlock = 100;
    register_test<<<nBlock, 128>>>();
    cudaDeviceSynchronize();  // wait for the kernel to finish
    return 0;
}

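② Shared memory

Advantages:

1 Fast: about two orders of magnitude faster than global memory

2 Readable and writable by every thread in the block

3 Its lifetime matches that of the thread block

Disadvantage: limited in size

Usage: the __shared__ keyword, e.g. __shared__ double A[128];

When to use it:

Typical use cases include reduction sums such as a = sum A[i]. It is an important optimization technique!

For variables that are not modified frequently, as in vector addition C[i] = A[i] + B[i], there is no need to cache A and B in shared memory.

//kernel.cu
__global__ void shared_test()
{
    __shared__ double A[128];  // one copy per thread block
    int a = 1;
    double b = 2.0;
    int tid = threadIdx.x;
    A[tid] = a;                // assumes at most 128 threads per block
}

Another way to allocate shared memory

Inside the kernel function, declare

extern __shared__ unsigned int s_out[];

and launch with kernel_func<<<n_block, block_size, shared_mem_size>>>();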
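The reduction sum mentioned above is the classic shared-memory pattern. As a rough illustration of its access pattern, here is a CPU emulation in Python rather than CUDA code: the list stands in for a __shared__ array, and each pass of the inner loop corresponds to work that the threads of one block would do in parallel before a __syncthreads() barrier. The function name is illustrative.

```python
def block_reduce_sum(values):
    """Emulate a tree reduction over one thread block's shared-memory array."""
    shared = list(values)  # stands in for __shared__ double A[128]
    n = len(shared)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    stride = n // 2
    while stride > 0:
        # On the GPU, each tid below is a separate thread, and the threads
        # synchronize before the stride is halved.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]


print(block_reduce_sum(range(128)))
```

Each element is read and written about log2(n) times, which is why keeping the array in fast shared memory pays off here, while a single-pass vector addition gains nothing from it.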

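③ Global memory

Advantages:

1. The largest capacity (gigabytes)

2. Can exchange data with the host via cudaMemcpy and similar calls

3. Its lifetime is longer than that of a kernel

4. Accessible by all threads

Disadvantage: the slowest to access

//kernel.cu
__global__ void global_test(int *B)
{
    double b = 2.0;
    int tid = threadIdx.x;
    int id = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    int a = B[id];                                   // read from global memory
}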
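A global array is normally indexed by a unique per-thread id, conventionally blockIdx.x * blockDim.x + threadIdx.x, and the number of blocks is chosen by ceiling division so that every element is covered. The Python sketch below only checks that arithmetic on the CPU; the sizes (120 elements, 128 threads per block) are example values and the function names are illustrative.

```python
def global_index(block_idx, block_dim, thread_idx):
    """Global thread index, as computed inside a 1D CUDA kernel."""
    return block_idx * block_dim + thread_idx


def blocks_needed(n_elements, block_dim):
    """Ceiling division: the smallest block count that covers n_elements."""
    return (n_elements - 1) // block_dim + 1


size, block_dim = 120, 128  # example values
n_blocks = blocks_needed(size, block_dim)
all_ids = [global_index(b, block_dim, t)
           for b in range(n_blocks) for t in range(block_dim)]
# Threads whose id falls outside the array must be masked out by an
# `if (id < size)` guard in the kernel.
in_range = [i for i in all_ids if i < size]
print(n_blocks, len(all_ids), len(in_range))
```

Here one block of 128 threads is launched and the last 8 threads have nothing to do, which is why kernels usually guard the access with a bounds check.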

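④ Texture memory

Advantage: can be faster than ordinary global memory reads

Disadvantage: using it takes four steps, which is a little more work

When to use it: for a fairly large read-only array, access through a texture can give a speedup

The four steps (using a 1D float array as the example) are listed below. Beginners should type the code out themselves!

Step 1: declare the texture reference as a global variable:

texture<float, 1, cudaReadModeElementType> tex1D_load;

Step 2: bind the texture

Step 3: use it

Step 4: unbind it

See the code for details (it is best to type it out yourself!)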

#include <iostream>
#include <stdio.h>
#include <cuda_runtime.h>
#include "helper_cuda.h"

using namespace std;

// Step 1: declare the texture reference as a global variable
texture<float, 1, cudaReadModeElementType> tex1D_load;

__global__ void kernel(float *d_out, int size)
{
    // tex1D_load is a global variable, so it is not in the parameter list
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < size)
    {
        // Step 3: fetch the value from texture memory
        d_out[index] = tex1Dfetch(tex1D_load, index);
        printf("%f\n", d_out[index]);
    }
}

int main()
{
    int size = 120;
    size_t Size = size * sizeof(float);
    float *harray;
    float *d_in;
    float *d_out;

    harray = new float[size];
    checkCudaErrors(cudaMalloc((void **)&d_out, Size));
    checkCudaErrors(cudaMalloc((void **)&d_in, Size));

    for (int m = 0; m < 4; m++)
    {
        printf("m = %d\n", m);
        // initialize host memory
        for (int loop = 0; loop < size; loop++)
        {
            harray[loop] = loop + m * 1000;
        }
        // copy to d_in
        checkCudaErrors(cudaMemcpy(d_in, harray, Size, cudaMemcpyHostToDevice));

        // Step 2: bind the texture (0 means no offset)
        checkCudaErrors(cudaBindTexture(0, tex1D_load, d_in, Size));

        // one thread per element, so count elements rather than bytes
        int nBlocks = (size - 1) / 128 + 1;
        kernel<<<nBlocks, 128>>>(d_out, size); // Step 3
        getLastCudaError("Kernel execution failed");
        checkCudaErrors(cudaDeviceSynchronize());

        // Step 4: unbind the texture once the kernel has finished
        checkCudaErrors(cudaUnbindTexture(tex1D_load));
    }
    delete[] harray;
    checkCudaErrors(cudaFree(d_in));
    checkCudaErrors(cudaFree(d_out));
    return 0;
}

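Summary

Key points:

1 Within a warp, multiple threads accessing different addresses in the same bank cause a bank conflict, which slows shared memory down.

2 Bank conflicts can be resolved with padding.

3 Constant memory is used to store fixed constants, such as fixed parameters.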
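The effect of padding on bank conflicts can be checked with simple arithmetic. The sketch below assumes 32 banks that are each 4 bytes wide, so for a float array the bank of an element is its linear index modulo 32. It models a warp in which thread t reads element [t][column] of a 2D tile; the function name is illustrative.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, each 4 bytes (one float) wide


def conflict_degree(row_width, column, warp_size=32):
    """Largest number of threads in a warp that hit the same bank when
    thread t reads element [t][column] of a float tile with row_width columns."""
    banks = Counter((t * row_width + column) % NUM_BANKS
                    for t in range(warp_size))
    return max(banks.values())


print(conflict_degree(32, 0))  # unpadded 32 x 32 tile
print(conflict_degree(33, 0))  # tile padded to 33 columns
```

With 32 columns, every thread in the warp lands in the same bank (a 32-way conflict). Adding one padding column shifts each row by one bank, so the 32 threads hit 32 different banks.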

Closing remarks


May these programs benefit more people!

