Rocky Linux 9 AI Series 007: Installing the NVIDIA Container Toolkit

Installing the NVIDIA Container Toolkit

To use NVIDIA GPU resources inside a container, the host needs the NVIDIA Container Toolkit in addition to the NVIDIA GPU driver.

# Configure the repository
[root@gpu-server-001 ~]# wget -O /etc/yum.repos.d/nvidia-container-toolkit.repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo

# Install the container toolkit
[root@gpu-server-001 ~]# dnf install -y nvidia-container-toolkit
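Before configuring Docker, it can help to confirm that the install step actually put the expected binaries on PATH. The following is a minimal sketch, not part of the original procedure: missing_tools is a hypothetical helper name, and the list of commands to check is an assumption based on the binaries this article uses later.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Prints nothing when all of them are available.
# (missing_tools is a hypothetical helper, not part of the toolkit.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# On the host from this article one would run, for example:
#   missing_tools nvidia-ctk nvidia-container-runtime docker
#   rpm -q nvidia-container-toolkit
```

Empty output from missing_tools means every listed command was found; any name it prints still needs to be installed.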

# The nvidia-ctk command modifies /etc/docker/daemon.json on the host so that Docker can use the NVIDIA container runtime.
[root@gpu-server-001 ~]# nvidia-ctk runtime configure --runtime=docker
WARN[0000] Ignoring runtime-config-override flag for docker 
INFO[0000] Config file does not exist; using empty config 
INFO[0000] Wrote updated config to /etc/docker/daemon.json 
INFO[0000] It is recommended that docker daemon be restarted. 
[root@gpu-server-001 ~]# more /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

# Restart the docker service
[root@gpu-server-001 ~]# systemctl restart docker
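After the restart, it is worth checking that the nvidia runtime entry is really present. The sketch below is an assumption-laden convenience, not something the toolkit ships: runtime_registered is a hypothetical helper that does a plain text match on the config file rather than a real JSON parse.

```shell
# Return success if FILE contains an entry that opens a JSON object named NAME,
# for example:  "nvidia": {
# This is a plain text match, not a real JSON parse.
# (runtime_registered is a hypothetical helper name.)
runtime_registered() {
    file=$1
    name=$2
    [ -r "$file" ] || return 1
    grep -q "\"$name\"[[:space:]]*:[[:space:]]*{" "$file"
}

# On the host:
#   runtime_registered /etc/docker/daemon.json nvidia && echo "nvidia runtime configured"
#   docker info --format '{{json .Runtimes}}'   # what the running daemon actually loaded
```

The file check only shows what is configured on disk; the docker info line in the comments shows what the restarted daemon picked up.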

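# Verify GPU access from a container using the nvidia/cuda:12.4.1-cudnn-runtime-rockylinux9 image (note the CUDA version: the image's CUDA version should not be newer than the CUDA version supported by the host driver, shown as "CUDA Version" in the nvidia-smi output)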
[root@gpu-server-001 ~]# docker run --rm --runtime=nvidia --gpus all -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all nvidia/cuda:12.4.1-cudnn-runtime-rockylinux9 nvidia-smi

==========
== CUDA ==
==========

CUDA Version 12.4.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Wed Aug 21 12:27:34 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:03:00.0 Off |                  N/A |
| 26%   37C    P0             58W /  250W |       0MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
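The version note on the verification command can be turned into a quick check before pulling an image. This is a rough sketch under stated assumptions: cuda_image_supported is a hypothetical helper, both arguments must contain at least MAJOR.MINOR, only major and minor are compared, and CUDA forward-compatibility packages are ignored.

```shell
# Return success if an image built for CUDA IMAGE_VER can be expected to run
# on a driver whose nvidia-smi header reports "CUDA Version: DRIVER_VER".
# Only MAJOR.MINOR is compared; both arguments must contain at least one dot.
# (cuda_image_supported is a hypothetical helper name.)
cuda_image_supported() {
    image_ver=$1    # e.g. 12.4.1 (from the image tag)
    driver_ver=$2   # e.g. 12.4   (from the nvidia-smi header)

    image_major=${image_ver%%.*}
    image_rest=${image_ver#*.}
    image_minor=${image_rest%%.*}

    driver_major=${driver_ver%%.*}
    driver_rest=${driver_ver#*.}
    driver_minor=${driver_rest%%.*}

    # An older major version is always covered by a newer driver.
    if [ "$image_major" -lt "$driver_major" ]; then
        return 0
    fi
    # Same major version: the image's minor must not exceed the driver's.
    if [ "$image_major" -eq "$driver_major" ] && [ "$image_minor" -le "$driver_minor" ]; then
        return 0
    fi
    return 1
}

# Example with the values from this article:
#   cuda_image_supported 12.4.1 12.4 && echo "image should work with this driver"
```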

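Using GPU resources in containers

The nvidia-ctk runtime configure --runtime=docker command above automatically updated /etc/docker/daemon.json and registered the NVIDIA container runtime. That entry only makes the runtime available; it does not make it the default. When running a container or writing a Docker Compose file, you still need to request the NVIDIA runtime explicitly so that the container can use the GPU.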

[root@gpu-server-001 ~]# cat /etc/docker/daemon.json
{  
    "runtimes": {  
        "nvidia": {  
            "args": [],  
            "path": "nvidia-container-runtime"  
        }  
    }  
}  
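The file shown here has no "default-runtime" key, which is why each container has to ask for the nvidia runtime. A small sketch for checking this on any host follows; default_runtime is a hypothetical helper that uses a text match (not a JSON parser) and falls back to runc, Docker's built-in default, when the key is absent.

```shell
# Print the value of "default-runtime" from a Docker daemon config file,
# or "runc" (Docker's built-in default) when the key is not set.
# Text match only; assumes the key and its value sit on one line.
# (default_runtime is a hypothetical helper name.)
default_runtime() {
    file=$1
    value=$(sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" 2>/dev/null)
    printf '%s\n' "${value:-runc}"
}

# On the host:
#   default_runtime /etc/docker/daemon.json
#   docker info --format '{{.DefaultRuntime}}'   # value used by the running daemon
```

If nvidia should be the default for every container, recent toolkit versions document a --set-as-default option for nvidia-ctk runtime configure; check nvidia-ctk runtime configure --help on your version before relying on it.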

Docker Compose example

In the Compose file, add runtime: nvidia to the service definition and set the environment variables that select which GPU resources the container uses.

[root@gpu-server-001 ~]# cat docker-compose.yaml
services:  
  tts:  
    image: tts:latest  
    runtime: nvidia  
    environment:  
      - NVIDIA_VISIBLE_DEVICES=all  # or specific devices, e.g. 0,1
      - NVIDIA_DRIVER_CAPABILITIES=all  # or specific capabilities, e.g. compute,utility
    deploy:  
      resources:  
        reservations:  
          devices:  
            - capabilities: [gpu]  

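Parameters:

  • runtime: nvidia: tells Docker to start the container with nvidia-container-runtime.
  • environment
    • NVIDIA_VISIBLE_DEVICES: selects which GPUs are visible in the container. all exposes every GPU; alternatively, list specific device IDs (for example 0,1).
    • NVIDIA_DRIVER_CAPABILITIES: selects which driver capabilities the container needs. For example, compute,utility requests only compute and utility support; all requests every capability.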
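The two environment variables described above can be combined into a docker run command for a narrower selection than all. The sketch below only prints the command (a dry run) so it can be inspected first; gpu_run_cmd is a hypothetical helper, and the device and capability values in the example are assumptions chosen for illustration.

```shell
# Print (do not execute) a docker run command that starts IMAGE with the
# nvidia runtime, exposing only DEVICES with only CAPS enabled.
# Usage: gpu_run_cmd DEVICES CAPS IMAGE COMMAND [ARGS...]
# (gpu_run_cmd is a hypothetical helper name.)
gpu_run_cmd() {
    devices=$1   # "all" or a comma-separated list such as "0,1"
    caps=$2      # "all" or a comma-separated list such as "compute,utility"
    image=$3
    shift 3
    printf '%s\n' "docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$devices -e NVIDIA_DRIVER_CAPABILITIES=$caps $image $*"
}

# Example: expose only GPU 0 with compute and utility capabilities.
#   gpu_run_cmd 0 compute,utility nvidia/cuda:12.4.1-cudnn-runtime-rockylinux9 nvidia-smi
```

Once the printed command looks right, it can be copied to the prompt and run as is.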

Docker command-line example

[root@gpu-server-001 ~]# docker run --runtime=nvidia --gpus all -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all nvidia/cuda:12.4.1-cudnn-runtime-rockylinux9 nvidia-smi

With either configuration, in a Docker Compose file or on the Docker command line, the container is started with the NVIDIA container runtime and can use the GPU for computation.

About 木子

Founder of the Rocky Linux Chinese community; MVP, VMware vExpert, and TVP; advocate for cloud native technologies, with over ten years of experience in site reliability engineering (SRE) and DevOps. Passionate about cloud computing, microservices, CI/CD, DevOps, and Kubernetes, and currently dedicated to promoting and adopting Rocky Linux in Chinese-speaking regions.