Offline installation of Docker and NVIDIA Docker
Hardware
GPU
GPU driver: NVIDIA cuda-drivers
1. Software overview:
Install the NVIDIA driver and Docker Engine for your Linux distribution. Note:
01. NVIDIA driver: install cuda-drivers
02. Docker Engine
2. CUDA
nvcc --version: nvcc is the CUDA compiler, found in the CUDA Toolkit's /bin directory; it is to CUDA what gcc is to C.
nvidia-smi is the NVIDIA System Management Interface.
3. nvidia-docker is a plugin NVIDIA developed so that Docker can use GPU devices conveniently.
nvidia-container-toolkit or nvidia-docker2?
Since the release of Docker 19.03, the nvidia-docker2 package is deprecated, because the Docker runtime now supports NVIDIA GPUs natively as devices.
NVIDIA provides native GPU support: you only need to install the nvidia-container-toolkit package, with none of the complex configuration that nvidia-docker/nvidia-docker2 required; note, however, that this path does not support docker-compose.
2. Installing Docker
Version >= 19.03 required.
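Before going further, the prerequisites above can be checked with a small script. A minimal sketch; it only verifies that the commands exist on PATH, not that the driver or daemon actually works:

```shell
# Check the prerequisites named above: GPU driver tool, CUDA compiler,
# and Docker. This only tests that each command is on PATH.
missing=""
for tool in nvidia-smi nvcc docker; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: not found"
    missing="$missing $tool"
  fi
done
# Remember: Docker itself must be >= 19.03 for the --gpus flag.
[ -z "$missing" ] && echo "all prerequisites present" || echo "missing:$missing"
```

On a host prepared only for nvidia-container-toolkit, nvcc being absent is fine; only the driver and Docker are required.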
1. Install
Method 1: online install
sudo apt-get update
# list available versions
apt-cache madison docker-ce
# install a specific version
sudo apt-get install docker-ce=5:18.09.1~3-0~ubuntu-xenial docker-ce-cli=5:18.09.1~3-0~ubuntu-xenial containerd.io
# or install the latest version
sudo apt-get install docker-ce docker-ce-cli containerd.io
Method 2: offline install
Step 1: download the .deb packages from
download.docker.com/linux/ubuntu/dists/xenial/pool/stable/amd64/
Step 2: install
sudo dpkg -i containerd.io_1.2.2-3_amd64.deb
sudo dpkg -i docker-ce-cli_19.03.9_3-0_ubuntu-xenial_amd64.deb
sudo dpkg -i docker-ce_19.03.9_3-0_ubuntu-xenial_amd64.deb
2. Verify the installation
docker --version
docker version
3. Package descriptions:
With the Docker package layout, the dependencies are:
containerd.io - the container runtime daemon that interfaces with the OS; it essentially decouples Docker from the OS and also provides container services for non-Docker container managers
docker-ce - the Docker daemon; this is the part that does all the management work and requires the other two on Linux
docker-ce-cli - CLI tools to control the daemon, you can install them on their own if you want to control a remote Docker daemon
Reference: docs.docker.com/engine/install/ubuntu/
Installing nvidia-docker
Offline install
Note: nvidia-container-toolkit requires that the host already has a recent Docker (19.03) installed.
How to download:
See github.com/NVIDIA/nvidia-docker.
On a machine with internet access, configure the apt source used to download the .deb packages:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
root@ubuntu:~/nvidia-docker-package# . /etc/os-release;echo $ID$VERSION_ID
ubuntu16.04
## 1. Update the apt sources
# Ubuntu 16.04/18.04, Debian Jessie/Stretch
# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
# 2. Download the packages
$ apt download libnvidia-container1
$ apt download libnvidia-container-tools
$ apt download nvidia-container-toolkit
### 3. Analyze dependencies to determine the install order
apt show nvidia-container-toolkit
apt show libnvidia-container-tools
apt show libnvidia-container1
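On the offline server, the same Depends information can be read directly from the downloaded .deb files with dpkg -I, which confirms the dpkg -i order without a network. A sketch (the show_deps helper name is made up here); run it in the directory holding the packages:

```shell
# Print the Package and Depends control fields of every .deb in a
# directory, so the install order can be confirmed offline.
show_deps() {
  found=0
  for deb in "$1"/*.deb; do
    [ -e "$deb" ] || continue
    found=1
    echo "== $deb"
    # dpkg -I prints the control file; its fields are indented one space
    dpkg -I "$deb" | grep -E '^ (Package|Depends):'
  done
  [ "$found" -eq 1 ] || echo "no .deb files found in $1"
}

show_deps .   # point it at the directory holding the packages
```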
### 4. Install: copy the downloaded packages onto the target server, then install them in order
sudo dpkg -i libnvidia-container1_1.4.0-1_amd64.deb
sudo dpkg -i libnvidia-container-tools_1.4.0-1_amd64.deb
sudo dpkg -i nvidia-container-toolkit_1.5.0-1_amd64.deb
### 5. Restart Docker
sudo systemctl restart docker
### 6. Create a container (run = create + start: it creates a new container from an image + tag)
sudo docker run --rm --gpus all -v /home/test:/home/test -v /data:/data --name "test_act" --shm-size="8g" -it ufoym/deepo:latest bash
Notes:
Method 1: install nvidia-container-toolkit, then pass the --gpus flag
The biggest advantage of nvidia-container-toolkit: the Linux host does not need the CUDA Toolkit installed, only the GPU driver (cuda-drivers).
sudo docker run --gpus all -v /home/test:/home/test -v /data:/data --name "test_act" --shm-size="8g" -it ufoym/deepo:latest bash
Method 2: nvidia-docker2 and nvidia-container-runtime
Install nvidia-container-runtime and add --runtime=nvidia on the first run; subsequent start/stop commands do not need it again.
sudo apt-get install nvidia-container-runtime
sudo docker run --runtime=nvidia -v /home/test:/home/test -v /data:/data --name "test_act" --shm-size="8g" -it ufoym/deepo:latest bash
### Details of each package
libnvidia-container provides a library and a simple CLI; the library lets Linux containers make use of NVIDIA GPUs.
github.com/NVIDIA/libnvidia-container
nvidia-container-runtime
On top of the stock runc runtime, it adds a prestart hook that calls into the libnvidia-container library.
github.com/NVIDIA/nvidia-container-runtime
nvidia-docker2.0 is a thin package: it mainly modifies Docker's configuration file so that Docker uses the NVIDIA container runtime.
github.com/NVIDIA/nvidia-docker
### Getting the nvidia-docker packages offline
On an internet-connected machine running the same Linux distribution, download the three packages below into the current directory, then copy them to the offline server.
apt download libnvidia-container-tools
apt download nvidia-container-runtime
apt download nvidia-docker2
scp           # copy the packages to the offline server
sudo dpkg -i  # install each package, in dependency order
sudo systemctl daemon-reload
sudo systemctl restart docker
### Checking package dependency order
Check the package dependency relationships:
apt-cache depends nvidia-docker2
apt-cache rdepends nvidia-docker
apt-cache depends nvidia-docker2
nvidia-docker2
  Depends: nvidia-container-runtime
   |Depends: docker-ce
   |Depends: <docker-ee>
  Depends: docker.io
  Breaks: <nvidia-docker>
  Replaces: <nvidia-docker>
apt-cache depends nvidia-container-runtime
nvidia-container-runtime
  Depends: nvidia-container-toolkit
  Depends: nvidia-container-toolkit
  Depends: libseccomp2
apt-cache depends nvidia-container-toolkit
apt-cache depends libseccomp2
apt-cache rdepends nvidia-container-runtime
apt-cache rdepends nvidia-container-runtime-hook
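The chain implied by the apt-cache output above, plus the libnvidia-container1 -> libnvidia-container-tools -> nvidia-container-toolkit order used in the install step, can be made explicit with tsort from coreutils. Each input line reads "prerequisite dependent", and the output is a valid dpkg -i order:

```shell
# Derive a dpkg install order from the dependency edges via a
# topological sort. Each line: "prerequisite dependent".
order=$(tsort <<'EOF'
libnvidia-container1 libnvidia-container-tools
libnvidia-container-tools nvidia-container-toolkit
nvidia-container-toolkit nvidia-container-runtime
nvidia-container-runtime nvidia-docker2
EOF
)
echo "$order"
```

On the real server, install the packages top to bottom in the printed order.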
Binary location
/usr/bin/nvidia-container-runtime
Config file location
/etc/docker/daemon.json
Config contents
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"insecure-registries":["12.10.12.12:1212"]
}
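A malformed /etc/docker/daemon.json keeps the Docker daemon from starting after a restart, so it is worth validating the JSON first. A sketch that checks a temporary copy of the config above, using python3 -m json.tool purely as a syntax checker:

```shell
# Validate the daemon.json contents before restarting Docker.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "insecure-registries": ["12.10.12.12:1212"]
}
EOF
# json.tool exits non-zero on invalid JSON
if python3 -m json.tool "$tmp" >/dev/null 2>&1; then
  result="valid"
else
  result="invalid"
fi
echo "daemon.json: $result"
rm -f "$tmp"
```

Run the same check against the real /etc/docker/daemon.json before `systemctl restart docker`.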
#### Listing installed software via apt
apt list --installed |grep docker
### Entering the container and common operations
sudo docker run --runtime=nvidia -v /data:/data --name="gpu_form" --shm-size="8g" -it gpu:latest
sudo nvidia-docker start "ai_platform"
sudo nvidia-docker exec -it "ai_platform" /bin/bash
Logging in to a private Docker registry
# 01. Log in
docker login 100.100.100.100:8000 # you will be prompted for the username and password
If the login or pull fails because the registry uses HTTP: the Docker daemon's options generally live in /etc/docker/daemon.json.
Add the registry to insecure-registries in /etc/docker/daemon.json to allow pulling over plain HTTP; the entry is simple:
{ "insecure-registries":["100.100.100.100:8000"] }
Then:
sudo systemctl daemon-reload # reload the systemd configuration
sudo service docker restart  # restart the Docker service
# Pull the image
Tool: docker
Command: docker pull 100.100.100.100:8000/cnn/cnn_update:latest
02. Load an image
cd /home/test/registry/
sudo docker load < cnn_update.tar
03. Check that the image loaded
sudo docker images
#100.100.100.100:8000/cnn/cnn_update latest 000000f3514b 1 hour ago 0.5GB
04. Create a Docker container from the image
sudo docker run --gpus all -v /home/test:/home/test -v /data:/data --name "test_act" --shm-size="8g" -it 100.100.100.100:8000/cnn/cnn_update:latest bash
After the container is first created, you are dropped directly into it.
05. Commands (stop, rm; -it enters an interactive shell):
sudo docker start "test_train"
sudo docker exec -it "test_train" /bin/bash
Notes
01. The host has GPU hardware and the GPU driver
02. CUDA
03. The host has Docker and nvidia-docker installed
Usage: creating the container requires the --gpus flag
sudo docker run --runtime=nvidia --gpus all -v /home/premodel:/home/premodel_bigdata -v /data:/data --name="ai_mine" --shm-size="8g" -it ai_dev:latest
When offline, place the pre-downloaded backbone weights in ~/.cache/torch/hub/checkpoints/:
/root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
cp /home/premodel/vgg16-397923af.pth /root/.cache/torch/hub/checkpoints/
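The cache pre-seeding step can be rehearsed safely against a throwaway directory; the sketch below uses an empty stand-in file instead of the real vgg16 weights:

```shell
# Rehearse pre-seeding torch hub's checkpoint cache in a throwaway HOME.
demo_home=$(mktemp -d)
ckpt_dir="$demo_home/.cache/torch/hub/checkpoints"
mkdir -p "$ckpt_dir"
: > "$demo_home/vgg16-397923af.pth"   # empty stand-in for the downloaded weights
cp "$demo_home/vgg16-397923af.pth" "$ckpt_dir/"
ls "$ckpt_dir"
```

Inside the container the real path is /root/.cache/torch/hub/checkpoints/, as shown above.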
References
docs.docker.com/engine/install/ubuntu/