Installing a Reinforcement Learning Environment on Ubuntu 20.04 (CUDA, Conda)
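1. Disable the original driver
1. Prevent a black screen on boot
sudo vim /etc/default/grub
or
sudo gedit /etc/default/grub
In the file, find the GRUB_CMDLINE_LINUX_DEFAULT line and append nomodeset after quiet splash, separated by a space, so that the value reads "quiet splash nomodeset" (a trailing space after nomodeset is harmless).
Save the file, run sudo update-grub in a terminal, and reboot.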
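The manual GRUB edit can also be scripted. The following is a minimal sketch, not part of the original steps: add_nomodeset is a hypothetical helper name, and it assumes GNU sed (the default on Ubuntu) and a double-quoted value on the GRUB_CMDLINE_LINUX_DEFAULT line. It edits whatever file path it is given, so it can be tried on a copy first.

```shell
# Hypothetical helper: append nomodeset to GRUB_CMDLINE_LINUX_DEFAULT in the given file.
# Assumes GNU sed and a double-quoted value; a line that already contains nomodeset is left alone.
add_nomodeset() {
    sed -i -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/ { /nomodeset/! s/"[[:space:]]*$/ nomodeset"/; }' "$1"
}

# Suggested usage: edit a copy, inspect it, then install it.
#   cp /etc/default/grub /tmp/grub.new
#   add_nomodeset /tmp/grub.new
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```

Because the substitution is skipped when nomodeset is already present, running the helper twice does not duplicate the option.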
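2. Blacklist the nouveau driver
sudo gedit /etc/modprobe.d/blacklist.conf
Append these two lines at the end of the file:
blacklist nouveau
options nouveau modeset=0
Then run in a terminal:
sudo update-initramfs -u
sudo reboot
3. Verify
lsmod | grep nouveau
If nothing is printed, nouveau has been disabled successfully.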
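For repeated setups, the blacklist step can be made idempotent. This is a rough sketch under stated assumptions: append_once and blacklist_nouveau are hypothetical helper names, and the target file is passed in as an argument so the helper can be tried on a scratch file before touching /etc.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Hypothetical helper: add the two nouveau entries from this step to a modprobe config file.
blacklist_nouveau() {
    append_once "$1" 'blacklist nouveau'
    append_once "$1" 'options nouveau modeset=0'
}

# Suggested usage (as root, since the file lives under /etc):
#   blacklist_nouveau /etc/modprobe.d/blacklist.conf
#   update-initramfs -u && reboot
# After the reboot, an empty result from lsmod means nouveau is not loaded:
#   lsmod | grep -q nouveau && echo "nouveau still loaded" || echo "nouveau disabled"
```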
2. Install the driver
1. Identify your GPU model
lspci | grep -i vga
%%
0000:01:00.0 VGA compatible controller: NVIDIA Corporation Device 25a0 (rev a1)
%%
Look up the device ID 25a0 at https://admin.pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci
Result:
Name: GA107M [GeForce RTX 3050 Ti Mobile]
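When lspci cannot name the card, it prints only a hexadecimal device ID, as in the output shown here. A small sketch for pulling that ID out of the line, so it can be pasted into the lookup site; extract_device_id is a hypothetical helper name, and the pattern assumes the "Device xxxx" form seen in this output.

```shell
# Hypothetical helper: read lspci output on stdin and print the 4-digit hex ID that follows "Device".
extract_device_id() {
    sed -n -E 's/.*Device ([0-9a-fA-F]{4}).*/\1/p'
}

# Suggested usage on a real machine:
#   lspci | grep -i vga | extract_device_id
# lspci -nn also prints the numeric vendor:device pair in brackets, such as [10de:25a0] for this GPU.
```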
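2. Download the driver
Download The Official NVIDIA Drivers | NVIDIA (a VPN or proxy may be needed in some regions)
Download the driver that matches the GPU model found in step 1.
3. Install dependencies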
sudo apt-get update
sudo apt-get install g++
sudo apt-get install gcc
sudo apt-get install make
4. Make the installer executable
sudo chmod a+x ./NVIDIA-Linux-x86_64-570.144.run
5. Run the installer
sudo ./NVIDIA-Linux-x86_64-570.144.run
6. Answer the installer prompts
1. Multiple kernel module types are available for this system. Which would you like to use?
NVIDIA Proprietary | MIT/GPL
Choose the left option (NVIDIA Proprietary).
2. There appears to already be a driver installed on your system (version: 570.144). As part of installing this driver (version: 570.144), the existing driver will be uninstalled. Are you sure you want to continue?
Continue installation | Abort installation
Choose the left option (Continue installation).
3. Install NVIDIA's 32-bit compatibility libraries?
Yes | No
Choose the left option (Yes).
4. The initramfs will likely need to be rebuilt due to the following condition(s): * Nouveau is present in the initramfs. Would you like to rebuild the initramfs?
Do not rebuild initramfs | Rebuild initramfs
Choose the right option (Rebuild initramfs).
5. Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
Yes | No
Choose the left option (Yes).
6. Your X configuration file has been successfully updated. Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 570.144) is now complete.
OK
Select OK to finish.
7. Verify
nvidia-smi
Output like the following means the driver is installed correctly:
Sun Apr 27 09:41:38 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144 Driver Version: 570.144 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 ... Off | 00000000:01:00.0 Off | N/A |
| N/A 61C P0 17W / 80W | 0MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
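The two values that matter later are in the header line of this output: the driver version and the highest CUDA version the driver supports. A minimal sketch for extracting them follows; smi_driver_version and smi_cuda_version are hypothetical helper names, and the patterns assume the header wording shown in the output here.

```shell
# Hypothetical helpers: read nvidia-smi output on stdin and print one field from the header line.
smi_driver_version() {
    sed -n -E 's/.*Driver Version: ([0-9.]+).*/\1/p' | head -n 1
}

smi_cuda_version() {
    sed -n -E 's/.*CUDA Version: ([0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Suggested usage on a real machine:
#   nvidia-smi | smi_driver_version
#   nvidia-smi | smi_cuda_version
```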
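3. CUDA and cuDNN setup
0. Uninstall an existing installation (if there is one)
Uninstall CUDA:
cd /usr/local/cuda-xx.x/bin
sudo ./cuda-uninstaller
sudo rm -rf /usr/local/cuda-xx.x
Uninstall cuDNN:
sudo rm -rf /usr/local/cuda/include/cudnn*.h
sudo rm -rf /usr/local/cuda/lib64/libcudnn*
Verify:
nvcc -V
If the command is not found, the uninstall succeeded.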
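1. Download the installer from CUDA Toolkit Archive | NVIDIA Developer
The toolkit version you download must not be higher than the CUDA Version reported by
nvidia-smi
in the driver check above.
Mine is 12.8, so I chose 12.8.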
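The rule here is a version comparison: the toolkit version must be less than or equal to the CUDA Version the driver reports. A small sketch of that check, assuming GNU sort with the -V (version sort) option, which ships with Ubuntu; version_le is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if version $1 is less than or equal to version $2.
# Relies on sort -V (version sort) from GNU coreutils.
version_le() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example with the values from this guide: toolkit 12.8 against a driver that reports 12.8.
if version_le "12.8" "12.8"; then
    echo "toolkit 12.8 is supported by this driver"
else
    echo "toolkit 12.8 is too new for this driver"
fi
```

Version sort is used instead of a plain numeric comparison so that a value such as 12.10 is treated as newer than 12.8.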
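2. Check that the driver is already installed
ls /usr/src | grep nvidia
nvidia-570.144
If the output matches the driver version you just installed, everything is in order and you can continue.
3. Check the Ubuntu version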
lsb_release -a
%%
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
%%
4. On the archive page, select the runfile (local) installer and follow the instructions shown there
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
chmod +x ./cuda_12.8.0_570.86.10_linux.run
sudo ./cuda_12.8.0_570.86.10_linux.run
or, equivalently (run only one of the two):
sudo sh cuda_12.8.0_570.86.10_linux.run
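5. Go through the installer
Choose Continue.
Type accept.
On the component selection screen, uncheck the NVIDIA driver entry, because the driver was already installed manually above. The last entry, nvidia-fs, is a kernel object related to the NVIDIA file system and can be left unchecked for now.
The installation takes a while.
When the installer finishes with a summary that lists the toolkit as installed, the installation succeeded.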
6. Configure environment variables
gedit ~/.bashrc
Append the following two lines at the end of the file (replace 12.8 with your own CUDA version):
export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc
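In these export lines, ${PATH:+:${PATH}} expands to a colon followed by the old value only when PATH is already set and non-empty, which avoids leaving a stray empty entry. If you prefer not to edit the file by hand, the sketch below appends the same two lines once; add_cuda_env is a hypothetical helper name, and the /usr/local/cuda-<version> layout is assumed from the install steps in this guide.

```shell
# Hypothetical helper: append the two CUDA export lines to a shell rc file, once.
# $1 = rc file (for example ~/.bashrc), $2 = CUDA version used in the directory name (for example 12.8).
add_cuda_env() {
    local rc prefix
    rc="$1"
    prefix="/usr/local/cuda-$2"
    # Skip if this CUDA prefix is already mentioned in the file.
    grep -qF -- "$prefix/bin" "$rc" 2>/dev/null && return 0
    {
        printf 'export PATH=%s/bin${PATH:+:${PATH}}\n' "$prefix"
        printf 'export LD_LIBRARY_PATH=%s/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}\n' "$prefix"
    } >> "$rc"
}

# Suggested usage:
#   add_cuda_env ~/.bashrc 12.8
#   source ~/.bashrc
```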
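7. Verify
nvcc -V
If nvcc prints its version information, the toolkit is installed correctly.
8. Set up cuDNN
Go to cuDNN Archive | NVIDIA Developer
CUDA was installed from a run file above, so the tar package is the recommended way to install cuDNN: Local Installer for Linux x86_64 (Tar)
Extract the archive: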
tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
Copy the files and set permissions (note: replace the CUDA and cuDNN versions with your own):
sudo cp cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn* /usr/local/cuda-12.8/include
sudo cp -P cudnn-linux-x86_64-8.9.7.29_cuda12-archive/lib/libcudnn* /usr/local/cuda-12.8/lib64
sudo chmod a+r /usr/local/cuda-12.8/include/cudnn*.h /usr/local/cuda-12.8/lib64/libcudnn*
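Verify (cuDNN 8 and later keep the version macros in cudnn_version.h; older releases keep them in cudnn.h):
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
or, for older cuDNN releases:
cat /usr/local/cuda-xx.x/include/cudnn.h | grep CUDNN_MAJOR -A 2
If the CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL values are printed, cuDNN is installed correctly.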
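The grep check prints three separate #define lines. As a convenience, the sketch below joins them into a single version string; cudnn_version is a hypothetical helper name, and it assumes the header contains plain "#define CUDNN_MAJOR n" style lines, as cudnn_version.h does in cuDNN 8.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cuDNN version header.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Suggested usage:
#   cudnn_version /usr/local/cuda/include/cudnn_version.h
```

For the 8.9.7.29 archive used in this guide, the expected result would be 8.9.7.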
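4. Conda setup
1. Download Anaconda
Download the Linux installer from Download Now | Anaconda
2. Install
bash ./Anaconda3-2024.10-1-Linux-x86_64.sh
Keep pressing Enter; when the license text is shown, press q to leave the viewer, type yes to accept, then keep pressing Enter.
When the installer reports that it has finished, Anaconda is installed.
3. Finally, reload the shell configuration
source ~/.bashrc
4. Verify conda
Run a conda command, for example conda --version. If a version number is printed, the installation succeeded.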
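5. If you get the error "conda: command not found"
gedit ~/.bashrc
Append the following line at the end of the file, where /path/to/conda is your conda installation directory (by default ~/anaconda3):
export PATH="/path/to/conda/bin:$PATH"
In my case: export PATH="$HOME/anaconda3/bin:$PATH"
source ~/.bashrc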
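One detail worth checking in this PATH line is how the home directory is written. Inside double quotes the shell does not expand a leading tilde, so the entry is stored literally; some shells may still resolve it during command lookup, but programs that read PATH themselves generally will not. The short sketch below shows the difference, using the default ~/anaconda3 location as an assumed install path.

```shell
# Inside double quotes the tilde is kept as a literal character.
quoted_tilde="~/anaconda3/bin"

# $HOME is expanded inside double quotes, giving the full path to the home directory.
with_home="$HOME/anaconda3/bin"

printf 'quoted tilde: %s\n' "$quoted_tilde"
printf 'with $HOME:   %s\n' "$with_home"
```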
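6. Basic commands
Create an environment
conda create -n env python=3.9
List all environments
conda env list
Activate an environment
conda activate env
Leave the environment
conda deactivate
Delete an environment
conda remove --name env --all
List installed packages
conda list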
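5. Install torch
Activate your conda environment, then install the torch build that matches your CUDA version from:
Previous PyTorch Versions | PyTorch
I have CUDA 12.8, so I installed the newest build whose CUDA version does not exceed 12.8: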
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
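The choice of cu126 follows a simple rule: among the wheel tags offered for a torch release, take the newest one that is not higher than the CUDA Version the driver reports. The sketch below illustrates that rule. cuda_tag_to_version and pick_wheel_tag are hypothetical helper names, the tag list passed in (cu118, cu124, cu126) is an assumed example that should be checked against the PyTorch page for your release, and the tag format is assumed to be "cu" followed by the major version and a single minor digit.

```shell
# Hypothetical helper: convert a wheel tag to a version, for example cu126 -> 12.6.
# Assumes the last digit of the tag is the minor version.
cuda_tag_to_version() {
    local digits minor
    digits="${1#cu}"
    minor="${digits#"${digits%?}"}"
    echo "${digits%?}.$minor"
}

# Hypothetical helper: $1 = CUDA version reported by the driver, remaining arguments = candidate tags.
# Prints the newest tag whose version does not exceed $1, or an empty line if none qualifies.
pick_wheel_tag() {
    local driver_cuda best best_ver tag ver
    driver_cuda="$1"; shift
    best=""; best_ver="0"
    for tag in "$@"; do
        ver="$(cuda_tag_to_version "$tag")"
        # Keep the tag if ver <= driver_cuda and ver >= best_ver (sort -V is GNU version sort).
        if [ "$(printf '%s\n' "$ver" "$driver_cuda" | sort -V | head -n 1)" = "$ver" ] &&
           [ "$(printf '%s\n' "$best_ver" "$ver" | sort -V | head -n 1)" = "$best_ver" ]; then
            best="$tag"; best_ver="$ver"
        fi
    done
    echo "$best"
}

# Example with the driver from this guide (CUDA Version 12.8) and an assumed tag list.
pick_wheel_tag 12.8 cu118 cu124 cu126   # prints cu126
```

The selected tag is the last path component of the --index-url in the pip command, for example https://download.pytorch.org/whl/cu126.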
Open a terminal and activate the environment:
conda activate env
Start the Python interpreter:
python
Python 3.9.21 (main, Dec 11 2024, 16:24:11)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
2.6.0+cu126
>>> print(torch.cuda.is_available())
True
>>>
If your output matches the session above, with torch.cuda.is_available() returning True, the installation succeeded.