Fixing GPU Driver Problems on Linux
When both of the following commands fail, start checking the problem step by step.
> nvcc -V
zsh: command not found: nvcc
> nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
- Check whether cuda is in $PATH.
> echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
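The same check can be scripted instead of read by eye. This is a minimal sketch, assuming the toolkit sits in /usr/local/cuda/bin as in this post; the function name path_contains is made up for illustration.

```shell
# Return success if directory $1 appears as an entry in the PATH-style string $2.
# path_contains is a hypothetical helper name, not a standard command.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# /usr/local/cuda/bin is the location assumed in this post; adjust if yours differs.
if path_contains /usr/local/cuda/bin "$PATH"; then
    echo "cuda is in PATH"
else
    echo "cuda is NOT in PATH"
fi
```

Wrapping the string in colons on both sides lets the pattern match the first and last entries the same way as the middle ones.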
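If it is not, add export PATH=/usr/local/cuda/bin:$PATH to .zshrc (if you use zsh) so that zsh knows where nvcc is located. After adding it, run the command again and you should see:
❯ source ~/.zshrc # reload the settings in the file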
❯ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
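If you set up several machines, the export line can be appended by a small helper that skips the write when the line is already present. This is a sketch, not the only way to do it; add_cuda_path is a hypothetical name, and the file defaults to ~/.zshrc on the assumption that you use zsh.

```shell
# Append the CUDA PATH export to a shell startup file only if it is not already there.
# add_cuda_path is a hypothetical helper name; $1 defaults to ~/.zshrc.
add_cuda_path() {
    rc_file="${1:-$HOME/.zshrc}"
    # Single quotes keep $PATH literal so it is expanded when the file is sourced.
    line='export PATH=/usr/local/cuda/bin:$PATH'
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

Call add_cuda_path with no argument to edit ~/.zshrc, then run source ~/.zshrc. Calling it twice leaves a single copy of the line.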
- Next, deal with the nvidia-smi problem.
> sudo systemctl status nvidia-persistenced
[sudo] password for Sam504:
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2023-05-31 14:51:18 CST; 43s ago
Process: 955 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)
Process: 893 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose (code=exited, st
May 31 14:51:18 liboffice nvidia-persistenced[897]: Started (897)
May 31 14:51:18 liboffice nvidia-persistenced[897]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/
May 31 14:51:18 liboffice nvidia-persistenced[897]: PID file unlocked.
May 31 14:51:18 liboffice nvidia-persistenced[893]: nvidia-persistenced failed to initialize. Check syslog for more details.
May 31 14:51:18 liboffice nvidia-persistenced[897]: PID file closed.
May 31 14:51:18 liboffice nvidia-persistenced[897]: The daemon no longer has permission to remove its runtime data directory /var
May 31 14:51:18 liboffice nvidia-persistenced[897]: Shutdown (897)
May 31 14:51:18 liboffice systemd[1]: nvidia-persistenced.service: Control process exited, code=exited status=1
May 31 14:51:18 liboffice systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
May 31 14:51:18 liboffice systemd[1]: Failed to start NVIDIA Persistence Daemon.
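The log says the daemon could not query the NVIDIA devices. One quick way to narrow this down is to see whether the nvidia kernel module is loaded and whether any NVIDIA device files exist. The sketch below assumes the usual layout where those files sit under /dev with names starting with nvidia; has_nvidia_module is a hypothetical helper name.

```shell
# Report whether the nvidia kernel module appears in lsmod-style text read from stdin.
# has_nvidia_module is a hypothetical helper name.
has_nvidia_module() {
    # lsmod prints the module name in the first column
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1 && lsmod | has_nvidia_module; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is not loaded"
fi

# List NVIDIA device files, assuming they live directly under /dev
ls /dev | grep '^nvidia' || echo "no NVIDIA device files under /dev"
```

If the module is not loaded and no device files show up, the driver itself is likely missing or broken rather than only the persistence daemon.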
The service has failed. Try reinstalling the driver. Use the following command to find the drivers available for installation:
> ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C03sv00001462sd00003281bc03sc00i00
vendor : NVIDIA Corporation
model : GP106 [GeForce GTX 1060 6GB]
driver : nvidia-driver-418 - third-party free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-515 - third-party free
driver : nvidia-driver-510 - third-party free
driver : nvidia-driver-440 - third-party free
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-460 - third-party free
driver : nvidia-driver-465 - third-party free
...
driver : xserver-xorg-video-nouveau - distro free builtin
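The list is not sorted, so the newest version is easy to miss. A small filter can pull out the nvidia-driver-* package names and order them by version number. This is a sketch that assumes the output format shown above; list_nvidia_drivers is a hypothetical helper name.

```shell
# Read `ubuntu-drivers devices` output on stdin and print the nvidia-driver-*
# package names, highest version number first.
# list_nvidia_drivers is a hypothetical helper name.
list_nvidia_drivers() {
    # Field 1 is "driver", field 2 is ":", field 3 is the package name.
    awk '$1 == "driver" && $3 ~ /^nvidia-driver-/ { print $3 }' |
        # Split on "-" and sort numerically, descending, on the version field.
        sort -t- -k3,3nr
}
```

Run it as ubuntu-drivers devices | list_nvidia_drivers. Packages with a -server suffix are listed alongside the regular ones, so check which variant you want before installing.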
I pick the newest driver to install; replace the package name below with the version you chose from the list.
sudo apt install nvidia-driver-418
sudo reboot
If the installation runs into problems, try removing all the drivers, rebooting, and then installing again.
sudo apt remove --purge '^nvidia-.*'
sudo apt autoremove
sudo apt autoclean
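Before rebooting, it can help to confirm that nothing matching the purge pattern is still installed. The sketch below filters dpkg -l output for installed packages whose name starts with nvidia-; installed_nvidia_packages is a hypothetical helper name.

```shell
# Read `dpkg -l` output on stdin and print the names of installed (status "ii")
# packages whose name starts with "nvidia-".
# installed_nvidia_packages is a hypothetical helper name.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia-/ { print $2 }'
}
```

Run it as dpkg -l | installed_nvidia_packages. Empty output means no package with that prefix is still installed.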
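When you are done, be sure to reboot before running nvidia-smi.
That completes the driver fix.
You can use the following command to watch the relevant parameters live while the GPU is working: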
watch -n 0.1 -d nvidia-smi