NVIDIA-SMI has failed on Remote Server

https://forums.developer.nvidia.com/t/nvidia-smi-has-failed-on-remote-server/315075

oleg.s

1 Nov 2024

Hello,

I have a fresh on-site server that I connect to remotely via SSH…

I have two A16 cards in the server. If I run lspci -v | grep -i nvidia, I get:

2a:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
2b:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
2c:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
2d:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
b8:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
b9:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
ba:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
bb:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
	Subsystem: NVIDIA Corporation Device 14a9
	Kernel modules: nvidiafb, nouveau
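The lspci listing above shows eight GA107GL functions (an A16 board carries four GPUs, so this matches two cards), and none of them has a "Kernel driver in use: nvidia" line, which would have survived the grep if the nvidia driver were bound. As a minimal sketch, assuming bash and a POSIX awk, the hypothetical helper check_nvidia_binding below reads lspci -v output and reports which kernel driver, if any, is bound to each NVIDIA function.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an NVIDIA or Ubuntu tool): reads `lspci -v` output
# on stdin and prints one line per NVIDIA PCI function:
#   "<bus address> <bound kernel driver, or none>"
check_nvidia_binding() {
  awk '
    # Device records start at column 0 with a bus address, e.g. "2a:00.0 3D controller: ...".
    /^[0-9a-fA-F]/ {
      # Flush the previous record if it belonged to an NVIDIA device.
      if (dev != "" && nv) print dev, (drv == "" ? "none" : drv)
      dev = $1
      drv = ""
      nv = (tolower($0) ~ /nvidia/)
      next
    }
    # lspci -v prints this line only when a driver is actually bound.
    /Kernel driver in use:/ {
      sub(/.*Kernel driver in use: */, "")
      drv = $0
    }
    # Flush the last record.
    END { if (dev != "" && nv) print dev, (drv == "" ? "none" : drv) }
  '
}
```

Usage: lspci -v | check_nvidia_binding. Feeding it the full lspci -v output (without the grep) we would expect each of the eight functions to report either none or nouveau on this server, never nvidia, which is consistent with nvidia-smi being unable to reach the driver.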

Running sudo mokutil --sb-state told me that Secure Boot is off…

Then, what I did was:

sudo ubuntu-drivers install --gpgpu nvidia:535-server
sudo apt install nvidia-utils-535-server
reboot…

nvidia-smi → Error: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

sudo apt install --reinstall linux-headers-$(uname -r)
reboot…

nvidia-smi → Error: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
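Both nvidia-smi failures above say it cannot communicate with the driver, which typically means the nvidia kernel module is not loaded for the running kernel. A minimal check, assuming a Linux /proc filesystem: the hypothetical function nvidia_module_loaded below succeeds only if a module named exactly nvidia appears in /proc/modules-style text (module name in the first field), so helper modules such as nvidia_uvm or nvidia_drm do not count on their own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if a module named exactly "nvidia"
# is listed in /proc/modules-style text read from stdin, 1 otherwise.
nvidia_module_loaded() {
  # Field 1 of /proc/modules is the module name; "found" stays unset if absent,
  # so "!found" evaluates to 1 (failure) in that case.
  awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}
```

Usage: nvidia_module_loaded < /proc/modules && echo loaded || echo "not loaded". If it is not loaded, running sudo modprobe nvidia and then reading the tail of sudo dmesg usually shows why the load fails; if modinfo nvidia reports that the module is not found, the module was most likely never built for the running kernel.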

Attached is the bug report: nvidia-bug-report.log.gz (234.9 KB)

I could not run the nvidia-settings command because it is not installed…

Could you please tell me what I need to do?

post by oleg.s on Nov 29, 2024

Solved… do not use the basic drivers from Ubuntu; use the NVIDIA ones: datacenter-driver Downloads | NVIDIA Developer

post by MarkusHoHo on Dec 2, 2024

MarkusHoHo (Moderator): Just for reference, here is the data centre driver documentation:

NVIDIA Datacenter Drivers :: NVIDIA Data centre GPU Driver Documentation (docs.nvidia.com)

Closed on Dec 16, 2024