Every time my Devuan ASCII system with recently installed nvidia proprietary drivers boots, I get the frankly useless "Error running install command for nvidia" message as the very first thing on the screen. Yet the driver seems to work perfectly. I have already found two threads about this problem; there may be more. I thought it a good idea to start a new thread because this is (hopefully) a different approach.
https://dev1galaxy.org/viewtopic.php?id=2311
https://dev1galaxy.org/viewtopic.php?pid=10922
As I said in the first thread, this looks to me like leftover configuration that an installer should have removed but left behind.
The second thread, in the Hardware & System Configuration forum like this post, contains a suggestion to comment out a line of a config file in modprobe.d, but those files have been replaced by the install of the nvidia drivers on my system. The file with the module init commands is now named glx--nvidia-modprobe.conf and it lives in /etc/alternatives instead of /etc/modprobe.d. Since the original poster never returned to report his results, I can only guess whether it worked. Here is the content of that file on my system:
install nvidia modprobe -i nvidia-current $CMDLINE_OPTS
install nvidia-modeset modprobe nvidia ; modprobe -i nvidia-current-modeset $CMDLINE_OPTS
install nvidia-drm modprobe nvidia-modeset ; modprobe -i nvidia-current-drm $CMDLINE_OPTS
install nvidia-uvm modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS
remove nvidia modprobe -r -i nvidia-drm nvidia-modeset nvidia-uvm nvidia
remove nvidia-modeset modprobe -r -i nvidia-drm nvidia-modeset
# These aliases are defined in *all* nvidia modules.
# Duplicating them here sets higher precedence and ensures the selected
# module gets loaded instead of a random first match if more than one
# version is installed. See #798207.
alias pci:v000010DEd00000E00sv*sd*bc04sc80i00* nvidia
alias pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00* nvidia
alias pci:v000010DEd*sv*sd*bc03sc02i00* nvidia
alias pci:v000010DEd*sv*sd*bc03sc00i00* nvidia
Do any of those lines look redundant to an expert who knows how this part of the boot process works? As I said, the generic "error running install command" message looks to me like the leftovers of an installer that wasn't completely removed, but I can't tell what all of these lines do. If none of them are redundant, is there anywhere else I could look for driver config commands that might be superfluous once the driver is installed?
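In case it helps anyone reproduce this, the file seems to be managed by Debian's alternatives system rather than sitting directly in /etc/modprobe.d. Something like the following should show where the link points; the paths here are what I'd expect from Debian-style nvidia packaging, so they may differ on other setups:
# Follow the symlink from modprobe.d into the alternatives system
ls -l /etc/modprobe.d/nvidia.conf
# on my system this ends up at /etc/alternatives/glx--nvidia-modprobe.conf
# The glx alternative and all its slave links can be inspected with:
update-alternatives --display glx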
Offline
When I grep my boot log for lines mentioning nvidia, I see this:
marjorie@erewhon:~$ sudo cat /var/log/dmesg | grep -i nvidia
[ 1.251024] udevd[87]: Error running install command for nvidia
[ 3.685958] nvidia: loading out-of-tree module taints kernel.
[ 3.685972] nvidia: module license 'NVIDIA' taints kernel.
[ 3.710764] nvidia-nvlink: Nvlink Core is being initialized, major device number 248
[ 3.711615] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.116 Sun Jan 27 07:21:36 PST 2019 (using threaded interrupts)
[ 4.350801] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input13
[ 4.351196] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input14
[ 5.661013] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.116 Sun Jan 27 06:30:32 PST 2019
[ 5.703206] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
The failure to load the module comes from udev. Somewhat later, in userspace, the module is loaded anyway, the mode is set, and finally the DRM driver is loaded.
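As an aside, modprobe can do a verbose dry run, which prints the install commands it would execute without actually loading anything. That might be a way to check whether the install lines in the config file above are sane:
# Print what modprobe would run for the nvidia alias, without loading it
sudo modprobe -n -v nvidia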
If we look at the man page for nvidia-modprobe it says:
DESCRIPTION
The nvidia-modprobe utility is used by user-space NVIDIA driver components to make sure the NVIDIA kernel module is loaded and that the NVIDIA character device files are present. These facilities are normally provided by Linux distribution configuration systems such as udev. When possible, it is recommended to use your Linux distribution's native mechanisms for managing kernel module loading and device file creation. This utility is provided as a fallback to work out-of-the-box in a distribution-independent way.
My take on this is that Devuan's udev (eudev) is failing to load the module and that it is then being loaded as a fallback by nvidia-modprobe.
The question is then why udev fails to load the module, and I have no information on that.
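One way to get more information might be to turn up udev's logging. At runtime that would be something like the commands below; for a failure this early in boot I believe eudev also honours a udev.log-priority=debug kernel command-line option, though I haven't verified that on ASCII:
# Raise the running udevd's log level and replay the PCI events
sudo udevadm control --log-priority=debug
sudo udevadm trigger --subsystem-match=pci
# Watch the events and actions as they happen
sudo udevadm monitor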
I didn't see the warning message on my old Mint Rosa installation. The nvidia lines there were:
/media/marjorie/dc44c8b4-9d77-4ca3-9b39-1b0b352d7a80/var/log$ sudo cat ./dmesg | grep nvidia
[ 16.201881] nvidia: loading out-of-tree module taints kernel.
[ 16.201889] nvidia: module license 'NVIDIA' taints kernel.
[ 16.239873] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 16.253093] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
[ 16.253816] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.130 Wed Mar 21 03:37:26 PDT 2018 (using threaded interrupts)
[ 16.260391] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 384.130 Wed Mar 21 02:59:49 PDT 2018
[ 16.262006] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 16.529536] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 244
[ 16.544189] systemd-udevd[979]: failed to execute '/bin/systemctl' '/bin/systemctl start --no-block nvidia-persistenced.service': No such file or directory
[ 16.685718] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card1/input13
[ 16.685806] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card1/input14
[ 17.390018] systemd-udevd[1038]: failed to execute '/bin/systemctl' '/bin/systemctl stop --no-block nvidia-persistenced': No such file or directory
Offline
Well, that's an interesting thought. If the modprobe config commands are in fact not working at all, maybe I can just comment them out and let this user-space fail-safe do the loading? What confuses me is that the number in "udevd[*number*]" is different every time. Isn't that number supposed to tell you something useful about where in the boot process the error occurred? Is it really failing at wildly different points every time? And where in the boot process would this other loading mechanism be invoked? I am not familiar with the Linux boot process at that low a level of detail.
Offline
I think the number in "udevd[*number*]" is the PID of the udevd task. Run ps -ef | grep udev and see if the PID matches.
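For example (the brackets in the pattern just stop grep from matching its own process):
ps -ef | grep '[u]devd'
# or simply:
pidof udevd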
On my system /etc/modprobe.d/nvidia.conf contains
install nvidia modprobe -i nvidia-current $CMDLINE_OPTS
install nvidia-modeset modprobe nvidia ; modprobe -i nvidia-current-modeset $CMDLINE_OPTS
install nvidia-drm modprobe nvidia-current-modeset ; modprobe -i nvidia-current-drm $CMDLINE_OPTS
install nvidia-uvm modprobe nvidia ; modprobe -i nvidia-current-uvm $CMDLINE_OPTS
remove nvidia modprobe -r -i nvidia-drm nvidia-modeset nvidia-uvm nvidia
# These aliases are defined in *all* nvidia modules.
# Duplicating them here sets higher precedence and ensures the selected
# module gets loaded instead of a random first match if more than one
# version is installed. See #798207.
alias pci:v000010DEd00000E00sv*sd*bc04sc80i00* nvidia
alias pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00* nvidia
alias pci:v000010DEd*sv*sd*bc03sc02i00* nvidia
alias pci:v000010DEd*sv*sd*bc03sc00i00* nvidia
But I've not rebooted it for a while and I can't remember whether I got that message when I did.
Chris
Offline
Hmm, some people just don't turn their computers off at night?
The only difference I can see is that mine has an extra "remove" line. What matters, of course, is how these lines are invoked during the boot process, and following that process is a very technical exercise that I'm not sure I can manage. I hate to resort to the Windows-style "uninstall and reinstall" routine, but that may be my only recourse. Or I could just ignore the error, since the drivers seem to be working fine.
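If I do go the reinstall route, I assume it would look roughly like this with Devuan's Debian-derived packaging (nvidia-driver is the Debian metapackage name, so substitute whatever is actually installed here):
# Purge the driver packages and their config, then reinstall
sudo apt-get purge nvidia-driver
sudo apt-get autoremove --purge
sudo apt-get install nvidia-driver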
Offline