Posts by Altoid

Altoid · Installation

Head_on_a_Stick wrote:

... sorry, my mistake.

Don't worry. 8^D

Head_on_a_Stick wrote:

... 4.19 kernel documentation doesn't seem to have a section for the e1000e module ...

I found this:

https://www.kernel.org/doc/html/v5.2/ne … 1000e.html

The e1000 driver is no longer maintained by Intel and is integrated into the kernel.
Not the case with e1000e up to now. (?)

See: https://www.intel.com/content/www/us/en … ducts.html

Intel Support wrote:

Note
The e1000 driver is no longer maintained as a standalone component. Request support from the maintainer of your Linux* distribution.

and

The Linux* e1000e driver supports the Intel® PRO/1000 PCI-E (82563/6/7, 82571/2/3/4/7/8/9, or 82583) I217/I218/I219 based gigabit network adapters.
--- snip ---
The drivers are only supported as a loadable module. We don't supply patches against the kernel source to allow for static linking of the drivers.

https://downloadmirror.intel.com/15817/eng/readme.txt

Head_on_a_Stick wrote:

You can also use the modinfo command.

I don't think it will make any difference.
The thing is that there are many sources on the web explaining that e1000e.EEE=0 is what is used to turn off the %&$# EEE.
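
For what it's worth, the modinfo check can be sketched as below. As far as I know, modinfo -p prints one name:description line per parameter the module declares, so a parameter the driver build does not declare will simply not be listed. The helper name param_declared is made up for illustration; it only filters text and does not touch the module.

```shell
#!/bin/sh
# param_declared NAME: succeed if NAME appears in "modinfo -p" style output
# read from stdin (one "name:description (type)" entry per line).
# The function name is hypothetical, it is not part of kmod.
param_declared() {
    grep -q -- "^$1:"
}

# Usage on a live system (not run here):
#   modinfo -p e1000e | param_declared EEE && echo "declared" || echo "not declared"
```

If EEE is not declared by the driver in use, that would be consistent with the kernel ignoring it, whatever the web sources say.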

Head_on_a_Stick wrote:

... error message is printed to the kernel ring buffer rather than stdout or stderr ...

I see ...

Edit:
Reading some more, I found a parameter called SmartPowerDownEnable:

SmartPowerDownEnable
Valid Range: 0,1
Default Value: 0 (disabled)
Allows the PHY to turn off in lower power states. The user can turn off this parameter in supported chipsets.

Just for the fun of it ...

[root@devuan groucho]# rmmod e1000e
[root@devuan groucho]# modprobe -v e1000e SmartPowerDownEnable=0
insmod /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko SmartPowerDownEnable=0
[root@devuan groucho]#

groucho@devuan:~$ sudo dmesg
--- snip ---
[ 1972.926673] e1000e: eth0 NIC Link is Down
[ 2004.654613] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 2004.654617] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 2004.654790] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 2004.654793] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[ 2004.967316] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[ 2004.967321] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[ 2004.967388] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[ 2007.811375] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 2007.811486] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$

It works ...
But I'm not too enthusiastic about this one because I don't know exactly what it is or how close it is to EEE - or not.
In any case, default is 0 ie: disabled.

What do you think?

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

Head_on_a_Stick wrote:

... available module parameters:
ls /sys/module/e1000e/parameters

groucho@devuan:~$ ls /sys/module/e1000e/parameters
copybreak
groucho@devuan:~$ cat /sys/module/e1000e/parameters/copybreak
256
groucho@devuan:~$
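
A small sketch to dump every exported parameter with its value in one go, instead of cat-ing them one by one. Note that sysfs only shows parameters the driver exports with read permission, so a short list here does not necessarily mean the module accepts nothing else. The function name show_params and its optional second argument are assumptions added for illustration.

```shell
#!/bin/sh
# show_params MODULE [SYSFS_ROOT]: print "name=value" for every parameter the
# loaded module exports under <root>/module/MODULE/parameters.
# SYSFS_ROOT defaults to /sys; the second argument exists only so the sketch
# can be tried against a scratch directory.
show_params() {
    dir="${2:-/sys}/module/$1/parameters"
    [ -d "$dir" ] || { echo "no parameters directory for $1" >&2; return 1; }
    for f in "$dir"/*; do
        [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}

# Usage on a live system (not run here):
#   show_params e1000e
```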

Head_on_a_Stick wrote:

Official documentation here: https://www.kernel.org/doc/html/v4.19/n … e1000.html

Hmm ...
I think this is the e1000 driver but the 82566DM-2 controller uses the e1000e driver.
At least in my Devuan it loads the e1000e module.

groucho@devuan:~$ sudo ethtool -i eth0
driver: e1000e             <---- | x |
version: 3.2.6-k
firmware-version: 1.4-0
expansion-rom-version: 
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
groucho@devuan:~$
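
The same check can be done without ethtool by following the interface's driver symlink in sysfs. This is only a sketch: the function name nic_driver and the optional sysfs root argument are made up for illustration.

```shell
#!/bin/sh
# nic_driver IFACE [SYSFS_ROOT]: print the name of the kernel driver bound to
# a network interface by resolving its sysfs "driver" symlink.
# SYSFS_ROOT defaults to /sys and is only there for trying the sketch elsewhere.
nic_driver() {
    link="${2:-/sys}/class/net/$1/device/driver"
    [ -L "$link" ] || { echo "no driver link for $1" >&2; return 1; }
    basename "$(readlink "$link")"
}

# Usage on a live system (not run here):
#   nic_driver eth0
```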

See https://downloadmirror.intel.com/15817/eng/readme.txt

Head_on_a_Stick wrote:

No:

$ doas modprobe -v e1000e madeup_nonsense=1                               
insmod /lib/modules/5.11.16-zen1-1-zen/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko.xz madeup_nonsense=1
$

Ah ...
Thanks for the heads up, good to know.
No error for madeup_nonsense=1 then?

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

Here I am again with another chapter of the e1000e saga.
This particular one regarding module loading how-to.

If interested, here's some background: https://dev1galaxy.org/viewtopic.php?id=4274

From what I have learnt, apart from how the install sets up modules to be loaded, a module can be loaded via modprobe from the command line
eg:

groucho@devuan:~$ sudo modprobe e1000e

Also, module configuration parameters can be set by adding a proper stanza to the kernel command line:

groucho@devuan:~$ sudo dmesg
--- snip ---
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-16-amd64 root=UUID=d6841f29-e39b-4c87-9c52-3a9c3bafe2d3 ro e1000e.EEE=0 .....
--- snip ---
groucho@devuan:~$
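
A quick sketch to pull just the module.param=value stanzas for one module out of a command line, which makes it easier to spot a stanza addressed to the wrong module. The function name module_cmdline_params is an assumption for illustration; it only filters text read from stdin.

```shell
#!/bin/sh
# module_cmdline_params MODULE: read a kernel command line on stdin and print
# every "MODULE.param=value" token found in it, one per line.
module_cmdline_params() {
    tr ' ' '\n' | grep -- "^$1\."
}

# Usage on a running system (not run here):
#   module_cmdline_params e1000e < /proc/cmdline
```

Being present on the command line only means the stanza was handed over; whether the module accepts the parameter is a separate matter.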

... or by adding a *.conf file in /etc/modprobe.d:

eg:

groucho@devuan:~$ echo "options e1000e EEE=0" | sudo tee /etc/modprobe.d/e1000e.conf
groucho@devuan:~$ cat /etc/modprobe.d/e1000e.conf
options e1000e EEE=0
groucho@devuan:~$

I don't know if there's more to this, but that's what I have an idea about.

Now, let's see what's happening with my nemesis, the e1000e module:

If I add the e1000e.EEE=0 stanza to the kernel command line, I get this line in dmesg:

groucho@devuan:~$ sudo dmesg | grep e1000e
--- snip ---
[ 2.158949] e1000e: unknown parameter 'EEE' ignored
[ 2.237022] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 2.257549] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
--- snip ---
groucho@devuan:~$

Curiously enough, calling the wrong module ie: igb.EEE=0 does not generate a message of any sort.
Right.

As the kernel command line trick obviously does not work, I tried using the *.conf above.
As a result, I get this line in dmesg:

groucho@devuan:~$ sudo dmesg | grep e1000e
--- snip ---
[    2.166788] e1000e: unknown parameter 'EEE' ignored
[    2.227702] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    2.241841] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
--- snip ---
groucho@devuan:~$

Clearly the e1000e module exists and is accessed (as far as the kernel is concerned), but does not accept the EEE parameter.

The last option I have is trying with modprobe.

1. see if it is loaded

groucho@devuan:~$ lsmod | grep -i e1000e
e1000e                282624  0
groucho@devuan:~$

2. unload it and check

[root@devuan groucho]# rmmod e1000e
[root@devuan groucho]# lsmod | grep e1000e
[root@devuan groucho]#

3. load it again with the required parameter

[root@devuan groucho]# modprobe -v e1000e EEE=0
insmod /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko EEE=0
[root@devuan groucho]#

Ahh ...
So now the unknown parameter is known?

Q: If it was unknown, wouldn't the -v have made modprobe print something to that effect?

man modprobe wrote:

-v, --verbose
Print messages about what the program is doing. Usually modprobe only prints messages if something goes wrong.
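
As pointed out in a reply elsewhere in this thread, the complaint about an unknown parameter goes to the kernel ring buffer rather than to stdout or stderr, so modprobe -v stays silent about it. A minimal sketch to check the ring buffer after loading the module follows; the function name ignored_params is made up, and it assumes the message wording seen in the dmesg output above.

```shell
#!/bin/sh
# ignored_params MODULE: read kernel log text on stdin and print the names of
# the parameters that MODULE reported as unknown and ignored.
# Assumes the "<module>: unknown parameter '<name>' ignored" wording.
ignored_params() {
    sed -n "s/.*$1: unknown parameter '\([^']*\)' ignored.*/\1/p"
}

# Usage after a modprobe on a live system (not run here):
#   dmesg | ignored_params e1000e
```

An empty result would suggest the parameter was accepted; any name printed was dropped by the kernel even though modprobe returned quietly.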

It seems that the e1000e module has EEE enabled by default.
See https://access.redhat.com/documentation … t_Ethernet

Not only does it seem impossible to disable it via the usual methods: ethtool cannot even query or access the EEE settings because the e1000e driver does not support it.

No idea as to how to go about this. This EEE is probably the source of my bad shutdowns, but if I can't reliably turn it off, it is not possible to know.
ie: if I can't query the controller, how can I know?

A different driver, more up to date from Intel? IBM? RedHat?
A backport from Chimaera?

Any ideas would be welcome.

Best,

A.

Altoid · Installation

Hello:

dice wrote:

do you still have 4.9.0-8-amd64 ...

No, I don't.

That's the reason I posted about the old modules in the first place.
I didn't understand why these files pertaining to 4.9.0-8-amd64 and 4.19.0-14-amd64 were still around.

More importantly, why in spite of having manually removed the old kernels (each time) these files were there.
Still don't know the reason, but they are not there anymore:

[root@devuan groucho]# dpkg -S /lib/modules/*
linux-image-4.19.0-16-amd64, linux-headers-4.19.0-16-amd64: /lib/modules/4.19.0-16-amd64
[root@devuan groucho]#

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

dice wrote:

... are only going to be found if you have that version of the kernel installed and not properly uninstalled ...
~$ dpkg -S /lib/modules/*
linux-image-4.19.0-14-amd64: /lib/modules/4.19.0-14-amd64
linux-image-4.19.0-16-amd64: /lib/modules/4.19.0-16-amd64
linux-image-4.19.0-6-amd64: /lib/modules/4.19.0-6-amd64

I think that is the idea.
Unnecessary modules which for some reason are still there.
I found them by sheer chance while wrestling with the e1000e module issue I have. (more module stuff in a next thread)

The dpkg -S /lib/modules/* stanza checks to see if all the modules in /lib/modules/* are properly referenced.
If there are any which are not, it informs that there was no matching path.
ie: a path to the corresponding kernel-image (?) among other things (?).

Thanks a lot for your input.

Best,

A.

Altoid · Installation

Hello:

dice wrote:

unless you are using that kernel ...

Yes, makes sense.
But you never know.

Found this a while ago:

https://unix.stackexchange.com/question … ib-modules

You run # dpkg -S /lib/modules/* to check whether any installed package matches those directories.
Then you can delete any directory for which the above says: dpkg-query: no path found matching pattern /lib/modules/...
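
The check above can be sketched as a loop that prints only the directories no package claims, so nothing has to be read off the dpkg-query error messages. The function name unowned_dirs and the OWNER_CHECK override are hypothetical, added so the loop can be tried without touching the real package database; by default it calls dpkg -S as described.

```shell
#!/bin/sh
# unowned_dirs [BASE]: print every directory directly under BASE
# (default /lib/modules) that no installed package claims.
# OWNER_CHECK is a hypothetical override hook for dry runs; when it is unset
# the real "dpkg -S" query is used.
unowned_dirs() {
    base=${1:-/lib/modules}
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        if ! ${OWNER_CHECK:-dpkg -S} "$dir" >/dev/null 2>&1; then
            printf '%s\n' "$dir"
        fi
    done
}

# Usage on a Debian/Devuan system (not run here):
#   unowned_dirs
```

It only lists candidates; it is probably wise to look inside each directory before deleting anything.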

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

Can't be a coincidence ...

My box runs the latest Devuan:

groucho@devuan:~$ uname -a
Linux devuan 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
groucho@devuan:~$

But I have just found out that the old e1000e driver module from 4.9.0-8 is still in my system.

groucho@devuan:~$ locate e1000e.ko
/lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
/lib/modules/4.9.0-8-amd64/updates/drivers/net/ethernet/intel/e1000e/e1000e.ko
groucho@devuan:~$

Apparently it is the only module that has been left behind ...

groucho@devuan:~$ locate /updates/drivers
/lib/modules/4.9.0-8-amd64/updates/drivers
/lib/modules/4.9.0-8-amd64/updates/drivers/net
/lib/modules/4.9.0-8-amd64/updates/drivers/net/ethernet
/lib/modules/4.9.0-8-amd64/updates/drivers/net/ethernet/intel
/lib/modules/4.9.0-8-amd64/updates/drivers/net/ethernet/intel/e1000e
/lib/modules/4.9.0-8-amd64/updates/drivers/net/ethernet/intel/e1000e/e1000e.ko
groucho@devuan:~$

apt autoremove, apt autoclean and apt purge come up empty.

And synaptic shows no residual configurations.

Do I just zap it?

Thanks in advance,

A.

Altoid · Installation

Hello:

geki wrote:

... and hope for the best.

The plot thickens ...

Since I set up the igb.EEE=0 stanza in the kernel command line, things had been coming along well enough.

But this morning I had another, albeit different, bad shutdown.
It had not reared its head for the longest while, probably because it was obscured by the other one.

This one reboots the box on shutdown with the fans on.
Not as bad but still quite annoying.

I then realised that I had not reverted my shutdown script to its previous version, ie: the one disabling WoL before shutting down.
I had left it at the version that removes the e1000e module before shutting down.

ie:

This one ...

#!/bin/sh
# added to troubleshoot nic related bad shutdown 
PATH=/sbin:/bin:/usr/sbin:/usr/bin:

# sync
# remove e1000e module
# shutdown system directly (no shutdownhelper) 
sync && sudo rmmod -s -v e1000e && sudo shutdown -h now

instead of this other one ...

#!/bin/sh
# added to troubleshoot nic related bad shutdown 
PATH=/sbin:/bin:/usr/sbin:/usr/bin:

# sync
# disable onboard eth wol
# shutdown system directly (no shutdownhelper) 
sync && sudo ethtool -s eth0 wol d && sudo shutdown -h now

Made me think that it was the reason for the tty1 output being different.
ie: no e1000e: eth0 NIC Link is Down or e1000e: EEE TX LPI TIMER: 00000000 in the output.

And that maybe the igb.EEE=0 bit was not really working. 8^7

Once things were as I thought they should be, I rebooted and shut down while recording a video and got the bad news:

The e1000e: eth0 NIC Link is Down and e1000e: EEE TX LPI TIMER: 00000000 lines now show in the output again.
So, the added stanza does not really work.

So I decided to make my shutdown script work a bit more and edited it to this version:

#!/bin/sh
# added to troubleshoot nic related bad shutdown 
PATH=/sbin:/bin:/usr/sbin:/usr/bin:

# sync
# disable onboard eth wol
# remove e1000e module
# shutdown system directly (no shutdownhelper) 
sync && sudo ethtool -s eth0 wol d && sudo rmmod -s -v e1000e && sudo shutdown -h now
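
One thing to keep in mind with the && chain: if ethtool or rmmod fails (eg: the module is in use), shutdown is never called. A hedged sketch of a variant that warns and carries on instead is shown below. It assumes the script is run as root (so no sudo), and the names run_step, pre_shutdown and DRY_RUN are made up for illustration.

```shell
#!/bin/sh
# Sketch only: run each pre-shutdown step, warn on failure, always continue.
# Assumes it is run as root. With DRY_RUN set, the steps are only printed.
PATH=/sbin:/bin:/usr/sbin:/usr/bin

run_step() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'would run: %s\n' "$*"
        return 0
    fi
    "$@" || printf 'warning: "%s" failed, continuing\n' "$*" >&2
}

pre_shutdown() {
    run_step sync                      # flush disk buffers
    run_step ethtool -s eth0 wol d     # disable onboard eth wol
    run_step rmmod -s -v e1000e        # remove e1000e module
    run_step shutdown -h now           # shutdown system directly
}

# Usage (not run here):
#   DRY_RUN=1 pre_shutdown    # print the steps without executing them
#   pre_shutdown              # really do it
```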

A shutdown, reboot and video grab later got me this*:
* times edited for simplicity's sake

Devuan GNU/Linux 3 devuan tty1
devuan login: [        ] EXT4-fs (sda1): re-mounted. Opts: (null)
[        ] kvm: exiting hardware virtualization
[        ] sd 8:0:3:0: [sdg] Synchronizing SCSI cache
[        ] sd 8:0:2:0: [sdf] Synchronizing SCSI cache
[        ] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
[        ] sd 5:0:0:0: [sdb] Stopping disk
[        ] sd 4:0:0:0: [sda] Synchronizing SCSI cache
[        ] sd 4:0:0:0: [sda] Stopping disk
[        ] ACPI: Preparing to enter sleep state S5
[        ] reboot: Power down

Looking back, it makes sense as the NIC driver in use is not the igb driver but the e1000e one.
I'll have to try and see what using the e1000e.EEE=0 stanza gets me.

Edit:

groucho@devuan:~$ sudo dmesg | grep e1000e
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-16-amd64 root=UUID=d6841f29-e39b-4c87-9c52-3a9c3bafe2d3 ro acpi_osi=Linux e1000e.eee=0 agp=off apparmor=0 ipv6.disable=1 enable_mtrr_cleanup nmi_watchdog=0
--- snip ---
[    2.158949] e1000e: unknown parameter 'eee' ignored
--- snip ---
groucho@devuan:~$

Very sorry for the screw up. 8^7

Best,

A.

Altoid · Hardware & System Configuration

Hello:

Altoid wrote:

My Sun Microsystems Ultra24 rig has a problem which up to now I’ve chalked up to a crap BIOS.
It happened with the previous original version it came with and with this one, which is the latest one available.

For an update on the status of this problem, see https://dev1galaxy.org/viewtopic.php?id=4274

tl;dr
Apparently, having EEE enabled on this NIC leaves the EEE TX LPI timer active at shutdown.
EEE works on the basis of auto-negotiation with the device it is connected to and if that device does not support EEE, the timer ends up waiting for a signal it won't receive.
The result is an unresponsive system requiring a hard shutdown.
I have not been able to find out why this happens in a totally random manner, and I have found no reliable way to reproduce it.

ethtool (4.19) is not able to query or access the Intel 82566DM-2 Gigabit NIC's EEE settings because their e1000e driver does not support it.
See the rest in the thread linked above.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... cause of misbehaving intel NICs wrt EEE are (old!) CAT5 network cables.

Would not be at all surprised.
But I don't think the POS router my telco provides has any EEE capability, so the problem is probably (in part) there.

geki wrote:

... try with CAT6/7 (S)FTP cable ...

That could be a solution, if I had any need for EEE, which I don't.
Like I said, I think it is more a hindrance/problem for anything but a portable/non-mains device.
And then, after careful consideration of the pros/cons.

Just how much does a NIC in a portable device use?
How much energy is actually saved by adding this layer of complexity to an already very complex device?

geki wrote:

e1000e EEE driver part seems to need to "see" some link layer state ...

From what I understand, EEE needs both devices (controller and router/switch/other controller) to autonegotiate what/how/when/whatever.
Otherwise it does not work.

More of a problem is (I think) that the Intel's e1000e driver blocks all access to EEE, both for querying status and changing settings.
And that Intel Support makes no mention of it whatsoever and throws Sun under the bus.

Thanks for your input.

Best,

A.

Altoid · Desktop and Multimedia

Hello:

fsmithred wrote:

... hate using 'su -'. Most of the time when I become root, I'm already in the directory where I want to do stuff, and I want to stay there.
I don't want to change to /root.

Ahh ...
And I thought I was the only one. 8^D

I'm trying hard to get used to su - and how it works.
Sometimes I fear that not being where I want to be (and in /root instead) will make/allow me to do something stupid.

fsmithred wrote:

... there's a way to restore the old behavior.
ALWAYS_SET_PATH yes
in /etc/default/su

But you still have to use su -, you just don't get sent to /root.
Right?

Best,

A.

Altoid · Installation

Hello:

Altoid wrote:

I have not yet heard from ethtool's maintainer with respect to that.

I got a reply from him this afternoon.

I had asked:

What does being able to disable the EEE TX LPI timer in my 82566DM-2 GbE controller actually depend on?
Is it hardwired?
If so, could it be solved with a different firmware?

Here's a transcription of the relevant part of his reply:

In this case, ethtool is almost certainly only a messenger.
A request like this is passed to kernel and it's the NIC driver to either implement it or report that it is not supported.
And in your case it's querying the current setting that fails so it looks like either the device does not support getting and setting EEE parameters or the support in its driver (e1000e) is missing.

As clear as Perrier ...

So it's quite definitely the e1000e driver that is blocking access to both the status and configuration of the EEE settings in the 82566DM-2 GbE controller.
No doubt about that because I have been able to disable it completely via the igb.EEE=0 bit added to the kernel command line.

What is really irritating is that Intel Ethernet Products support insisted from the start that the only way to get it done was either through ethtool or modprobe, something which I repeatedly reported as being non-working solutions:

This is the last I heard from them back in 01/2019:

Intel Ethernet Products wrote:

We typically turn off EEE using ethtool. Another method to do this is through modprobe as described in the readme for e1000e (https://downloadmirror.intel.com/15817/eng/readme.txt). It is not normal that EEE cannot be turned off with the previous methods, so it may be a change on the OEM end, and we are not aware of the modifications made by Sun. We strongly recommend to check with them on the root cause of your issue. Best regards ...

Draw your own conclusions.

Best,

A.

Altoid · Off-topic

Hello:

yeti wrote:

... upcoming deprecation of FTP in Firefox

Hmm ...
Many (many) years ago, I was a huge fan of FTP and used a very good FTP application under W95/98.
Like many other things from that age, I cannot remember the name but it was great and a free download for the likes of me.

Edit: it was WS_FTP LE 16-bit from ipswitch

It was back when there was a real/working FTP search engine permanently indexing the web (from Norway?), much better at finding files than any search site today.
I think it was bought out and shut down by Netscape. (')

I really would not mind going back to using a separate FTP application, maybe one less thing for Google to spy into.

Just my $0.02

A.

Altoid · Installation

geki wrote:

... above-mentioned link says something different AFAIUI.

I'll have another look.

Maybe I missed something.
Thanks for the heads up.

geki wrote:

... use the latest kernel with its newest e1000e driver ...

From what I have seen, the e1000e driver has always been very problematic, its link to the Intel ME and how it works probably having much to do with that being so.
ie: up to now, an Intel network controller has been essential for IME to work.

The fact that I cannot disable my GbE controller (WTF?) or access all of its settings in Linux via the tool used to that effect (ethtool) speaks volumes.
I have not yet heard from ethtool's maintainer with respect to that.

Problems with this driver are known to exist as far back as 2012 (!) and it seems that, almost 10 years later, things have not changed much.
From where I sit, I don't see any need for the EEE feature in a desktop, workstation or server.
To me it is just another layer of complication, so I want it turned off.

I posted the solution I found to the Intel e1000e support forum.
But haven't heard from them yet and it's quite possible I won't.

In my view, this EEE thing is only useful (and only to a limited extent) in a portable, battery operated device or one in which the network component tends to run hot.
eg: some SoCs

And the same goes for any other energy saving features they come up with.

If disabling the controller's EEE does effectively do away with the bad shutdown problem I have, that will be it for the time being.
I'll upgrade my kernels conservatively, as I have always done.
Which is why I ultimately chose Debian as my distribution.
And when the developers/maintainers turned into DebHoles, moved to Devuan.

geki wrote:

... and hope for the best.

Hope is in very short supply in this day and age.
I'd rather use the little there is for other, more important things. 8^D!

Thanks a lot for your input, your post steered me in the right direction.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... this kernel (early) log piped to serial line feature.
... works for shutdown, too?

I suppose it would.

But the last information you would get would be the tty1 output, up to where the screen reads reboot: Power down but doesn't power down and stays there.
Like you pointed out, waiting for a signal that it will never receive.

There's nothing more after that because at that stage, the OS is in a frozen and totally unresponsive state.
The only way out of that is a hard shutdown.

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... see kernel log of the shutdown.
... needs to be configured to generate them.
... check /var/log/kern.log* ...

Yes, that would be /var/log/kern.log.

But at that stage, (preparing to enter S5), all system files are read only.
ie: all drives have been synced and stopped, the last one being the one with the /boot partition.

I have never found any useful data with respect to this bad shutdown problem in the log files.
It happens when no one is looking and save for the useless and volatile tty1 printout, leaves no trace behind. 8^7

According to /var/log/auth.log, I added the igb.EEE=0 instruction to the kernel command line @08:27.
I then shutdown and got another bad shutdown instance.

But the kernel line addition was not active yet, so that bad shutdown only means that everything done to that point had had no effect.
Nothing new ...

These are the kernel log entries from that time forward, the last one at shutdown and the first one at boot:

groucho@devuan:/var/log$ tail -6000 kern.log
--- snip ---
Apr 19 08:29:27 devuan kernel: [ 3556.858961] e1000e: eth0 NIC Link is Down                                           <--- | fix not active |
Apr 19 08:31:30 devuan kernel: [    0.000000] microcode: microcode updated early to revision 0xa0b, date = 2010-09-28 <--- | fix active |
--- snip ---
Apr 19 08:45:16 devuan kernel: [  858.318960] e1000e: eth0 NIC Link is Down
Apr 19 08:47:43 devuan kernel: [    0.000000] microcode: microcode updated early to revision 0xa0b, date = 2010-09-28
--- snip ---
Apr 19 08:48:06 devuan kernel: [   58.328174] e1000e: eth0 NIC Link is Down
Apr 19 08:50:08 devuan kernel: [    0.000000] microcode: microcode updated early to revision 0xa0b, date = 2010-09-28
--- snip ---
Apr 19 11:42:27 devuan kernel: [10373.487079] e1000e: eth0 NIC Link is Down
Apr 19 12:51:32 devuan kernel: [    0.000000] microcode: microcode updated early to revision 0xa0b, date = 2010-09-28
--- snip ---
Apr 19 16:08:27 devuan kernel: [ 2166.724262] e1000e: eth0 NIC Link is Down
Apr 19 16:10:32 devuan kernel: [    0.000000] microcode: microcode updated early to revision 0xa0b, date = 2010-09-28
--- snip ---
groucho@devuan:/var/log$

Nothing in the log after the NIC is down.
The fact that the tty1 print out has no entries indicating the status of the NIC link or EEE (always present before) would seem to imply that we may be on the right track.
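
The "last entry at shutdown" lines above can be pulled out of kern.log without scrolling through 6000 lines. A minimal sketch: it prints the line that comes right before each run of [ 0.000000] boot entries. The function name last_line_before_boot is made up, and it assumes the syslog layout shown in the excerpt.

```shell
#!/bin/sh
# last_line_before_boot: read kern.log text on stdin and print, for every
# boot found, the last kernel line logged before the timestamps reset to
# [    0.000000]. Assumes the "... kernel: [ seconds ] message" layout.
last_line_before_boot() {
    awk '
        /kernel: \[ *0\.000000\]/ {
            # first zero-timestamp line of a boot: emit what came before it
            if (!booting && prev != "") print prev
            booting = 1
            next
        }
        { booting = 0; prev = $0 }
    '
}

# Usage (not run here):
#   last_line_before_boot < /var/log/kern.log
```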

Now I just have to wait 15 days and see if the stanza added to the kernel command line actually fixed the problem.
A 'bad shutdown' quarantine, if you're willing to pardon the pun.

Thanks a lot for your input.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

Just the symptoms from the far.

I have found a way to disable this Energy-Efficient Ethernet thing in my Intel e1000e on-board controller.

Where the ethtool utility has no access, the kernel does: adding igb.EEE=0 to the kernel cmdline disables EEE during boot.

https://www.toradex.com/community/quest … ernet.html

This EEE issue seems to be something that has been around for a long time:
https://thatbytes.co.uk/posts/fun-with- … ontroller/ <--- 05/2012!

EDIT:
Doing all this again helped me remember that at one time (back in 2019) I tried solving this problem by adding the stanza e1000e.EEE=0 to the kernel command line.
It did not work: the e1000e: EEE TX LPI TIMER: 00000000 bit was also present in the tty1 output in case of a normal shutdown and the bad shutdowns kept happening.

See https://www.linuxquestions.org/question … ost5954899

Basically in order for EEE to kick in both devices need negotiate ...
... but the switch didn’t support this ...

Maybe my ISP provided cheapo router does not (most probably) support EEE.
But I was already having this problem when I was on a shared WiFi connection.

But not everything is good news.
After setting this parameter and rebooting to see the results, I got a bad shutdown, albeit without the presence of the e1000e: EEE TX LPI TIMER: 00000000 bit:

Devuan GNU/Linux 3 devuan tty1
devuan login: [   864.785061] EXT4-fs (sda1): re-mounted. Opts: (null)
[   864.824466] kvm: exiting hardware virtualization
[   864.910856] sd 7:0:3:0: [sde] Synchronizing SCSI cache
[   864.911235] sd 7:0:2:0: [sdd] Synchronizing SCSI cache
[   864.911634] sd 5:0:0:0: [sda] Synchronizing SCSI cache
[   864.913092] sd 5:0:0:0: [sda] Stopping disk
[   865.013903] ACPI: Preparing to enter sleep state S5
[   865.014444] reboot: Power down

Note that there are no lines referring to the status of the NIC link or EEE:
ie:
e1000e: eth0 NIC Link is Down or e1000e: EEE TX LPI TIMER: 00000000.

Maybe (?) the bad shutdown was a consequence of having disabled EEE on the controller.
Won't know till at least 15 days go by without another bad shutdown.

Meanwhile, I'll enjoy my having found this tidbit of information.

[rant]
Information I asked for but the DHs at Intel e1000e support were unable to give me.
I'll assume that they did not know what I was talking about.
[/rant]

Best,

A.

Altoid · Installation

Hello:

geki wrote:

That is the nature of "My 2 cents: Delayed work is quite dangerous indeed. "

Ahh ...
Now I (sort of) understand.

geki wrote:

They delay the work item of the watchdog timer ...
... device is unloaded.
... watchdog tries to process the work item and instead of crashing or invalidating, it hangs waiting for the device ...
... no longer there and therefore cannot answer, to answer.

I get where you are going.

But ...
Maybe I did not explain myself correctly.

Without any intervention on my behalf ie: explicitly unloading the module before shutdown (see script posted previously), when I get a bad shutdown instance, the output on the screen is the same.

ie:

Devuan GNU/Linux ascii devuan tty1
devuan login: [483.367459] EXT-fs (sdc1): re-mounted. Opts: (null)
[485.772216] e1000e: eth0 NIC Link is Down
[485.776885] kvm: exiting hardware virtualization
[485.777756] sd 9:0:3:0: [sdf] Synchronizing SCSI cache
[485.778154] sd 9:0:2:0: [sdf] Synchronizing SCSI cache
[485.781519] e1000e: EEE TX LPI TIMER: 00000000
[485.785219] ACPI: Preparing to enter sleep state S5
[485.868007] reboot: Power down    <---- screen freezes at this point

Now, if I understand what you are saying, then I was right from the very start when I went looking for how to disable the #$&@ timer.

When a bad shutdown comes along, the e1000e: EEE TX LPI TIMER: 00000000 bit shows up and the box freezes as previously described.
But when the shutdown is normal, the e1000e: EEE TX LPI TIMER: 00000000 bit is also there but the box shuts down properly.

So ...
It is a question of timing, maybe a +/- 0.5s somewhere may be the 'trigger' for the bad shutdown which has proven impossible for me to replicate.
Absolutely unpredictable: I have had more than a fortnight of uneventful shutdowns and as many as three bad ones in just a couple of days.
Then, out of nowhere ... 8^7

geki wrote:

Just the symptoms from the far.

Thanks for the heads up.

geki wrote:

... this is what they reverted, not to push delayed work items but process directly or otherwise "simpler".

So it is a kernel problem?
Of course, Intel is doing nobody a favour by making it impossible to disable the EEE Timer.
That would be a very quick fix.

---

geki wrote:

... aided someone else with e1000e issues a decade ago ...
... actually worked then with the e1000e linux developer back then.
... developer of the e1000e actually did not own the hardware, IIRC.

Not too easy to test then ...
Maybe that's why the driver is such a POS?

geki wrote:

... best to avoid that chipset ...

Indeed ...

Unfortunately, it is what came onboard with the U24 which is, even by today's standards, a great piece of kit.
Keep in mind that it was brought to market almost 15 years ago, in mid 2007.

It was a great buy for me, practically brand new.
And all my slots are filled, so I'll have to make do till this glitch finally gets fixed or I can do something kernel-wise.

Because I hope to be using this HW for a few years more, maybe with a faster SAS controller and SSDs.

Thanks a lot for your input.
Finally I have had some light shed on this rather annoying problem.

Best,

A.

Altoid · Installation

Hello:

Please bear with me, I think our posts may have crossed.

geki wrote:

That c19 refers to patch from c11.

Yes.

I was making a note wrt the fact that mei and mei_me were being put into play, for whatever reason.
Like I said, Intel ME and the e1000e controller go hand in hand.
Nasty crap ...

geki wrote:

... important patch is referenced in c33 and c55, which went in upstream kernel 5.5.
... seems to undo a major regression wrt the watchdog timer handling.

I see.

geki wrote:

... if you can, you should test kernel 5.5 or newer from beowulf-backports.

I'm usually wary of new kernels, they have a tendency to screw up things which have been working perfectly well from a long time back.
I'd feel much more comfortable if there was a patch or a point release eg: Devuan Beowulf 3.2 or whatever.

geki wrote:

... you tested that kernel versions already...

No.
I have tried booting live distros using newer kernels to see what dmesg had to say as compared to what it says on Beowulf.
But that's about it, no long term testing.

In my experience with this problem, I can go as much as 15/18 days without a bad shutdown and then have maybe as many as three in two days.
My average boot/shutdown cycle count is roughly 5 to 7 a day, occasionally a couple more.

geki wrote:

See: https://pkginfo.devuan.org/cgi-bin/poli … mage-5.10*
example wrote:
linux-image-5.10.0-0.bpo.4-amd64 5.10.19-1~bpo10+1
http://deb.devuan.org/merged beowulf-backports/main amd64

Thanks for the info.
I'll check it out.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... guess you know ...
... open linux kernel bug[0] for the e1000e watchdog timer.

No.

Had no idea but I am not at all surprised that the bug exists.
Nor am I surprised that it is unsolved almost two years later.

What I do know (own experience, web-wide rants) is that the e1000e, besides being a real piece of work, is intrinsically linked* to the Intel Management Engine.

In my Ultra 24's BIOS, the GbE entry is greyed out, you cannot disable it.

ie: the box's owner, OS administrator cannot disable the on-board GbE controller or access the settings the entry presumably allows you to change.
I have not been able to find any instructions anywhere on how to do it. Owner's manual, Field manual, etc. have no mention of it.
WTF?

I run Beowulf, which uses the 3.2.6-k version of the driver; the problems reported seem to be with the version in the upstream kernels.
And the shutdown problem I have is present from when I first installed Linux on this box, around late 2015.

Upgrading to the last BIOS available (1.56) did not solve the problem.

* : see this post from your link to the bug: https://bugzilla.kernel.org/show_bug.cgi?id=205047#c19

Vitaly Lifshits 2019-10-17 10:47:38 UTC
Please try:
1. rmmod mei && rmmod mei_me <----------------------------- | x |
2. removing the if in the patch and moving the call e1000_phy_hw_reset(&adapter->hw) outside of the while loop:
if (!(pcim_state & E1000_STATUS_PCIM_STATE))
e1000_phy_hw_reset(&adapter->hw);

Understand what I am referring to?

My installation has both modules (mei and mei_me) blacklisted in /etc/modprobe.d.
And yes, the problem I have was there before I blacklisted the modules.
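
For reference, a blacklist entry in /etc/modprobe.d is a line of the form "blacklist <module>". What follows is only a minimal sketch for checking that both modules are listed; the function name is_blacklisted and its optional directory argument are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# is_blacklisted MODULE [DIR]
# Succeeds if some *.conf file in DIR (default /etc/modprobe.d)
# contains a line "blacklist MODULE". Hypothetical helper, illustration only.
is_blacklisted() {
    mod=$1
    dir=${2:-/etc/modprobe.d}
    # -q: quiet, -s: ignore unreadable/missing files, -E: extended regex
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+${mod}[[:space:]]*\$" "$dir"/*.conf
}

# Example: report on the two modules named in the bug comment.
# for m in mei mei_me; do
#     if is_blacklisted "$m"; then echo "$m: blacklisted"; else echo "$m: not listed"; fi
# done
```

The pattern is anchored at both ends so that a check for mei is not satisfied by a line that only blacklists mei_me.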

geki wrote:

Did you test that Kernel already?

No, not planning to do it.

geki wrote:

... 4.19 still seems to need a reworked patch.

Lockdown has still some time to go, so I may look into trying the DKMS version of the driver.
But I've never done such a thing and would have to study a bit to see how it is done.

But what I really want to do is to disable the EEE timer.
It was the first thing I thought of doing and have come back to it after trying all sorts of ACPI magic, kernel command entries and DSDT mods, to no avail.

Maybe the email I wrote to the ethtool utility maintainer will get me something.

I fail to understand why e1000e: EEE TX LPI TIMER: 00000000 is still there at shutdown if the module had been previously removed.

Thanks for your input.

Best,

A.

Edited: mail was actually sent to maintainer of ethtool utility - on if it is at all possible to disable EEE Timer / how to do it.

Altoid · Installation

Hello:

While (still) trying to track down the origin of the shutdown problem that occasionally affects my U24 box, I have come back to the matter of the infamous Intel e1000e GbE controller.

Randomly, on shutdown the rig will freeze with all fans going at full blast with this output on the TTY1 screen:

Devuan GNU/Linux ascii devuan tty1
devuan login: [483.367459] EXT-fs (sdc1): re-mounted. Opts: (null)
[485.772216] e1000e: eth0 NIC Link is Down
[485.776885] kvm: exiting hardware virtualization
[485.777756] sd 9:0:3:0: [sdf] Synchronizing SCSI cache
[485.778154] sd 9:0:2:0: [sdf] Synchronizing SCSI cache
[485.781519] e1000e: EEE TX LPI TIMER: 00000000
[485.785219] ACPI: Preparing to enter sleep state S5
[485.868007] reboot: Power down    <---- screen freezes at this point

From [485.868007] onwards, the only way out is a hard shutdown.

One idea I picked up during my searches on the web was to disable the EEE TX LPI timer.
Made sense, that was the last thing active, maybe it was not working properly.

But the onboard controller will have none of it:

[root@devuan ~]# ethtool --set-eee eth0 tx-lpi off
Cannot get EEE settings: Operation not supported
[root@devuan ~]#

Nor would it inform me of the actual state of the timer:

[root@devuan]# ethtool --show-eee eth0
Cannot get EEE settings: Operation not supported
[root@devuan]#
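
Both commands fail the same way, which suggests that the driver in this kernel does not expose EEE settings for this adapter through ethtool. A small sketch of a guard a script could use before trying to change the setting; eee_supported and the ETHTOOL override variable are hypothetical names, and only the --show-eee and --set-eee options already used above are assumed.

```shell
#!/bin/sh
# eee_supported IFACE
# Succeeds only if "ethtool --show-eee IFACE" itself succeeds.
# ETHTOOL may be overridden (hypothetical knob, mainly for trying this out).
eee_supported() {
    "${ETHTOOL:-ethtool}" --show-eee "$1" >/dev/null 2>&1
}

# Example (run as root):
# if eee_supported eth0; then
#     ethtool --set-eee eth0 tx-lpi off
# else
#     echo "eth0: EEE settings not available via ethtool" >&2
# fi
```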

This is rather strange as the MSWindows driver allows me to change these parameters:

ie:
---
settings -> control panel -> system -> hardware -> device manager -> network adapters
Intel PRO/1000 MT Desktop Adapter

Advanced -> Wait for Link=Off | Wake on Link Settings=Disabled | Wake on Settings=Disabled
Power Management -> Allow the computer to turn off this device to save power -> unchecked
---

These settings survive both the reboot of the VM and a reboot on the host.

The first question would be why this would be so.
Why can't ethtool do the same thing? (v.4.19)

Then I reasoned that the next best thing would be to unload the e1000e module before the shutdown command, so I put together a script which took the place of the absurd xfsm-shutdown-helper bundled along with Xfce:

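#!/bin/sh
# Shut the system down directly (no shutdown helper).
# Earlier attempt (commented out): disable Wake-on-LAN on the onboard NIC.
# Current attempt: remove the e1000e module before shutting down.

# No trailing colon: an empty PATH entry means the current directory.
PATH=/sbin:/bin:/usr/sbin:/usr/bin
# sync && sudo ethtool -s eth0 wol d && sudo shutdown -h now
sync && sudo rmmod -s -v e1000e && sudo shutdown -h now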

Thinking that with the e1000e module removed, it would be the end of the EEE timer on shutdown, I made a video grab of the shutdown process.
But to my surprise, the damned thing was still there ie: the shutdown screen still included a line for the EEE timer.

Devuan GNU/Linux 3 devuan tty1
devuan login: [   286.719428] e1000e: eth0 NIC Link is Down
--- snip ---
[287.219230] e1000e: EEE TX LPI TIMER: 00000000              <-------------- | x |
[287.223022] ACPI: Preparing to enter sleep state S5
[287.223551] reboot: Power down

Now, if the module was unloaded, why is the EEE timer still around after the fact?
I can confirm the module gets unloaded as the LAN link goes down immediately, both with rmmod and with modprobe -r.
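
The link going down shows that the interface was torn down; a more direct check is whether the module still appears in /proc/modules (the same data lsmod prints). A minimal sketch, where module_loaded is an illustrative name and the second argument exists only so the function can be pointed at a saved copy of the file:

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Succeeds if NAME is the first field of some line in FILE
# (default /proc/modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example (run as root):
# rmmod e1000e
# if module_loaded e1000e; then echo "e1000e is still loaded"; fi
```

Running the check again just before the shutdown command would show whether something reloads the module in between.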

I once tried to get something useful from the Intel chaps, but they really don't have the slightest clue.
ie: a waste of time

I wrote to the maintainer of ethtool a couple of days ago but have not had a reply yet.

Any insight on this would be appreciated.

Thanks in advance,

Best,

A.

Altoid · Installation

Hello:

Head_on_a_Stick wrote:

... you must have missed them when you copied it.

Painted short, probably?
Have to be more careful.

Head_on_a_Stick wrote:

... script used bash but didn't contain anything that actually needed bash.

I see.

Head_on_a_Stick wrote:

... prefer /bin/sh over /bin/bash because it's faster, lighter and less buggy.
The Debian developers also prefer /bin/sh for the same reasons and went to quite some effort replacing all of the bash system scripts with /bin/sh versions.

Kudos to them.
Thanks for taking the time to explain.

Best,

A.

Altoid · Installation

Hello:

Head_on_a_Stick wrote:

Use https://www.shellcheck.net/ to test scripts.

Thanks for the heads up.
Will bookmark that one for the next time.

Being a script from github, I assumed an error of some sort at my end.
As it seemed harmless enough, I just copied it, made it executable and tried it.
Never thought it would have a problem.

Head_on_a_Stick wrote:

Better version:

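#!/bin/sh

# Require exactly one argument: "enable" or "disable".
# The braces group the two string tests; || and && have equal precedence in sh.
if [ $# -ne 1 ] || { [ "$1" != enable ] && [ "$1" != disable ]; }; then
        echo "Usage: $0 <enable|disable>"
        exit 1
fi

# Writing a device name to /proc/acpi/wakeup toggles its state, so
# pick the devices that are currently in the opposite state.
if [ "$1" = enable ]; then
        TOGGLE=$(grep '\*disabled' /proc/acpi/wakeup | cut -d ' ' -f1)
else
        TOGGLE=$(grep '\*enabled' /proc/acpi/wakeup | cut -d ' ' -f1)
fi

for DEV in $TOGGLE ; do
        echo "$DEV"
        echo "$DEV" > /proc/acpi/wakeup
done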

Right.

Head_on_a_Stick wrote:

POSIX sh ftw!

Don't quite follow you, but I'll take your word for it. 8^)

Head_on_a_Stick wrote:

... simplified further if all you want to do is disable everything:

Yes, that would be much better.
These /proc/acpi/wakeup settings are from S4, which my box doesn't ever/won't ever go into.
And seeing how flaky ACPI tables can be, it's better to keep this disabled.

Head_on_a_Stick wrote:

#!/bin/sh

dev=$(awk '/\*enabled/{print $1}' /proc/acpi/wakeup)

for i in $dev ; do
   echo "$i" > /proc/acpi/wakeup
done

Works a charm! 8^D

groucho@devuan:~$ sudo ./acpi_wakeups.sh
[sudo] password for groucho: 
groucho@devuan:~$

groucho@devuan:~$ cat /proc/acpi/wakeup
Device	S-state	  Status   Sysfs node
USB0	  S4	*disabled  pci:0000:00:1d.0
USB1	  S4	*disabled  pci:0000:00:1d.1
USB2	  S4	*disabled  pci:0000:00:1d.2
USB5	  S4	*disabled
EUSB	  S4	*disabled  pci:0000:00:1d.7
USB3	  S4	*disabled  pci:0000:00:1a.0
USB4	  S4	*disabled  pci:0000:00:1a.1
USB6	  S4	*disabled  pci:0000:00:1a.2
USBE	  S4	*disabled  pci:0000:00:1a.7
P0P1	  S4	*disabled  pci:0000:00:01.0
P0P2	  S4	*disabled  pci:0000:00:06.0
P0P3	  S4	*disabled  pci:0000:00:1c.0
BR11	  S4	*disabled
BR12	  S4	*disabled
BR13	  S4	*disabled
P0P4	  S4	*disabled  pci:0000:00:1c.4
BR15	  S4	*disabled
P0P5	  S4	*disabled  pci:0000:00:1e.0
GBE	  S4	*disabled  pci:0000:00:19.0
SLPB	  S4	*disabled
groucho@devuan:~$
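
To confirm the result without reading the whole table, the status column can be filtered. A short sketch in the same style as the script above; wakeup_devices is a hypothetical name, and the file argument is optional so it can be tried on a saved copy of the table.

```shell
#!/bin/sh
# wakeup_devices STATE [FILE]
# Prints the names of devices whose Status column is "*STATE"
# (STATE is "enabled" or "disabled"). FILE defaults to /proc/acpi/wakeup.
wakeup_devices() {
    awk -v s="*$1" '$3 == s { print $1 }' "${2:-/proc/acpi/wakeup}"
}

# Example:
# wakeup_devices enabled     # no output once everything is disabled
# wakeup_devices disabled    # lists USB0, USB1, ... GBE, SLPB
```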

Thank you very much for your input.

Best,

A.

Altoid · Installation

Hello:

chris2be8 wrote:

... use $(...) instead, it's easier to read if you don't get muddled about what sort of brackets to use.

I am not the author of the script.
Unfortunately, I don't have a clue as to how this all works.

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

dice wrote:

... need to be run as the root user?

Yes, of course.

groucho@devuan:~$ sudo ./acpi_wakeups.sh enable
./acpi_wakeups.sh: line 10: \*disabled: command not found
groucho@devuan:~$

groucho@devuan:~$ sudo ./acpi_wakeups.sh disable
./acpi_wakeups.sh: line 14: \*enabled: command not found
groucho@devuan:~$

Just in case ...

[root@devuan ~]# /home/groucho/acpi_wakeups.sh enable
/home/groucho/acpi_wakeups.sh: line 10: \*disabled: command not found
[root@devuan ~]#

[root@devuan ~]# /home/groucho/acpi_wakeups.sh disable
/home/groucho/acpi_wakeups.sh: line 14: \*enabled: command not found
[root@devuan ~]#
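
This message is what a shell prints when the command substitution around the grep pipeline is missing, which fits the suggestion that the backticks were lost when the script was copied. Without them, TOGGLE=grep is only a one-off environment assignment and the next word is run as the command. A reduced reproduction, assuming the copied line looked roughly like the one in broken() below (the script as copied is not shown in this thread, and both function names are made up for the example):

```shell
#!/bin/sh
# Broken form: TOGGLE=grep is just an environment assignment for the
# "command" that follows, so the shell tries to execute '\*disabled'.
broken() {
    TOGGLE=grep '\*disabled' /proc/acpi/wakeup
}

# Intended form: command substitution stores grep's output in TOGGLE.
# FILE is passed as an argument so the function can be tried on a copy.
intended() {
    TOGGLE=$(grep '\*disabled' "$1")
    printf '%s\n' "$TOGGLE"
}
```

Calling broken fails with exit status 127 (command not found), regardless of whether it is run as root.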

Thanks for your input.

Best,

A.

The officially official Devuan Forum!

#1251 Re: Installation » Modules, modprobe(.d) and kernel command line » 2021-04-23 20:52:05

#1252 Re: Installation » Modules, modprobe(.d) and kernel command line » 2021-04-23 19:01:36

#1253 Installation » Modules, modprobe(.d) and kernel command line » 2021-04-23 17:35:56

#1254 Re: Installation » [SOLVED] Old 4.9.0-8-amd64 modules in /lib/modules » 2021-04-23 15:29:21

#1255 Re: Installation » [SOLVED] Old 4.9.0-8-amd64 modules in /lib/modules » 2021-04-23 14:57:14

#1256 Re: Installation » [SOLVED] Old 4.9.0-8-amd64 modules in /lib/modules » 2021-04-23 14:09:40

#1257 Installation » [SOLVED] Old 4.9.0-8-amd64 modules in /lib/modules » 2021-04-22 23:38:11

#1258 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-22 19:55:46

#1259 Re: Hardware & System Configuration » Shutdown problem - e1000 driver bug? » 2021-04-22 13:39:10

#1260 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-21 09:46:22

#1261 Re: Desktop and Multimedia » Beowulf is broken » 2021-04-20 22:12:36

#1262 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-20 21:19:30

#1263 Re: Off-topic » ${THEY} continue crippling browsers... » 2021-04-20 16:39:13

#1264 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-20 11:42:53

#1265 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-19 22:50:50

#1266 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-19 19:58:21

#1267 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-19 12:34:07

#1268 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-18 23:11:23

#1269 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-18 22:50:12

#1270 Re: Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-18 17:59:50

#1271 Installation » Linux e1000e module removal and e1000e EEE timer » 2021-04-18 13:40:38

#1272 Re: Installation » [SOLVED] Help with script » 2021-04-14 19:25:36

#1273 Re: Installation » [SOLVED] Help with script » 2021-04-14 18:47:08

#1274 Re: Installation » [SOLVED] Help with script » 2021-04-14 18:25:09

#1275 Re: Installation » [SOLVED] Help with script » 2021-04-14 15:47:54
