Posts by Altoid

Altoid · Installation

Hello:

Note:
Somehow this whole post got lost.
I recovered it but the posting order may have been altered.
Sorry ...

geki wrote:

For disabling CONFIG_PM, you have to build your own kernel.

Hmm ...
Not on my list.

geki wrote:

... seems to be any pm tooling installed.

How is it that it is done in systemd distributions?

There's no pre-systemd / Linux Devuan equivalent to sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target?

I was reading here: https://answers.launchpad.net/acpi-supp … tion/36260
Since my system had no /etc/default/acpi-support file, I added one:

groucho@devuan:~$ cat /etc/default/acpi-support
# Comment the next line to disable ACPI suspend to RAM
ACPI_SLEEP=false

# Comment the next line to disable suspend to disk
ACPI_HIBERNATE=false
groucho@devuan:~$

Don't know if it does anything at all.
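
One way to find out whether the file is read at all is to look for scripts that mention it. A minimal sketch, assuming that the acpi-support scripts (if any are installed) live under /etc/acpi or /usr/share/acpi-support; who_reads is a made-up helper name, not an existing tool:

```shell
#!/bin/sh
# who_reads: hypothetical helper. Lists files under the given directories
# that mention the given path. No output means nothing there refers to it.
# $1 = path to look for; remaining arguments = directories to search.
who_reads () {
	needle=$1
	shift
	# -r recurse, -l print file names only, -s stay quiet about missing
	# directories, -F treat the path as a fixed string rather than a regex
	grep -rlsF -- "$needle" "$@"
}

# Example (the directories are assumptions, adjust to the system at hand):
# who_reads /etc/default/acpi-support /etc/acpi /usr/share/acpi-support /etc/init.d
```

If nothing turns up, the file is most likely inert and the two settings have no effect.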

geki wrote:

... remind you not to use a 4.x kernel and neither kernel version < 5.5.

I do have your suggestion on my desk.
Not forgotten, just postponed. 8^)

geki wrote:

Unless we see some NIC hang ...
... mostly out of thoughts.

Well ...

Your efforts did unearth a lot of e1000e fun which could be put to good use by the module's maintainers.
If nothing else, the quality of the code will have been significantly improved by your work.
No one can thumb their nose at that.

Now, what could happen?

1. another bad shutdown.
It's been ~10 days since I started using the last version of the e1000e module ie: three patches + a std. shutdown.
The third patch was then edited further to accommodate various debug scenarios, but I understand that it was basically the same as v2.
Up to that point I had been without a bad_shutdown for at least a week.

I think that the bad_shutdown I had is linked to the bad_boot.
And the bad boot may (?) have been linked to not setting WoL to disabled on shutdown.
Like I mentioned previously, my shutdown script now sets WoL to disabled on shutdown but does not remove the e1000e module.

But there's that S3 that is bothering me and I'd like to know how to get rid of.

2. another bad_boot followed by a bad_shutdown.
My money is on getting rid of S3 to prevent that.
Maybe setting WoL to disabled serves the same purpose?

3. none of the above.
If in 30/45 days' time we are still at 3. then it could mean that something you have edited/changed with your patches to the e1000e module has had an effect. 8^D!

But for now, we have to wait and see.

geki wrote:

... check /etc/init.d/halt for hddown= and netdown=, which you can disable by respective configuration settings from /etc/default/halt.

Let's see:

groucho@devuan:~$ cat /etc/default/halt
# Default behaviour of shutdown -h / halt. Set to "halt" or "poweroff".
HALT=poweroff
groucho@devuan:~$

groucho@devuan:~$ cat /etc/init.d/halt
#! /bin/sh
### BEGIN INIT INFO
# Provides:          halt
# Required-Start:
# Required-Stop:
# Default-Start:
# Default-Stop:      0
# Short-Description: Execute the halt command.
# Description:
### END INIT INFO

NETDOWN=yes

PATH=/sbin:/usr/sbin:/bin:/usr/bin
[ -f /etc/default/halt ] && . /etc/default/halt

. /lib/lsb/init-functions

do_stop () {
	if [ "$INIT_HALT" = "" ]
	then
		case "$HALT" in
		  [Pp]*)
			INIT_HALT=POWEROFF
			;;
		  [Hh]*)
			INIT_HALT=HALT
			;;
		  *)
			INIT_HALT=POWEROFF
			;;
		esac
	fi

	# See if we need to cut the power.
	if [ "$INIT_HALT" = "POWEROFF" ] && [ -x /etc/init.d/ups-monitor ]
	then
		/etc/init.d/ups-monitor poweroff
	fi

	# Don't shut down drives if we're using RAID.
	hddown="-h"
	if grep -qs '^md.*active' /proc/mdstat
	then
		hddown=""
	fi

	# If INIT_HALT=HALT don't poweroff.
	poweroff="-p"
	if [ "$INIT_HALT" = "HALT" ]
	then
		poweroff=""
	fi

	# Make it possible to not shut down network interfaces,   <-------- | x |               
	# needed to use wake-on-lan                               <-------- | x |                        
	netdown="-i"
	if [ "$NETDOWN" = "no" ]; then
		netdown=""
	fi

	log_action_msg "Will now halt"
	halt -d -f $netdown $poweroff $hddown
}

case "$1" in
  start|status)
	# No-op
	;;
  restart|reload|force-reload)
	echo "Error: argument '$1' not supported" >&2
	exit 3
	;;
  stop)
	do_stop
	;;
  *)
	echo "Usage: $0 start|stop" >&2
	exit 3
	;;
esac

:
groucho@devuan:~$

geki wrote:

In my box, netdown and hddown are set.
You may set those configuration parameters, so that they are disabled.

geki wrote:

I put
read -p "Press enter to halt ($netdown $poweroff $hddown)" reply
before
halt -d -f $netdown $poweroff $hddown
to see what is set.

Right.
I'll edit that into /etc/init.d/halt and see how it behaves.
If I understand correctly, it will shut down on Enter.
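
To see what would be passed to halt without actually halting, the flag logic from the script above can be copied into a dry run. This is only a sketch: halt_flags is a made-up name, and the values are passed in as arguments instead of being read from /etc/default/halt. Judging from the script as shown, NETDOWN is the only one of the two that is read as a setting; hddown is derived from /proc/mdstat.

```shell
#!/bin/sh
# halt_flags: hypothetical helper. Repeats the flag logic of /etc/init.d/halt
# and prints the flags instead of calling halt.
# $1 = NETDOWN value, $2 = INIT_HALT value,
# $3 = mdstat-style file to check for RAID (default /proc/mdstat).
halt_flags () {
	netdown_val=$1
	init_halt=$2
	mdstat=${3:-/proc/mdstat}

	# Don't shut down drives if RAID is in use.
	hddown="-h"
	if grep -qs '^md.*active' "$mdstat"; then
		hddown=""
	fi

	# If INIT_HALT=HALT don't power off.
	poweroff="-p"
	if [ "$init_halt" = "HALT" ]; then
		poweroff=""
	fi

	# NETDOWN=no leaves the network interfaces up.
	netdown="-i"
	if [ "$netdown_val" = "no" ]; then
		netdown=""
	fi

	# Unquoted on purpose, so that empty flags drop out of the list.
	set -- $netdown $poweroff $hddown
	printf '%s\n' "$*"
}

# Example:
# halt_flags no POWEROFF
```

With NETDOWN=no added to /etc/default/halt, the real script should then drop -i from the halt call in the same way.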

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... check kernel commandline parameter pcie_port_pm=off.
... disabling sleep states for your pci express slots and NIC.

I think I used something pci=ish without results.
Have to see my notes.
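
A quick way to check what the running kernel was actually booted with, before digging through notes. A minimal sketch; has_kparam is a made-up helper name:

```shell
#!/bin/sh
# has_kparam: hypothetical helper. Succeeds if the kernel command line
# contains the given parameter as a whole word.
# $1 = parameter (e.g. pcie_port_pm=off); $2 = file to read (default /proc/cmdline).
has_kparam () {
	param=$1
	file=${2:-/proc/cmdline}
	# Put one parameter per line, then match a whole line as a fixed string.
	tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example:
# has_kparam pcie_port_pm=off && echo "set" || echo "not set"
```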

geki wrote:

... also the parameter apm ...
... some suspend software installed in /etc/pm, /etc/apm or /etc/acpi and check for such in /etc/default.

I don't have pm-utils installed, removed it some time ago.

groucho@devuan:~$ apt list | grep -i installed | grep -i pm-utils
--- snip ---
groucho@devuan:~$

Notwithstanding, I do have these:

groucho@devuan:~$ ls -R /etc/apm/
/etc/apm/:
event.d

/etc/apm/event.d:
20hdparm
groucho@devuan:~$

This for spinning down HDDs if not on AC.
Always on AC but I presume it works as intended.

groucho@devuan:~$ ls -R /etc/acpi/
/etc/acpi/:
events  powerbtn-acpi-support.sh

/etc/acpi/events:
powerbtn-acpi-support
groucho@devuan:~$

This to initiate shutdown when the power button is pressed.
I disabled this in /etc/default/acpid because I inadvertently touched the recessed power button more than a few times. 8^/
Also because I'd rather shut down via terminal or script.

groucho@devuan:~$ ls -R /etc/default/
/etc/default/:
acpid         cacerts        dbus    grub.d         hwclock          locale~               ntpdate        rsyslog          su              useradd
anacron       console-setup  devpts  grub.ucf-dist  intel-microcode  networking            rcS            saned            su~             wicd
autofs        cpufrequtils   exim4   halt           keyboard         networking.dpkg-dist  rcS.dpkg-dist  saned.dpkg-dist  sysstat
avahi-daemon  crda           gdomap  haveged        keyboard~        nfs-common            rkhunter       saned~           timeshift.json
bsdmainutils  cron           grub    hddtemp        locale           nss                   rsync          smartmontools    tmpfs

/etc/default/grub.d:
init-select.cfg
groucho@devuan:~$

groucho@devuan:~$ cat /etc/default/acpid
# Options to pass to acpid
#
# OPTIONS are appended to the acpid command-line
# enabled 20181108 to log events to syslog 
OPTIONS="-l"

# Linux kernel modules to load before starting acpid
#
# MODULES is a space separated list of modules to load, or "all" to load all
# acpi drivers, or commented out to load no module
#MODULES="battery ac processor button fan thermal video"
#MODULES="all"
groucho@devuan:~$

I added OPTIONS="-l" back in 2018 to see if I could get anything written to a(ny) log.

Thanks for your input.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... the /sys/power stuff belongs to Kernel CONFIG_PM I guess.
Feel free to disable that.

Hmm ...
Sure.

Q: how do I go about that.
ie: the opposite of # echo mem > /sys/power/state, effectively removing mem?
Cannot find anything specific about that for non-systemd distributions.

My idea is that if I remove anything S3 related from the system, it may (?) keep whatever system state is set in BIOS from activating.

geki wrote:

... also try shutdown -h -P to halt and power off.

Just as you point out:

groucho@devuan:~$ cat /etc/default/halt
# Default behaviour of shutdown -h / halt. Set to "halt" or "poweroff".
HALT=poweroff
groucho@devuan:~$

geki wrote:

... if you got to the frozen "reboot: Power down", press Alt + SysRq (Print Screen key) + o for shutdown

Hmm ...
I'm not sure, but I think I did try it, without the expected result.
I'll remember for next time but I think (?) the kb was totally unresponsive.
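
Before the next attempt it may be worth checking that the key combination is permitted at all. The kernel.sysrq value is 0 for disabled, 1 for everything enabled, or a bitmask in which 128 allows reboot/poweroff. A minimal sketch; sysrq_allows_poweroff is a made-up helper name:

```shell
#!/bin/sh
# sysrq_allows_poweroff: hypothetical helper. Succeeds if the given
# kernel.sysrq value permits the reboot/poweroff keys (Alt+SysRq+o, +b).
# $1 = value as read from /proc/sys/kernel/sysrq.
sysrq_allows_poweroff () {
	val=$1
	# 1 enables every SysRq function; any other value is a bitmask
	# in which 128 stands for "allow reboot/poweroff".
	[ "$val" -eq 1 ] || [ $(( val & 128 )) -ne 0 ]
}

# Example:
# sysrq_allows_poweroff "$(cat /proc/sys/kernel/sysrq)" && echo allowed || echo blocked
```

If the keys are allowed and nothing happens anyway, that would fit with the keyboard (or the kernel itself) no longer responding.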

geki wrote:

See: http://blog.kember.net/articles/reisub- … x-restart/

Thanks for the heads up.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... does not sound like an NIC (driver) issue.
... if boot already goes noisy.

I agree.

I have always wondered if the bad_boot and the bad_shutdown problems were independent of each other or if they shared more than the fans at full blast symptom.

Out of fear of some part of the filesystem getting borked, I had never allowed the boot sequence to go on, aborting it and getting a clean boot afterwards.
It will reboot after a time-out unless you explicitly allow the boot sequence to continue.

This time, not aborting the sequence revealed a bad_shutdown right after a bad_boot.
Then, going back to the Sun Microsystems *.pdf on the matter and seeing the diagnostic put forth (S0, S3, etc.) again made my doubts resurface.

ie:

"If you power on the workstation before the system enters the S3 ... "

Q: why would the system be entering S3 or any other state save S5 in the first place?

This probably has to do with my not being able to disable ME "Firmware Power Control" and "Host Sleep States" in BIOS.
It is true that I pressed the power button immediately after power off and blank screen.
But I have not been able to reproduce it.

Like I mentioned, I have edited the shutdown script to set WoL to disabled as before, just not removing the e1000e module.

With respect to system states, dmesg states:

groucho@devuan:~$ sudo dmesg | grep S0
[    0.729378] ACPI: (supports S0 S1 S3 S4 S5)
groucho@devuan:~$

I'm only interested in S0 and S5 but ...

groucho@devuan:/sys/power$ cat /sys/power/state
freeze standby mem disk
groucho@devuan:/sys/power$

I have seen how, if needed, freeze, standby, mem and disk can be added to /sys/power/state to enable power states S0, S1, S3 and S4 respectively.
eg: # echo mem > /sys/power/state

And have found how to disable power states in systemd distributions.
eg: sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

But I have not found out how to do that in Devuan.
I am guessing that not having those values (specifically S3) could help finding out what is going on.

How to do the opposite of # echo mem > /sys/power/state ?
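
A note on the question itself: /sys/power/state is not a list that gets added to. Reading it shows which sleep states the kernel offers on this machine; writing one of the keywords asks the kernel to enter that state right away. So there is no "opposite" write, and the system should only suspend if some script or daemon performs that write. The sketch below only reads the file and spells out the keywords; describe_sleep_states is a made-up helper name, and the mapping of mem to S3 is the usual case rather than a guarantee:

```shell
#!/bin/sh
# describe_sleep_states: hypothetical helper. Prints what each keyword in a
# /sys/power/state style file asks the kernel to do. Read-only.
# $1 = file to read (default /sys/power/state).
describe_sleep_states () {
	file=${1:-/sys/power/state}
	for s in $(cat "$file"); do
		case "$s" in
		  freeze)  echo "freeze: suspend-to-idle (stays in S0)" ;;
		  standby) echo "standby: power-on suspend (S1)" ;;
		  mem)     echo "mem: suspend-to-RAM (usually S3)" ;;
		  disk)    echo "disk: hibernation (S4)" ;;
		  *)       echo "$s: unknown" ;;
		esac
	done
}

# Example:
# describe_sleep_states
```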

Thanks in advance.

A.

Altoid · Installation

Hello:

Update

Altoid wrote:

All shutdowns normal up to now. 05052021@21:03 GMT

That has not changed.
But I did get a bad boot which then resulted in a bad shutdown.

A bad boot is when on starting up the system, both CPU and case fans start to run at 100% and the BIOS stops with a "CPU Fan error" notice.
It asks if you want to continue or press ESC or an Fx to abort, can't recall.

Sorry, no video or grab as I was obviously not ready for it.

This has not happened in a while and although the fans running at 100% is what this bad boot has in common with the bad shutdown, I have always thought they had different causes.
Sun Microsystems had at one time diagnosed (but evidently not bothered to fix) this problem in a Sun Product Notes *.pdf for this WS (2009) where it says it can happen and why:

Sun MS wrote:

CPU Fan Error Might Occur After Power On
If you power on the workstation before the system enters the S3 sleep
state, a CPU fan error might occur.

It also provided a workaround which consists in accessing the Management Engine (ME) BIOS Setup utility to change the power policies.
You have to set ME "Firmware Power Control" to ON and "Host Sleep States" to ON in S0, S3.
I changed "Host Sleep States" from S1, S3 to S0, S3 but every so often the CPU Fan error came around again.

[rant]
But why would I want this?
It is basically allowing Intel ME to start up your workstation remotely.
[/rant]

So I tried to set ME "Firmware Power Control" or "Host Sleep States" to OFF, effectively disabling sleep of any type in my box.
Because ...
WTHF does a server/workstation need a damn S state different than S5 for?

As a result all havoc broke loose: on reboot with the box frozen at the start of the BIOS sequence, both CPU and case fans at 100%.
I was scared shitless that my new WS was done for.
Only way out was a hard shutdown, a CMOS clear and a ME BIOS reflash.

I believe that this is closely related to the fact that it is not possible to disable the on-board GbE LAN in the BIOS. (it is greyed out)
That and the Intel e1000 driver is *always* enabling WoL no matter what settings you give it.
Which is why I had WoL set to OFF both at boot and at shutdown via a shutdown script.

In any case, "Host Sleep States" is evidently set to ON and S1, (not S0) and S3.

I repeated my attempt once more before starting the first part of this thread, with the same results.
Not as scared and more confident around the hardware than the first time I tried it, but in a sweat till I saw a working boot screen come up.

But I digress ...

Instead of aborting the boot sequence I continued to boot into Devuan, which went on without any other problem than the fans blowing continuously at 100%.

I got a copy of dmesg, checked that everything was working properly and proceeded to shut down as I usually do these days, ie: plain shutdown -h now, no script.

The result was another bad shutdown, like the ones I usually get.
Here's the shutdown screen:

No different than what I am getting these days with a normal shutdown.
ie: contains no debug data.

I will edit the shutdown script to disable WoL as I had been doing to see if there's any change in this behaviour.
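
For reference, the WoL step can be sketched as follows. This is not the actual shutdown script, which is not shown here; wol_state and wol_disable are made-up helper names, and eth0 is taken from the dmesg output in this thread:

```shell
#!/bin/sh
# wol_state: hypothetical helper. Reads `ethtool <iface>` output on stdin and
# prints the current Wake-on setting (d means disabled).
wol_state () {
	# Match only the "Wake-on:" line, not the "Supports Wake-on:" line.
	sed -n 's/^[[:space:]]*Wake-on:[[:space:]]*//p'
}

# wol_disable: hypothetical helper. Turns WoL off, then prints the setting
# that ethtool reports afterwards.
# $1 = interface name.
wol_disable () {
	ethtool -s "$1" wol d && ethtool "$1" | wol_state
}

# Example, as root in a shutdown script:
# [ "$(wol_disable eth0)" = "d" ] || echo "WoL still enabled on eth0" >&2
```

Reading the setting back is the point of the sketch: it shows whether the driver accepted the change or quietly kept WoL on.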

Thanks in advance,

A.

Altoid · Desktop and Multimedia

Hello:

Head_on_a_Stick wrote:

... dependencies are mostly from sid/ceres rather than experimental.
... that .deb cannot be installed in a beowulf system without breaking it.

Enough for me not to consider attempting it.

Head_on_a_Stick wrote:

... could try backporting it ...
... not worth the bother.

I agree.

Thanks for your input.

A.

Altoid · Hardware & System Configuration

Hello:

fsmithred wrote:

... individual core temps.

Same here.

Here's the line I'm using, not my work.
I recall getting the format right here at Dev1:

TEMPERATURES
${hr 2}
Core 0: +${hwmon 0 temp 2} C  $alignc Core 1: +${hwmon 0 temp 3} C
Core 2: +${hwmon 0 temp 4} C  $alignc Core 3: +${hwmon 0 temp 5} C
${hr 0.3}

Looks (sort of) like this:

TEMPERATURES
_____________________________
Core 0: +44 C   Core 1: +41 C
Core 2: +40 C   Core 3: +41 C
_____________________________

Best,

A.

Altoid · Desktop and Multimedia

Hello:

Head_on_a_Stick wrote:

... the way I would try it.
... technically "safe" to add the experimental repositories directly ...

Hmm ...
There must be a solid reason for experimental. 8^D

I remember looking this up once.
It seems that dpkg -i package_file.deb does not take care of dependencies properly or at all.

Which apt install ./package_file.deb does:

groucho@devuan:~/Downloads$ sudo apt install ./libgtk-3-0_3.24.29-1_amd64.deb
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libgtk-3-0' instead of './libgtk-3-0_3.24.29-1_amd64.deb'
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libgtk-3-0 : Depends: libatk1.0-0 (>= 2.35.1) but 2.30.0-2 is to be installed
              Depends: libc6 (>= 2.29) but 2.28-10 is to be installed
              Depends: libgdk-pixbuf-2.0-0 (>= 2.40.0) but it is not installable
              Depends: libglib2.0-0 (>= 2.59.0) but 2.58.3-2+deb10u2 is to be installed
              Depends: libjson-glib-1.0-0 (>= 1.5.2) but 1.4.4-2 is to be installed
              Depends: libpango-1.0-0 (>= 1.45.5) but 1.42.4-8~deb10u1 is to be installed
              Depends: libxcomposite1 (>= 1:0.4.5) but 1:0.4.4-2 is to be installed
              Depends: libgtk-3-common (>= 3.24.29-1) but 3.24.5-1 is to be installed
E: Unable to correct problems, you have held broken packages.
groucho@devuan:~/Downloads$
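
The dependencies can also be inspected without touching the system: dpkg-deb -f prints a single control field from a .deb file, and apt-get --simulate install shows what would happen without doing it. A minimal sketch that turns the Depends field into one entry per line; split_depends is a made-up helper name:

```shell
#!/bin/sh
# split_depends: hypothetical helper. Takes a Depends field as one string
# and prints one dependency per line, with surrounding blanks removed.
split_depends () {
	printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//'
}

# Example with the downloaded package:
# split_depends "$(dpkg-deb -f ./libgtk-3-0_3.24.29-1_amd64.deb Depends)"
# apt-get --simulate install ./libgtk-3-0_3.24.29-1_amd64.deb
```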

Head_on_a_Stick wrote:

... if you wanted but be sure to pay close attention to the suggested course of action before accepting.

My guess is that I'd have to fetch all these unmet dependencies from experimental also.
But I'd rather not dive into dependency hell and be sure.

I think I'll live with the occasional crash and wait.
In any case I'll eventually be ditching Xfce4.
I don't like where it is going, so I may try LXDE, à la Knoppix Live or a #! solution.

Thanks for your input.

Best,

A.

Altoid · Desktop and Multimedia

Hello:

Head_on_a_Stick wrote:

v3.24.29 seems to be available in Debian's experimental repositories:
http://deb.debian.org/debian/pool/main/ … _amd64.deb

Thanks for the heads up. 8^)

So ...
Download the .deb and install with dpkg -i libgtk-3-0_3.24.29-1_amd64.deb?
Or is there a better, more fool-proof way?

Thanks in advance,

A.

Altoid · Desktop and Multimedia

Hello:

Altoid wrote:

The issue is evidently xfce4 inter-component related.

Apparently (?) the problem has been fixed/worked around.

https://gitlab.gnome.org/GNOME/gtk/-/issues/3715

From what I think I understand, a patch has been written up so that the affected applications don't crash.
ie: the root of the problem (in xfce4?) will remain where it is till the next time someone reports something else.

The Devuan Beowulf repository has libgtk-3-0 3.24.5-1 and the Chimaera repository has 3.24.24-3.
The patch is in 3.24.29.

How can I go about checking to see if 3.24.29 fixes the problem?

Thanks in advance,

A.

Altoid · Installation

Hello:

geki wrote:

Then, we need the dmesg output to have a complete view.

dmesg at boot + rmmod e1000e + modprobe -v e1000e, like this?

groucho@devuan:~$ sudo dmesg | grep "e1000e\|00:19.0"
[    0.744873] pci 0000:00:19.0: [8086:10bd] type 00 class 0x020000
[    0.744888] pci 0000:00:19.0: reg 0x10: [mem 0xf5fc0000-0xf5fdffff]
[    0.744894] pci 0000:00:19.0: reg 0x14: [mem 0xf5ffe000-0xf5ffefff]
[    0.744901] pci 0000:00:19.0: reg 0x18: [io  0xac00-0xac1f]
[    0.744948] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[    1.804885] e1000e: loading out-of-tree module taints kernel.
[    1.865505] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[    2.004406] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[    2.025277] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[    2.042227] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.062117] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[    2.072709] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[    2.083251] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[    2.487257] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[    2.487259] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.487279] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[   26.640872] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[   26.653013] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[  738.114040] e1000e: PCI REMOVE PTP                            # here starts rmmod e1000e
[  738.114045] e1000e: PCI REMOVE TIMER
[  738.114048] e1000e: PCI REMOVE CANCEL WORK SYNC
[  738.114050] e1000e: PCI REMOVE HW TIMESTAMP
[  738.114074] e1000e: NETDEV CLOSE ENTERED
[  738.114076] e1000e: NETDEV CLOSE WAIT DONE
[  738.114077] e1000e: NETDEV CLOSE DEV IS PRESENT
[  738.344182] e1000e: NETDEV CLOSE DEV IS DOWN
[  738.344196] e1000e: NETDEV CLOSE FREE IRQ
[  738.344201] e1000e 0000:00:19.0 eth0: NIC Link is Down
[  738.344203] e1000e: NETDEV CLOSE LINK DOWN MSG
[  738.344205] e1000e: NETDEV CLOSE NAPI DISABLED
[  738.344213] e1000e: NETDEV CLOSE FREE TX RES
[  738.344236] e1000e: NETDEV CLOSE FREE RX RES
[  738.344238] e1000e: NETDEV CLOSE VLAN DONE
[  738.344240] e1000e: NETDEV CLOSE HW CTRL RELEASED
[  738.344243] e1000e: NETDEV CLOSE DONE
[  738.364058] e1000e: PCI REMOVE UNREGISTER NETDEV
[  738.364062] e1000e: PCI REMOVE WAKE NO RESUME
[  738.364065] e1000e: PCI REMOVE RELEASE HW CONTROL
[  738.364099] e1000e: PCI REMOVE INT AND TX RX RING
[  738.364112] e1000e: PCI REMOVE SELECTED REGIONS
[  738.380049] e1000e: PCI REMOVE FREE NETDEV
[  738.380052] e1000e: PCI REMOVE DISABLE ERR REPORTING
[  738.380172] e1000e: PCI REMOVE DISABLE DEVICE                 # here starts modprobe -v e1000e 
[  752.604908] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[  752.604913] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[  752.605114] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[  752.605116] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[  752.605118] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[  752.605119] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[  752.924225] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[  752.924230] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[  752.924255] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[  755.756888] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[  755.756997] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[  756.562154] e1000e: NETDEV CLOSE ENTERED
[  756.562159] e1000e: NETDEV CLOSE WAIT DONE
[  756.562161] e1000e: NETDEV CLOSE DEV IS PRESENT
[  756.792199] e1000e: NETDEV CLOSE DEV IS DOWN
[  756.792212] e1000e: NETDEV CLOSE FREE IRQ
[  756.792216] e1000e 0000:00:19.0 eth0: NIC Link is Down
[  756.792218] e1000e: NETDEV CLOSE LINK DOWN MSG
[  756.792219] e1000e: NETDEV CLOSE NAPI DISABLED
[  756.792227] e1000e: NETDEV CLOSE FREE TX RES
[  756.792251] e1000e: NETDEV CLOSE FREE RX RES
[  756.792253] e1000e: NETDEV CLOSE VLAN DONE
[  756.792255] e1000e: NETDEV CLOSE HW CTRL RELEASED
[  756.792258] e1000e: NETDEV CLOSE DONE
[  757.309716] e1000e: NETDEV CLOSE ENTERED
[  757.309721] e1000e: NETDEV CLOSE WAIT DONE
[  757.309722] e1000e: NETDEV CLOSE DEV IS PRESENT
[  757.540196] e1000e: NETDEV CLOSE DEV IS DOWN
[  757.540212] e1000e: NETDEV CLOSE FREE IRQ
[  757.540217] e1000e 0000:00:19.0 eth0: NIC Link is Down
[  757.540218] e1000e: NETDEV CLOSE LINK DOWN MSG
[  757.540220] e1000e: NETDEV CLOSE NAPI DISABLED
[  757.540228] e1000e: NETDEV CLOSE FREE TX RES
[  757.540250] e1000e: NETDEV CLOSE FREE RX RES
[  757.540251] e1000e: NETDEV CLOSE VLAN DONE
[  757.540253] e1000e: NETDEV CLOSE HW CTRL RELEASED
[  757.540256] e1000e: NETDEV CLOSE DONE
[  759.336885] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[  759.336993] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$

Thanks in advance.

A.

Altoid · Installation

Hello:

Altoid wrote:

I'll get to this asap and report back as soon as I get it done.

Right ...

Followed the same procedure as the previous time.
ie: clean unpack

groucho@devuan:/$ pushd /usr/src/e1000e-3.8.7
/usr/src/e1000e-3.8.7 /
groucho@devuan:/usr/src/e1000e-3.8.7$

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

P1001 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

P1002 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

Patch 1003 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages_v2.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages_v2.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

Patch 1004 done.
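
The same dry-run-then-apply routine can be wrapped in a loop. A minimal sketch, not what was actually run above; apply_series is a made-up helper name, and the PATCH variable is only there so that a different patch binary can be swapped in:

```shell
#!/bin/sh
# apply_series: hypothetical helper. For each patch in order: dry run first,
# apply only if the dry run succeeds, stop at the first failure. Patches
# that were already applied before the failure stay applied.
# $1 = source tree; remaining arguments = patch files (absolute paths,
# because the commands run from inside the tree).
PATCH=${PATCH:-patch}
apply_series () {
	tree=$1
	shift
	for p in "$@"; do
		if ! ( cd "$tree" && "$PATCH" --dry-run -p0 -i "$p" ) >/dev/null; then
			echo "dry run failed: $p" >&2
			return 1
		fi
		( cd "$tree" && "$PATCH" -p0 -i "$p" ) >/dev/null || return 1
		echo "applied: $p"
	done
}

# Example (as root, on a freshly unpacked tree):
# apply_series /usr/src/e1000e-3.8.7 /usr/src/e1000e-patch/*.patch
```

The shell expands the glob in sorted order, so the 1001 to 1004 numbering keeps the patches in sequence.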

Now we make:

groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  CC [M]  /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7398:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~
  CC [M]  /usr/src/e1000e-3.8.7/src/ethtool.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ich8lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/mac.o
  CC [M]  /usr/src/e1000e-3.8.7/src/nvm.o
  CC [M]  /usr/src/e1000e-3.8.7/src/phy.o
  CC [M]  /usr/src/e1000e-3.8.7/src/manage.o
  CC [M]  /usr/src/e1000e-3.8.7/src/80003es2lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/82571.o
  CC [M]  /usr/src/e1000e-3.8.7/src/param.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ptp.o
  CC [M]  /usr/src/e1000e-3.8.7/src/kcompat.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /usr/src/e1000e-3.8.7/src/e1000e.mod.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.ko
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
groucho@devuan:/usr/src/e1000e-3.8.7/src$

Make done.

groucho@devuan:~$ sudo modinfo e1000e
[sudo] password for groucho: 
filename:       /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version:        3.8.7-NAPI
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     689D224FDE8A2AB5AF9215A
alias:          pci:v00008086d00001A1Dsv*sd*bc*sc*i*
--- snip ---
alias:          pci:v00008086d0000105Esv*sd*bc*sc*i*
depends:        
retpoline:      Y
name:           e1000e
vermagic:       4.19.0-16-amd64 SMP mod_unload modversions 
parm:           copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm:           TxIntDelay:Transmit Interrupt Delay (array of int)
parm:           TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm:           RxIntDelay:Receive Interrupt Delay (array of int)
parm:           RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm:           InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm:           IntMode:Interrupt Mode (array of int)
parm:           SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm:           KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm:           CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm:           EEE:Enable/disable on parts that support the feature (array of int)
parm:           Node:[ROUTING] Node to allocate memory on, default -1 (array of int)
parm:           debug:Debug level (0=none,...,16=all) (int)
groucho@devuan:~$

groucho@devuan:~$ sudo rmmod e1000e
groucho@devuan:~$ sudo modprobe -v e1000e
insmod /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko SmartPowerDownEnable=0 EEE=0 
groucho@devuan:~$

Gets this in dmesg:

[16365.721275] e1000e: PCI REMOVE PTP
[16365.721279] e1000e: PCI REMOVE TIMER
[16365.721283] e1000e: PCI REMOVE CANCEL WORK SYNC
[16365.721284] e1000e: PCI REMOVE HW TIMESTAMP
[16365.721308] e1000e: NETDEV CLOSE ENTERED
[16365.721310] e1000e: NETDEV CLOSE WAIT DONE
[16365.721311] e1000e: NETDEV CLOSE DEV IS PRESENT
[16365.952439] e1000e: NETDEV CLOSE DEV IS DOWN
[16365.952452] e1000e: NETDEV CLOSE FREE IRQ
[16365.952456] e1000e 0000:00:19.0 eth0: NIC Link is Down
[16365.952458] e1000e: NETDEV CLOSE LINK DOWN MSG
[16365.952460] e1000e: NETDEV CLOSE NAPI DISABLED
[16365.952469] e1000e: NETDEV CLOSE FREE TX RES
[16365.952493] e1000e: NETDEV CLOSE FREE RX RES
[16365.952494] e1000e: NETDEV CLOSE VLAN DONE
[16365.952496] e1000e: NETDEV CLOSE HW CTRL RELEASED
[16365.952499] e1000e: NETDEV CLOSE DONE
[16365.972280] e1000e: PCI REMOVE UNREGISTER NETDEV
[16365.972285] e1000e: PCI REMOVE WAKE NO RESUME
[16365.972288] e1000e: PCI REMOVE RELEASE HW CONTROL
[16365.972322] e1000e: PCI REMOVE INT AND TX RX RING
[16365.972334] e1000e: PCI REMOVE SELECTED REGIONS
[16365.992268] e1000e: PCI REMOVE FREE NETDEV
[16365.992271] e1000e: PCI REMOVE DISABLE ERR REPORTING
[16365.992383] e1000e: PCI REMOVE DISABLE DEVICE
[16367.681610] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[16367.681615] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[16367.681843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[16367.681845] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[16367.681848] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[16367.681850] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[16367.996454] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[16367.996458] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[16367.996485] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[16371.829118] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[16371.829227] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$

And from the video grab on shutdown:

Looks like the PCI REMOVE debug messages only show up in dmesg.
I'll check again -> Confirmed, only in dmesg.
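
For a saved log, counting the marker lines is quicker than reading through it. A minimal sketch; count_marker is a made-up helper name, and the log path in the example is an assumption:

```shell
#!/bin/sh
# count_marker: hypothetical helper. Counts the lines on stdin that contain
# the given fixed string, e.g. one of the debug prefixes added by the patches.
count_marker () {
	grep -cF -- "$1"
}

# Examples:
# sudo dmesg | count_marker 'e1000e: PCI REMOVE'
# count_marker 'e1000e: PCI REMOVE' < /var/log/kern.log
```

A count of 0 for a log written during shutdown would match what the video grab shows.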

Thanks so much for your help.

A.

Altoid · Installation

Hello:

geki wrote:

Updated patch 1004.

Right ... 8^)

geki wrote:

I hope I did not add a typo.

In that case, it will show up.

geki wrote:

Half asleep...

Don't overdo it. 8^D !

geki wrote:

Edit
Compiled.
https://geki.selfhost.eu/hacks/1001-e10 … bled.patch
https://geki.selfhost.eu/hacks/1002-e10 … ages.patch
https://geki.selfhost.eu/hacks/1003-e10 … eeze.patch
https://geki.selfhost.eu/hacks/1004-e10 … s_v2.patch

Right ...

I'll get to this asap and report back as soon as I get it done.
All shutdowns normal up to now. 05052021@21:03 GMT

Thanks a lot.

Best,

A.

Altoid · Other Issues

Hello:

pcalvert wrote:

... received this notification more than 24 hours ago ...
... problems have been fixed in version 4.92-8+deb10u6.

See this article from The Register.
https://www.theregister.com/2021/05/05/ … exim_mail/

Tim Anderson @TheRegister wrote:

At the time of writing*, the packages for Debian 9 (Stretch), which is end of life but in long term support, had not yet been updated.

* Wed 5 May 2021 // 17:20 UTC

It may shed some light on the reasons for the apparent delay.
It's probably on its way.

groucho@devuan:~$ apt policy exim4
exim4:
  Installed: (none)
  Candidate: 4.92-8+deb10u5
  Version table:
     4.94.2-1~bpo10+1 100
        100 http://deb.devuan.org/merged beowulf-backports/main amd64 Packages
        100 http://deb.devuan.org/merged beowulf-backports/main i386 Packages
     4.92-8+deb10u5 500
        500 http://deb.devuan.org/merged beowulf/main amd64 Packages
        500 http://deb.devuan.org/merged beowulf/main i386 Packages
     4.92-8+deb10u4 500
        500 http://deb.devuan.org/merged beowulf-security/main amd64 Packages
        500 http://deb.devuan.org/merged beowulf-security/main i386 Packages
groucho@devuan:~$
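
A check like the one above can be scripted, in case someone wants to poll for the fixed package. This is only a rough sketch: the function names candidate_version and version_at_least are made up for illustration, and the fixed version number is the one quoted from pcalvert's notification.

```shell
#!/bin/sh
# Hypothetical helper: pull the "Candidate:" version out of `apt policy` output
# read from stdin.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Succeeds when version $1 is at least version $2, using Debian version ordering.
version_at_least() {
    dpkg --compare-versions "$1" ge "$2"
}

# Usage:
#   cand=$(apt policy exim4 2>/dev/null | candidate_version)
#   version_at_least "$cand" 4.92-8+deb10u6 && echo "fix available" || echo "still waiting"
```

With the output shown above, the candidate is 4.92-8+deb10u5, so the check would still report "still waiting".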

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... the other print function.
... hopefully prints something on pci remove.

Right.

Let me know.
Besides that, no other news here for the time being.

Thanks a lot.
Best,

A.

Altoid · Installation

Hello:

geki wrote:

... just right, that that message pops up there ...

OK.

geki wrote:

... last screenshot of shutdown did not show any of the "PCI REMOVE" debug messages.

I didn't compare them. (sorry, typo)

geki wrote:

... screenshot taken with the very latest patched module build?

Yes.

Just checked.
The time stamp on the video frame is 20210503 at 19:14 local time.
The only patching that day.

The sequence starts at 291.441602 and ends at 292.649208.
Same as what I uploaded to postimages.org

To keep tabs on myself, I posted the sequence of the patching taken directly from the tty1 output and did not change the *.patch file names.
To make sure everything was truly uncontaminated, I used a freshly unpackaged content of e1000e-3.8.7.tar.gz.

Want me to check something in particular?

Thanks in advance.

A.

Altoid · Installation

Hello:

geki wrote:

... check where that EEE TX LPI TIMER message comes from....

I looked into all the files in /src, these had references to LPI:

groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ethtool.c | grep LPI
	 * on whether Tx or Rx LPI indications have been received.
	if (phy_data & (E1000_EEE_TX_LPI_RCVD | E1000_EEE_RX_LPI_RCVD))
	edata->tx_lpi_timer = er32(LPIC) >> E1000_LPIC_LPIET_SHIFT;
		e_err("Setting EEE Tx LPI timer is not supported\n");
groucho@devuan:/usr/src/e1000e-3.8.7/src$

groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ich8lan.c | grep LPI
 *  the link and the EEE capabilities of the link partner.  The LPI Control
 *  EEE LPI must not be asserted earlier than one second after link is up.
 *  On 82579, EEE LPI should not be enabled until such time otherwise there
 *  can be link issues with some switches.  Other devices can have EEE LPI
 *  prevents LPI from being asserted too early.
	ret_val = e1e_rphy_locked(hw, I82579_LPI_CTRL, &lpi_ctrl);
	lpi_ctrl &= ~I82579_LPI_CTRL_ENABLE_MASK;
			lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
				lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
		ret_val = e1000_read_emi_reg_locked(hw, I82579_LPI_PLL_SHUT,
		data &= ~I82579_LPI_100_PLL_SHUT;
		ret_val = e1000_write_emi_reg_locked(hw, I82579_LPI_PLL_SHUT,
	/* R/Clr IEEE MMD 3.1 bits 11:10 - Tx/Rx LPI Received */
	ret_val = e1e_wphy_locked(hw, I82579_LPI_CTRL, lpi_ctrl);
		/* Set EEE LPI Update Timer to 200usec */
						     I82579_LPI_UPDATE_TIMER,
			 * link, and enable Auto Enable LPI since there will
			 * be no driver to enable LPI while in Sx.
				/* Set Auto Enable LPI after link up */
						I217_LPI_GPIO_CTRL, &phy_reg);
				phy_reg |= I217_LPI_GPIO_CTRL_AUTO_EN_LPI;
						I217_LPI_GPIO_CTRL, phy_reg);
		 * power good.  LPI (Low Power Idle) state must also reset only
			/* Set bit enable LPI (EEE) to reset only on
			phy_reg |= I217_SxCTRL_ENABLE_LPI_RESET;
		/* Clear Auto Enable LPI after link up */
		e1e_rphy_locked(hw, I217_LPI_GPIO_CTRL, &phy_reg);
		phy_reg &= ~I217_LPI_GPIO_CTRL_AUTO_EN_LPI;
		e1e_wphy_locked(hw, I217_LPI_GPIO_CTRL, phy_reg);
groucho@devuan:/usr/src/e1000e-3.8.7/src$

groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ich8lan.h | grep LPI
#define I217_LPI_GPIO_CTRL			PHY_REG(772, 18)
#define I217_LPI_GPIO_CTRL_AUTO_EN_LPI		0x0800
#define I82579_LPI_CTRL				PHY_REG(772, 20)
#define I82579_LPI_CTRL_100_ENABLE		0x2000
#define I82579_LPI_CTRL_1000_ENABLE		0x4000
#define I82579_LPI_CTRL_ENABLE_MASK		0x6000
#define I82579_LPI_UPDATE_TIMER	0x4805	/* in 40ns units + 40 ns base value */
#define I82579_LPI_PLL_SHUT		0x4412	/* LPI PLL Shut Enable */
#define I82579_LPI_100_PLL_SHUT	(1 << 2)	/* 100M LPI PLL Shut Enabled */
#define E1000_EEE_RX_LPI_RCVD	0x0400	/* Tx LP idle received */
#define E1000_EEE_TX_LPI_RCVD	0x0800	/* Rx LP idle received */
#define I217_SxCTRL_ENABLE_LPI_RESET	0x1000
groucho@devuan:/usr/src/e1000e-3.8.7/src$

But this was the only one I found with the complete string:
ie: EEE TX LPI TIMER:

groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat netdev.c | grep LPI
	pr_info("EEE TX LPI TIMER: %08X\n",          # <--- | x | 
		er32(LPIC) >> E1000_LPIC_LPIET_SHIFT);
	/* Ensure that the appropriate bits are set in LPI_CTRL
			retval = e1e_rphy_locked(hw, I82579_LPI_CTRL,
					lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
					lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
				retval = e1e_wphy_locked(hw, I82579_LPI_CTRL,
groucho@devuan:/usr/src/e1000e-3.8.7/src$

Looked at with jed to get the line numbers:

7150 to 7172

7150 }
7151                             
7152 static void e1000e_flush_lpic(struct pci_dev *pdev)
7153 {                           
7154         struct net_device *netdev = pci_get_drvdata(pdev);
7155         struct e1000_adapter *adapter = netdev_priv(netdev);
7156         struct e1000_hw *hw = &adapter->hw;
7157         u32 ret_val;
7158 
7159         pm_runtime_get_sync((netdev_to_dev(netdev))->parent);
7160 
7161         ret_val = hw->phy.ops.acquire(hw);
7162         if (ret_val)
7163                 goto fl_out;
7164 
7165         pr_info("EEE TX LPI TIMER: %08X\n",
7166                 er32(LPIC) >> E1000_LPIC_LPIET_SHIFT);
7167 
7168         hw->phy.ops.release(hw);
7169 
7170 fl_out:
7171         pm_runtime_put_sync(netdev->dev.parent);
7172 }

7526 to 7554:

7526         }
7527 
7528         /* Ensure that the appropriate bits are set in LPI_CTRL
7529          * for EEE in Sx
7530          */
7531         if ((hw->phy.type >= e1000_phy_i217) &&
7532             adapter->eee_advert && hw->dev_spec.ich8lan.eee_lp_ability) {
7533                 u16 lpi_ctrl = 0;
7534 
7535                 retval = hw->phy.ops.acquire(hw);
7536                 if (!retval) {
7537                         retval = e1e_rphy_locked(hw, I82579_LPI_CTRL,
7538                                                  &lpi_ctrl);
7539                         if (!retval) {
7540                                 if (adapter->eee_advert &
7541                                     hw->dev_spec.ich8lan.eee_lp_ability &
7542                                     I82579_EEE_100_SUPPORTED)
7543                                         lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
7544                                 if (adapter->eee_advert &
7545                                     hw->dev_spec.ich8lan.eee_lp_ability &
7546                                     I82579_EEE_1000_SUPPORTED)
7547                                         lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
7548 
7549                                 retval = e1e_wphy_locked(hw, I82579_LPI_CTRL,
7550                                                          lpi_ctrl);
7551                         }
7552                 }
7553                 hw->phy.ops.release(hw);
7554         }
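
The line numbers could probably also be had straight from grep instead of an editor. A small sketch, assuming the standard grep -n option (prefix each match with its line number); the function names find_lines and show_range are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for locating a string in the driver source.

# Print "line:text" for every line of FILE ($2) containing the fixed string $1.
find_lines() {
    grep -n -F -- "$1" "$2"
}

# Print lines FROM ($1) to TO ($2) of FILE ($3), each prefixed with its line number.
show_range() {
    awk -v from="$1" -v to="$2" \
        'NR >= from && NR <= to { printf "%d %s\n", NR, $0 }' "$3"
}

# Usage, from /usr/src/e1000e-3.8.7/src:
#   find_lines "EEE TX LPI TIMER" netdev.c
#   show_range 7150 7172 netdev.c
```

This also avoids the extra cat in "cat file | grep", since grep reads the file directly.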

Any use?

Thanks in advance,

A.

Altoid · Installation

Hello:

geki wrote:

... that warning is not mine.

Did not think so.
I expect that it is from Makefile.
Because: ISO C90, mixed declarations, code, syntax?

geki wrote:

that EEE TX LPI TIMER message is actually good.
NULL means nothing active.

I thought the value 00000000 meant the timer was active and at that point in the process had reached a value of nought.
Rather misleading.
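
For what it is worth, the printed number is just the LPIC register shifted right, so 00000000 means the timer field itself reads zero. A minimal sketch of that arithmetic, assuming E1000_LPIC_LPIET_SHIFT is 24 (that value does not appear in the grep output earlier in the thread, so it should be checked against ich8lan.h); the function name lpi_timer and the sample register values are made up.

```shell
#!/bin/sh
# Assumption: E1000_LPIC_LPIET_SHIFT is 24. This value is NOT shown in the
# posts above; check ich8lan.h before relying on it.
LPIET_SHIFT=24

# Print what pr_info("... %08X") would show for a given raw LPIC register value.
lpi_timer() {
    printf '%08X\n' "$(( $1 >> LPIET_SHIFT ))"
}

lpi_timer 0x00000000   # prints 00000000
lpi_timer 0x11000000   # prints 00000011
```

So a non-zero timer would show up as a non-zero value in the message, whatever the lower bits of the register hold.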

geki wrote:

... just a debug message ...
... they did not remove yet.

I see ...
Left over from the original primary Intel driver e1000e module code?
Seems very sloppy. 8^/

Right.
If all is well up to now (I've sort of got the hang of it), booting with e1000e_3.8.7 + patches 1001-1004 and shutting down in the standard manner, then we just have to wait.

I'll post back as soon as I get something.

Thank you so very much for your help.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... another oversight.

None of that.
The only way to avoid it is to do nothing.
Not your case. ;^D

geki wrote:

And the next round of cleaned patches:
https://geki.selfhost.eu/hacks/1001-e10 … bled.patch
https://geki.selfhost.eu/hacks/1002-e10 … ages.patch
https://geki.selfhost.eu/hacks/1003-e10 … eeze.patch
https://geki.selfhost.eu/hacks/1004-e10 … ages.patch

Right.

Here we go ...

groucho@devuan:/$ pushd /usr/src/e1000e-3.8.7
/usr/src/e1000e-3.8.7 /
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

P1001 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

P1002 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

P1003 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

P1004 done.
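
The dry-run-then-apply routine above is the same thing four times over, so it could be wrapped in a small loop. This is only a sketch: the function name apply_patches is made up, and it assumes all the patches sit in one directory and that their numeric prefixes (1001, 1002, ...) give the order to apply them in.

```shell
#!/bin/sh
# Hypothetical helper (not part of the e1000e tarball): apply every *.patch
# found in PATCH_DIR ($2, absolute path) to SRC_DIR ($1), in glob order,
# which for four-digit prefixes is numeric order.
# Each patch is checked with --dry-run first; the loop stops at the first failure.
apply_patches() {
    src_dir=$1
    patch_dir=$2
    (
        cd "$src_dir" || exit 1
        for p in "$patch_dir"/*.patch; do
            [ -e "$p" ] || continue          # no patches found: nothing to do
            echo "checking $p"
            patch --dry-run -p0 -i "$p" || exit 1
            patch -p0 -i "$p" || exit 1
        done
    )
}

# Usage (as root, matching the session above):
#   apply_patches /usr/src/e1000e-3.8.7 /usr/src/e1000e-patch
```

Stopping at the first failed dry run leaves the tree in a known state, which matters here because 1003 and 1004 both touch src/netdev.c.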

Now we make:

groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  CC [M]  /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~
  CC [M]  /usr/src/e1000e-3.8.7/src/ethtool.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ich8lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/mac.o
  CC [M]  /usr/src/e1000e-3.8.7/src/nvm.o
  CC [M]  /usr/src/e1000e-3.8.7/src/phy.o
  CC [M]  /usr/src/e1000e-3.8.7/src/manage.o
  CC [M]  /usr/src/e1000e-3.8.7/src/80003es2lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/82571.o
  CC [M]  /usr/src/e1000e-3.8.7/src/param.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ptp.o
  CC [M]  /usr/src/e1000e-3.8.7/src/kcompat.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /usr/src/e1000e-3.8.7/src/e1000e.mod.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.ko
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
groucho@devuan:/usr/src/e1000e-3.8.7/src$

Looks OK:

groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo modinfo /usr/src/e1000e-3.8.7/src/e1000e.ko 
filename:       /usr/src/e1000e-3.8.7/src/e1000e.ko
version:        3.8.7-NAPI
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     E009D1772E8A46CD7637A2F
alias:          pci:v00008086d00001A1Dsv*sd*bc*sc*i*
--- snip ---
alias:          pci:v00008086d0000105Esv*sd*bc*sc*i*
depends:        
retpoline:      Y
name:           e1000e
vermagic:       4.19.0-16-amd64 SMP mod_unload modversions 
parm:           copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm:           TxIntDelay:Transmit Interrupt Delay (array of int)
parm:           TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm:           RxIntDelay:Receive Interrupt Delay (array of int)
parm:           RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm:           InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm:           IntMode:Interrupt Mode (array of int)
parm:           SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm:           KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm:           CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm:           EEE:Enable/disable on parts that support the feature (array of int)
parm:           Node:[ROUTING] Node to allocate memory on, default -1 (array of int)
parm:           debug:Debug level (0=none,...,16=all) (int)
groucho@devuan:/usr/src/e1000e-3.8.7/src$
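
One thing worth checking in that output is that vermagic matches the running kernel, otherwise the module will not load. A small sketch, with a made-up function name (vermagic_kernel), that reads modinfo output on stdin:

```shell
#!/bin/sh
# Hypothetical helper: extract the kernel release from modinfo's "vermagic:" line.
vermagic_kernel() {
    awk '$1 == "vermagic:" { print $2; exit }'
}

# Usage: compare against the running kernel before installing the module.
#   built_for=$(modinfo /usr/src/e1000e-3.8.7/src/e1000e.ko | vermagic_kernel)
#   [ "$built_for" = "$(uname -r)" ] && echo "module matches running kernel"
```

For the build above this would give 4.19.0-16-amd64, the same kernel the headers in the make output belong to.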

Let me know if all this looks right to you.

Edit:
I realised that I had seen this before.

/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~

So I installed the patched module.

groucho@devuan:~$ sudo dmesg | grep e1000e
[    2.138286] e1000e: loading out-of-tree module taints kernel.
[    2.138541] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[    2.193603] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[    2.215556] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[    2.226843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.238204] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[    2.260496] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[    2.271540] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[    2.679491] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[    2.679492] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.679510] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[   26.936094] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[   26.948223] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$

groucho@devuan:~$ sudo dmesg | grep 00:19.0
[    1.053337] pci 0000:00:19.0: [8086:10bd] type 00 class 0x020000
[    1.053353] pci 0000:00:19.0: reg 0x10: [mem 0xf5fc0000-0xf5fdffff]
[    1.053359] pci 0000:00:19.0: reg 0x14: [mem 0xf5ffe000-0xf5ffefff]
[    1.053365] pci 0000:00:19.0: reg 0x18: [io  0xac00-0xac1f]
[    1.053413] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[    2.226843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.238204] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[    2.260496] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[    2.271540] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[    2.679491] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[    2.679492] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.679510] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[   26.936094] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[   26.948223] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$
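
To avoid reading through the whole log after every boot, the last EEE state the patched module announced could be pulled out with a short filter. A sketch only: the function name eee_state is made up, and the two message strings it looks for are the debug messages visible in the dmesg output above, so it would report "unknown" with an unpatched module.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and report the last
# EEE state announced by the patched module's debug messages.
eee_state() {
    awk '
        /EEE Support .* to be enabled/  { state = "enabled" }
        /EEE Support .* to be disabled/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage:
#   sudo dmesg | eee_state
```

With the log above, the "reset to be disabled" line comes last, so the filter would print disabled.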

And from the video grab on shutdown:

I guess we now wait ....

Thanks in advance,

A.

Altoid · Installation

Hello:

geki wrote:

... added the numbering prefix to the patches.
That is the order to apply.

Right.

There's been a hitch.
Please tell me if I've missed something:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0001-e1000e_387_param_eee_be_disabled.patch
[sudo] password for groucho: 
checking file src/param.c

Patch 0001 went well, so I ran it:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0002-e1000e_387_netdev_shutdown_debug_messages_v3.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

Patch 0002 went well, so I ran it:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0002-e1000e_387_netdev_shutdown_debug_messages_v3.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0003-e1000e_387_netdev_shutdown_no_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

Patch 0003 went well, so I ran it:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0003-e1000e_387_netdev_shutdown_no_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

I then ran make and got this output:

groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  CC [M]  /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000_remove':
/usr/src/e1000e-3.8.7/src/netdev.c:8852:18: error: 'pdev' redeclared as different kind of symbol
  struct pci_dev *pdev = adapter->pdev;
                  ^~~~
/usr/src/e1000e-3.8.7/src/netdev.c:8847:42: note: previous definition of 'pdev' was here
 static void e1000_remove(struct pci_dev *pdev)
                          ~~~~~~~~~~~~~~~~^~~~
make[3]: *** [/usr/src/linux-headers-4.19.0-16-common/scripts/Makefile.build:309: /usr/src/e1000e-3.8.7/src/netdev.o] Error 1
make[2]: *** [/usr/src/linux-headers-4.19.0-16-common/Makefile:1562: _module_/usr/src/e1000e-3.8.7/src] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: *** [Makefile:146: sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
make: *** [Makefile:73: default] Error 2
groucho@devuan:/usr/src/e1000e-3.8.7/src$

I think I got it right this time.

Thanks in advance,

A.

Altoid · Installation

Hello:

geki wrote:

... faster than for my own good.

Whatever time zone that is ... 8^D

geki wrote:

Now, better naming and to have a complete list, these files may be applied:
https://geki.selfhost.eu/hacks/0001-e10 … bled.patch
https://geki.selfhost.eu/hacks/0002-e10 … s_v3.patch
https://geki.selfhost.eu/hacks/0003-e10 … eeze.patch

Right ...

Sorry if I seem dumb:

These three patches are to be applied successively, one after the other, to the original v. 3.8.7 downloaded from Intel.
Right?

So that ...
3.8.7 + P_param => 3.8.7p
3.8.7p + P_netdev => 3.8.7q
3.8.7q + P_freeze => 3.8.7r

Is this correct or have I missed something?

geki wrote:

... reminder for me:
- Patch 0001 needs default disabled globally, default enabled for EEE featured devices.
- In the end, remove all debug messages again ...

Got it.

Thanks in advance,

A.

Altoid · Installation

Hello:

geki wrote:

... overlooked one function that is called in the shutdown process... in src/netdev.c.

Well ...
Looks fine to me. 8^)

geki wrote:

... these functions involved: e1000e_close (netdev callback seen early in your screen capture), e1000_remove and e1000_shutdown (pci device callbacks).

OK.

geki wrote:

... e1000_shutdown, it seems that the call to e1000e_pm_freeze is just superfluous.
Other shutdown callbacks handle it; without that funny unprotected call to the netdev detach function.
It seems that this call can safely be removed.

So ...
I see you're still discovering e1000e fun.

geki wrote:

... do a V3 for debug messages, tomorrow.

Whenever you can.

Thanks in advance,

Best.

A.

Altoid · Installation

Hello:

geki wrote:

... not happening always ...

No.
I have never seen it happen twice in a row.
And like I mentioned, I have gone for well over a fortnight without one.

geki wrote:

... most likely an issue with concurrent access to one resource.

I see.

geki wrote:

In this case the netdev resource on shutdown.

If you say so.

We'll see when the next bad shutdown comes along.

geki wrote:

... run a newer Kernel >= 5.5.0.
Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.

Well ...

That's interesting.
I wonder if these patches will get backported to Beowulf?

BTW: I just noticed that this thread has had an unusual following.

Thank you very much for your help (and patience) in getting this sorted out.
Much obliged.

Hopefully, the next bad shutdown will give you the clues you are looking for to write a definitive patch.
And maybe send it up for it to become e1000e v3.8.8. 8^)

I'll report results the moment I get them.

Best,

A.

Altoid · Installation

Hello:

geki wrote:

... nothing more to test.

Good.

geki wrote:

It works but the confused initialization and shutdown.

Which could eventually be corrected.
eg: EEE being enabled for all hardware / SmartPowerDownEnable not doing anything but reporting that it is.

Q:
I see the e1000e: EEE TX LPI TIMER: 00000000 line is still present at shutdown.
Wasn't EEE disabled with the first patch you wrote?

geki wrote:

Let's see where it hangs in the end.

Yes, actually looking forward to a bad shutdown.
But it is absolutely unpredictable and aleatory.
Never been able to reproduce it.

Typical.
When you do want something to happen ...

geki wrote:

I wonder, if the NETIF CLOSE callback is run ...
... sporadically block each other with that stray detach call ...

If I knew what that was all about ...

Thanks in advance,

A.

Altoid · Installation

Hello:

geki wrote:

... procedures are quite similar in many business/work processes.

I think that in the end it is doing things like you are used to doing them.
Either in specific known things or adapting the learnt methods to other unknown things.
Sorry if I sound like I am channeling Rumsfeld ... 8^/

Back in early 2019, I posted to a Sourceforge Intel Ethernet Drivers and Utilities page and received this interesting answer: https://sourceforge.net/p/e1000/bugs/635/#d3ba .
Besides the tech's slip, which seems much more sincericide than lapsus linguae, 2012 marked the end of interactive support for this driver, so on Linux, and much more so on Devuan, we're on our own.

And whatever RHE does for it, be sure the source code won't be available.

You've discovered interesting/fun facts: EEE seems to be enabled by default for anything and everything using this module and enabling SmartPowerDownEnable reports itself as Enabled in spite of the hardware not supporting it.

Makes me wonder how many more fun tidbits there are lurking inside the code.

Any headway I can help you make with respect to this module will be of great advantage to the Linux community.

Is there any NIC related test you'd need for me to carry out while we wait for a bad shutdown?

Thanks in advance,

A.

Note: check previous post again - uploaded the correct tty1 screen grab.

The officially official Devuan Forum!

#1226 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-10 20:16:57

#1227 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-10 14:39:34

#1228 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-09 22:21:02

#1229 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-09 16:21:05

#1230 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-09 01:02:04

#1231 Re: Desktop and Multimedia » Thunar crashes on "Drag and Drop" » 2021-05-07 15:16:17

#1232 Re: Hardware & System Configuration » Conky + sensors temp settings ? » 2021-05-07 15:13:13

#1233 Re: Desktop and Multimedia » Thunar crashes on "Drag and Drop" » 2021-05-07 14:43:40

#1234 Re: Desktop and Multimedia » Thunar crashes on "Drag and Drop" » 2021-05-07 13:55:02

#1235 Re: Desktop and Multimedia » Thunar crashes on "Drag and Drop" » 2021-05-06 23:42:03

#1236 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-06 08:23:35

#1237 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-05 22:56:21

#1238 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-05 21:06:45

#1239 Re: Other Issues » [SOLVED] Security update delays (again) » 2021-05-05 18:54:15

#1240 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-05 10:56:45

#1241 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-04 20:25:07

#1242 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-04 11:15:43

#1243 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-04 08:00:38

#1244 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-03 21:02:17

#1245 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-03 12:54:30

#1246 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-02 22:52:08

#1247 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-02 22:14:52

#1248 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-02 21:33:00

#1249 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-02 19:06:51

#1250 Re: Installation » Linux e1000e module removal and e1000e EEE timer - Part II » 2021-05-02 11:38:31
