And remember to run a newer Kernel >= 5.5.0. Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.
Last edited by geki (2021-05-02 21:15:03)
Hello:
... not happening always ...
No.
I have never seen it happen twice in a row.
And like I mentioned, I have gone for well over a fortnight without one.
... most likely an issue with concurrent access to one resource.
I see.
In this case the netdev resource on shutdown.
If you say so.
We'll see when the next bad shutdown comes along.
... run a newer Kernel >= 5.5.0.
Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.
Well ...
That's interesting.
I wonder if these patches will get backported to Beowulf?
BTW: I just noticed that this thread has had an unusual following.
Thank you very much for your help (and patience) in getting this sorted out.
Much obliged.
Hopefully, the next bad shutdown will give you the clues you are looking for to write a definite patch.
And maybe send it up for it to become e1000e v3.8.8. 8^)
I'll report results the moment I get them.
Best,
A.
Sadly, I overlooked one function that is called in the shutdown process... in src/netdev.c. There are these functions involved: e1000e_close (netdev callback seen early in your screen capture), e1000_remove and e1000_shutdown (pci device callbacks). In e1000_shutdown, it seems that the call to e1000e_pm_freeze is just superfluous. Other shutdown callbacks handle it; without that funny unprotected call to the netdev detach function. It seems that this call can safely be removed. I will do a V3 for debug messages, tomorrow.
geki wrote:
... run a newer Kernel >= 5.5.0.
Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.
Well ...
That's interesting.
I wonder if these patches will get backported to Beowulf?
Just use the kernel from beowulf-backports and you are good.
Hello:
... overlooked one function that is called in the shutdown process... in src/netdev.c.
Well ...
Looks fine to me. 8^)
... these functions involved: e1000e_close (netdev callback seen early in your screen capture), e1000_remove and e1000_shutdown (pci device callbacks).
OK.
... e1000_shutdown, it seems that the call to e1000e_pm_freeze is just superfluous.
Other shutdown callbacks handle it; without that funny unprotected call to the netdev detach function.
It seems that this call can safely be removed.
So ...
I see you're still discovering e1000e fun.
... do a V3 for debug messages, tomorrow.
Whenever you can.
Thanks in advance,
Best.
A.
Yah, faster than for my own good.
Now, better naming and to have a complete list, these files may be applied:
https://geki.selfhost.eu/hacks/0001-e10 … bled.patch
https://geki.selfhost.eu/hacks/0002-e10 … s_v3.patch
https://geki.selfhost.eu/hacks/0003-e10 … eeze.patch
As a reminder for me:
- Patch 0001 needs default disabled globally, default enabled for EEE featured devices.
- In the end, remove all debug messages again, e heh.
Hello:
... faster than for my own good.
Whatever time zone that is ... 8^D
Now, better naming and to have a complete list, these files may be applied:
https://geki.selfhost.eu/hacks/0001-e10 … bled.patch
https://geki.selfhost.eu/hacks/0002-e10 … s_v3.patch
https://geki.selfhost.eu/hacks/0003-e10 … eeze.patch
Right ...
Sorry if I seem dumb:
These three patches are to be applied, successively one after the other to the original v. 3.8.7 downloaded from Intel.
Right?
So that ...
3.8.7 + P_param => 3.8.7p
3.8.7p + P_netdev => 3.8.7q
3.8.7q + P_freeze => 3.8.7r
Is this correct or have I missed something?
... reminder for me:
- Patch 0001 needs default disabled globally, default enabled for EEE featured devices.
- In the end, remove all debug messages again ...
Got it.
Thanks in advance,
A.
Yes, therefore, I added the numbering prefix to the patches. That is the order to apply.
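For anyone following along, the dry-run-then-apply routine for a numbered patch set can be scripted. This is only a sketch: the function name apply_patches and its two arguments are made up for illustration, and it assumes GNU patch (for --dry-run) and patches written for -p0, as used in this thread. The shell expands *.patch in sorted order, so the numeric prefixes decide the sequence.

```shell
# apply_patches SRC_DIR PATCH_DIR
# Hypothetical helper (not part of the driver or the patches): applies every
# *.patch in PATCH_DIR to SRC_DIR in filename order and stops at the first
# patch whose dry run fails.
apply_patches() {
    src_dir=$1
    patch_dir=$2
    for p in "$patch_dir"/*.patch; do
        # An unmatched glob stays literal, so check the file really exists.
        [ -f "$p" ] || { echo "no patches found in $patch_dir" >&2; return 1; }
        # Dry run first: a patch that does not apply cleanly changes nothing.
        if ! patch -d "$src_dir" -p0 --dry-run < "$p" > /dev/null; then
            echo "dry run failed, stopping at: $p" >&2
            return 1
        fi
        # Real run; patches applied earlier in the loop stay applied.
        patch -d "$src_dir" -p0 < "$p" || return 1
    done
}
```

With the paths used in this thread it would be called as apply_patches /usr/src/e1000e-3.8.7 /usr/src/e1000e-patch, with whatever privileges the source directory needs. Starting from a fresh unpack matters, because a later patch expects the earlier ones to be in place.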
Hello:
... added the numbering prefix to the patches.
That is the order to apply.
Right.
There's been a hitch.
Please tell me if I've missed something:
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0001-e1000e_387_param_eee_be_disabled.patch
[sudo] password for groucho:
checking file src/param.c
Patch 0001 went well, so I ran it:
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0002-e1000e_387_netdev_shutdown_debug_messages_v3.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
Patch 0002 went well, so I ran it:
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0002-e1000e_387_netdev_shutdown_debug_messages_v3.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0003-e1000e_387_netdev_shutdown_no_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
Patch 0003 went well, so I ran it:
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0003-e1000e_387_netdev_shutdown_no_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
I then ran make and got this output:
groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
CC [M] /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
int count = E1000_CHECK_RESET_COUNT;
^~~
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000_remove':
/usr/src/e1000e-3.8.7/src/netdev.c:8852:18: error: 'pdev' redeclared as different kind of symbol
struct pci_dev *pdev = adapter->pdev;
^~~~
/usr/src/e1000e-3.8.7/src/netdev.c:8847:42: note: previous definition of 'pdev' was here
static void e1000_remove(struct pci_dev *pdev)
~~~~~~~~~~~~~~~~^~~~
make[3]: *** [/usr/src/linux-headers-4.19.0-16-common/scripts/Makefile.build:309: /usr/src/e1000e-3.8.7/src/netdev.o] Error 1
make[2]: *** [/usr/src/linux-headers-4.19.0-16-common/Makefile:1562: _module_/usr/src/e1000e-3.8.7/src] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: *** [Makefile:146: sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
make: *** [Makefile:73: default] Error 2
groucho@devuan:/usr/src/e1000e-3.8.7/src$
I think I got it right this time.
Thanks in advance,
A.
Yes, indeed, another oversight.
And the next round of cleaned patches:
https://geki.selfhost.eu/hacks/1001-e10 … bled.patch
https://geki.selfhost.eu/hacks/1002-e10 … ages.patch
https://geki.selfhost.eu/hacks/1003-e10 … eeze.patch
https://geki.selfhost.eu/hacks/1004-e10 … ages.patch
Hello:
... another oversight.
None of that.
The only way to avoid it is to do nothing.
Not your case. ;^D
Right.
Here we go ...
groucho@devuan:/$ pushd /usr/src/e1000e-3.8.7
/usr/src/e1000e-3.8.7 /
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
P1001 done.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
P1002 done.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
P1003 done.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
P1004 done.
Now we make:
groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
CC [M] /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
int count = E1000_CHECK_RESET_COUNT;
^~~
CC [M] /usr/src/e1000e-3.8.7/src/ethtool.o
CC [M] /usr/src/e1000e-3.8.7/src/ich8lan.o
CC [M] /usr/src/e1000e-3.8.7/src/mac.o
CC [M] /usr/src/e1000e-3.8.7/src/nvm.o
CC [M] /usr/src/e1000e-3.8.7/src/phy.o
CC [M] /usr/src/e1000e-3.8.7/src/manage.o
CC [M] /usr/src/e1000e-3.8.7/src/80003es2lan.o
CC [M] /usr/src/e1000e-3.8.7/src/82571.o
CC [M] /usr/src/e1000e-3.8.7/src/param.o
CC [M] /usr/src/e1000e-3.8.7/src/ptp.o
CC [M] /usr/src/e1000e-3.8.7/src/kcompat.o
LD [M] /usr/src/e1000e-3.8.7/src/e1000e.o
Building modules, stage 2.
MODPOST 1 modules
CC /usr/src/e1000e-3.8.7/src/e1000e.mod.o
LD [M] /usr/src/e1000e-3.8.7/src/e1000e.ko
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
groucho@devuan:/usr/src/e1000e-3.8.7/src$
Looks OK:
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo modinfo /usr/src/e1000e-3.8.7/src/e1000e.ko
filename: /usr/src/e1000e-3.8.7/src/e1000e.ko
version: 3.8.7-NAPI
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: E009D1772E8A46CD7637A2F
alias: pci:v00008086d00001A1Dsv*sd*bc*sc*i*
--- snip ---
alias: pci:v00008086d0000105Esv*sd*bc*sc*i*
depends:
retpoline: Y
name: e1000e
vermagic: 4.19.0-16-amd64 SMP mod_unload modversions
parm: copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm: TxIntDelay:Transmit Interrupt Delay (array of int)
parm: TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm: RxIntDelay:Receive Interrupt Delay (array of int)
parm: RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm: InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm: IntMode:Interrupt Mode (array of int)
parm: SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm: KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm: CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm: EEE:Enable/disable on parts that support the feature (array of int)
parm: Node:[ROUTING] Node to allocate memory on, default -1 (array of int)
parm: debug:Debug level (0=none,...,16=all) (int)
groucho@devuan:/usr/src/e1000e-3.8.7/src$
Let me know if all this looks right to you.
Edit:
I realised that I had seen this before.
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
int count = E1000_CHECK_RESET_COUNT;
^~~
So I installed the patched module.
groucho@devuan:~$ sudo dmesg | grep e1000e
[ 2.138286] e1000e: loading out-of-tree module taints kernel.
[ 2.138541] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[ 2.193603] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[ 2.215556] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[ 2.226843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 2.238204] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[ 2.260496] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[ 2.271540] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[ 2.679491] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[ 2.679492] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[ 2.679510] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[ 26.936094] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 26.948223] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$
groucho@devuan:~$ sudo dmesg | grep 00:19.0
[ 1.053337] pci 0000:00:19.0: [8086:10bd] type 00 class 0x020000
[ 1.053353] pci 0000:00:19.0: reg 0x10: [mem 0xf5fc0000-0xf5fdffff]
[ 1.053359] pci 0000:00:19.0: reg 0x14: [mem 0xf5ffe000-0xf5ffefff]
[ 1.053365] pci 0000:00:19.0: reg 0x18: [io 0xac00-0xac1f]
[ 1.053413] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[ 2.226843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 2.238204] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[ 2.260496] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[ 2.271540] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[ 2.679491] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[ 2.679492] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[ 2.679510] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[ 26.936094] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 26.948223] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$
And from the video grab on shutdown:
I guess we now wait ....
Thanks in advance,
A.
Last edited by Altoid (2021-05-03 22:33:53)
JFYI, that warning is not mine. And that EEE TX LPI TIMER message is actually good. NULL means nothing active. That is just a debug message, e heh, they did not remove yet.
Hello:
... that warning is not mine.
Did not think so.
I expect that it is from Makefile.
Because: ISO C90, mixed declarations, code, syntax?
that EEE TX LPI TIMER message is actually good.
NULL means nothing active.
I thought the value 00000000 meant the timer was active and at that point in the process had reached a value of nought.
Rather misleading.
... just a debug message ...
... they did not remove yet.
I see ...
Left over from the original primary Intel driver e1000e module code?
Seems very sloppy. 8^/
Right.
If all is well up to now (sort of got the hang of it), booting with e1000e 3.8.7 plus patches 1001-1004 and shutting down in a standard manner, then we just have to wait.
I'll post back as soon as I get something.
Thank you so very much for your help.
Best,
A.
Humm, I'll have to check where that EEE TX LPI TIMER message comes from.... not from the pci shutdown callback.
Hello:
... check where that EEE TX LPI TIMER message comes from....
I looked into all the files in /src, these had references to LPI:
groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ethtool.c | grep LPI
* on whether Tx or Rx LPI indications have been received.
if (phy_data & (E1000_EEE_TX_LPI_RCVD | E1000_EEE_RX_LPI_RCVD))
edata->tx_lpi_timer = er32(LPIC) >> E1000_LPIC_LPIET_SHIFT;
e_err("Setting EEE Tx LPI timer is not supported\n");
groucho@devuan:/usr/src/e1000e-3.8.7/src$
groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ich8lan.c | grep LPI
* the link and the EEE capabilities of the link partner. The LPI Control
* EEE LPI must not be asserted earlier than one second after link is up.
* On 82579, EEE LPI should not be enabled until such time otherwise there
* can be link issues with some switches. Other devices can have EEE LPI
* prevents LPI from being asserted too early.
ret_val = e1e_rphy_locked(hw, I82579_LPI_CTRL, &lpi_ctrl);
lpi_ctrl &= ~I82579_LPI_CTRL_ENABLE_MASK;
lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
ret_val = e1000_read_emi_reg_locked(hw, I82579_LPI_PLL_SHUT,
data &= ~I82579_LPI_100_PLL_SHUT;
ret_val = e1000_write_emi_reg_locked(hw, I82579_LPI_PLL_SHUT,
/* R/Clr IEEE MMD 3.1 bits 11:10 - Tx/Rx LPI Received */
ret_val = e1e_wphy_locked(hw, I82579_LPI_CTRL, lpi_ctrl);
/* Set EEE LPI Update Timer to 200usec */
I82579_LPI_UPDATE_TIMER,
* link, and enable Auto Enable LPI since there will
* be no driver to enable LPI while in Sx.
/* Set Auto Enable LPI after link up */
I217_LPI_GPIO_CTRL, &phy_reg);
phy_reg |= I217_LPI_GPIO_CTRL_AUTO_EN_LPI;
I217_LPI_GPIO_CTRL, phy_reg);
* power good. LPI (Low Power Idle) state must also reset only
/* Set bit enable LPI (EEE) to reset only on
phy_reg |= I217_SxCTRL_ENABLE_LPI_RESET;
/* Clear Auto Enable LPI after link up */
e1e_rphy_locked(hw, I217_LPI_GPIO_CTRL, &phy_reg);
phy_reg &= ~I217_LPI_GPIO_CTRL_AUTO_EN_LPI;
e1e_wphy_locked(hw, I217_LPI_GPIO_CTRL, phy_reg);
groucho@devuan:/usr/src/e1000e-3.8.7/src$
groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ich8lan.h | grep LPI
#define I217_LPI_GPIO_CTRL PHY_REG(772, 18)
#define I217_LPI_GPIO_CTRL_AUTO_EN_LPI 0x0800
#define I82579_LPI_CTRL PHY_REG(772, 20)
#define I82579_LPI_CTRL_100_ENABLE 0x2000
#define I82579_LPI_CTRL_1000_ENABLE 0x4000
#define I82579_LPI_CTRL_ENABLE_MASK 0x6000
#define I82579_LPI_UPDATE_TIMER 0x4805 /* in 40ns units + 40 ns base value */
#define I82579_LPI_PLL_SHUT 0x4412 /* LPI PLL Shut Enable */
#define I82579_LPI_100_PLL_SHUT (1 << 2) /* 100M LPI PLL Shut Enabled */
#define E1000_EEE_RX_LPI_RCVD 0x0400 /* Tx LP idle received */
#define E1000_EEE_TX_LPI_RCVD 0x0800 /* Rx LP idle received */
#define I217_SxCTRL_ENABLE_LPI_RESET 0x1000
groucho@devuan:/usr/src/e1000e-3.8.7/src$
But this was the only one I found with the complete string:
ie: EEE TX LPI TIMER:
groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat netdev.c | grep LPI
pr_info("EEE TX LPI TIMER: %08X\n", # <--- | x |
er32(LPIC) >> E1000_LPIC_LPIET_SHIFT);
/* Ensure that the appropriate bits are set in LPI_CTRL
retval = e1e_rphy_locked(hw, I82579_LPI_CTRL,
lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
retval = e1e_wphy_locked(hw, I82579_LPI_CTRL,
groucho@devuan:/usr/src/e1000e-3.8.7/src$
Looked at with jed to get the line numbers:
7150 to 7172
7150 }
7151
7152 static void e1000e_flush_lpic(struct pci_dev *pdev)
7153 {
7154 struct net_device *netdev = pci_get_drvdata(pdev);
7155 struct e1000_adapter *adapter = netdev_priv(netdev);
7156 struct e1000_hw *hw = &adapter->hw;
7157 u32 ret_val;
7158
7159 pm_runtime_get_sync((netdev_to_dev(netdev))->parent);
7160
7161 ret_val = hw->phy.ops.acquire(hw);
7162 if (ret_val)
7163 goto fl_out;
7164
7165 pr_info("EEE TX LPI TIMER: %08X\n",
7166 er32(LPIC) >> E1000_LPIC_LPIET_SHIFT);
7167
7168 hw->phy.ops.release(hw);
7169
7170 fl_out:
7171 pm_runtime_put_sync(netdev->dev.parent);
7172 }
7526 to 7554:
7526 }
7527
7528 /* Ensure that the appropriate bits are set in LPI_CTRL
7529 * for EEE in Sx
7530 */
7531 if ((hw->phy.type >= e1000_phy_i217) &&
7532 adapter->eee_advert && hw->dev_spec.ich8lan.eee_lp_ability) {
7533 u16 lpi_ctrl = 0;
7534
7535 retval = hw->phy.ops.acquire(hw);
7536 if (!retval) {
7537 retval = e1e_rphy_locked(hw, I82579_LPI_CTRL,
7538 &lpi_ctrl);
7539 if (!retval) {
7540 if (adapter->eee_advert &
7541 hw->dev_spec.ich8lan.eee_lp_ability &
7542 I82579_EEE_100_SUPPORTED)
7543 lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
7544 if (adapter->eee_advert &
7545 hw->dev_spec.ich8lan.eee_lp_ability &
7546 I82579_EEE_1000_SUPPORTED)
7547 lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
7548
7549 retval = e1e_wphy_locked(hw, I82579_LPI_CTRL,
7550 lpi_ctrl);
7551 }
7552 }
7553 hw->phy.ops.release(hw);
7554 }
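As a side note, grep can print the line numbers itself, which saves the trip through an editor. A small sketch, where lines_of is a name invented here: -n prefixes each match with its line number and -F treats the pattern as a fixed string rather than a regular expression.

```shell
# lines_of STRING FILE
# Hypothetical helper: prints "LINE:text" for every line of FILE that
# contains STRING literally.
lines_of() {
    grep -n -F -e "$1" "$2"
}

# Example, run from the driver's src directory:
#   lines_of 'EEE TX LPI TIMER' netdev.c
```

The numbers it reports depend on which patches have already been applied to the file.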
Any use?
Thanks in advance,
A.
Last edited by Altoid (2021-05-04 12:41:53)
Humm, I got confused. It is just right, that that message pops up there.... Though, your last screenshot of shutdown did not show any of the "PCI REMOVE" debug messages. Was that screenshot taken with the very latest patched module build?
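One way to settle that kind of question is to compare the srcversion that modinfo reports for the freshly built e1000e.ko with the one for the module that modprobe picks up. A rough sketch follows; field_of is a made-up helper name, and it only parses modinfo text from standard input (modinfo -F srcversion should print the same value directly).

```shell
# field_of FIELD
# Hypothetical helper: reads `modinfo` output on stdin and prints the value
# of the first line whose label is FIELD (for example "srcversion").
field_of() {
    awk -v label="$1:" '$1 == label { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Example use on a live system (paths as used in this thread):
#   built=$(modinfo /usr/src/e1000e-3.8.7/src/e1000e.ko | field_of srcversion)
#   installed=$(modinfo e1000e | field_of srcversion)
#   [ "$built" = "$installed" ] && echo "same build" || echo "different build"
```

Note that modinfo with a bare module name describes the file installed on disk. The module actually running usually exposes its own value in /sys/module/e1000e/srcversion, so a reload or reboot is still needed before the two agree.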
Hello:
... just right, that that message pops up there ...
OK.
... last screenshot of shutdown did not show any of the "PCI REMOVE" debug messages.
I didn't compare them.
... screenshot taken with the very latest patched module build?
Yes.
Just checked.
The time stamp on the video frame is 20210503 at 19:14 local time.
The only patching that day.
The sequence starts at 291.441602 and ends at 292.649208.
Same as what I uploaded to postimages.org
To keep tabs on myself, I posted the sequence of the patching taken directly from the tty1 output and did not change the *.patch file names.
To make sure everything was truly uncontaminated, I used a freshly unpackaged content of e1000e-3.8.7.tar.gz.
Want me to check something in particular?
Thanks in advance.
A.
Last edited by Altoid (2021-05-04 22:43:52)
Then I had better use the other print function. That one hopefully prints something on PCI remove.
Hello:
... the other print function.
... hopefully prints something on PCI remove.
Right.
Let me know.
Besides that, no other news here for the time being.
Thanks a lot.
Best,
A.
Updated patch 1004. I hope I did not add a typo. Half asleep... Edit: Compiled.
https://geki.selfhost.eu/hacks/1001-e10 … bled.patch
https://geki.selfhost.eu/hacks/1002-e10 … ages.patch
https://geki.selfhost.eu/hacks/1003-e10 … eeze.patch
https://geki.selfhost.eu/hacks/1004-e10 … s_v2.patch
Last edited by geki (2021-05-05 20:58:48)
Hello:
Updated patch 1004.
Right ... 8^)
I hope I did not add a typo.
In that case, it will show up.
Half asleep...
Don't overdo it. 8^D !
Right ...
I'll get to this asap and report back as soon as I get it done.
All shutdowns normal up to now. 05052021 @ 21:03 GMT
Thanks a lot.
Best,
A.
Hello:
I'll get to this asap and report back as soon as I get it done.
Right ...
Followed the same procedure as the previous time.
ie: clean unpack
groucho@devuan:/$ pushd /usr/src/e1000e-3.8.7
/usr/src/e1000e-3.8.7 /
groucho@devuan:/usr/src/e1000e-3.8.7$
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
P1001 done.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$
P1002 done.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
Patch 1003 done
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages_v2.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
No complaints.
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages_v2.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$
Patch 1004 done.
Now we make:
groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
CC [M] /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7398:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
int count = E1000_CHECK_RESET_COUNT;
^~~
CC [M] /usr/src/e1000e-3.8.7/src/ethtool.o
CC [M] /usr/src/e1000e-3.8.7/src/ich8lan.o
CC [M] /usr/src/e1000e-3.8.7/src/mac.o
CC [M] /usr/src/e1000e-3.8.7/src/nvm.o
CC [M] /usr/src/e1000e-3.8.7/src/phy.o
CC [M] /usr/src/e1000e-3.8.7/src/manage.o
CC [M] /usr/src/e1000e-3.8.7/src/80003es2lan.o
CC [M] /usr/src/e1000e-3.8.7/src/82571.o
CC [M] /usr/src/e1000e-3.8.7/src/param.o
CC [M] /usr/src/e1000e-3.8.7/src/ptp.o
CC [M] /usr/src/e1000e-3.8.7/src/kcompat.o
LD [M] /usr/src/e1000e-3.8.7/src/e1000e.o
Building modules, stage 2.
MODPOST 1 modules
CC /usr/src/e1000e-3.8.7/src/e1000e.mod.o
LD [M] /usr/src/e1000e-3.8.7/src/e1000e.ko
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
groucho@devuan:/usr/src/e1000e-3.8.7/src$
Make done.
groucho@devuan:~$ sudo modinfo e1000e
[sudo] password for groucho:
filename: /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version: 3.8.7-NAPI
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: 689D224FDE8A2AB5AF9215A
alias: pci:v00008086d00001A1Dsv*sd*bc*sc*i*
--- snip ---
alias: pci:v00008086d0000105Esv*sd*bc*sc*i*
depends:
retpoline: Y
name: e1000e
vermagic: 4.19.0-16-amd64 SMP mod_unload modversions
parm: copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm: TxIntDelay:Transmit Interrupt Delay (array of int)
parm: TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm: RxIntDelay:Receive Interrupt Delay (array of int)
parm: RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm: InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm: IntMode:Interrupt Mode (array of int)
parm: SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm: KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm: CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm: EEE:Enable/disable on parts that support the feature (array of int)
parm: Node:[ROUTING] Node to allocate memory on, default -1 (array of int)
parm: debug:Debug level (0=none,...,16=all) (int)
groucho@devuan:~$
groucho@devuan:~$ sudo rmmod e1000e
groucho@devuan:~$ sudo modprobe -v e1000e
insmod /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko SmartPowerDownEnable=0 EEE=0
groucho@devuan:~$
Gets this in dmesg:
[16365.721275] e1000e: PCI REMOVE PTP
[16365.721279] e1000e: PCI REMOVE TIMER
[16365.721283] e1000e: PCI REMOVE CANCEL WORK SYNC
[16365.721284] e1000e: PCI REMOVE HW TIMESTAMP
[16365.721308] e1000e: NETDEV CLOSE ENTERED
[16365.721310] e1000e: NETDEV CLOSE WAIT DONE
[16365.721311] e1000e: NETDEV CLOSE DEV IS PRESENT
[16365.952439] e1000e: NETDEV CLOSE DEV IS DOWN
[16365.952452] e1000e: NETDEV CLOSE FREE IRQ
[16365.952456] e1000e 0000:00:19.0 eth0: NIC Link is Down
[16365.952458] e1000e: NETDEV CLOSE LINK DOWN MSG
[16365.952460] e1000e: NETDEV CLOSE NAPI DISABLED
[16365.952469] e1000e: NETDEV CLOSE FREE TX RES
[16365.952493] e1000e: NETDEV CLOSE FREE RX RES
[16365.952494] e1000e: NETDEV CLOSE VLAN DONE
[16365.952496] e1000e: NETDEV CLOSE HW CTRL RELEASED
[16365.952499] e1000e: NETDEV CLOSE DONE
[16365.972280] e1000e: PCI REMOVE UNREGISTER NETDEV
[16365.972285] e1000e: PCI REMOVE WAKE NO RESUME
[16365.972288] e1000e: PCI REMOVE RELEASE HW CONTROL
[16365.972322] e1000e: PCI REMOVE INT AND TX RX RING
[16365.972334] e1000e: PCI REMOVE SELECTED REGIONS
[16365.992268] e1000e: PCI REMOVE FREE NETDEV
[16365.992271] e1000e: PCI REMOVE DISABLE ERR REPORTING
[16365.992383] e1000e: PCI REMOVE DISABLE DEVICE
[16367.681610] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[16367.681615] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[16367.681843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[16367.681845] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[16367.681848] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[16367.681850] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[16367.996454] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[16367.996458] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[16367.996485] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[16371.829118] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[16371.829227] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$
And from the video grab on shutdown:
Looks like the PCI REMOVE debug messages only show up in dmesg.
I'll check again -> Confirmed, only in dmesg.
Thanks so much for your help.
A.
Last edited by Altoid (2021-05-05 23:34:29)
Then, we need the dmesg output to have a complete view.
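Since the debug markers only land in the kernel log, a quick way to see how far a close or remove sequence got is to pull out the last NETDEV CLOSE or PCI REMOVE marker. Just a sketch: last_marker is an invented name, and the marker texts are the ones the debug patches in this thread print, so they would change if the patches do.

```shell
# last_marker
# Hypothetical helper: reads kernel log lines on stdin and prints the text of
# the last e1000e "NETDEV CLOSE ..." or "PCI REMOVE ..." debug marker.
# Prints nothing when no marker is present.
last_marker() {
    grep -E 'e1000e: (NETDEV CLOSE|PCI REMOVE) ' \
        | tail -n 1 \
        | sed 's/^.*e1000e: //'
}

# Example on a live system:
#   sudo dmesg | last_marker
```

After a clean rmmod the last marker should be the final PCI REMOVE line of the sequence; anything earlier would show where the sequence stopped.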
Hello:
Then, we need the dmesg output to have a complete view.
dmesg at boot + rmmod e1000e + modprobe -v e1000e, like this?
groucho@devuan:~$ sudo dmesg | grep "e1000e\|00:19.0"
[ 0.744873] pci 0000:00:19.0: [8086:10bd] type 00 class 0x020000
[ 0.744888] pci 0000:00:19.0: reg 0x10: [mem 0xf5fc0000-0xf5fdffff]
[ 0.744894] pci 0000:00:19.0: reg 0x14: [mem 0xf5ffe000-0xf5ffefff]
[ 0.744901] pci 0000:00:19.0: reg 0x18: [io 0xac00-0xac1f]
[ 0.744948] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[ 1.804885] e1000e: loading out-of-tree module taints kernel.
[ 1.865505] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[ 2.004406] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[ 2.025277] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[ 2.042227] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 2.062117] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[ 2.072709] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[ 2.083251] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[ 2.487257] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[ 2.487259] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[ 2.487279] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[ 26.640872] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 26.653013] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[ 738.114040] e1000e: PCI REMOVE PTP # here starts rmmod e1000e
[ 738.114045] e1000e: PCI REMOVE TIMER
[ 738.114048] e1000e: PCI REMOVE CANCEL WORK SYNC
[ 738.114050] e1000e: PCI REMOVE HW TIMESTAMP
[ 738.114074] e1000e: NETDEV CLOSE ENTERED
[ 738.114076] e1000e: NETDEV CLOSE WAIT DONE
[ 738.114077] e1000e: NETDEV CLOSE DEV IS PRESENT
[ 738.344182] e1000e: NETDEV CLOSE DEV IS DOWN
[ 738.344196] e1000e: NETDEV CLOSE FREE IRQ
[ 738.344201] e1000e 0000:00:19.0 eth0: NIC Link is Down
[ 738.344203] e1000e: NETDEV CLOSE LINK DOWN MSG
[ 738.344205] e1000e: NETDEV CLOSE NAPI DISABLED
[ 738.344213] e1000e: NETDEV CLOSE FREE TX RES
[ 738.344236] e1000e: NETDEV CLOSE FREE RX RES
[ 738.344238] e1000e: NETDEV CLOSE VLAN DONE
[ 738.344240] e1000e: NETDEV CLOSE HW CTRL RELEASED
[ 738.344243] e1000e: NETDEV CLOSE DONE
[ 738.364058] e1000e: PCI REMOVE UNREGISTER NETDEV
[ 738.364062] e1000e: PCI REMOVE WAKE NO RESUME
[ 738.364065] e1000e: PCI REMOVE RELEASE HW CONTROL
[ 738.364099] e1000e: PCI REMOVE INT AND TX RX RING
[ 738.364112] e1000e: PCI REMOVE SELECTED REGIONS
[ 738.380049] e1000e: PCI REMOVE FREE NETDEV
[ 738.380052] e1000e: PCI REMOVE DISABLE ERR REPORTING
[ 738.380172] e1000e: PCI REMOVE DISABLE DEVICE # here starts modprobe -v e1000e
[ 752.604908] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[ 752.604913] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[ 752.605114] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 752.605116] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[ 752.605118] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[ 752.605119] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[ 752.924225] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[ 752.924230] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[ 752.924255] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[ 755.756888] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 755.756997] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[ 756.562154] e1000e: NETDEV CLOSE ENTERED
[ 756.562159] e1000e: NETDEV CLOSE WAIT DONE
[ 756.562161] e1000e: NETDEV CLOSE DEV IS PRESENT
[ 756.792199] e1000e: NETDEV CLOSE DEV IS DOWN
[ 756.792212] e1000e: NETDEV CLOSE FREE IRQ
[ 756.792216] e1000e 0000:00:19.0 eth0: NIC Link is Down
[ 756.792218] e1000e: NETDEV CLOSE LINK DOWN MSG
[ 756.792219] e1000e: NETDEV CLOSE NAPI DISABLED
[ 756.792227] e1000e: NETDEV CLOSE FREE TX RES
[ 756.792251] e1000e: NETDEV CLOSE FREE RX RES
[ 756.792253] e1000e: NETDEV CLOSE VLAN DONE
[ 756.792255] e1000e: NETDEV CLOSE HW CTRL RELEASED
[ 756.792258] e1000e: NETDEV CLOSE DONE
[ 757.309716] e1000e: NETDEV CLOSE ENTERED
[ 757.309721] e1000e: NETDEV CLOSE WAIT DONE
[ 757.309722] e1000e: NETDEV CLOSE DEV IS PRESENT
[ 757.540196] e1000e: NETDEV CLOSE DEV IS DOWN
[ 757.540212] e1000e: NETDEV CLOSE FREE IRQ
[ 757.540217] e1000e 0000:00:19.0 eth0: NIC Link is Down
[ 757.540218] e1000e: NETDEV CLOSE LINK DOWN MSG
[ 757.540220] e1000e: NETDEV CLOSE NAPI DISABLED
[ 757.540228] e1000e: NETDEV CLOSE FREE TX RES
[ 757.540250] e1000e: NETDEV CLOSE FREE RX RES
[ 757.540251] e1000e: NETDEV CLOSE VLAN DONE
[ 757.540253] e1000e: NETDEV CLOSE HW CTRL RELEASED
[ 757.540256] e1000e: NETDEV CLOSE DONE
[ 759.336885] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 759.336993] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$
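For anyone who wants to check a log like the one above without reading it line by line, here is a minimal Python sketch. It assumes the debug messages keep the exact wording seen above ("NETDEV CLOSE ENTERED" and "NETDEV CLOSE DONE"); the function name close_sequences is made up for illustration and is not part of the driver or of any patch in this thread.

```python
import re

# Matches lines such as "[ 738.114074] e1000e: NETDEV CLOSE ENTERED".
# Assumption: the marker strings are the debug messages visible in the log above.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+e1000e: NETDEV CLOSE (ENTERED|DONE)\b")


def close_sequences(dmesg_text):
    """Return a list of (start, end) timestamps, one per e1000e close call.

    end is None when a "NETDEV CLOSE ENTERED" line has no matching
    "NETDEV CLOSE DONE" line, i.e. the close path did not finish.
    """
    sequences = []
    start = None
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        stamp, marker = float(match.group(1)), match.group(2)
        if marker == "ENTERED":
            if start is not None:
                # A new close began before the previous one reported DONE.
                sequences.append((start, None))
            start = stamp
        elif start is not None:
            sequences.append((start, stamp))
            start = None
    if start is not None:
        sequences.append((start, None))
    return sequences
```

Feeding it the output of dmesg as one string should give one pair per close call. A pair whose second value is None would point at a close that never reached its last debug message, which is the kind of clue a bad shutdown might leave behind.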
Thanks in advance.
A.
Last edited by Altoid (2021-05-06 08:36:08)
Hello:
Update
All shutdowns have been normal up to now (05052021 @ 21:03 GMT).
That has not changed.
But I did get a bad boot which then resulted in a bad shutdown.
A bad boot is when, on starting up the system, both the CPU and case fans run at 100% and the BIOS stops with a "CPU Fan error" notice.
It asks whether you want to continue or press ESC or one of the Fx keys to abort; I can't recall which.
Sorry, no video or grab as I was obviously not ready for it.
This has not happened in a while, and although the fans running at 100% is what this bad boot has in common with the bad shutdown, I have always thought they had different causes.
Sun Microsystems had at one time diagnosed (but evidently not bothered to fix) this problem in a Sun Product Notes *.pdf for this WS (2009) where it says it can happen and why:
CPU Fan Error Might Occur After Power On
If you power on the workstation before the system enters the S3 sleep
state, a CPU fan error might occur.
It also provided a workaround which consists in accessing the Management Engine (ME) BIOS Setup utility to change the power policies.
You have to set ME "Firmware Power Control" to ON and "Host Sleep States" to ON in S0, S3.
I changed "Host Sleep States" from S1, S3 to S0, S3 but every so often the CPU Fan error came around again.
[rant]
But why would I want this?
It is basically allowing Intel ME to start up your workstation remotely.
[/rant]
So I tried to set ME "Firmware Power Control" or "Host Sleep States" to OFF, effectively disabling sleep of any type in my box.
Because ...
WTHF does a server/workstation need a damn S state different than S5 for?
As a result all havoc broke loose: on reboot the box froze at the start of the BIOS sequence, with both CPU and case fans at 100%.
I was scared shitless that my new WS was done for.
The only way out was a hard shutdown, a CMOS clear and an ME BIOS reflash.
I believe that this is closely related to the fact that it is not possible to disable the on-board GbE LAN in the BIOS. (it is greyed out)
That and the Intel e1000 driver is *always* enabling WoL no matter what settings you give it.
Which is why I had WoL set to OFF both at boot and at shutdown via a shutdown script.
In any case, "Host Sleep States" is evidently set to ON with S1 (not S0) and S3.
I repeated the attempt once more before starting the first part of this thread, with the same results.
Not as scared and more confident around the hardware than the first time I tried it, but in a sweat till I saw a working boot screen come up.
But I digress ...
Instead of aborting the boot sequence I continued to boot into Devuan, which went on without any other problem than the fans blowing continuously at 100%.
I got a copy of dmesg, checked that everything was working properly and proceeded to shut down as I usually do these days, i.e. a plain shutdown -h now, no script.
The result was another bad shutdown, like the ones I usually get.
Here's the shutdown screen:
No different than what I am getting these days with a normal shutdown.
i.e. it contains no debug data.
I will edit the shutdown script to disable WoL as I had been doing to see if there's any change in this behaviour.
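As a rough idea of what that step in the shutdown script could look like, here is a small Python sketch that wraps the usual ethtool call. It assumes the interface is eth0 (as in the dmesg output), that ethtool is installed and that the script runs as root. The function names are made up for illustration; the actual script used in this thread is not shown, and a one-line shell call to ethtool would do the same job.

```python
import subprocess


def wol_disable_command(iface):
    """Build the ethtool call that turns Wake-on-LAN off for iface."""
    # "wol d" is ethtool's setting for disabling all wake-up modes.
    return ["ethtool", "-s", iface, "wol", "d"]


def disable_wol(iface="eth0", dry_run=True):
    """Disable WoL on iface; with dry_run=True only return the command."""
    cmd = wol_disable_command(iface)
    if not dry_run:
        # Needs root and the ethtool binary; raises on a non-zero exit code.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling disable_wol("eth0", dry_run=False) just before shutdown -h now would run the command for real. The default dry run only returns the command so it can be inspected first.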
Thanks in advance,
A.