The officially official Devuan Forum!

#51 2021-05-02 21:14:49

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

And remember to run a newer Kernel >= 5.5.0. Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed. big_smile

Last edited by geki (2021-05-02 21:15:03)

Offline

#52 2021-05-02 21:33:00

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... not happening always ...

No.
I have never seen it happen twice in a row.
And like I mentioned, I have gone for well over a fortnight without one.

geki wrote:

... most likely an issue with concurrent access to one resource.

I see.

geki wrote:

In this case the netdev resource on shutdown.

If you say so.

We'll see when the next bad shutdown comes along.

geki wrote:

... run a newer Kernel >= 5.5.0.
Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.

Well ...

That's interesting.
I wonder if these patches will get backported to Beowulf?

BTW: I just noticed that this thread has had an unusual following.

Thank you very much for your help (and patience) in getting this sorted out.
Much obliged.

Hopefully, the next bad shutdown will give you the clues you are looking for to write a definitive patch.
And maybe send it up for it to become e1000e v3.8.8.   8^)

I'll report results the moment I get them.

Best,

A.

Offline

#53 2021-05-02 22:00:49

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Sadly, I overlooked one function that is called in the shutdown process, in src/netdev.c. These functions are involved: e1000e_close (the netdev callback seen early in your screen capture), plus e1000_remove and e1000_shutdown (the PCI device callbacks). In e1000_shutdown, the call to e1000e_pm_freeze seems to be superfluous: the other shutdown callbacks already handle that work, and they do so without that funny unprotected call to the netdev detach function. It seems that this call can safely be removed. I will do a V3 for the debug messages tomorrow. big_smile

Offline

#54 2021-05-02 22:12:22

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Altoid wrote:
geki wrote:

... run a newer Kernel >= 5.5.0.
Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.

Well ...

That's interesting.
I wonder if these patches will get backported to Beowulf?

Just use the kernel from beowulf-backports and you are good. smile

Offline

#55 2021-05-02 22:14:52

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... overlooked one function that is called in the shutdown process... in src/netdev.c.

Well ...
Looks fine to me.  8^)

geki wrote:

... these functions involved: e1000e_close (netdev callback seen early in your screen capture), e1000_remove and e1000_shutdown (pci device callbacks).

OK.

geki wrote:

... e1000_shutdown, it seems that the call to e1000e_pm_freeze is just superfluous.
Other shutdown callbacks handle it; without that funny unprotected call to the netdev detach function.
It seems that this call can safely be removed.

So ...
I see you're still discovering e1000e fun.

geki wrote:

... do a V3 for debug messages, tomorrow.

Whenever you can.

Thanks in advance,

Best.

A.

Offline

#56 2021-05-02 22:39:55

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Yah, faster than for my own good. cool

Now, with better naming and as a complete list, these are the patches to apply:
https://geki.selfhost.eu/hacks/0001-e10 … bled.patch
https://geki.selfhost.eu/hacks/0002-e10 … s_v3.patch
https://geki.selfhost.eu/hacks/0003-e10 … eeze.patch

As a reminder for me:
- Patch 0001 needs default disabled globally, default enabled for EEE featured devices.
- In the end, remove all debug messages again, e heh.

Offline

#57 2021-05-02 22:52:08

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... faster than for my own good.

Whatever time zone that is ... 8^D

Right ...

Sorry if I seem dumb:

These three patches are to be applied successively, one after the other, to the original v. 3.8.7 downloaded from Intel.
Right?

So that ...
3.8.7 + P_param => 3.8.7p
3.8.7p + P_netdev => 3.8.7q
3.8.7q + P_freeze => 3.8.7r

Is this correct or have I missed something?

geki wrote:

... reminder for me:
- Patch 0001 needs default disabled globally, default enabled for EEE featured devices.
- In the end, remove all debug messages again ...

Got it.

Thanks in advance,

A.

Offline

#58 2021-05-03 05:54:29

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Yes. That is why I added the numbering prefix to the patches: it is the order in which to apply them. big_smile
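
A minimal sketch of that ordering, assuming GNU patch and patches made for -p0 as used in this thread. The helper name apply_patches_in_order is hypothetical (it is not part of the driver or of any package). The shell expands the glob in sorted order, so the numeric prefixes decide the sequence. Each patch gets a dry run before it is really applied, and the loop stops at the first failure.

```shell
#!/bin/sh
# Hypothetical helper: apply every *.patch from a directory, in name order.
# usage: apply_patches_in_order SRC_DIR PATCH_DIR
apply_patches_in_order() {
    src_dir=$1
    # Make the patch directory absolute, because we cd into SRC_DIR below.
    patch_dir=$(cd "$2" && pwd) || return 1

    for p in "$patch_dir"/*.patch; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -e "$p" ] || { echo "no patches found in $patch_dir" >&2; return 1; }

        # Dry run first: nothing is written if the patch does not fit.
        if ! (cd "$src_dir" && patch --dry-run -p0 -i "$p" >/dev/null); then
            echo "dry run failed, stopping at: $p" >&2
            return 1
        fi

        # The dry run passed, so apply the patch for real.
        (cd "$src_dir" && patch -p0 -i "$p") || return 1
    done
}
```

With the paths used in this thread, the call would look like: apply_patches_in_order /usr/src/e1000e-3.8.7 /usr/src/e1000e-patch (run by a user who may write to the source tree).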

Offline

#59 2021-05-03 12:54:30

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... added the numbering prefix to the patches.
That is the order to apply.

Right.

There's been a hitch.
Please tell me if I've missed something:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0001-e1000e_387_param_eee_be_disabled.patch
[sudo] password for groucho: 
checking file src/param.c

The dry run for patch 0001 went well, so I applied it:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0002-e1000e_387_netdev_shutdown_debug_messages_v3.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

The dry run for patch 0002 went well, so I applied it:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0002-e1000e_387_netdev_shutdown_debug_messages_v3.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/0003-e1000e_387_netdev_shutdown_no_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

The dry run for patch 0003 went well, so I applied it:

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/0003-e1000e_387_netdev_shutdown_no_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

No complaints.

I then ran make and got this output:

groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  CC [M]  /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000_remove':
/usr/src/e1000e-3.8.7/src/netdev.c:8852:18: error: 'pdev' redeclared as different kind of symbol
  struct pci_dev *pdev = adapter->pdev;
                  ^~~~
/usr/src/e1000e-3.8.7/src/netdev.c:8847:42: note: previous definition of 'pdev' was here
 static void e1000_remove(struct pci_dev *pdev)
                          ~~~~~~~~~~~~~~~~^~~~
make[3]: *** [/usr/src/linux-headers-4.19.0-16-common/scripts/Makefile.build:309: /usr/src/e1000e-3.8.7/src/netdev.o] Error 1
make[2]: *** [/usr/src/linux-headers-4.19.0-16-common/Makefile:1562: _module_/usr/src/e1000e-3.8.7/src] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: *** [Makefile:146: sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
make: *** [Makefile:73: default] Error 2
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 

I think I got it right this time.

Thanks in advance,

A.

Offline

#60 2021-05-03 20:05:01

geki
Member
Registered: 2019-02-04
Posts: 103  

Offline

#61 2021-05-03 21:02:17

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... another oversight.

None of that.
The only way to avoid it is to do nothing.
Not your case.  ;^D

Right.

Here we go ...

groucho@devuan:/$ pushd /usr/src/e1000e-3.8.7
/usr/src/e1000e-3.8.7 /
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

P1001 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

P1002 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

P1003 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

P1004 done.

Now we make:

groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  CC [M]  /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~
  CC [M]  /usr/src/e1000e-3.8.7/src/ethtool.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ich8lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/mac.o
  CC [M]  /usr/src/e1000e-3.8.7/src/nvm.o
  CC [M]  /usr/src/e1000e-3.8.7/src/phy.o
  CC [M]  /usr/src/e1000e-3.8.7/src/manage.o
  CC [M]  /usr/src/e1000e-3.8.7/src/80003es2lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/82571.o
  CC [M]  /usr/src/e1000e-3.8.7/src/param.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ptp.o
  CC [M]  /usr/src/e1000e-3.8.7/src/kcompat.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /usr/src/e1000e-3.8.7/src/e1000e.mod.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.ko
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 

Looks OK:

groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo modinfo /usr/src/e1000e-3.8.7/src/e1000e.ko 
filename:       /usr/src/e1000e-3.8.7/src/e1000e.ko
version:        3.8.7-NAPI
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     E009D1772E8A46CD7637A2F
alias:          pci:v00008086d00001A1Dsv*sd*bc*sc*i*
--- snip ---
alias:          pci:v00008086d0000105Esv*sd*bc*sc*i*
depends:        
retpoline:      Y
name:           e1000e
vermagic:       4.19.0-16-amd64 SMP mod_unload modversions 
parm:           copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm:           TxIntDelay:Transmit Interrupt Delay (array of int)
parm:           TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm:           RxIntDelay:Receive Interrupt Delay (array of int)
parm:           RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm:           InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm:           IntMode:Interrupt Mode (array of int)
parm:           SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm:           KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm:           CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm:           EEE:Enable/disable on parts that support the feature (array of int)
parm:           Node:[ROUTING] Node to allocate memory on, default -1 (array of int)
parm:           debug:Debug level (0=none,...,16=all) (int)
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 

Let me know if all this looks right to you.

Edit:
I realised that I had seen this before.

/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7413:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~

So I installed the patched module.

groucho@devuan:~$ sudo dmesg | grep e1000e
[    2.138286] e1000e: loading out-of-tree module taints kernel.
[    2.138541] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[    2.193603] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[    2.215556] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[    2.226843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.238204] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[    2.260496] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[    2.271540] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[    2.679491] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[    2.679492] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.679510] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[   26.936094] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[   26.948223] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$

groucho@devuan:~$ sudo dmesg | grep 00:19.0
[    1.053337] pci 0000:00:19.0: [8086:10bd] type 00 class 0x020000
[    1.053353] pci 0000:00:19.0: reg 0x10: [mem 0xf5fc0000-0xf5fdffff]
[    1.053359] pci 0000:00:19.0: reg 0x14: [mem 0xf5ffe000-0xf5ffefff]
[    1.053365] pci 0000:00:19.0: reg 0x18: [io  0xac00-0xac1f]
[    1.053413] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[    2.226843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.238204] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[    2.260496] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[    2.271540] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[    2.679491] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[    2.679492] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.679510] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[   26.936094] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[   26.948223] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$

And from the video grab on shutdown:

e1000e-387.png

I guess we now wait  ....

Thanks in advance,

A.

Last edited by Altoid (2021-05-03 22:33:53)

Offline

#62 2021-05-04 06:16:12

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

JFYI, that warning is not mine. big_smile And that EEE TX LPI TIMER message is actually good. NULL means nothing active. cool That is just a debug message, e heh, that they have not removed yet.

Offline

#63 2021-05-04 08:00:38

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... that warning is not mine.

Did not think so.
I expect that it is from Makefile.
Because: ISO C90, mixed declarations, code, syntax?

geki wrote:

that EEE TX LPI TIMER message is actually good.
NULL means nothing active.

I thought the value 00000000 meant the timer was active and at that point in the process had reached a value of nought.
Rather misleading.
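
To illustrate how that printed value is derived: the pr_info call in netdev.c prints er32(LPIC) >> E1000_LPIC_LPIET_SHIFT with %08X, so only the upper part of the LPIC register is shown. The sketch below redoes that arithmetic in the shell. The shift of 24 bits is an assumption (check E1000_LPIC_LPIET_SHIFT in the driver headers), and lpi_timer_field is a made-up name. It only shows the formatting; it does not settle what a zero field means.

```shell
#!/bin/sh
# Hypothetical helper mirroring: er32(LPIC) >> E1000_LPIC_LPIET_SHIFT
# usage: lpi_timer_field REGISTER_VALUE [SHIFT_BITS]
lpi_timer_field() {
    reg=$1
    shift_bits=${2:-24}   # assumed value of E1000_LPIC_LPIET_SHIFT
    # Shift the raw register value right and print it the way "%08X" does.
    printf '%08X\n' "$(( $reg >> $shift_bits ))"
}
```

Under that assumption, any register value whose top bits are clear prints as 00000000, whatever the lower bits contain.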

geki wrote:

... just a debug message ...
... they did not remove yet.

I see ...
Left over from the original primary Intel driver e1000e module code?
Seems very sloppy. 8^/

Right.
If all is well up to now (I've sort of got the hang of it), I keep booting with e1000e 3.8.7 plus patches 1001-1004 and shutting down in the standard manner, and then we just have to wait.

I'll post back as soon as I get something.

Thank you so very much for your help.

Best,

A.

Offline

#64 2021-05-04 09:11:38

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Humm, I'll have to check where that EEE TX LPI TIMER message comes from... not from the PCI shutdown callback.

Offline

#65 2021-05-04 11:15:43

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... check where  that EEE TX LPI TIMER message comes from....

I looked into all the files in /src, these had references to LPI:

groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ethtool.c | grep LPI
	 * on whether Tx or Rx LPI indications have been received.
	if (phy_data & (E1000_EEE_TX_LPI_RCVD | E1000_EEE_RX_LPI_RCVD))
	edata->tx_lpi_timer = er32(LPIC) >> E1000_LPIC_LPIET_SHIFT;
		e_err("Setting EEE Tx LPI timer is not supported\n");
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 
groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ich8lan.c | grep LPI
 *  the link and the EEE capabilities of the link partner.  The LPI Control
 *  EEE LPI must not be asserted earlier than one second after link is up.
 *  On 82579, EEE LPI should not be enabled until such time otherwise there
 *  can be link issues with some switches.  Other devices can have EEE LPI
 *  prevents LPI from being asserted too early.
	ret_val = e1e_rphy_locked(hw, I82579_LPI_CTRL, &lpi_ctrl);
	lpi_ctrl &= ~I82579_LPI_CTRL_ENABLE_MASK;
			lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
				lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
		ret_val = e1000_read_emi_reg_locked(hw, I82579_LPI_PLL_SHUT,
		data &= ~I82579_LPI_100_PLL_SHUT;
		ret_val = e1000_write_emi_reg_locked(hw, I82579_LPI_PLL_SHUT,
	/* R/Clr IEEE MMD 3.1 bits 11:10 - Tx/Rx LPI Received */
	ret_val = e1e_wphy_locked(hw, I82579_LPI_CTRL, lpi_ctrl);
		/* Set EEE LPI Update Timer to 200usec */
						     I82579_LPI_UPDATE_TIMER,
			 * link, and enable Auto Enable LPI since there will
			 * be no driver to enable LPI while in Sx.
				/* Set Auto Enable LPI after link up */
						I217_LPI_GPIO_CTRL, &phy_reg);
				phy_reg |= I217_LPI_GPIO_CTRL_AUTO_EN_LPI;
						I217_LPI_GPIO_CTRL, phy_reg);
		 * power good.  LPI (Low Power Idle) state must also reset only
			/* Set bit enable LPI (EEE) to reset only on
			phy_reg |= I217_SxCTRL_ENABLE_LPI_RESET;
		/* Clear Auto Enable LPI after link up */
		e1e_rphy_locked(hw, I217_LPI_GPIO_CTRL, &phy_reg);
		phy_reg &= ~I217_LPI_GPIO_CTRL_AUTO_EN_LPI;
		e1e_wphy_locked(hw, I217_LPI_GPIO_CTRL, phy_reg);
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 
groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat ich8lan.h | grep LPI
#define I217_LPI_GPIO_CTRL			PHY_REG(772, 18)
#define I217_LPI_GPIO_CTRL_AUTO_EN_LPI		0x0800
#define I82579_LPI_CTRL				PHY_REG(772, 20)
#define I82579_LPI_CTRL_100_ENABLE		0x2000
#define I82579_LPI_CTRL_1000_ENABLE		0x4000
#define I82579_LPI_CTRL_ENABLE_MASK		0x6000
#define I82579_LPI_UPDATE_TIMER	0x4805	/* in 40ns units + 40 ns base value */
#define I82579_LPI_PLL_SHUT		0x4412	/* LPI PLL Shut Enable */
#define I82579_LPI_100_PLL_SHUT	(1 << 2)	/* 100M LPI PLL Shut Enabled */
#define E1000_EEE_RX_LPI_RCVD	0x0400	/* Tx LP idle received */
#define E1000_EEE_TX_LPI_RCVD	0x0800	/* Rx LP idle received */
#define I217_SxCTRL_ENABLE_LPI_RESET	0x1000
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 

But this was the only one I found with the complete string:
ie: EEE TX LPI TIMER:

groucho@devuan:/usr/src/e1000e-3.8.7/src$ cat netdev.c | grep LPI
	pr_info("EEE TX LPI TIMER: %08X\n",          # <--- | x | 
		er32(LPIC) >> E1000_LPIC_LPIET_SHIFT);
	/* Ensure that the appropriate bits are set in LPI_CTRL
			retval = e1e_rphy_locked(hw, I82579_LPI_CTRL,
					lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
					lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
				retval = e1e_wphy_locked(hw, I82579_LPI_CTRL,
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 

Looked at with jed to get the line numbers:

7150 to 7172

7150 }
7151                             
7152 static void e1000e_flush_lpic(struct pci_dev *pdev)
7153 {                           
7154         struct net_device *netdev = pci_get_drvdata(pdev);
7155         struct e1000_adapter *adapter = netdev_priv(netdev);
7156         struct e1000_hw *hw = &adapter->hw;
7157         u32 ret_val;
7158 
7159         pm_runtime_get_sync((netdev_to_dev(netdev))->parent);
7160 
7161         ret_val = hw->phy.ops.acquire(hw);
7162         if (ret_val)
7163                 goto fl_out;
7164 
7165         pr_info("EEE TX LPI TIMER: %08X\n",
7166                 er32(LPIC) >> E1000_LPIC_LPIET_SHIFT);
7167 
7168         hw->phy.ops.release(hw);
7169 
7170 fl_out:
7171         pm_runtime_put_sync(netdev->dev.parent);
7172 }

7526 to 7554:

7526         }
7527 
7528         /* Ensure that the appropriate bits are set in LPI_CTRL
7529          * for EEE in Sx
7530          */
7531         if ((hw->phy.type >= e1000_phy_i217) &&
7532             adapter->eee_advert && hw->dev_spec.ich8lan.eee_lp_ability) {
7533                 u16 lpi_ctrl = 0;
7534 
7535                 retval = hw->phy.ops.acquire(hw);
7536                 if (!retval) {
7537                         retval = e1e_rphy_locked(hw, I82579_LPI_CTRL,
7538                                                  &lpi_ctrl);
7539                         if (!retval) {
7540                                 if (adapter->eee_advert &
7541                                     hw->dev_spec.ich8lan.eee_lp_ability &
7542                                     I82579_EEE_100_SUPPORTED)
7543                                         lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
7544                                 if (adapter->eee_advert &
7545                                     hw->dev_spec.ich8lan.eee_lp_ability &
7546                                     I82579_EEE_1000_SUPPORTED)
7547                                         lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
7548 
7549                                 retval = e1e_wphy_locked(hw, I82579_LPI_CTRL,
7550                                                          lpi_ctrl);
7551                         }
7552                 }
7553                 hw->phy.ops.release(hw);
7554         }
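
The line numbers can also come straight from grep, without opening an editor. A small sketch, assuming a grep that supports -H (GNU grep does); find_string_with_lines is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print file:line:text for every literal match.
# usage: find_string_with_lines STRING FILE...
find_string_with_lines() {
    pattern=$1
    shift
    # -n adds line numbers, -H always prints the file name,
    # -F treats the pattern as a fixed string rather than a regex.
    grep -n -H -F -- "$pattern" "$@"
}
```

For the search above, that would be: find_string_with_lines 'EEE TX LPI TIMER' /usr/src/e1000e-3.8.7/src/*.c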

Any use?

Thanks in advance,

A.

Last edited by Altoid (2021-05-04 12:41:53)

Offline

#66 2021-05-04 19:35:03

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Humm, I got confused. big_smile It is just right that that message pops up there... Though your last screenshot of the shutdown did not show any of the "PCI REMOVE" debug messages. Was that screenshot taken with the very latest patched module build?

Offline

#67 2021-05-04 20:25:07

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... just right, that that message pops up there ...

OK.

geki wrote:

... last screenshot of shutdown did not show any of the "PCI REMOVE" debug messages.

I didn't compare them.

geki wrote:

... screenshot taken with the very latest patched module build?

Yes.

Just checked.
The time stamp on the video frame is 20210503 at 19:14 local time.
The only patching that day.

The sequence starts at 291.441602 and ends at 292.649208.
Same as what I uploaded to postimages.org

To keep tabs on myself, I posted the sequence of the patching taken directly from the tty1 output and did not change the *.patch file names.
To make sure everything was truly uncontaminated, I used a freshly unpackaged content of e1000e-3.8.7.tar.gz.
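
One way to double-check which build is actually loaded, sketched under assumptions: modinfo -F srcversion prints the srcversion field of a .ko file, and a loaded module normally exposes the same field in /sys/module/<name>/srcversion. Both helper names are hypothetical.

```shell
#!/bin/sh
# True when both srcversion strings are non-empty and identical.
same_srcversion() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical helper: compare a built .ko against the loaded module.
# usage: check_loaded_module PATH_TO_KO MODULE_NAME
check_loaded_module() {
    built=$(modinfo -F srcversion "$1") || return 2
    loaded=$(cat "/sys/module/$2/srcversion" 2>/dev/null) || return 2
    if same_srcversion "$built" "$loaded"; then
        echo "loaded $2 matches $1 ($built)"
    else
        echo "loaded $2 ($loaded) differs from $1 ($built)"
        return 1
    fi
}
```

For example: check_loaded_module /usr/src/e1000e-3.8.7/src/e1000e.ko e1000e. The modinfo output earlier in this thread shows srcversion E009D1772E8A46CD7637A2F for the freshly built e1000e.ko, which is the value the loaded module should report.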

Want me to check something in particular?

Thanks in advance.

A.

Last edited by Altoid (2021-05-04 22:43:52)

Offline

#68 2021-05-05 06:24:21

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Then I had better use the other print function. That one hopefully prints something on PCI remove.

Offline

#69 2021-05-05 10:56:45

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

... the other print function.
... hopefully prints something on pci remove.

Right.

Let me know.
Besides that, no other news here for the time being.

Thanks a lot.
Best,

A.

Offline

#70 2021-05-05 20:48:34

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Offline

#71 2021-05-05 21:06:45

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

Updated patch 1004.

Right ...  8^)

geki wrote:

I hope I did not add a typo.

In that case, it will show up.

geki wrote:

Half asleep...

Don't overdo it.  8^D !

Right ...

I'll get to this asap and report back as soon as I get it done.
All shutdowns normal up to now. 05052021 @ 21:03 GMT

Thanks a lot.

Best,

A.

Offline

#72 2021-05-05 22:56:21

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

Altoid wrote:

I'll get to this asap and report back as soon as I get it done.

Right ...

Followed the same procedure as the previous time.
ie: clean unpack

groucho@devuan:/$ pushd /usr/src/e1000e-3.8.7
/usr/src/e1000e-3.8.7 /
groucho@devuan:/usr/src/e1000e-3.8.7$ 
groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1001-e1000e_387_param_eee_be_disabled.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

P1001 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
checking file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1002-e1000e_387_param_eee_debug_messages.patch
patching file src/param.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

P1002 done.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1003-e1000e_387_shutdown_superfluous_pm_freeze.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

Patch 1003 done

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch --dry-run -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages_v2.patch
checking file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$

No complaints.

groucho@devuan:/usr/src/e1000e-3.8.7$ sudo patch -p0 -i /usr/src/e1000e-patch/1004-e1000e_387_shutdown_debug_messages_v2.patch
patching file src/netdev.c
groucho@devuan:/usr/src/e1000e-3.8.7$ 

Patch 1004 done.

Now we make:

groucho@devuan:/usr/src/e1000e-3.8.7$ cd src
groucho@devuan:/usr/src/e1000e-3.8.7/src$ sudo make
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-common'
make[2]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
  CC [M]  /usr/src/e1000e-3.8.7/src/netdev.o
/usr/src/e1000e-3.8.7/src/netdev.c: In function 'e1000e_pm_freeze':
/usr/src/e1000e-3.8.7/src/netdev.c:7398:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   int count = E1000_CHECK_RESET_COUNT;
   ^~~
  CC [M]  /usr/src/e1000e-3.8.7/src/ethtool.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ich8lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/mac.o
  CC [M]  /usr/src/e1000e-3.8.7/src/nvm.o
  CC [M]  /usr/src/e1000e-3.8.7/src/phy.o
  CC [M]  /usr/src/e1000e-3.8.7/src/manage.o
  CC [M]  /usr/src/e1000e-3.8.7/src/80003es2lan.o
  CC [M]  /usr/src/e1000e-3.8.7/src/82571.o
  CC [M]  /usr/src/e1000e-3.8.7/src/param.o
  CC [M]  /usr/src/e1000e-3.8.7/src/ptp.o
  CC [M]  /usr/src/e1000e-3.8.7/src/kcompat.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /usr/src/e1000e-3.8.7/src/e1000e.mod.o
  LD [M]  /usr/src/e1000e-3.8.7/src/e1000e.ko
make[2]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-common'
groucho@devuan:/usr/src/e1000e-3.8.7/src$ 

Make done.

groucho@devuan:~$ sudo modinfo e1000e
[sudo] password for groucho: 
filename:       /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version:        3.8.7-NAPI
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     689D224FDE8A2AB5AF9215A
alias:          pci:v00008086d00001A1Dsv*sd*bc*sc*i*
--- snip ---
alias:          pci:v00008086d0000105Esv*sd*bc*sc*i*
depends:        
retpoline:      Y
name:           e1000e
vermagic:       4.19.0-16-amd64 SMP mod_unload modversions 
parm:           copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm:           TxIntDelay:Transmit Interrupt Delay (array of int)
parm:           TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm:           RxIntDelay:Receive Interrupt Delay (array of int)
parm:           RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm:           InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm:           IntMode:Interrupt Mode (array of int)
parm:           SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm:           KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm:           CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm:           EEE:Enable/disable on parts that support the feature (array of int)
parm:           Node:[ROUTING] Node to allocate memory on, default -1 (array of int)
parm:           debug:Debug level (0=none,...,16=all) (int)
groucho@devuan:~$ 
groucho@devuan:~$ sudo rmmod e1000e
groucho@devuan:~$ sudo modprobe -v e1000e
insmod /lib/modules/4.19.0-16-amd64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko SmartPowerDownEnable=0 EEE=0 
groucho@devuan:~$ 

Gets this in dmesg:

[16365.721275] e1000e: PCI REMOVE PTP
[16365.721279] e1000e: PCI REMOVE TIMER
[16365.721283] e1000e: PCI REMOVE CANCEL WORK SYNC
[16365.721284] e1000e: PCI REMOVE HW TIMESTAMP
[16365.721308] e1000e: NETDEV CLOSE ENTERED
[16365.721310] e1000e: NETDEV CLOSE WAIT DONE
[16365.721311] e1000e: NETDEV CLOSE DEV IS PRESENT
[16365.952439] e1000e: NETDEV CLOSE DEV IS DOWN
[16365.952452] e1000e: NETDEV CLOSE FREE IRQ
[16365.952456] e1000e 0000:00:19.0 eth0: NIC Link is Down
[16365.952458] e1000e: NETDEV CLOSE LINK DOWN MSG
[16365.952460] e1000e: NETDEV CLOSE NAPI DISABLED
[16365.952469] e1000e: NETDEV CLOSE FREE TX RES
[16365.952493] e1000e: NETDEV CLOSE FREE RX RES
[16365.952494] e1000e: NETDEV CLOSE VLAN DONE
[16365.952496] e1000e: NETDEV CLOSE HW CTRL RELEASED
[16365.952499] e1000e: NETDEV CLOSE DONE
[16365.972280] e1000e: PCI REMOVE UNREGISTER NETDEV
[16365.972285] e1000e: PCI REMOVE WAKE NO RESUME
[16365.972288] e1000e: PCI REMOVE RELEASE HW CONTROL
[16365.972322] e1000e: PCI REMOVE INT AND TX RX RING
[16365.972334] e1000e: PCI REMOVE SELECTED REGIONS
[16365.992268] e1000e: PCI REMOVE FREE NETDEV
[16365.992271] e1000e: PCI REMOVE DISABLE ERR REPORTING
[16365.992383] e1000e: PCI REMOVE DISABLE DEVICE
[16367.681610] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[16367.681615] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[16367.681843] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[16367.681845] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[16367.681848] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[16367.681850] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[16367.996454] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[16367.996458] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[16367.996485] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[16371.829118] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[16371.829227] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$ 
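
To check a log like this without reading it line by line, here is a minimal sketch that counts how often the close path was entered and how often it ran to completion. It assumes the debug strings are exactly the ones shown above (NETDEV CLOSE ENTERED and NETDEV CLOSE DONE); the function name close_balance is made up for illustration and is not part of the driver or of any tool.

```shell
#!/bin/sh
# Count how many times the e1000e close path was entered and how many
# times it finished, reading dmesg-style lines from stdin.
# Assumption: the marker strings match the debug messages in this thread.
close_balance() {
    awk '
        /NETDEV CLOSE ENTERED/ { entered++ }
        /NETDEV CLOSE DONE/    { finished++ }
        END { printf "entered=%d done=%d\n", entered, finished }
    '
}
```

Something like `sudo dmesg | close_balance` should then print equal counts if every close completed; a higher "entered" count would suggest a close that never returned.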

And from the video grab on shutdown:

3-8-7-NAPI-050521.png

Looks like the PCI REMOVE debug messages only show up in dmesg.
I'll check again -> Confirmed, only in dmesg.

Thanks so much for your help.

A.

Last edited by Altoid (2021-05-05 23:34:29)

Offline

#73 2021-05-06 06:25:22

geki
Member
Registered: 2019-02-04
Posts: 103  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Then, we need the dmesg output to have a complete view.

Offline

#74 2021-05-06 08:23:35

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

geki wrote:

Then, we need the dmesg output to have a complete view.

dmesg at boot + rmmod e1000e + modprobe -v e1000e, like this?

groucho@devuan:~$ sudo dmesg | grep "e1000e\|00:19.0"
[    0.744873] pci 0000:00:19.0: [8086:10bd] type 00 class 0x020000
[    0.744888] pci 0000:00:19.0: reg 0x10: [mem 0xf5fc0000-0xf5fdffff]
[    0.744894] pci 0000:00:19.0: reg 0x14: [mem 0xf5ffe000-0xf5ffefff]
[    0.744901] pci 0000:00:19.0: reg 0x18: [io  0xac00-0xac1f]
[    0.744948] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[    1.804885] e1000e: loading out-of-tree module taints kernel.
[    1.865505] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[    2.004406] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[    2.025277] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[    2.042227] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.062117] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[    2.072709] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[    2.083251] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[    2.487257] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[    2.487259] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.487279] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[   26.640872] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[   26.653013] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[  738.114040] e1000e: PCI REMOVE PTP                            # here starts rmmod e1000e
[  738.114045] e1000e: PCI REMOVE TIMER
[  738.114048] e1000e: PCI REMOVE CANCEL WORK SYNC
[  738.114050] e1000e: PCI REMOVE HW TIMESTAMP
[  738.114074] e1000e: NETDEV CLOSE ENTERED
[  738.114076] e1000e: NETDEV CLOSE WAIT DONE
[  738.114077] e1000e: NETDEV CLOSE DEV IS PRESENT
[  738.344182] e1000e: NETDEV CLOSE DEV IS DOWN
[  738.344196] e1000e: NETDEV CLOSE FREE IRQ
[  738.344201] e1000e 0000:00:19.0 eth0: NIC Link is Down
[  738.344203] e1000e: NETDEV CLOSE LINK DOWN MSG
[  738.344205] e1000e: NETDEV CLOSE NAPI DISABLED
[  738.344213] e1000e: NETDEV CLOSE FREE TX RES
[  738.344236] e1000e: NETDEV CLOSE FREE RX RES
[  738.344238] e1000e: NETDEV CLOSE VLAN DONE
[  738.344240] e1000e: NETDEV CLOSE HW CTRL RELEASED
[  738.344243] e1000e: NETDEV CLOSE DONE
[  738.364058] e1000e: PCI REMOVE UNREGISTER NETDEV
[  738.364062] e1000e: PCI REMOVE WAKE NO RESUME
[  738.364065] e1000e: PCI REMOVE RELEASE HW CONTROL
[  738.364099] e1000e: PCI REMOVE INT AND TX RX RING
[  738.364112] e1000e: PCI REMOVE SELECTED REGIONS
[  738.380049] e1000e: PCI REMOVE FREE NETDEV
[  738.380052] e1000e: PCI REMOVE DISABLE ERR REPORTING
[  738.380172] e1000e: PCI REMOVE DISABLE DEVICE                 # here starts modprobe -v e1000e 
[  752.604908] e1000e: Intel(R) PRO/1000 Network Driver - 3.8.7-NAPI
[  752.604913] e1000e: Copyright(c) 1999 - 2020 Intel Corporation.
[  752.605114] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[  752.605116] e1000e 0000:00:19.0: PHY Smart Power Down Disabled
[  752.605118] e1000e 0000:00:19.0: EEE Support was initialized to be enabled
[  752.605119] e1000e 0000:00:19.0: EEE Support has been reset to be disabled
[  752.924225] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:14:4f:4a:a2:81
[  752.924230] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[  752.924255] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 6, PBA No: FFFFFF-0FF
[  755.756888] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[  755.756997] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[  756.562154] e1000e: NETDEV CLOSE ENTERED
[  756.562159] e1000e: NETDEV CLOSE WAIT DONE
[  756.562161] e1000e: NETDEV CLOSE DEV IS PRESENT
[  756.792199] e1000e: NETDEV CLOSE DEV IS DOWN
[  756.792212] e1000e: NETDEV CLOSE FREE IRQ
[  756.792216] e1000e 0000:00:19.0 eth0: NIC Link is Down
[  756.792218] e1000e: NETDEV CLOSE LINK DOWN MSG
[  756.792219] e1000e: NETDEV CLOSE NAPI DISABLED
[  756.792227] e1000e: NETDEV CLOSE FREE TX RES
[  756.792251] e1000e: NETDEV CLOSE FREE RX RES
[  756.792253] e1000e: NETDEV CLOSE VLAN DONE
[  756.792255] e1000e: NETDEV CLOSE HW CTRL RELEASED
[  756.792258] e1000e: NETDEV CLOSE DONE
[  757.309716] e1000e: NETDEV CLOSE ENTERED
[  757.309721] e1000e: NETDEV CLOSE WAIT DONE
[  757.309722] e1000e: NETDEV CLOSE DEV IS PRESENT
[  757.540196] e1000e: NETDEV CLOSE DEV IS DOWN
[  757.540212] e1000e: NETDEV CLOSE FREE IRQ
[  757.540217] e1000e 0000:00:19.0 eth0: NIC Link is Down
[  757.540218] e1000e: NETDEV CLOSE LINK DOWN MSG
[  757.540220] e1000e: NETDEV CLOSE NAPI DISABLED
[  757.540228] e1000e: NETDEV CLOSE FREE TX RES
[  757.540250] e1000e: NETDEV CLOSE FREE RX RES
[  757.540251] e1000e: NETDEV CLOSE VLAN DONE
[  757.540253] e1000e: NETDEV CLOSE HW CTRL RELEASED
[  757.540256] e1000e: NETDEV CLOSE DONE
[  759.336885] e1000e 0000:00:19.0 eth0: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[  759.336993] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
groucho@devuan:~$ 
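
The timestamps above show roughly 230 ms passing between DEV IS PRESENT and DEV IS DOWN on every close. A rough sketch to list such gaps automatically follows. It assumes the usual dmesg prefix of the form [seconds.microseconds]; the function name dmesg_gaps and the 100 ms default threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Read dmesg-style lines from stdin and report every line that arrived
# at least LIMIT milliseconds after the previous one (default: 100 ms).
# Assumption: each line starts with a "[seconds.fraction]" timestamp.
dmesg_gaps() {
    awk -v limit="${1:-100}" '
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Timestamp without the brackets, forced to a number
            ts  = substr($0, RSTART + 1, RLENGTH - 2) + 0
            # Message text after the closing bracket and one space
            msg = substr($0, RSTART + RLENGTH + 1)
            if (seen) {
                gap = (ts - prev) * 1000
                if (gap >= limit) printf "%.0f ms before: %s\n", gap, msg
            }
            prev = ts
            seen = 1
        }
    '
}
```

Usage would be along the lines of `sudo dmesg | grep e1000e | dmesg_gaps 100`. It only measures time between logged lines, so it says nothing about what the driver was doing during the gap.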

Thanks in advance.

A.

Last edited by Altoid (2021-05-06 08:36:08)

Offline

#75 2021-05-09 01:02:04

Altoid
Member
Registered: 2017-05-07
Posts: 1,415  

Re: Linux e1000e module removal and e1000e EEE timer - Part II

Hello:

Update

Altoid wrote:

All shutdowns normal up to now. 05052021@21:03 GMT

That has not changed.
But I did get a bad boot which then resulted in a bad shutdown.

A bad boot is when, on starting up the system, both the CPU and case fans start to run at 100% and the BIOS stops with a "CPU Fan error" notice.
It asks whether you want to continue or press ESC or an Fx key to abort; I can't recall which.

Sorry, no video or grab as I was obviously not ready for it.

This has not happened in a while and, although the fans running at 100% is what this bad boot has in common with the bad shutdown, I have always thought they had different causes.
Sun Microsystems at one time diagnosed this problem (but evidently did not bother to fix it) in a Sun Product Notes *.pdf for this WS (2009), which says it can happen and why:

Sun MS wrote:

CPU Fan Error Might Occur After Power On
If you power on the workstation before the system enters the S3 sleep
state, a CPU fan error might occur.

It also provided a workaround, which consists of accessing the Management Engine (ME) BIOS Setup utility to change the power policies.
You have to set ME "Firmware Power Control" to ON and "Host Sleep States" to ON in S0, S3.
I changed "Host Sleep States" from S1, S3 to S0, S3 but every so often the CPU Fan error came around again.

[rant]
But why would I want this?
It is basically allowing Intel ME to start up your workstation remotely.
[/rant]

So I tried to set ME "Firmware Power Control" or "Host Sleep States" to OFF, effectively disabling sleep of any type in my box.
Because ...
WTHF does a server/workstation need a damn S state different than S5 for? 

As a result all havoc broke loose: on reboot the box froze at the start of the BIOS sequence, with both CPU and case fans at 100%.
I was scared shitless that my new WS was done for.
The only way out was a hard shutdown, a CMOS clear and an ME BIOS reflash.

I believe that this is closely related to the fact that it is not possible to disable the on-board GbE LAN in the BIOS (the option is greyed out).
That, and the Intel e1000 driver *always* enables WoL no matter what settings you give it.
Which is why I had WoL set to OFF both at boot and at shutdown, the latter via a shutdown script.

In any case, "Host Sleep States" is evidently set to ON with S1 (not S0) and S3.

I tried this once again before starting the first part of this thread, with the same results.
I was less scared and more confident around the hardware than the first time I tried it, but still in a sweat till I saw a working boot screen come up.

But I digress ...

Instead of aborting the boot sequence I continued to boot into Devuan, which went on without any problem other than the fans blowing continuously at 100%.

I got a copy of dmesg, checked that everything was working properly and proceeded to shut down as I am usually doing these days, i.e. plain shutdown -h now, no script.

The result was another bad shutdown, like the ones I usually get.
Here's the shutdown screen:

bad-boot.png

No different from what I am getting these days with a normal shutdown.
i.e. it contains no debug data.

I will edit the shutdown script to disable WoL as I had been doing to see if there's any change in this behaviour.
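
For reference, a minimal sketch of what the WoL part of such a shutdown script might look like. It assumes ethtool is installed and that the interface is eth0, as in the dmesg output earlier in this thread; the function names wol_state and disable_wol are made up for illustration, and this is not the actual script used on this box.

```shell
#!/bin/sh
# Interface name is an assumption taken from the dmesg output (eth0).
IFACE="${IFACE:-eth0}"

# Extract the current "Wake-on:" value from `ethtool IFACE` output on stdin.
# The "Supports Wake-on:" line is skipped because its first field differs.
wol_state() {
    awk '$1 == "Wake-on:" { print $2 }'
}

# Disable Wake-on-LAN unless it is already reported as disabled ("d").
# Must run as root; call this from the shutdown script.
disable_wol() {
    state=$(ethtool "$IFACE" | wol_state)
    if [ "$state" != "d" ]; then
        ethtool -s "$IFACE" wol d
    fi
}
```

Calling `disable_wol` and then running `ethtool eth0 | wol_state` again would show whether the setting stuck. Whether the driver re-enables WoL later in the shutdown sequence is exactly what remains to be seen.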

Thanks in advance,

A.

Offline
