You are not logged in.
Then I better use the other print function. That one hopeefully prints something on pci remove.
Humm, I got confused. It is just right, that that message pops up there.... Though, your last screenshot of shutdown did not show any of the "PCI REMOVE" debug messages. Is that screenshot taken with the very latest patched moule build?
Humm, I'll have to check where that EEE TX LPI TIMER message comes from.... not from the pci shutdown callback.
JFYI, that warning is not mine. And that EEE TX LPI TIMER message is actually good. NULL means nothing active.
That is just a debug message, e heh, they did not remove yet.
Yes, indeed, another oversight.
And the next round of cleaned patches:
https://geki.selfhost.eu/hacks/1001-e10 … bled.patch
https://geki.selfhost.eu/hacks/1002-e10 … ages.patch
https://geki.selfhost.eu/hacks/1003-e10 … eeze.patch
https://geki.selfhost.eu/hacks/1004-e10 … ages.patch
Yes, therefore, I added the numbering prefix to the patches. That is the order to apply.
Yah, faster than for my own good.
Now, better naming and to have a complete list, these files may be applied:
https://geki.selfhost.eu/hacks/0001-e10 … bled.patch
https://geki.selfhost.eu/hacks/0002-e10 … s_v3.patch
https://geki.selfhost.eu/hacks/0003-e10 … eeze.patch
As a reminder for me:
- Patch 0001 needs default disabled globally, default enabled for EEE featured devices.
- In the end, remove all debug messages again, e heh.
geki wrote:... run a newer Kernel >= 5.5.0.
Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.Well ...
That's interesting.
I wonder if these patches will get backported to Beowulf?
Just use the kernel from beowulf-backports and you are good.
Sadly, I overlooked one function that is called in the shutdown process... in src/netdev.c. There are these functions involved: e1000e_close (netdev callback seen early in your screen capture), e1000_remove and e1000_shutdown (pci device callbacks). In e1000_shutdown, it seems that the call to e1000e_pm_freeze is just superfluous. Other shutdown callbacks handle it; without that funny unprotected call to the netdev detach function. It seems that this call can safely be removed. I will do a V3 for debug messages, tomorrow.
And remember to run a newer Kernel >= 5.5.0. Just found: https://www.spinics.net/lists/stable/msg443520.html
Another netdev resource locking issue fixed.
Well, if it is not happening always, it is most likely an issue with concurrent access to one resource. In this case the netdev resource on shutdown.
This may help, also mentioned in that mailing list thread: https://www.kernel.org/doc/htmldocs/net … -lock.html
Actually, nothing more to test. It works but the confused initialization and shutdown. Let's see where it hangs in the end. I wonder, if the NETIF CLOSE callback is run in another thread than the PM FREEZE callback. And that they sporadically block each other with that stray detach call, which is obviously wrong in that scope.
geki wrote:... failures freely translated into sketching:
Cut a region out of sketch A, throw away the rest of sketch A ...Won't work.
Unless you paste it over another piece of paper. 8^)
Yeah, I skipped some details. The nomenclature is off, too, I guess.
Though, the procedures are quite similar in many business/work processes. And I guess its common pattern to start from the (local) root. Therefore, you see in good examples usually a cd /path/to/shelf/floor/sketch/ as first command. And since we are lazy, we base all changes onto that root. Otherwise things mix up and fail. Only thing we may have to adjust is the parameter -pN of patch for some flavors like Git Patch or extracted libraries of big projects, like a detail-sketch of a part of the overview-sketch. But then you have to extract patches respectively. For me, its all the same.
Yes, in the end it applied. But before, you wanted to apply the patch being in sub directory src. That just fails. Then, you told patch explicitly which file to apply the content onto. That just fails. I did not do that in my commands. That's all.
Compare below to what you tried before. And to my commands I posted. You may see the difference.
groucho@devuan:/$ sudo patch -p0 -i /usr/src/e1000e-patch/e1000e_387_v2.patch -d /usr/src/e1000e-3.8.7q/
patching file src/netdev.c
groucho@devuan:/$
In the end it applied. Therefore we can go on.
Edit The failures freely translated into sketching:
Cut a region out of sketch A, throw away the rest of sketch A, and then want to make corrections outside of that region into sketch B. It is just not there.
Edit The (ascii-art) sketch we work with here is the content of my above-printed output of ls -R. The files and directories are the parts and regions of the sketch.
Edit Repetition:
- patch parameter -p0 -> Sketch-wide, -p1 -> move into 1. level of regions, -pN -> move into N. level of sub regions. You may notice the implication, that if you want to modify multiple regions it fails.
- patch parameter "destination file to apply content onto" tells what part in a region to correct, but you be in a wrong region or part, oups you go https://www.youtube.com/watch?v=CduA0TULnow
Yep, you mix it up. And you do not follow my commands to the letter. There is more than src to it.
die@deiwl:~/Downloads/e1000e-3.8.7$ ls -R
.:
COPYING e1000e_387_param_eee_be_disabled.patch pci.updates SUMS
e1000e_387_netdev_shutdown_debug_messages.patch e1000e.7 README
e1000e_387_netdev_shutdown_debug_messages_v2.patch e1000e.spec src
./src:
80003es2lan.c common.mk hw.h kcompat_ethtool.c mac.h Module.supported nvm.h phy.h
80003es2lan.h defines.h ich8lan.c kcompat.h Makefile netdev.c param.c ptp.c
82571.c e1000.h ich8lan.h kcompat_overflow.h manage.c netdev.c.orig param.c.orig regs.h
82571.h ethtool.c kcompat.c mac.c manage.h nvm.c phy.c
Ado
$ grep ^\+\+\+ e1000e_387_netdev_shutdown_debug_messages_v2.patch
+++ src/netdev.c 2021-04-29 23:39:38.044088043 +0200
And
$ patch -p0 -i e1000e_387_netdev_shutdown_debug_messages_v2.patch -d Downloads/e1000e-3.8.7/
Now, what file will be patched? Right:
Downloads/e1000e-3.8.7/src/netdev.c
There is no magic involved. Just get your paths right.
Would this command set be correct? ie: applying the v2 patch to the copy of the e1000e-3.7.8p module.
# cd /usr/src/e1000e-3.8.7q/src/ && patch --dry-run -p0 -i /usr/src/e1000e-patch/e1000e_387_v2.patch
Close call. Would be instead:
$ pushd /usr/src/e1000e-3.8.7q/ # <<<-- That is the root of your driver's source code ^_-_^
$ patch -p0 -i /usr/src/e1000e-patch/e1000e_387_v2.patch
$ popd
Remember, if you enter sub directories like src, the -pN param increases respectively.
And take the pieces for hunks of the patch not being within that sub directory.
Alternative:
$ patch -p0 -i /usr/src/e1000e-patch/e1000e_387_v2.patch -d /usr/src/e1000e-3.8.7q/
That rocks.
And as a note: Do not mix source code directory and the actual c code directories. Source code is more than just the .{c*,h*} files. I guess you mix these.
Your patch command makes no sense. Copy my commands. And you succeed.
Edit If you want to be so smart I recommend you not to add the file, to forcefully try to apply patchfile content onto. But rather check out patch parameter -d. E heh.
Yes, issue of either incremental or absolute.
1. Use the commands I used being in root of your source tree. See my example work-flow. That is how everyone expects it to be used.
2. Do not confuse yourself with making different copies of the source tree. Keep just vanilla being patched and apply/unapply there.
3. If you need to do it that way: p -> q obviously needs only netdev shutown v2 patch (incremental)
4. Or somewhat sane: vanilla -> ${STRANGE_VERSIONING} obviously needs all, param and netdev shutdown v2 patch (absolute)
Now, I hope I got you some more confused. Keep it simple instead.
If you copy the param patched source, the param patch must not be applied again. That is all. And again, please use v2 of the netdev shutdown patch! Otherwise false positive hang or crash may happen.
Good that you ask. I acually reworked the second patch to keep a local pdev pointer from adapter->pdev, since we are shutting down. Who knows how long adapter->pdev is alive....
Patch v2: https://geki.selfhost.eu/hacks/e1000e_3 … s_v2.patch
Yes, please apply param patch and the v2 of netdev shutdown patch. Example: unapply e1000e_387_netdev_shutdown_debug_messages.patch and apply e1000e_387_netdev_shutdown_debug_messages_v2.patch instead, if you got that v1 applied already. That one hopefully does not crash unexpectedly.... May still crash... But we get v1still some clues!
Ask again, if you need.
Yes. Please apply both patches. We will get a better result.
Well, it is okay for the most parts, but hangs or crashes in corner-cases. Suspend and shutdown is similar in the driver, same functions used. This has nothing to do with EEE actually, now.
Hokay, this is quite tricky, therefore we do print debug messages to see where it hopefully goes wrong. But you have to hang the shutdown for this again and show the last messages printed on screen....
Patch: https://geki.selfhost.eu/hacks/e1000e_3 … ages.patch
I hope, that the message PM FREEZE NETIF RUNNING will not print. That is the obvious issue also discussed in the above-mentioned link. They mentioned e1000e driver, too.
Hokay, that stray EEE TX LPI TIMER: ... comes from function e1000e_flush_lpic in src/netdev.c and seems to be a harmless print. Who dares to care?! The more fun I see in the function e1000e_pm_freeze. Both functions are called last in shutdown/reboot. Here is a discussion about the sequence: https://netdev.vger.kernel.narkive.com/ … esume-flow
present = netif_device_present(netdev);
netif_device_detach(netdev);
if (present && netif_running(netdev)) {
int count = E1000_CHECK_RESET_COUNT;
while (test_bit(__E1000_RESETTING, &adapter->state) && count--)
usleep_range(10000, 11000);
WARN_ON(test_bit(__E1000_RESETTING, &adapter->state));
/* Quiesce the device without resetting the hardware */
e1000e_down(adapter, false);
e1000_free_irq(adapter);
}
Just another dumb man's thought, should that detach not be within the if-block?
It seems you just have to wait for the counter+sleep to finish?
geki wrote:Hooray for ...
... you.
Nay, these guys: https://en.wikipedia.org/wiki/Hooray_for_Boobies
If I understand correctly, the module has has EEE set to Enabled by default on all the devices it is used on, irrespective of the hardware supporting EEE.
Yes, that is the fun - in context to the above-mentioned songs.
If this is so, how can we be sure that some routine/code within the module is not broadcasting something EEEish and causing the freeze?
eg: the autonegotiation part of the code that is needed for EEE to work.Make sense?
Yes, looking for eee and lpi used without the catch of if (!eee_disable) ....