The officially official Devuan Forum!

#1 2018-10-14 23:18:42

ghaverla
Member
From: Dawson Creek, BC, Canada
Registered: 2017-06-19
Posts: 111  

Ryzen idle lockup

Greetings.

I now have two computers running Ryzen processors: a 1600X and a 1600.  I don't believe the 1600X has this CPU hardware bug, but the 1600 does.  Just running BOINC jobs, the system can lock up even with only 6 active threads (6 of its 12 CPU threads).

My BIOS is new enough that I can set the Power Idle State (or however it is described) to something like "typical".  There are two newer BIOS versions available that I haven't installed (I think I have BIOS 4.6 installed, as it has NVMe support).  This machine has an ASRock motherboard.

A while ago I booted with idle=nomwait, and after a couple of days it locked up (running 6 BOINC jobs).  Both before and after that, I booted with rcu_nocbs=0-11.  For Devuan kernels, I believe that parameter is ignored because some CONFIG_RCU option isn't set.  Debian has a bug report, whose last entry I believe is from February 2018, asking for kernels to be compiled with that config option set; there has been no response.
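A quick way to see whether a given kernel can honour rcu_nocbs= at all is to look at its build config.  The parameter is only recognised when the kernel is built with CONFIG_RCU_NOCB_CPU=y.  The sketch below assumes the Debian-style layout where the config ships as /boot/config-<release>; the function names are hypothetical.

```python
import os
import platform


def rcu_nocb_enabled(config_text):
    """Return True if the kernel config text contains CONFIG_RCU_NOCB_CPU=y."""
    return any(line.strip() == "CONFIG_RCU_NOCB_CPU=y" for line in config_text.splitlines())


def running_kernel_config_path():
    # Assumption: Debian/Devuan kernel packages install the config as /boot/config-$(uname -r).
    return os.path.join("/boot", "config-" + platform.release())


if __name__ == "__main__":
    path = running_kernel_config_path()
    try:
        with open(path) as f:
            text = f.read()
    except OSError as err:
        print(f"could not read {path}: {err}")
    else:
        if rcu_nocb_enabled(text):
            print("CONFIG_RCU_NOCB_CPU=y: rcu_nocbs= should be honoured")
        else:
            print("CONFIG_RCU_NOCB_CPU not set: rcu_nocbs= is most likely ignored")
```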

I just rebooted with
rcu_nocbs=0-11 processor.max_cstate=5 idle=nomwait
and reduced the BOINC jobs to 3.
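One way to cap the number of BOINC jobs is BOINC's "use at most N% of the CPUs" preference, which can be set locally through a global_prefs_override.xml file containing max_ncpus_pct.  The sketch below just computes that percentage and prints the file.  It assumes the client data directory is /var/lib/boinc-client (as in the Debian package) and that the client is then told to reload with `boinccmd --read_global_prefs_override`; the helper names are hypothetical, and rounding up is a precaution in case the client truncates.

```python
def max_ncpus_pct(jobs, hw_threads):
    """Smallest whole percentage of hw_threads that still allows `jobs` tasks."""
    if not 0 < jobs <= hw_threads:
        raise ValueError("jobs must be between 1 and hw_threads")
    # Ceiling division: round up in case the client truncates hw_threads * pct / 100.
    return -(-100 * jobs // hw_threads)


def override_xml(pct):
    """Text for global_prefs_override.xml limiting BOINC to pct percent of the CPUs."""
    return (
        "<global_preferences>\n"
        f"   <max_ncpus_pct>{pct}</max_ncpus_pct>\n"
        "</global_preferences>\n"
    )


if __name__ == "__main__":
    # 3 jobs on a 12-thread Ryzen 5 1600 -> 25 percent.
    print(override_xml(max_ncpus_pct(3, 12)), end="")
```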

All this computer does is run BOINC jobs, so nothing important.  If people at Devuan are interested in this bug and want me to try things, I can do that.  I believe it is currently running the 4.18.0-2 kernel package (the most current one).  There are rumours that the 4.19 kernel may have some fixes for this bug.  There were also rumours, in the March to May 2018 timeframe, that AMD had released BIOS updates which cover this.

I did install the zenstates.py program from github, but so far I have only used it to list things.

#2 2018-10-15 14:36:25

Centurion Dan
Member
Registered: 2016-12-06
Posts: 9  

Re: Ryzen idle lockup

@ghaverla, the obvious thing to do is update the BIOS to the latest version and let us know if the issue persists.

#3 2018-10-16 15:55:51

ghaverla
Member
From: Dawson Creek, BC, Canada
Registered: 2017-06-19
Posts: 111  

Re: Ryzen idle lockup

I had read about people having filesystem issues when these idle CPUs just go to sleep, but I hadn't experienced any.  Yesterday, this machine froze with only 3 BOINC jobs running.

So this morning I installed the latest (4.80) BIOS and went looking for the power supply idle control setting, to change it to "typical" (or common).  When I rebooted from the disks in the system, the kernel panicked.  As rEFInd is my bootloader on this machine, I put a rEFInd DVD in the machine, and that was able to boot my root partition.  I may have some filesystem repair to do.  But the machine is again busy doing almost nothing, to see if this newer BIOS fixes the idle freeze problem.

To monitor this, I ssh -X into that machine and run xsensors in the background.  When the machine locks up, the xsensors display on this machine either dies or goes strange.
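A possible alternative to watching an xsensors window is to poll the machine from another host and log when it stops answering.  This is a minimal sketch, assuming key-based ssh login to a host with the hypothetical name "ryzen-box"; host_answers and find_gaps are illustrative helpers, and find_gaps is meant for checking a saved list of successful-poll times afterwards.

```python
import subprocess
import time


def host_answers(host, timeout=15):
    """True if `ssh host true` completes successfully within `timeout` seconds."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "true"],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def find_gaps(timestamps, max_interval):
    """Return (before, after) pairs of consecutive heartbeats more than max_interval apart."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > max_interval]


if __name__ == "__main__":
    host = "ryzen-box"  # hypothetical host name of the BOINC machine
    up = True
    while True:
        ok = host_answers(host)
        if ok != up:  # only log state changes
            state = "answering again" if ok else "NOT answering (frozen?)"
            print(time.strftime("%Y-%m-%d %H:%M:%S"), host, state, flush=True)
            up = ok
        time.sleep(60)
```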

#4 2018-10-16 20:00:34

ghaverla
Member
From: Dawson Creek, BC, Canada
Registered: 2017-06-19
Posts: 111  

Re: Ryzen idle lockup

I guess there was no "real" problem with the filesystems.  The only thing that seems to have happened is that the "dirty bit" was set on the VFAT filesystem mounted at /boot/efi.

After clearing that dirty bit, it boots now.
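For anyone curious where that dirty bit lives: as I understand dosfstools, fsck.fat treats bit 0 of a boot-sector byte as the "not cleanly unmounted" flag, at offset 0x41 on FAT32 and 0x25 on FAT12/16.  The read-only sketch below checks that bit under those assumptions; actually clearing it is still a job for fsck.fat on the unmounted partition.  The function names are hypothetical.

```python
import struct
import sys

FAT_STATE_DIRTY = 0x01  # bit 0 of the state byte (assumed to match dosfstools)


def fat_state_offset(boot_sector):
    """Offset of the state byte: 0x41 on FAT32, 0x25 on FAT12/16."""
    # BPB_FATSz16 (offset 22, little-endian u16) is zero on FAT32 volumes.
    (fat_sz16,) = struct.unpack_from("<H", boot_sector, 22)
    return 0x41 if fat_sz16 == 0 else 0x25


def is_dirty(boot_sector):
    """True if the dirty flag is set in a 512-byte FAT boot sector."""
    return bool(boot_sector[fat_state_offset(boot_sector)] & FAT_STATE_DIRTY)


if __name__ == "__main__":
    dev = sys.argv[1]  # e.g. the partition mounted at /boot/efi (needs read access)
    with open(dev, "rb") as f:
        sector = f.read(512)
    print(dev, "dirty" if is_dirty(sector) else "clean")
```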

However, that new BIOS doesn't fix the problem, at least not without kernel command-line arguments.  I again added

rcu_nocbs=0-11 processor.max_cstate=5 idle=nomwait

to the kernel command line.  We'll see if that helps with the freezing in idle.
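Since /proc/cmdline shows the command line the running kernel was actually booted with, it is an easy place to confirm that the bootloader really passed the workaround parameters.  A minimal sketch; the WANTED values are the ones from this thread, the helper names are hypothetical, and quoted values containing spaces are not handled by the simple split.

```python
WANTED = {"rcu_nocbs": "0-11", "processor.max_cstate": "5", "idle": "nomwait"}


def cmdline_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing(cmdline, wanted=WANTED):
    """Return the wanted parameters that are absent or have a different value."""
    params = cmdline_params(cmdline)
    return {k: v for k, v in wanted.items() if params.get(k) != v}


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(missing(f.read()) or "all workaround parameters present")
```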

#5 2018-10-17 13:46:43

ghaverla
Member
From: Dawson Creek, BC, Canada
Registered: 2017-06-19
Posts: 111  

Re: Ryzen idle lockup

It still freezes in idle.

The 4.19 kernel is rumoured to have some chance of fixing this, but I've also seen articles say similar things about earlier kernels.  Wishful thinking?  I've also seen articles describing overclocking as another way to avoid this idle freeze, but I have never gotten into overclocking.  Are there any good documents that describe the various C-states, P-states, and overclocking?
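On the C-state side, the kernel lists the idle states it uses for each CPU under /sys/devices/system/cpu/cpuN/cpuidle/stateM/, with files such as name, latency (exit latency in microseconds) and disable; writing 1 to a state's disable file as root turns that state off at runtime.  The sketch below only lists them; state names differ between idle drivers, and idle_states is a hypothetical helper.

```python
import os


def idle_states(cpu_dir):
    """Return (entry, name, latency_us, disabled) for each cpuidle state of one CPU."""
    base = os.path.join(cpu_dir, "cpuidle")
    entries = [e for e in os.listdir(base) if e.startswith("state")]
    states = []
    # Sort numerically so state10 comes after state9.
    for entry in sorted(entries, key=lambda e: int(e[len("state"):])):
        path = os.path.join(base, entry)

        def read(field):
            with open(os.path.join(path, field)) as f:
                return f.read().strip()

        states.append((entry, read("name"), int(read("latency")), read("disable") == "1"))
    return states


if __name__ == "__main__":
    for entry, name, latency, disabled in idle_states("/sys/devices/system/cpu/cpu0"):
        print(f"{entry}: {name:8} exit latency {latency} us{' (disabled)' if disabled else ''}")
```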

#6 2018-10-29 02:24:59

ghaverla
Member
From: Dawson Creek, BC, Canada
Registered: 2017-06-19
Posts: 111  

Re: Ryzen idle lockup

I restarted 7 BOINC jobs with the kernel command-line workarounds in place, and after about 3 days it found an idle period long enough to freeze.

I restarted without getting the kernel command-line changes in, so instead I used that zenstates program to disable C6.  The power went out before it crashed.  I guess I forgot to tell the power company that I was going to do this test.  :-)
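For reference, my reading of zenstates.py is that its --c6-disable option clears bit 32 of MSR 0xC0010292 (package C6) and bits 22, 14 and 6 of MSR 0xC0010296 (core C6) through /dev/cpu/N/msr.  Treat those addresses as assumptions to check against the script itself.  MSR writes do not survive a reboot, so this has to be redone on every boot.  The sketch below shows the bit arithmetic and only reads the current state; it needs root and the msr module loaded.

```python
import glob
import os
import struct

PKG_C6_MSR = 0xC0010292   # assumption: as used by zenstates.py
CORE_C6_MSR = 0xC0010296  # assumption: as used by zenstates.py
PKG_C6_BIT = 1 << 32
CORE_C6_BITS = (1 << 22) | (1 << 14) | (1 << 6)


def without_c6(pkg_value, core_value):
    """Return both MSR values with the package and core C6 enable bits cleared."""
    return pkg_value & ~PKG_C6_BIT, core_value & ~CORE_C6_BITS


def read_msr(cpu_dev, msr):
    """Read one 64-bit MSR; the msr driver uses the file offset as the MSR number."""
    fd = os.open(cpu_dev, os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Read-only report; without_c6() shows the values a disabling tool would write.
    for dev in sorted(glob.glob("/dev/cpu/*/msr")):
        pkg, core = read_msr(dev, PKG_C6_MSR), read_msr(dev, CORE_C6_MSR)
        print(dev, "package C6:", bool(pkg & PKG_C6_BIT), "core C6:", bool(core & CORE_C6_BITS))
```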

I think I will wait for 4.19 before I start this again.

#7 2019-03-25 04:13:02

ghaverla
Member
From: Dawson Creek, BC, Canada
Registered: 2017-06-19
Posts: 111  

Re: Ryzen idle lockup

Too many posts under my name at the top of the list.

As it appears that the original ASRock motherboard was at least partially at fault for why the Ryzen 1600 would not work, it seems possible that that CPU is not a dud.

---

I picked up an A320 motherboard on sale.  If this Ryzen 1600 works, that's wonderful.  But I doubt the application requires a 12-thread CPU.

Long ago, I heard of the stealth logserver.

I suppose the original stealth logserver was a printer attached to the console, and it just printed everything that happened.  If someone broke in, they could do nothing about the log entries already printed.  Never mind the kernel's "printer on fire" message, which was always a "joke".

But the stealth logserver I remember from way back when was a computer on a LAN where all the non-stealth computers were configured to send their logs to a public logserver, which logged everything.  But also on the LAN was a computer whose NIC had no assigned IP address and was in promiscuous mode, so it could also log everything.  As it had no IP address, there was no (easy) way to get to it to erase the logs.
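The promiscuous, address-less listener described above can be sketched with a Linux AF_PACKET raw socket that picks out syslog datagrams (UDP port 514) from every frame the NIC sees.  This is a minimal illustration, assuming plain untagged Ethernet, unfragmented IPv4 and a hypothetical interface name "eth1" brought up with no address (for example `ip link set eth1 up promisc on`).  On a switched LAN it only sees the logserver's traffic if it sits on a mirror port or hub.

```python
import socket
import struct

ETH_P_ALL = 0x0003
SYSLOG_PORT = 514


def syslog_payload(frame):
    """Return the UDP payload if `frame` is Ethernet/IPv4/UDP to port 514, else None."""
    if len(frame) < 14 or frame[12:14] != b"\x08\x00":  # EtherType must be IPv4
        return None
    ip = frame[14:]
    if len(ip) < 20 or ip[0] >> 4 != 4 or ip[9] != 17:  # IPv4 header carrying UDP
        return None
    udp = ip[(ip[0] & 0x0F) * 4:]  # skip the IPv4 header (IHL is in 32-bit words)
    if len(udp) < 8:
        return None
    _src, dst, length, _csum = struct.unpack("!HHHH", udp[:8])
    if dst != SYSLOG_PORT:
        return None
    return udp[8:length]  # UDP length includes the 8-byte header; drops Ethernet padding


if __name__ == "__main__":
    # Linux only, needs root.  "eth1" is a hypothetical interface name.
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
    sock.bind(("eth1", 0))
    while True:
        msg = syslog_payload(sock.recv(65535))
        if msg is not None:
            print(msg.decode(errors="replace"), flush=True)
```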

More recently, writeups about stealth logservers don't talk about a stealth (promiscuous) machine paralleling the logs.  Instead, all the machines on the LAN are configured to send log messages to a machine which doesn't exist.

Is it better that the stealth logserver is paralleling a real logserver, or that the config of all the other machines makes it obvious that the logs are on a "stealth" machine?

Recent articles on this concept talk about running a network intrusion detector, such as Snort, on a stealth machine.  The idea of a stealth machine is that there is no way to get to it; it has no IP address.  So if the intrusion detection machine detects an intrusion, how does that message get to the LAN server and/or the router to stop it?

#8 2019-03-25 08:14:29

ToxicExMachina
Member
Registered: 2019-03-11
Posts: 210  

Re: Ryzen idle lockup

ghaverla wrote:

Too many posts under my name at the top of the list.

As it appears that the original ASRock motherboard was at least partially at fault for why the Ryzen 1600 would not work, it seems possible that that CPU is not a dud.

The Taichi series are the only good motherboards from ASRock. Other ones from this vendor have tons of issues.

ghaverla wrote:

I picked up an A320 motherboard on sale.  If this Ryzen 1600 works, that's wonderful.  But I doubt the application requires a 12-thread CPU.

A motherboard based on a low-end chipset is a grave mistake for a powerful upper-mid-range CPU. You need at least a B350-based motherboard for a Ryzen 5 1600. A320 is an office-PC grade chipset designed exclusively for first-generation Ryzen 3 CPUs. Usually such motherboards can't provide enough power for powerful CPUs.

Some problems may also be caused by the opcache. You should turn it off in the firmware setup, because early revisions of first-generation Ryzen CPUs have a hardware bug in the opcache.
