The officially official Devuan Forum!

#1 2022-04-22 15:38:36

hall00ween
Member
Registered: 2022-04-21
Posts: 3  

Info gathering for mce: [Hardware error]

I was running IPython when error messages from outside IPython began filling the terminal, and the terminal became unresponsive. I did not save those messages. After two or three rounds of closing the terminal, reopening IPython, and getting the same errors, the PC crashed. After each reboot, the desktop environment would load and run for ~30 seconds, then crash; this happened a handful of times. Eventually the error only showed up in the syslog (see below) and the computer no longer crashed. Once the DE stabilized, I ran `memtester` for 24 hours, and I haven't seen the above error today.

From /var/log/syslog:

Apr 21 09:43:36 kernel: [ 1561.536220] mce: [Hardware Error]: Machine check events logged
Apr 21 09:43:36 kernel: [ 1561.536226] [Hardware Error]: Corrected error, no action required.
Apr 21 09:43:36 kernel: [ 1561.536228] [Hardware Error]: CPU:0 (17:71:0) MC14_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000010b
Apr 21 09:43:36 kernel: [ 1561.536231] [Hardware Error]: Error Addr: 0x00000000000050c0
Apr 21 09:43:36 kernel: [ 1561.536232] [Hardware Error]: IPID: 0x000700b020750300, Syndrome: 0x0000001f2a1f0207
Apr 21 09:43:36 kernel: [ 1561.536234] [Hardware Error]: L3 Cache Ext. Error Code: 0, Shadow Tag Macro ECC Error.
Apr 21 09:43:36 kernel: [ 1561.536235] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: GEN
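
For anyone reading the MC14_STATUS line above, the bracketed flags can be reproduced from the raw hex value. The following is a minimal Python sketch, not the kernel's own code: the bit positions and lookup tables are assumptions based on how the kernel's AMD decoder prints these fields, and `decode_mca_status` is a hypothetical helper name. Check it against AMD's documentation for your processor before relying on it.

```python
# Assumed bit positions for the flags printed in the MCx_STATUS line.
STATUS_FLAGS = {
    62: "Over",      # an earlier error was overwritten before it was read
    61: "UE",        # uncorrected error
    59: "MiscV",     # MISC register contents are valid
    58: "AddrV",     # error address is valid
    57: "PCC",       # processor context corrupt
    55: "TCC",       # task context corrupt
    53: "SyndV",     # syndrome register is valid
    46: "CECC",      # corrected by ECC
    45: "UECC",      # detected by ECC but not corrected
    44: "Deferred",
    43: "Poison",
}

# Lookup tables for a "memory" type error code: 0000 0001 RRRR TTLL
LL = ("RESV", "L1", "L2", "L3/GEN")
TT = ("INSN", "DATA", "GEN", "RESV")
RRRR = ("GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP")


def decode_mca_status(status: int) -> dict:
    """Split a 64-bit MCA status value into flags and error-code fields."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    code = status & 0xFFFF
    result = {
        "valid": bool((status >> 63) & 1),
        "flags": flags,
        # Reported as corrected when neither the UE nor the Deferred bit is set.
        "corrected": not ((status >> 61) & 1) and not ((status >> 44) & 1),
        "ext_error_code": (status >> 16) & 0x3F,
        "error_code": code,
    }
    if (code & 0xFF00) == 0x0100:  # memory hierarchy error format
        rrrr = (code >> 4) & 0xF
        result["cache_level"] = LL[code & 0x3]
        result["tx"] = TT[(code >> 2) & 0x3]
        result["mem_tx"] = RRRR[rrrr] if rrrr < len(RRRR) else "RESV"
    return result


if __name__ == "__main__":
    # Status value copied from the syslog excerpt in this post.
    for key, value in decode_mca_status(0xDC2040000000010B).items():
        print(f"{key}: {value}")
```

For the value in the log, this reports the Over, MiscV, AddrV, SyndV and CECC flags, an extended error code of 0, and cache level L3/GEN, which agrees with the kernel's own decode: a corrected ECC error in the L3 cache, with at least one earlier event overwritten.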

Hardware:

jamie@rorschach:~$ inxi -F
System:    Kernel: 5.10.0-13-amd64 x86_64 bits: 64 Desktop: Xfce 4.16.0 Distro: Devuan GNU/Linux 4 (chimaera) 
Machine:   Type: Desktop Mobo: Micro-Star model: MEG X570 UNIFY (MS-7C35) v: 2.0 serial: <superuser required> 
           UEFI [Legacy]: American Megatrends v: A.20 date: 11/06/2019 
CPU:       Info: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP L2 cache: 6 MiB 
           Speed: 2807 MHz min/max: 2200/5796 MHz Core speeds (MHz): 1: 2807 2: 1866 3: 2200 4: 2198 5: 2799 6: 1863 7: 1867 
           8: 2200 9: 2197 10: 2199 11: 4107 12: 2003 13: 1866 14: 1998 15: 2197 16: 2191 17: 2192 18: 2200 19: 2196 20: 2198 
           21: 3599 22: 2007 23: 2007 24: 2200 
Graphics:  Device-1: NVIDIA GP107GL [Quadro P400] driver: nvidia v: 460.91.03 
           Display: x11 server: X.Org 1.20.11 driver: loaded: nvidia unloaded: fbdev,modesetting,nouveau,vesa resolution: 
           1: 1200x1920~60Hz 2: 2560x1440~60Hz 
           OpenGL: renderer: Quadro P400/PCIe/SSE2 v: 4.6.0 NVIDIA 460.91.03 
Audio:     Device-1: NVIDIA GP107GL High Definition Audio driver: snd_hda_intel 
           Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio driver: snd_hda_intel 
           Device-3: Logitech Logitech Webcam C925e type: USB driver: snd-usb-audio,uvcvideo 
           Sound Server: ALSA v: k5.10.0-13-amd64 
Network:   Device-1: Realtek RTL8125 2.5GbE driver: r8169 
           IF: eth0 state: down mac: 2c:f0:5d:08:35:a5 
           Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi 
           IF: wlan0 state: up mac: 04:ed:33:e0:10:c3 
Bluetooth: Device-1: Intel type: USB driver: btusb 
           Message: Required tool hciconfig not installed. Check --recommends 
Drives:    Local Storage: total: 4.55 TiB used: 36.29 GiB (0.8%) 
           ID-1: /dev/sda vendor: Seagate model: ST4000DM004-2CV104 size: 3.64 TiB 
           ID-2: /dev/sdb vendor: SanDisk model: SSD PLUS 1000GB size: 931.52 GiB 
Partition: ID-1: / size: 27.33 GiB used: 9.48 GiB (34.7%) fs: ext4 dev: /dev/sdb1 
           ID-2: /home size: 887.38 GiB used: 26.82 GiB (3.0%) fs: ext4 dev: /dev/sdb6 
Swap:      ID-1: swap-1 type: partition size: 976 MiB used: 0 KiB (0.0%) dev: /dev/sdb5 
Sensors:   System Temperatures: cpu: 44.1 C mobo: N/A gpu: nvidia temp: 45 C 
           Fan Speeds (RPM): N/A gpu: nvidia fan: 34% 
Info:      Processes: 389 Uptime: 4m Memory: 62.82 GiB used: 2.27 GiB (3.6%) Shell: Bash inxi: 3.3.01 

I gather this is likely a hardware issue. Any insight into which component(s)? Web searches have turned up little for me.

Also, one other question: will installing `mcelog` on Devuan 4.0 be problematic? It seems to be available according to the website https://www.mcelog.org/installation.html, though I am uncertain about this line: "If your distribution has a old crontab based mcelog disable it to avoid conflicts. The easiest way is to delete the mcelog cronjob file in /etc/cron.*" Thanks.

#2 2022-04-23 15:37:28

hall00ween
Member
Registered: 2022-04-21
Posts: 3  

Re: Info gathering for mce: [Hardware error]

I went for it and installed mcelog, following these instructions, though not to completion. I copied `mcelog` to `/etc/init.d/`. I learned from here how to start `mcelog` without `chkconfig`.

I've hit another hurdle, though: starting the process.

jamie@rorschach:~$ /etc/init.d/mcelog start
/dev/mcelog not active
Starting mcelog daemon
/etc/init.d/mcelog: 58: startproc: not found

What is the Devuan equivalent for `startproc`, `killproc`, and `checkproc`?

#3 2022-04-25 14:13:53

Head_on_a_Stick
Member
From: London
Registered: 2019-03-24
Posts: 2,772  

Re: Info gathering for mce: [Hardware error]

hall00ween wrote:

installing mcelog

No point doing that, it doesn't support your processor.

Have you installed the AMD µcode package? I think Ryzens fall over without the fixes.
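
One way to see whether a microcode package made any difference is to compare the revision the kernel reports before and after installing it and rebooting. A rough Python sketch follows, assuming an x86 Linux system where `/proc/cpuinfo` carries a `microcode` field for each logical CPU. `microcode_revisions` is a made-up helper name, and on Devuan/Debian the package is, as far as I know, `amd64-microcode` from the non-free section.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct 'microcode' values found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            found = microcode_revisions(fh.read())
    except OSError as exc:
        print(f"could not read /proc/cpuinfo: {exc}")
    else:
        # Normally every logical CPU reports the same revision.
        print("microcode revision(s):", ", ".join(sorted(found)) or "none reported")
```

If the revision is unchanged after installing the package and rebooting, the update was probably not applied, or the firmware already ships the same or a newer revision.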


"Who's the idiot in charge?" — ralph.ronnquist

#4 2022-10-22 19:35:42

hall00ween
Member
Registered: 2022-04-21
Posts: 3  

Re: Info gathering for mce: [Hardware error]

Apologies for the lengthy delay in replying. I installed the AMD ucode package, but the system crashes continue. I am now talking to AMD about it.

#5 2022-10-23 23:33:25

GlennW
Member
Registered: 2019-07-18
Posts: 302  

Re: Info gathering for mce: [Hardware error]

I think you need a newer kernel version... greater than 5.18

Last edited by GlennW (2022-10-23 23:34:11)

#6 2022-10-24 06:12:06

Head_on_a_Stick
Member
From: London
Registered: 2019-03-24
Posts: 2,772  

Re: Info gathering for mce: [Hardware error]

I would try nouveau. Bloody NVIDIA.


"Who's the idiot in charge?" — ralph.ronnquist
