I was running IPython when error messages from outside IPython began filling the terminal, which then became unresponsive. I don't have those messages, but after two or three rounds of closing the terminal, opening IPython, and getting the same errors, the PC crashed. After each reboot, the desktop environment would load and run for ~30 seconds, then crash. This happened a handful of times. Eventually the error only showed up in the syslog (see below) and the computer no longer crashed. After the DE stabilized, I ran `memtester` for 24 hours, and I haven't seen the error again today.
From /var/log/syslog:
Apr 21 09:43:36 kernel: [ 1561.536220] mce: [Hardware Error]: Machine check events logged
Apr 21 09:43:36 kernel: [ 1561.536226] [Hardware Error]: Corrected error, no action required.
Apr 21 09:43:36 kernel: [ 1561.536228] [Hardware Error]: CPU:0 (17:71:0) MC14_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000010b
Apr 21 09:43:36 kernel: [ 1561.536231] [Hardware Error]: Error Addr: 0x00000000000050c0
Apr 21 09:43:36 kernel: [ 1561.536232] [Hardware Error]: IPID: 0x000700b020750300, Syndrome: 0x0000001f2a1f0207
Apr 21 09:43:36 kernel: [ 1561.536234] [Hardware Error]: L3 Cache Ext. Error Code: 0, Shadow Tag Macro ECC Error.
Apr 21 09:43:36 kernel: [ 1561.536235] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: GEN
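For anyone trying to read the raw `MC14_STATUS` value, here is a minimal Python sketch that unpacks it into the same flags the kernel prints. The bit positions in `STATUS_BITS` are an assumption based on the Linux kernel's `MCI_STATUS_*` definitions, and the function name `decode_mca_status` is made up for illustration. It only restates what the log already says and does not identify the faulty component.

```python
# Bit positions are an assumption taken from the Linux kernel's
# MCI_STATUS_* definitions; check them against your kernel source.
STATUS_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57, "TCC": 55,
    "SyndV": 53, "CECC": 46, "UECC": 45,
    "Deferred": 44, "Poison": 43, "Scrub": 40,
}


def decode_mca_status(status: int) -> dict:
    """Split a 64-bit MCA status value into named flags and code fields."""
    flags = [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        # Treated as corrected when neither the UC nor the Deferred bit is set.
        "corrected": "UC" not in flags and "Deferred" not in flags,
        # Extended error code, assumed to live in bits 21:16.
        "ext_error_code": (status >> 16) & 0x3F,
        # Low 16 bits hold the architectural error code.
        "error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # Value copied from the MC14_STATUS line in the log above.
    print(decode_mca_status(0xDC2040000000010B))
```

With the value from the log, the set bits include Over, MiscV, AddrV, SyndV and CECC, the UC bit is clear, and the extended error code is 0, which agrees with the "Corrected error" and "Ext. Error Code: 0" lines.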
Hardware:
jamie@rorschach:~$ inxi -F
System: Kernel: 5.10.0-13-amd64 x86_64 bits: 64 Desktop: Xfce 4.16.0 Distro: Devuan GNU/Linux 4 (chimaera)
Machine: Type: Desktop Mobo: Micro-Star model: MEG X570 UNIFY (MS-7C35) v: 2.0 serial: <superuser required>
UEFI [Legacy]: American Megatrends v: A.20 date: 11/06/2019
CPU: Info: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP L2 cache: 6 MiB
Speed: 2807 MHz min/max: 2200/5796 MHz Core speeds (MHz): 1: 2807 2: 1866 3: 2200 4: 2198 5: 2799 6: 1863 7: 1867
8: 2200 9: 2197 10: 2199 11: 4107 12: 2003 13: 1866 14: 1998 15: 2197 16: 2191 17: 2192 18: 2200 19: 2196 20: 2198
21: 3599 22: 2007 23: 2007 24: 2200
Graphics: Device-1: NVIDIA GP107GL [Quadro P400] driver: nvidia v: 460.91.03
Display: x11 server: X.Org 1.20.11 driver: loaded: nvidia unloaded: fbdev,modesetting,nouveau,vesa resolution:
1: 1200x1920~60Hz 2: 2560x1440~60Hz
OpenGL: renderer: Quadro P400/PCIe/SSE2 v: 4.6.0 NVIDIA 460.91.03
Audio: Device-1: NVIDIA GP107GL High Definition Audio driver: snd_hda_intel
Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio driver: snd_hda_intel
Device-3: Logitech Logitech Webcam C925e type: USB driver: snd-usb-audio,uvcvideo
Sound Server: ALSA v: k5.10.0-13-amd64
Network: Device-1: Realtek RTL8125 2.5GbE driver: r8169
IF: eth0 state: down mac: 2c:f0:5d:08:35:a5
Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi
IF: wlan0 state: up mac: 04:ed:33:e0:10:c3
Bluetooth: Device-1: Intel type: USB driver: btusb
Message: Required tool hciconfig not installed. Check --recommends
Drives: Local Storage: total: 4.55 TiB used: 36.29 GiB (0.8%)
ID-1: /dev/sda vendor: Seagate model: ST4000DM004-2CV104 size: 3.64 TiB
ID-2: /dev/sdb vendor: SanDisk model: SSD PLUS 1000GB size: 931.52 GiB
Partition: ID-1: / size: 27.33 GiB used: 9.48 GiB (34.7%) fs: ext4 dev: /dev/sdb1
ID-2: /home size: 887.38 GiB used: 26.82 GiB (3.0%) fs: ext4 dev: /dev/sdb6
Swap: ID-1: swap-1 type: partition size: 976 MiB used: 0 KiB (0.0%) dev: /dev/sdb5
Sensors: System Temperatures: cpu: 44.1 C mobo: N/A gpu: nvidia temp: 45 C
Fan Speeds (RPM): N/A gpu: nvidia fan: 34%
Info: Processes: 389 Uptime: 4m Memory: 62.82 GiB used: 2.27 GiB (3.6%) Shell: Bash inxi: 3.3.01
I gather this is likely a hardware issue. Any insight into which component? Web searches have turned up little for me.
Also, one other question: will installing `mcelog` on Devuan 4.0 be problematic? It seems to be available according to the website https://www.mcelog.org/installation.html, though I am uncertain about this line: "If your distribution has a old crontab based mcelog disable it to avoid conflicts. The easiest way is to delete the mcelog cronjob file in /etc/cron.*" Thanks
I went ahead and installed mcelog, following these instructions, though not to completion. I copied `mcelog` to `/etc/init.d/` and learned from here how to start `mcelog` without `chkconfig`.
However, I've hit another hurdle when starting the process:
jamie@rorschach:~$ /etc/init.d/mcelog start
/dev/mcelog not active
Starting mcelog daemon
/etc/init.d/mcelog: 58: startproc: not found
What is the Devuan equivalent for `startproc`, `killproc`, and `checkproc`?
installing mcelog
No point doing that, it doesn't support your processor.
Have you installed the AMD µcode package? I think Ryzens fall over without the fixes.
Brianna Ghey — Rest In Power
Apologies for the lengthy delay in replying. I installed the AMD microcode package, but the system still crashes. I am now talking to AMD about it.
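One way to confirm that the microcode package actually took effect is to look at the `microcode` field the kernel reports in `/proc/cpuinfo`. Below is a minimal sketch, assuming an x86 Linux system where that field exists; the helper name `microcode_revisions` is made up for illustration. Comparing the value before and after installing the package (and rebooting) shows whether anything changed.

```python
import re
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct 'microcode' values found in /proc/cpuinfo text."""
    return set(
        re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, flags=re.MULTILINE)
    )


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        # Normally every logical CPU reports the same revision.
        print(sorted(microcode_revisions(path.read_text())))
    else:
        print("No /proc/cpuinfo on this system")
```

More than one revision in the output would mean the cores are not all running the same microcode, which would be worth mentioning to AMD.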
I think you need a newer kernel version, greater than 5.18.
Last edited by GlennW (2022-10-23 23:34:11)
I would try nouveau. Bloody NVIDIA.