Dying machine?

TCH · 2023-01-31 21:53:35

This afternoon my machine suddenly stopped booting to the desktop. During boot, it got stuck right after this was printed:

[   14.526805] device-mapper: uevent: version 1.0.3
[   14.526976] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com

After that was printed, the HDD LED blinked for a while, but then the system hung. This happened three consecutive times. I booted into recovery mode and that worked; i was even able to run startx and get a desktop. Then i powered down the machine and started it again. This time it got to the desktop.

I did not change anything in my system today; i just wrote some notes into a few text files and downloaded one or two videos, but not to the system disk.

What was that? Is my machine dying? I ran the following commands:

dmesg > dmesg.log
dmidecode > dmidecode.log
hdsentinel > hdsentinel.log
hwinfo > hwinfo.log
lspci -vvv -nn > lspci.log
smartctl -a /dev/sda > smartctl.log && smartctl -a /dev/sdb >> smartctl.log && smartctl -a /dev/sdc >> smartctl.log && smartctl -a /dev/sdd >> smartctl.log

and, along with boot.log, kern.log, messages and syslog from /var/log, i packed the output and uploaded it here: http://oscomp.hu/depot/syslogs.zip

Can anyone tell me how to find possible hardware faults in these logs? What to look for, where to look, how to look...?
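One possible starting point for the smartctl output collected above is to look at the raw values of a few attributes that are often watched as early signs of disk trouble. The sketch below is only an illustration under stated assumptions: the function name smart_nonzero_counters is hypothetical, the three attribute names are a common but not authoritative selection, and it assumes the usual ATA attribute table layout of smartctl -a, where the raw value is the tenth column. Raw values are vendor specific, so a hit is a reason to look closer, not a verdict.

```shell
#!/bin/sh
# smart_nonzero_counters FILE
# Hypothetical helper: reads saved "smartctl -a" output (one or more drives
# concatenated) and prints the watched attributes whose raw value is above zero.
# Assumption: ATA-style attribute table, raw value in column 10.
smart_nonzero_counters() {
    awk '
        BEGIN { model = "unknown drive" }
        # Remember which drive the following attribute table belongs to.
        /^Device Model:/ {
            sub(/^Device Model:[ \t]*/, "")
            model = $0
            next
        }
        # Attribute rows start with a numeric ID; the name is column 2.
        $1 ~ /^[0-9]+$/ &&
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 > 0 {
            print model ": " $2 "=" $10
        }
    ' "$1"
}
```

Running smart_nonzero_counters smartctl.log prints one line per non-zero counter and nothing at all if all three counters are zero on every drive in the file.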

The OS is Devuan 4.

Thanks in advance...

Dutch_Master · 2023-01-31 22:45:17

The AM2+ platform is about a decade old now, so I'm not surprised if your hardware is indeed on its way out. You may want to research upgrading your system to AM4, even if AM5 has launched. AM4 is relatively cheap (especially against AM5 and current gen Intel stuff) and I'd recommend a B450 mainboard, a 3000 series CPU and 2x 16GB DDR4 RAM, while reusing the GPU, case and PSU. Alternatively, obtain a R5 4600G APU and leave the GPU out, it has better video output anyway. Your existing SATA drives will still work, but the average AM4 mainboard only has 4 SATA ports, so it's wise to invest in a 1TB NVMe drive (those are now below 100USD as well as <100€) for the OS. In fact, I just purchased a 2TB NVMe drive (Chinese, of course) for just over 110€. More than adequate space for your regular desktop, and you can use any HDDs from your current system for backups.

As for the logs: they repeat the same stuff several times. But you didn't include the lshw output. And store these files as text (.txt), the .log is pretty much useless (they're text files anyway, so why not designate them as such)

HTH!

delgado · 2023-01-31 23:19:36

Assuming it is the same machine from the "Block device detection" thread https://dev1galaxy.org/viewtopic.php?id=5508: the answer is probably yes.

Randomly changing block device names are not a good sign; there should be a reason. One possible reason is defective hardware (I remember a dying machine which started printing "updating dmi data pool ... success" on almost every boot).
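To see whether the kernel names really move between boots, the stable identifiers under /dev/disk/by-id can be compared with the sdX names they currently point to. The following is a minimal sketch, assuming a Linux system where udev populates /dev/disk/by-id with symlinks; the function name list_stable_names is made up for illustration.

```shell
#!/bin/sh
# list_stable_names [DIR]
# Hypothetical helper: for every symlink in DIR (default /dev/disk/by-id),
# print "stable-name -> kernel-name", e.g. the sdX name it points to right now.
list_stable_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty directory).
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        printf '%s -> %s\n' "${link##*/}" "${target##*/}"
    done
}
```

Saving the output after each boot (for example list_stable_names > names-boot1.txt) and comparing the files with diff shows which drives changed their kernel name and which stayed put.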

Altoid · 2023-02-01 00:08:33

Hello:

Dutch_Master wrote:

... AM2+ platform is about a decade old now ...

“Reports of my death have been greatly exaggerated” Wrongly attributed to Samuel Clemens (1835-1910).

I run Devuan Beowulf on a Sun Microsystems Ultra 24 WS purchased in late 2015, second hand with ~ 4 years' use already on it.
It has been running without issues for the past eight years on an Intel Q9550, 8.0Gb RAM, 4xSAS drives and a pair of Nvidia FX580s.

So it is at least a 13-year-old rig.

I also run Devuan Beowulf on an Asus 1000HE w/2Gb RAM and the original HDD, also purchased second hand in 2010 with less than a year's use on it.
Some plastics and a USB port went bad a couple of years ago so I snapped up a twin for US$50, transplanted healthy plastics and motherboard, sold the twin's lid/screen making a nifty US$30 profit and never had to look back. Only problem is that the battery pack/s are rather flaky.

Not bad for a 14-year-old economy netbook ...

Many years ago I purchased a box of eight 9.1Gb Ultra SCSI IBM drives from a chap who had been tasked with physically destroying some hardware that was being retired. Of the eight drives, only one had issues, the rest went on to work for me for another 5 years.

My longest lasting hardware?
A Umax S-6E SCSI scanner, which I purchased new in 1996 and works perfectly well ...

Moral of the story?

Decent hardware can and often will last way beyond what people, driven by the dazzle and hype of the newest and fastest, expect.

Best,

A.

TCH · 2023-02-01 00:34:55

Dutch_Master wrote:

But you didn't include the lshw output. And store these files as text (.txt), the .log is pretty much useless (they're text files anyway, so why not designate them as such)

lshw added, thanks. As for the filename: we're under UNIX. It does not matter if the file ends with .txt or .log.

delgado wrote:

Assuming it is the same machine from the "Block device detection" thread https://dev1galaxy.org/viewtopic.php?id=5508: the answer is probably yes.
Randomly changing block device names are not a good sign; there should be a reason. One possible reason is defective hardware (I remember a dying machine which started printing "updating dmi data pool ... success" on almost every boot).

It is the same hardware, but the changing order only appeared after i upgraded to Devuan 4. It is most probably a kernel change.

But how can i debug if the hardware is erroneous?

steve_v · 2023-02-01 06:36:39

TCH wrote:

how can i debug if the hardware is erroneous?

I'd start by passing 'debug' on the kernel command line and seeing if it spits out anything more useful, then proceed to booting a different operating system to eliminate software entirely.
Most anything will likely do for that, but since the hardware was almost certainly designed to run Windows, as distasteful as it may be that's not a completely terrible option for testing.
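As a rough illustration of the first step: on a GRUB-based install the parameter is typically added by pressing e at the boot menu and appending debug to the line that starts with linux; whether this machine boots through GRUB is an assumption here. After booting, /proc/cmdline shows what the kernel actually received. The helper name has_kernel_param below is hypothetical.

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]
# Hypothetical helper: succeeds if PARAM appears as a whole word in the kernel
# command line read from FILE (default /proc/cmdline).
has_kernel_param() {
    # Put every parameter on its own line, then look for an exact match.
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, has_kernel_param debug && echo "debug is active" confirms the parameter took effect, and has_kernel_param quiet shows whether boot messages are still being suppressed.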

As for isolating a hardware fault, the obvious answer would be to try to reproduce the problem in as minimal a configuration as you can. Remove expansion cards and extraneous peripherals, swap or replace PSU, memory modules, that kind of thing.
I don't see a smoking gun in your logs (though I do wonder what exactly pppd is up to at the end there), so a process of elimination would be the next logical step.
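For anyone wondering what a smoking gun might look like, a crude first pass is to search the kernel logs for a handful of message fragments that commonly accompany disk, link or CPU/memory trouble. This is only a sketch: the function name scan_hw_errors is made up, and the pattern list is an assumption for illustration, neither exhaustive nor proof of a fault on its own. Every hit still has to be read in context.

```shell
#!/bin/sh
# scan_hw_errors FILE
# Hypothetical helper: prints (with line numbers) the lines of a saved kernel
# log that match a few common hardware-trouble fragments.
# Returns non-zero when nothing matches, like grep itself.
scan_hw_errors() {
    grep -n -i -E \
        'i/o error|medium error|exception emask|hard resetting link|machine check|hardware error' \
        "$1"
}
```

Something like scan_hw_errors dmesg.log or scan_hw_errors kern.log would be a typical call; an empty result is consistent with the "no smoking gun" reading, although it does not rule out a fault that leaves no trace in the logs.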

Aside, what Altoid said. I have plenty of old hardware, some of it going back to the mid '90s, and it still works just fine.
Assuming something is no good simply because it's old is kinda silly (as is insisting on DOS filename extensions when we have perfectly good magic for that matter).

aluma · 2023-02-01 12:00:10

Just a similar case that I encountered a month ago.

Ancient motherboard GA-945GCM-S2L.
In the BIOS, the access mode of the IDE channel was set to "Auto".
In practice, a 300GB disk was detected as "LBA" on startup and as "CHS" on reboot.
Setting the BIOS to "LBA" solved the problem.
For me, the conclusion is that old things can be weird.

delgado · 2023-02-01 12:38:10

steve_v wrote:

As for isolating a hardware fault, the obvious answer would be to try to reproduce the problem in as minimal a configuration as you can. Remove expansion cards and extraneous peripherals, swap or replace PSU, memory modules, that kind of thing.

I would like to add:
A bad contact can sometimes be cured by pulling the connectors off and plugging them back in.
Unplugging and re-plugging every reachable cable (and card) may fix the problem. If you don't know exactly what you are doing, just be careful with electrostatic charges and don't use too much force on connectors; it's a bit like Lego. Use your brain and unplug mains power first, of course; everything should sit straight and fit, otherwise it is incorrect.
(I still think the parrot is probably dead https://www.youtube.com/watch?v=vZw35VUBdzo , but) It's worth a try, if you feel comfortable doing so.

TCH · 2023-02-01 12:51:24

steve_v wrote:

I'd start by passing 'debug' on the kernel command line and seeing if it spits out anything more useful, then proceed to booting a different operating system to eliminate software entirely.

Thanks for the tip. If this occurs again, i'll try the debug argument and a live Linux or BSD.

steve_v wrote:

Most anything will likely do for that, but since the hardware was almost certainly designed to run Windows, as distasteful as it may be that's not a completely terrible option for testing.

I don't have any Windows. (Aside from an XP in VBox, but i do not have the install media for it, i just borrowed it from my brother.) I only have UNIX-es, "alternative" OS-es and retro stuff here.

steve_v wrote:

As for isolating a hardware fault, the obvious answer would be to try to reproduce the problem in as minimal a configuration as you can. Remove expansion cards and extraneous peripherals, swap or replace PSU, memory modules, that kind of thing.

Now that you've mentioned memory... i think it is time for another long Memtest86+ session; thanks for the tip.

steve_v wrote:

I don't see a smoking gun in your logs (though I do wonder what exactly pppd is up to at the end there), so a process of elimination would be the next logical step.

Thanks for checking them. nipos from DaemonForums said the same as you, so it is reassuring. As for pppd: when i changed my ISP (Deutsche Telekom was terrible), the new ISP (RCS&RDS) did not provide a "router-modem", just a modem, so i connected to the net for a while with pon. Then i bought a TP-Link router and it seems i forgot to disable pppd; thanks for pointing it out.

steve_v wrote:

Aside, what Altoid said. I have plenty of old hardware, some of it going back to the mid '90s, and it still works just fine.
Assuming something is no good simply because it's old is kinda silly (as is insisting on DOS filename extensions when we have perfectly good magic for that matter).

I did not assume it is no good because it is old; i use this 11-year-old machine because i am content with it. Besides, i love old machines; check my "desktop": http://oscomp.hu/depot/tch_desktop.jpg
(Although since then, i switched to LCD from CRT.)

aluma wrote:

Just a similar case that I encountered a month ago.
Ancient motherboard GA-945GCM-S2L.
In the BIOS, the access mode of the IDE channel was set to "Auto".
In practice, a 300GB disk was detected as "LBA" on startup and as "CHS" on reboot.
Setting the BIOS to "LBA" solved the problem.
For me, the conclusion is that old things can be weird.

Stupid question: i have SATA disks; is a setting for IDE relevant to them? Also, if this were the cause, would it not cause problems all the time?

delgado wrote:

I would like to add:
A bad contact can sometimes be cured by pulling the connectors off and plugging them back in.
Unplugging and re-plugging every reachable cable (and card) may fix the problem. If you don't know exactly what you are doing, just be careful with electrostatic charges and don't use too much force on connectors; it's a bit like Lego. Use your brain and unplug mains power first, of course; everything should sit straight and fit, otherwise it is incorrect.
(I still think the parrot is probably dead https://www.youtube.com/watch?v=vZw35VUBdzo , but) It's worth a try, if you feel comfortable doing so.

This parrot is still alive; it may only have been resting a bit. Two guys already said the logs show no signs of failure.
But thanks for sayin', i think the machine needs some cleanup and contact-spray...

aluma · 2023-02-01 13:08:31

@TCH

Stupid question: i have SATA disks; is a setting for IDE relevant to them? Also, if this were the cause, would it not cause problems all the time?

My 300 GB drive is also SATA.
This problem was discovered by accident. After an update, openSUSE failed to mount the /home partition at startup, with the fdisk error "...does not match the number of blocks...", yet it started without errors after a reboot.
Checking the disk with gparted-live-1.1.0-8-i686.iso did not reveal any errors.
But that's just my case, yours may be different.

P.S. Subjective opinion of the owner of two computers older than 10 years and a fan of TDE.
1. They are not interesting to developers. It is practically impossible to test them for compatibility with new software.
2. I can give examples where the same USB flash drive, with an image written by the dd command, booted normally on one machine and crashed with an error on the other. In my case, with openSUSE installed on another exegnu partition, it worked without problems.
3. There are not so many systems without systemd with TDE, exegnu is one of them. Another option is PCLinuxOS, but this is a rolling release with all its "charms".
4. Practically, I would try a liveCD or just install it all on a separate "/" partition and play around with it for a while.
5. In any case, it's your computer and your decision, if you want to look for software bugs, why not?
6. Regarding hardware errors. The least reliable parts are the electrolytic capacitors in the power circuits, with their service life of 2-4 thousand hours.
If any of them are swollen, replacing them is not a problem with an ordinary 60 W soldering iron and the ability to handle it.
By the way, I had to replace two on the GA-945GCM-S2L motherboard a few months ago.

Last edited by aluma (2023-02-01 20:23:28)

TCH · 2023-02-01 21:30:45

Well, okay, if this occurs again, i'll check this BIOS setting, thanks for the tip.

As for the capacitors, i do not have the equipment for soldering/desoldering SMD capacitors. It would be different if they were PTH ones, as they are in my A500+. (I've already recapped that machine...)
