I assumed that I could safely boot into the live desktop on any PC and that it would not change anything on that PC unless I explicitly ran commands to change something. But I have now encountered two instances where the live desktop broke something on a PC.
* PC1
- OS: Ubuntu 10.04
- RAID setup:
- /dev/md0 - RAID1: 2xSSD for the OS files. Metadata version: 0.90
- /dev/md1 - RAID1: 2xHDD for data files. Metadata version: 0.90
A few months back, I booted the Devuan Daedalus live desktop on this PC. After rebooting, the PC failed to boot past the initial GRUB menu; the error was something about not finding the disk. After some troubleshooting, I figured out that /dev/md0 had failed to start because the Preferred Minor had been changed from 0 to a large number, I think 127. The solution was to boot a live desktop environment and assemble the disks into /dev/md0 while updating the Preferred Minor number. Something like this:
mdadm --assemble /dev/mdx --update=super-minor --uuid=<RAID UUID>
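(For reference, the Preferred Minor recorded in each member's 0.90 superblock can be checked directly; the device names here are just placeholders for the actual RAID members:)
mdadm --examine /dev/sda1 | grep 'Preferred Minor'
mdadm --examine /dev/sdb1 | grep 'Preferred Minor'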
After booting back into the OS, the data RAID1 also failed to start for the same reason, and I fixed it using the same solution.
* PC2
- OS: Devuan Jessie
- RAID setup:
- /dev/md0 - RAID1: 2xSSD for the OS files. Metadata version: 0.90
- /dev/md1 - RAID5: 4xSSD for data files. Metadata version: 0.90
Last week, while testing the Devuan Excalibur Preview's memtest, I decided to boot into the live desktop just to see if there were any obvious problems to report. During boot, the console output showed that it had autostarted the arrays, but the RAID5 was started with only THREE of the four disks. I have no idea why, and at the time I didn't investigate because I was busy with the memtest legacy vs UEFI boot tests.
After completing the memtest boot tests, I booted back into the PC's OS. Unlike the first PC, this one correctly started the OS RAID /dev/md0, and I'm not sure why. Maybe Jessie's RAID driver is smarter than the Ubuntu 10.04 one and doesn't depend on the Preferred Minor number.
However, it reported that the /dev/md1 array had totally failed. In hindsight, what might have happened was that the RAID driver tried to start the 4 SSDs (which all had the same RAID UUID), but it detected that 3 of the disks were "out of sync" with the 4th, and it somehow decided to fail the other 3 (or maybe they were "removed") and keep the 4th. So /dev/md1 was started with 1 active disk, thus a failed array.
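If I had investigated, I suspect the event counters and update times in each member's superblock would have shown which disks the driver considered out of date. Something like this (using the same dummy drive letters w-z as below):
mdadm --examine /dev/sdw1 | grep -E 'Events|Update Time'
mdadm --examine /dev/sdx1 | grep -E 'Events|Update Time'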
Using dummy drive letters w-z, the RAID5 disks looked like this:
/dev/sdw - Preferred Minor 1
/dev/sdx - Preferred Minor 126
/dev/sdy - Preferred Minor 126
/dev/sdz - Preferred Minor 126
At first, I thought I had lost the data on the array. After thinking about it, I realized that the 3 disks x/y/z could still work as a functioning (but degraded) array and would still have the data. The solution was to stop the failed md1 array that was running with just sdw, wipe the RAID info on sdw, start the array with the other 3 disks (while fixing the Preferred Minor number), and add sdw back into the RAID. Something like this:
mdadm --stop /dev/md1
mdadm --zero-superblock /dev/sdw1
mdadm --assemble /dev/md1 --update=super-minor --uuid=<the RAID5's RAID UUID>
mdadm --manage /dev/md1 --add /dev/sdw1
And that worked. As far as I can tell, I didn't lose anything on the RAID5 array.
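For anyone curious, the rebuild of sdw back into the array can be watched with the usual status commands:
cat /proc/mdstat
mdadm --detail /dev/md1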
Also, I'm pretty sure that the RAID5 wasn't already degraded, because I wrote my own RAID monitoring script that checks the status every 5 minutes. If the RAID5 had been degraded, the script would have sent out a multicast, and every PC on my LAN has another script that listens for this multicast and displays an error notification on the desktop (using notify-send). I would have noticed if the RAID5 array was already degraded.
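A minimal sketch of that kind of degraded check (not my actual script, and it skips the multicast part in favour of a local notification):
#!/bin/sh
# Flag any md array whose member status line in /proc/mdstat shows a
# missing disk, e.g. [U_] instead of [UU].
if grep -q '\[.*_.*\]' /proc/mdstat; then
    notify-send "RAID warning" "An md array appears to be degraded"
fi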
* Thoughts for discussion
I understand that metadata 0.90 might be an obsolete format today and might not be used much. Maybe that's why I couldn't find any stories about live desktop environments ruining a RAID array. But it still seems dangerous for the live desktop to blindly autostart all RAID arrays during boot. Example: What if the array was already degraded, and the owner wanted to "freeze" the array to avoid any chance of losing another disk and the whole array?
So, should the live desktop be autostarting RAID arrays?
I think you make a good case for turning mdadm off in the live ISOs. Easy enough to do.
In /etc/default/mdadm:
# START_DAEMON:
# should mdadm start the MD monitoring daemon during boot?
START_DAEMON=false
If you need mdadm in the live session, turn it on with /etc/init.d/mdadm start
Now all I gotta do is remember to put it in the release notes.
Might be better to make a hook script so it could be turned on or off at the boot command line. (talking to myself now)
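A rough sketch of what I mean, assuming a made-up boot parameter named mdraid (untested):
#!/bin/sh
# Hypothetical hook: only start mdadm in the live session if "mdraid"
# was passed on the kernel command line. The parameter name is made up
# here just to illustrate the idea.
if grep -qw 'mdraid' /proc/cmdline; then
    /etc/init.d/mdadm start
fi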
Thanks. I'll mark this as solved.