I launched Sigil (in full screen) to write something, then got an error message in a dialog box saying something like "Sigil cannot run". I clicked "Close" and noticed that the background image was gone…
I realized that my pictures were gone… :s I checked my home folder and indeed, my pictures were gone, and it seems a few other files too… Luckily, I had backed up my system a few days earlier.
Now, how can I track down what happened? Nothing is logged in /var, and dmesg yields nothing. Moreover, neither smartctl nor Samsung Magician reports anything.
And how can I track this in the future?
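One low-tech way to notice vanished files in the future is to keep a sorted list of file paths and compare it with an earlier one. This is only a sketch under assumptions: the helper names and the manifest location are hypothetical, not an existing tool, and it would show which files disappeared between two runs (for example from cron), not which process removed them.

```sh
#!/bin/sh
# Sketch: detect files that disappeared between two runs.
# Hypothetical helpers; file names containing newlines are not handled.

# snapshot_paths DIR OUTFILE: write a byte-sorted list of regular files under DIR.
snapshot_paths() {
    find "$1" -type f | LC_ALL=C sort > "$2"
}

# missing_paths OLD NEW: print paths listed in OLD but absent from NEW.
missing_paths() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Example (hypothetical manifest location, kept outside the scanned tree):
#   snapshot_paths "$HOME" /var/tmp/home.manifest.new
#   [ -f /var/tmp/home.manifest ] && missing_paths /var/tmp/home.manifest /var/tmp/home.manifest.new
#   mv /var/tmp/home.manifest.new /var/tmp/home.manifest
```

Both lists are sorted with the same `LC_ALL=C` collation because `comm` requires identically sorted input.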
---EDIT---
Files on another drive are missing too. Possibly an EXT4 bug??? How can I check this?
Last edited by zero (2019-04-13 12:04:08)
I am pretty sure that ext4 does not have any significant bug. Like many others, I have been using it for many years and can't complain. You should look at your hardware and at the program you are talking about.
Run a full check of your drives with smartmontools. Maybe GSmartControl makes the results easier to read.
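For reference, a sketch of checking each drive's overall health verdict with smartctl. `smartctl -H` and `smartctl -a` are real smartmontools options; the helper `smart_passed` is a hypothetical name. The verdict is coarse, so the full attribute table from `smartctl -a` is still worth reading.

```sh
# smart_passed: read `smartctl -H` output on stdin and succeed only if the
# overall-health verdict is PASSED. This matches the wording smartctl uses for
# ATA and NVMe drives; SCSI/SAS drives print "SMART Health Status: OK" instead.
smart_passed() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Example (as root; device names as in this thread):
#   for dev in /dev/sda /dev/sdb /dev/nvme0n1; do
#       if smartctl -H "$dev" | smart_passed; then echo "$dev: PASSED"
#       else echo "$dev: see smartctl -a $dev"; fi
#   done
```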
Rolf
Last edited by rolfie (2019-04-09 19:58:16)
Definitely not ext4. Report back with the results of rolfie's suggestion(s); if all looks good, then you might need to clean out the dust, check the connections, etc.
Summary from bug#313
There are 4 drives, two different brands, 3 different technologies (SSD, hard drive, and NVMe), and data loss occurred on the HDD and the NVMe drive… So: different technologies, "different controllers", different brands, but they all have EXT4 in common.
The data vanished in a few seconds…
I tried PhotoRec without success. The files are not recoverable.
I could add that it's a relatively new PC (2 months old) and that I have tried:
nvme-cli
smartmontools
Samsung Magician
All these tools say that EVERYTHING is OK, and strangely, PhotoRec does not find those files.
Finally, and I think this is very important: I was not logged in as root in any way (terminal, etc.), nor did I even use sudo. Yet files that belonged to root or to other users/groups have also vanished, although my user did not have any "super rights". So more than 100GB on two disks vanished instantly, which is impossible, and far too fast for an "rm" command (which I never ran anyway).
Last edited by zero (2019-04-09 12:02:55)
Let's start with the basics:
as root:
apt install inxi
then post output of:
inxi -Fr
System: Host: devuan Kernel: 4.19.0-0.bpo.4-amd64 x86_64 (64 bit) Desktop: MATE 1.20.4
Distro: Devuan GNU/Linux ascii
Machine: Device: desktop Mobo: Micro-Star model: B450M PRO-VDH (MS-7A38) v: 4.0
UEFI: American Megatrends v: M.40 date: 01/25/2019
CPU: Hexa core AMD Ryzen 5 2600 Six-Core (-HT-MCP-) cache: 3072 KB
clock speeds: max: 3400 MHz 1: 1390 MHz 2: 1518 MHz 3: 1383 MHz 4: 1398 MHz 5: 1379 MHz 6: 1399 MHz
7: 1549 MHz 8: 1592 MHz 9: 1356 MHz 10: 1323 MHz 11: 1457 MHz 12: 1386 MHz
Graphics: Card: Advanced Micro Devices [AMD/ATI] Oland PRO [Radeon R7 240/340]
Display Server: X.Org 1.19.2 drivers: ati,radeon (unloaded: modesetting,fbdev,vesa)
Resolution: 1920x1200@59.95hz
GLX Renderer: AMD OLAND (DRM 2.50.0, 4.19.0-0.bpo.4-amd64, LLVM 7.0.1)
GLX Version: 4.5 (Compatibility Profile) Mesa 18.3.6
Audio: Card-1 Advanced Micro Devices [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
driver: snd_hda_intel
Card-2 Advanced Micro Devices [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
driver: snd_hda_intel
Sound: Advanced Linux Sound Architecture v: k4.19.0-0.bpo.4-amd64
Network: Card: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller driver: r8169
IF: eth0 state: up speed: 100 Mbps duplex: full mac: 00:xx:xx:xx:xx:xx
Drives: HDD Total Size: 1132.3GB (79.0% used)
ID-1: /dev/nvme0n1 model: N/A size: 500.1GB
ID-2: /dev/sda model: SAMSUNG_SSD_830 size: 128.0GB
ID-3: /dev/sdb model: WDC_WD10EZEX size: 1000.2GB
ID-4: USB /dev/sdc model: USB_2.0_FD size: 4.0GB
Partition: ID-1: / size: 9.8G used: 1.2G (13%) fs: ext4 dev: /dev/dm-0
ID-2: /usr size: 32G used: 6.9G (23%) fs: ext4 dev: /dev/dm-1
ID-3: /boot size: 3.7G used: 175M (5%) fs: ext4 dev: /dev/sdc1
ID-4: /var size: 14G used: 1.4G (11%) fs: ext4 dev: /dev/dm-2
ID-5: /tmp size: 5.5G used: 25M (1%) fs: ext4 dev: /dev/dm-3
ID-6: /home size: 458G used: 153G (36%) fs: ext4 dev: /dev/dm-4
Sensors: System Temperatures: cpu: No active sensors found. Have you configured your sensors yet? mobo: N/A gpu: 33.0
Repos: Active apt sources in file: /etc/apt/sources.list
deb http://pkgmaster.devuan.org/merged ascii main contrib non-free
deb http://pkgmaster.devuan.org/merged ascii-updates main
deb http://pkgmaster.devuan.org/merged ascii-security main
deb http://pkgmaster.devuan.org/merged ascii-backports main contrib non-free
Active apt sources in file: /etc/apt/sources.list.d/ceres.list
deb http://fr.deb.devuan.org/merged ceres main non-free contrib
Info: Processes: 393 Uptime: 12:10 Memory: 19808.3/32183.9MB Client: Shell (bash) inxi: 2.3.5
The nvme is the Samsung SSD 970 EVO 500GB.
Any noises coming from the drive? Is it running hot? Are you able to open the case and see if it is full of dust, has loose connections, etc.? Is the fan working? Also note: just because it's a new drive does not mean it was not faulty.
Are you pulling packages from ceres into stable/ascii?
@OP: Don't jump to conclusions so easily. With 99.9999% certainty, ext4 isn't your problem.
Obviously you run ASCII with the kernel and the MATE desktop from backports. That is similar to my setup, except for MATE: I have an X470pro chipset and a Ryzen 7 running on an NVMe drive, with ext4 as the only file system.
Since the PC is new, make sure all power supply and SATA connections are securely seated. Is the NVMe drive screwed down properly?
Have you run an extended self-test with smartmontools? smartmontools should also indicate if there are issues with dodgy SATA cables.
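A sketch of running the extended (long) self-test and then checking the most recent entry of the self-test log. `smartctl -t long`, `smartctl -c` and `smartctl -l selftest` are real options; the parsing helper is hypothetical and assumes the ATA log layout, where the newest entry is numbered `# 1`.

```sh
# latest_selftest_ok: read `smartctl -l selftest` output (ATA layout) on stdin
# and succeed only if the newest entry ("# 1") completed without error.
# Run it only after the test has finished, or "# 1" is still the previous test.
latest_selftest_ok() {
    grep '^# 1 ' | grep -q 'Completed without error'
}

# Example (as root; the test runs inside the drive and can take hours on an HDD):
#   smartctl -t long /dev/sdb          # start the extended self-test
#   smartctl -c /dev/sdb               # shows the recommended polling time
#   smartctl -l selftest /dev/sdb | latest_selftest_ok && echo "long test OK"
```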
I would also look at Sigil. As far as I can tell from Wikipedia, it's an HTML editor for ebooks. Are you sure that the version you are using is free of bugs, and that the HTML code you handle is free of strange or faulty code?
I think there are a lot of things that need to be checked, including your RAM. Faulty RAM can also cause data loss.
Good luck, Rolf
Last edited by rolfie (2019-04-21 19:12:19)
Everything is "perfectly seated, and so on".
No strange noises, no dust (very clean inside), more than enough ventilated. I repeat for all tools, smartmontools included, everything is OK (even multiple/long tests).
Yes, I use some Ceres packages (mainly for lxc).
I installed Sigil from the Devuan ASCII repository (0.9.7+dfsg-1). I had not used it before. Indeed, I switched from Debian to Devuan after ~15 years of mainly using Sid (I used to use FreeBSD too).
I was only trying to start Sigil for the first time on Devuan, but it complained that it could not run, as if it were unable to write anything to the NVMe drive. Was that when the bug occurred?
I suspect, but do not claim, that EXT4 is the problem. Maybe the RAM is faulty, but that seems strange, because:
two different disks
more than 100GB was lost, while the system has theoretically at most 32GB of memory
To be corrupted that way, these data (files, folders, etc.) from the two disks would have had to be "randomly loaded into RAM" (say 30GB at most) more than three times (3 × 30 = 90GB), and corrupted each time. How, and why, would they have been loaded into RAM?
Anyway, I will test the RAM as soon as I can.
Last edited by zero (2019-04-10 08:15:02)
I think it is better to show the output of lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 119.2G 0 disk
├─sda1 8:1 0 1.9G 0 part /boot/efi
├─sda2 8:2 0 41.9G 0 part
├─sda3 8:3 0 10G 0 part
│ └─devuan_root 254:0 0 10G 0 crypt /
├─sda4 8:4 0 14G 0 part
├─sda5 8:5 0 14G 0 part
│ └─devuan_var 254:2 0 14G 0 crypt /var
├─sda6 8:6 0 5.6G 0 part
│ └─devuan_tmp 254:3 0 5.6G 0 crypt /tmp
└─sda7 8:7 0 31.9G 0 part
└─devuan_usr 254:1 0 31.9G 0 crypt /usr
sdb 8:16 0 931.5G 0 disk
├─sdb1 8:17 0 922G 0 part
│ └─sde1_crypt 254:5 0 922G 0 crypt /home/zero/public
├─sdb2 8:18 0 1K 0 part
├─sdb5 8:21 0 7.7G 0 part
└─sdb6 8:22 0 1.9G 0 part
sdc 8:32 1 3.8G 0 disk
└─sdc1 8:33 1 3.8G 0 part /boot
nvme0n1 259:0 0 465.8G 0 disk
└─nvme_crypt 254:4 0 465.8G 0 crypt /home
/dev/sda is the SAMSUNG_SSD_830 (no data loss)
/dev/sdb is the WDC_WD10EZEX (data loss)
/dev/nvme0n1 is the Samsung SSD 970 EVO 500GB (data loss)
Well, here is the fstab (I have removed the UUIDs):
UUID= /boot ext4 defaults 0 2
UUID= /boot/efi vfat umask=0077 0 1
#/dev/mapper/devuan_root / ext4 errors=remount-ro 0 1
UUID= / ext4 errors=remount-ro 0 1
#/dev/mapper/devuan_usr /usr ext4 defaults 0 2
UUID= /usr ext4 defaults 0 2
#/dev/mapper/devuan_var /var ext4 defaults 0 2
UUID= /var ext4 defaults 0 2
#/dev/mapper/devuan_tmp /tmp ext4 defaults 0 2
UUID= /tmp ext4 defaults 0 2
#/dev/mapper/nvme_crypt /home ext4 defaults 0 2
UUID= /home ext4 defaults 0 2
#/dev/mapper/sde1_crypt /home/zero/public ext4 defaults 0 2
UUID= /home/zero/public ext4 defaults 0 2
Indeed, the two "faulty drives" are mounted under /home, while the other drives are mostly "read-only".
As for wearing out my SSD: a new and better SSD like the Kingston A400 120GB costs ~20€, while the SAMSUNG SSD 830 cost me ~90€ more than 7 years ago.
As I recall, the TBW of the SAMSUNG SSD 830 120GB is around 30; unfortunately, it seems the spec sheet for this drive was among the vanished files that I have definitively lost :s.
After more than 7 years, my SAMSUNG SSD 830 displays ~11GB of total writes… So, theoretically, it should last 14 more years…
So, wearing out my SSD is not a real problem.
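An estimate like the one above is a simple linear extrapolation. A sketch of that arithmetic, assuming writes continue at the same average rate and that the TBW rating and the amount already written are expressed in the same unit (TBW ratings are in terabytes); the function name is hypothetical.

```sh
# remaining_years TBW WRITTEN YEARS
# Linear estimate of the years left before WRITTEN reaches the rated TBW,
# assuming the average write rate so far (WRITTEN / YEARS) stays constant.
# TBW and WRITTEN must be in the same unit; YEARS and WRITTEN must be > 0.
remaining_years() {
    awk -v tbw="$1" -v written="$2" -v years="$3" 'BEGIN {
        rate = written / years            # average amount written per year
        printf "%.1f\n", (tbw - written) / rate
    }'
}

# Example with illustrative numbers: 10 TB written in 5 years, 30 TBW rating.
#   remaining_years 30 10 5    # prints 10.0
```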
Last edited by zero (2019-04-13 12:03:14)
Well, I have finally tested the RAM: all memtest86+ tests are OK (4/4 pass).
Did this happen again or did someone find the cause? I mean, the bug report still exists. But what are the chances of it ever going anywhere?
I am not especially interested in this particular bug, but I have always wondered what policies distros have for hanging bug reports. Can a report be in a moreinfo state for years? Has everyone forgotten this, or is someone trying to reproduce it?
I take it that in reality there wasn't an attempt to reproduce it, neither with a tainted nor with an untainted kernel. I am sorry for the loss of data that happened, but this stands now, four years later, as an act of God, strike of a cosmic ray or a butterfly flapping its wings in Japan and causing a data loss halfway across the globe.
Bug reports by definition can't do anything for disasters that probably only ever happen once. Unless the cause is known beforehand.
If I had to name a cause, I'd suggest that something happened to the crypto key: it was altered in memory somehow, and some important part of the disk's data or the on-disk representation of the key got overwritten. That wouldn't explain, though, why two drives were lost.
Were the disks physically fine afterwards?
Disks are fine, and nothing else happened after this event.
I repeat:
different controllers
different brands
two different disks
two different technologies: one HDD and one nvme
more than 100GB was lost, while the system has theoretically at most 32GB of memory
but they all have EXT4 in common.
So, I still suspect EXT4, but cannot prove this. Maybe I am totally wrong, but why EXT4? Well, there are some infamous known bugs:
- https://bugs.launchpad.net/ubuntu/+sour … bug/317781
- https://lore.kernel.org/lkml/50882787.3 … home.de/T/
- https://bugs.debian.org/cgi-bin/bugrepo … bug=785672
Unfortunately, Devuan team seems to be too small to handle this kind of issue.
Can a report be in a moreinfo state for years?
A bug can be in [moreinfo] for as long as it takes for the submitter to provide more info as requested... Or until people get tired of waiting.
The last comment was "Did you reproduce it on an untainted kernel?", to which the response so far has been *crickets*.
I still suspect EXT4, but cannot prove this.
Isolated report from a single user (vs. the uncounted thousands running EXT4 daily), and no proposed mechanism or reliable reproduction: highly likely to remain an unproven suspicion.
Unfortunately, Devuan team seems to be too small to handle this kind of issue.
If this is a bug in EXT4 (and frankly I very much doubt that) it should be punted upstream to the kernel mailing list. Investigating esoteric kernel bugs is not Devuan's responsibility.
Last edited by steve_v (2023-05-26 16:01:28)
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.
Hello:
... no proposed mechanism or reliable reproduction ...
... should be punted upstream to the kernel mailing list.
Investigating esoteric kernel bugs is not Devuan's responsibility.
+1
I was about to post the same thing/idea.
Best,
A.
zero wrote: I still suspect EXT4, but cannot prove this.
Isolated report from a single user (vs. the uncounted thousands running EXT4 daily), and no proposed mechanism or reliable reproduction: highly likely to remain an unproven suspicion.
This is why I can only suspect EXT4…
zero wrote: Unfortunately, Devuan team seems to be too small to handle this kind of issue.
If this is a bug in EXT4 (and frankly I very much doubt that) it should be punted upstream to the kernel mailing list. Investigating esoteric kernel bugs is not Devuan's responsibility.
And as long as EXT4 is only suspected, some investigation should be done. For now, there is no investigation, not because anyone has established whether this is an EXT4 bug, but because no resources are available. Hence, the bug report goes stale…
Take a look at Debian, Ubuntu, etc. In these bigger projects, we at least get some feedback. Here we have had nothing after 4 years.
Last edited by zero (2023-05-26 18:29:35)
steve_v gave an excellent and correct assessment. Anything else is flapping gums . , .
there should be some investigations done... there is no investigation... because there is no resource available.
The most available resource for initial investigation of a bug only you have encountered is... You.
You appear to want somebody else to take responsibility for this, yet you have so far:
* Reported it to a team that is not directly responsible for the component in question, with minimal information, no reproduction steps, and no logs or debug output.
* Tried to divert discussion and tracking away from the official channels (bugtracker & mailing lists) to a user forum (which few developers frequent).
* Failed to follow up when asked to reproduce the problem with an untainted kernel (taint disables kernel debugging), or engage on the bugtracker at all beyond the initial report. (hint: replies to the bugtracker mail people who might care, forum posts do not)
* Continually stated a "suspicion" that this is a software problem that "somebody" should investigate, yet focused your own cursory "investigation" entirely on hardware.
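For context on the "untainted kernel" request: the kernel exposes its taint state as a bitmask in `/proc/sys/kernel/tainted`, where 0 means untainted. A minimal sketch (the helper name is hypothetical) that lists which bits are set; the meaning of each bit is documented in the kernel's "Tainted kernels" admin guide, e.g. bit 0 (P) for a proprietary module and bit 12 (O) for an out-of-tree module.

```sh
# taint_bits VALUE: print the numbers of the bits set in a kernel taint value,
# separated by spaces (empty output means the kernel is untainted).
taint_bits() {
    local value=$1 bit=0 out=""
    while [ "$value" -gt 0 ]; do
        if [ $((value % 2)) -eq 1 ]; then
            out="$out $bit"
        fi
        value=$((value / 2))
        bit=$((bit + 1))
    done
    echo "${out# }"
}

# Example:
#   taint_bits "$(cat /proc/sys/kernel/tainted)"
# An output of "0 12" would mean bit 0 (P, proprietary module) and
# bit 12 (O, out-of-tree module) are set.
```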
Why exactly do you expect the Devuan developers, of all people, to work on reproducing a "bug" you "suspect" you have found, when you yourself are apparently unwilling to do the same? Do you want them to feed and burp you as well?
It is not at all uncommon, regardless of the size of a project, for a bug report to remain that - just a report - until either the reporter or somebody else with the same problem provides enough information to reproduce the issue. Developers are not psychic, and what cannot be seen cannot be fixed.
Only then can it move to [confirmed] and work on finding the cause begin.
Last edited by steve_v (2023-05-27 07:51:30)
@steve_v, are you trying to troll?
What you wrote until now is pretty illogical.
Moreover, you fail to understand what I am saying.
there should be some investigations done... there is no investigation... because there is no resource available.
The most available resource for initial investigation of a bug only you have encountered is... You.
LOL. I hope you are not a developer, because what you wrote is totally dumb. This is a big part of any developer's work: trying to fix bugs that are not so obvious to find. And you know what? It is not the customers who do the work, but the developers…
You appear to want somebody else to take responsibility for this, yet you have so far:
* Reported it to a team that is not directly responsible for the component in question, with minimal information, no reproduction steps, and no logs or debug output.
You wrote "you appear". Indeed, this is your interpretation. Not the truth. Obviously, you did not take the time to read what I have stated before.
* Tried to divert discussion and tracking away from the official channels (bugtracker & mailing lists) to a user forum (which few developers frequent).
OK, now I can say that you are a troll, because you are putting my intentions on trial.
* Failed to follow up when asked to reproduce the problem with an untainted kernel (taint disables kernel debugging), or engage on the bugtracker at all beyond the initial report. (hint: replies to the bugtracker mail people who might care, forum posts do not)
Now I can say that you are a troll who did not read everything, yet still acts very badly.
* Continually stated a "suspicion" that this is a software problem that "somebody" should investigate, yet focused your own cursory "investigation" entirely on hardware.
What are you talking about?
Why exactly do you expect the Devuan developers, of all people, to work on reproducing a "bug" you "suspect" you have found, when you yourself are apparently unwilling to do the same? Do you want them to feed and burp you as well?
It is not at all uncommon, regardless of the size of a project, for a bug report to remain that - just a report - until either the reporter or somebody else with the same problem provides enough information to reproduce the issue. Developers are not psychic, and what cannot be seen cannot be fixed.
Only then can it move to [confirmed] and work on finding the cause begin.
Where did I write that I was expecting anything from Devuan developers? Are you dumb? Standardpoodle was the one who was surprised, not me.
It is a fact that bigger projects like Debian, Ubuntu, Fedora, etc. handle this kind of issue better. Here is an issue with Fedora and XFS:
- https://bugzilla.redhat.com/show_bug.cgi?id=2208553
And it is a fact that here we have had nothing after 4 years. Is this a complaint? NO, JUST FACTS.
Now just ignore me, because I will ignore you, as you need to get treatment.
Now just ignore me, because I will ignore you, as you need to get treatment.
Nah - nah - nanah - nah . . . Good grief . . . Maybe it's time to grow up?
It is not the customers who do the work, but the developers
Ahh, I see. You think you are a "customer" for the operating system you received for free.
That explains everything... Or at least everything prior to the unhinged drivel in your last post anyway.
Perhaps read up on how FOSS development works, and how to file actionable bug reports?
Anyhow, bye now, have fun.
It's fairly well known that the design of EXT4 favors speed over data integrity, which is why I always use tune2fs to adjust the filesystem parameters after creating an EXT4 filesystem (if I care about the data on that partition). I'd post the command that I use, but I am currently composing this message on a Windows laptop, and don't have easy access to my notes.
Even so, I doubt that EXT4 is the culprit in this case. My intuition is telling me that you may have somehow gotten a hold of a trojanized (malicious) version of Sigil, as unlikely as that may seem. Since that's probably not the case, perhaps you were just hit with a rare, nasty bug (though not necessarily in ext4).
This information may be helpful:
How to verify that package-installed files match originals?
https://unix.stackexchange.com/question … -originals
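On Devuan/Debian, one way to do that is `dpkg --verify PACKAGE`, which prints a line only for files that fail a check, with a `5` as the third flag character when the file's checksum differs from the packaged one. A sketch that extracts those paths; the helper name is hypothetical, and paths containing spaces are not handled.

```sh
# changed_files: read `dpkg --verify` output on stdin and print the paths whose
# checksum differs from the packaged file ('5' as the third flag character).
# Assumes paths without spaces: only the last field is printed.
changed_files() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}

# Example (as root, so that every file can be read):
#   dpkg --verify sigil | changed_files
#   dpkg --verify | changed_files        # all installed packages
```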
I think I got my answer. That particular bug is never going anywhere and might as well be closed with worksforme or something like that. (Or unreproducible. That exists.)
It is a fact that bigger projects like Debian, Ubuntu, Fedora, etc. handle this kind of issue better. Here is an issue with Fedora and XFS:
The first obvious difference is that the person who reported that bug could and did reproduce it. With a report like yours, any distro, no matter how large, would have simply done the same thing Devuan did: asked you for more information.
There simply exists no way for them to reproduce your bug. What should they do? Run ext4 in random configurations and see if they encounter it? Well this is what millions of people are doing all over the world anyway. Every ext4 user is already technically hunting for your bug without even knowing it.
As far as I know, every person in this project (referring to Devuan) is a volunteer. And the average person who will respond to you on this forum is either a) doubly a volunteer (because not even appointed volunteers respond here in any official capacity, as far as I know). Or b) only a user like you. The project's response is what you got in January of 2020. And the best thing you're going to get from any developer in any corporation or project to this issue is either moreinfo or unreproducible.
If photorec could not find any of your pictures then either a) you ran it on the raw, encrypted device. Or b) you ran it on the logical, unencrypted device, but something had overwritten the files. Did you still see the decrypted devices (nvme_crypt and sde1_crypt) after the fact? If you did not then it's possibly the crypto that failed (ergo not the filesystem).
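Checking whether the decrypted mappings are still present takes a second. A sketch (the helper name is hypothetical; the mapping names are the ones from the lsblk output earlier in this thread) that tests for the block device under /dev/mapper; as root, `cryptsetup status NAME` additionally shows the backing device.

```sh
# mapper_present NAME: succeed if /dev/mapper/NAME exists as a block device,
# i.e. the dm-crypt mapping is open (the test follows udev's symlinks).
mapper_present() {
    [ -b "/dev/mapper/$1" ]
}

# Example with the mapping names from this thread:
#   for name in nvme_crypt sde1_crypt; do
#       if mapper_present "$name"; then echo "$name: open"; else echo "$name: not open"; fi
#   done
#   cryptsetup status nvme_crypt     # as root: backing device, cipher, key size
```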
That particular bug is never going anywhere and might as well be closed with worksforme or something like that.
Many projects automate this with a "janitor" bot, e.g.
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.
Might not be a terrible idea to implement something similar for the Devuan bugtracker.
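A sketch of the rule such a janitor could apply, assuming it sees each report's status and the time of its last change as Unix timestamps. The function name and the `moreinfo` status string are assumptions modelled on the tag discussed above, not an existing Devuan tool.

```sh
# should_autoclose STATUS LAST_CHANGE_EPOCH NOW_EPOCH [DAYS]
# Hypothetical janitor rule: succeed (close the report) if it is in the
# "moreinfo" state and has not changed for at least DAYS days (default 30).
should_autoclose() {
    local status=$1 last=$2 now=$3 days=${4:-30}
    [ "$status" = "moreinfo" ] && [ $(( (now - last) / 86400 )) -ge "$days" ]
}

# Example:
#   should_autoclose moreinfo "$last_change" "$(date +%s)" && echo "close as unreproducible"
```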