#devuan

19:03:18 <Guest19> nah, not worried about getting hacked on this basically single-purpose device; they can hack the guest account all they want, it gets wiped out on boot. Browser bloat is a problem for me though. But thanks for the browser suggestions guys, I'll check those out.

19:05:19 <n4dir> Guest19: netsurf is really very basic, but at least not a command-line web browser. falkon, last time i checked, is a modern browser and works on all sites, but you don't really save a lot of resources. Worth a try though

19:16:27 <systemdlete> gnarface (and anyone else): I'm seeing intermittent problems with "apt update" from apt-cacher-ng server: e.g., W: Failed to fetch http://security.debian.org/debian[...] Connection failed [IP: 172.XX.YY.ZZ 3142]

19:19:18 <systemdlete> I swear on a stack of Linux Manuals that I have not mucked (much) with the acng.conf file. I followed the directions gnarface offered me, and consulted the man pages (and some web pages also)

19:19:31 <systemdlete> https://unix.stackexchange.com/questions/623174/apt-cacher-ng-random-download-failures-with-apt-update-acgn

19:19:59 <systemdlete> I disabled ipv6 EVERYWHERE on my home network, but it did not help. (I was just following their suggestion)

19:20:37 <onefang> I use falkon when all my firefox-esr protective plugins mean some crappy web site won't work.

19:20:49 <systemdlete> Apparently, I am not the only one having this issue. It is intermittent, which makes it hard to figure out.

19:21:05 <systemdlete> I think my network is OK, since I DO get successful apt-get's

19:22:25 <systemdlete> And no other programs or applications are complaining about network issues (recently I mean).

19:23:28 <systemdlete> gnarface, this seems to be most frequent when hitting a lot of packages at once, such as the dist-upgrade I am attempting on one system

19:24:15 <systemdlete> The first one (apt-get upgrade, after repointing the sources file to chimaera from beowulf) was 640 packages, the second one over 1000.

19:24:34 <systemdlete> But I've also seen it, a bit less often maybe, with small updates

19:24:44 <gnarface> hmm, i also haven't updated that much at once for a while, i suspect

19:25:12 <gnarface> i wonder if it's a problem where the cache can invalidate due to age while the update is still in progress?

19:25:13 <systemdlete> I'm wondering if I should try to throttle apt-cacher-ng somehow

19:25:45 <systemdlete> I mean, this happens several times within, say, an hour

19:26:23 <gnarface> i could only speculate... doesn't seem like quite the error i ever got; the only transient errors i ever got were clearly complaining about some checksum mismatch, and they'd always just go away if i waited until the next cache invalidation

19:26:54 <gnarface> (or if i logged into the control panel and flushed it manually if i was feeling impatient)

19:26:59 <systemdlete> I have seen a lot of web pages re your issue (checksum mismatch), but I really haven't noticed that one myself here.

19:27:51 <gnarface> some other hidden environmental cause we must be missing

19:28:19 <systemdlete> It's really strange, because I was in the middle of the upgrade and dist-upgrade steps, and the process would start getting hit with these errors. Then, I simply restarted it as suggested on devuan's upgrade instructions page. Then, no problem with same packages, only moments later. But then others would/might fail, and rinse and repeat...

19:28:59 <gnarface> so, i wonder if it could just be something caused by the repos updating during your downloads

19:29:16 <gnarface> you said it was in the security section after all, which is what i'd expect to be changing most

19:29:41 <systemdlete> there are many different errors; that was just the general format

19:30:06 <systemdlete> (I'm trying hard to be as precise as possible here...)

19:30:39 <gnarface> is there any firewall between you and the apt-cacher-ng server? i wonder if it could just be an issue with it failing to hold enough connection states?

19:30:49 <systemdlete> I'm afraid that the actual messages I was seeing rolled off the bottom of my copy-paste buffer

19:31:27 <systemdlete> this happens (sometimes) even on the cacher server itself!!!

19:31:58 <systemdlete> That server DOES have a firewall running, but I don't think that will matter for local connections, right?

19:32:38 <systemdlete> that's why I am not very suspicious of the network, at least not for the cacher clients

19:33:16 <gnarface> but i know that if you're tracking state of inbound or outbound connections, there's a theoretical possibility of running up against a kernel memory limit

19:33:16 <onefang> We tell package mirrors to update every 30 minutes, but some updated less often.

19:33:26 <systemdlete> I've added a few pinholes for ports like the cacher's

19:33:59 <gnarface> like, if you end up tracking too many states at once you could start losing connections randomly

19:34:26 <gnarface> usually this is not a problem with default kernel builds unless you're doing something weird that is causing connections to stick around too long though

19:35:10 <gnarface> although often needing to is the sign of something else wrong

19:35:14 <systemdlete> I mean, this is overall rather minor. It's mainly a nuisance.

19:35:58 <systemdlete> just restarting the upgrade has always been enough to get it working again (so far)

19:37:00 <gnarface> systemdlete: no, and i doubt it will give me any insight but i will if you use paste.debian.net or just /msg it to me

19:37:39 <gnarface> i just stopped adding new domains to my personal "trust" list years ago

19:37:53 <gnarface> it's not about any particular domain, it's about reducing attack surface

19:38:19 <gnarface> (i'm a mafia target and sometimes that gets innocents involved unnecessarily)

19:39:48 <gnarface> failed to move stale item out of the way... no such file or directory

19:40:33 <gnarface> so it can't move it out of the way because... it's gone already?

19:41:01 <gnarface> you don't have two apt-cacher-ng instances running on the same cache directory, do you?

19:42:00 <gnarface> do you think disk I/O is really high at those points?

19:42:15 <gnarface> i wonder if it could be some sort of race condition caused by disk bottlenecking

19:42:32 <systemdlete> maybe, but wouldn't that be a problem on the upstream side also?

19:42:51 <gnarface> never seen this before but my cache server is not under load

19:43:19 <systemdlete> I mean, I'm only seeing the problem from the client's POV. On the other hand, that log file seems to indicate :I: and :O: which I am guessing is input and output

19:43:24 <gnarface> not only is it not under load but i actually just recently upgraded it to a hardware raid with two SSDs

19:43:58 <systemdlete> I'm still on mech disks. They are still a good value for the money here.

19:44:25 <gnarface> i ran the old one right into the ground, it died making grinding noises :D

19:45:26 <gnarface> i figured it was a good idea to have drives that were less sensitive to vibration

19:46:05 <systemdlete> Some of these drives (2TB Seagate and 2TB WD) are up to 3 years old, maybe older. Since I started leaving my PCs on 24x7, the frequency of hardware replacements has gone waaaaay down (MB, disks, etc). So I've become a believer.

19:46:39 <systemdlete> well, that can't be the only annoying thing arising from that problem

19:48:47 <gnarface> well, if it were the drives you'd see i/o errors in dmesg around the same time

19:49:17 <gnarface> it looks like a race condition but i can't imagine why it'd be happening

19:49:46 <systemdlete> and I have a thruk installation here so I will find out fairly quickly if there are problems.

19:50:07 <systemdlete> and I also check, periodically, the host log files for hw errors

19:53:45 <gnarface> how many VMs you usually run the updates with at a time?

19:54:12 <gnarface> i'm rarely updating more than 2 things at once, i wonder if you're seeing something i'm not seeing just because i'm not throwing 1000 clients at it at once

19:54:37 <systemdlete> only one running apt upgrade at a time, otherwise the whole network crawls. However, there can be more than one running apt update at a time

19:55:14 <systemdlete> well, as I said, even casual upgrades of just a few packages can see this happen... iirc

19:57:54 <systemdlete> when I upgrade from chimaera to daedalus, I will try to monitor the cacher server with htop

19:58:41 <systemdlete> (I'm upgrading beowulf -> chimaera -> daedalus because skip upgrades are not officially supported)

19:59:15 <systemdlete> the chimaera upgrade is almost complete. It is unpacking and installing the dist-upgrade part right now.

19:59:58 <systemdlete> So if all goes well, and the resulting chimaera doesn't have problems for a day or so, I'll do the daedalus upgrade.

20:13:52 <systemdlete> gnarface, how do I msg you a file? I have it gzip'd down to about .5MB (I cannot paste it to deb paste bin)

20:14:14 <systemdlete> this file is the actual log, whereas what I pasted was the error log

20:15:06 <gnarface> systemdlete: oh, i just meant literally paste it and just wait for the flood protect throttling to let it finish going through

20:16:17 <gnarface> i mean, large pastes will take a while but it also will all get here, but i'm not sure if 8MB will actually fit in your clipboard

20:19:56 <gnarface> i think it'll work, just say something when it's done

20:20:45 <gnarface> i just sent you 1 line, do you see that open in a new tab on your IRC client?

20:21:33 <gnarface> but if your client lets you reply to the message i just sent, that might be easier

20:22:10 <systemdlete> (yes, I see the tab, but I think the paste is easier)

20:22:29 <gnarface> 1713568491|E|108923|192.168.40.41|devrep/merged/pool/DEBIAN/main/p/python-pip/python-pip-whl_20.3.4-4+deb11u1_all.deb [HTTP error, code: 502]

20:23:30 <gnarface> lemme double check, but i think 502 means remote server replied "aak, i'm misconfigured!"

20:23:59 <gnarface> i only see a couple 502 errors... the rest looks completely normal
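
A minimal sketch of how lines like the one pasted above could be filtered for 502s, assuming the fields are separated by `|` in the order epoch time, record type, a numeric field, client address, then path plus error text. That layout is inferred from the single pasted line, not from the apt-cacher-ng documentation, and the field names (`kind`, `size`) are hypothetical.

```python
import re
from datetime import datetime, timezone

# The one line quoted in the channel; everything below is inferred from it.
SAMPLE = (
    "1713568491|E|108923|192.168.40.41|devrep/merged/pool/DEBIAN/main/p/"
    "python-pip/python-pip-whl_20.3.4-4+deb11u1_all.deb [HTTP error, code: 502]"
)

# Matches the trailing error note as it appears in the pasted line.
HTTP_CODE = re.compile(r"\[HTTP error, code: (\d{3})\]")


def parse_line(line):
    """Split one log line into a dict, or return None if it does not fit."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5 or not parts[0].isdigit():
        return None
    epoch, kind, size, client, rest = parts
    match = HTTP_CODE.search(rest)
    return {
        "epoch": int(epoch),
        "when": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "kind": kind,      # assumed: record type, "E" for error
        "size": size,      # assumed meaning; kept as the raw string
        "client": client,
        "path": rest.split(" [", 1)[0],
        "http_code": int(match.group(1)) if match else None,
    }


def http_errors(lines, code=502):
    """Return the parsed records whose error note carries the given code."""
    records = (parse_line(line) for line in lines)
    return [rec for rec in records if rec and rec["http_code"] == code]


if __name__ == "__main__":
    for rec in http_errors([SAMPLE]):
        print(rec["when"].isoformat(), rec["client"], rec["path"])
```

Feeding it the whole log (for example `http_errors(open(path))`, with `path` pointing at wherever the log lives) would list every 502 with its time and requesting client, which makes it easier to see whether they cluster.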

20:24:14 <systemdlete> it means gateway interference, which is hard to believe when I'm getting the errors on the cacher server itself!

20:24:48 <gnarface> wait, but is that the cache server? or is that actually the debian mirror responding?

20:25:21 <gnarface> might be a problem with a mirror that's just rippling down into your cache

20:25:27 <systemdlete> now, between the cacher and the Internet, there be some gateways, routers, firewalls...

20:26:04 <gnarface> hmm, but i think 502 means it had to actually get some sort of reply though...

20:26:06 <systemdlete> but as I mentioned, I don't have many network issues here (other than stupid things I caused, which are now corrected, to the best of my knowledge)

20:26:48 <gnarface> yea, see if it always happens while apt-cacher-ng is hitting the same remote mirror

20:27:15 <gnarface> first thing that comes to mind is tail the log file while watching the raw network traffic

20:27:16 <onefang> And if it's all their fault, I might have to do something.

20:27:18 <systemdlete> the log file (I just pasted) doesn't give that info

20:27:41 <systemdlete> onefang: don't fret, this happens to a lot of people outside devuan

20:27:48 <gnarface> yea, you'd just have to be tailing it and watching the network traffic at the same time, then see which remote ip is in the tcpdump at the same time that error hits the logs

20:28:28 <systemdlete> but the log doesn't show the upstream, does it? (Maybe I missed that)

20:29:09 <gnarface> no, but my guess is the error will show up in that log file within milliseconds of the actual remote connection being made, so if you're just running tcpdump at the same time you'll see the corresponding IP
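
The matching-by-time idea can be sketched as follows, assuming the capture has already been reduced to a time-sorted list of `(epoch_seconds, remote_ip)` pairs. How that list is extracted from tcpdump is left out here. The function name, the two-second window and the sample addresses (documentation ranges) are all hypothetical.

```python
from bisect import bisect_left


def remote_ips_near(error_epoch, connections, window=2.0):
    """Return the remote IPs seen within `window` seconds of an error.

    `connections` must be sorted by time; each item is (epoch_seconds, ip).
    """
    times = [t for t, _ in connections]
    start = bisect_left(times, error_epoch - window)
    hits = set()
    for t, ip in connections[start:]:
        if t > error_epoch + window:
            break  # sorted input, so nothing later can match
        hits.add(ip)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up capture data, only to show the call shape.
    seen = [
        (1713568489.2, "203.0.113.5"),
        (1713568490.9, "198.51.100.7"),
        (1713568600.0, "203.0.113.9"),
    ]
    print(remote_ips_near(1713568491, seen))
```

Since the log timestamp in the pasted line has one-second resolution, the window cannot usefully be much tighter than a second or two, and a busy cache may still return several candidate IPs per error.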

20:29:33 <gnarface> in fact, with the right filter you would be able to even see the 502 error in the response

20:29:43 <gnarface> then you'd be sure which side of the network it was coming from

20:30:04 <gnarface> actually, what am i thinking? you could certainly filter tcpdump output just for http errors

20:30:04 <systemdlete> I forgot to obfuscate my network addresses in that paste

20:30:22 <systemdlete> but probably not an issue since my network is behind a residential firewall anyway

20:31:48 <systemdlete> now, why would it be that apt is able to round-robin the repo servers, but apt-cacher has a problem doing the same--don't they both use the same code, ultimately?

20:32:29 <gnarface> and apt may just be coded to retry without complaint

20:32:33 <onefang> Maybe not. I know debootstrap doesn't use apt, but mmdebstrap does.

20:33:10 <gnarface> but with tcpdump you should be able with some fiddling to isolate the source of these 502 errors explicitly

20:33:35 <gnarface> there might be a simpler way that i'm not thinking of, but tcpdump is definitely up to this task

20:34:12 <onefang> And I definitely know that apt-panopticon doesn't use apt directly, but that's a different case. It's testing every step of the apt process on the package mirrors.

20:34:27 <gnarface> if you find it, the payload might even actually make the cause obvious

20:37:03 <systemdlete> I wonder if wireshark might be easier in order to easily examine the packets

20:37:48 <systemdlete> but either way, it looks like it should be easy enough to gather the stream

20:42:17 <gnarface> wireshark might be easier but i have less familiarity with it

20:42:41 <gnarface> i'm not exactly a tcpdump pro but when i learned it there weren't alternatives

20:43:27 <gnarface> all you should have to do is filter for http 502 error headers, https might sabotage that though
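
A rough sketch of the "filter for 502 headers" check, assuming the raw TCP payload bytes are already in hand from some capture tool. It only recognises a plaintext HTTP status line at the start of a payload, so, as noted above, anything sent over https will not match. The function names are hypothetical.

```python
import re

# An HTTP response starts with a status line such as "HTTP/1.1 502 Bad Gateway".
STATUS_LINE = re.compile(rb"^HTTP/\d(?:\.\d)? (\d{3})\b")


def http_status(payload: bytes):
    """Return the status code if the payload begins an HTTP response, else None."""
    match = STATUS_LINE.match(payload)
    return int(match.group(1)) if match else None


def is_bad_gateway(payload: bytes) -> bool:
    """True only for a plaintext response whose status code is 502."""
    return http_status(payload) == 502


if __name__ == "__main__":
    reply = b"HTTP/1.1 502 Bad Gateway\r\nContent-Length: 0\r\n\r\n"
    print(is_bad_gateway(reply))
```

Checking which direction a matching packet travels (mirror to cacher, or cacher to client) is what would show which side generated the 502.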

20:43:55 <systemdlete> I just realized something. When I looked at https://unix.stackexchange.com/questions/623174/apt-cacher-ng-random-download-failures-with-apt-update-acgn, I failed to distinguish what they meant by disabling ipv6.

20:44:15 <systemdlete> They meant, in the acng.conf file! Not the actual network interfaces. (though no harm there)

20:45:05 <gnarface> hmm, i do also have ipv6 disabled everywhere afaik, though i don't see any particular indication that's what's causing this problem

20:45:06 <systemdlete> the cacher can be configured to only listen to ipv4 (or ipv6) as desired. I missed that, and I think I will try that before starting the next step of the upgrade to daedalus

20:46:44 <systemdlete> Yeah, I agree. And besides, it really OUGHT to work for ipv6-enabled networks.

20:47:09 <systemdlete> It's really sad if those networks' users are deprived of this functionality

20:47:13 <gnarface> yea, but there could be an issue where just one mirror in the round-robin is missing the ipv6 dns entries or something weird like that

20:48:06 <systemdlete> gnarface, some of the devuan mirrors are supported by users like us, right? I mean, some might be on big hardware in a DC somewhere, but maybe not all, and some might not be correctly configured for ipv6?

20:48:56 <systemdlete> not meaning to repeat you, just trying to get clear on your meaning

20:49:13 <onefang> That's why if there's some mirror issue, I want to know the IP of the errant mirror. Especially if it's something apt-panopticon isn't finding.

20:49:53 <systemdlete> onefang, the "errant mirror"-- do you mean a debian mirror, or only devuan?

20:50:34 <systemdlete> because debian users are hitting this too, if my survey of ddg hits on this topic is an indicator

20:50:43 <onefang> Could be either, but I'm only in charge of Devuan package mirrors, though if it's a Debian mirror problem, that's good to know as well.

20:50:44 <gnarface> though obviously scrutiny will be on the ones we can do something about first...

20:50:56 <systemdlete> as well as people using the cacher for openwrt (which I do as well)

20:51:39 <systemdlete> I think it is freebsd, but I always forget. It's one of the BSDs

20:52:38 <onefang> There's at least one package mirror running from someone's home server. I even once had an offer for a home based mirror server running over that PsaceX satellite network. lol

20:52:49 <systemdlete> I know they will be moving to pkg in the future; they're already migrating some of their tools that way. But for now at least, apt-cacher-ng works for openwrt

20:53:51 <onefang> How did I manage to typo SpaceX twice in a row, once while trying to fix the first typo. lol

20:54:14 <systemdlete> you must have borrowed my fingers for a moment...

20:54:36 <systemdlete> not your fault. You expected mine to work correctly I think.

20:55:12 <systemdlete> I've typo'd here about 4 or 5 times today already

20:56:04 <systemdlete> gnarface: yeah, I followed the directions at openwrt.org wiki to set up the cacher for openwrt packages.

20:57:46 <systemdlete> It's great with openwrt. I have several routers, and by upgrading them in the order so that my gateway is last, by that time, all the packages are cached for that upgrade, and I don't have to do any special pre-downloading (gateway needs some firmware and other stuff not available in the release iso)

20:58:27 <systemdlete> so I'm really indebted to you for having me set this up. Even with this annoying issue...

20:59:07 <onefang> gnarface is always very helpful, we should all thank them.

20:59:22 <systemdlete> So onefang and gnarface, I will do my utmost to track down that repo for you.

20:59:58 <systemdlete> and you are too, and so are the rest of the folks here

21:08:11 <systemdlete> The 502 Bad Gateway error is an HTTP status code that occurs when a server acting as a gateway or proxy receives an invalid or faulty response from another server in the communication chain. This error indicates a problem with the communication between the involved servers and can result in disruption of internet services. (Wikipedia)

21:08:31 <systemdlete> That sounds exactly like what we are thinking here.

21:08:51 <systemdlete> So it is almost definitely an upstream-side issue

21:28:22 <gnarface> i don't know if apt-cacher-ng can be made to add the remote server IPs to the log files or not

21:29:00 <gnarface> might be worth looking into, but packet sniffing will work, albeit probably quite tediously

21:29:48 <systemdlete> I just made that change to only listen on ipv4. But I don't see why that will make any difference, that is, if the problem is almost certainly on the upstream side.

21:30:58 <gnarface> i did upgrade a full kde desktop, beowulf->chimaera->daedalus recently, i think something like 18 hours?

21:31:45 <gnarface> i might have been sleeping for part of that, but it took a long time either way

21:31:53 <onefang> My upgrade to daedalus has been ongoing since last year. This time I want to write configs from scratch instead of just copying them across, and I'm doing a shit load of testing. Once it's done testing, I'll roll it out to all my Linux computers.

21:32:28 <onefang> I'm also using my own script for the system building.

21:34:13 <systemdlete> still in development, just the same way. I've used it a few times, but it needs work...

21:35:10 <onefang> That's the other reason mine's taking so long, I'm doing major surgery to the scripts. Not to mention my crappy life keeping me busy this last year. lol

21:35:17 <systemdlete> It would have been more expeditious, I think, to simply clone a new daedalus VM from a template VM I created months ago and just update/upgrade to current package levels and do restores to the home areas

21:35:57 <systemdlete> and add in all the programs and configuration from restores... ay uh

21:36:48 <systemdlete> I really have done a lot of customization on this system I'm upgrading. So maybe this will be worth it.

21:46:55 <systemdlete> https://askubuntu.com/questions/119298/apt-get-using-apt-cacher-ng-fails-to-fetch-packages-with-hash-sum-mismatch#answer-431764

21:48:36 <systemdlete> def systemdlete: "systemd: delete from all systems, immediately!"

21:49:06 <systemdlete> (i.e., nothing to do with "lete" or "l3t3" or whatever meme it is...)

00:27:42 <systemdlete> gnarface, onefang: oopsies. Looks like some hard drive errors on host as it turns out. I hadn't been alerted about them, and I could have sworn I had thruk set up to alert me upon hardware problems.

00:28:23 <systemdlete> So I might have just wasted your time, but I am not 100% sure about that. None of the articles I read about the cacher problem indicated a hard disk error.

00:48:37 <onefang> No problem for me, I'm in weekend mode, so I was mostly letting you and gnarface work on it. Was waiting to see if you had found an actual broken mirror for me to sort out.

00:50:17 <systemdlete> onefang, that could take some time. I am still trying to finish the upgrade to chimaera and that has taken ALL DAY. So it may be some time before I can explore that further. For one thing, I need to get my kernel logging configured to make thruk alert me for hard disk problems (which I thought I had already done...)

00:50:37 <systemdlete> I'll be focusing on that to avoid more problems going forward.

00:50:50 <systemdlete> But I will definitely be back here to let you know what I find out.

00:51:55 <systemdlete> I'm sort of hoping, though, that these hard disk errors are the smoking gun. The first of the errors seems to have begun around April 14 (PST time), and that was about the time I began to notice the errors, but I did not enter them into my journal here...

01:00:19 <CueXXIII> systemdlete: anything in smartctl of that harddisk? otherwise it might be bad cabling producing those errors

01:43:43 <systemdlete> CueXXIII, not sure yet. I'm juggling a few things...

01:49:26 <systemdlete> I did look at the cell values on that harddisk and did not see anything that looked like hard errors. I only see evidence in the kern.log
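
For the kern.log side, a small sketch that tallies I/O error lines per device, assuming the kernel messages contain the text "I/O error" followed by "dev <name>". The two sample lines are illustrative of that common shape and are not taken from the log discussed here; messages worded differently would be missed.

```python
import re
from collections import Counter

# Looks for "I/O error" and then the first following "dev <name>".
IO_ERROR = re.compile(r"I/O error.*?\bdev (\w+)", re.IGNORECASE)


def count_io_errors(lines):
    """Count I/O error lines per device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines only, not from the system in question.
    sample = [
        "kernel: blk_update_request: I/O error, dev sdb, sector 12345",
        "kernel: Buffer I/O error on dev sdb1, logical block 0",
        "kernel: usb 1-1: new high-speed USB device number 2",
    ]
    for device, hits in count_io_errors(sample).items():
        print(device, hits)
```

Comparing the timestamps of those lines with the times of the cacher's errors would help show whether the disk errors really line up with the failed fetches.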

01:51:01 <systemdlete> I will take the system down in a while, just as soon as my disks resync (RAID1). The system has been up 46 straight days, so maybe it is "tired"...

01:51:43 <systemdlete> I'll remove the bad disk and test it on a test box. I have some spares (good thinking, systemdlete, for once).

01:52:53 <systemdlete> I usually run badblocks for a day or two to see if it errors. Since the test box has its own cables, that will help to eliminate that possibility.

02:46:17 <gnarface> systemdlete: which filesystem are you using on these?

02:51:45 <systemdlete> I've tried some of the more exciting stuff, I think xfs? or something like that

02:52:01 <systemdlete> but I had problems with it, but years ago, so I should prob try again

02:55:34 <gnarface> hmm, well being not btrfs or anything similarly experimental, i dunno, but ext4 did have one bad corruption bug that was affecting upgrades a few releases back, only i thought that was before beowulf

02:55:56 <gnarface> (and it did manifest itself exactly like a physical drive failure)

02:56:39 <gnarface> the issue was to do with using older e2fsprogs with newer kernels or something like that

02:57:16 <systemdlete> I am pretty sure I stumbled into that one at some point.

02:58:09 <systemdlete> Thing is, before I started this upgrade, I cloned the VM, and upon boot I have all filesystems set up to fsck every time (I don't reboot any systems much).

02:59:03 <systemdlete> So I believe I have a clean file system to start with. But still, if the actual hardware backing the virtual FS has actual hard errors, then maybe there are still issues.

02:59:27 <systemdlete> I've been carefully checking the kern.logs on both VM and host for any suspicious errors

04:43:10 <systemdlete> ok now what am I doing wrong? I tried upgrading rsyslog on a beowulf system with the backports so I could get a more recent version. But there is no /etc/init.d/rsyslog

04:44:07 <systemdlete> maybe a trigger is not running? I forget the details of how that happens...

04:53:12 <systemdlete> I removed the upgrade and tried re-installing the package from the regular repo, but same thing happens

04:53:20 <fsmithred> I just tried to install rsyslog from beowulf-backports and apt tells me that i already have the newest version

04:53:41 <fsmithred> but apt policy tells me that version is in beowulf. No rsyslog in backports.

04:54:37 <systemdlete> apt policy shows the correct versions, and the one I have installed has a star

04:55:49 <systemdlete> but it doesn't matter; the rsyslog script does not get installed

04:56:24 <systemdlete> trouble is, I now have NO rsyslog running on that system!

04:59:32 <rrq> I haven't read backlog, but which script are you talking about?

04:59:38 <gnarface> hmm, did you debootstrap? i seem to recall a problem with rsyslog and debootstrap at some point

05:00:28 <gnarface> i think it was in ascii though, and maybe only on arm64

05:01:09 <systemdlete> no, nothing to do with installing the system. The version of rsyslog I had was 2102, and I wanted the 23.02 version.

05:01:09 <gnarface> yea, according to my notes i had to exclude rsyslog and udev and include syslog-ng instead to successfully debootstrap ascii on arm64

05:01:33 <gnarface> never had any other problems with rsyslog that i can recall

05:02:07 <systemdlete> I removed all rsyslog versions from the apt-cacher and I'll see if this works

05:02:33 <systemdlete> I'm going to see if maybe it will cache a fresh copy from the repos

06:14:59 <systemdlete> well reboot did not help, even shutting it down completely and restarting it "cold" (and swapping out the bad drive for a new one)

06:15:39 <systemdlete> then I tried to apt remove rsyslog and apt install rsyslog, but for whatever reason, it does not install the /etc/init.d/rsyslog script

06:16:02 <systemdlete> I also note that the archive does not contain the deb file for rsyslog

06:16:30 <systemdlete> (I am using the cacher, but I didn't figure that would affect the client as far as grabbing a copy of the deb file)

06:17:03 <gnarface> wait maybe that's because it's in beowulf-security now? or chimaera-security or whichever you're on?

06:17:51 <gnarface> if you have to, you can always switch to syslog-ng but i was pretty sure rsyslog worked...

06:18:32 <gnarface> i would be tempted to examine the preinst/postinst scripts

06:19:15 <gnarface> but rsyslog is definitely working on my beowulf systems

06:19:48 <systemdlete> this is a different sea of trouble than the one we were dealing with before.

06:22:42 <systemdlete> and rsyslog was installed just fine previously. What happened was that I was running into some odd error messages from rsyslogd and I thought maybe upgrading to a more recent version could correct that (very hopeful thinking here)

06:23:02 <systemdlete> but after doing the upgrade to chimaera-backports

06:23:23 <systemdlete> I noticed that the script was missing (and maybe other stuff, idk)

06:24:28 <systemdlete> this is what happens when you got up early the day before your birthday, ran into technical problems, stayed up all night trying to fix them, and kept running into more problems...

06:25:03 <systemdlete> so I am totally exhausted, but I won't be able to sleep wondering about this.

06:30:46 <systemdlete> gnarface, should I expect to see a deb file for rsyslog in the client's cache directory?

06:31:11 <systemdlete> or does using the cacher cause different behavior?

06:40:44 <systemdlete> or maybe that was only for openwrt clients of the cacher...

07:22:35 <gnarface> systemdlete: yes, it should still use /var/cache/apt/archives normally, if that's what you're asking

14:36:40 <Harzilein> jaromil: grats for getting a new dyne release out of the door. i think that should showcase devuan very nicely.