Crash

The place to discuss Linux and Unix Operating Systems
Zema Bus
Your Co-Host
Posts: 240
Joined: Sun Feb 04, 2024 1:25 am

Crash

Post by Zema Bus »

Over the last couple of days in Slackware I noticed that YT videos would occasionally freeze while the audio continued. Then tonight both screens suddenly shut off. I couldn't drop to a TTY; the machine wasn't responsive and I ended up having to force a reboot. I didn't see anything obvious in dmesg except for this (that's not the kernel I was running, it rebooted into it for some reason):

Code:

[    0.000000] Linux version 6.6.22 (root@z-mp.slackware.lan) (gcc (GCC) 13.2.0, GNU ld version 2.42-slack151) #1 SMP PREEMPT_DYNAMIC Fri Mar 15 15:52:16 CDT 2024
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-huge-6.6.22 root=UUID=afeeaa81-0179-4848-85b2-99be8652c13e ro
[    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
That's something I've never seen before, at least not on AMD systems (this is my Intel system). When I rebooted back into Slackware it booted the LTS kernel, which I never use (I haven't gotten around to removing it yet). That was odd, since I have grub set to boot the last booted kernel (after compiling one I select it in grub, and on subsequent boots it boots that kernel until I change it). But when I rebooted again and went in to select the latest compiled kernel, it was already selected. I read suggestions to disable split lock detection, which I have done, but I don't know if that even has anything to do with it. I have not experienced this in Arch so far, and I was able to play a YT video there without any freezes.
Grogan
Your Host
Posts: 484
Joined: Sat Aug 21, 2021 10:04 am
Location: Ontario, Canada

Re: Crash

Post by Grogan »

It has everything to do with it. Take a look at my kernel stanzas in my grub.cfg

Code:

menuentry 'Arch Linux' {
        insmod gzio
        insmod part_gpt
        insmod fat
        search --no-floppy --fs-uuid --set=root 69A1-4857
        linux  /vmlinuz-6.8.5 root=/dev/nvme1n1p2 ro mitigations=off split_lock_detect=off loglevel=3 quiet
}
See split_lock_detect=off

That's what that is for, to turn off that deliberate sabotage. It's more than a warning: when a split lock happens, the kernel deliberately kills performance by throttling, to "punish" the application (which means the user).

I think if you're using grub-mkconfig rather than editing grub.cfg manually like me, you should probably do it in /etc/default/grub

In that file, look for GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet" near the top, and add it there

Code:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet split_lock_detect=off"
I do mean add this to all your kernels, everywhere.
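
After editing /etc/default/grub the config has to be regenerated (typically `grub-mkconfig -o /boot/grub/grub.cfg`, though the output path can differ between distros) and the machine rebooted. A minimal sketch for confirming that the parameter actually reached the running kernel follows; the helper name `has_kernel_param` is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: succeeds if the given kernel command line contains
# the exact parameter. With no second argument it reads /proc/cmdline,
# i.e. the command line of the kernel that is currently running.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    # Pad with spaces so only whole, space-separated parameters match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after a reboot:
#   has_kernel_param split_lock_detect=off && echo "split lock detection is off"
```

Checking /proc/cmdline rather than grub.cfg shows what the booted kernel really received, which matters when grub picks a different menu entry than expected.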

P.S. I'm refreshing your memory here, I've covered this before. One thing I know for sure hits this: under some circumstances, Wine will translate some Windows primitives into splitlocks, and you may see those messages while playing some games (it might cause a stutter, or the whole game to perform like shit). It's not only Wine processes though, there are still applications that might hit it. The first I heard of it was a few years ago, when God of War performance suddenly changed and got shitty. I was thinking Mesa, and I was done with the game anyway (finished my third playthrough), so I shrugged. I came to find out that it was splitlock detection in the kernel causing it.

The kernel devs want splitlocks to go away. We have much better mechanisms, this old crap doesn't jibe well with atomic operations.

Now, splitlocks are bad, but not as bad as that sabotage for the way we use our systems (desktop/gaming, not a multiuser server environment etc.). Linux doesn't really favour the user as much as the system. A lot of the shit in the kernel (and the defaults) is to protect the system from me, hence all my overrides, sysctls, limits etc.
A split lock is any atomic operation whose operand crosses two cache lines. Since the operand spans two cache lines and the operation must be atomic, the system locks the bus while the CPU accesses the two cache lines.
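
To make "crosses two cache lines" concrete, here is a small arithmetic sketch. It assumes a 64-byte cache line (typical for current x86 CPUs, but an assumption here), and the function name `crosses_cache_line` is hypothetical.

```shell
# Succeeds if an access of SIZE bytes starting at address ADDR touches two
# different cache lines. The third argument (line size in bytes) defaults
# to 64, which is an assumption about the CPU.
crosses_cache_line() {
    addr=$(( $1 ))
    size=$(( $2 ))
    line=${3:-64}
    first=$(( addr / line ))               # line holding the first byte
    last=$(( (addr + size - 1) / line ))   # line holding the last byte
    [ "$first" -ne "$last" ]
}

# Usage:
#   crosses_cache_line 60 8 && echo "bytes 60-67 straddle lines 0 and 1"
```

An 8-byte operand at offset 60 covers bytes 60 through 67, so a locked operation on it would be a split lock, while the same operand at offset 56 or 64 sits inside a single line.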
Grogan
Your Host
Posts: 484
Joined: Sat Aug 21, 2021 10:04 am
Location: Ontario, Canada

Re: Crash

Post by Grogan »

Actually I'll refresh my fucking memory too, I just covered this a month ago (I forgot I made this thread). I knew it happened to me recently. It was Far Cry 6 that triggered it.

Kernel split lock detection

Huh... that "open link in new window" addon doesn't seem to treat the board's own URLs correctly. It doesn't format them right, and it opens in same window. Let's see what happens if I format it with bbcode. It formats it right, but it doesn't get the new window directive.
Grogan
Your Host
Posts: 484
Joined: Sat Aug 21, 2021 10:04 am
Location: Ontario, Canada

Re: Crash

Post by Grogan »

OK, now... the reason your message is different (and possibly more hurty) than mine. Well, two reasons actually, but related here.

Modern Intel CPUs issue a debug exception trap that assists split_lock_detect. It's not using that for you: it's using the #AC method rather than #DB, so it's raising an alignment check exception. I see that's an Intel CPU, so it's probably because it's an older kernel base, and the behaviour of split_lock_detect was worse then (note that "crashing the kernel" is deliberate... it's not really your kernel crashing, and off will stop that). The split_lock_detect code may not have been using the more graceful method yet. The idea was to punish "applications" so that users complain. However, there is nothing you can do about this unless the applications are fixed or you can fix them yourself. Your only other choice is to stop using the applications, so there is no point in enforcing this! It should just STFU.

If interested, this covers it:
https://docs.kernel.org/arch/x86/buslock.html
Grogan
Your Host
Posts: 484
Joined: Sat Aug 21, 2021 10:04 am
Location: Ontario, Canada

Re: Crash

Post by Grogan »

Code:

[    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Ohh fuck, that's just an informational message because split_lock_detect is enabled. You didn't actually show a split lock detection happening. It's simply informing you that it will "crash the kernel" on kernel split locks using the #AC exception (because it's Linux 6.6.x that got booted; otherwise it would probably be using the debug trap as it is on mine, rather than "crashing the kernel"). I should have clued into the timestamp if nothing else lol

I didn't read carefully enough today. I had just woken up, uncaffeinated, saw split lock detection and jumped into action (it's something that pisses me off).

This likely has nothing to do with your crash. In fact I read too fast and didn't catch that it was a real crash. Sometimes you "can't see the forest for the trees" so to speak.

Firefox and youtube aren't going to be causing split_lock_detect behaviour (and you'd find evidence spamming dmesg and logs if it did). This is likely graphics card/driver related.
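
A quick way to look for that evidence is to filter the kernel log for actual trap reports rather than the boot-time banner. This is only a sketch: the message wording ("took a split_lock trap" / "took a bus_lock trap") is an assumption based on recent kernels and may differ on yours, and the function name `split_lock_hits` is hypothetical.

```shell
# Reads kernel log text on stdin and prints only the lines that report a
# process actually taking a split lock or bus lock trap.
# ASSUMPTION: the pattern matches the wording used by recent kernels;
# adjust it if your kernel words the message differently.
split_lock_hits() {
    grep -E 'took a (split|bus)_lock trap'
}

# Usage:
#   dmesg | split_lock_hits
#   split_lock_hits < /var/log/messages
```

The informational line printed at 0.000000 does not match this pattern, so an empty result means no split lock was reported in that log.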

What's the card? If you've got a current kernel, you may want a more current mesa build. It also could be, dare I say it, a graphics card on the way out. How many times have I learned that lesson (fortunately this time it died a quick death and didn't have me chasing my tail for more than one night lol)

P.S. The advice to add that to your kernel command lines still stands, that's bollocks that is detrimental to you, the user. Every kernel of mine gets that.
Zema Bus
Your Co-Host
Posts: 240
Joined: Sun Feb 04, 2024 1:25 am

Re: Crash

Post by Zema Bus »

Thanks Grogan. I'm pretty sure I was in kernel 6.8.4 at the time that happened, but on reboot, before I ran dmesg, it booted into the old LTS kernel despite my grub settings to boot the last booted kernel. It's a Radeon RX 6750 card that was trouble free in another machine for a long time and used for gaming. I'll try another card, I put this one in last weekend. I'm not in Slackware right now but I should have this mesa package: mesa-24.0.4-x86_64-1.txz. Forgot to add that it's an Asrock card :)
Grogan
Your Host
Posts: 484
Joined: Sat Aug 21, 2021 10:04 am
Location: Ontario, Canada

Re: Crash

Post by Grogan »

Don't try another card unless it happens again. That Mesa is almost current stable (24.0.5 is).

It could also be an issue with your display compositor. Sync issues can have severe consequences with video. Most of the time it's not a fault for you, just the video playback freezes, if I finally understand things right. But I remember a time with my R9 380 card when I was experiencing black screen halts with audio urrrring (symptoms of what happens when the graphics card/driver crashes on that system, not necessarily specific). It never happened while gaming, but youtube videos were one of the things that could trigger it.

It turned out that I was hitting that condition because of gkrellm polling a myriad of sensors 10 times a second. I was doing something similar in Windows 7 with HWInfo64 (and I think it may have had lmsensors code or similar... why wouldn't it) at the times it was happening there too (but never while gaming). When I stopped doing that, the problem cleared right up forever... until the graphics card started showing the same symptoms a year or so later and soon died permanently.

I'm wondering if the real problem was something hinky with the graphics card all along, and the constant sensor polling was just generating interrupts, precipitating the problem and losing communication. You should be able to read sensors, and polling at intervals is what monitoring software does.

Still, to this day, I haven't run a monitoring program. I don't need to see that shit, I'll just query lmsensors manually (or start that btop if I'm interested in watching usage and temperature stats... it polls once every 2 seconds by default. I like that program. Even so I don't leave it running for very long. Polling sensors scares me)

If you want to try a mesa build, I'm looking at his .SlackBuild. It should just be a matter of git clone, and if you manually tar it up with a sane name like mesa-24.1.0.tar.gz, the SlackBuild should just work if you drop that in and remove mesa-24.0.4.tar.xz and its sig file. The version is derived from the tarball name like this (and you can see why I want you to sanitize the tarball name lol)

Code:

VERSION=${VERSION:-$(echo $PKGNAM-[0-9]*.tar.?z | rev | cut -f 3- -d . | rev | cut -f 2- -d -)}
So then, I would do this:

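Code:

# Clone into a directory named like the tarball; SlackBuilds usually cd into
# $PKGNAM-$VERSION after unpacking, so a top-level "mesa" directory may not be found.
git clone https://gitlab.freedesktop.org/mesa/mesa.git mesa-24.1.0
tar czfp mesa-24.1.0.tar.gz mesa-24.1.0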
(Or use xz if you wish, with upper case J instead of z: tar cJfp. I hate xz, it's expensive.) The .?z wildcard in the variable definition will find it as long as the extension is two letters and ends with z (e.g. .zst, .zstd or even .tgz nomenclature would not work) :lol:
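
As a worked example of that VERSION line, the sketch below runs the same rev/cut pipeline on a single file name and checks names against the same glob. The helper names are hypothetical, and PKGNAM is assumed to be `mesa`.

```shell
# Hypothetical helper mirroring the SlackBuild's VERSION pipeline for one
# file name: reverse the name, drop the first two dot-separated fields
# (the "gz" and "tar" of .tar.gz), reverse back, then drop everything up
# to the first dash (the package name).
version_from_tarball() {
    echo "$1" | rev | cut -f 3- -d . | rev | cut -f 2- -d -
}

# Succeeds only for names the glob in the VERSION line would pick up,
# with PKGNAM assumed to be "mesa".
matches_slackbuild_glob() {
    case "$1" in
        mesa-[0-9]*.tar.?z) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   version_from_tarball mesa-24.1.0.tar.gz
#   matches_slackbuild_glob mesa-24.1.0.tar.zst || echo "not picked up"
```

Because the pipeline cuts on the first dash and the last two dots, a name like mesa-24.1.0.tar.gz comes out clean, while extra dashes or a different extension would give a wrong version or no match at all.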

Then just take and use that tarball with the Mesa SlackBuild

and yes, I would use main (Mesa uses the politically correct main instead of master) because it's in good shape right now, and there has been a new Vulkan extension enabled by default the other day.
Zema Bus
Your Co-Host
Posts: 240
Joined: Sun Feb 04, 2024 1:25 am

Re: Crash

Post by Zema Bus »

Thanks, I see Mesa was updated in Slackware today: mesa-24.0.5-x86_64-1.txz. I'll see how that does for now. The compositor occurred to me as a possible culprit since I still had it enabled, so I disabled it. I still had freezing videos though. This does seem to be limited to Slackware, and Slackware itself was fine until recently. Maybe a recent update did it. I think I did updates earlier in the week, after which I was in Arch until a few days ago, when I rebooted into Slackware and first noticed the video freezes.
Zema Bus
Your Co-Host
Posts: 240
Joined: Sun Feb 04, 2024 1:25 am

Re: Crash

Post by Zema Bus »

After updating/rebooting I played a video and no freezes this time.
Grogan
Your Host
Posts: 484
Joined: Sat Aug 21, 2021 10:04 am
Location: Ontario, Canada

Re: Crash

Post by Grogan »

I don't think it would have been the mesa point release that helped.

a/upower-1.90.4-x86_64-1.txz: Upgraded.

Maybe that. It falls in line with constant polling causing problems. It's dbus machinery for polling power stats and battery sensors, and something in your DE could be using the daemon.

I actually remove that from Arch, as well as power managers like xfce-power-manager... I handle that stuff. I mean, I handle it so it doesn't happen lol (I just use x11's facility for the display power management, and xset if I need to change/disable/enable it for games). For example, I have start and stop scripts that turn it on and off with xset when I start or quit a game in Lutris.
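
Start and stop scripts of that kind might look roughly like the sketch below. These are not the actual scripts; the function names and the overridable XSET variable are assumptions added for illustration, and only the standard xset options for DPMS and the screen saver are used.

```shell
# XSET can be overridden (e.g. XSET=echo) to print the commands instead of
# running them; by default the real xset is used.
XSET="${XSET:-xset}"

# Before the game: disable DPMS and the X screen saver/blanking.
game_start() {
    $XSET -dpms
    $XSET s off
}

# After the game: turn both back on.
game_stop() {
    $XSET +dpms
    $XSET s on
}

# Usage: call game_start from whatever runs before the game launches and
# game_stop from whatever runs after it exits.
```

Keeping the two halves symmetrical means display power management comes back in a known state after the game exits, whatever it was doing before.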

There's nothing that's ever going to suspend or hibernate my system and I have nothing but performance governor and the like in the kernel. I don't even compile anything that would throttle me.