Skip to content

Conversation

@marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Apr 30, 2025

Boot time is not a good time to debug issues like
pmem/ndctl#278 or any other really.

git -C ndctl/ grep 'modprobe.*cxl_test' shows that pretty much every test already loads and unloads cxl_test by itself. If some exotic test managed to avoid doing this until today, then falling back in line is overdue.

PS: the "depmod_load_..." variable names were wrong: this was relying on systemd, not depmod.

Boot time is not a good time to debug issues like
pmem/ndctl#278 or any other really.

`git -C ndctl/ grep 'modprobe.*cxl_test'` shows that pretty much every
test already loads and unloads cxl_test by itself. If some exotic test
managed to avoid doing this until today, then falling back in line is
overdue.

PS: the "depmod_load_..." variable names were wrong: this was relying on
systemd, not depmod.

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb marc-hb marked this pull request as ready for review April 30, 2025 23:25
@marc-hb marc-hb requested a review from stellarhopper as a code owner April 30, 2025 23:25
@AlisonSchofield
Copy link
Contributor

I don't know about this one. The history has been that when I select --cxl-test on the cmdline I get the cxl-test environment upon boot up and am ready to explore. I may be trying out any number of things in that environment, including, but not limited to running unit tests. Remember cxl-test is an environment, not only a unit test vehicle.
I'd rather it stay as is and present me with the usable environment upon boot up.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 30, 2025

The history has been that when I select --cxl-test on the cmdline I get the cxl-test environment upon boot up and am ready to explore.

Even after this PR, you are only one, simple and obvious modprobe cxl_test command away from that exploration. Choice.

Before this PR, no choice: pmem/ndctl#278 crash (and others) immediately at boot time and very obscure how to "opt out". For weeks, I didn't even realize run_qemu.sh was responsible and really wondered how to disable this.
I also struggled to connect messages in the logs to loading that module: logs are extremely noisy at boot time.
Depending on the issue it's not obvious the system can be recovered at all. cxl_test is a kernel module test hack after all.

@AlisonSchofield
Copy link
Contributor

The 'why' in this pull request is: "Boot time is not a good time to debug issues like pmem/ndctl#278 or any other really."

Following the link to issue #278 the story doesn't get much crisper. Why does this change need to be made?

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 1, 2025

"Boot time is not a good time to debug issues"

Boot time is a very busy time, so it's impossible to focus on one thing at time. This means:

  • logs are very noisy, hard to see what's relevant and what is not
  • races are very common

Making environment changes (log levels, tracing, init state,...) is also MUCH harder because it can require rebuilding the image as opposed to simply making the changes in place. Depending on the change desired, this can require advanced mkosi knowledge or even hacking. It also requires one to keep track of all image changes made as opposed to just rebooting a fresh image when needed.

There's really nothing specific to this PR there.

Applied to cxl_test: this PR allows "exploration" before and after loading.

@AlisonSchofield
Copy link
Contributor

If this is a change to make debugging something else easier, why not do that on a debug branch and only commit a change like this to main when it's deemed required?

FWIW adjusting the cxl dyndbg settings is useful for controlling the spew during boot and runtime.

@stellarhopper
Copy link
Member

I'd added this as a convenience thing - agree with @AlisonSchofield in that a lot of times you just want to look at something in cxl_test manually, not just run the test suite, and for those times it is nice to have it pre loaded by the time you get to a shell.

If it is causing actual problems (I'm not yet sure what the motivation for this is other than general cleanliness?) - sure removing it makes sense, but otherwise I think it's nice to have. I'll leave it to the folks who use this day to day for an ack/nack.

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 19, 2025

I'd added this as a convenience thing

There is indeed a big convenience difference. Both use cases are valid. But carrying this patch locally on every test system is pretty inconvenient and that's assuming you even know this patch exists. When you don't know it exists, the inconvenience is huge.

On the other, modprobe cxl_test is obvious and its inconvenience is very small.

@marc-hb

This comment was marked as resolved.

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 19, 2025

I found a surprising modprobe -r cxl_test behavior ...

That's because modprobe -r cxl_test removes cxl_acpi. It does not happen with rmmod cxl_test. That behavior is not intuitive because it causes cxl_test to affect non-cxl_test devices but there seems to be some logic to it. Thanks @davejiang for solving this mystery! EDIT: cxl list -i shows the devices, thanks @AlisonSchofield

@marc-hb
Copy link
Collaborator Author

marc-hb commented Oct 9, 2025

Interestingly, loading cxl_test at boot makes the default configuration of Redhat/CentOS/Alma/etc. incompatible with the default run_qemu.sh configuration because the former try to auto-online its memory using some udev rules. See https://bugzilla.redhat.com/show_bug.cgi?id=1748051 and https://issues.redhat.com/browse/RHEL-92781 for more.

If I ever get bored I might add add a new run_qemu.sh option...

From @davejiang:

don't online cxl_test devices. They are not backed by real memory and really bad things happen.

Indeed, this warning gets printed 250 times:

Oct 09 21:33:18 :  non-paged memory
Oct 09 21:33:18 : ------------[ cut here ]------------
Oct 09 21:33:18 : list_add corruption. prev->next should be next (ffff88807fd92cf8), but was 0000000000000000. (prev=ffffeaffc0a08008).
Oct 09 21:33:18 : WARNING: CPU: 5 PID: 1255 at lib/list_debug.c:32 __list_add_valid_or_report+0xca/0xf0
Oct 09 21:33:18 : Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common ppdev kmem device_dax nd_pmem nd_btt dax_pm>
Oct 09 21:33:18 : CPU: 5 UID: 0 PID: 1255 Comm: daxctl Tainted: G           O        6.17.0 #264 PREEMPT(none) 
Oct 09 21:33:18 : Tainted: [O]=OOT_MODULE
Oct 09 21:33:18 : Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
Oct 09 21:33:18 : RIP: 0010:__list_add_valid_or_report+0xca/0xf0
Oct 09 21:33:18 : Code: f7 48 89 54 24 08 48 89 34 24 e8 31 b9 9e ff 48 8b 34 24 48 c7 c7 f0 a3 13 83 48 8b 16 48 89 f1 48 8b 74 24 08 e8 76 70>
Oct 09 21:33:18 : RSP: 0018:ffffc9000f33fbc8 EFLAGS: 00010082
Oct 09 21:33:18 : RAX: 0000000000000000 RBX: 0000000000000009 RCX: 0000000000000003
Oct 09 21:33:18 : RDX: ffff88807cf57f48 RSI: 0000000000000001 RDI: 00000000ffffffff
Oct 09 21:33:18 : RBP: 0000000000000009 R08: 00000000ffefffff R09: ffff88827c7fdfa8
Oct 09 21:33:18 : R10: 0000000000000005 R11: 00000000fff00000 R12: 0000000000000005
Oct 09 21:33:18 : R13: ffff88807fd92bf8 R14: ffffeaffc0a08008 R15: 0000000000000200
Oct 09 21:33:18 : FS:  00007faa98278c40(0000) GS:ffff8880f5561000(0000) knlGS:0000000000000000
Oct 09 21:33:18 : CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 09 21:33:18 : CR2: 000055fed74a8780 CR3: 0000000028254006 CR4: 0000000000372ef0
Oct 09 21:33:18 : Call Trace:
Oct 09 21:33:18 :  <TASK>
Oct 09 21:33:18 :  __free_one_page+0x205/0x770
Oct 09 21:33:18 :  split_large_buddy+0xdb/0x110
Oct 09 21:33:18 :  free_one_page+0x5e/0x160
Oct 09 21:33:18 :  online_pages+0x211/0x360
Oct 09 21:33:18 :  memory_block_online+0x13f/0x220
Oct 09 21:33:18 :  memory_subsys_online+0x30/0x60
Oct 09 21:33:18 :  device_online+0x4a/0x80
Oct 09 21:33:18 :  state_store+0x92/0xa0
Oct 09 21:33:18 :  kernfs_fop_write_iter+0x146/0x200
Oct 09 21:33:18 :  vfs_write+0x216/0x470
Oct 09 21:33:18 :  ksys_write+0x70/0xf0
Oct 09 21:33:18 :  do_syscall_64+0x57/0xd90
Oct 09 21:33:18 :  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 09 21:33:18 : RIP: 0033:0x7faa98520cd4
Oct 09 21:33:18 : Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 15 c4 0d 00 00 74 13 b8 01 00 00 00>
Oct 09 21:33:18 : RSP: 002b:00007ffd9aa61a18 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Oct 09 21:33:18 : RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007faa98520cd4
Oct 09 21:33:18 : RDX: 000000000000000f RSI: 00007faa9865b62b RDI: 0000000000000004
Oct 09 21:33:18 : RBP: 00007faa9865b62b R08: 0000000000000073 R09: 00000000ffffffff
Oct 09 21:33:18 : R10: 0000000000000000 R11: 0000000000000202 R12: 000055ae10660f50
Oct 09 21:33:18 : R13: 000055ae106602a0 R14: 000055ae106602a0 R15: 0000000000000001
Oct 09 21:33:18 :  </TASK>
Oct 09 21:33:18 : ---[ end trace 0000000000000000 ]---

@marc-hb
Copy link
Collaborator Author

marc-hb commented Oct 9, 2025

Just tested and confirmed that cxl_test is incompatible with CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_xxx too.

So if you accidentally try the wrong combination then run_qemu.sh blows up at boot time.

run_qemu.sh a test tool, so it's expected to blow up in various combinations but again: boot time is super inconvenient because it's the most busy and non-interactive time which means non-experts are incapable of finding what caused what, what they did wrong and what they should change.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Oct 22, 2025

because the former try to auto-online its memory using some udev rules. See https://bugzilla.redhat.com/show_bug.cgi?id=1748051 and https://issues.redhat.com/browse/RHEL-92781 for more.

A bit off-topic sorry: this udev hotplug is even more evil than I expected: in my configuration, that rule was somehow enough to corrupt QEMU itself! I was booting with this rule first, deleting it and then restart Linux only (not QEMU) with a reboot command. Not good enough: the cxl test suite would then fail in mysterious ways. Like this for instance:

[  193.326790] cxl_pci 0000:c2:00.0: Range register decodes outside platform defined CXL ranges.
[  193.332012] cxl_mem mem0: endpoint20 failed probe

marc-hb added a commit to marc-hb/run_qemu that referenced this pull request Oct 22, 2025
This file passes the entire ndctl test suite with a couple additional
quirks like `meson setup -Ddocs=disabled` and deleting the evil udev
hotplug rule discussed in
pmem#185 (comment)

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit that referenced this pull request Oct 22, 2025
This file passes the entire ndctl test suite with a couple additional
quirks like `meson setup -Ddocs=disabled` and deleting the evil udev
hotplug rule discussed in
#185 (comment)

Signed-off-by: Marc Herbert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants