If a Kernel Panic Occurs Due to a BMC Restart Analysis

Yes, your conclusion is very plausible — the kernel crash you observed appears to be directly linked to the BMC (or BMI, likely a typo for BMC) restarting, especially given the presence of USB-related stalls and a subsequent RCU stall. Here’s a breakdown of what’s happening:

🧠 Understanding the Crash

🔹 Key Indicators in Your Logs

RCU Preempt Stall
- The kernel’s RCU (Read-Copy-Update) subsystem timed out waiting for grace periods to complete. This happens when some CPUs/tasks are unable to make forward progress.
- In your case, the rcu_preempt kthread was starved for ~10 minutes (59518 jiffies ~ 9.9 minutes assuming HZ=1000), which is extreme.
CPU 1 Idling
- An NMI (non-maskable interrupt) was sent to CPU 1, which was found in intel_idle_irq, meaning it was deep in a C-state and unresponsive to softirqs.
Kworker Blocked on USB
- The kernel thread kworker/1:1 was blocked in usb_start_wait_urb() → usb_control_msg() → usbnet_cdc_update_filter().
- This strongly suggests a USB management/network interface (CDC-ETHER class) stopped responding.
Kernel Panic
- Triggered by RCU Stall, which is often a symptom of a broader system stall, not the root cause.

🔌 How the BMC Reboot Can Trigger This

Modern servers often expose USB devices to the host via the BMC, for purposes like:

Virtual KVM
Virtual media (USB mass storage over IPMI)
Management NICs exposed as USB CDC Ethernet devices (like usbnet/cdc_ether)

When the BMC resets, these virtual USB devices disappear temporarily, possibly without proper detachment, which leads to:

URBs (USB Request Blocks) being sent to a non-responsive USB controller
usbnet and kworker threads waiting indefinitely for completions that will never arrive
RCU threads not progressing due to blocked workers or softirq starvation
CPU stuck in idle C-state, not handling softirqs → leads to RCU panic

This is consistent with your backtrace, where:

kworker/1:1 is stuck in usbnet_cdc_update_filter(), waiting on usb_control_msg()
USB timeouts do not always cleanly abort, especially if underlying hardware interface is suddenly gone (e.g. BMC USB detach)

✅ Conclusion

Yes, a BMC reboot in your system configuration can very realistically:

Remove/disable a virtual USB device
Cause kernel threads using that USB (e.g. usbnet) to block indefinitely
Trigger an RCU stall due to downstream starvation and softirq failures
Lead to a kernel panic

🛠️ Recommendations

1. Mitigate USB Dependency on BMC

Avoid using BMC-exposed virtual USB devices in critical host kernel operations (e.g. networking).
If possible, use physical interfaces instead of USB CDC-based virtual ones for system management.

2. Harden USB Stack

Set USB URB timeouts explicitly and conservatively (usb_control_msg() has a timeout argument).
Apply kernel patches related to usbnet and cdc_ether robustness if available (especially for EL8 kernels).

3. Update Kernel & BMC Firmware

This is a known issue on older kernels (you are on 4.18.0-372.32.1.rt7.189.el8_6), where USB URB handling is brittle.
BMC firmware upgrades can improve hotplug detection and stability.

4. Monitor with NMI Watchdog

Enable NMI watchdog (nmi_watchdog=1) if not already active to capture CPU lockups earlier.