Kernel panic - not syncing: hung_task: blocked tasks
Kernel panic - not syncing: hung_task: blocked tasks
https://access.redhat.com/solutions/146183
Solution Verified - Updated August 6 2024 at 7:50 AM - English
Environment
Red Hat Enterprise Linux 5
Kernel-2.6.18-308.8.2.el5PAE
Issue
System crashed with following Call Traces.
Raw
INFO: task kjournald:437 blocked for more than 120 seconds.
“echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
kjournald D 000009B5 2992 437 19 462 416 (L-TLB)
cf7fded4 00000046 a53cb5f8 000009b5 c04e19df 00000046 cf699b60 0000000a
f7fbbaa0 a53cc8ef 000009b5 000012f7 00000000 f7fbbbac cf392600 f4894040
0146c25c cf392fa0 cf49fe2c cf49fe3c 0146c25c c042d81f cf49fe1c ffffffff
Call Trace:
[
Collect vmcore at the time of crash.
Vmcore Analysis
Raw
KERNEL: /cores/20120619070936/work/vmlinux
DUMPFILE: /cores/20120619070936/work/vmcore-654238
CPUS: 4
DATE: Mon Jun 18 08:47:37 2012
UPTIME: 03:02:01
LOAD AVERAGE: 64.76, 25.04, 9.55
TASKS: 409
NODENAME: server1
RELEASE: 2.6.18-308.8.2.el5PAE
VERSION: #1 SMP Tue May 29 12:17:25 EDT 2012
MACHINE: i686 (3200 Mhz)
MEMORY: 12.4 GB
PANIC: “Kernel panic - not syncing: hung_task: blocked tasks” <===
PID: 197
COMMAND: “khungtaskd”
TASK: f7f71550 [THREAD_INFO: f7f6f000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
Raw
crash > log
…
INFO: task kjournald:437 blocked for more than 120 seconds.
“echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
kjournald D 000009B5 2992 437 19 462 416 (L-TLB)
cf7fded4 00000046 a53cb5f8 000009b5 c04e19df 00000046 cf699b60 0000000a
f7fbbaa0 a53cc8ef 000009b5 000012f7 00000000 f7fbbbac cf392600 f4894040
0146c25c cf392fa0 cf49fe2c cf49fe3c 0146c25c c042d81f cf49fe1c ffffffff
Call Trace:
[
crash> bt 437 PID: 437 TASK: f7fbbaa0 CPU: 0 COMMAND: “kjournald” #0 [cf7fde60] schedule at c0622e8a #1 [cf7fded8] io_schedule at c062352b #2 [cf7fdee4] sync_buffer at c0479526 #3 [cf7fdee8] __wait_on_bit at c0623704 #4 [cf7fdf00] out_of_line_wait_on_bit at c0623789 #5 [cf7fdf38] __wait_on_buffer at c04794a3 #6 [cf7fdf44] journal_commit_transaction at f88534b5 [jbd] #7 [cf7fdfa8] kjournald at f8856d27 [jbd] #8 [cf7fdfcc] kthread at c04372e0 #9 [cf7fdfe4] kernel_thread_helper at c0405c85 Following are the tasks which are waiting for a long time. Raw [ 0 00:02:54.358] UN+ PID: 27575 TASK: ddac8000 CPU: 3 COMMAND: “mysqld” [ 0 00:02:56.949] UN+ PID: 5664 TASK: f531caa0 CPU: 3 COMMAND: “mysqld” [ 0 00:03:05.371] UN+ PID: 6755 TASK: f7f61aa0 CPU: 3 COMMAND: “mysqld” [ 0 00:03:10.666] UN+ PID: 28175 TASK: dc354550 CPU: 2 COMMAND: “mysqld” [ 0 00:03:25.446] UN+ PID: 5843 TASK: f4ff8aa0 CPU: 1 COMMAND: “mysqld” [ 0 00:03:27.752] UN+ PID: 4186 TASK: f5ce9550 CPU: 0 COMMAND: “cmaidad” [ 0 00:03:30.949] UN+ PID: 4080 TASK: f5ceb000 CPU: 1 COMMAND: “cmaeventd” [ 0 00:03:33.514] UN+ PID: 437 TASK: f7fbbaa0 CPU: 0 COMMAND: “kjournald” [ 0 00:03:34.206] UN+ PID: 5639 TASK: f6ef7550 CPU: 1 COMMAND: “mysqld” [ 0 00:03:35.007] UN+ PID: 7930 TASK: d7b74000 CPU: 1 COMMAND: “sendmail” [ 0 00:03:35.149] UN+ PID: 2026 TASK: f745c550 CPU: 2 COMMAND: “syslogd” [ 0 00:03:35.154] UN+ PID: 2004 TASK: f7a8d000 CPU: 0 COMMAND: “auditd” Raw crash> bt 2004 PID: 2004 TASK: f7a8d000 CPU: 0 COMMAND: “auditd” #0 [f7b3fe5c] schedule at c0622e8a #1 [f7b3fed4] log_wait_commit at f88562e0 [jbd] #2 [f7b3fefc] journal_stop at f8851674 [jbd] #3 [f7b3ff18] __writeback_single_inode at c0495f83 #4 [f7b3ff54] sync_inode at c049661f #5 [f7b3ff60] ext3_sync_file at f8889014 [ext3] #6 [f7b3ff94] do_fsync at c0479062 #7 [f7b3ffac] __do_fsync at c04790bf #8 [f7b3ffb8] system_call at c0404f44 There are 77 processes waiting on I/O. Raw crash> foreach bt | grep io_schedule | wc -l 77 Raw crash> ctlr_info_t 0xcf4f0000 struct ctlr_info_t { ctlr = 0, devname = “cciss0\000”, product_name = 0xf88ce60c “Smart Array 6i”, firm_ver = “2.32”, pdev = 0xf7f46000, board_id = 1083248145, vaddr = 0xf8806000, paddr = 4261347328, nr_cmds = 12, cfgtable = 0xf881a000, interrupts_enabled = 1, major = 104, max_commands = 1024, commands_outstanding = 14, <— max_outstanding = 14, num_luns = 2, highest_lun = 1, usage_count = 7, scatter_list = 0xcf529000, maxsgentries = 543, chainsize = 513, max_cmd_sgentries = 32, cmd_sg_list = 0xf7fad9c0, intr = {201, 0, 0, 0}, msix_vector = 0, msi_vector = 0, intr_mode = 0, cciss_read = 40 ‘(‘, cciss_write = 42 ‘*’, cciss_max_sectors = 8192,
reqQ = { next = 0xcf4f0898, prev = 0xcf4f0898 }, cmpQ = { next = 0xcf6da248, prev = 0xd7bce248 }, Load 32 bit ext-cciss module in vmcore and verify. Raw crash> extend ext-cciss-56-i386.so ./ext-cciss-56-i386.so: shared object loaded
crash> cciss -v V0.01 cmd starting… switch = ‘v’ DBG: jiffies = 0000000000A211F2 DBG: hba[00] 0x00000000CF4F0000 DBG: cciss0.ctlr = ‘0’ DBG: cciss0.devname = ‘cciss0’ DBG: cciss0.firm_ver = ‘2.32’ DBG: cciss0.product_name = ‘00000000F88CE60C’ (Smart Array 6i) DBG: cciss0.board_id = ‘40910E11’ DBG: cciss0.pci_dev = ‘00000000F7F46000’ DBG: cciss0.major = ‘104’ DBG: cciss0.max_commands = ‘1024’ DBG: cciss0.nr_cmds = ‘12’ DBG: cciss0.commands_outstanding = ‘14’ DBG: cciss0.max_outstanding = ‘14’ DBG: cciss0.num_luns = ‘2’ DBG: cciss0.highest_lun = ‘1’ DBG: cciss0.cciss_read = ‘0’ DBG: cciss0.cciss_write = ‘0’ DBG: cciss0.drv{list} = ‘00000000BFD97270’ DBG: cciss0.reqQ = ‘00000000F88C9316’ DBG: cciss0.cmpQ = ‘00000000CF4F0898’ [00000000CF4F0650] DBG: cciss0.maxQsinceinit = ‘00000000CF6DA248’ DBG: cciss0.maxSG = ‘-675487160’ DBG: cciss0.cmd_pool = ‘000000000000000C’ DBG: cciss0.errinfo_pool = ‘00000000CF6DA000’ DBG: cciss0.cmd_pool_bits = ‘00000000CF681000’ DBG: cciss0.nr_allocs = ‘258478080’ DBG: cciss0.nr_frees = ‘0’ (258478080) ccissQ request CDB started —————- —————- ————————————- ———— [ 0] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 <=== Waiting for almost 3 hours [ 1] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 2] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 3] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 4] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 5] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 6] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 7] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 8] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 9] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 10] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 11] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 12] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424 [ 13] 00000000CF4F0650 00000000CF6DA248 00h ??? 02:57:01.424
UPTIME: 03:02:01 The above report indicates that the requests are being waiting for some time. Raw crash> ctlr_info_t 0xcf4f0000|grep -A 2 cmpQ cmpQ = {
next = 0xcf6da248, <===
prev = 0xd7bce248 Raw crash> request 0xcf6da248|grep start start_time = 2, Raw crash> pd jiffies0xcf6da248 jiffies = $14 = 10621426 crash> pd 10621426-2 $15 = 10621424 Product(s) Red Hat Enterprise LinuxComponent kernelCategory TroubleshootTags panic rhel_5 This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.