On a hypervisor that also serves as a cinder-volume node, the instance becomes unavailable. cinder-volume and nova-compute stop writing logs, a large number of blkid processes appear in the system, and the qemu process gets stuck with the following messages in kern.log:
<3>Jan 16 01:13:32 node-7 kernel: [7064341.517162] INFO: task qemu-system-x86:46940 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.521209] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.523780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.531828] INFO: task qemu-system-x86:26526 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.539063] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.546241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.558120] INFO: task qemu-system-x86:28329 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.570788] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.576466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.591563] INFO: task qemu-system-x86:52940 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.606967] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.614284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.632489] INFO: task qemu-system-x86:66936 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.651233] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.659943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.679644] INFO: task qemu-system-x86:15165 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.700147] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.710537] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.732005] INFO: task qemu-system-x86:19831 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.754275] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.765828] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.787478] INFO: task qemu-system-x86:26367 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.810232] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.821415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.842957] INFO: task qemu-system-x86:33540 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.864547] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.875733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.897105] INFO: task qemu-system-x86:11121 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.919139] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.929885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
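To gauge how many tasks the hung-task watchdog has flagged, the log lines above can be tallied per command name. The following is a minimal Python sketch. It assumes the messages keep the exact "INFO: task <comm>:<pid> blocked for more than <N> seconds" wording shown above. The helper names hung_tasks and summarize are hypothetical and are not part of any OpenStack or kernel tool.

```python
import re
from collections import Counter

# Matches the kernel hung-task watchdog line, e.g.
# "INFO: task qemu-system-x86:46940 blocked for more than 120 seconds."
# (assumption: the wording matches the kern.log excerpt in this report)
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines):
    """Return (comm, pid, seconds) tuples for every hung-task report found."""
    found = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            found.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return found


def summarize(lines):
    """Count hung-task reports per command name (e.g. qemu-system-x86)."""
    return Counter(comm for comm, _, _ in hung_tasks(lines))


if __name__ == "__main__":
    # Assumed log location; adjust for the node being inspected.
    with open("/var/log/kern.log", errors="replace") as f:
        print(summarize(f))
```

Lines that do not match, such as the "Tainted:" and "echo 0 > ..." lines, are simply skipped. Only the INFO lines contribute to the count.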
Device /dev/sdb is unreadable (dd if=/dev/sdb ... hangs).
If the cinder-volume service is restarted, the qemu process becomes a zombie.
If the tgt service is restarted, the qemu process is killed, but the device sdb (attached via iSCSI) disappears and cannot be reattached with iscsiadm.
lsmod reports that the scsi_tgt module is used by 1 module, but no module name is listed.
The issue occurs randomly on different environments, all built from the same template.
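Both the stuck qemu processes and the accumulating blkid processes can be checked by listing tasks in uninterruptible sleep (state "D") from /proc. The following is a minimal Python sketch that assumes a standard Linux /proc layout. The helper names parse_stat and uninterruptible_procs are hypothetical. The proc_root parameter exists only so that the scan can be pointed at a test directory.

```python
import os
from collections import Counter


def parse_stat(text):
    """Parse /proc/<pid>/stat content into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first ' (' and the last ') '.
    """
    pid_str, _, rest = text.partition(" (")
    comm, _, after = rest.rpartition(") ")
    state = after.split()[0]
    return int(pid_str), comm, state


def uninterruptible_procs(proc_root="/proc"):
    """Return (pid, comm) for every process whose state is 'D'."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'sys'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is inaccessible) between listdir and open
        if state == "D":
            result.append((pid, comm))
    return result


if __name__ == "__main__":
    # Group D-state processes by name, e.g. to see how many blkid are stuck.
    print(Counter(comm for _, comm in uninterruptible_procs()))
```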
Environment: MOS 7.0