Comment 0 for bug 1750835

Roman Safonov (rsafonov) wrote :

Environment: MOS 7.0

On a hypervisor (which also serves as a cinder-volume node), the instance becomes unavailable. cinder-volume and nova-compute stop writing logs, and a large number of blkid processes are present on the system. The qemu processes get stuck, with the following messages in kern.log:

<3>Jan 16 01:13:32 node-7 kernel: [7064341.517162] INFO: task qemu-system-x86:46940 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.521209] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.523780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.531828] INFO: task qemu-system-x86:26526 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.539063] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.546241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.558120] INFO: task qemu-system-x86:28329 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.570788] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.576466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.591563] INFO: task qemu-system-x86:52940 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.606967] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.614284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.632489] INFO: task qemu-system-x86:66936 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.651233] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.659943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.679644] INFO: task qemu-system-x86:15165 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.700147] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.710537] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.732005] INFO: task qemu-system-x86:19831 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.754275] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.765828] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.787478] INFO: task qemu-system-x86:26367 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.810232] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.821415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.842957] INFO: task qemu-system-x86:33540 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.864547] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.875733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.897105] INFO: task qemu-system-x86:11121 blocked for more than 120 seconds.
<3>Jan 16 01:13:32 node-7 kernel: [7064341.919139] Tainted: G OX 3.13.0-65-generic #105-Ubuntu
<3>Jan 16 01:13:32 node-7 kernel: [7064341.929885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
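
To see at a glance which tasks the hung-task detector is reporting, the excerpt above can be summarized with a small script. This is only a sketch: the regular expression is written against the message format shown in this report, and the function names (blocked_tasks, summarize) are made up for illustration.

```python
import re
from collections import Counter

# Matches the hung-task line as it appears in the kern.log excerpt above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(lines):
    """Return a (task name, pid) pair for every hung-task report found."""
    found = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            found.append((match.group("comm"), int(match.group("pid"))))
    return found


def summarize(lines):
    """Count the distinct blocked PIDs per task name."""
    distinct = set(blocked_tasks(lines))
    return Counter(comm for comm, _pid in distinct)
```

Passing an open log file to summarize() (for example the kern.log quoted here) gives a per-task count of distinct blocked PIDs, which makes it easier to tell whether only qemu is affected or other tasks are blocked as well.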

Device /dev/sdb is unreadable (dd if=/dev/sdb ... hangs).
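
A hanging dd, the piled-up blkid processes and the blocked qemu tasks would all be consistent with processes sitting in uninterruptible sleep (state D) on the same device. That is an assumption, not something confirmed in this report. A minimal sketch to list such processes by reading /proc/<pid>/stat follows; the helper names are hypothetical.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the record is split on the last ')'.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm
```

Calling list(uninterruptible_processes()) on the affected node should show whether the blkid, dd and qemu processes are in state D at the same time.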

If the cinder-volume service is restarted, the qemu process becomes a zombie.

If the tgt service is restarted, the qemu process gets killed, but the device sdb (mounted via iSCSI) disappears and cannot be remounted via iscsiadm.

lsmod reports that the scsi_tgt module is in use by 1 module, but no using module is listed.
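
The same check can be run across all modules. The sketch below assumes the usual three-column lsmod layout (Module, Size, Used by, where the last column is a use count followed by an optional comma-separated list of modules) and reports every module whose use count is higher than the number of users named. The function names are hypothetical.

```python
import subprocess


def unnamed_holders(lsmod_text):
    """Return {module: number of holders that lsmod does not name}."""
    result = {}
    for line in lsmod_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # not a regular module row
        count = int(fields[2])
        users = fields[3].split(",") if len(fields) > 3 else []
        users = [u for u in users if u]  # drop empty entries
        if count > len(users):
            result[fields[0]] = count - len(users)
    return result


def read_lsmod():
    """Run lsmod and return its output as text."""
    return subprocess.run(
        ["lsmod"], capture_output=True, text=True, check=True
    ).stdout
```

On the affected node, unnamed_holders(read_lsmod()) would be expected to include scsi_tgt with a value of 1, matching the observation above.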

The issue happens randomly, on different environments (all built from the same template).