2018-11-26 22:53:57 |
dann frazier |
bug |
|
|
added bug |
2018-12-05 11:20:39 |
Alex Bennée |
tags |
|
qemu-img |
|
2019-04-15 22:25:55 |
dann frazier |
qemu: status |
New |
Confirmed |
|
2019-04-15 23:37:35 |
dann frazier |
bug watch added |
|
https://bugzilla.redhat.com/show_bug.cgi?id=1524770 |
|
2019-04-15 23:37:54 |
dann frazier |
bug |
|
|
added subscriber Ike Panhc |
2019-06-05 16:16:14 |
dann frazier |
bug task added |
|
qemu (Ubuntu) |
|
2019-06-05 16:16:24 |
dann frazier |
qemu (Ubuntu): status |
New |
Confirmed |
|
2019-09-05 15:03:50 |
Rafael David Tinoco |
qemu (Ubuntu): status |
Confirmed |
In Progress |
|
2019-09-05 15:03:53 |
Rafael David Tinoco |
qemu (Ubuntu): assignee |
|
Rafael David Tinoco (rafaeldtinoco) |
|
2019-09-05 15:03:56 |
Rafael David Tinoco |
qemu (Ubuntu): importance |
Undecided |
Medium |
|
2019-09-10 01:28:48 |
Rafael David Tinoco |
bug |
|
|
added subscriber Canonical Server Team |
2019-09-10 01:28:55 |
Rafael David Tinoco |
bug |
|
|
added subscriber Christian Ehrhardt |
2019-09-10 18:15:40 |
Rafael David Tinoco |
summary |
qemu-img hangs on high core count ARM system |
qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images |
|
2019-09-10 18:15:54 |
Rafael David Tinoco |
qemu: status |
Confirmed |
In Progress |
|
2019-09-10 18:15:57 |
Rafael David Tinoco |
qemu: assignee |
|
Rafael David Tinoco (rafaeldtinoco) |
|
2019-09-11 11:19:48 |
Rafael David Tinoco |
description |
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
Command:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
[ Original Description ]
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
|
2019-09-11 11:20:10 |
Rafael David Tinoco |
description |
Command:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
[ Original Description ]
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
[ Original Description ]
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
nominated for series |
|
Ubuntu Ff-series |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
bug task added |
|
qemu (Ubuntu Ff-series) |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
nominated for series |
|
Ubuntu Bionic |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
bug task added |
|
qemu (Ubuntu Bionic) |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
nominated for series |
|
Ubuntu Eoan |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
bug task added |
|
qemu (Ubuntu Eoan) |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
nominated for series |
|
Ubuntu Disco |
|
2019-09-11 19:23:21 |
Rafael David Tinoco |
bug task added |
|
qemu (Ubuntu Disco) |
|
2019-10-02 11:02:52 |
Jan Glauber |
attachment added |
|
aio-posix.tar.xz https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5293619/+files/aio-posix.tar.xz |
|
2019-10-03 12:28:50 |
Rafael David Tinoco |
bug watch added |
|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 |
|
2019-10-03 12:29:29 |
Rafael David Tinoco |
qemu (Ubuntu Disco): importance |
Undecided |
Medium |
|
2019-10-03 12:29:31 |
Rafael David Tinoco |
qemu (Ubuntu Bionic): importance |
Undecided |
Medium |
|
2019-10-03 12:29:33 |
Rafael David Tinoco |
qemu (Ubuntu Ff-series): importance |
Undecided |
Medium |
|
2019-10-03 21:35:01 |
dann frazier |
bug task added |
|
kunpeng920 |
|
2019-12-13 14:24:54 |
dann frazier |
kunpeng920: status |
New |
Confirmed |
|
2019-12-13 14:25:03 |
dann frazier |
qemu (Ubuntu Bionic): status |
New |
Confirmed |
|
2019-12-13 14:25:06 |
dann frazier |
qemu (Ubuntu Disco): status |
New |
Confirmed |
|
2019-12-13 14:25:10 |
dann frazier |
qemu (Ubuntu Focal): status |
New |
Confirmed |
|
2019-12-18 14:52:52 |
dann frazier |
attachment added |
|
comment-34-ported-to-upstream.patch https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5313631/+files/comment-34-ported-to-upstream.patch |
|
2019-12-18 16:21:29 |
Ubuntu Foundations Team Bug Bot |
tags |
qemu-img |
patch qemu-img |
|
2019-12-26 02:35:55 |
dlw |
bug |
|
|
added subscriber dlw |
2020-02-13 08:41:19 |
Ike Panhc |
tags |
patch qemu-img |
ikeradar patch qemu-img |
|
2020-02-13 08:42:47 |
Andrew Cloke |
kunpeng920: status |
Confirmed |
Incomplete |
|
2020-02-13 08:42:50 |
Andrew Cloke |
qemu (Ubuntu Bionic): status |
Confirmed |
Incomplete |
|
2020-02-13 08:42:52 |
Andrew Cloke |
qemu (Ubuntu Disco): status |
Confirmed |
Incomplete |
|
2020-02-13 08:42:56 |
Andrew Cloke |
qemu (Ubuntu Eoan): status |
In Progress |
Incomplete |
|
2020-02-13 08:42:59 |
Andrew Cloke |
qemu (Ubuntu Focal): status |
Confirmed |
Incomplete |
|
2020-04-15 02:47:27 |
Rafael David Tinoco |
qemu (Ubuntu Eoan): assignee |
Rafael David Tinoco (rafaeldtinoco) |
|
|
2020-04-15 02:48:17 |
Rafael David Tinoco |
qemu: assignee |
Rafael David Tinoco (rafaeldtinoco) |
|
|
2020-05-05 15:02:29 |
Ike Panhc |
kunpeng920: status |
Incomplete |
Triaged |
|
2020-05-05 15:02:45 |
Ike Panhc |
kunpeng920: assignee |
|
Ike Panhc (ikepanhc) |
|
2020-05-05 15:03:03 |
Ike Panhc |
nominated for series |
|
kunpeng920/upstream-kernel |
|
2020-05-05 15:03:03 |
Ike Panhc |
bug task added |
|
kunpeng920/upstream-kernel |
|
2020-05-05 15:03:03 |
Ike Panhc |
nominated for series |
|
kunpeng920/ubuntu-20.04 |
|
2020-05-05 15:03:03 |
Ike Panhc |
bug task added |
|
kunpeng920/ubuntu-20.04 |
|
2020-05-05 15:03:03 |
Ike Panhc |
nominated for series |
|
kunpeng920/ubuntu-19.10 |
|
2020-05-05 15:03:03 |
Ike Panhc |
bug task added |
|
kunpeng920/ubuntu-19.10 |
|
2020-05-05 15:03:03 |
Ike Panhc |
nominated for series |
|
kunpeng920/ubuntu-18.04 |
|
2020-05-05 15:03:03 |
Ike Panhc |
bug task added |
|
kunpeng920/ubuntu-18.04 |
|
2020-05-05 15:03:03 |
Ike Panhc |
nominated for series |
|
kunpeng920/ubuntu-18.04-hwe |
|
2020-05-05 15:03:03 |
Ike Panhc |
bug task added |
|
kunpeng920/ubuntu-18.04-hwe |
|
2020-05-05 15:03:24 |
Ike Panhc |
kunpeng920/upstream-kernel: status |
New |
Fix Committed |
|
2020-05-06 13:08:14 |
Rafael David Tinoco |
qemu (Ubuntu): assignee |
Rafael David Tinoco (rafaeldtinoco) |
|
|
2020-05-06 13:08:26 |
Rafael David Tinoco |
qemu: status |
In Progress |
Fix Released |
|
2020-05-06 13:08:59 |
Rafael David Tinoco |
qemu (Ubuntu Focal): status |
Incomplete |
In Progress |
|
2020-05-06 13:09:10 |
Rafael David Tinoco |
qemu (Ubuntu Eoan): status |
Incomplete |
In Progress |
|
2020-05-06 13:09:25 |
Rafael David Tinoco |
qemu (Ubuntu Disco): status |
Incomplete |
In Progress |
|
2020-05-06 13:09:38 |
Rafael David Tinoco |
qemu (Ubuntu Bionic): status |
Incomplete |
In Progress |
|
2020-05-06 13:09:56 |
Rafael David Tinoco |
qemu (Ubuntu): status |
Incomplete |
In Progress |
|
2020-05-06 13:23:06 |
Rafael David Tinoco |
description |
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
[ Original Description ]
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
[Impact]
* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a deadlock, causing either QEMU or one of its tools to hang indefinitely.
[Test Case]
* qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely in approximately 30% of the runs on Aarch64.
[Regression Potential]
* This is a change to a core part of QEMU: the AIO scheduling. It works like a "kernel" scheduler: whereas the kernel schedules OS tasks, the QEMU AIO code is responsible for scheduling QEMU coroutines or event listener callbacks.
* There was a long discussion upstream about primitives and Aarch64. After quite some time Paolo released this patch and it solves the issue. Tested platforms were amd64 and aarch64, based on his commit log.
* Christian suggests that this fix stay a little longer in -proposed to make sure it won't cause any regressions.
[Other Info]
* Original Description below:
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
|
2020-05-06 15:45:56 |
Ike Panhc |
kunpeng920/ubuntu-18.04: status |
New |
Triaged |
|
2020-05-06 15:45:56 |
Ike Panhc |
kunpeng920/ubuntu-18.04: assignee |
|
Ike Panhc (ikepanhc) |
|
2020-05-06 15:46:15 |
Ike Panhc |
kunpeng920/ubuntu-18.04-hwe: status |
New |
Triaged |
|
2020-05-06 15:46:15 |
Ike Panhc |
kunpeng920/ubuntu-18.04-hwe: assignee |
|
Ike Panhc (ikepanhc) |
|
2020-05-06 15:46:25 |
Ike Panhc |
kunpeng920/ubuntu-19.10: status |
New |
Triaged |
|
2020-05-06 15:46:25 |
Ike Panhc |
kunpeng920/ubuntu-19.10: assignee |
|
Ike Panhc (ikepanhc) |
|
2020-05-06 15:46:35 |
Ike Panhc |
kunpeng920/ubuntu-20.04: status |
New |
Triaged |
|
2020-05-06 15:46:35 |
Ike Panhc |
kunpeng920/ubuntu-20.04: assignee |
|
Ike Panhc (ikepanhc) |
|
2020-05-06 16:42:49 |
dann frazier |
description |
[Impact]
* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a deadlock, causing either QEMU or one of its tools to hang indefinitely.
[Test Case]
* qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely in approximately 30% of the runs on Aarch64.
[Regression Potential]
* This is a change to a core part of QEMU: the AIO scheduling. It works like a "kernel" scheduler: whereas the kernel schedules OS tasks, the QEMU AIO code is responsible for scheduling QEMU coroutines or event listener callbacks.
* There was a long discussion upstream about primitives and Aarch64. After quite some time Paolo released this patch and it solves the issue. Tested platforms were amd64 and aarch64, based on his commit log.
* Christian suggests that this fix stay a little longer in -proposed to make sure it won't cause any regressions.
[Other Info]
* Original Description below:
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
[Impact]
* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a dead lock making either QEMU or one of its tools to hang indefinitely.
[Test Case]
* qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs in Aarch64.
[Regression Potential]
* This is a change to a core part of QEMU: the AIO scheduling. It works like a "kernel" scheduler: whereas the kernel schedules OS tasks, the QEMU AIO code is responsible for scheduling QEMU coroutines and event listener callbacks.
* There was a long discussion upstream about primitives and Aarch64. After quite some time Paolo released this patch and it solves the issue. Tested platforms were amd64 and aarch64, based on his commit log.
* Christian suggests that this fix stay a little longer in -proposed to make sure it won't cause any regressions.
* dannf suggests we also check for performance regressions; e.g. how long it takes to convert a cloud image on high-core systems.
[Other Info]
* Original Description below:
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task is left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
|
2020-05-06 19:04:22 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/383530 |
|
2020-05-06 19:06:08 |
Philippe Mathieu-Daudé |
bug |
|
|
added subscriber Stefan Hajnoczi |
2020-05-06 21:10:06 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/383545 |
|
2020-05-06 21:44:13 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/383551 |
|
2020-05-07 03:37:34 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/383566 |
|
2020-05-14 08:05:17 |
Andrew Cloke |
kunpeng920: status |
Triaged |
In Progress |
|
2020-05-27 04:55:20 |
Christian Ehrhardt |
bug task deleted |
qemu (Ubuntu Disco) |
|
|
2020-05-29 07:55:40 |
Launchpad Janitor |
qemu (Ubuntu): status |
In Progress |
Fix Released |
|
2020-06-02 22:44:57 |
Brian Murray |
tags |
ikeradar patch qemu-img |
block-proposed-focal ikeradar patch qemu-img |
|
2020-06-02 22:45:15 |
Brian Murray |
qemu (Ubuntu Focal): status |
In Progress |
Fix Committed |
|
2020-06-02 22:45:17 |
Brian Murray |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2020-06-02 22:45:20 |
Brian Murray |
bug |
|
|
added subscriber SRU Verification |
2020-06-02 22:45:30 |
Brian Murray |
tags |
block-proposed-focal ikeradar patch qemu-img |
block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-focal |
|
2020-06-02 22:49:01 |
Brian Murray |
qemu (Ubuntu Eoan): status |
In Progress |
Fix Committed |
|
2020-06-02 22:49:10 |
Brian Murray |
tags |
block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-focal |
block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-eoan verification-needed-focal |
|
2020-06-02 22:49:47 |
Brian Murray |
tags |
block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-eoan verification-needed-focal |
block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-eoan verification-needed-focal |
|
2020-06-02 22:54:23 |
Brian Murray |
qemu (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2020-06-02 22:54:32 |
Brian Murray |
tags |
block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-eoan verification-needed-focal |
block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-bionic verification-needed-eoan verification-needed-focal |
|
2020-06-02 22:55:05 |
Brian Murray |
tags |
block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-bionic verification-needed-eoan verification-needed-focal |
block-proposed-bionic block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-eoan verification-needed-focal |
|
2020-06-03 06:36:24 |
Ike Panhc |
tags |
block-proposed-bionic block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-needed verification-needed-eoan verification-needed-focal |
block-proposed-bionic block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-done-bionic verification-done-eoan verification-done-focal |
|
2020-06-03 06:37:09 |
Ike Panhc |
kunpeng920/ubuntu-18.04: status |
Triaged |
In Progress |
|
2020-06-03 06:37:09 |
Ike Panhc |
kunpeng920/ubuntu-18.04: assignee |
Ike Panhc (ikepanhc) |
|
|
2020-06-03 06:37:22 |
Ike Panhc |
kunpeng920: assignee |
Ike Panhc (ikepanhc) |
|
|
2020-06-03 06:37:33 |
Ike Panhc |
kunpeng920/ubuntu-18.04-hwe: status |
Triaged |
In Progress |
|
2020-06-03 06:37:33 |
Ike Panhc |
kunpeng920/ubuntu-18.04-hwe: assignee |
Ike Panhc (ikepanhc) |
|
|
2020-06-03 06:37:51 |
Ike Panhc |
kunpeng920/ubuntu-19.10: status |
Triaged |
In Progress |
|
2020-06-03 06:37:51 |
Ike Panhc |
kunpeng920/ubuntu-19.10: assignee |
Ike Panhc (ikepanhc) |
|
|
2020-06-03 06:38:02 |
Ike Panhc |
kunpeng920/ubuntu-20.04: status |
Triaged |
In Progress |
|
2020-06-03 06:38:02 |
Ike Panhc |
kunpeng920/ubuntu-20.04: assignee |
Ike Panhc (ikepanhc) |
|
|
2020-06-03 06:38:22 |
Ike Panhc |
kunpeng920/upstream-kernel: status |
Fix Committed |
Invalid |
|
2020-06-11 08:04:02 |
Andrew Cloke |
kunpeng920/ubuntu-18.04: status |
In Progress |
Fix Committed |
|
2020-06-11 08:04:17 |
Andrew Cloke |
kunpeng920/ubuntu-18.04-hwe: status |
In Progress |
Fix Committed |
|
2020-06-11 08:04:27 |
Andrew Cloke |
kunpeng920/ubuntu-19.10: status |
In Progress |
Fix Committed |
|
2020-06-11 08:04:39 |
Andrew Cloke |
kunpeng920/ubuntu-20.04: status |
In Progress |
Fix Committed |
|
2020-06-11 08:04:49 |
Andrew Cloke |
kunpeng920: status |
In Progress |
Fix Committed |
|
2020-06-17 05:17:01 |
Christian Ehrhardt |
tags |
block-proposed-bionic block-proposed-eoan block-proposed-focal ikeradar patch qemu-img verification-done-bionic verification-done-eoan verification-done-focal |
ikeradar patch qemu-img verification-done-bionic verification-done-eoan verification-done-focal |
|
2020-06-18 09:23:27 |
Łukasz Zemczak |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2020-06-18 09:23:26 |
Launchpad Janitor |
qemu (Ubuntu Focal): status |
Fix Committed |
Fix Released |
|
2020-06-18 09:38:29 |
Launchpad Janitor |
qemu (Ubuntu Eoan): status |
Fix Committed |
Fix Released |
|
2020-06-18 09:39:05 |
Launchpad Janitor |
qemu (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2020-06-18 10:27:36 |
Andrew Cloke |
kunpeng920/ubuntu-20.04: status |
Fix Committed |
Fix Released |
|
2020-06-18 10:27:57 |
Andrew Cloke |
kunpeng920/ubuntu-19.10: status |
Fix Committed |
Fix Released |
|
2020-06-18 10:28:23 |
Andrew Cloke |
kunpeng920/ubuntu-18.04-hwe: status |
Fix Committed |
Fix Released |
|
2020-06-18 10:28:33 |
Andrew Cloke |
kunpeng920/ubuntu-18.04: status |
Fix Committed |
Fix Released |
|
2020-06-18 10:28:43 |
Andrew Cloke |
kunpeng920: status |
Fix Committed |
Fix Released |
|
2020-06-30 06:53:57 |
Christian Ehrhardt |
qemu (Ubuntu Bionic): status |
Fix Released |
Triaged |
|
2020-06-30 06:55:07 |
Christian Ehrhardt |
qemu (Ubuntu Bionic): assignee |
|
Rafael David Tinoco (rafaeldtinoco) |
|
2020-07-01 07:00:44 |
Ike Panhc |
kunpeng920/ubuntu-18.04: status |
Fix Released |
Triaged |
|
2020-07-01 07:00:59 |
Ike Panhc |
kunpeng920/ubuntu-18.04-hwe: status |
Fix Released |
Triaged |
|
2020-07-01 07:02:15 |
Ike Panhc |
kunpeng920: status |
Fix Released |
Triaged |
|
2020-07-12 13:16:31 |
Rafael David Tinoco |
qemu (Ubuntu Bionic): status |
Triaged |
In Progress |
|
2020-07-13 03:59:59 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/387269 |
|
2020-07-21 20:02:38 |
Rafael David Tinoco |
bug watch added |
|
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890446 |
|
2020-07-31 18:51:30 |
Rafael David Tinoco |
description |
[Impact]
* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a deadlock, making either QEMU or one of its tools hang indefinitely.
[Test Case]
* qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs in Aarch64.
[Regression Potential]
* This is a change to a core part of QEMU: the AIO scheduling. It works like a "kernel" scheduler: whereas the kernel schedules OS tasks, the QEMU AIO code is responsible for scheduling QEMU coroutines and event listener callbacks.
* There was a long discussion upstream about primitives and Aarch64. After quite some time Paolo released this patch and it solves the issue. Tested platforms were amd64 and aarch64, based on his commit log.
* Christian suggests that this fix stay a little longer in -proposed to make sure it won't cause any regressions.
* dannf suggests we also check for performance regressions; e.g. how long it takes to convert a cloud image on high-core systems.
[Other Info]
* Original Description below:
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task is left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
SRU TEAM REVIEWER: This has already been SRUed for Focal, Eoan and Bionic. Unfortunately the Bionic SRU did not work and we had to revert the change. Since then we have had another update and I am now retrying the SRU.
After discussing this extensively with @paelzer (and with @dannf as a reviewer), Christian and I agreed that we should scope this SRU as Aarch64 only, AND I have been much more conservative about what is changed in the QEMU AIO code.
The new code has been tested against the initial Test Case and against the new one, which regressed on Bionic. More information (about tests and discussion) can be found in the MR at ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-bionic-refix
BIONIC REGRESSION BUG:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419
[Impact]
* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a deadlock, making either QEMU or one of its tools hang indefinitely.
[Test Case]
INITIAL
* qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs in Aarch64.
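Because the hang is intermittent, a single successful run says little. The following is a minimal sketch of how the hang rate could be estimated by repeating the test case under a time limit. It assumes GNU coreutils timeout(1) is available; the helper name count_hangs, the run count and the 300-second limit are hypothetical choices, not part of the original test case.

```shell
# count_hangs RUNS LIMIT_SECONDS CMD [ARGS...]
# Runs CMD the given number of times under timeout(1) and prints how many
# runs had to be killed because they exceeded LIMIT_SECONDS.
count_hangs() {
    ch_runs=$1
    ch_limit=$2
    shift 2
    ch_hangs=0
    ch_i=0
    while [ "$ch_i" -lt "$ch_runs" ]; do
        ch_rc=0
        timeout "$ch_limit" "$@" >/dev/null 2>&1 || ch_rc=$?
        # GNU timeout exits with status 124 when the time limit was reached.
        if [ "$ch_rc" -eq 124 ]; then
            ch_hangs=$((ch_hangs + 1))
        fi
        ch_i=$((ch_i + 1))
    done
    echo "$ch_hangs"
}

# Example invocation (assumed limit: 300 seconds per conversion):
# count_hangs 20 300 qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
```

A result of 0 over many runs on an affected Aarch64 machine would be consistent with the fix working; the limit has to be comfortably longer than a normal conversion of the image in use.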
[Regression Potential]
* This is a change to a core part of QEMU: the AIO scheduling. It works like a "kernel" scheduler: whereas the kernel schedules OS tasks, the QEMU AIO code is responsible for scheduling QEMU coroutines and event listener callbacks.
* There was a long discussion upstream about primitives and Aarch64. After quite some time Paolo released this patch and it solves the issue. Tested platforms were amd64 and aarch64, based on his commit log.
* Christian suggests that this fix stay a little longer in -proposed to make sure it won't cause any regressions.
* dannf suggests we also check for performance regressions; e.g. how long it takes to convert a cloud image on high-core systems.
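The performance check suggested in the last item could be approached by timing the same conversion before and after installing the fixed package. This is only a sketch: the helper name elapsed_seconds is hypothetical, it measures wall-clock time at one-second resolution with date(1), and a real comparison would need several runs on an otherwise idle high-core system.

```shell
# elapsed_seconds CMD [ARGS...]
# Runs CMD and prints the wall-clock time it took, in whole seconds.
# Returns CMD's exit status (and prints nothing) if CMD fails.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" >/dev/null 2>&1 || return $?
    es_end=$(date +%s)
    echo $((es_end - es_start))
}

# Example invocation, to be repeated with the old and the new qemu package:
# elapsed_seconds qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
```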
BIONIC REGRESSED ISSUE
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419
[Other Info]
* Original Description below:
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely approximately 30% of the runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the tasks left are blocked in a system call, so no task is left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).
----
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182) |
|
2020-07-31 21:41:16 |
Rafael David Tinoco |
qemu (Ubuntu Bionic): assignee |
Rafael David Tinoco (rafaeldtinoco) |
|
|
2020-08-07 09:53:40 |
Timo Aaltonen |
qemu (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2020-08-07 09:53:43 |
Timo Aaltonen |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2020-08-07 09:53:50 |
Timo Aaltonen |
tags |
ikeradar patch qemu-img verification-done-bionic verification-done-eoan verification-done-focal |
ikeradar patch qemu-img verification-done-eoan verification-done-focal verification-needed verification-needed-bionic |
|
2020-08-07 20:12:35 |
dann frazier |
tags |
ikeradar patch qemu-img verification-done-eoan verification-done-focal verification-needed verification-needed-bionic |
ikeradar patch qemu-img verification-done verification-done-bionic verification-done-eoan verification-done-focal |
|
2020-08-14 19:49:32 |
dann frazier |
kunpeng920/ubuntu-18.04-hwe: status |
Triaged |
Fix Committed |
|
2020-08-14 19:49:43 |
dann frazier |
kunpeng920/ubuntu-18.04: status |
Triaged |
Fix Committed |
|
2020-08-14 19:49:53 |
dann frazier |
kunpeng920: status |
Triaged |
Fix Committed |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
qemu (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-10756 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-12829 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-13253 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-13361 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-13362 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-13659 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-13754 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-13765 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-15863 |
|
2020-08-19 16:36:32 |
Launchpad Janitor |
cve linked |
|
2020-16092 |
|
2020-08-19 17:16:01 |
Andrew Cloke |
kunpeng920/ubuntu-18.04: status |
Fix Committed |
Fix Released |
|
2020-08-19 17:16:26 |
Andrew Cloke |
bug task deleted |
kunpeng920/ubuntu-18.04-hwe |
|
|
2020-08-19 17:16:39 |
Andrew Cloke |
kunpeng920: status |
Fix Committed |
Fix Released |
|