Alright, I'm still investigating this but wanted to share some findings... I haven't got a kernel dump yet after the task is frozen, so I have analyzed only the userland part of it (although at some point I did check whether code was running inside the kernel, using perf cycles:u/cycles:k).

The big picture is this: whenever qemu-img hangs, we have 3 hung tasks, basically with these stacks:

---- THREAD #1

__GI_ppoll (../sysdeps/unix/sysv/linux/ppoll.c:39)
ppoll (/usr/include/aarch64-linux-gnu/bits/poll2.h:77)
qemu_poll_ns (./util/qemu-timer.c:322)
os_host_main_loop_wait (./util/main-loop.c:233)
main_loop_wait (./util/main-loop.c:497)
convert_do_copy (./qemu-img.c:1981)
img_convert (./qemu-img.c:2457)
main (./qemu-img.c:4976)

got stack traces:

./33293/stack
[<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0
[<0>] get_signal+0x5a4/0x730
[<0>] do_notify_resume+0x158/0x358
[<0>] work_pending+0x8/0x10

./33293/stack
[<0>] __switch_to+0xc0/0x218
[<0>] do_sys_poll+0x508/0x5c0
[<0>] __arm64_sys_ppoll+0xc0/0x118
[<0>] el0_svc_common+0xa0/0x168
[<0>] el0_svc_handler+0x38/0x78
[<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33293 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.871 MB perf.data (48730 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ......................
    37.82%  qemu-img  libc-2.29.so        [.] 0x00000000000df710
    21.81%  qemu-img  [unknown]           [k] 0xffff000010099504
    14.23%  qemu-img  [unknown]           [k] 0xffff000010085dc0
     9.13%  qemu-img  [unknown]           [k] 0xffff00001008fff8
     6.47%  qemu-img  libc-2.29.so        [.] 0x00000000000df708
     5.69%  qemu-img  qemu-img            [.] qemu_event_reset
     2.57%  qemu-img  libc-2.29.so        [.] 0x00000000000df678
     0.63%  qemu-img  libc-2.29.so        [.] 0x00000000000df700
     0.49%  qemu-img  libc-2.29.so        [.] __sigtimedwait
     0.42%  qemu-img  libpthread-2.29.so  [.] __libc_sigwait

---- THREAD #3

__GI___sigtimedwait (../sysdeps/unix/sysv/linux/sigtimedwait.c:29)
__sigwait (linux/sigwait.c:28)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33303/stack
[<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0
[<0>] get_signal+0x5a4/0x730
[<0>] do_notify_resume+0x158/0x358
[<0>] work_pending+0x8/0x10

./33303/stack
[<0>] __switch_to+0xc0/0x218
[<0>] do_sigtimedwait.isra.9+0x194/0x288
[<0>] __arm64_sys_rt_sigtimedwait+0xac/0x110
[<0>] el0_svc_common+0xa0/0x168
[<0>] el0_svc_handler+0x38/0x78
[<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33303 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.905 MB perf.data (49647 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ......................
    45.37%  qemu-img  libc-2.29.so        [.] 0x00000000000df710
    23.52%  qemu-img  [unknown]           [k] 0xffff000010099504
     9.08%  qemu-img  [unknown]           [k] 0xffff00001008fff8
     8.89%  qemu-img  [unknown]           [k] 0xffff000010085dc0
     5.56%  qemu-img  libc-2.29.so        [.] 0x00000000000df708
     3.66%  qemu-img  libc-2.29.so        [.] 0x00000000000df678
     1.01%  qemu-img  libc-2.29.so        [.] __sigtimedwait
     0.80%  qemu-img  libc-2.29.so        [.] 0x00000000000df700
     0.64%  qemu-img  qemu-img            [.] qemu_event_reset
     0.55%  qemu-img  libc-2.29.so        [.] 0x00000000000df718
     0.52%  qemu-img  libpthread-2.29.so  [.] __libc_sigwait
---- THREAD #2

syscall (linux/aarch64/syscall.S:38)
qemu_futex_wait (./util/qemu-thread-posix.c:438)
qemu_event_wait (./util/qemu-thread-posix.c:442)
call_rcu_thread (./util/rcu.c:261)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33302/stack
[<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0
[<0>] get_signal+0x5a4/0x730
[<0>] do_notify_resume+0x1c4/0x358
[<0>] work_pending+0x8/0x10

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ......................
    50.30%  qemu-img  libc-2.29.so        [.] 0x00000000000df710
    26.44%  qemu-img  [unknown]           [k] 0xffff000010099504
     5.88%  qemu-img  libc-2.29.so        [.] 0x00000000000df708
     5.26%  qemu-img  [unknown]           [k] 0xffff000010085dc0
     5.25%  qemu-img  [unknown]           [k] 0xffff00001008fff8
     4.25%  qemu-img  libc-2.29.so        [.] 0x00000000000df678
     0.93%  qemu-img  libc-2.29.so        [.] __sigtimedwait
     0.51%  qemu-img  libc-2.29.so        [.] 0x00000000000df700
     0.35%  qemu-img  libpthread-2.29.so  [.] __libc_sigwait

Their stacks show those tasks are pretty much "stuck" in the same userland program logic, and one of them is stuck at the same program counter address. Profiling those tasks does not give much information without more debugging data and fewer optimizations. Also, all the 0x00000000000dfXXX addresses seem broken considering where libc is actually mapped (they would fall mid-heap), since we have:

(gdb) print __libc_sigwait
$25 = {int (const sigset_t *, int *)} 0xffffbf128080 <__GI___sigwait>

----

Anyway, continuing... I investigated the qemu_event_{set,reset,...} logic. On non-Linux OSes it uses pthread primitives, but on Linux it uses a futex()-based implementation, with a struct QemuEvent (rcu_call_ready_event in this case) holding the state value (set, free, busy); see the sketch further below.

I got 2 hung situations:

(gdb) print (struct QemuEvent) *(0xaaaacd35fce8)
$16 = {
  value = 4294967295,
  initialized = true
}

value = 4294967295 -> THIS IS 0xFFFFFFFF, all 32 bits set (casting vs overflow issue?)

AND a situation where value was either 0 or 1 (as expected).
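For reference, this is how I read the Linux (futex-based) QemuEvent machinery that rcu_call_ready_event uses. It is paraphrased from include/qemu/futex.h and util/qemu-thread-posix.c, so take it as a sketch of the mechanism rather than the exact upstream code:

#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch of the Linux QemuEvent bits, paraphrased from the QEMU sources. */

#define qemu_futex(f, op, val, timeout, uaddr2, val3) \
    syscall(__NR_futex, (f), (op), (val), (timeout), (uaddr2), (val3))

#define EV_SET    0   /* event signalled                                     */
#define EV_FREE   1   /* event reset, nobody waiting                         */
#define EV_BUSY  -1   /* event reset, waiters queued; read back through the
                       * unsigned 'value' field this shows up as 4294967295  */

typedef struct QemuEvent {
    unsigned value;           /* EV_SET, EV_FREE or EV_BUSY */
    bool initialized;
} QemuEvent;

/* Waiters sleep in the kernel for as long as *f still holds 'val': */
static inline void qemu_futex_wait(void *f, unsigned val)
{
    while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
        switch (errno) {
        case EWOULDBLOCK:     /* value changed before we could sleep */
            return;
        case EINTR:
            break;            /* interrupted by a signal, retry      */
        default:
            abort();
        }
    }
}

As far as I can see, qemu_event_set() only issues the futex(FUTEX_WAKE) when it observes EV_BUSY, so the waiter and the setter have to agree on these transitions for a wakeup to ever be sent.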
In that last situation (the one where value was 0 or 1) I changed things by hand to make the program continue its execution:

void qemu_event_wait(QemuEvent *ev)
{
    unsigned value;

    assert(ev->initialized);
    value = atomic_read(&ev->value);
    smp_mb_acquire();
    if (value != EV_SET) {
        if (value == EV_FREE) {
            if (atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
                return;
            }
        }
        qemu_futex_wait(ev, EV_BUSY);
    }
}

438     in ./util/qemu-thread-posix.c
   0x0000aaaaaabd4174 <+44>:    mov     w1, #0xffffffff    // #-1
   0x0000aaaaaabd4178 <+48>:    ldaxr   w0, [x19]
   0x0000aaaaaabd417c <+52>:    cmp     w0, #0x1
   0x0000aaaaaabd4180 <+56>:    b.ne    0xaaaaaabd418c    // b.any
=> 0x0000aaaaaabd4184 <+60>:    stlxr   w2, w1, [x19]
   0x0000aaaaaabd4188 <+64>:    cbnz    w2, 0xaaaaaabd4178
   0x0000aaaaaabd418c <+68>:    cbz     w0, 0xaaaaaabd41cc
   0x0000aaaaaabd4190 <+72>:    mov     w6, #0x0    // #0
   0x0000aaaaaabd4194 <+76>:    mov     x5, #0x0    // #0
   0x0000aaaaaabd4198 <+80>:    mov     x4, #0x0    // #0
   0x0000aaaaaabd419c <+84>:    mov     w3, #0xffffffff    // #-1
   0x0000aaaaaabd41a0 <+88>:    mov     w2, #0x0    // #0
   0x0000aaaaaabd41a4 <+92>:    mov     x1, x19
   0x0000aaaaaabd41a8 <+96>:    mov     x0, #0x62    // #98
   0x0000aaaaaabd41ac <+100>:   bl      0xaaaaaaaff380

I unblocked it by hand, setting the program counter register to outside that logic:

(gdb) print qemu_event_wait+132
$15 = (void (*)(QemuEvent *)) 0xaaaaaabd41cc

(gdb) print rcu_call_ready_event
$16 = {value = 1, initialized = true}

(gdb) set rcu_call_ready_event->value=0
(gdb) set $pc=0xaaaaaabd41cc

And it got stuck again, with the program counter at another STLXR instruction:

(gdb) thread 2
[Switching to thread 2 (Thread 0xffffbec61d90 (LWP 33302))]
#0  0x0000aaaaaabd4110 in qemu_event_reset (ev=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414
414     ./util/qemu-thread-posix.c: No such file or directory.

(gdb) bt
#0  0x0000aaaaaabd4110 in qemu_event_reset (ev=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414
#1  0x0000aaaaaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255
#2  0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#3  0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#4  0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

(gdb) print rcu_call_ready_event
$20 = {value = 1, initialized = true}

(gdb) disassemble qemu_event_reset
Dump of assembler code for function qemu_event_reset:
   0x0000aaaaaabd40f0 <+0>:     ldrb    w1, [x0, #4]
   0x0000aaaaaabd40f4 <+4>:     cbz     w1, 0xaaaaaabd411c
   0x0000aaaaaabd40f8 <+8>:     ldr     w1, [x0]
   0x0000aaaaaabd40fc <+12>:    dmb     ishld
   0x0000aaaaaabd4100 <+16>:    cbz     w1, 0xaaaaaabd4108
   0x0000aaaaaabd4104 <+20>:    ret
   0x0000aaaaaabd4108 <+24>:    ldaxr   w1, [x0]
   0x0000aaaaaabd410c <+28>:    orr     w1, w1, #0x1
=> 0x0000aaaaaabd4110 <+32>:    stlxr   w2, w1, [x0]
   0x0000aaaaaabd4114 <+36>:    cbnz    w2, 0xaaaaaabd4108
   0x0000aaaaaabd4118 <+40>:    ret

And it does not matter how many times I continue: the CPU stays stuck at that same program counter (again at a STLXR instruction).

----

So, initially I was afraid that missing barriers (or not strong enough ones) could have caused a race condition making one thread depend on the other thread's logic. Unfortunately it now looks like the STLXR instruction might not be behaving appropriately on this CPU/architecture, since the program counter seems stuck at the same instruction (which is super weird: it doesn't even raise an exception for, say, some microcode issue). But this was just an initial overview; I still have to revisit this to interpret the results better (and recompile qemu with debugging data, possibly with another GCC version).
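For what it's worth, both places where the thread parks are the LDAXR/STLXR retry loops that GCC emits here for QEMU's atomics: atomic_cmpxchg() in qemu_event_wait and atomic_or() in qemu_event_reset. The sketch below (my own names, plain GCC __atomic builtins rather than QEMU's wrappers) shows the kind of construct that gets lowered to the ldaxr/stlxr/cbnz sequences above:

#include <stdint.h>

/* Roughly what qemu_event_reset()'s atomic_or(&ev->value, EV_FREE) boils
 * down to.  Without LSE atomics, GCC lowers this to:
 *
 *   retry: ldaxr  w1, [x0]        ; load-exclusive (acquire)
 *          orr    w1, w1, #0x1
 *          stlxr  w2, w1, [x0]    ; store-exclusive (release), w2 != 0 on failure
 *          cbnz   w2, retry       ; exclusive monitor lost -> retry
 */
static inline void atomic_or_sketch(uint32_t *p, uint32_t mask)
{
    __atomic_fetch_or(p, mask, __ATOMIC_SEQ_CST);
}

/* Same idea for the cmpxchg in qemu_event_wait() (the ldaxr / cmp / b.ne /
 * stlxr / cbnz sequence above); returns the value that was observed in *p. */
static inline uint32_t atomic_cmpxchg_sketch(uint32_t *p, uint32_t old, uint32_t new)
{
    __atomic_compare_exchange_n(p, &old, new, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return old;
}

Forward progress of these loops depends on the store-exclusive eventually succeeding, so a CPU-level problem with the exclusive monitor would look exactly like what I'm seeing: no fault, just a program counter that never leaves the loop.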
Any comments are appreciated.