libvirt 6.8 / qemu 5.1 - --p2p --tunnelled is hanging

Bug #1904584 reported by Christian Ehrhardt 
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
libvirt (Ubuntu)
Fix Released
Undecided
Unassigned
qemu (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

With the most recent versions of livbirt 6.8 and qemu 5.1 migrations with --p2p --tunnelled block.

Example:
$ virsh migrate --unsafe --live --p2p --tunnelled kvmguest-hirsute-normal qemu+ssh://10.222.1.159/system

--p2p works, but --p2p --tunneled hangs, eventually ending in "terminated" or "internal error: client socket is closed"

INFO level debug logs

Source:
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fb19c002060 msg={"execute":"client_migrate_info","arguments":{"protocol":"spice","hostname":"127.0.0.1","port":5901,"tls-port":-1},"id":"libvirt-13"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fb19c002060 buf={"execute":"client_migrate_info","arguments":{"protocol":"spice","hostname":"127.0.0.1","port":5901,"tls-port":-1},"id":"libvirt-13"} len=135 ret=135 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fb19c002060 reply={"return": {}, "id": "libvirt-13"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fb19c002060 msg={"execute":"query-migrate-parameters","id":"libvirt-14"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fb19c002060 buf={"execute":"query-migrate-parameters","id":"libvirt-14"} len=58 ret=58 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fb19c002060 reply={"return": {"cpu-throttle-tailslow": false, "xbzrle-cache-size": 67108864, "cpu-throttle-initial": 20, "announce-max": 550, "decompress-threads": 2, "compress-threads": 8, "compress-level": 1, "multifd-channels": 2, "multifd-zstd-level": 1, "announce-initial": 50, "block-incremental": false, "compress-wait-thread": true, "downtime-limit": 300, "tls-authz": "", "multifd-compression": "none", "announce-rounds": 5, "announce-step": 100, "tls-creds": "", "multifd-zlib-level": 1, "max-cpu-throttle": 99, "max-postcopy-bandwidth": 0, "tls-hostname": "", "throttle-trigger-threshold": 50, "max-bandwidth": 33554432, "x-checkpoint-delay": 20000, "cpu-throttle-increment": 10}, "id": "libvirt-14"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fb19c002060 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":true},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false}]},"id":"libvirt-15"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fb19c002060 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":true},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false}]},"id":"libvirt-15"} len=444 ret=444 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fb19c002060 reply={"return": {}, "id": "libvirt-15"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fb19c002060 msg={"execute":"migrate-set-parameters","arguments":{"tls-creds":"","tls-hostname":"","max-bandwidth":9223372036853727232},"id":"libvirt-16"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fb19c002060 buf={"execute":"migrate-set-parameters","arguments":{"tls-creds":"","tls-hostname":"","max-bandwidth":9223372036853727232},"id":"libvirt-16"} len=139 ret=139 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fb19c002060 reply={"return": {}, "id": "libvirt-16"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fb19c002060 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-17"} fd=35
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fb19c002060 buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-17"} len=72 ret=72 errno=0
qemuMonitorIOWrite:436 : QEMU_MONITOR_IO_SEND_FD: mon=0x7fb19c002060 fd=35 ret=72 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fb19c002060 reply={"return": {}, "id": "libvirt-17"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fb19c002060 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-18"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fb19c002060 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-18"} len=112 ret=112 errno=0
qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7fb19c002060 event={"timestamp": {"seconds": 1605537716, "microseconds": 753931}, "event": "MIGRATION", "data": {"status": "setup"}}
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fb19c002060 reply={"return": {}, "id": "libvirt-18"}
qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7fb19c002060 event={"timestamp": {"seconds": 1605537716, "microseconds": 760587}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7fb19c002060 event={"timestamp": {"seconds": 1605537716, "microseconds": 760691}, "event": "MIGRATION", "data": {"status": "active"}}
<stuck here>

Target:
systemd-machined[203]: New machine qemu-1-testguest3.
systemd[1]: Started Virtual Machine qemu-1-testguest3. fd/26, skipping
qemuMonitorOpenInternal:717 : QEMU_MONITOR_NEW: mon=0x7fca3c2a3010 fd=36
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"qmp_capabilities","id":"libvirt-1"} len=49 ret=49 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": {}, "id": "libvirt-1"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"query-migrate-capabilities","id":"libvirt-2"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"query-migrate-capabilities","id":"libvirt-2"} len=59 ret=59 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": [{"state": false, "capability": "xbzrle"}, {"state": false, "capability": "rdma-pin-all"}, {"state": false, "capability": "auto-converge"}, {"state": false, "capability": "zero-blocks"}, {"state": false, "capability": "compress"}, {"state": false, "capability": "events"}, {"state": false, "capability": "postcopy-ram"}, {"state": false, "capability": "x-colo"}, {"state": false, "capability": "release-ram"}, {"state": false, "capability": "block"}, {"state": false, "capability": "return-path"}, {"state": false, "capability": "pause-before-switchover"}, {"state": false, "capability": "multifd"}, {"state": false, "capability": "dirty-bitmaps"}, {"state": false, "capability": "postcopy-blocktime"}, {"state": false, "capability": "late-block-activate"}, {"state": false, "capability": "x-ignore-shared"}, {"state": false, "capability": "validate-uuid"}], "id": "libvirt-2"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"} len=125 ret=125 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": {}, "id": "libvirt-3"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"query-chardev","id":"libvirt-4"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"query-chardev","id":"libvirt-4"} len=46 ret=46 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": [{"frontend-open": false, "filename": "disconnected:unix:/var/lib/libvirt/qemu/channel/target/domain-1-testguest3/org.qemu.guest_agent.0,server", "label": "charchannel0"}, {"frontend-open": true, "filename": "pty:/dev/pts/2", "label": "charserial0"}, {"frontend-open": true, "filename": "unix:/var/lib/libvirt/qemu/domain-1-testguest3/monitor.sock,server", "label": "charmonitor"}], "id": "libvirt-4"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"qom-list","arguments":{"path":"/machine/unattached/device[0]"},"id":"libvirt-5"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"qom-list","arguments":{"path":"/machine/unattached/device[0]"},"id":"libvirt-5"} len=94 ret=94 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": [{"name": "type", "type": "string"}, {"name": "parent_bus", "type": "link<bus>"}, {"name": "realized", "type": "bool"}, {"name": "hotplugged", "type": "bool"}, {"name": "hotpluggable", "type": "bool"}, {"name": "memory", "type": "link<qemu:memory-region>"}, {"name": "legacy-memory", "type": "str"}, {"name": "fill-mtrr-mask", "type": "bool"}, {"name": "host-phys-bits-limit", "type": "uint8"}, {"name": "x-force-features", "type": "bool"}, {"name": "hv-vpindex", "type": "bool"}, {"name": "kvm", "type": "bool"}, {"name": "apic-id", "type": "uint32"}, {"name": "min-xlevel", "type": "uint32"}, {"name": "enforce", "type": "bool"}, {"name": "hv-runtime", "type": "bool"}, {"name": "level", "type": "uint32"}, {"name": "hv-tlbflush", "type": "bool"}, {"name": "core-id", "type": "int32"}, {"name": "hv-crash", "type": "bool"}, {"name": "hv-reset", "type": "bool"}, {"name": "hv-no-nonarch-coresharing", "type": "OnOffAuto"}, {"name": "hv-spinlocks", "type": "uint32"}, {"name": "pmu", "type": "bool"}, {"name": "socket-id", "type": "int32"}, {"name": "hv-relaxed", "type": "bool"}, {"name": "thread-id", "type": "int32"}, {"name": "tcg-cpuid", "type": "bool"}, {"name": "die-id", "type": "int32"}, {"name": "node-id", "type": "int32"}, {"name": "vmware-cpuid-freq", "type": "bool"}, {"name": "min-xlevel2", "type": "uint32"}, {"name": "hv-stimer", "type": "bool"}, {"name": "x-hv-max-vps", "type": "int32"}, {"name": "legacy-cache", "type": "bool"}, {"name": "x-migrate-smi-count", "type": "bool"}, {"name": "hv-reenlightenment", "type": "bool"}, {"name": "hv-vendor-id", "type": "str"}, {"name": "x-hv-synic-kvm-only", "type": "bool"}, {"name": "min-level", "type": "uint32"}, {"name": "phys-bits", "type": "uint32"}, {"name": "l3-cache", "type": "bool"}, {"name": "level-func7", "type": "uint32"}, {"name": "lmce", "type": "bool"}, {"name": "hv-passthrough", "type": "bool"}, {"name": "kvm-no-smi-migration", "type": "bool"}, {"name": "xlevel2", "type": "uint32"}, {"name": "hv-stimer-direct", "type": "bool"}, {"name": "ucode-rev", "type": "uint64"}, {"name": "xlevel", "type": "uint32"}, {"name": "host-phys-bits", "type": "bool"}, {"name": "check", "type": "bool"}, {"name": "hv-time", "type": "bool"}, {"name": "full-cpuid-auto-level", "type": "bool"}, {"name": "hv-synic", "type": "bool"}, {"name": "hv-evmcs", "type": "bool"}, {"name": "hv-ipi", "type": "bool"}, {"name": "hv-vapic", "type": "bool"}, {"name": "cpuid-0xb", "type": "bool"}, {"name": "hv-frequencies", "type": "bool"}, {"name": "x-intel-pt-auto-level", "type": "bool"}, {"name": "host-cache-info", "type": "bool"}, {"name": "migratable", "type": "bool"}, {"name": "vmx-invept-single-context", "type": "bool"}, {"name": "core-capability", "type": "bool"}, {"name": "popcnt", "type": "bool"}, {"name": "kvm_mmu", "type": "bool"}, {"name": "dtes64", "type": "bool"}, {"name": "sse4_1", "type": "bool"}, {"name": "vmx-invvpid-single-addr", "type": "bool"}, {"name": "xstore", "type": "bool"}, {"name": "stepping", "type": "int"}, {"name": "vmx-vmfunc", "type": "bool"}, {"name": "pse36", "type": "bool"}, {"name": "avx512-4vnniw", "type": "bool"}, {"name": "fma4", "type": "bool"}, {"name": "avx512-vp2intersect", "type": "bool"}, {"name": "avx2", "type": "bool"}, {"name": "kvm-poll-control", "type": "bool"}, {"name": "kvm_nopiodelay", "type": "bool"}, {"name": "amd-stibp", "type": "bool"}, {"name": "family", "type": "int"}, {"name": "pclmuldq", "type": "bool"}, {"name": "vmx-store-lma", "type": "bool"}, {"name": "erms", "type": "bool"}, {"name": "vaes", "type": "bool"}, {"name": "vmx-activity-wait-sipi", "type": "bool"}, {"name": "vmx-flexpriority", "type": "bool"}, {"name": "rdrand", "type": "bool"}, {"name": "vmx-ept-advanced-exitinfo", "type": "bool"}, {"name": "vmx-pause-exit", "type": "bool"}, {"name": "vmx-shadow-vmcs", "type": "bool"}, {"name": "model", "type": "int"}, {"name": "avx512-vpopcntdq", "type": "bool"}, {"name": "tbm", "type": "bool"}, {"name": "xcrypt", "type": "bool"}, {"name": "lm", "type": "bool"}, {"name": "kvm-pv-eoi", "type": "bool"}, {"name": "vmx-monitor-exit", "type": "bool"}, {"name": "vmx-eptad", "type": "bool"}, {"name": "pae", "type": "bool"}, {"name": "ssse3", "type": "bool"}, {"name": "vmx-unrestricted-guest", "type": "bool"}, {"name": "phe", "type": "bool"}, {"name": "movdiri", "type": "bool"}, {"name": "taa-no", "type": "bool"}, {"name": "full-width-write", "type": "bool"}, {"name": "perfctr_nb", "type": "bool"}, {"name": "arat", "type": "bool"}, {"name": "avx512vbmi2", "type": "bool"}, {"name": "perfctr_core", "type": "bool"}, {"name": "kvm_asyncpf", "type": "bool"}, {"name": "x2apic", "type": "bool"}, {"name": "npt", "type": "bool"}, {"name": "avx512ifma", "type": "bool"}, {"name": "kvm_poll_control", "type": "bool"}, {"name": "model-id", "type": "string"}, {"name": "vmx-vnmi-pending", "type": "bool"}, {"name": "vmx-invpcid-exit", "type": "bool"}, {"name": "vmx-io-exit", "type": "bool"}, {"name": "pmm-en", "type": "bool"}, {"name": "tsc", "type": "bool"}, {"name": "vmx-ins-outs", "type": "bool"}, {"name": "dca", "type": "bool"}, {"name": "vmx-apicv-x2apic", "type": "bool"}, {"name": "ia64", "type": "bool"}, {"name": "perfctr-core", "type": "bool"}, {"name": "vmx-cr3-load-noexit", "type": "bool"}, {"name": "kvmclock", "type": "bool"}, {"name": "vmx-entry-noload-debugctl", "type": "bool"}, {"name": "invtsc", "type": "bool"}, {"name": "pn", "type": "bool"}, {"name": "avx512cd", "type": "bool"}, {"name": "md-clear", "type": "bool"}, {"name": "cmp-legacy", "type": "bool"}, {"name": "cx16", "type": "bool"}, {"name": "avx512dq", "type": "bool"}, {"name": "abm", "type": "bool"}, {"name": "vmx-activity-shutdown", "type": "bool"}, {"name": "kvm-pv-ipi", "type": "bool"}, {"name": "vmx-apicv-register", "type": "bool"}, {"name": "fxsr-opt", "type": "bool"}, {"name": "skip-l1dfl-vmentry", "type": "bool"}, {"name": "pcid", "type": "bool"}, {"name": "rdpid", "type": "bool"}, {"name": "wbnoinvd", "type": "bool"}, {"name": "filtered-features", "type": "X86CPUFeatureWordInfo"}, {"name": "syscall", "type": "bool"}, {"name": "pse", "type": "bool"}, {"name": "tsc_scale", "type": "bool"}, {"name": "mce", "type": "bool"}, {"name": "xsaves", "type": "bool"}, {"name": "vmx-entry-load-pat", "type": "bool"}, {"name": "ibpb", "type": "bool"}, {"name": "vmx-apicv-xapic", "type": "bool"}, {"name": "cldemote", "type": "bool"}, {"name": "rtm", "type": "bool"}, {"name": "lwp", "type": "bool"}, {"name": "kvm-steal-time", "type": "bool"}, {"name": "vmx-hlt-exit", "type": "bool"}, {"name": "vmx-entry-load-efer", "type": "bool"}, {"name": "nrip_save", "type": "bool"}, {"name": "vmx-invlpg-exit", "type": "bool"}, {"name": "vmx-exit-save-efer", "type": "bool"}, {"name": "vmx-eptp-switching", "type": "bool"}, {"name": "vme", "type": "bool"}, {"name": "kvm-pv-unhalt", "type": "bool"}, {"name": "vmx-desc-exit", "type": "bool"}, {"name": "kvm_pv_unhalt", "type": "bool"}, {"name": "svm", "type": "bool"}, {"name": "lahf-lm", "type": "bool"}, {"name": "vmx-invvpid", "type": "bool"}, {"name": "kvm-hint-dedicated", "type": "bool"}, {"name": "mca", "type": "bool"}, {"name": "mtrr", "type": "bool"}, {"name": "cid", "type": "bool"}, {"name": "vmcb_clean", "type": "bool"}, {"name": "pfthreshold", "type": "bool"}, {"name": "pmm", "type": "bool"}, {"name": "tm", "type": "bool"}, {"name": "pbe", "type": "bool"}, {"name": "vmx-entry-ia32e-mode", "type": "bool"}, {"name": "vmx-rdpmc-exit", "type": "bool"}, {"name": "split-lock-detect", "type": "bool"}, {"name": "fpu", "type": "bool"}, {"name": "skinit", "type": "bool"}, {"name": "sep", "type": "bool"}, {"name": "avx512-bf16", "type": "bool"}, {"name": "vmx-rdtscp-exit", "type": "bool"}, {"name": "nx", "type": "bool"}, {"name": "vmx-invvpid-all-context", "type": "bool"}, {"name": "ds-cpl", "type": "bool"}, {"name": "pause-filter", "type": "bool"}, {"name": "fsrm", "type": "bool"}, {"name": "pause_filter", "type": "bool"}, {"name": "vmx-vmwrite-vmexit-fields", "type": "bool"}, {"name": "vmx-ept", "type": "bool"}, {"name": "nodeid-msr", "type": "bool"}, {"name": "smap", "type": "bool"}, {"name": "vmx-io-bitmap", "type": "bool"}, {"name": "vmx-intr-exit", "type": "bool"}, {"name": "fxsr_opt", "type": "bool"}, {"name": "cr8legacy", "type": "bool"}, {"name": "cmp_legacy", "type": "bool"}, {"name": "vmx-zero-len-inject", "type": "bool"}, {"name": "virt-ssbd", "type": "bool"}, {"name": "pschange-mc-no", "type": "bool"}, {"name": "umip", "type": "bool"}, {"name": "avx512er", "type": "bool"}, {"name": "vmx-vpid", "type": "bool"}, {"name": "xstore-en", "type": "bool"}, {"name": "avx512vl", "type": "bool"}, {"name": "vmx-invept-single-context-noglobals", "type": "bool"}, {"name": "cmov", "type": "bool"}, {"name": "rsba", "type": "bool"}, {"name": "vmx-posted-intr", "type": "bool"}, {"name": "xcrypt-en", "type": "bool"}, {"name": "tm2", "type": "bool"}, {"name": "vmx-exit-clear-bndcfgs", "type": "bool"}, {"name": "vmx-ept-execonly", "type": "bool"}, {"name": "vmx-exit-load-efer", "type": "bool"}, {"name": "xsaveerptr", "type": "bool"}, {"name": "fsgsbase", "type": "bool"}, {"name": "avx512bw", "type": "bool"}, {"name": "vmx-vintr-pending", "type": "bool"}, {"name": "smx", "type": "bool"}, {"name": "vmx-secondary-ctls", "type": "bool"}, {"name": "lbrv", "type": "bool"}, {"name": "tsc-frequency", "type": "int"}, {"name": "hle", "type": "bool"}, {"name": "monitor", "type": "bool"}, {"name": "tce", "type": "bool"}, {"name": "sse4a", "type": "bool"}, {"name": "vpclmulqdq", "type": "bool"}, {"name": "tsc-scale", "type": "bool"}, {"name": "kvm-pv-tlb-flush", "type": "bool"}, {"name": "mds-no", "type": "bool"}, {"name": "vmx-cr8-load-exit", "type": "bool"}, {"name": "vmx-activity-hlt", "type": "bool"}, {"name": "i64", "type": "bool"}, {"name": "adx", "type": "bool"}, {"name": "vmx-invept-all-context", "type": "bool"}, {"name": "ffxsr", "type": "bool"}, {"name": "amd-no-ssb", "type": "bool"}, {"name": "crash-information", "type": "GuestPanicInformation"}, {"name": "vmx-entry-load-bndcfgs", "type": "bool"}, {"name": "svm_lock", "type": "bool"}, {"name": "msr", "type": "bool"}, {"name": "pclmulqdq", "type": "bool"}, {"name": "vmx-preemption-timer", "type": "bool"}, {"name": "clflush", "type": "bool"}, {"name": "ssb-no", "type": "bool"}, {"name": "mpx", "type": "bool"}, {"name": "extapic", "type": "bool"}, {"name": "vmx-vnmi", "type": "bool"}, {"name": "xop", "type": "bool"}, {"name": "vmx-rdseed-exit", "type": "bool"}, {"name": "smep", "type": "bool"}, {"name": "vmx-mwait-exit", "type": "bool"}, {"name": "vmx-rdtsc-exit", "type": "bool"}, {"name": "cx8", "type": "bool"}, {"name": "sse4-2", "type": "bool"}, {"name": "xsavec", "type": "bool"}, {"name": "pku", "type": "bool"}, {"name": "tsx-ldtrk", "type": "bool"}, {"name": "vmx-mtf", "type": "bool"}, {"name": "vmcb-clean", "type": "bool"}, {"name": "vmx-exit-load-pat", "type": "bool"}, {"name": "stibp", "type": "bool"}, {"name": "vmx-cr3-store-noexit", "type": "bool"}, {"name": "tsx-ctrl", "type": "bool"}, {"name": "svm-lock", "type": "bool"}, {"name": "sse4-1", "type": "bool"}, {"name": "lahf_lm", "type": "bool"}, {"name": "kvm_pv_eoi", "type": "bool"}, {"name": "sse", "type": "bool"}, {"name": "clzero", "type": "bool"}, {"name": "ds", "type": "bool"}, {"name": "osvw", "type": "bool"}, {"name": "kvm_steal_time", "type": "bool"}, {"name": "vmx-true-ctls", "type": "bool"}, {"name": "movdir64b", "type": "bool"}, {"name": "vmx-invept", "type": "bool"}, {"name": "acpi", "type": "bool"}, {"name": "xd", "type": "bool"}, {"name": "nodeid_msr", "type": "bool"}, {"name": "ds_cpl", "type": "bool"}, {"name": "vmx-ept-1gb", "type": "bool"}, {"name": "nrip-save", "type": "bool"}, {"name": "gfni", "type": "bool"}, {"name": "kvm-mmu", "type": "bool"}, {"name": "sse3", "type": "bool"}, {"name": "serialize", "type": "bool"}, {"name": "avx512bitalg", "type": "bool"}, {"name": "rdseed", "type": "bool"}, {"name": "sha-ni", "type": "bool"}, {"name": "ace2", "type": "bool"}, {"name": "waitpkg", "type": "bool"}, {"name": "vmx-rdrand-exit", "type": "bool"}, {"name": "f16c", "type": "bool"}, {"name": "3dnowprefetch", "type": "bool"}, {"name": "vmx-encls-exit", "type": "bool"}, {"name": "avx", "type": "bool"}, {"name": "topoext", "type": "bool"}, {"name": "sse2", "type": "bool"}, {"name": "rdctl-no", "type": "bool"}, {"name": "mmx", "type": "bool"}, {"name": "avx512-4fmaps", "type": "bool"}, {"name": "vmx-cr8-store-exit", "type": "bool"}, {"name": "avx512vnni", "type": "bool"}, {"name": "pni", "type": "bool"}, {"name": "movbe", "type": "bool"}, {"name": "mmxext", "type": "bool"}, {"name": "vmx-ple", "type": "bool"}, {"name": "vmx-apicv-vid", "type": "bool"}, {"name": "flushbyasid", "type": "bool"}, {"name": "rdtscp", "type": "bool"}, {"name": "clwb", "type": "bool"}, {"name": "kvm-asyncpf", "type": "bool"}, {"name": "kvm-pv-sched-yield", "type": "bool"}, {"name": "decodeassists", "type": "bool"}, {"name": "pat", "type": "bool"}, {"name": "invpcid", "type": "bool"}, {"name": "pdpe1gb", "type": "bool"}, {"name": "vmx-exit-save-pat", "type": "bool"}, {"name": "lapic", "type": "child<kvm-apic>"}, {"name": "tsc-adjust", "type": "bool"}, {"name": "vmx-entry-load-rtit-ctl", "type": "bool"}, {"name": "ht", "type": "bool"}, {"name": "xtpr", "type": "bool"}, {"name": "tsc_adjust", "type": "bool"}, {"name": "vmx-tsc-offset", "type": "bool"}, {"name": "ssbd", "type": "bool"}, {"name": "vmx-exit-clear-rtit-ctl", "type": "bool"}, {"name": "3dnowext", "type": "bool"}, {"name": "clflushopt", "type": "bool"}, {"name": "pdcm", "type": "bool"}, {"name": "xsave", "type": "bool"}, {"name": "unavailable-features", "type": "strList"}, {"name": "est", "type": "bool"}, {"name": "pge", "type": "bool"}, {"name": "vendor", "type": "string"}, {"name": "pcommit", "type": "bool"}, {"name": "ibs", "type": "bool"}, {"name": "tsc-deadline", "type": "bool"}, {"name": "kvmclock-stable-bit", "type": "bool"}, {"name": "misalignsse", "type": "bool"}, {"name": "vmx-msr-bitmap", "type": "bool"}, {"name": "perfctr-nb", "type": "bool"}, {"name": "feature-words", "type": "X86CPUFeatureWordInfo"}, {"name": "apic", "type": "bool"}, {"name": "avx512vbmi", "type": "bool"}, {"name": "vmx-exit-nosave-debugctl", "type": "bool"}, {"name": "avx512f", "type": "bool"}, {"name": "vmx-exit-save-preemption-timer", "type": "bool"}, {"name": "arch-capabilities", "type": "bool"}, {"name": "bmi2", "type": "bool"}, {"name": "vmx-pml", "type": "bool"}, {"name": "vmx-nmi-exit", "type": "bool"}, {"name": "intel-pt", "type": "bool"}, {"name": "wdt", "type": "bool"}, {"name": "ace2-en", "type": "bool"}, {"name": "avx512pf", "type": "bool"}, {"name": "bmi1", "type": "bool"}, {"name": "vmx-page-walk-5", "type": "bool"}, {"name": "kvm-nopiodelay", "type": "bool"}, {"name": "sse4.2", "type": "bool"}, {"name": "vmx-ept-2mb", "type": "bool"}, {"name": "vmx-exit-load-perf-global-ctrl", "type": "bool"}, {"name": "3dnow", "type": "bool"}, {"name": "vmx-wbinvd-exit", "type": "bool"}, {"name": "vmx", "type": "bool"}, {"name": "la57", "type": "bool"}, {"name": "vmx-page-walk-4", "type": "bool"}, {"name": "sse4.1", "type": "bool"}, {"name": "ibrs-all", "type": "bool"}, {"name": "xgetbv1", "type": "bool"}, {"name": "xsaveopt", "type": "bool"}, {"name": "spec-ctrl", "type": "bool"}, {"name": "vmx-entry-load-perf-global-ctrl", "type": "bool"}, {"name": "vmx-exit-ack-intr", "type": "bool"}, {"name": "vmx-xsaves", "type": "bool"}, {"name": "phe-en", "type": "bool"}, {"name": "fxsr", "type": "bool"}, {"name": "hypervisor", "type": "bool"}, {"name": "aes", "type": "bool"}, {"name": "amd-ssbd", "type": "bool"}, {"name": "sse4_2", "type": "bool"}, {"name": "de", "type": "bool"}, {"name": "ss", "type": "bool"}, {"name": "vmx-movdr-exit", "type": "bool"}, {"name": "fma", "type": "bool"}], "id": "libvirt-5"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-6"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"realized"},"id":"libvirt-6"} len=115 ret=115 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": true, "id": "libvirt-6"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotplugged"},"id":"libvirt-7"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hotplugged"},"id":"libvirt-7"} len=117 ret=117 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": false, "id": "libvirt-7"}
... more attributes
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-363"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"unavailable-features"},"id":"libvirt-363"} len=129 ret=129 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": [], "id": "libvirt-363"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"query-hotpluggable-cpus","id":"libvirt-364"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"query-hotpluggable-cpus","id":"libvirt-364"} len=58 ret=58 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": [{"props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "vcpus-count": 1, "qom-path": "/machine/unattached/device[0]", "type": "host-x86_64-cpu"}], "id": "libvirt-364"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"query-cpus-fast","id":"libvirt-365"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"query-cpus-fast","id":"libvirt-365"} len=50 ret=50 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": [{"arch": "x86", "thread-id": 3405, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "qom-path": "/machine/unattached/device[0]", "cpu-index": 0, "target": "x86_64"}], "id": "libvirt-365"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"query-iothreads","id":"libvirt-366"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"query-iothreads","id":"libvirt-366"} len=50 ret=50 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": [], "id": "libvirt-366"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"balloon","arguments":{"value":536870912},"id":"libvirt-367"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"balloon","arguments":{"value":536870912},"id":"libvirt-367"} len=74 ret=74 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": {}, "id": "libvirt-367"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"query-migrate-parameters","id":"libvirt-368"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"query-migrate-parameters","id":"libvirt-368"} len=59 ret=59 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": {"cpu-throttle-tailslow": false, "xbzrle-cache-size": 67108864, "cpu-throttle-initial": 20, "announce-max": 550, "decompress-threads": 2, "compress-threads": 8, "compress-level": 1, "multifd-channels": 2, "multifd-zstd-level": 1, "announce-initial": 50, "block-incremental": false, "compress-wait-thread": true, "downtime-limit": 300, "tls-authz": "", "multifd-compression": "none", "announce-rounds": 5, "announce-step": 100, "tls-creds": "", "multifd-zlib-level": 1, "max-cpu-throttle": 99, "max-postcopy-bandwidth": 0, "tls-hostname": "", "throttle-trigger-threshold": 50, "max-bandwidth": 33554432, "x-checkpoint-delay": 20000, "cpu-throttle-increment": 10}, "id": "libvirt-368"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":false},{"capability":"late-block-activate","state":true},{"capability":"multifd","state":false}]},"id":"libvirt-369"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":false},{"capability":"late-block-activate","state":true},{"capability":"multifd","state":false}]},"id":"libvirt-369"} len=445 ret=445 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": {}, "id": "libvirt-369"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"migrate-set-parameters","arguments":{"tls-creds":"","tls-hostname":""},"id":"libvirt-370"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"migrate-set-parameters","arguments":{"tls-creds":"","tls-hostname":""},"id":"libvirt-370"} len=104 ret=104 errno=0
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": {}, "id": "libvirt-370"}
qemuMonitorSend:944 : QEMU_MONITOR_SEND_MSG: mon=0x7fca3c2a3010 msg={"execute":"migrate-incoming","arguments":{"uri":"fd:26"},"id":"libvirt-371"} fd=-1
qemuMonitorIOWrite:431 : QEMU_MONITOR_IO_WRITE: mon=0x7fca3c2a3010 buf={"execute":"migrate-incoming","arguments":{"uri":"fd:26"},"id":"libvirt-371"} len=79 ret=79 errno=0
qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7fca3c2a3010 event={"timestamp": {"seconds": 1605537716, "microseconds": 739747}, "event": "MIGRATION", "data": {"status": "setup"}}
qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fca3c2a3010 reply={"return": {}, "id": "libvirt-371"}
qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7fca3c2a3010 event={"timestamp": {"seconds": 1605537716, "microseconds": 762755}, "event": "MIGRATION", "data": {"status": "active"}}
<stuck here>

Related branches

tags: added: libvirt-21.04
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Retest:
- Q5.0 / L6.6 -> Q5.0 / L6.6 - working
- Q5.1 / L6.9 -> Q5.1 / L6.9 - failing
- Q5.1 / L6.9 -> Q5.0 / L6.6 - working
- Q5.0 / L6.6 -> Q5.1 / L6.9 - working

So if either end is not on the most recent versions things still work.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Extending the above list changing Qemu and Libvirt individually to better identify what we need to look at:

- Q5.1 / L6.9 -> Q5.1 / L6.6 - working
- Q5.1 / L6.6 -> Q5.1 / L6.9 - working
- Q5.1 / L6.6 -> Q5.1 / L6.6 - working
- Q5.1 / L6.9 -> Q5.1 / L6.9 - (still) failing
- Q5.1 / L6.9 -> Q5.0 / L6.9 - failing
- Q5.0 / L6.9 -> Q5.1 / L6.9 - failing
- Q5.0 / L6.9 -> Q5.0 / L6.9 - failing

Note I: I kept the libvirtd.conf, apparmor rules and all that on the level of libvirt 6.9 all the time. Mostly just the binaries change.

Note II: when changing qemu on a system I ensured that the guest was destroyed&started to run with the new version.

Overall it depends on just libvirt, but not qemu.
And it is enough to not have the new version on either end of the migration to avoid the issue.

By the above list the "minimal" change we can look at to make a good/bad case comparison will be:
 Q5.1 / L6.9 -> Q5.1 / L6.6 (good) vs Q5.1 / L6.9 -> Q5.1 / L6.9 (bad)
I'll get debug logs of both peers in those two cases to compare.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Bad case:
1. trigger migration
  <hangs>
2. sleep 2m
  <to see what repeats / polls>
3. fetch the logs
^^ this shall ensure we don't have the error path when I abort the migration in the log

# reset the logfile via (so both have a log from the start of the service into a migration)
$ systemctl stop libvirtd
$ rm /var/log/libvirtd.log
$ systemctl start libvirtd

Good case
1. trigger migration
2. migration completed
3. fetch the logs

All logs are then stripped of the leading time and PID index for better comparison.
:%s/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]+0000: [0-9]*: //gc

Then everything of the daemon init before the RPC call that led to "virDomainMigratePerform3Params" is removed (right now not interested in the startup, but it is in the attached files). In the same way everything before "remoteDispatchDomainMigratePrepareTunnel3Params" is cut out on the target.

Finally for further improved diffing (we can always check details in the raw logs) line numbers, threads and adresses are removed.
:%s/\v^(debug|info) : ([a-zA-Z0-9]*):[0-9]* : /\2 /gc
:%s/Thread [0-9]* /Thread <removed>/gc
:%s/=0x[a-f0-9]*/0xAddrRemoved/gc
:%s/:0x[a-f0-9]*/:0xAddrRemoved/gc
:%s/fd:[0-9]*/fd:FDRemoved/gc
:%s/fd=[0-9]*/fd=FDRemoved/gc

That is then compared.
Target:
- The incoming "destination_xml" on virDomainMigratePrepareTunnel3Params is the same
- the qemu spawn command on the target is the same
- migration communication with qemu initially goes the same query-migrate-capabilities -> migrate-set-capabilities ... (overall ~2k lines of log mostly qom-* commands)
- both successfully reach "qemuProcessHandleMigrationStatus Migration of domain ... h-migr-test changed state to active"
- then a long sequence of virStream activity starts with slightly alternating order but otherwise no diff
- the bad case stays in there, the good case has more of them and finally completes into "virStreamFinish -> virDomainMigrateFinish3Params".

Seems it would hang on communication, but why/where ?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I pinged upstream about the case at
https://www.redhat.com/archives/libvir-list/2020-November/msg01399.html

I'll try how easy (or not) it would be to build 6.7 and 6.8 with otherwise mostly the same.
That could help to spot where the change happened.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Prep Bisecting - info in case anyone needs to repro or take over

# Build
$ rm -rf build install
$ mkdir build install
$ V=$(git describe)
$ cd build
$ LC_ALL=C.UTF-8 meson .. --wrap-mode=nodownload --buildtype=plain --prefix=/usr --sysconfdir=/etc --localstatedir=/var --libdir=lib/x86_64-linux-gnu --libexecdir=/usr/lib/libvirt -Drunstatedir=/run -Dpackager="Debug" -Dpackager_version="bisect6.8-6.9" -Dapparmor=enabled -Dsecdriver_apparmor=enabled -Dapparmor_profiles=true -Ddriver_qemu=enabled -Dqemu_user=libvirt-qemu -Dqemu_group=kvm -Dqemu_moddir=/usr/lib/x86_64-linux-gnu/qemu -Ddocs=disabled -Dtests=disabled -Daudit=disabled -Ddriver_libxl=disabled -Ddriver_lxc=disabled -Ddriver_openvz=disabled -Ddriver_vbox=disabled -Dfirewalld=disabled -Dlibssh2=disabled -Dnetcf=disabled -Dnumactl=disabled -Dnumad=disabled -Dpolkit=disabled -Dsanlock=disabled -Dstorage_disk=disabled -Dstorage_gluster=disabled -Dstorage_iscsi=disabled -Dstorage_lvm=disabled -Dstorage_rbd=disabled -Dstorage_sheepdog=disabled -Dstorage_zfs=disabled -Ddriver_esx=disabled -Dwireshark_dissector=disabled -Ddtrace=disabled -Dglusterfs=disabled -Dfuse=disabled
$ meson compile
$ DESTDIR=../install meson install
$ cd ../install
$ find usr/sbin usr/bin usr/lib/x86_64-linux-gnu usr/lib/libvirt | tar czf ~/git-libvirt-${V}.tgz -T -

# example Transport
$ f=git-libvirt-v6.9.0.tgz; lxc file pull h-build-libvirt-bisect/$root/f .; lxc file push $f testkvm-hirsute-from/root/; lxc file push $f testkvm-hirsute-to/root/;

# example "Install"
$ tar xzf ~/git-libvirt-v6.9.0.tgz --directory=/
$ systemctl restart libvirtd
$ systemctl status libvirtd | grep version
Nov 25 10:41:45 testkvm-hirsute-to libvirtd[32067]: 2020-11-25 10:41:45.913+0000: 32067: info : libvirt version: 6.9.0, package: bisect6.8-6.9 (Debug)

Note: For better integration this uses an installed libvirt package and services/config of it as it is in 6.9-1ubuntu1 in Hirsute. We just exchange the binaries/libs, ugly but working.

With the above I have 6.8 from git working and 6.9 from git failing - just as I have with the Ubuntu package builds.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

bad case: running into a hang - takes at least 2 minutes
good case: migrates without error in <20s; I'm migrating forward and back between two systems to be sure (in case anything is flaky) and ready (start the next test with the same conditions).

git bisect start
git bisect good v6.8.0
git bisect bad v6.9.0
c1cfbaab25 - good
4ced77a309 - good
862f7e5c73 - good
b866adf8d9 - good
ab6439b960 - good
b87cfc957f - bad
ae23a87d85 - bad
7d959c302d - bad
ea7af657f1 - good

Identified:

commit 7d959c302d10e97390b171334b885887de427a32
Author: Andrea Bolognani <email address hidden>
Date: Tue Oct 27 00:15:33 2020 +0100

    rpc: Fix virt-ssh-helper detection

    When trying to figure out whether virt-ssh-helper is available
    on the remote host, we mistakenly look for the helper by the
    name it had while the feature was being worked on instead of
    the one that was ultimately picked, and thus end up using the
    netcat fallback every single time.

    Fixes: f8ec7c842df9e40c6607eae9b0223766cb226336
    Signed-off-by: Andrea Bolognani <email address hidden>
    Reviewed-by: Neal Gompa <email address hidden>
    Reviewed-by: Daniel P. Berrangé <email address hidden>

 src/rpc/virnetclient.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Ok that makes a bit of sense, first we had in 6.8
  f8ec7c84 rpc: use new virt-ssh-helper binary for remote tunnelling
That makes it related to tunneling which is our broken use-case.

The identified commit "7d959c30 rpc: Fix virt-ssh-helper detection" then might finally enable the new helper and that is the broken one?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With that knowledge I also was able to confirm that it really is the native mode

$ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test qemu+ssh://testkvm-hirsute-to/system?proxy=netcat
<works>
$ virsh migrate --unsafe --live --p2p --tunnelled h-migr-test qemu+ssh://testkvm-hirsute-to/system?proxy=native<hangs>

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

ps output of the hanging receiving virt-ssh-helper

Source:
4     0   41305       1  20   0 1627796 23360 poll_s Ssl ?          0:05 /usr/sbin/libvirtd
0     0   41523   41305  20   0   9272  4984 poll_s S    ?          0:02  \_ ssh -T -e none -- testkvm-hirsute-to sh -c 'virt-ssh-helper 'qemu:///system''

Target
4     0     213       1  20   0  13276  4132 poll_s Ss   ?          0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 250-500 startups
4     0   35148     213  20   0  19048 11320 poll_s Ss   ?          0:02  \_ sshd: root@notty
4     0   35206   35148  20   0   2584   544 do_wai Ss   ?          0:00      \_ sh -c virt-ssh-helper qemu:///system
0     0   35207   35206  20   0  81348 26684 -      R    ?          0:34          \_ virt-ssh-helper qemu:///system

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In comparison NC mode:

Source:
4 0 41305 1 20 0 1627796 23456 poll_s Ssl ? 0:05 /usr/sbin/libvirtd
0 0 41545 41305 20 0 9064 4440 poll_s S ? 0:00 \_ ssh -T -e none -- testkvm-hirsute-to sh -c 'if 'nc' -q 2>&1 | grep "requires an argument" >/dev/null 2>&1; then AR

Target:
4 0 213 1 20 0 13276 4132 poll_s Ss ? 0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 250-500 startups
4 0 35259 213 20 0 17848 10120 - Rs ? 0:00 \_ sshd: root@notty
4 0 35316 35259 20 0 2584 588 do_wai Ss ? 0:00 \_ sh -c if nc -q 2>&1 | grep "requires an argument" >/dev/null 2>&1; then ARG=-q0;else ARG=;fi;nc $ARG -U /var/r
0 0 35319 35316 20 0 3088 748 - S ? 0:00 \_ nc -q0 -U /var/run/libvirt/libvirt-sock

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I got suspicious that the target has virt-ssh-helper always in run state.
With something that transports network data I'd expect more sleepy/poll states sometimes (as the NC case shows).

If I only start it manually it is in
0 0 35497 35490 20 0 69888 9924 poll_s S+ pts/0 0:00 \_ virt-ssh-helper qemu:///system

So does it really never return and busy loops on something - or is that a red herring?

Per strace it loops on this:

     0.000071 read(4, "\4\0\0\0\0\0\0\0", 16) = 8 <0.000011>
     0.000238 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000012>
     0.000040 read(0, "ba9821e493159a306fce6e8ff4adc1bd"..., 1024) = 1024 <0.000015>
     0.000043 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000010>
     0.000034 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000010>
     0.000035 write(3, "\nSHA512: 1d4f9ad2f209cca2d1d4cfd"..., 2048) = 2048 <0.000021>
     0.000057 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000012>
     0.000467 poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 3, 0) = 2 ([{fd=0, revents=POLLIN}, {fd=4, revents=POLLIN}]) <0.000016>
     0.000069 read(4, "\4\0\0\0\0\0\0\0", 16) = 8 <0.000011>
     0.000292 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000013>
     0.000049 read(0, "a26f12d123bdfc74ccc3\nHomepage: h"..., 1024) = 1024 <0.000012>
     0.000046 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000014>
     0.000051 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000010>
     0.000038 write(4, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000014>
     0.000518 poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN|POLLOUT}, {fd=4, events=POLLIN}], 3, 0) = 3 ([{fd=0, revents=POLLIN}, {fd=3, revents=POLLOUT}, {fd=4, revents=POLLIN}]) <0.000016>

Initially that had some readable content like:
     0.000048 read(0, " <pae/>\n </features>\n <cpu "..., 1024) = 1024 <0.000016>
But later on it does seem to have binary data like:
     0.000061 read(0, "i@\0\330\6\247\1\2|2\0\1\0\0\0\252\231\21\0\0\0\0\0\253\231\21\0\324\33\25\0\370"..., 1024) = 1024 <0.000015>

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (8.9 KiB)

Backtraces also look like iterating events.
virEventRunDefaultImpl is an event loop and an infinite loop by design. I'll report my new info upstream - maybe the Authors have a good hint now that we know that much.

A few Backtraces:

Program received signal SIGINT, Interrupt.
g_source_ref (source=0x561b88a36a10) at ../../../glib/gmain.c:2182
2182 in ../../../glib/gmain.c
#0 g_source_ref (source=0x561b88a36a10) at ../../../glib/gmain.c:2182
#1 0x00007feae7437568 in g_source_iter_next (iter=iter@entry=0x7ffee3d55ba0, source=source@entry=0x7ffee3d55b98) at ../../../glib/gmain.c:1060
#2 0x00007feae7438b91 in g_main_context_prepare (context=context@entry=0x561b875ed1b0, priority=priority@entry=0x7ffee3d55c20) at ../../../glib/gmain.c:3619
#3 0x00007feae743968b in g_main_context_iterate (context=context@entry=0x561b875ed1b0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4099
#4 0x00007feae7439893 in g_main_context_iteration (context=0x561b875ed1b0, context@entry=0x0, may_block=may_block@entry=1) at ../../../glib/gmain.c:4184
#5 0x00007feae75e9694 in virEventGLibRunOnce () at ../../src/util/vireventglib.c:533
#6 0x00007feae7654449 in virEventRunDefaultImpl () at ../../src/util/virevent.c:344
#7 0x0000561b8597cdfa in virRemoteSSHHelperRun (sock=<optimized out>) at ../../src/remote/remote_ssh_helper.c:316
#8 main (argc=<optimized out>, argv=<optimized out>) at ../../src/remote/remote_ssh_helper.c:418
^C
Program received signal SIGINT, Interrupt.
g_source_iter_next (iter=iter@entry=0x7ffee3d55ba0, source=source@entry=0x7ffee3d55b98) at ../../../glib/gmain.c:1035
1035 in ../../../glib/gmain.c
#0 g_source_iter_next (iter=iter@entry=0x7ffee3d55ba0, source=source@entry=0x7ffee3d55b98) at ../../../glib/gmain.c:1035
#1 0x00007feae743908b in g_main_context_check (context=context@entry=0x561b875ed1b0, max_priority=0, fds=fds@entry=0x561b875ed2d0, n_fds=-472556648, n_fds@entry=3) at ../../../glib/gmain.c:3919
#2 0x00007feae7439705 in g_main_context_iterate (context=context@entry=0x561b875ed1b0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4116
#3 0x00007feae7439893 in g_main_context_iteration (context=0x561b875ed1b0, context@entry=0x0, may_block=may_block@entry=1) at ../../../glib/gmain.c:4184
#4 0x00007feae75e9694 in virEventGLibRunOnce () at ../../src/util/vireventglib.c:533
#5 0x00007feae7654449 in virEventRunDefaultImpl () at ../../src/util/virevent.c:344
#6 0x0000561b8597cdfa in virRemoteSSHHelperRun (sock=<optimized out>) at ../../src/remote/remote_ssh_helper.c:316
#7 main (argc=<optimized out>, argv=<optimized out>) at ../../src/remote/remote_ssh_helper.c:418
^C
Program received signal SIGINT, Interrupt.
g_source_ref (source=0x561b87de7ac0) at ../../../glib/gmain.c:2182
2182 in ../../../glib/gmain.c
#0 g_source_ref (source=0x561b87de7ac0) at ../../../glib/gmain.c:2182
#1 0x00007feae7437568 in g_source_iter_next (iter=iter@entry=0x7ffee3d55ba0, source=source@entry=0x7ffee3d55b98) at ../../../glib/gmain.c:1060
#2 0x00007feae743908b in g_main_context_check (context=context@entry=0x561b875ed1b0, max_priority=0, fds=fds@entry=0x561b875ed2d0,...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - discussion with upstream ongoing, see the link above

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The upstream discussion helped to eliminate migration as well as networking from the equation.
It seems the new virt-ssh-helper mode is just rather slow and thereby stalling/hanging.

Here are logs of:
virsh -c qemu+ssh://127.0.0.1/system?proxy=netcat vol-download --pool uvtool h-migr-test.qcow testfile
=> ~150-220MB/s

vs

virsh -c qemu+ssh://127.0.0.1/system?proxy=native vol-download --pool uvtool h-migr-test.qcow testfile
=> 5 MB/s degrading to ~200 KB/s and less

Attaching logs Taken with these configs:
log_filters="1:qemu 1:libvirt 3:object 3:json 3:event 1:util"
log_outputs="1:file:/var/log/libvirtd.log"

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

https://www.redhat.com/archives/libvir-list/2020-November/msg01472.html has fixes.
I've done some extensive testing and the work fine for me.

Let us wait the few days until it lands in 6.10 and then either upgrade to 6.10 (if trivial) or apply the two patches for now (I've done that already - no patch noise and working).

Changed in libvirt (Ubuntu):
status: New → In Progress
Changed in qemu (Ubuntu):
status: New → Invalid
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Changes are upstream as:
829142699ecf1b51e677c6719fce29282af62c92
6d69afe4517646811ee96981408bc6fc18b5ffbb

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Uploaded the fix after some more testing, this should fix the remaining known issues we had with L6.9/Q5.1 in 21.04

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 6.9.0-1ubuntu2

---------------
libvirt (6.9.0-1ubuntu2) hirsute; urgency=medium

  * virt-ssh-helper: fix slow migrations and volume transfers (LP: #1904584)
    - d/p/ubuntu/lp-1904584-remote-make-ssh-helper-massively-faster.patch
    - d/p/ubuntu/lp-1904584-util-avoid-glib-event-loop-workaround.patch

 -- Christian Ehrhardt <email address hidden> Thu, 26 Nov 2020 16:52:23 +0100

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.