xen-netfront: potential deadlock in xennet_remove()
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-aws (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Incomplete | Undecided | Unassigned |
Focal | Fix Released | High | Unassigned |
linux-aws-5.3 (Ubuntu) | Invalid | Undecided | Unassigned |
Bionic | Fix Released | High | Unassigned |
Focal | Invalid | Undecided | Unassigned |
Bug Description
[Impact]
During our AWS testing we were experiencing deadlocks on hibernate across all Xen instance types.
The trace was showing that the system was stuck in xennet_remove():
[ 358.109087] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 358.115102] modprobe D 0 4892 4833 0x00004004
[ 358.115104] Call Trace:
[ 358.115112] __schedule+
[ 358.115115] schedule+0x33/0xa0
[ 358.115118] xennet_
[ 358.115121] ? wait_woken+
[ 358.115124] xenbus_
[ 358.115126] device_
[ 358.115127] driver_
[ 358.115129] bus_remove_
[ 358.115131] driver_
[ 358.115132] xenbus_
[ 358.115134] netif_exit+
[ 358.115137] __x64_sys_
[ 358.115140] do_syscall_
[ 358.115142] entry_SYSCALL_
This prevented hibernation from completing.
The cause of this problem is a race condition in xennet_remove(): the driver reads the current state of the bus, requests a state change to "Closing", and then waits for the state to become "Closing". However, if the state becomes "Closed" between reading the state and requesting the state change, we are stuck forever, because the state will never change from "Closed" back to "Closing".
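The sequence described above can be replayed in a small userspace model. This is only a sketch and not the driver code: the enum, `old_wait_done()` and `replay_old_remove()` are hypothetical names, the bounded poll loop stands in for a wait that would block indefinitely in the kernel, and the early return when the state is already "Closed" at read time is an assumption about how the initial read is used.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the states named in the report (not the kernel enum). */
enum model_state {
    MODEL_CONNECTED,
    MODEL_CLOSING,
    MODEL_CLOSED,
};

/* Wait condition as described in the report: only "Closing" ends the wait. */
static bool old_wait_done(enum model_state backend)
{
    return backend == MODEL_CLOSING;
}

/*
 * Hypothetical helper that replays the remove sequence.
 *   state_at_read: state seen by the initial read
 *   state_at_wait: state the other end has settled on when the wait starts
 * Returns the number of polls needed, 0 if no wait was needed,
 * or -1 if the condition was never met within max_polls (models the hang).
 */
static int replay_old_remove(enum model_state state_at_read,
                             enum model_state state_at_wait,
                             int max_polls)
{
    assert(max_polls > 0);

    /* Assumption: an already closed device skips the wait entirely. */
    if (state_at_read == MODEL_CLOSED)
        return 0;

    for (int poll = 1; poll <= max_polls; poll++) {
        if (old_wait_done(state_at_wait))
            return poll;
    }
    return -1; /* the real wait has no bound, so this is a hang */
}
```

Calling `replay_old_remove(MODEL_CONNECTED, MODEL_CLOSED, 10)` models the race window: the read sees a live device, the state then moves straight to "Closed", and the wait condition is never satisfied.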
[Test case]
Create any Xen-based instance in AWS and hibernate/resume it multiple times. Sometimes the system gets stuck (hung task timeout).
[Fix]
Prevent the deadlock by changing the wait condition to also check for state == Closed.
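As a rough illustration of the widened wait condition, the userspace sketch below accepts either "Closing" or "Closed" as the end of the wait. It is not the actual patch: the enum, `fixed_wait_done()` and `polls_until_done()` are hypothetical names, and the array of observed states is a stand-in for successive reads of the other end's state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the states named in the report (not the kernel enum). */
enum model_state {
    MODEL_CONNECTED,
    MODEL_CLOSING,
    MODEL_CLOSED,
};

/* Widened condition: the wait also ends if the state is already "Closed". */
static bool fixed_wait_done(enum model_state backend)
{
    return backend == MODEL_CLOSING || backend == MODEL_CLOSED;
}

/*
 * Hypothetical helper: walk a trace of observed states and return the
 * 1-based index of the first observation that ends the wait, or -1 if
 * none of the observations does.
 */
static int polls_until_done(const enum model_state *trace, int len)
{
    assert(trace != NULL && len > 0);

    for (int i = 0; i < len; i++) {
        if (fixed_wait_done(trace[i]))
            return i + 1;
    }
    return -1;
}
```

With this condition, a trace that jumps from "Connected" straight to "Closed" ends the wait at the first "Closed" observation instead of blocking.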
[Regression potential]
Minimal: this change affects only Xen, more precisely only the xen-netfront driver.
no longer affects: linux-aws (Ubuntu Eoan)
no longer affects: linux-aws-5.3 (Ubuntu Eoan)
Changed in linux-aws-5.3 (Ubuntu Focal):
status: New → Invalid
Changed in linux-aws-5.3 (Ubuntu):
status: New → Invalid
Changed in linux-aws (Ubuntu):
status: New → Triaged
status: Triaged → Invalid
Changed in linux-aws (Ubuntu Bionic):
status: New → Incomplete
Changed in linux-aws (Ubuntu Focal):
status: New → Fix Committed
importance: Undecided → High
Changed in linux-aws-5.3 (Ubuntu Bionic):
status: New → Fix Committed
importance: Undecided → High
Changed in linux-aws (Ubuntu):
status: Invalid → Fix Released
This bug was fixed in the package linux-aws-5.3 - 5.3.0-1032.34~18.04.2
---------------
linux-aws-5.3 (5.3.0-1032.34~18.04.2) bionic; urgency=medium
* bionic/linux-aws-5.3: 5.3.0-1032.34~18.04.2 -proposed tracker (LP: #1888815)
* xen-netfront: potential deadlock in xennet_remove() (LP: #1888510)
- SAUCE: xen-netfront: fix potential deadlock in xennet_remove()
linux-aws-5.3 (5.3.0-1032.34~18.04.1) bionic; urgency=medium
* bionic/linux-aws-5.3: 5.3.0-1032.34~18.04.1 -proposed tracker (LP: #1887074)
[ Ubuntu: 5.3.0-1032.34 ]
* eoan/linux-aws: 5.3.0-1032.34 -proposed tracker (LP: #1887075)
* eoan/linux: 5.3.0-64.58 -proposed tracker (LP: #1887088)
* linux 4.15.0-109-generic network DoS regression vs -108 (LP: #1886668)
- SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2 cgroups"
linux-aws-5.3 (5.3.0-1031.33~18.04.1) bionic; urgency=medium
* bionic/linux-aws-5.3: 5.3.0-1031.33~18.04.1 -proposed tracker (LP: #1885480)
[ Ubuntu: 5.3.0-1031.33 ]
* eoan/linux-aws: 5.3.0-1031.33 -proposed tracker (LP: #1885481)
* eoan/linux: 5.3.0-63.57 -proposed tracker (LP: #1885495)
* seccomp_bpf fails on powerpc (LP: #1885757)
- SAUCE: selftests/seccomp: fix ptrace tests on powerpc
* The thread level parallelism would be a bottleneck when searching for the
shared pmd by using hugetlbfs (LP: #1882039)
- hugetlbfs: take read_lock on i_mmap for PMD sharing
* Eoan update: upstream stable patchset 2020-06-30 (LP: #1885775)
- ipv6: fix IPV6_ADDRFORM operation logic
- net_failover: fixed rollback in net_failover_open()
- bridge: Avoid infinite loop when suppressing NS messages with invalid
options
- vxlan: Avoid infinite loop when suppressing NS messages with invalid options
- tun: correct header offsets in napi frags mode
- Input: mms114 - fix handling of mms345l
- ARM: 8977/1: ptrace: Fix mask for thumb breakpoint hook
- sched/fair: Don't NUMA balance for kthreads
- Input: synaptics - add a second working PNP_ID for Lenovo T470s
- drivers/
- powerpc/xive: Clear the page tables for the ESB IO mapping
- ath9k_htc: Silence undersized packet warnings
- RDMA/uverbs: Make the event_queue fds return POLLERR when disassociated
- x86/cpu/amd: Make erratum #1054 a legacy erratum
- perf probe: Accept the instance number of kretprobe event
- mm: add kvfree_sensitive() for freeing sensitive data objects
- aio: fix async fsync creds
- x86_64: Fix jiffies ODR violation
- x86/PCI: Mark Intel C620 MROMs as having non-compliant BARs
- x86/speculation: Prevent rogue cross-process SSBD shutdown
- x86/reboot/quirks: Add MacBook6,1 reboot quirk
- efi/efivars: Add missing kobject_put() in sysfs entry creation error path
- ALSA: es1688: Add the missed snd_card_free()
- ALSA: hda/realtek - add a pintbl quirk for several Lenovo machines
- ALSA: usb-audio: Fix inconsistent card PM state after resume
- ALSA: usb-audio: Add vendor, product and profile name for HP Thunderbolt
Dock
- ACPI: sysfs: Fix reference count leak in acpi_sysfs_
-...