Stalled IO Operations During MySQL Tests (with sysbench)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
qemu (Ubuntu) | Fix Released | Undecided | Sergio Durigan Junior | |
Jammy | Fix Released | Undecided | Sergio Durigan Junior | |
Kinetic | Fix Released | Undecided | Sergio Durigan Junior | |
Bug Description
[ Impact ]
* I/O stalls from the guest's point of view. Details:
* qemu's I/O infrastructure keeps an io-plug counter, which could become unbalanced. As a result, I/O could stall, because newly submitted requests could be skipped while the queue was still considered plugged.
* Upstream identified and fixed the issue; this update backports that fix.
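To make the failure mode above more concrete, here is a toy model of a plug counter. It is only a sketch and not qemu code: the function names (plug, queue, flush, unplug_buggy, unplug_fixed, run_scenario) are hypothetical, and the idea that the imbalance comes from an unplug path that flushes the queue without decrementing the counter is an assumption made for illustration.

```shell
# Toy model only; none of these names are taken from qemu.
plugged=0      # nesting depth of plug() calls
in_queue=0     # requests queued but not yet submitted
max_batch=2    # queue length that triggers the early-flush path

plug()  { plugged=$((plugged + 1)); }
flush() { in_queue=0; }            # "submit" everything that is queued

# Queue one request; it is submitted right away only when nothing is plugged.
queue() {
    in_queue=$((in_queue + 1))
    if [ "$plugged" -eq 0 ]; then flush; fi
}

# Assumed bug: the early-flush path returns without decrementing the counter.
unplug_buggy() {
    if [ "$in_queue" -ge "$max_batch" ]; then
        flush
    else
        plugged=$((plugged - 1))
        if [ "$plugged" -eq 0 ]; then flush; fi
    fi
}

# Balanced variant: every unplug decrements, whichever path flushes.
unplug_fixed() {
    plugged=$((plugged - 1))
    if [ "$plugged" -eq 0 ] || [ "$in_queue" -ge "$max_batch" ]; then flush; fi
}

# One plugged section with two requests, followed by one later request.
run_scenario() {
    plugged=0; in_queue=0
    plug
    queue; queue
    "$1"           # the unplug variant under test
    queue          # arrives after the plugged section has ended
    echo "plugged=$plugged pending=$in_queue"
}

run_scenario unplug_buggy
run_scenario unplug_fixed
```

In the buggy variant the counter never returns to zero, so the later request stays queued. In this model, that is what a stall looks like from the guest.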
[ Test Plan ]
This is a somewhat tricky issue to reproduce, so we will mostly be relying on the reporter's feedback to perform the final SRU verification. Below you can find the instructions (provided by the reporter) for setting up a testing environment for this bug. Keep in mind that the problem doesn't always manifest, so it might be necessary to run the test several times.
You will need access to an NVMe storage device as well.
- Using a Jammy host with an NVMe storage device, install qemu/libvirt and make sure the setup is properly configured to create VMs. The following guide might be helpful: https:/
- Create an Ubuntu Jammy LTS VM. Make sure that the host NVMe device can be accessed by the VM.
- Run:
# apt-get install mysql-server mysql-common sysbench apparmor-utils
# systemctl disable --now mysql.service
# aa-complain /usr/sbin/mysqld
# reboot
Assuming that your NVMe device is mapped to /dev/vdb inside the VM:
# mkdir -p /data
# mkfs.ext4 /dev/vdb
# mount /dev/vdb /data
# mkdir /data/mysql
# mkdir /var/run/mysqld
# /usr/sbin/mysqld --no-defaults --datadir=
# /usr/sbin/mysqld --no-defaults --datadir=
# echo 'status' | mysql -uroot # verify that MySQL server is up
# echo 'drop database test1m' | mysql -uroot
# echo 'create database test1m' | mysql -uroot
# /usr/share/
# /usr/share/
According to the reporter's feedback, when the bug manifests you will see something like the following:
...
[ 620s ] thds: 6 tps: 327.00 qps: 18348.00 (r/w/o: 4578.00/
[ 621s ] thds: 6 tps: 320.00 qps: 17930.85 (r/w/o: 4479.96/
[ 622s ] thds: 6 tps: 317.00 qps: 17670.96 (r/w/o: 4432.99/
[ 623s ] thds: 6 tps: 299.83 qps: 16896.41 (r/w/o: 4202.61/
[ 624s ] thds: 6 tps: 0.00 qps: 6.00 (r/w/o: 0.00/6.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 625s ] thds: 6 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 626s ] thds: 6 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 627s ] thds: 6 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
...
which indicates that there's no I/O happening in the NVMe device at all.
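Since the stall only shows up after several minutes, it may be easier to scan saved sysbench output than to watch it live. The helper below is a minimal sketch: find_stall is a hypothetical name, and it assumes the per-second report lines have the same layout as the excerpt above. It prints the timestamp at which a run of consecutive "tps: 0.00" intervals begins.

```shell
# Hypothetical helper: read sysbench interval reports on stdin and print the
# timestamp where a run of at least $1 (default 3) zero-tps intervals starts.
# The exit status is non-zero when no such run is found.
find_stall() {
    awk -v need="${1:-3}" '
        / tps: 0\.00 / {
            if (run == 0) start = $2      # $2 is the "624s" field
            run++
            if (run >= need) { print start; found = 1; exit }
            next
        }
        / tps: / { run = 0 }              # any non-zero interval resets the run
        END { exit found ? 0 : 1 }
    '
}
```

For example, with the sysbench output saved to a file (sysbench.log is a placeholder name), `find_stall 3 < sysbench.log` would print 624s for the excerpt above.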
[ Where problems could occur ]
The changes are small, isolated, and touch only one subsystem, so we can reasonably assume that regressions, if any, would show up in the I/O subsystem. That is a small subset of the many things qemu does, which should make it easier to spot regressions introduced by this update. Watch out for any odd disk I/O behavior with this SRU.
[ Original Description ]
---Problem Description---
In a virtual machine, during MySQL performance tests with sysbench, IO operations freeze, and the virtual disk does not respond. The MySQL data is on a virtual drive, backed by a local NVMe device on the host and attached to the VM as a raw virtio-block device. The test runs smoothly for a few minutes. After a while, the IO operations freeze, and any attempt to read from or write to the virtual drive waits indefinitely. After the problem occurs, every read operation on the affected drive (e.g. ls, cat, etc.) also waits forever.
---Host Hardware---
CPU: AMD EPYC 7302P 16-Core Processor (32 threads)
RAM: 128 GB
OS Drive: Toshiba KXG60ZNV256G M.2 NVMe PCI-E SSD (256 GB)
Data Drive: Samsung PM983 MZQLB960HAJR-00007 U.2 (960 GB)
---Host Software---
OS: Ubuntu 22.04 LTS
Kernel: 5.15.0-27-generic
Qemu: 1:6.2+dfsg-2ubuntu6
Libvirt: 8.0.0-1ubuntu7
---VM Hardware---
vCPU: <vcpu placement=
CPU Mode: <cpu mode='host-
RAM: 64 GB
OS Type: <type arch='x86_64' machine=
OS Drive (64 GB):
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' io='native' discard='unmap'/>
<target dev='vda' bus='virtio'/>
Block Data Drive:
<disk type="block" device="disk">
<driver name="qemu" type="raw" cache="none" io="native" discard="unmap"/>
<target dev="vdb" bus="virtio"/>
---VM Software & Configuration---
OS: Ubuntu 22.04 LTS (minimized)
Kernel: 5.15.0-27-generic
Swap: disabled
OS Drive: /dev/vda2; file-system: ext4; mount-options: defaults; mount-point: /
Data Drive: /dev/vdb
MySQL: 8.0.28-0ubuntu4
Sysbench: 1.0.20+ds-2
---Prepare the VM---
1. Install Ubuntu 22.04 LTS (minimized) as VM OS
2. Boot the VM & log-in as root
3. apt-get install mysql-server mysql-common sysbench apparmor-utils
4. systemctl disable --now mysql.service
5. aa-complain /usr/sbin/mysqld
6. systemctl restart apparmor
---Reproduction---
1. Reboot the VM & log-in as root
2. mkdir -p /data
3. mkfs.ext4 /dev/vdb
4. mount /dev/vdb /data
5. mkdir /data/mysql
6. mkdir /var/run/mysqld
7. /usr/sbin/mysqld --no-defaults --datadir=
8. /usr/sbin/mysqld --no-defaults --datadir=
9. echo 'status' | mysql -uroot # verify that MySQL server is up
10. echo 'drop database test1m' | mysql -uroot
11. echo 'create database test1m' | mysql -uroot
12. /usr/share/
13. /usr/share/
---Resulting Output---
...
[ 620s ] thds: 6 tps: 327.00 qps: 18348.00 (r/w/o: 4578.00/
[ 621s ] thds: 6 tps: 320.00 qps: 17930.85 (r/w/o: 4479.96/
[ 622s ] thds: 6 tps: 317.00 qps: 17670.96 (r/w/o: 4432.99/
[ 623s ] thds: 6 tps: 299.83 qps: 16896.41 (r/w/o: 4202.61/
[ 624s ] thds: 6 tps: 0.00 qps: 6.00 (r/w/o: 0.00/6.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 625s ] thds: 6 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 626s ] thds: 6 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 627s ] thds: 6 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
...
---Expected Behavior---
No lines with "tps: 0.00 qps: 0.00", like the last four in the example.
---Additional Notes---
1. This is not happening on every run, so it is possible for some test iterations to complete successfully.
2. The same happens with a larger number of sysbench threads (e.g. 8, 16, 24, 32) too.
3. The problem does not occur if the io policy of the data drive is changed from io="native" to io="io_uring" (at least for 7 hours of continuous testing).
4. While IO operations in the VM are frozen, the NVMe device responds to requests from the host. (e.g. dd if=/dev/nvme1n1 of=/dev/null bs=512 count=1 iflag=direct).
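Note 4 checks the device from the host. A similar probe can be run as root inside the guest against the virtual drive. The sketch below is an assumption-laden example, not part of the original report: probe_read is a hypothetical helper, and it polls a background dd instead of waiting on it, on the assumption that a read hung inside the kernel may not react to signals.

```shell
# Hypothetical helper: attempt one direct 512-byte read from a device and
# print "responsive", "stalled" or "error".
# Usage: probe_read DEVICE [SECONDS]
probe_read() {
    dev=$1
    secs=${2:-5}
    dd if="$dev" of=/dev/null bs=512 count=1 iflag=direct >/dev/null 2>&1 &
    pid=$!
    sleep "$secs"
    if kill -0 "$pid" 2>/dev/null; then
        # Still running after the grace period: treat the read as hung.
        kill "$pid" 2>/dev/null   # best effort; a hung read may ignore it
        echo "stalled"
    elif wait "$pid"; then
        echo "responsive"
    else
        echo "error"
    fi
}
```

Inside the VM this could be called as `probe_read /dev/vdb 5`. The helper always sleeps for the full grace period, which keeps the logic simple at the cost of a slower check.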
Please find attached the libvirt XML configuration of the example VM.
Best regards,
Nikolay Tenev
Related branches
- git-ubuntu bot: Approve
- Christian Ehrhardt (community): Approve
- Canonical Server Reporter: Pending requested
- Canonical Server packageset reviewers: Pending requested
- Canonical Server: Pending requested
Diff: 118 lines (+90/-0), 4 files modified
debian/changelog (+7/-0)
debian/patches/series (+2/-0)
debian/patches/ubuntu/lp1970737-linux-aio-explain-why-max-batch-is-checked-in-laio_i.patch (+37/-0)
debian/patches/ubuntu/lp1970737-linux-aio-fix-unbalanced-plugged-counter-in-laio_io_.patch (+44/-0)
Changed in ubuntu:
status: New → Incomplete
affects: ubuntu → mysql-8.0 (Ubuntu)
Changed in mysql-8.0 (Ubuntu):
status: Incomplete → New
description: updated
description: updated
description: updated
tags: removed: server-triage-discuss
Changed in qemu (Ubuntu Jammy):
status: New → Confirmed
Changed in qemu (Ubuntu Kinetic):
status: New → Confirmed
description: updated
Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Changed in qemu (Ubuntu Kinetic):
assignee: nobody → Sergio Durigan Junior (sergiodj)
tags: added: server-todo
Changed in qemu (Ubuntu Jammy):
status: Confirmed → In Progress
Changed in qemu (Ubuntu Kinetic):
status: Confirmed → In Progress
Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.
To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1970737/+editstatus and add the package name in the text box next to the word Package.
[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]