Disk IO very slow on kernel 4.15.0-142-generic
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Confirmed
|
Undecided
|
Unassigned | ||
Bionic |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
After upgrading Ubuntu 18.04 kernel from 4.15.0-136-generic to kernel 4.15.0-142-generic disk IOs (writes) are suffering from terrible delay.
Time required to write 1GB of data to the dist is ~12 minutes.
1GB in 12 minutes:
strace -ttvvff -o mel.strace dd bs=1024k count=1024 if=/dev/zero of=/data/
Before the upgrade, it took seconds to complete.
The only change which was done on the system was security upgrade done with unattended-upgrade utility and main repositories for bionic and bionic-security.
This is physical server LENOVO System x3650 M5, with Symbios Logic MegaRAID SAS-3 3108 disk controller.
The only difference which I see is that there are more modules loaded by the latest kernel.
I would expect that the performance would be the same. There were no changes to the physical server and the disks are reported as online and healthy. No only dd is affected but also all disk-based operation by barman (we are using postgres steaming backup along with rsync/ssh archiving).
I think this could be a kernel bug. I would be happy to provide other information which would help in fixing this problem.
=======
Details:
lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04
lsmod
Module Size Used by
ip6_tables 28672 0
iptable_filter 16384 0
ip_tables 28672 1 iptable_filter
x_tables 40960 3 iptable_
tcp_diag 16384 0
udp_diag 16384 0
raw_diag 16384 0
inet_diag 24576 3 tcp_diag,
unix_diag 16384 0
binfmt_misc 20480 1
ipmi_ssif 32768 0
kvm_intel 217088 0
kvm 614400 1 kvm_intel
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
ghash_clmulni_intel 16384 0
pcbc 16384 0
aesni_intel 188416 0
aes_x86_64 20480 1 aesni_intel
crypto_simd 16384 1 aesni_intel
glue_helper 16384 1 aesni_intel
cryptd 24576 3 crypto_
ipmi_si 57344 0
ipmi_devintf 20480 0
ipmi_msghandler 53248 3 ipmi_devintf,
acpi_pad 180224 0
bonding 163840 0
lp 20480 0
parport 49152 1 lp
autofs4 40960 2
mgag200 45056 1
i2c_algo_bit 16384 1 mgag200
ttm 106496 1 mgag200
drm_kms_helper 172032 1 mgag200
syscopyarea 16384 1 drm_kms_helper
tg3 167936 0
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
ptp 20480 1 tg3
drm 401408 4 drm_kms_
pps_core 20480 1 ptp
megaraid_sas 143360 12
lspci -vnn -s 0b:00.0
0b:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] [1000:005d] (rev 02)
Subsystem: IBM MegaRAID SAS-3 3108 [Invader] [1014:0454]
Physical Slot: 9
Flags: bus master, fast devsel, latency 0, IRQ 31, NUMA node 0
I/O ports at 2e00 [size=256]
Memory at 903f0000 (64-bit, non-prefetchable) [size=64K]
Memory at 90400000 (64-bit, non-prefetchable) [size=1M]
Expansion ROM at <ignored> [disabled]
Capabilities: [50] Power Management version 3
Capabilities: [68] Express Endpoint, MSI 00
Capabilities: [d0] Vital Product Data
Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [1e0] #19
Capabilities: [1c0] Power Budgeting <?>
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas
Strace example for 10MB:
strace -ttvvff -o mel.strace dd bs=1024k count=10 if=/dev/zero of=/data/
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 6.73972 s, 1.6 MB/s
14:59:19.145037 execve("/bin/dd", ["dd", "bs=1024k", "count=10", "if=/dev/zero", "of=/data/
14:59:19.145685 brk(NULL) = 0x563734ec4000
14:59:19.145766 access(
14:59:19.145846 access(
14:59:19.145909 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
14:59:19.145970 fstat(3, {st_dev=makedev(8, 3), st_ino=845, st_mode=
14:59:19.146055 mmap(NULL, 28885, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f49d79db000
14:59:19.146108 close(3) = 0
14:59:19.146159 access(
14:59:19.146215 openat(AT_FDCWD, "/lib/x86_
14:59:19.146270 read(3, "\177ELF\
14:59:19.146327 fstat(3, {st_dev=makedev(8, 3), st_ino=1413, st_mode=
14:59:19.146403 mmap(NULL, 8192, PROT_READ|
14:59:19.146460 mmap(NULL, 4131552, PROT_READ|
14:59:19.146510 mprotect(
14:59:19.146567 mmap(0x7f49d77b
14:59:19.146629 mmap(0x7f49d77b
14:59:19.146689 close(3) = 0
14:59:19.146758 arch_prctl(
14:59:19.146895 mprotect(
14:59:19.146990 mprotect(
14:59:19.147044 mprotect(
14:59:19.147093 munmap(
14:59:19.147190 rt_sigaction(
14:59:19.147248 rt_sigaction(
14:59:19.147301 rt_sigaction(
14:59:19.147430 brk(NULL) = 0x563734ec4000
14:59:19.147475 brk(0x563734ee5000) = 0x563734ee5000
14:59:19.147535 openat(AT_FDCWD, "/usr/lib/
14:59:19.147602 fstat(3, {st_dev=makedev(8, 7), st_ino=48819, st_mode=
14:59:19.147687 mmap(NULL, 1687536, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f49d783d000
14:59:19.147746 close(3) = 0
14:59:19.147853 openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
14:59:19.147912 dup2(3, 0) = 0
14:59:19.147960 close(3) = 0
14:59:19.148005 lseek(0, 0, SEEK_CUR) = 0
14:59:19.148051 openat(AT_FDCWD, "/data/
14:59:19.293893 dup2(3, 1) = 1
14:59:19.293961 close(3) = 0
14:59:19.294012 mmap(NULL, 1060864, PROT_READ|
14:59:19.294069 read(0, "\0\0\0\
14:59:19.294495 write(1, "\0\0\0\
14:59:26.025616 read(0, "\0\0\0\
14:59:26.025759 write(1, "\0\0\0\
14:59:26.026304 read(0, "\0\0\0\
14:59:26.026416 write(1, "\0\0\0\
14:59:26.026966 read(0, "\0\0\0\
14:59:26.027067 write(1, "\0\0\0\
14:59:26.027605 read(0, "\0\0\0\
14:59:26.027714 write(1, "\0\0\0\
14:59:26.028242 read(0, "\0\0\0\
14:59:26.028339 write(1, "\0\0\0\
14:59:26.028841 read(0, "\0\0\0\
14:59:26.028927 write(1, "\0\0\0\
14:59:26.029459 read(0, "\0\0\0\
14:59:26.029558 write(1, "\0\0\0\
14:59:26.030123 read(0, "\0\0\0\
14:59:26.030235 write(1, "\0\0\0\
14:59:26.030790 read(0, "\0\0\0\
14:59:26.030902 write(1, "\0\0\0\
14:59:26.031457 close(0) = 0
14:59:26.031503 close(1) = 0
14:59:26.032941 openat(AT_FDCWD, "/usr/share/
14:59:26.033098 fstat(0, {st_dev=makedev(8, 3), st_ino=992, st_mode=
14:59:26.033168 read(0, "# Locale name alias data base.\n#"..., 4096) = 2995
14:59:26.033323 read(0, "", 4096) = 0
14:59:26.033358 close(0) = 0
14:59:26.033400 openat(AT_FDCWD, "/usr/share/
14:59:26.033447 openat(AT_FDCWD, "/usr/share/
14:59:26.033487 openat(AT_FDCWD, "/usr/share/
14:59:26.033616 openat(AT_FDCWD, "/usr/share/
14:59:26.033678 write(2, "10+0 records in\n10+0 records out"..., 33) = 33
14:59:26.033747 write(2, "10485760 bytes (10 MB, 10 MiB) c"..., 58) = 58
14:59:26.033785 write(2, "\n", 1) = 1
14:59:26.033824 close(2) = 0
14:59:26.033863 exit_group(0) = ?
14:59:26.034041 +++ exited with 0 +++
affects: | ubuntu → linux-signed-hwe (Ubuntu) |
Thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command only once, as it will automatically gather debugging information, in a terminal:
apport-collect 1928744
When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https:/ /wiki.ubuntu. com/ReportingBu gs.
Note: you may have issues with this, as you've changed the package to the HWE kernel (kernel 5.4 for 18.04) but your details imply you're using 18.04's GA kernel of 4.15 (Ubuntu 16.04 using HWE uses the 4.15 kernel)
If you have issues; please change the package to 'linux' then try apport-collect again.