curtin

Slow formatting on SSDs in mdadm RAID10 with LVM and XFS

Bug #1882979 reported by David Andruczyk on 2020-06-10

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	MAAS	Incomplete	Undecided	Unassigned
	curtin	New	Undecided	Unassigned

Bug Description

MAAS: 2.6.2-7841-ga10625be3-0ubuntu1~18.04.1

Building a machine with a RAID10 of 12 4T SSD's partitioned into two PV's for two separate LVM groups (750G and the remainder). Formatting of XFS blocks:

Running command['mkfs.xfs', '-f', '-L', '', '-m', 'uuid=<uuid>', /dev/<vg-name>/<lv-name>

INFO: tags: md2_resync:12438 blocked for more than 120 seconds
Tainted: P O 5.4.0-37-generic #41-Ubuntu
....
INFO: tags: mkfs.xfs:13764 blocked for more than 120 seconds
Tainted: P O 5.4.0-37-generic #41-Ubuntu
....

Logging into the deploying instance shows that the MDADM array is rebuilding at an abysmally slow speed of 5K/sec (not a typo)
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdn3[1] sdm2[0]
5848064 blocks super 1.2 [2/2] [UU]
resync=DELAYED

md0 : active raid1 sdn2[1] sdm1[0]
      228436992 blocks super 1.2 [2/2] [UU]
      [=========>...........] resync = 48.9% (111858496/228436992) finish=14.0min speed=138180K/sec
      bitmap: 2/2 pages [8KB], 65536KB chunk

md2 : active raid10 sdl2[11] sdk2[10] sdj2[9] sdi2[8] sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      22503598080 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      [>....................] resync = 0.0% (6832000/22503598080) finish=65615567.6min speed=5K/sec
      bitmap: 168/168 pages [672KB], 65536KB chunk

unused devices: <none>

installing iostat (sudo apt install sysstat) shows a high discard rate on the drives on md2
avg-cpu: %user %nice %system %iowait %steal %idle
0.04 0.00 0.43 2.90 0.00 96.62

Device dm-0 0.00 dm-1 0.00 loop0 0.00 loop1 0.00 loop2 0.00 loop3 0.00 md0 0.00 md1 0.00 md2 0.00 sda 419.60 sdb 418.80 sdc 418.80 sdd 420.00 sde 420.80 sdf 419.60 sdg 419.60 sdh 419.60 sdi 419.20 sdj 420.00 sdk 419.00 sdl 419.60 sdm sdn 248.40 tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 0.00 0 0 0
0.00 0.00 107417.60 0 0 537088
0.00 0.00 107212.80 0 0 536064
0.00 0.00 107212.80 0 0 536064
0.00 0.00 107520.00 0 0 537600
0.00 0.00 107724.80 0 0 538624
0.00 0.00 107417.60 0 0 537088
0.00 0.00 107417.60 0 0 537088
0.00 0.00 107417.60 0 0 537088
0.00 0.00 107315.20 0 0 536576
0.00 0.00 107520.00 0 0 537600
0.00 0.00 107310.40 0 0 536552
0.00 0.00 107417.60 0 0 537088
248.40 123276.80 0.30 0.00 616384 1 0
0.00 123123.50 0.00 0 615617 0

MaaS gui will eventually timeout and report "deployment failed" however the deploy WILL COMPLETE EVENTUALLY, usually in as long as 2-3 HOURS if left alone. When the host reboots into the target OS (xenial), the RAID10 rebuild speed will return to normal (throttled at 200MB/sec by /proc/sys/dev/raid/speed_limit_max), if that value is increased the array will rebuild at up to 1.3GB/sec, which is likely the bandwidth limit of the backplane to board connections.

This is on a host where I had issued a full blkdiscard of all SSD's from an ephemeral environment to provide "clean slates" which should give the fastest performance from the SSD's. If the drives had a previous configuration from a previous deployment the install wouldn't even get this far, it would block on trying to remove the old stuff (separate bug filed for this already https://bugs.launchpad.net/maas/+bug/1882964

Revision history for this message

Alberto Donato (ack) wrote on 2021-09-06:

Adding curtin as I'm not sure whether there's something maas could configure different for this case

Revision history for this message

Alberto Donato (ack) wrote on 2021-09-06:

Also, could you please attach the curtin config used for the deployment?
It's left in the /root dir during deployment.

Changed in maas:
status:	New → Incomplete

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.