== Comment: #0 - Manjunatha H R <email address hidden> - 2015-09-25 11:05:36 ==
Booting of Ubuntu15.10 lpar fails and control falls to initramfs.
uname -a
--------------
Linux (none) 4.2.0-10-generic #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 ppc64le GNU/Linux
Boot log:
-------------
Booting a command list
Loading Linux 4.2.0-10-generic ...
Loading initial ramdisk ...
OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.2.0-10-generic (buildd@fisher04) (gcc version 5.2.1 20150911 (Ubuntu 5.2.1-17ubuntu4) ) #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 (Ubuntu 4.2.0-10.12-generic 4.2.0)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/boot/vmlinux-4.2.0-10-generic root=UUID=822dd709-5b69-45a9-aba5-63cb55768ffb ro splash quiet topology_updates=off
memory layout at init:
memory_limit : 0000000000000000 (16 MB aligned)
alloc_bottom : 000000000bf80000
alloc_top : 0000000010000000
alloc_top_hi : 0000000010000000
rmo_top : 0000000010000000
ram_top : 0000000010000000
found display : /pci@80000002000002c/display@0, opening... done
instantiating rtas at 0x000000000eb60000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000bf90000 -> 0x000000000bf91965
Device tree struct 0x000000000bfa0000 -> 0x000000000bfe0000
Quiescing Open Firmware ...
Booting Linux via __start() ...
-> smp_release_cpus()
spinning_secondaries = 199
<- smp_release_cpus()
<- setup_system()
[ 2.868103] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
[ 3.074553] [drm:radeon_atombios_init [radeon]] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO
[ 5.060785] lpfc 0002:90:00.0: 0:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/sdn2 is in use.
e2fsck: Cannot continue, aborting.
fsck exited with status code 8
[ 36.233086] rport-0:0-9: blocked FC remote port time out: removing rport
mount: mounting /dev/sdn2 on /root failed: Device or resource busy
Target filesystem doesn't have requested /sbin/init.
mount: mounting /dev on /root/dev failed: No such file or directory
No init found. Try passing init= bootarg.
BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs)
-------------------------
This lpar is having multipath disks and boot disk is on a multipath disk.
Boot passes only whenever fsck tries to scan boot disk via : /dev/dm OR /dev/mapper/mpath
2. Boot passed when fsck tried scanning "/dev/dm-3"
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/dm-3: clean, 81802/3139584 files, 1040605/12558080 blocks
Boot fails, whenever fsck is called on /dev/sd
Boot fail scenario: Boot failed when fsck is called on "/dev/sdn"
-------------------------
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/sdn2 is in use.
e2fsck: Cannot continue, aborting.
fsck exited with status code 8
[ 36.108653] rport-0:0-9: blocked FC remote port time out: removing rport
mount: mounting /dev/sdn2 on /root failed: Device or resource busy
Target filesystem doesn't have requested /sbin/init.
mount: mounting /dev on /root/dev failed: No such file or directory
No init found. Try passing init= bootarg.
BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
== Comment: #12 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-02 13:51:29 ==
Hi Manju and Alton,
I could not reproduce this bug in 2 attempts.
The LPAR booted successfully, using the root=UUID= parameter.
By looking at this message from the description:
> mount: mounting /dev/sdn2 on /root failed: Device or resource busy
It should have happened because multipath udev rules failed to update the /dev/disk/by-id/<uuid> symlink from /dev/sdn to /dev/dm-X, but multipathing the path was successful (so it got locked/in-use).
If you can reproduce it again, please leave the LPAR in the failing state (in the initramfs), reopen this bug and ping me.
I'd be happy to debug it.
Thanks!
== Comment: #15 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-05 19:59:56 ==
This is probably a race between the resolve_device() call in mountroot() and the multipath discovery triggered by udev rules.
If resolve_device() runs before the root device is multipathed, $ROOT is set to an individual path (eg, /dev/sdf2) rather than its multipah device (eg, /dev/mapper/mpathb-part2), because the /dev/disk/by-uuid/<UUID> symlink is not updated yet.
The multipath discovery finishes after $ROOT is set, so the individual path becomes locked, and afterwards the root mount will be attempted on it -- this fails.
The LPAR is now patched w/ a test fix that is supposed to ensure resolve_device() only starts after udev rules are finished.
Can you try to recreate the issue, please? Thanks!
If resolve_device() runs before the multipath udev rules
(the rules multipath the root device and update the /dev/disk/by-uuid symlink of $ROOT)
this happens:
Begin: Checking root file system ... fsck from util-linux 2.26.2
/dev/sdf2 is in use.
e2fsck: Cannot continue, aborting.
fsck exited with status code 8
done.
Warning: File system check failed but did not detect errors
mount: mounting /dev/sdf2 on /root failed: Device or resource busy
done.
Target filesystem doesn't have requested /sbin/init.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ...
...
mount: mounting /dev on /root/dev failed: No such file or directory
done.
No init found. Try passing init= bootarg.
...
(initramfs)
So, /dev/sdf2 is in use ... and hits Device or resource busy.
This comes from $ROOT.
However, the root=UUID= symlink points to the multipath device:
It's probably because resolve_device() was racing w/ the multipath discoveries from udev rules.
resolve_device() finished before the /dev/disk/by-uuid/ symlink was updated by multipath discovery.
== Comment: #18 - Manjunatha H R <email address hidden> - 2015-10-06 06:05:30 ==
Thank you Mauricio for a quick fix!!
Lpar is booting up properly without seeing device or resource busy errors.
== Comment: #20 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-06 09:04:50 ==
Confirmed w/ Manju the # of tests.
10:00:41 AM: Manjunatha H R: Hi Mauricio, I tried around 10 boots..
10:01:07 AM: Manjunatha H R: all times it booted up..
== Comment: #0 - Manjunatha H R <email address hidden> - 2015-09-25 11:05:36 ==
Booting of Ubuntu15.10 lpar fails and control falls to initramfs.
uname -a
--------------
Linux (none) 4.2.0-10-generic #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 ppc64le GNU/Linux
Boot log:
-------------
Booting a command list
Loading Linux 4.2.0-10-generic ... vty@30000000 architecture- support. .. done /boot/vmlinux- 4.2.0-10- generic root=UUID= 822dd709- 5b69-45a9- aba5-63cb55768f fb ro splash quiet topology_ updates= off 0002c/display@ 0, opening... done 000... done secondaries = 199 device_ init [radeon]] *ERROR* Unable to find PCI I/O BAR atombios_ init [radeon]] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO
Loading initial ramdisk ...
OF stdout device is: /vdevice/
Preparing to boot Linux version 4.2.0-10-generic (buildd@fisher04) (gcc version 5.2.1 20150911 (Ubuntu 5.2.1-17ubuntu4) ) #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 (Ubuntu 4.2.0-10.12-generic 4.2.0)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-
command line: BOOT_IMAGE=
memory layout at init:
memory_limit : 0000000000000000 (16 MB aligned)
alloc_bottom : 000000000bf80000
alloc_top : 0000000010000000
alloc_top_hi : 0000000010000000
rmo_top : 0000000010000000
ram_top : 0000000010000000
found display : /pci@8000000200
instantiating rtas at 0x000000000eb60
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000bf90000 -> 0x000000000bf91965
Device tree struct 0x000000000bfa0000 -> 0x000000000bfe0000
Quiescing Open Firmware ...
Booting Linux via __start() ...
-> smp_release_cpus()
spinning_
<- smp_release_cpus()
<- setup_system()
[ 2.868103] [drm:radeon_
[ 3.074553] [drm:radeon_
[ 5.060785] lpfc 0002:90:00.0: 0:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/sdn2 is in use.
e2fsck: Cannot continue, aborting.
fsck exited with status code 8
[ 36.233086] rport-0:0-9: blocked FC remote port time out: removing rport
mount: mounting /dev/sdn2 on /root failed: Device or resource busy
Target filesystem doesn't have requested /sbin/init.
mount: mounting /dev on /root/dev failed: No such file or directory
No init found. Try passing init= bootarg.
BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs) ------- ------- ----
-------
This lpar is having multipath disks and boot disk is on a multipath disk.
Boot passes only whenever fsck tries to scan boot disk via : /dev/dm OR /dev/mapper/mpath
Boot Pass scenarios: ------- ------- ------- mpathb" mpathb- part2: clean, 81802/3139584 files, 1040598/12558080 blocks
-------
1. Boot passed when fsck tried scanning "/dev/mapper/
fsck from util-linux 2.26.2
/dev/mapper/
2. Boot passed when fsck tried scanning "/dev/dm-3"
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/dm-3: clean, 81802/3139584 files, 1040605/12558080 blocks
Boot fails, whenever fsck is called on /dev/sd
Boot fail scenario: Boot failed when fsck is called on "/dev/sdn" ------- ------- ----
-------
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/sdn2 is in use.
e2fsck: Cannot continue, aborting.
fsck exited with status code 8
[ 36.108653] rport-0:0-9: blocked FC remote port time out: removing rport
mount: mounting /dev/sdn2 on /root failed: Device or resource busy
Target filesystem doesn't have requested /sbin/init.
mount: mounting /dev on /root/dev failed: No such file or directory
No init found. Try passing init= bootarg.
BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs) ------- ------- ----
-------
Contact info:
----------------
Manju (<email address hidden>) A.P. (<email address hidden>)
== Comment: #12 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-02 13:51:29 ==
Hi Manju and Alton,
I could not reproduce this bug in 2 attempts.
The LPAR booted successfully, using the root=UUID= parameter.
By looking at this message from the description:
> mount: mounting /dev/sdn2 on /root failed: Device or resource busy
It should have happened because multipath udev rules failed to update the /dev/disk/ by-id/< uuid> symlink from /dev/sdn to /dev/dm-X, but multipathing the path was successful (so it got locked/in-use).
If you can reproduce it again, please leave the LPAR in the failing state (in the initramfs), reopen this bug and ping me.
I'd be happy to debug it.
Thanks!
== Comment: #15 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-05 19:59:56 ==
This is probably a race between the resolve_device() call in mountroot() and the multipath discovery triggered by udev rules.
If resolve_device() runs before the root device is multipathed, $ROOT is set to an individual path (eg, /dev/sdf2) rather than its multipah device (eg, /dev/mapper/ mpathb- part2), because the /dev/disk/ by-uuid/ <UUID> symlink is not updated yet.
The multipath discovery finishes after $ROOT is set, so the individual path becomes locked, and afterwards the root mount will be attempted on it -- this fails.
The LPAR is now patched w/ a test fix that is supposed to ensure resolve_device() only starts after udev rules are finished.
Can you try to recreate the issue, please? Thanks!
== Comment: #16 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-05 20:05:37 ==
Console messages
(!) Note: the local-premount messages (Running ... & done.) occur around SCSI device scan/discovery time.
... local-premount ... [ 5.187071] scsi 0:0:0:0: Direct-Access IBM 2107900 .850 PQ: 0 ANSI: 5
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [ 5.113397] device-mapper: multipath: version 1.9.0 loaded
Success: loaded module dm-multipath.
Failure: failed to load module dm-emc.
done.
Begin: Discovering multipaths ... done.
done.
Begin: Running /scripts/
[ 5.195814] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 5.203289] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
...
[ 5.878046] sd 0:0:3:3: [sdp] Attached SCSI disk
...
<10-20 SCSI disks via FC>
...
[ 5.923577] device-mapper: multipath round-robin: version 1.0.0 loaded
...
done.
...
If resolve_device() runs before the multipath udev rules
(the rules multipath the root device and update the /dev/disk/by-uuid symlink of $ROOT)
this happens:
Begin: Checking root file system ... fsck from util-linux 2.26.2
/dev/sdf2 is in use.
e2fsck: Cannot continue, aborting.
fsck exited with status code 8
done.
Warning: File system check failed but did not detect errors
mount: mounting /dev/sdf2 on /root failed: Device or resource busy local-bottom ... done. init-bottom ...
done.
Target filesystem doesn't have requested /sbin/init.
Begin: Running /scripts/
Begin: Running /scripts/
...
mount: mounting /dev on /root/dev failed: No such file or directory
done.
No init found. Try passing init= bootarg.
...
(initramfs)
So, /dev/sdf2 is in use ... and hits Device or resource busy.
This comes from $ROOT.
However, the root=UUID= symlink points to the multipath device:
(initramfs) echo $ROOT
/dev/sdf2
(initramfs) cat /proc/cmdline /boot/vmlinux- 4.2.0-12- generic root=UUID= 44bd8a6e- 8613-431a- 9335-879d8cf5d0 e4 ro
BOOT_IMAGE=
(initramfs) ls -l /dev/disk/ by-uuid/ 44bd8a6e- 8613-431a- 9335-879d8cf5d0 e4 by-uuid/ 44bd8a6e- 8613-431a- 9335-879d8cf5d0 e4 -> ../../dm-19
lrwxrwxrwx 1 11 /dev/disk/
(initramfs) ls -l /dev/sdf
brw------- 1 8, 80 /dev/sdf
(initramfs) dmsetup table | grep 8:80
mpathb: 0 104857600 multipath 0 0 1 1 round-robin 0 4 1 8:80 1 8:16 1 8:144 1 8:208 1
It's probably because resolve_device() was racing w/ the multipath discoveries from udev rules.
resolve_device() finished before the /dev/disk/by-uuid/ symlink was updated by multipath discovery.
Code:
initramfs :: /init
log_begin_msg "Mounting root file system"
...
mountroot
log_end_msg
(!) Note: message "Mounting root fs" and call to mountroot()
initramfs :: /scripts/local
mountroot()
{
local_mount_root
}
local_mount_root()
{
...
local_premount
ROOT= $(resolve_ device "$ROOT")
...
}
(!) Note: local_premount() is the last call before resolve_device() is called
local_premount() local-premount" local-premount
{
...
[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/
run_scripts /scripts/
[ "$quiet" != "y" ] && log_end_msg
...
}
So we're testing a call to 'udevadm settle' in /scripts/ local-premount/ multipath script.
== Comment: #17 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-05 20:08:26 ==
== Comment: #18 - Manjunatha H R <email address hidden> - 2015-10-06 06:05:30 ==
Thank you Mauricio for a quick fix!!
Lpar is booting up properly without seeing device or resource busy errors.
== Comment: #20 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-06 09:04:50 ==
Confirmed w/ Manju the # of tests.
10:00:41 AM: Manjunatha H R: Hi Mauricio, I tried around 10 boots..
10:01:07 AM: Manjunatha H R: all times it booted up..
Sounds good.
I'll be sending a patch/mirroring.