After more troubleshooting with cgregan, we've narrowed this down further.
We ran the following script on the node that was having trouble:
https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b
Unlike all the other devices MAAS works with, the Intel NVMe device reports a serial number that cannot be found anywhere in /dev/disk/by-id/*. When curtin is supplied a serial number, it uses a heuristic to find the device as follows:
http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/commands/block_meta.py#L270
http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/block/__init__.py#L601
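To make the failure mode concrete, here is a minimal sketch of a serial-based lookup of this general kind. It is not curtin's code (the links above point at that); `find_by_serial` and its `by_id_dir` parameter are hypothetical names, and the substring match is an assumption made for illustration.

```python
import os


def find_by_serial(serial, by_id_dir="/dev/disk/by-id"):
    """Return real device paths for by-id links whose name contains `serial`.

    Hypothetical helper for illustration only, not curtin's implementation.
    """
    # Link names in by-id normally embed the serial, so a substring match
    # on the directory listing is enough to find candidate links.
    matches = sorted(name for name in os.listdir(by_id_dir) if serial in name)
    # Resolve each symlink to the underlying device node.
    return [os.path.realpath(os.path.join(by_id_dir, name)) for name in matches]
```

On a device whose only by-id link is `nvme-INTEL`, a lookup like this returns an empty list for the reported serial, which is the situation described here.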
So arguably, this is a bug in how the Intel NVMe device exposes its serial number; the way it populates /dev/disk/* leaves much to be desired.
This is *arguably* a bug in curtin (and maybe MAAS, since we knowingly use the serial number even though `udevadm` can tell us that the serial cannot be found anywhere in /dev/disk/by-id/*), in that we could do a better job dealing with devices backed by not-so-robust kernel drivers. But I think we shouldn't encourage bad behavior on the part of driver writers, so I'm on the fence about whether or not we should fix it.
But mostly, I would argue that this is a bug in the Intel NVMe driver. The way they expose the device to userland is non-standard and arguably broken. When we ran `udevadm info -q all -n nvme0n1` on the device, we got the following pseudo-output:
nvme0n1:
P: /devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
N: nvme0n1
S: SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
S: disk/by-id/nvme-INTEL
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL /dev/SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: DEVNAME=/dev/nvme0n1
E: DEVPATH=/devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
E: DEVTYPE=disk
E: ID_SERIAL=INTEL SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: ID_SERIAL_SHORT=CVMDxxxxxxxxxxxxxx
E: MAJOR=259
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=xxxxxxx
You can see from the lines that start with "S:" and the "DEVLINKS=" line that the way this device is exposed is very non-standard. One would expect /dev/disk/by-id/* to contain a DEVLINK containing the serial number. Instead they expose an 'nvme-INTEL' link, which is (IMHO) a critical bug, because anyone expecting the things in /dev/disk/by-id/* to be unique will be in for a big surprise when they add a second NVMe device to a machine.
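For what it's worth, this condition can be checked mechanically from the `udevadm info` output. The following is a rough sketch, not anything MAAS or curtin does today: `parse_udevadm` and `serial_in_by_id_links` are hypothetical helpers, and the sample values in the usage note are made up.

```python
def parse_udevadm(text):
    """Collect the 'E: KEY=VALUE' property lines of udevadm output into a dict."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def serial_in_by_id_links(props):
    """True if some /dev/disk/by-id/ link in DEVLINKS contains ID_SERIAL_SHORT."""
    serial = props.get("ID_SERIAL_SHORT", "")
    links = props.get("DEVLINKS", "").split()
    return bool(serial) and any(
        link.startswith("/dev/disk/by-id/") and serial in link
        for link in links
    )
```

Fed output shaped like the above (say `DEVLINKS=/dev/disk/by-id/nvme-INTEL /dev/SSDEXAMPLE_CVMD0000EXAMPLE` with `ID_SERIAL_SHORT=CVMD0000EXAMPLE`), `serial_in_by_id_links` returns False, since the only link carrying the serial lives outside /dev/disk/by-id/.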