2020-08-26 12:32:21 |
Dmitrii Shcherbakov |
description |
I found a race condition that can be avoided by using wildcard rules in device cgroups; however, I do not see a way to enable such rules through an interface.
There is a MicroStack use case where iSCSI targets are attached to the host kernel as block devices via iscsid and the iscsi-tcp kernel module.
An immediate idea is to:
* add the block-devices interface to the nova-compute and libvirtd apps;
* as a result, get the major and minor numbers of hot-plugged devices added to the device cgroups of Nova and libvirtd (/sys/fs/cgroup/devices/snap.microstack.{nova-compute,libvirtd}/devices.list).
This part of the interface takes care of that: https://github.com/snapcore/snapd/blob/2.46/interfaces/builtin/block_devices.go#L97
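For reference, the major:minor pair that ends up in devices.list is packed by the kernel into a single dev_t. A minimal Go sketch of the glibc-compatible encoding follows; the helper names `major`, `minor` and `mkdev` are illustrative, and 8:64 (what /dev/sde usually is) is an example value, not something taken from the logs.

```go
package main

import "fmt"

// major extracts the major number from a Linux dev_t
// (12 low bits at 8..19, remaining bits at 44..63).
func major(dev uint64) uint32 {
	return uint32(((dev >> 8) & 0xfff) | ((dev >> 32) & 0xfffff000))
}

// minor extracts the minor number from a Linux dev_t
// (8 low bits at 0..7, remaining bits at 20..43).
func minor(dev uint64) uint32 {
	return uint32((dev & 0xff) | ((dev >> 12) & 0xffffff00))
}

// mkdev packs a major:minor pair; it is the inverse of major/minor.
func mkdev(maj, mnr uint32) uint64 {
	m, n := uint64(maj), uint64(mnr)
	return ((m & 0xfffff000) << 32) | ((m & 0xfff) << 8) |
		((n & 0xffffff00) << 12) | (n & 0xff)
}

func main() {
	dev := mkdev(8, 64) // example: a typical /dev/sde
	fmt.Printf("%d:%d\n", major(dev), minor(dev))
}
```

On linux/amd64, `syscall.Stat_t.Rdev` (filled by `syscall.Stat` on a device node) carries this encoding, so the same decoding yields the numbers the helper writes to the cgroup.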
As it turns out, this approach is racy: the device is used before its major and minor numbers are added to the relevant device cgroup via /sys/fs/cgroup/devices/snap.microstack.{nova-compute,libvirtd}/devices.allow.
snap-device-helper is responsible for that: https://github.com/snapcore/snapd/blob/2.46/cmd/snap-confine/snap-device-helper#L73
In essence, the block special file is created and used before snap-device-helper runs (it is invoked from the udev rules that snapd generates), and confined applications are not synchronized with the helper in any way.
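The ordering problem can be reduced to a toy model. This Go sketch is hypothetical: `cgroup`, `devKey` and `simulate` are illustrative names, the in-memory map only stands in for a default-deny v1 devices cgroup, and nothing here talks to udev or snapd.

```go
package main

import "fmt"

// devKey identifies a device node by type ('b' or 'c') and major:minor.
type devKey struct {
	typ          byte
	major, minor uint32
}

// cgroup is a toy default-deny devices cgroup holding exact entries,
// similar in spirit to lines written to devices.allow.
type cgroup struct{ allowed map[devKey]bool }

func (c *cgroup) allow(k devKey) { c.allowed[k] = true }

// open mimics the kernel check: a device without an entry is refused.
func (c *cgroup) open(k devKey) error {
	if !c.allowed[k] {
		return fmt.Errorf("open %c %d:%d: operation not permitted", k.typ, k.major, k.minor)
	}
	return nil
}

// simulate replays one hot-plug. helperFirst says whether the helper's
// allow entry lands before the app's first open; the app's error is returned.
func simulate(helperFirst bool) error {
	cg := &cgroup{allowed: map[devKey]bool{}}
	sde := devKey{'b', 8, 64} // hypothetical /dev/sde
	if helperFirst {
		cg.allow(sde)
		return cg.open(sde)
	}
	err := cg.open(sde) // the app wins the race
	cg.allow(sde)       // the helper runs afterwards, too late for that open
	return err
}

func main() {
	fmt.Println("helper first:", simulate(true))
	fmt.Println("app first:   ", simulate(false))
}
```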
In the failure mode I observe consistently, I get "Operation not permitted", which is the EPERM the kernel returns when it enforces access based on the contents of the device cgroup:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/security/device_cgroup.c?h=Ubuntu-5.4.0-44.48#n823
Specific to my use case, Nova tells libvirt to use a block device, which fails with EPERM. Nova then tries to remove the volume it just tried to attach, running `blockdev --flushbufs` in the process, which fails as well:
* try: virt_driver.attach_volume (Nova) -> virStorageFileReportBrokenChain (libvirt) -> Cannot access storage file '/dev/sde': Operation not permitted -> libvirt.libvirtError Cannot access storage file '/dev/sde': Operation not permitted
* except: "Driver failed to attach volume..." -> volume_api.attachment_delete -> ... -> flush_device_io -> blockdev --flushbufs /dev/sde -> blockdev: cannot open /dev/sde: Operation not permitted
https://opendev.org/openstack/nova/src/branch/stable/ussuri/nova/virt/block_device.py#L498-L510 (Nova code)
https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/util/virstoragefile.c?h=applied/ubuntu/focal#n4877 ("Cannot access storage" in libvirt)
https://paste.ubuntu.com/p/RTgq8XkzY6/ (logs)
If I add a wildcard rule that allows devices with a given major number and any minor number, the race condition is avoided:
sudo bash -c 'echo b 8:* rwm > /sys/fs/cgroup/devices/snap.microstack.libvirtd/devices.allow'
sudo bash -c 'echo b 8:* rwm > /sys/fs/cgroup/devices/snap.microstack.nova-compute/devices.allow'
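Why the wildcard helps can be shown with a simplified matcher for devices.list-style entries. This is a sketch assuming default-deny semantics and the `TYPE MAJOR:MINOR ACCESS` line format used above; `rule`, `parseRule` and `allows` are illustrative names, not kernel or snapd APIs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rule is a simplified devices.list entry such as "b 8:* rwm".
// A major or minor of -1 stands for the "*" wildcard.
type rule struct {
	typ      byte // 'a' (all), 'b' (block) or 'c' (char)
	maj, mnr int64
	access   string
}

// parseNum turns "*" into -1 and anything else into a decimal number.
func parseNum(s string) (int64, error) {
	if s == "*" {
		return -1, nil
	}
	return strconv.ParseInt(s, 10, 64)
}

// parseRule parses a "TYPE MAJOR:MINOR ACCESS" line.
func parseRule(line string) (rule, error) {
	f := strings.Fields(line)
	if len(f) != 3 || len(f[0]) != 1 {
		return rule{}, fmt.Errorf("malformed rule %q", line)
	}
	mm := strings.SplitN(f[1], ":", 2)
	if len(mm) != 2 {
		return rule{}, fmt.Errorf("malformed device number in %q", line)
	}
	maj, err := parseNum(mm[0])
	if err != nil {
		return rule{}, err
	}
	mnr, err := parseNum(mm[1])
	if err != nil {
		return rule{}, err
	}
	return rule{typ: f[0][0], maj: maj, mnr: mnr, access: f[2]}, nil
}

// allows reports whether r grants every access letter in want
// (e.g. "rw") on the device typ maj:mnr.
func (r rule) allows(typ byte, maj, mnr int64, want string) bool {
	if r.typ != 'a' && r.typ != typ {
		return false
	}
	if (r.maj != -1 && r.maj != maj) || (r.mnr != -1 && r.mnr != mnr) {
		return false
	}
	for _, c := range want {
		if !strings.ContainsRune(r.access, c) {
			return false
		}
	}
	return true
}

func main() {
	for _, line := range []string{"b 8:48 rwm", "b 8:* rwm"} {
		r, err := parseRule(line)
		if err != nil {
			panic(err)
		}
		// 8:64 stands for a freshly attached disk such as /dev/sde.
		fmt.Printf("%-11s allows b 8:64 rw: %v\n", line, r.allows('b', 8, 64, "rw"))
	}
}
```

With only a per-device entry such as `b 8:48 rwm`, a newly attached 8:64 is refused until the helper adds its own line, whereas `b 8:* rwm` already covers it at creation time.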
---------------------------------------------------------------------
Another simple use case where this applies is working with loop devices.
If I have this in an interface:
const connectedPlugAppArmor = `
/dev/loop-control rw,
/dev/loop[0-9]* rw,
`
var microStackConnectedPlugUDev = []string{
`SUBSYSTEM=="block", KERNEL=="loop[0-9]*"`,
`SUBSYSTEM=="misc", KERNEL=="loop-control"`,
}
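The `KERNEL==` globs above already match any numbered loop device that may be created later, which suggests the failure is about timing rather than rule coverage. A quick Go check using `path.Match`, which approximates udev's shell-style matching for these simple patterns; `matchesKernelGlob` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// matchesKernelGlob reports whether a kernel device name matches a
// udev-style KERNEL== glob (approximated with path.Match).
func matchesKernelGlob(glob, name string) bool {
	ok, err := path.Match(glob, name)
	return err == nil && ok
}

func main() {
	const loopGlob = "loop[0-9]*" // from the udev rule above
	for _, name := range []string{"loop0", "loop12", "loop-control"} {
		fmt.Printf("%-12s %v\n", name, matchesKernelGlob(loopGlob, name))
	}
}
```

/dev/loop-control is not matched by the block glob; it is covered by its own `SUBSYSTEM=="misc"` rule.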
And then try to use `losetup -f` when there are no free loop devices available:
fallocate -l $loop_file_size $loop_file
losetup -f $loop_file
I get "Operation not permitted" during the losetup invocation because the device cgroup entry is not added fast enough.
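When no loop device is free, `losetup -f` asks /dev/loop-control for one via the LOOP_CTL_GET_FREE ioctl, and the kernel allocates a brand-new /dev/loopN whose device number the snap's device cgroup has not seen yet. A minimal Go sketch of that call, assuming Linux and enough privileges to open /dev/loop-control; `getFreeLoop` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// loopCtlGetFree is LOOP_CTL_GET_FREE from <linux/loop.h>.
const loopCtlGetFree = 0x4C82

// getFreeLoop asks the loop-control device for the index of a free loop
// device; the kernel creates a new /dev/loopN if none is free.
func getFreeLoop(ctlPath string) (int, error) {
	f, err := os.OpenFile(ctlPath, os.O_RDWR, 0)
	if err != nil {
		return -1, err
	}
	defer f.Close()
	idx, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(), loopCtlGetFree, 0)
	if errno != 0 {
		return -1, errno
	}
	return int(idx), nil
}

func main() {
	idx, err := getFreeLoop("/dev/loop-control")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A newly created node is a block device (major 7) with a minor the
	// snap's devices cgroup has no entry for until the helper runs.
	fmt.Printf("/dev/loop%d\n", idx)
}
```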
This is a much simpler reproducer than the one with iSCSI. |
|
2020-09-09 17:38:28 |
Dmitrii Shcherbakov |
description |
---------------------------------------------------------------------
Update (09-09-2020):
Found one more use case: LV activation after a reboot:
* reboot -> LV Status NOT available;
* lvchange -a y <vgname-for-lvs> -> device-mapper: reload ioctl on (253:3) failed: Operation not permitted |
|