[OSSA-2023-003] Unauthorized volume access through deleted volume attachments (CVE-2023-2088)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Cinder | Fix Released | Undecided | Unassigned |
OpenStack Compute (nova) | Fix Released | Undecided | Unassigned |
Antelope | Fix Released | Undecided | Unassigned |
Wallaby | Fix Committed | Undecided | Unassigned |
Xena | Fix Committed | Undecided | Unassigned |
Yoga | Fix Released | Undecided | Unassigned |
Zed | Fix Released | Undecided | Unassigned |
OpenStack Security Advisory | Fix Released | High | Jeremy Stanley |
OpenStack Security Notes | Fix Released | High | Jeremy Stanley |
glance_store | Fix Released | Undecided | Unassigned |
kolla-ansible | In Progress | Undecided | Unassigned |
Zed | Fix Released | Undecided | Unassigned |
os-brick | In Progress | Undecided | Unassigned |
Bug Description
Hello OpenStack Security Team,
I'm writing to you because we have faced a serious security breach in OpenStack functionality.
In short: we observed that a newly created Cinder volume (1GB) was attached to an instance on a compute node, but the instance recognized it as a 115GB volume; that 115GB volume was in fact attached to another instance on the same compute node.
[1. Test environment]
Compute node: OpenStack Ussuri configured with Huawei Dorado as the storage backend.
Packages:
# dpkg -l | grep libvirt
ii libvirt-clients 6.0.0-0ubuntu8.16 amd64 Programs for the libvirt library
ii libvirt-daemon 6.0.0-0ubuntu8.16 amd64 Virtualization daemon
ii libvirt-
ii libvirt-
ii libvirt-
ii libvirt-
ii libvirt0:amd64 6.0.0-0ubuntu8.16 amd64 library for interfacing with different virtualization systems
ii nova-compute-
ii python3-libvirt 6.1.0-1 amd64 libvirt Python 3 bindings
# dpkg -l | grep qemu
ii ipxe-qemu 1.0.0+git-
ii ipxe-qemu-
ii libvirt-
ii qemu 1:4.2-3ubuntu6.23 amd64 fast processor emulator, dummy package
ii qemu-block-
ii qemu-kvm 1:4.2-3ubuntu6.23 amd64 QEMU Full virtualization on x86 hardware
ii qemu-system-common 1:4.2-3ubuntu6.23 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-data 1:4.2-3ubuntu6.23 all QEMU full system emulation (data files)
ii qemu-system-
ii qemu-system-x86 1:4.2-3ubuntu6.23 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 1:4.2-3ubuntu6.23 amd64 QEMU utilities
# dpkg -l | grep nova
ii nova-common 2:21.2.4-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii nova-compute-
ii python3-nova 2:21.2.4-0ubuntu1 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:17.0.0-0ubuntu1 all client library for OpenStack Compute API - 3.x
# dpkg -l | grep multipath
ii multipath-tools 0.8.3-1ubuntu2 amd64 maintain multipath block device access
# dpkg -l | grep iscsi
ii libiscsi7:amd64 1.18.0-2 amd64 iSCSI client shared library
ii open-iscsi 2.0.874-
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_
DISTRIB_
DISTRIB_
Instance OS: Debian-11-amd64
[2. Test scenario]
An instance was already created with two volumes attached: the first is 10GB for the root filesystem, the second is 115GB used as vdb. They are recognized by the compute node as vda -> dm-11 and vdb -> dm-9:
# virsh domblklist 90fas439-
Target Source
-------
vda /dev/dm-11
vdb /dev/dm-9
# multipath -ll
(...)
36e00084100ee7e
size=115G features='0' hwhandler='0' wp=rw
`-+- policy=
|- 14:0:0:4 sdm 8:192 active ready running
|- 15:0:0:4 sdo 8:224 active ready running
|- 16:0:0:4 sdl 8:176 active ready running
`- 17:0:0:4 sdn 8:208 active ready running
(...)
36e00084100ee7e
size=10G features='0' hwhandler='0' wp=rw
`-+- policy=
|- 14:0:0:3 sdq 65:0 active ready running
|- 15:0:0:3 sdr 65:16 active ready running
|- 16:0:0:3 sdp 8:240 active ready running
`- 17:0:0:3 sds 65:32 active ready running
We then created a new instance with the same guest OS and a 10GB root volume. After successful deployment, we created a new 1GB volume and attached it to the newly created instance. After that we can see:
# multipath -ll
(...)
36e00084100ee7e
size=115G features='0' hwhandler='0' wp=rw
`-+- policy=
|- 14:0:0:10 sdao 66:128 failed faulty running
|- 14:0:0:4 sdm 8:192 active ready running
|- 15:0:0:10 sdap 66:144 failed faulty running
|- 15:0:0:4 sdo 8:224 active ready running
|- 16:0:0:10 sdan 66:112 failed faulty running
|- 16:0:0:4 sdl 8:176 active ready running
|- 17:0:0:10 sdaq 66:160 failed faulty running
`- 17:0:0:4 sdn 8:208 active ready running
Inside the instance we could then see a new drive - not 1GB, but 115GB - so it seems it was attached incorrectly, and this way we were able to destroy some data on that volume.
Additionally, we saw many errors like the following in the compute node logs:
# dmesg -T | grep dm-9
[Fri Jan 27 13:37:42 2023] blk_update_request: critical target error, dev dm-9, sector 62918760 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[Fri Jan 27 13:37:42 2023] blk_update_request: critical target error, dev dm-9, sector 33625152 op 0x1:(WRITE) flags 0x8800 phys_seg 6 prio class 0
[Fri Jan 27 13:37:46 2023] blk_update_request: critical target error, dev dm-9, sector 66663000 op 0x1:(WRITE) flags 0x8800 phys_seg 5 prio class 0
[Fri Jan 27 13:37:46 2023] blk_update_request: critical target error, dev dm-9, sector 66598120 op 0x1:(WRITE) flags 0x8800 phys_seg 5 prio class 0
[Fri Jan 27 13:37:51 2023] blk_update_request: critical target error, dev dm-9, sector 66638680 op 0x1:(WRITE) flags 0x8800 phys_seg 12 prio class 0
[Fri Jan 27 13:37:56 2023] blk_update_request: critical target error, dev dm-9, sector 66614344 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Fri Jan 27 13:37:56 2023] blk_update_request: critical target error, dev dm-9, sector 66469296 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 0
[Fri Jan 27 13:37:56 2023] blk_update_request: critical target error, dev dm-9, sector 66586472 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
(...)
Unfortunately we do not know the exact test scenario to reproduce this, as we hit the issue in fewer than 2% of our tries, but it looks like a serious security breach.
Additionally, we observed that the Linux kernel does not fully clear the device allocation after a volume detach, so some drive names remain visible in output such as the lsblk command. Then, after a new volume attachment, those names (e.g. sdao, sdap, sdan and so on) are reused by the new drive and wrongly mapped by multipath/iscsi to another drive, and this is how we hit the issue.
Our question is: why is the Linux kernel on the compute node not removing the device allocation, leading to a scenario like this? Maybe that could be a solution here.
Thanks in advance for your help and understanding. If more details are needed, do not hesitate to contact me.
CVE References
Jeremy Stanley (fungi) wrote : | #1 |
description: | updated |
Changed in ossa: | |
status: | New → Incomplete |
Dan Smith (danms) wrote : | #2 |
I feel like this is almost certainly something that will require involvement from the cinder people. Nova's part in the volume attachment is pretty minimal, in that we get stuff from cinder, pass it to brick, and then configure the guest with the block device we're told (AFAIK). Unless we're messing up the last step, I think it's likely this is not just a Nova thing. Should we add cinder or brick as an affected project or just add some cinder people to the bug here?
Sylvain Bauza (sylvain-bauza) wrote : | #3 |
> Should we add cinder or brick as an affected project or just add some cinder people to the bug here?
I'd be in favor of adding the cinder project which would pull the cinder coresec team, right?
Sylvain Bauza (sylvain-bauza) wrote : | #4 |
In the meantime, could you please provide us with the block device mapping information that's stored in the DB and, ideally, the cinder-side attachment information?
Putting the bug report to Incomplete, please mark its status back to New when you reply.
Changed in nova: | |
status: | New → Incomplete |
Jan Wasilewski (janwasilewski) wrote : | #5 |
Hi,
below you can find the requested information from the OpenStack DB. There is no issue right now, but maybe historical tracking could lead to some hint? Anyway, the issue was related to the /dev/vdb drive for instance: 128f1398-
mysql> select * from block_device_
+------
| created_at | updated_at | deleted_at | id | device_name | delete_
Changed in nova: | |
status: | Incomplete → New |
Jeremy Stanley (fungi) wrote : | #6 |
I've added Cinder as an affected project (though maybe it should be os-brick?) and subscribed the Cinder security reviewers for additional input.
Rajat Dhasmana (whoami-rajat) wrote : | #7 |
Hi,
Based on the given information, the strange part is that the same multipath device is used for the old and the new volume 36e00084100ee7e
36e00084100ee7e
size=115G features='0' hwhandler='0' wp=rw
`-+- policy=
|- 14:0:0:4 sdm 8:192 active ready running
|- 15:0:0:4 sdo 8:224 active ready running
|- 16:0:0:4 sdl 8:176 active ready running
`- 17:0:0:4 sdn 8:208 active ready running
36e00084100ee7e
size=115G features='0' hwhandler='0' wp=rw
`-+- policy=
|- 14:0:0:10 sdao 66:128 failed faulty running
|- 14:0:0:4 sdm 8:192 active ready running
|- 15:0:0:10 sdap 66:144 failed faulty running
|- 15:0:0:4 sdo 8:224 active ready running
|- 16:0:0:10 sdan 66:112 failed faulty running
|- 16:0:0:4 sdl 8:176 active ready running
|- 17:0:0:10 sdaq 66:160 failed faulty running
`- 17:0:0:4 sdn 8:208 active ready running
It's also interesting to note that the paths under the first multipath device (sdm, sdo, sdl, sdn), with LUN ID 4, are also used by the second multipath device, whereas it should be using the LUN 10 paths (which are currently in failed faulty status).
This looks multipath related, but it would be helpful if we could get the os-brick logs for this 1GB volume attachment to understand whether os-brick is doing something that results in this.
I would also recommend cleaning up any leftover devices from past failed detachments (i.e. flush and remove mpath devices not belonging to any instance) that might be interfering with this. Although I'm not certain that's the case, it's still good to clean up those devices; a rough example follows.
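A rough illustration of such a manual cleanup (the WWID and device names below are placeholders; check with multipath -ll and lsblk which devices are actually orphaned before removing anything). multipath -f flushes and removes the unused multipath map, and the sysfs delete removes each leftover SCSI path device from the kernel:
  # multipath -f <wwid-of-orphaned-mpath>
  # blockdev --flushbufs /dev/sdX
  # echo 1 > /sys/block/sdX/device/delete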
Gorka Eguileor (gorka) wrote : | #8 |
Hi,
I think I know what happened, but there are some things that don't match unless
somebody has manually changed some things in the host (like cleaning up
multipaths).
Bit of context:
- SCSI volumes (iSCSI and FC) on Linux are NEVER removed automatically by the
kernel and must always be removed explicitly. This means that they will
remain in the system even if the remote connection is severed, unless
something in OpenStack removes it.
- The os-brick library has a strong policy of not removing devices from the
system if flushing fails during detach, to prevent data loss.
The `disconnect_volume` method in the os-brick library has an additional
parameter called `force` to allow callers to ignore flushing errors and
ensure that the devices are removed. This is useful when, after a failed
detach, the volume is either going to be deleted or put into an error status
(a rough sketch of such a call follows).
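A minimal caller-side sketch of such a forced disconnect (the connection properties below are illustrative placeholders; in OpenStack they come from Cinder's attachment record and the earlier connect_volume() call):

  from os_brick.initiator import connector

  # Illustrative placeholders only.
  connection_properties = {'target_portal': '192.0.2.10:3260',
                           'target_iqn': 'iqn.2004-04.example:target',
                           'target_lun': 10}
  device_info = {'path': '/dev/dm-9', 'type': 'block'}

  conn = connector.InitiatorConnector.factory(
      'iscsi', root_helper='sudo', use_multipath=True)

  # force=True removes the devices even if flushing fails; ignore_errors
  # additionally swallows removal errors instead of raising them.
  conn.disconnect_volume(connection_properties, device_info,
                         force=True, ignore_errors=True)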
I don't have the logs, but from what you said my guess is that this is what has
happened:
- A volume with SCSI ID 36e00084100ee7e was presented to the
  host on LUN 10 at some point since the last reboot (sdao, sdap, sdan, sdaq).
- When detaching the volume from the host using os-brick the operation failed
and it wasn't removed, yet Nova still called Cinder to unexport and unmap the
volume. At this point LUN 10 is free on the Huawei array and the volume is
no longer attachable, but /dev/sda[o-q] are still present, and their SCSI_ID
are still known to multipathd.
- Nova asked Cinder to attach the volume again, and the volume is mapped to LUN
4 (which must have been available as well) and it successfully attaches (sdm,
sdo, sdl, sdn), appears as a multipath, and is used by the VM.
- Nova asks Cinder to export and map the new 1GB volume, and Huawei maps it to
LUN 10, at this point iSCSI detects that the remote LUNs are back and
reconnects to them, which makes the multipathd path checker detect sdao,
sdap, sdan, sdaq are alive on the compute host and they are added to the
existing multipath device mapper using their known SCSI ID.
You should find out why the detach actually failed, but I think I see multiple
issues:
- Nova:
- Should not call Cinder to unmap a volume if the os-brick call to disconnect the
volume has failed, as we know this will leave leftover devices that can
cause issues like this.
- If it's not already doing it, Nova should call the disconnect_volume method
from os-brick passing force=True when the volume is going to be deleted.
- os-brick:
- Should try to detect when the newly added devices are being added to a
multipath device mapper that has live paths to other LUNs and fail if that
is the case.
- As an improvement over the previous check, os-brick could forcefully remove
those devices that are in the wrong device mapper, force a refresh of their
SCSI IDs and add them back to multipathd to form a new device mapper.
Though personally I think this is a non-trivial and potentially problematic
feature.
In other words, the source of the problem is probably Nova, but os-brick should
try to prevent these possible data leaks.
Cheers,
Gorka.
[1]: https:/
Dan Smith (danms) wrote : | #9 |
I don't see in the test scenario description that any instances had to be deleted or volumes disconnected for this to happen. Maybe the reporter can confirm with logs if this is the case?
I'm still chasing down the nova calls, but we don't ignore anything in the actual disconnect other than "volume not found". I need to follow that up to where we call cinder to see if we're ignoring a failure.
When you say "nova should call disconnect_volume with force=true if the volume is going to be deleted"... I'm not sure what you mean by this. Do you mean if we're disconnecting because of *instance* delete and are sure that we don't want to let a failure hold us up? I would think this would be dangerous because just deleting an instance doesn't mean you don't care about the data in the volume.
It seems to me that if brick *has* the information available to it to avoid connecting a volume to the wrong location, that it's the thing that needs to guard against this. Nova has no knowledge of the things underneath brick, so we don't know that wires are going to get crossed. Obviously if we can do stuff to avoid even getting there, then we should.
Jan Wasilewski (janwasilewski) wrote : | #10 |
Hi,
I'm just wondering whether I should try to reproduce the issue again with all debug flags turned on. Should I enable debug on the controllers (cinder, nova), or would compute node logs with debug enabled be enough to further troubleshoot this issue? If so, please let me know which flags are needed, just to speed up further troubleshooting. As I said, this case is not easy to reproduce - I can't even say what the trigger is - but we have faced it 3 or 4 times already.
Thanks in advance for your reply and your help so far.
Best regards,
Jan
Gorka Eguileor (gorka) wrote : | #11 |
Apologies if I wasn't clear enough.
The disconnect call that I say is probably being ignored/swallowed is the one to os-brick, not Cinder. In other words, Nova first calls os-brick to disconnect the volume from the compute host and then always considers this successful (at least in some scenarios, probably instance destruction). Since in those scenarios it always considers the local disconnect successful, it calls Cinder to unmap/unexport the volume.
The force=True parameter to os-brick's disconnect_volume should only be added when the BDM for the volume has the delete on disconnect flag thingy.
os-brick has the information; the problem is that multipathd is the one adding the leftover devices that have been reused to the multipath device mapper.
Gorka Eguileor (gorka) wrote : | #12 |
A solution/workaround would be to change /etc/multipath.conf and set "recheck_wwid" to yes.
I haven't actually tested it myself, but the documentation explicitly calls out that it's used to solve this specific issue: "If set to yes, when a failed path is restored, the multipathd daemon rechecks the path WWID. If there is a change in the WWID, the path is removed from the current multipath device, and added again as a new path. The multipathd daemon also checks the path WWID again if it is manually re-added."
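For reference, the corresponding /etc/multipath.conf change would roughly be the following (only the defaults section shown; it requires a multipath-tools version that knows the option, as discussed further down), followed by a multipathd reconfigure:
  defaults {
      recheck_wwid yes
  }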
I believe this is probably something that is best fixed at the deployment tool level. For example, extending the multipathing THT template code [1] to support "recheck_wwid" and defaulting it to yes instead of no, as multipath.conf does.
[1]: https:/
Dan Smith (danms) wrote : | #13 |
Okay, thanks for the clarification.
Yeah, recheck_wwid seems like it should *always* be on to prevent potentially reconnecting to the wrong thing!
Jeremy Stanley (fungi) wrote : | #14 |
If that configuration ends up being the recommended solution, we might want to consider drafting a brief security note with guidance for deployers and maintainers of deployment tooling.
Unless I misunderstand the conditions necessary, it sounds like it would be challenging for a malicious user to force this problem to occur. Is that the current thinking? If so, we could probably safely work on the actual text of the note in public.
melanie witt (melwitt) wrote : | #16 |
> The disconnect call I say it's probably being ignored/swallowed is the one to os-brick, not Cinder. In other words, Nova first calls os-brick to disconnect the volume from the compute host and then always considers this as successful (at least in some scenarios, probably instance destruction). Since it always considers in those scenarios that local disconnect was successful it calls Cinder to unmap/unexport the volume.
I just checked and indeed Nova will ignore a volume disconnect error in the case of an instance being deleted [1]:
try:
except Exception as exc:
with excutils.
if cleanup_
# Don't block on Volume errors if we're trying to
# delete the instance as we may be partially created
# or deleted
In all other scenarios, Nova will not proceed further if the disconnect was not successful.
If Nova does proceed past _disconnect_
[1] https:/
[2] https:/
Jan Wasilewski (janwasilewski) wrote : | #17 |
I believe it can be a bit challenging for Ubuntu users to introduce the recheck_wwid parameter. From what I have checked, this parameter is available in multipath-tools, but the package version that provides it ships with Ubuntu 22.04 LTS. Older Ubuntu releases do not have this option and give an error:
/etc/multipath.conf line XX, invalid keyword: recheck_wwid
I made this assumption based on the release documentation:
- for ubuntu 20.04: https:/
- for ubuntu 22.04: https:/
So it seems that Yoga partially, and Zed fully, can take such a parameter directly, but older releases would have to manage such a change differently.
I know that OpenStack code is independent of Linux distros, but I just wanted to add this info here as worth considering.
Gorka Eguileor (gorka) wrote : | #18 |
I don't know if my assumption is correct or not, because I can't reproduce the multipath device mapper situation from the report (some failed some active) no matter how much I force things to break in different ways.
Since each iSCSI storage backend behaves differently, I don't know if I can't reproduce it because of the difference in behavior or because the way I'm trying to reproduce it is different. It may even be that multipathd is different on my system.
Unfortunately I don't know if the host where that happened had leftover devices before the leak happened, or what the SCSI IDs of the 2 volumes involved really are.
From os-brick's connect_volume perspective what it did is the right thing, because when it looked at the multipath device containing the newly connected devices it was dm-9, so that's the one that it should return.
How multipath ended up with 2 different volumes in the same device mapper, I don't know.
I don't think "recheck_wwid" would solve the issue because os-brick would be too fast in finding the multipath and it wouldn't give enough time for multipathd to activate the paths and form a new device mapper.
In any case I strongly believe that nova should never proceed to delete the cinder attachment if detaching with os-brick fails because that usually implies data loss.
The exception would be when the cinder volume is going to be deleted after disconnecting it, and in that case the disconnect call to os-brick should always be forced, since data loss is irrelevant.
That would ensure that compute nodes are not left with leftover devices that could cause problems.
I'll see if I can find a reasonable improvement in os-brick that would detect these issues and fail the connection, although it's probably going to be a bit of a mess.
Jan Wasilewski (janwasilewski) wrote : | #19 |
@Gorka Eguileor: I can try to reproduce this case with the recheck_wwid option set to true once a suitable multipath-tools package is available for Ubuntu 20.04.
What I can add is that it happened only on one compute node, but I've seen similar warnings in dmesg -T output on other compute nodes, which looks dangerous, though so far I haven't faced a similar issue there:
[Thu Feb 9 14:28:16 2023] scsi_io_completion: 42 callbacks suppressed
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 Sense Key : Illegal Request [current]
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 Add. Sense: Logical unit not supported
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 CDB: Read(10) 28 00 03 bf ff 00 00 00 08 00
[Thu Feb 9 14:28:16 2023] print_req_error: 42 callbacks suppressed
[Thu Feb 9 14:28:16 2023] print_req_error: I/O error, dev sdgr, sector 62914304
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 Sense Key : Illegal Request [current]
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 Add. Sense: Logical unit not supported
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#2 CDB: Read(10) 28 00 03 bf ff 00 00 00 01 00
[Thu Feb 9 14:28:16 2023] print_req_error: I/O error, dev sdgr, sector 62914304
[Thu Feb 9 14:28:16 2023] buffer_io_error: 30 callbacks suppressed
[Thu Feb 9 14:28:16 2023] Buffer I/O error on dev sdgr1, logical block 62686976, async page read
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#3 Sense Key : Illegal Request [current]
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#3 Add. Sense: Logical unit not supported
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#3 CDB: Read(10) 28 00 03 bf ff 01 00 00 01 00
[Thu Feb 9 14:28:16 2023] print_req_error: I/O error, dev sdgr, sector 62914305
[Thu Feb 9 14:28:16 2023] Buffer I/O error on dev sdgr1, logical block 62686977, async page read
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#4 Sense Key : Illegal Request [current]
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#4 Add. Sense: Logical unit not supported
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#4 CDB: Read(10) 28 00 03 bf ff 02 00 00 01 00
[Thu Feb 9 14:28:16 2023] print_req_error: I/O error, dev sdgr, sector 62914306
[Thu Feb 9 14:28:16 2023] Buffer I/O error on dev sdgr1, logical block 62686978, async page read
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#5 Sense Key : Illegal Request [current]
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#5 Add. Sense: Logical unit not supported
[Thu Feb 9 14:28:16 2023] sd 15:0:0:98: [sdgr] tag#5 CDB: Read(10) 28 00 03 bf ff 03 00 00 01 00
[Thu Feb 9 14:28:16 2023] print_req...
Gorka Eguileor (gorka) wrote : | #20 |
Don't bother trying with recheck_wwid, as it won't work due to the speed of os-brick.
Gorka Eguileor (gorka) wrote : | #21 |
I have finally been able to reproduce the issue.
So far I have been able to identify 3 different ways to create similar situations to the reported one, and it was what I thought, leftover devices from a 'nova delete' call.
Took me longer to figure it out because it requires an iSCSI Cinder driver that uses shared targets, and the one I use doesn't.
After I locally modified the cinder driver code to do target sharing and then force a disconnect error on specific Nova calls to os-brick I was able to work it out.
I have a local patch that detects these issues and fixes them the best it can, but I wouldn't like to backport that, because the fixing is a bit scary as a backport.
So I'll split the code into 2 patches:
- The backportable patch, which detects a potential leak and prevents the connection in that case. Fixing the situation will then require manual intervention.
- Another patch that extends the previous code to try to fix things when possible.
melanie witt (melwitt) wrote : | #22 |
> In any case I strongly believe that nova should never proceed to delete the cinder attachment if detaching with os-brick fails because that usually implies data loss.
> The exception would be when the cinder volume is going to be delete after disconnecting it, and in that case the disconnect call to os-brick should be always forced, since data loss is irrelevant.
> That would ensure that compute nodes are not left with leftover devices that could cause problems.
Understood. I guess that must mean that the reported bug scenario is a volume that is *not* delete_
I think we could probably propose a patch in nova to not delete the attachment if it's instance delete + not delete_
Gorka Eguileor (gorka) wrote : | #23 |
Hi Melanie,
In my opinion there should be 2 code changes to prevent leaving devices behind:
- The instance deletion operation should fail, like the normal volume-detach call, when the disconnect_volume call fails; even if the instance is left in a "weird" state, manual intervention is usually necessary to fix things.
This manual intervention does not necessarily mean doing something to the volume, it can be fixing the network.
- Any Cinder volume with delete_
The tricky part here is that not all os-brick connectors support the force parameter, so when the call fails we have to decide whether to halt the operation and wait for human intervention, or just log it and continue as we are doing today.
We could make an effort in os-brick to increase coverage of the force parameter.
Thanks,
Gorka.
Dan Smith (danms) wrote : | #24 |
Our policy is that instance delete should never fail, and I think that's the experience the users expect. Perhaps we need to still mark the instance deleted immediately and continue retrying the volume detach in a periodic until it succeeds, but that's the only thing I can see working.
Sylvain Bauza (sylvain-bauza) wrote : | #25 |
Agree with Dan, we shouldn't raise an exception on instance delete but rather possibly make some status available for knowing whether the volume was eventually detached.
For example, we accept to delete an instance if the compute goes down (as the user may not know that the underlying compute is in a bad state) and we only delete the instance when the compute is back.
That being said, I don't really see how we can easily fix this in a patch, as we should discuss this properly. Would a LOG statement warning that the volume connection is still present help?
melanie witt (melwitt) wrote : | #27 |
We definitely should not allow a delete to fail from a user's perspective.
My suggestion of a patch to not delete an attachment when detach fails during instance delete if delete_
We could consider doing a periodic like Dan mentions. We already do something similar with our "cleanup running deleted instances" periodic. The volume attachment cleanup could be hooked into that if it doesn't already do it.
From what I can tell, our periodic is already capable of taking care of it, but it's not enabled [1][2]:
elif action == 'reap':
bdms = objects.
try:
def _cleanup_
for bdm in bdms:
if detach and bdm.volume_id:
if bdm.volume_id and bdm.delete_
if original_exception is not None and raise_exc:
raise original_exception
Currently we're calling _cleanup_volumes with detach=False. Not sure what the reason for that is but if we determine there should be no problems with it, we can change it to detach=True in combination with not deleting the attachment on instance delete if delete_
[1] https:/
[2] https:/
Gorka Eguileor (gorka) wrote : | #28 |
What is the reason why Nova has the policy that deleting the instance should never fail?
I'm talking about the instance record, not the VM itself, because I agree that the VM should always be deleted to free resources.
From my perspective deleting the instance record would result in a very weird user experience and in users manually creating the same situation we are trying to avoid.
- User requests instance deletion
- Calls to disconnect_volume fails
- Nova removes everything it can and at the end even the instance record, while it keeps trying to disconnect the device in the background.
- User wants to use the volume again but sees that it's in-use in Cinder
- Looks for the instance in Nova thinking that something may have gone wrong, but not seeing it there thinks it's a problem between cinder and nova.
- Runs the `cinder delete-attachment` command to return the volume to available state.
We end up in the same situation as we were before, with leftover devices.
Dan Smith (danms) wrote : | #29 |
Because the user wants to delete a thing in our supposed "elastic infrastructure". They want their quota back, they want to stop being billed for it, they want the IP for use somewhere else, or whatever. They don't care that we can't delete it because of some backend failure - that's not their problem. That's why we have the ability to queue the delete even if the compute is down - that's how important it is.
It's also not at all about deleting the VM, it's about the instance going away from the perspective of the user (i.e. marking the instance record as deleted). The instance record is what determines if they're billed for it, if their quota is used, etc. We "charge" the user the same whether the VM is running or not. Further, even if we have stopped the VM, we cannot re-assign the resources committed to that VM until the deletion completes in the backend. Another scenario that infuriates operators is "I've deleted a thing, the compute node should be clear, but the scheduler tells me I can't boot something else there."
Your example workflow is exactly why I feel like the solution to this problem can't (entirely) be one of preventing a delete if we fail to detach. Because the admins will just force-delete/
It seems to me that there *must* be some way to ensure that we never attach a volume to the wrong place. Regardless of how we get there, there must be some positive affirmation that we're handing precious volume data to the right person.
Gorka Eguileor (gorka) wrote : | #30 |
The quota/billing issue is a matter of Nova code. In cinder we resolve it by having a flag for resources (volume and snapshots) to reflect whether they consume quota or not.
The same thing could be done in Nova to reflect what resources are actually consumed by the instance (IPs, VMs, GPUs, etc) and therefore billable.
Users not caring about backend errors would be, in my opinion, naive thinking on their part, since they DO CARE about their persistent data being properly written and they want to avoid data loss, data corruption, and data leakage above all else.
I assume users would also want to have a consistent view of their resources, so if a volume says it's attached to an instance the instance should still exist, otherwise there is an invalid reference.
Data leak/corruption may be prevented in some cases with the code I'm working on for os-brick (although some drivers are missing the feature required), but that won't prevent data loss. For that Nova would need to do the sensible thing.
I'm going to do some additional testing today, because this report is about something that happens accidentally, but I believe there is a way to actually exploit this to gain access to other users' data. Though fixing that would require yet another bunch of code.
In other words, there are 3 different things to fix here:
- Nova doing the right thing to prevent data corruption/
- os-brick detection of the right volume to prevent data leak.
- Prevent intentional data leak.
Jeremy Stanley (fungi) wrote : | #31 |
If there is indeed a way for a normal user (not an operator) of the environment to cause this information leak to happen and then take advantage of it, we should find a way to prevent at least that aspect before making this report public.
If it's not a condition that a normal user can intentionally cause to happen, then it's probably fine to fix this in public instead.
Sylvain Bauza (sylvain-bauza) wrote : | #32 |
Gorka, Nova doesn't really even know about the Cinder backends; it just uses os-brick.
So, when Nova asks to attach a volume, only os-brick knows whether it's the right volume. That's why I think it's important for brick to be able to say 'no'.
Dan Smith (danms) wrote : | #33 |
Right, we have to trust os-brick to give us a block device that is actually the thing we're supposed to attach to the guest.
I'm really concerned about what sounds like a very loose association between what we pass to brick from cinder and what we get back from brick in terms of a block device. Isn't there some way for brick to walk the multipath device and the backing iSCSI/FC devices to check WWNs or something to ensure that it's consistent and points to what we expect?
Sylvain Bauza (sylvain-bauza) wrote : | #34 |
> If there is indeed a way for a normal user (not an operator) of the environment to cause this information leak to happen and then take advantage of it, we should find a way to prevent at least that aspect before making this report public.
Well, I'm trying hard to find a possible attack vector from a malicious user and I don't see any.
I don't disagree with the bug report, as it can potentially leak data to any instance, but I don't know how someone could benefit from this information.
Here, I'm just one voice and I leave others to chime in, but I'm in favor of making this report public so we can discuss the potential solutions with the stakeholders and any operator having concerns about it.
Gorka Eguileor (gorka) wrote : | #35 |
Let me summarize things:
1. The source of the problem reported in this bug is that Nova has been doing something wrong since forever. I've been bringing this up for the past 7 years, and every single time we end up in the same place, nova giving priority to instance deletion over everything else.
2. There are some things that os-brick can do to try to detect when Nova doesn't do its job right, but this is equivalent to a taxi driver asking passengers to learn to fall because the car is not going to stop when they want to get off. It's a lot harder to do and it doesn't sound all that reasonable.
3. There is an attack vector that can be exploited and it's pretty easy to do (I've done it locally), but it's separate from the issue reported here and it hasn't existed for as long as that one. I would resolve this in a different way than the workaround mentioned in #2.
Seeing as we are back to the same conversation of the past 7 years, we'll probably end up in the same place, so I'll just do my best to resolve the attack vector and also introduce code to resolve Nova's mistakes.
Gorka Eguileor (gorka) wrote : | #36 |
Oh, I failed to clarify something. The user exploit case can be made secure (as far as I can tell), but for the scenario in this bug's description, the only secure solution is fixing Nova; the os-brick code I'm working on will only reduce the window where the data is leaked or can be corrupted.
Sylvain Bauza (sylvain-bauza) wrote : | #37 |
Gorka, I don't want to debate on projects's responsibility, but I'd rather focus on the data leakage, which is the subject of this security report.
The fact that a volume detach can leave residue if a flush error occurs is certainly not ideal, but this isn't a security problem *UNTIL* the remaining devices are reused.
To me, it appears that the data leak occurs on the attach and not on the detach, and I'd prefer to see os-brick avoiding this situation.
That being said, I think Melanie, Dan and I agreed on trying to find a way to asynchronously clean up the devices (see comments #24 #25 and #27) and that can be discussed publicly, but again, this won't help with the data leakage that occurs on the attach command.
Dan Smith (danms) wrote : | #38 |
Okay Gorka and I just had a nice long chat about things and I think we made some progress on understanding the (several) ways we can get into this situation and came up with some action items. I'll try to summarize here and I'll look for Gorka to correct me if I get anything wrong.
I think that we're now on the same page that delete of a running instance is much more of a forceful act than some might think, and that we expect to try to be graceful with that, but with a limited amount of patience before we kill it with fire. That maps to us actually always calling force=True when we do the detachment. Even with force=True, brick *tries* to flush and disconnect gracefully, but if it can't, will cut things off at the knees. Thus, if we did force=True now, we wouldn't get into the situation the bug describes because we would *definitely* have cleaned up at that point.
It sounds like there are some robustification steps that can be made in brick to do more validation of the full chain from instance-
Gorka also described another way to get into this situation, which is much more exploitable by the user, and I'll let him describe it in more detail. But the short story is that cinder should not let users delete attachments for instances that nova says are running (i.e. not deleted).
Multipathd, while well-intentioned, also has some behavior that is counterproductive when recovering from various situations where paths to a device get disconnected. Enabling the recheck_wwid thing in multipathd should be a recommended flag to have enabled to reduce the likelihood of that happening. Especially in the case where nova has allowed a blind delete due to a downed compute node, we need multipathd to not "help" by reattaching things without extra checks.
So, the action items roughly are:
1. Nova should start passing force=True in our call to brick detach for instance delete
2. Recommend the recheck_wwid flag for multipathd, and get deployment tools to enable it
3. Robustification of brick's attach workflow to do some extra sanity checks
4. Cinder should refuse to allow users to delete an attachment for an active volume
Based on the cinder user-exploitable attack vector, it sounds to me like we should keep this bug private on that basis until we have at least the cinder/nova validation step in place. We could create another one for just that scenario, but publicizing the accidental scenario and discussion we have in this bug now might be enough of a suggestion that more people would figure out the user-oriented attack.
Gorka Eguileor (gorka) wrote : | #39 |
Sylvain, the data leak/corruption presented in this bug report is caused by the detach on the nova side.
It may happen when we do the attach, but it is 100% caused by the detach problem, so just focusing on the attach part is not right considering the RCA is the leftover devices from the detach.
Sylvain Bauza (sylvain-bauza) wrote : | #40 |
Gorka, I eventually understood all the problems we have, and what Dan wrote in comment #38 looks good to me as action items.
Yeah, we need to keep this bug private for a bit until we figure out a solid plan for fixing those 4 items and yeah, we need to both force-delete the attachment while we also try to solidify the attachment calls.
melanie witt (melwitt) wrote : | #41 |
I'm attaching a potential patch for nova to use force=True when calling os-brick disconnect_volume() when an instance is being deleted.
From what I found, only the libvirt and hyperv drivers call os-brick disconnect_volume(), and it's part of the driver.destroy() path.
This change ended up being larger than expected ... I aimed to add basic test coverage for passing the force kwarg through and there are a lot of volume drivers.
If anyone wants something changed or otherwise finds issues in the patch, please let me know.
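For context, a rough sketch of the shape of such a change (this is not the attached patch; the method placement and names are assumptions, and self.connector stands for the os-brick connector the driver creates in its __init__):

  # Rough sketch only: the volume driver's disconnect_volume() grows a force
  # kwarg and passes it through to os-brick, and the instance-delete cleanup
  # path sets force=True.
  def disconnect_volume(self, connection_info, instance, force=False):
      # force=True lets os-brick remove the block devices even when the
      # flush fails, so instance delete never leaves leftover paths.
      self.connector.disconnect_volume(connection_info['data'], None,
                                       force=force)

  # ...and from the destroy/cleanup path:
  volume_driver.disconnect_volume(connection_info, instance, force=True)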
Gorka Eguileor (gorka) wrote : | #42 |
Hi Melanie,
I have tried the patch and it works as expected, resolving the most common case of having leftover devices on compute nodes. Thanks!!
Dan mentioned that the delete of an instance is more in line with a power removal of a computer than a shutdown, and that's why using `force=True` makes sense because it will try to do it cleanly if possible but data loss is possible.
I looked at the API docs [1] for the delete operation and I don't see this idea stated there. Should we update the docs to explicitly state that deleting an instance can result in data loss?
Cheers.
melanie witt (melwitt) wrote : | #43 |
Hi Gorka,
Thank you for trying out the patch!
I agree more detailed docs could be helpful and have proposed a doc update for review:
Gorka Eguileor (gorka) wrote : | #44 |
This is the patch I've prepared for Cinder to prevent users from exploiting the data leak issue or even to unintentionally leave leftover devices by deleting the cinder attachment record.
With the nova patch and this one we cover most of the scenarios, but not all, since I've been told that there are scenarios where an instance is deleted without contact with the actual
I have to cleanup the os-brick code, write the unit tests, and see how the "recheck_wwid" multipath config option interacts with it.
I also have to try and see if the issue also happens in FC, in which case I would need to modify the os-brick patch and also write a new one to add support for the "force" parameter in the "disconnect_volume" method.
Since there are some calls to Nova I would appreciate reviews from the Nova team to confirm that I didn't miss anything.
Gorka Eguileor (gorka) wrote : | #45 |
I can't reproduce the issue using FC with an HPE 3PAR array, debugging it I found that the compute node receives a signal after the LUN has been remapped (this didn't happen in my iSCSI tests):
Feb 17 13:05:20 localhost.
Feb 17 13:05:20 localhost.
This is detected as a "change" in the block device:
Feb 17 13:05:20 localhost.
Which triggers the code that uses an SCSI command to get the volume's WWID and then updates sysfs to reflect it.
Feb 17 13:05:20 localhost.
After that rule another one for multipath is triggered to tell multipathd that it needs to check a device:
Feb 17 13:05:20 localhost.
Multipathd detects that the WWID has changed (because sysfs has been updated):
Feb 17 13:05:20 localhost.
And then reconfigures the old multipath device mapper to remove this device:
Feb 17 13:05:20 localhost.
Feb 17 13:05:20 localhost.
Feb 17 13:05:20 localhost.
And finally the new device mapper is formed:
Feb 17 13:05:21 localhost.
I don't know if this is standard FCP behavior or if this is storage array specific and other storage arrays may not behave like this. I'm trying to get access to a different FC array to confirm.
Sean McGinnis (sean-mcginnis) wrote : | #46 |
> I can't reproduce the issue using FC with an HPE 3PAR array, debugging it I found that the compute node receives a signal after the LUN has been remapped
This makes sense. On fibre channel fabrics, any time a LUN is added or removed an RSCN (https:/
So in this case we are somewhat protected by the storage protocol itself.
Gorka Eguileor (gorka) wrote : | #47 |
Thanks Sean.
Those RSCNs should be the equivalent of the iSCSI AEN messages, which usually trigger the automatic scan of LUNs on the initiator side.
Those aren't happening for OpenStack iSCSI because I added a feature in Open-iSCSI, which we use from os-brick, to disable them and only allow manual scans; that way we don't get leftover devices on the compute node when there's a race condition: a volume mapping to that compute node happens on Cinder right after a 'volume_disconnect' has happened on that same compute node.
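For reference, the setting in question looks roughly like this (assuming the standard Open-iSCSI option name; os-brick normally applies it per node itself when the installed Open-iSCSI supports it):
  # in /etc/iscsi/iscsid.conf, or per node via iscsiadm --op update:
  node.session.scan = manual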
I'll have to check why we haven't seen that situation in FC, because if it's detecting new LUNs and acting on them then we should also get leftover devices
The only explanation I can think of is that maybe in FC the scan is not for all the LUNs but only the LUNs that are currently present in the host.
Simon from Pure is looking to see if he can give me access to a system to double check it also behaves like that.
Gorka Eguileor (gorka) wrote : | #48 |
I did some additional tests related to my latest comment and the results are:
- LUN change notifications do not trigger a rescan in FCP, which is good because then we cannot have race conditions between detach and attach. That had been our understanding so far.
- The message that prevents the leak with FCP by triggering the udev rule is the "Power-on Reset" SCSI Sense code that is sent from the array, so I still need to check if this is common practice or not. Tomorrow I'll check it in one of Pure's arrays.
Gorka Eguileor (gorka) wrote : | #49 |
Bad news: as I feared, the "Power-on Reset" that was "saving" us in FCP is not standard, and the Pure storage array using FCP does not send it.
This means that we are not safe for FC and need to fix these issues in that os-brick connector as well. :-(
Sylvain Bauza (sylvain-bauza) wrote : | #50 |
FWIW, I reviewed melanie's patch in comment #41 and I'm ~+2 with it.
Yeah, it was a larger than expected patch given we need to modify the signature for all the volume drivers :)
Gorka, do you want me to review your patch too, even if you found some issues with FC backends?
Gorka Eguileor (gorka) wrote : | #51 |
Sylvain, yes please, I would appreciate your review, as the Cinder patch is agnostic to the protocol.
The FC issue was relevant for the leak-prevention os-brick patch that I've been working on.
The attached patch for os-brick adds support for the "force" parameter to "disconnect_volume" in the FC connector. This is necessary for the Nova patch to also cover the FC cases.
Gorka Eguileor (gorka) wrote : | #52 |
This patch is the os-brick leak prevention code that tries to detect and prevent data leak and corruption. It applies on top of the previous os-brick FC patch.
As I see it we have multiple situations that can lead to leak/corruption:
- The CVE that any normal user can exploit: Addressed by the Cinder patch.
- Unintended issue caused when deleting an instance if the detach fails: Addressed by the Nova and os-brick FC patches.
- Other scenarios: Such as when an instance is destroyed without access to the compute node and then access to the node is restored and we work with it without manually cleaning things up. This is covered by the os-brick large patch.
I would say that the current 4 patches cover 99% of the problematic cases. We can cover another 0.5% of the cases if we add "recheck_wwid yes" to multipath.conf when using the latest os-brick patch, but that's something we can work in the open in tripleo.
This last os-brick patch is kind of a big one, which together with the things it does makes it a bit risky to backport it, so it may be wise to not backport it right away.
In other words, in my opinion we should just backport the cinder, nova, and FC os-brick patch.
Nick Tait (nickthetait) wrote : | #53 |
It is not apparent to me who is waiting on what right now.
Gorka, could you help me better understand what is required for an attacker to exploit this? I made a rough guess at CVSS score: https:/
* Could this be executed remotely?
* What is the level of complexity to exploit?
* Could an attacker exploit this multiple times and eventually gain control of all images within the OpenStack deployment?
* Attacker would need at least a basic user account right?
Fungi, what are your thoughts on security classification? Possibly A or B1? Is it too early to pick a disclosure date?
Jeremy Stanley (fungi) wrote : | #54 |
We have attached patches at this point for cinder, nova and (2 for) os-brick. It's not yet clear that there's consensus from the reviewers on this bug that the proposed fixes are sufficient and appropriate for backporting (at least to officially maintained stable branches, so as far back as stable/xena right now). Assuming the chosen fixes are suitable for backport, class A seems like the closest fit based on hints in comments #35 and #38 that there is an easily-exploitable condition for a normal user of the environment (but as of yet the details have not been explained that I've seen here). Of course, before I can attempt to summarize this set of risks into an appropriate impact description, we'll need more information on that.
Following our current 90-day maximum embargo policy we have at most 8 weeks to figure this out, but of course it would be better to have it over and done with at the soonest opportunity. Basically if we can get consensus on the patches and a clearer explanation for the exploit scenarios and possible mitigations, then I'll apply for a CVE assignment from MITRE with that information. In parallel, we'll need clean patches for all of the above fixes backported at least as far as stable/xena. Once we have all that, we'll pick a disclosure date roughly a week out and send advance copies of the description and patches to downstream stakeholders so they can begin preparing their own packages.
Note that an additional wrinkle is the looming OpenStack 2023.1 coordinated release, which means that stable/2023.1 branches have already been created and we'll need backports from master to those as well (though I expect they'll be identical to the master branch patches in most cases). We'll also need to make sure to list the OpenStack 2023.1 release versions as affected since I highly doubt we'll publish in time to make one of the final RCs.
Dan Smith (danms) wrote : | #55 |
I think this is *network* not *local* right? A user can trigger this via the API. They have to be authenticated, so they can't just be some random person, but they can cause the system to give them access to *other* users' data. Doesn't that also mean the "scope" is "changed"? Meaning, my guess is that it should have this scoring:
https:/
Gorka, I haven't tested your patch myself, but you and I did discuss it earlier. Looking at it now, I'm wondering: how can cinder redirect or check with nova for a regular volume detach? If nova is the one doing the volume detach (via cinder), how does cinder know not to just redirect back to nova (creating a loop)? Is there some state cascade that we rely on to know that the detach has gone through nova at some point?
Gorka Eguileor (gorka) wrote : | #56 |
Hi Nick,
> It is not apparent to me who is waiting on what right now.
I'm waiting on reviews, though Rajat suggested to me that I do a video session to explain the whole issue to facilitate reviews and assessment.
> * Could this be executed remotely?
Yes a normal user with normal credentials can exploit it.
> * What is the level of complexity to exploit?
Trivial.
Basically create a VM, attach one of your volumes to it, ask Cinder to delete the attachment record for the volume, then wait for another volume from any user to be attached to the same host and read the data.
This only works for iSCSI drivers that share targets, and some FC drivers.
> * Could an attacker exploit this multiple times and eventually gain control of all images within the OpenStack deployment?
The attacker would have access to volumes as long as they are present on the host.
So if the owner of the volume detaches it, or the instance is migrated to another host, then access to the volume is lost.
> * Attacker would need at least a basic user account right?
Yes
Gorka Eguileor (gorka) wrote : | #57 |
Hi Jeremy,
There are multiple cases/scenarios captured in this bug:
- User exploitable scenario.
- Unintentional scenarios that can happen when destroying a VM with an
attached volume fails to cleanly detach the volume.
- Other scenarios.
The summary of the user exploitable vulnerability would be something like:
A normal user can gain access to other users'/projects' volumes that are connected to
the same compute host where they are running an instance.
This issue doesn't affect every OpenStack deployment; for the exploit to
work there needs to be the right combination of nova configuration,
storage transport protocol, cinder driver approach to mapping volumes,
and storage array behavior.
I don't have access to all storage types supported by OpenStack, so I've
only looked into: iSCSI, FCP, NVMe-oF, and RBD.
It is my belief that this only affects SCSI-based transport protocols
(iSCSI and FCP) and only under the following conditions:
- For iSCSI the Cinder driver needs to be using what we call shared
targets: the same iSCSI target and portal tuple is used to present
multiple volumes on a compute host.
- For FCP it depends on the storage array:
- Pure: Affected.
- 3PAR: Unaffected, because it sends the "Power-on" message that
triggers a udev rule that tells multipathd to make appropriate
changes.
The way to reproduce the issue is very straightforward: it's all about
telling Cinder to delete an attachment record from a volume attached to
a VM instead of doing the detachment the right way via Nova. Then when
the next volume from that same backend is attached to the host our VM
will have access to it.
I'll give the steps using a devstack deployment, but the same would
happen on a Triple-O deployment.
The only pre-requirement is that Cinder is configured to use one of the
storage array and driver combinations that is affected by this, as this
happens both in single paths as well as multipath attachments.
Steps for the demo user to gain access to a volume owned by the admin
user:
$ . openrc demo demo
$ nova boot --flavor cirros256 --image cirros-
$ cinder create --name demo 1
$ openstack server add volume myvm demo
# The next 2 lines are the exploit which delete the attachment record
$ attach_
$ cinder --os-volume-
$ . openrc admin admin
$ nova boot --flavor cirros256 --image cirros-
$ cinder create --name admin 1
$ openstack server add volume admin_vm admin
# Both VMs use the same volume, so the demo VM can read the admin volume
$ sudo virsh domblklist instance-00000001
$ sudo virsh domblklist instance-00000002
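(Since several of the commands above were truncated, here is a hedged restatement of the same flow with placeholder image/volume names; flags are abbreviated and the attachment ID is illustrative:)
  $ . openrc demo demo
  $ nova boot --flavor cirros256 --image <cirros-image> myvm
  $ cinder create --name demo 1
  $ openstack server add volume myvm demo
  # The exploit: delete the attachment record directly in Cinder instead of via Nova
  $ cinder --os-volume-api-version 3.27 attachment-list
  $ cinder --os-volume-api-version 3.27 attachment-delete <attachment-id-of-the-demo-volume>
  $ . openrc admin admin
  $ nova boot --flavor cirros256 --image <cirros-image> admin_vm
  $ cinder create --name admin 1
  $ openstack server add volume admin_vm admin
  # Both VMs now point at the same host block device
  $ sudo virsh domblklist instance-00000001
  $ sudo virsh domblklist instance-00000002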
The patches that have been submitted are related to the different
scenarios/cases described before:
- User exploitable scenario ==> Cinder patch
- Unintentional scenarios that can happen when destroying a VM with an
attached volume fails to cleanly detach the volume ==> Nova and small
os-brick patch
- Other scenarios ==> Huge os-brick patch
The "recheck_wwid yes...
Gorka Eguileor (gorka) wrote : | #58 |
Hi Dan,
> how cinder can redirect or check with nova for a regular volume detach?
The code is using the "service_token" field from the context to detect whether the request is coming from an OpenStack service (nova or glance), and if that's the case it processes the request.
If it's not coming from a service, it does a couple of checks to allow manual cleanup requests. So it allows user attachment-delete calls under the following circumstances (a rough sketch follows the list):
- If the attachment record doesn't have an instance id.
- If the attachment record doesn't have connection information.
- If it has an instance, but the instance doesn't exist in Nova.
- If the attachment record in Nova's instance has a different ID from the one in the attachment.
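This is not the attached patch itself, just a rough sketch of that decision logic (the helper and field names are assumptions for illustration only):

  def _is_service_request(context):
      # Assumption: roles from a keystone-validated service token are exposed
      # on the request context; nova/glance calls carry the 'service' role.
      return 'service' in (getattr(context, 'service_roles', None) or [])

  def attachment_delete_allowed(context, attachment, nova_instance):
      """Return True if this attachment-delete request may proceed."""
      if _is_service_request(context):      # call came through nova/glance
          return True
      if not attachment.instance_uuid:      # never bound to an instance
          return True
      if not attachment.connection_info:    # never actually connected
          return True
      if nova_instance is None:             # instance no longer exists in Nova
          return True
      # Instance still exists: only allow if Nova's record points at a
      # different attachment id than the one being deleted.
      return nova_instance.attachment_id != attachment.id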
melanie witt (melwitt) wrote (last edit ): | #59 |
I did a little testing of the cinder patch with a local devstack, looking for any way I could delete the cinder attachment without going through nova.
Unfortunately, I found that I can apparently bypass the redirect by sending the X-Service-Token header with my regular token. So it looks like we need to do a little more to validate whether it's nova calling. Not sure if we can maybe pull nova's user_id from keystone and then verify that as well or instead? Or maybe there is some other better way?
(later) Update: I dug around and found out why it's possible to easily fake a service token and it's because [keystone_
"""
Upgrade Notes
Set the service_token_roles to a list of roles that services may have. The likely list is service or admin. Any service_token_roles may apply to accept the service token. Ensure service users have one of these roles so interservice communication continues to work correctly. When verified, set the service_
"""
By default any authenticated user can send their valid token as a "X-Service-Token" and keystone will accept it as a valid service token.
If I however set in cinder.conf:
[keystone_
service_
My below repro attempt will be rejected with:
{"error": {"code": 401, "title": "Unauthorized", "message": "The request you have made requires authentication."}}
So either way we need a different way to verify whether it is nova calling DELETE /attachments/
[1] https:/
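For illustration, the behavioural difference boils down to something like the sketch below (not keystonemiddleware's real code; the option names follow the documentation quoted above):

def service_token_accepted(service_token_roles, configured_roles,
                           roles_required):
    """Simplified model of how the X-Service-Token header is judged.

    service_token_roles: roles keystone returned when validating the header.
    configured_roles:    the service_token_roles option (default: ['service']).
    roles_required:      the service_token_roles_required option.
    """
    if not roles_required:
        # Default today: any valid token in the header counts as a service
        # token, which is what makes the bypass above possible.
        return True
    # With roles_required enabled, the token must carry at least one of the
    # configured service roles.
    return bool(set(service_token_roles) & set(configured_roles))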
Repro steps:
Show that user "demo" does not have any service roles:
$ source openrc admin admin
$ openstack user list -f json
[
{
"ID": "a34218d9c4774d
"Name": "demo"
}
]
$ openstack role assignment list --user a34218d9c4774df
[
{
"Role": "member",
"User": "demo@Default",
"Group": "",
"Project": "invisible_
"Domain": "",
"System": "",
"Inherited": false
},
{
"Role": "anotherrole",
"User": "demo@Default",
"Group": "",
"Project": "demo@Default",
"Domain": "",
"System": "",
"Inherited": false
},
{
"Role": "creator",
"User": "demo@Default",
"Group": "",
"Project": "demo@Default",
"Domain": "",
"System": "",
"Inherited": false
},
{
"Role": "member",
"User": "demo@Default",
"Group": "",
"Project": "demo@Default",
"Domain": "",
"System": "",
"Inherited": false
}
]
Begin repro:
$ source openrc demo demo
$ openstack volume create --size 1 test2004555 -f json
{
"attachments": [],
"availability
"bootable": "false",
"consistencyg
"created_at": "2023-03-
"description": null,
"encrypted": false,
"id": "d66c2b17-
"multiattach": false,
"name": "test2004555",
"properties": {},
"replication_
"size": 1,...
Gorka Eguileor (gorka) wrote : | #60 |
Hi Melanie,
Thank you very much for testing the Cinder code, finding the loophole, and providing such detailed instructions.
I incorrectly assumed that keystonemiddleware would not only check that the service token in the header is valid, but also that it actually belongs to a service role.
I have changed the code to check that the roles from the service token (if a valid one is provided) actually include a service role.
On Monday I'll check whether the new approach also works on older releases (in case we need a different approach for the backports) and with Glance using Cinder as a backend (in case glance is not sending the service token).
Cheers.
Nick Tait (nickthetait) wrote : | #61 |
Dan, OK I agree with you on network exploitable.
The CVSS user guide gives a relevant example of scope change. See item 1 of section 3.5 on https:/
Given this I'm leaning towards a score of 8.8 https:/
melanie witt (melwitt) wrote : | #62 |
Hi Gorka,
I tried the new version of the cinder patch and it's working well from the nova point of view.
The new check for the service role prevents the X-Service-Token header bypass and there should not be any way to fake the roles because the roles on the RequestContext are extracted only from a validated token response from keystone which will return the roles internally associated with the token. (I tried sending my own X-Service-Roles header and it [correctly] did not work).
Other than that, upon review I noticed there are a few typos in the unit tests in the patch, for example "mock_action_
I also got one unit test fail when I ran them (test__
Thank you for fixing up the patch so fast!
Nick Tait (nickthetait) wrote : | #63 |
Thanks Gorka and Melanie for your development & testing efforts!
Quick question: Would it be possible for an administrator to disable deletion via cinder? This might serve as a mitigation.
I took a crack at further condensing the vuln details below.
Impact: An openstack user could gain control of volumes from other users/projects. However, the scope of exposed images is limited to the compute host where the instance is running. Only SCSI based transport protocols are believed to be affected, but not all storage types have been tested.
Affected storage types: iSCSI and FCP
Unaffected storage types: NVMe-oF and RBD
Preconditions:
- For iSCSI the Cinder driver needs to be using "shared targets" where the same iSCSI target and portal tuple is used to present multiple volumes on a compute host.
- For FCP it depends on the storage array:
- Pure: Affected.
- 3PAR: Unaffected.
Attack scenario:
Use cinder to delete an attachment record from a volume which has already been attached to a VM
Gorka Eguileor (gorka) wrote : | #64 |
Thanks Melanie for catching those.
I had forgotten to update the tests and there were also some mistakes in the unit tests due to the misspellings.
I have deleted the old cinder patch and attached an updated one fixing the unit test issues.
The code works as expected with Glance using Cinder as a backend as well.
Now I'll see if this approach works with older releases, since I don't know when services started sending the service token to each other.
Gorka Eguileor (gorka) wrote : | #65 |
I just realized that the cinder-patch needs improvements, because the presence of a service token in the request (and by extension the service roles in the context) depends on the deployment options, and some deployments may not have the "send_service_
I'll give the patch another thought and add code for that scenario.
My initial idea is to check current actions on the instance to determine if the request is coming from the service or not, though I'm not familiar with all the nova actions that can trigger a cinder detach action.
Gorka Eguileor (gorka) wrote : | #66 |
Hi Nick,
I've been looking at possible mitigations without code changes and there is a way with configuration changes and policy changes. Steps would be:
1- Configure cinder and nova to use the "service_user" and to send the token ("send_
2- Get the service uuid for the cinder and nova service users
3- If using Cinder as a glance backend, get the uuid for the "cinder_
4- Write the /etc/cinder/
Assuming that the user names for each of the services match the service name we can get their uuid with:
$ openstack user show nova -f value -c id
$ openstack user show cinder -f value -c id
$ openstack user show glance -f value -c id
The policy I would recommend writing is:
"is_nova_
"is_cinder_
"is_glance_
"is_service": "rule:is_
"volume:
A much smaller policy is possible, but I like the one above and it's the one I have tested. This one probably works as well, assuming everything has been configured as mentioned above:
"volume:
These policies don't prevent:
- Admins shooting themselves in the foot
- Unintentional issues like the one originally reported in this case.
They should prevent the user induced vulnerability.
Cheers,
Gorka.
[1]: https:/
Gorka Eguileor (gorka) wrote : | #67 |
Hi Nick,
I like your vulnerability details, though there are a couple of small comments I'd like to make:
- "user could gain control of volumes" ==> It's more like they can gain read/write access to the volumes, but not control, because they cannot delete the volumes, take snapshots, etc.
- "the scope of exposed images" ==> This may be misleading, because when I hear the word "images" in the context of OpenStack I think of Glance images, not Cinder volumes.
- I feel like we are singling out Pure as the only affected FCP driver just because that's the one I could get my hands on. Maybe we can rephrase it:
- Drivers using FCP will be affected unless the array sends the "Power-on Reset" SCSI Sense code when mapping the volume. In our limited testing only a 3PAR array sent it, but that doesn't mean all 3PARs will.
Cheers,
Gorka.
Sylvain Bauza (sylvain-bauza) wrote : | #68 |
I quite like Gorka's policy workarounds using the service_user tokens. That would let our operators just modify their configurations without needing to upgrade to some z-release, and then the exploit wouldn't be possible.
I also looked at https:/
https:/
For this specific reason, we either need to change the fix to use other Nova APIs that are older (though honestly, I don't really know which ones), or we need to explain in the vulnerability details that you must use the policy workarounds if you're older than Xena.
Brian Rosmaita (brian-rosmaita) wrote : | #69 |
@Gorka: nice work finding the policy-based workaround!
The service_* properties have been exposed in oslo.context since 2.12.0 (Ocata) (commit 2eafb0eb6b0898), which, coincidentally is when the Attachments API that allows the exploit was introduced.
oslo.policy has been supporting a yaml policy file since 1.10.0 (Newton) (commit 83d209e9ed1a1f7f70) , so we'd only need to provide an example yaml file.
One thing we should mention is that for safety, the policy file should be explicitly mentioned in the configuration file for each service as the value of the [oslo_policy] policy_file option. That's because since Queens, if a policy_file isn't found, the policies defined in code are used, and until Wallaby or Xena, the default value for policy_file in most services was policy.json (which would mean that a policy.yaml file would be ignored in the default configuration). Likewise, in recent releases, a policy.json file is ignored in the default configuration, so it's safest to configure this explicitly.
melanie witt (melwitt) wrote : | #70 |
> I just realized that the cinder-patch needs improvements, because the presence of a service token in the request (and by extension the service roles in the context) depends on the deployment options, and some deployments may not have the "send_service_
Hm. I wonder if we could instead only check whether the user requesting has the "service" role (if "service" in RequestContext.
Technically a deployment could give any project or role to their service users (and omit any) ... so I'm not sure whether it's reasonable to assume any of the project names or role names or user names.
I just can't think of another real way to verify the identity of the caller other than openstack credentials. There has to be a source of truth for verifying the identity of any caller.
> I'll give the patch another thought and add code for that scenario.
My initial idea is to check current actions on the instance to determine if the request is coming from the service or not, though I'm not familiar with all the nova actions that can trigger a cinder detach action.
I'm not sure how nova actions could be a reliable way to know if nova called the detach API. There isn't a unique identifier sent to cinder that cinder could use to validate a request matches a server action. Each server action contains the request_id that performed it, but that wouldn't get sent to cinder unless it's sent as the global_request_id. Nova will send the request_id as the global_request_id only if there is not a global_request_id already in the RequestContext. So that wouldn't work if anyone sent a global_request_id when they called nova.
Other than that, you could only try to correlate the request based on server action timestamp unless I'm missing something.
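Roughly, the propagation rule being described is the following (a simplified illustration, not nova's actual code):

def outgoing_global_request_id(context):
    # Nova only substitutes its own request_id when the caller did not
    # already supply a global_request_id, so the value cinder receives could
    # have been chosen by any client and proves nothing about who made the
    # call.
    return getattr(context, 'global_request_id', None) or context.request_id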
Dan Smith (danms) wrote : | #71 |
I definitely think that relying on server actions for something as important as this is a bad idea. We could easily change, break, or reorder code in that path without having any idea of the security implications...
Brian Rosmaita (brian-rosmaita) wrote : | #72 |
following melwitt, comment #70
> Hm. I wonder if we could instead only check whether the user
> requesting has the "service" role (if "service" in
> RequestContext.
> leave the service_token part out of it.
I'm afraid that if we try to figure out the source of the request
ourselves somehow, we'll be subject to some kind of request forgery
exploit. I think sticking with the service_token is the safest course
of action. The upside is that all the concerned services use the
keystone middleware that supports send_service_token, so other than
configuring each service correctly, there's no software upgrade or
anything involved.
> Technically a deployment could give any project or role to their
> service users (and omit any) ... so I'm not sure whether it's
> reasonable to assume any of the project names or role names or user
> names.
I agree with you completely here, and for the reasons you state, we
won't be able to provide a script to do this automatically. We'll
have to provide clear documentation of how to configure this correctly.
But the plus side is that while the send_user_token stuff may not be
configured at a site, there must be some kind of service user configured
for each service (at least I think so?), and we can refer to the config
options by name in explaining what to do to configure send_user_token
and make the policy file changes.
melanie witt (melwitt) wrote : | #73 |
> I'm afraid that if we try to figure out the source of the request
> ourselves somehow, we'll be subject to some kind of request forgery
> exploit. I think sticking with the service_token is the safest course
> of action. The upside is that all the concerned services use the
> keystone middleware that supports send_service_token, so other than
> configuring each service correctly, there's no software upgrade or
> anything involved.
Yeah sorry, I was responding to the idea that we would have to: 1) accommodate the scenario where the deployer has *not* configured send_service_
I would much rather be able to use the service_token and require deployers to have send_service_
[service_user]
send_
what will we do if it's False (the default)? Make nova services exit if it's not set to True with an error logged to say it's now required? If we don't do anything, when nova calls the detach API it would create a loop, as Dan mentioned in an earlier comment.
> I agree with you completely here, and for the reasons you state, we
> won't be able to provide a script to do this automatically. We'll
> have to provide clear documentation of how to configure this correctly.
> But the plus side is that while the send_user_token stuff may not be
> configured at a site, there must be some kind of service user configured
> for each service (at least I think so?), and we can refer to the config
> options by name in explaining what to do to configure send_user_token
> and make the policy file changes.
I would expect that even if a deployment has left [service_
Jeremy Stanley (fungi) wrote : | #74 |
Just a reminder, our embargo policy promises a maximum of 90 days from initial report of a suspected vulnerability, and per the preamble in the bug description, that's... "This embargo shall not extend past 2023-05-03 and will be made public by or on that date even if no fix is identified."
That's four weeks from yesterday, so ideally we'll have fixes and an advisory ready to provide advance copies to downstream stakeholders at least a full week prior to that, which basically gives us only three weeks to wrap up the debate over patches and prepare all relevant backports (at least as far back as stable/yoga since stable/xena will be transitioning to extended maintenance before then, but also backporting to stable/xena if possible would be nicer to our users).
melanie witt (melwitt) wrote : | #75 |
Thanks Jeremy.
IMHO there's not a clearly great solution here that will work for every deployment configuration. So I think we'll have to choose the least bad option, unfortunately.
Dan and I chatted about this bug today and I will try to summarize what we talked about to try and move things forward. We don't have much time ...
Of the options we have:
1) Redirect all non-service user DELETE /attachments requests to Nova
Problems with it:
* Requires non-default deployment configuration [1]
a) There must be a 'service' role in keystone and it must be assigned to the Nova and Glance users
b) The Cinder service must be configured to enforce service token roles:
[keystone_
service_
service_token_roles = service (this is the default)
c) The Nova service must be configured to send service tokens:
[service_user]
send_service_
(plus username, password, project, etc)
* Consequence of not having the non-default configuration:
There would be a forever loop between Nova and Cinder when Nova attempts any DELETE /attachments calls.
2) Reject all non-service user DELETE /attachments requests
Problems with it:
a-c) Same as option 1)
* Consequence of not having the non-default configuration:
All DELETE /attachments requests will be rejected by Cinder until the deployment is configured as required.
3) Do not accept DELETE /attachments requests on the public API endpoint
Problems with it:
a) Nova would need to be configured to call the private API endpoint for DELETE /attachments
* Consequence of not having the non-default configuration:
All DELETE /attachments requests will be rejected/ignored by Cinder until the deployment is configured as required.
4) Change default Cinder API policy to admin-only for DELETE /attachments
a) The Nova and Glance users must be configured as admin users
* Consequence of not having the non-default configuration:
All DELETE /attachments requests will be rejected/ignored by Cinder until the deployment is configured as required.
5) Other ideas?
Please feel free to correct me if I've got anything wrong here.
[1] https:/
Sylvain Bauza (sylvain-bauza) wrote : | #76 |
As a security workaround, I'd recommend option #4 for operators wanting to be quickly safe until we find a better solution.
Gorka Eguileor (gorka) wrote : | #78 |
I could be wrong, but option #4 shouldn't work, because the requests from Nova come with the user credentials, not with the nova or glance users.
Gorka Eguileor (gorka) wrote : | #79 |
The new Cinder patch changes our approach to reject the dangerous requests with 409 error and also protects the volume action REST API endpoint that has 2 operations that could be used for the attack.
The commit message has more details.
melanie witt (melwitt) wrote : | #80 |
> I could be wrong, but option #4 shouldn't work, because the requests from Nova come with the user credentials, not with the nova or glance users.
No, you are right, sorry. For some reason I had been thinking Nova called the attachment delete API with an elevated RequestContext but it doesn't.
So option #4 (if I've not made another mistake!) would have to be instead:
4) Change default Cinder API policy (in the code) to admin-only for DELETE /attachments and terminate_
I'm probably missing something but with this option a configuration change would not be needed. It would however obviously allow admins to delete attachments without going through Nova.
Gorka Eguileor (gorka) wrote : | #81 |
Forgot to update the release notes in my previous Cinder patch. Updated it now with upgrades and critical section notes.
Gorka Eguileor (gorka) wrote : | #82 |
Forgot to remove 2 methods that were no longer being used in the cinder patch.
Nick Tait (nickthetait) wrote : | #83 |
Spoke with Dan Smith today and finally understood just how urgent this issue is. This revised my scoring to a 9.1 https:/
Tentatively reserved CVE-2023-2088. Jeremy, if you still want to get a CVE direct from mitre I'll reject my one, no big deal.
Brian Rosmaita (brian-rosmaita) wrote : | #84 |
Reviewed the cinder patch (5fe7d14c097260). Code and tests look good. Just a few minor things:
api-ref/
nit: s/cinder api/Block Storage API/
cinder/exception.py
nit: s/through nova/using the Compute API/
releasenote:
1. in 'critical': s/token services/service tokens/
2. in 'security': s/other/another/
3. in 'upgrade': s/service it's/service if it's/
4. in 'upgrade': the role, option, and section names should be in double-backticks (they're in single backticks, which will render as italics instead of monospace font)
actually, forget 3 & 4 and maybe rewrite the upgrade section slightly:
upgrade:
- |
Nova must be `configured to send service tokens
<https:/
**and** cinder must be configured to recognize at least one of the roles
that the nova service user has been assigned in keystone. By default,
cinder will recognize the ``service`` role, so if the nova service user
is assigned a differently named role in your cloud, you must adjust your
cinder configuration file (``service_
in the ``keystone_
configured correctly in this regard, detaching volumes will no longer
work (`Bug #2004555 <https:/
Brian Rosmaita (brian-rosmaita) wrote : | #85 |
Another comment about the cinder patch: I looked through the tempest and cinder-
https:/
This test should now raise a 409 when detach is called. I'm not sure what the best way to handle this is. Possibly talk to the QA team and merge a test skip referencing bug #2004555 now, and then fix the test as soon as the cinder patch lands?
Dan Smith (danms) wrote : | #86 |
I dunno that referencing an embargoed bug is really the best plan before disclosure. However, I suspect we could convince them to just do it without a strong justification if we explain (privately) what's going on.
However, I think that the race is really to disclosure and getting patches up and not necessarily a race to land them, right? If we had a patch ready to go to do the skip (or just fix the test), we could pre-arrange with them to get it +2+W on the same timeline as everything else. With proper Depends-On linkage, that should be okay right?
Brian Rosmaita (brian-rosmaita) wrote : | #87 |
Yeah, yeah, my point was that we need a skip test with some kind of acceptable notation. We can't just fix the test because the cinder patch can't pass tempest with the current test, and tempest with a fixed test will be broken until the cinder patch lands.
So we'll need (I can post these patches):
1. tempest skip patch: cinder patch goes green for tempest with depends-on this patch
2. tempest fix patch: should be green with depends-on(cinder patch)
We post 1, 2, and cinder patch simultaneously to show that everything works, and then the merge order will be 1, cinder patch, 2.
If that sounds OK, I'll attach the patches and then we can add a tempest core to this bug.
Dan Smith (danms) wrote : | #88 |
Yeah, I think that's the best plan.
Gorka Eguileor (gorka) wrote : | #89 |
Latest changes to the cinder patch:
- Updated the exception message
- Rewrote the api-ref section for the delete attachment
- Added missing api-ref text for the terminate connection and the force detach actions
- Added a docstring to the `is_service` method
- Amended the commit message that had a second Change-Id
- Updated the release notes as per comment #85
- Added an issues section to the release notes
The only code change in this patch should be the error message returned with the 409 error.
Gorka Eguileor (gorka) wrote : | #90 |
Brian, as far as I know the mentioned test should not fail, because devstack configures Nova to send the service token.
Gorka Eguileor (gorka) wrote : | #91 |
Resolve conflict with latest master code
Dan Smith (danms) wrote : | #92 |
Gorka, I think the test *will* fail because we're not actually using nova there. We're creating an attachment with a server_id directly and then trying to detach it as a user. It's basically testing the seam and scenario that we're changing here.
Gorka Eguileor (gorka) wrote : | #93 |
Quick update.
The test wouldn't have failed if it were creating the attachment directly in Cinder and then deleting that attachment, even with an instance uuid set, because Cinder would see that there is no Nova instance, or that the instance doesn't have the volume attached, or that it's not using that attachment record.
Unfortunately we've found, looking at that tempest test, that there is yet another way to detach volumes in Cinder: using the "os-detach" volume action. So I need to update the cinder patch to also protect that endpoint.
We have also determined that Glance could expose the wrong image contents because it's not passing `force=True` on the os-brick `disconnect_
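For reference, the os-brick call being discussed looks roughly like the sketch below (a simplified illustration assuming the standard connector interface; the connection_properties and device_info values are placeholders for whatever was saved when the volume was originally connected):

from os_brick.initiator import connector

# Placeholders: in a real deployment these come from Cinder's
# initialize_connection response and the original connect_volume call.
connection_properties = {}
device_info = {}

conn = connector.InitiatorConnector.factory(
    'ISCSI', 'sudo', use_multipath=True)

# force=True removes the devices from the host even if flushing fails, and
# ignore_errors avoids raising on that flush failure, so stale devices cannot
# later be handed to another volume mapped to the same host.
conn.disconnect_volume(connection_properties, device_info,
                       force=True, ignore_errors=True)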
Dan Smith (danms) wrote (last edit ): | #94 |
Okay, but it *does* create a server in nova and uses that uuid for the attachment. So if cinder does check nova, it will find an instance. I haven't looked deeply at the cinder patch, but you're saying because nova doesn't think the instance is actually attached to the volume, it will allow the delete? If so, then cool.
Presumably we want to also have a tempest test added to ensure that if we create/attach through nova and try to delete the attachment as a user, we get the expected 409. Not critical before disclosure I suppose, but I think we probably want that eventually.
Gorka Eguileor (gorka) wrote : | #95 |
+1 to adding tempest tests to confirm that dangerous calls are not allowed (failing by getting 409, 401, or 403 errors) depending on the configuration options.
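A negative test along those lines could look roughly like this (a hypothetical sketch, not the actual tempest change; the endpoint, token and attachment ID are placeholders):

import requests

CINDER_ENDPOINT = 'http://controller/volume/v3/<project_id>'  # placeholder
USER_TOKEN = '<token of a plain project user>'                # placeholder
ATTACHMENT_ID = '<attachment still used by a nova instance>'  # placeholder

resp = requests.delete(
    '%s/attachments/%s' % (CINDER_ENDPOINT, ATTACHMENT_ID),
    headers={'X-Auth-Token': USER_TOKEN})

# With the fix, the unsafe delete must be refused: 409 normally, or 401/403
# depending on how service_token_roles_required and the policy are configured.
assert resp.status_code in (409, 401, 403), resp.status_code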
Brian Rosmaita (brian-rosmaita) wrote : | #96 |
Adding glance_store patch.
Gorka Eguileor (gorka) wrote : | #97 |
Updated Cinder patch that also covers the `detach` volume action.
Brian Rosmaita (brian-rosmaita) wrote : | #99 |
Jeremy Stanley (fungi) wrote : | #100 |
Since the writeup for this is going to be extremely involved, it doesn't make much sense to draft and review it in bug comments. Let's use https:/
Brian Rosmaita (brian-rosmaita) wrote : | #101 |
Reviewed the os-brick FC force disconnect support patch. Release note reads well and generates correctly. Code and tests LGTM.
The only thing I noticed was in os_brick/
melanie witt (melwitt) wrote : | #102 |
Added release note, docs, and upgrade status check to the nova patch.
Not sure if the above should be a separate patch, I can split the patch if so.
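For anyone unfamiliar with nova-status checks, the general shape of such an upgrade check is something like the following (an illustrative sketch assuming the oslo.upgradecheck interfaces, not the exact code in the patch):

from oslo_upgradecheck import upgradecheck

import nova.conf

CONF = nova.conf.CONF


def _check_service_user_token(*args, **kwargs):
    # Warn operators whose deployment does not send service user tokens,
    # since cinder relies on them to tell nova apart from end users.
    if not CONF.service_user.send_service_user_token:
        return upgradecheck.Result(
            upgradecheck.Code.FAILURE,
            'The [service_user] section of nova.conf must be configured '
            'and send_service_user_token must be set to True.')
    return upgradecheck.Result(upgradecheck.Code.SUCCESS)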
Dan Smith (danms) wrote : | #103 |
Melanie, the updated nova patch looks pretty good to me and thanks for adding the nova-status check. I was thinking we'd do that after, but it's definitely nice to have it right away. I agree the patch is pretty massive right now, and under normal circumstances I'd split it up of course. However, I imagine some would argue the backporting will be easier and faster as a monolith.
One other thing, I think we should add the service user stuff to the install guide sections as well. I think that's where people likely get started for the bare minimum required config, so I think they'd probably find it weird to have a required chunk of config tucked at the bottom of the admin guide (which is linked from the top under "maintenance"). What do you think?
Jeremy Stanley (fungi) wrote : | #104 |
It's not just the backporting that's easier with fewer patches. Keep in mind that we're going to be distributing advance copies of these to downstream stakeholders (like cloud operators and the private linux-distros mailing list) as file attachments in E-mail, so the more patches those recipients need to juggle and worry about sequencing the greater the risk something goes wrong for them.
melanie witt (melwitt) wrote : | #105 |
Ack Dan and Jeremy, that was kind of my thinking too, that normally we would split it up but that keeping the number of patches to a minimum may be the right move for this.
Dan, initially I wasn't sure where required config should go in the docs so I just picked something. I agree the install guide would be better, so I'll move it there.
Gorka Eguileor (gorka) wrote : | #106 |
- tempest-2004555.patch Edit (16.6 KiB, text/plain)
Additional tempest negative tests to verify that the detach, force detach, terminate connection, and attachment delete operations are protected.
Gorka Eguileor (gorka) wrote : | #107 |
Added doc changes and modified the patch so that tempest runs without changes (at least tox -eintegrated-
Ghanshyam Mann (ghanshyammann) wrote : | #108 |
From tempest test 'test_attach_
These are very old tests and should have been written from a cinder standalone-service perspective, where no server creation or passing is needed. Irrespective of this bug we should modify these tests to be as close to real user operations as possible.
For any other failing test, we can check what operation the test verifies and, based on that, go for the skip-test route. Below is the process for Tempest test skips/modifications needed to land a service-side bug fix:
- https:/
melanie witt (melwitt) wrote : | #109 |
- nova-2004555-master_to_yoga.patch Edit (52.2 KiB, text/plain)
Added service user token configuration instructions to the install guides.
melanie witt (melwitt) wrote : | #110 |
nova-2004555.patch applies cleanly to Bobcat, Antelope, Zed, Yoga
nova-2004555-
Brian Rosmaita (brian-rosmaita) wrote : | #111 |
reviewed cinder patch cc20649efa7383f495
Primary issue is that (at least on my reading) the api-ref and the commit message/release note conflict over whether users are allowed to make the 3 action API calls. The api-ref says that they can, with success dependent on satisfying a safe delete (as for the Attachments API delete call), but the commit message/relnote say the action calls are service-only. The code looks like it's implementing what the api-ref says.
docs: changes are good, read well, render correctly in HTML, links all work (only discuss configuration, so not affected by the above)
api-ref: nit: os-detach, os-force_detach are missing the 409 in the response codes list (only mentioning it because you have it for the os-terminate_
cinder/
nit: line 2583 (reason=) s/atachment/
cinder/
nit: line 1040: mock_deletion_
cinder/
nit: test_attachment
cinder/
in test_owner_
release note: if you revise, the single backticks produce italics; you need double backticks for monospace font
The code in volume/api.py looks fine and the tests are thorough
Gorka Eguileor (gorka) wrote : | #112 |
Ghanshyam, test "test_attach_
Gorka Eguileor (gorka) wrote : | #113 |
Updated patch to support force disconnect on FC driver.
Changes:
- Always display the log message
- Easier to read (using the retry decorator)
- Exponential backoff between retries
Gorka Eguileor (gorka) wrote : | #115 |
Brian thanks for the review, I have updated the patch with your suggestions (comment #111).
I may have changed the phrasing in a later patch than the one you reviewed, but I believe that in the latest one the api-ref, commit message, comments in code, and release note all say the same thing; I just used different wording since the audiences are different.
For example the release note is very brief and it reads: "cinder now rejects user attachment delete requests for attachments that are being used by nova instances".
Being used by a nova instance means that the instance exists, that it has the volume attached, and that the volume attachment in the instance is using that particular attachment.
Good call on the missing 409 response codes in the api-ref, the unintentionally deleted line in the test, etc.
Gorka Eguileor (gorka) wrote : | #116 |
Gorka Eguileor (gorka) wrote : | #117 |
Gorka Eguileor (gorka) wrote : | #118 |
Gorka Eguileor (gorka) wrote : | #119 |
Gorka Eguileor (gorka) wrote : | #121 |
Gorka Eguileor (gorka) wrote : | #123 |
Gorka Eguileor (gorka) wrote : | #124 |
Gorka Eguileor (gorka) wrote : | #125 |
Rajat Dhasmana (whoami-rajat) wrote : | #126 |
Hi Brian,
One comment regarding the glance store patch, we also have another disconnect_volume call in the attachment_
This file is for handling multiattach volumes where we only disconnect from os-brick if we are on the last attachment.
I also checked the backport patches by Gorka for Zed, Yoga and Xena and they do handle this case so we shouldn't require revised backports.
Brian Rosmaita (brian-rosmaita) wrote : | #127 |
I should note that the latest cinder-master patch (f6f4b77b213935
Brian Rosmaita (brian-rosmaita) wrote : | #128 |
@Rajat: some refactoring by an excellent software engineer (i.e., you) in 2023.1 restructured the code so that the multiattach manager is actually calling the method that now contains the force, so it only needs to be changed in that one place.
Brian Rosmaita (brian-rosmaita) wrote : | #129 |
Verified that glance_
Gorka Eguileor (gorka) wrote : | #130 |
Brian Rosmaita (brian-rosmaita) wrote : | #131 |
Revisions to osbrick-fc patch LGTM. Nice restructuring of multipath_
Jeremy Stanley (fungi) wrote : | #132 |
We're 5 weekdays away from our self-imposed publication deadline, so unless we've got the text and backported patches ready to distribute downstream tomorrow, we should probably push that out. Our vulnerability management policy[*] states, "Embargoes for privately-submitted reports of suspected vulnerabilities shall not last more than 90 days, except under unusual circumstances." While we didn't start making headway on this report as early as I would have preferred, the cross-project nature of the problem and broad impact does qualify as an "unusual circumstance" in my opinion so I'm proposing we extend the deadline by a week to Wednesday, May 10 in order to complete proper due diligence of review and testing of the proposed solutions. Are there any objections?
As for where we are now... It appears we have consensus and no new concerns raised on patches for the cinder, glance_store, nova, os-brick and tempest repositories, with backports as far as the stable/xena branch (even though we assume stable/yoga will be the oldest non-EM branch by the time we publish, since stable/xena was supposed to reach EM a week ago). For the document, we seem to have most of the details filled in but are still working to finalize the prose for accuracy and clarity. I think once everyone following is happy enough with what's there, we'll be ready to pick a publication date and distribute advance copies of the document and patches to our downstream stakeholders.
Jeremy Stanley (fungi) wrote : | #133 |
(Sorry, I forgot to footnote the relevant policy URL.)
[*] https:/
Jeremy Stanley (fungi) wrote : | #134 |
And just a reminder to anyone who missed the link in comment #100, we're using https:/
Nick Tait (nickthetait) wrote : | #135 |
No complaints from me on delaying disclosure date.
Nick Tait (nickthetait) wrote : | #136 |
FYI, launchpad seems to display comment numbers out of order. I believe the content is correctly ordered and dated, but the numbering is wrong. At one point I saw 68, 69, 70, 16, 17, 18 ... 38, 39, 40, 71, 72, 73 but currently it shows me 13, 14, 41, 42 ... 93, 94, 16, 17 ... 39, 40, 95, 96
melanie witt (melwitt) wrote : | #137 |
- nova-2004555-xena.patch Edit (50.1 KiB, text/plain)
Added cherry-pick and conflicts lines to commit message.
Brian Rosmaita (brian-rosmaita) wrote : | #138 |
I think the text in https:/
Jeremy Stanley (fungi) wrote : | #139 |
Seems like we have consensus on the draft text in the etherpad sufficient for me to assemble downstream advance notice and publication, and agreement on the approach in the supplied patches and backports. On the assumption that the testing being performed by involved parties has turned up no additional problems, I propose that we schedule publication for 15:00 UTC on Wednesday, May 10 with 5 business day advance notification to downstream stakeholders on Wednesday, May 3. Are there any objections?
description: | updated |
summary: |
- [ussuri] Wrong volume attachment - volumes overlapping when connected - through iscsi on host + Unauthorized volume access through deleted volume attachments + (CVE-2023-2088) |
Changed in ossa: | |
status: | Incomplete → In Progress |
importance: | Undecided → High |
assignee: | nobody → Jeremy Stanley (fungi) |
Dan Smith (danms) wrote : Re: Unauthorized volume access through deleted volume attachments (CVE-2023-2088) | #140 |
No objection from me.
Sylvain Bauza (sylvain-bauza) wrote : | #141 |
I was barely able to look at this bug report given the long discussions, but I'm eventually OK with the etherpad, since it explains the problem, a workaround, and then the fix.
Operators can then choose between upgrading their services or modifying their existing environments.
Brian Rosmaita (brian-rosmaita) wrote : | #142 |
The description of the short-term mitigation strategy via policy/config change is clear and has been tested, so I think we're ready to go.
Jeremy Stanley (fungi) wrote : | #143 |
Please consider the contents of the draft etherpad we were using effectively frozen as of now, since I'm working on incorporating it into the advance notification for downstream stakeholders in preparation for sending later today. If you notice any significant problems with the information or text, please raise it here in the bug report since we'll have to consider whether we treat further corrections as errata. Same goes for any adjustments to the patches currently attached to the bug report. Thanks!
Jeremy Stanley (fungi) wrote : | #144 |
Just to confirm, the osbrick-
Alan Bishop (alan-bishop) wrote : | #145 |
I wish I could offer more details and a definitive answer, but Gorka is on holiday this week and so I have to take a stab at answering this one. I believe the osbrick-
Brian Rosmaita (brian-rosmaita) wrote : | #146 |
@Jeremy: the osbrick-leak patch addresses some possible corner cases, but is too risky to backport as it may cause regressions. Our current thinking is that it should be worked on in public as a patch to master after this issue has been made public, and can go through the normal review and CI process. (To answer your question, it wasn't supplanted by the osbrick-fc patch, it is a child of that patch.)
Jeremy Stanley (fungi) wrote : | #147 |
Thanks Alan and Brian for clarification on the leak patch. I didn't attach it to the downstream notification, which seems to have been the right call. It makes sense to treat that as a master branch only hardening fix after publication.
As for the downstream notification, it took a little longer than I intended to massage it into the shape of our templated communications and map/copy the patches to the branch-specific name format we've standardized, but it was sent to our private embargo-notice ML and the private linux-distros ML a little before 01:00 UTC.
Jeremy Stanley (fungi) wrote : | #148 |
A quick reminder: We're scheduled to make this information public at 15:00 UTC tomorrow (Wednesday, May 10). I'll be switching the bug report to Public Security a few minutes before that, so the devs involved can start pushing patches/backports into Gerrit at that time (I'll also comment on the bug letting everyone know to start). Once everything has been pushed to Gerrit, so that we know what the change URLs are for all of them, I can publish the advisory and accompanying security note to the security.
Jeremy Stanley (fungi) wrote : | #149 |
Since we have a lot of patches to get pushed for this, I've gone ahead and opened the bug up about 30 minutes early. Please begin pushing the fixes/backports to Gerrit at your earliest opportunity so I can include the links for them in our advisory publication. Thanks!
description: | updated |
information type: | Private Security → Public Security |
Changed in ossn: | |
assignee: | nobody → Jeremy Stanley (fungi) |
importance: | Undecided → High |
status: | New → In Progress |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance_store (master) | #150 |
Fix proposed to branch: master
Review: https:/
Changed in glance-store: | |
status: | New → In Progress |
Changed in cinder: | |
status: | New → In Progress |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master) | #151 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/2023.1) | #152 |
Fix proposed to branch: stable/2023.1
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/zed) | #153 |
Fix proposed to branch: stable/zed
Review: https:/
summary: |
- Unauthorized volume access through deleted volume attachments - (CVE-2023-2088) + [OSSA-2023-003] Unauthorized volume access through deleted volume + attachments (CVE-2023-2088) |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/yoga) | #154 |
Fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/xena) | #155 |
Fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (master) | #156 |
Related fix proposed to branch: master
Review: https:/
Changed in os-brick: | |
status: | New → In Progress |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (master) | #157 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/2023.1) | #158 |
Related fix proposed to branch: stable/2023.1
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/zed) | #159 |
Related fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/yoga) | #160 |
Related fix proposed to branch: stable/yoga
Review: https:/
Changed in nova: | |
status: | New → In Progress |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master) | #161 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/xena) | #162 |
Related fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance_store (stable/2023.1) | #163 |
Fix proposed to branch: stable/2023.1
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master) | #164 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance_store (stable/zed) | #165 |
Fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance_store (stable/yoga) | #166 |
Fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance_store (stable/xena) | #167 |
Fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.1) | #168 |
Fix proposed to branch: stable/2023.1
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/2023.1) | #169 |
Related fix proposed to branch: stable/2023.1
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/zed) | #170 |
Fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/zed) | #171 |
Related fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga) | #172 |
Fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/yoga) | #173 |
Related fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena) | #174 |
Fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena) | #175 |
Related fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby) | #176 |
Fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby) | #177 |
Related fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ossa (master) | #178 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to ossa (master) | #179 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit d62fe374e42538e
Author: Jeremy Stanley <email address hidden>
Date: Wed May 10 14:39:22 2023 +0000
Add OSSA-2023-003 (CVE-2023-2088)
Change-Id: Iab9cca074c2928
Closes-Bug: #2004555
Changed in ossa: | |
status: | In Progress → Fix Released |
Changed in ossn: | |
status: | In Progress → Fix Released |
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance_store (stable/2023.1) | #180 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/2023.1
commit a7eed0263e436f8
Author: Brian Rosmaita <email address hidden>
Date: Tue Apr 18 11:22:27 2023 -0400
Add force to os-brick disconnect
In order to be sure that devices are being removed from the host,
we should be using the 'force' parameter with os-brick's
disconnect_
Closes-bug: #2004555
Change-Id: I63d09ad9ef465b
(cherry picked from commit 1d8033e54e009bb
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance_store (master) | #181 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 1d8033e54e009bb
Author: Brian Rosmaita <email address hidden>
Date: Tue Apr 18 11:22:27 2023 -0400
Add force to os-brick disconnect
In order to be sure that devices are being removed from the host,
we should be using the 'force' parameter with os-brick's
disconnect_
Closes-bug: #2004555
Change-Id: I63d09ad9ef465b
Changed in glance-store: | |
status: | In Progress → Fix Released |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to glance_store (stable/2023.1) | #182 |
Related fix proposed to branch: stable/2023.1
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (master) | #183 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 570df49db9de303
Author: Gorka Eguileor <email address hidden>
Date: Wed Mar 1 13:08:16 2023 +0100
Support force disconnect for FC
This patch adds support for the force and ignore_errors on the
disconnect_
connector.
Related-Bug: #2004555
Change-Id: Ia74ecfba03ba23
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/2023.1) | #184 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/2023.1
commit ffb76e10bca1a2b
Author: Gorka Eguileor <email address hidden>
Date: Wed Mar 1 13:08:16 2023 +0100
Support force disconnect for FC
This patch adds support for the force and ignore_errors on the
disconnect_
connector.
Related-Bug: #2004555
Change-Id: Ia74ecfba03ba23
(cherry picked from commit 570df49db9de303
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance_store (stable/zed) | #185 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit e9d2509926445fd
Author: Brian Rosmaita <email address hidden>
Date: Tue Apr 18 11:22:27 2023 -0400
Add force to os-brick disconnect
In order to be sure that devices are being removed from the host,
we should be using the 'force' parameter with os-brick's
disconnect_
Closes-bug: #2004555
Change-Id: I63d09ad9ef465b
(cherry picked from commit 1d8033e54e009bb
(cherry picked from commit a7eed0263e436f8
Conflicts:
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance_store (stable/yoga) | #186 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 28301829777d4b1
Author: Brian Rosmaita <email address hidden>
Date: Tue Apr 18 11:22:27 2023 -0400
Add force to os-brick disconnect
In order to be sure that devices are being removed from the host,
we should be using the 'force' parameter with os-brick's
disconnect_
Closes-bug: #2004555
Change-Id: I63d09ad9ef465b
(cherry picked from commit 1d8033e54e009bb
(cherry picked from commit a7eed0263e436f8
Conflicts: glance_
(cherry picked from commit e9d2509926445fd
Conflicts:
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance_store (stable/xena) | #187 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit 1f447bc184500e0
Author: Brian Rosmaita <email address hidden>
Date: Tue Apr 18 11:22:27 2023 -0400
Add force to os-brick disconnect
In order to be sure that devices are being removed from the host,
we should be using the 'force' parameter with os-brick's
disconnect_
Closes-bug: #2004555
Change-Id: I63d09ad9ef465b
(cherry picked from commit 1d8033e54e009bb
(cherry picked from commit a7eed0263e436f8
Conflicts: glance_
(cherry picked from commit e9d2509926445fd
Conflicts: glance_
(cherry picked from commit 28301829777d4b1
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/yoga) | #188 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 111b3931a2db1d5
Author: Gorka Eguileor <email address hidden>
Date: Wed Mar 1 13:08:16 2023 +0100
Support force disconnect for FC
This patch adds support for the force and ignore_errors on the
disconnect_
connector.
Related-Bug: #2004555
Change-Id: Ia74ecfba03ba23
(cherry picked from commit 570df49db9de303
Conflicts:
tags: | added: in-stable-yoga |
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master) | #189 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit db455548a12beac
Author: melanie witt <email address hidden>
Date: Wed Feb 15 22:37:40 2023 +0000
Use force=True for os-brick disconnect during delete
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map to volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8
Changed in nova: | |
status: | In Progress → Fix Released |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master) | #190 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 41c64b94b0af333
Author: melanie witt <email address hidden>
Date: Tue May 9 03:11:25 2023 +0000
Enable use of service user token with admin context
When the [service_user] section is configured in nova.conf, nova will
have the ability to send a service user token alongside the user's
token. The service user token is sent when nova calls other services'
REST APIs to authenticate as a service, and service calls can sometimes
have elevated privileges.
Currently, nova does not however have the ability to send a service user
token with an admin context. This means that when nova makes REST API
calls to other services with an anonymous admin RequestContext (such as
in nova-manage or periodic tasks), it will not be authenticated as a
service.
This adds a keyword argument to service_
enable callers to provide a user_auth object instead of attempting to
extract the user_auth from the RequestContext.
The cinder and neutron client modules are also adjusted to make use of
the new user_auth keyword argument so that nova calls made with
anonymous admin request contexts can authenticate as a service when
configured.
Related-Bug: #2004555
Change-Id: I14df2d55f4b2f0
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/zed) | #191 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit e00d3ca753db6f6
Author: Gorka Eguileor <email address hidden>
Date: Wed Mar 1 13:08:16 2023 +0100
Support force disconnect for FC
This patch adds support for the force and ignore_errors on the
disconnect_
connector.
Related-Bug: #2004555
Change-Id: Ia74ecfba03ba23
(cherry picked from commit 570df49db9de303
tags: | added: in-stable-zed |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to glance_store (stable/zed) | #192 |
Related fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to glance_store (stable/yoga) | #193 |
Related fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/2023.1) | #194 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/2023.1
commit efb01985db88d63
Author: melanie witt <email address hidden>
Date: Wed Feb 15 22:37:40 2023 +0000
Use force=True for os-brick disconnect during delete
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map to volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8
(cherry picked from commit db455548a12beac
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master) | #195 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 6df1839bdf28810
Author: Gorka Eguileor <email address hidden>
Date: Thu Feb 16 15:57:15 2023 +0100
Reject unsafe delete attachment calls
Due to how the Linux SCSI kernel driver works there are some storage
systems, such as iSCSI with shared targets, where a normal user can
access other projects' volume data connected to the same compute host
using the attachments REST API.
This affects both single and multi-pathed connections.
To prevent users from doing this, unintentionally or maliciously,
cinder-api will now reject some delete attachment requests that are
deemed unsafe.
Cinder will process the delete attachment request normally in the
following cases:
- The request comes from an OpenStack service that is sending the
service token that has one of the roles in `service_
- Attachment doesn't have an instance_uuid value
- The instance for the attachment doesn't exist in Nova
- According to Nova the volume is not connected to the instance
- Nova is not using this attachment record
There are 3 operations in the actions REST API endpoint that can be used
for an attack:
- `os-terminate_
- `os-detach`: Detach a volume
- `os-force_detach`: Force detach a volume
In this endpoint we just won't allow most requests not coming from a
service. The rules we apply are the same as for attachment delete
explained earlier, but in this case we may not have the attachment id
and be more restrictive. This should not be a problem for normal
operations because:
- Cinder backup doesn't use the REST API but RPC calls via RabbitMQ
- Glance doesn't use this interface anymore
Checking whether it's a service or not is done at the cinder-api level
by checking that the service user that made the call has at least one of
the roles in the `service_
retrieved from keystone by the keystone middleware using the value of
the "X-Service-Token" header.
If Cinder is configured with `service_
an attacker provides non-service valid credentials the service will
return a 401 error, otherwise it'll return 409 as if a normal user had
made the call without the service token.
Closes-Bug: #2004555
Change-Id: I612905a1bf4a17
Changed in cinder: | |
status: | In Progress → Fix Released |
Maksim Malchuk (mmalchuk) wrote : | #196 |
Related fix proposed to branch: master
Review: https:/
Changed in kolla-ansible: | |
status: | New → In Progress |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master) | #197 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master) | #198 |
Change abandoned by "Sven Kieske <email address hidden>" on branch: master
Review: https:/
Reason: duplicate of https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/2023.1) | #199 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/2023.1
commit dd6010a9f7bf8cb
Author: Gorka Eguileor <email address hidden>
Date: Thu Feb 16 15:57:15 2023 +0100
Reject unsafe delete attachment calls
Due to how the Linux SCSI kernel driver works there are some storage
systems, such as iSCSI with shared targets, where a normal user can
access other projects' volume data connected to the same compute host
using the attachments REST API.
This affects both single and multi-pathed connections.
To prevent users from doing this, unintentionally or maliciously,
cinder-api will now reject some delete attachment requests that are
deemed unsafe.
Cinder will process the delete attachment request normally in the
following cases:
- The request comes from an OpenStack service that is sending the
service token that has one of the roles in `service_
- Attachment doesn't have an instance_uuid value
- The instance for the attachment doesn't exist in Nova
- According to Nova the volume is not connected to the instance
- Nova is not using this attachment record
There are 3 operations in the actions REST API endpoint that can be used
for an attack:
- `os-terminate_
- `os-detach`: Detach a volume
- `os-force_detach`: Force detach a volume
In this endpoint we just won't allow most requests not coming from a
service. The rules we apply are the same as for attachment delete
explained earlier, but in this case we may not have the attachment id
and be more restrictive. This should not be a problem for normal
operations because:
- Cinder backup doesn't use the REST API but RPC calls via RabbitMQ
- Glance doesn't use this interface
Checking whether it's a service or not is done at the cinder-api level
by checking that the service user that made the call has at least one of
the roles in the `service_
retrieved from keystone by the keystone middleware using the value of
the "X-Service-Token" header.
If Cinder is configured with `service_
an attacker provides non-service valid credentials the service will
return a 401 error, otherwise it'll return 409 as if a normal user had
made the call without the service token.
Closes-Bug: #2004555
Change-Id: I612905a1bf4a17
(cherry picked from commit 6df1839bdf28810
Conflicts:
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/2023.1) | #200 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/2023.1
commit 1f781423ee4224c
Author: melanie witt <email address hidden>
Date: Tue May 9 03:11:25 2023 +0000
Enable use of service user token with admin context
When the [service_user] section is configured in nova.conf, nova will
have the ability to send a service user token alongside the user's
token. The service user token is sent when nova calls other services'
REST APIs to authenticate as a service, and service calls can sometimes
have elevated privileges.
Currently, nova does not however have the ability to send a service user
token with an admin context. This means that when nova makes REST API
calls to other services with an anonymous admin RequestContext (such as
in nova-manage or periodic tasks), it will not be authenticated as a
service.
This adds a keyword argument to service_
enable callers to provide a user_auth object instead of attempting to
extract the user_auth from the RequestContext.
The cinder and neutron client modules are also adjusted to make use of
the new user_auth keyword argument so that nova calls made with
anonymous admin request contexts can authenticate as a service when
configured.
Related-Bug: #2004555
Change-Id: I14df2d55f4b2f0
(cherry picked from commit 41c64b94b0af333
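As a rough illustration of the mechanism this change builds on (not nova's actual code), keystoneauth1 can wrap a user auth plugin with a service auth plugin so that every request carries both X-Auth-Token and X-Service-Token; all credentials below are hypothetical.

from keystoneauth1 import loading, service_token, session
from keystoneauth1.identity import v3

# Hypothetical end-user token plugin (in nova this normally comes from the
# incoming RequestContext, or from the new user_auth keyword argument).
user_auth = v3.Token(auth_url='https://keystone.example.com/v3',
                     token='gAAAAA-user',
                     project_name='demo', project_domain_name='Default')

# Hypothetical service-user credentials, as configured for the service user.
service_auth = loading.get_plugin_loader('password').load_from_options(
    auth_url='https://keystone.example.com/v3',
    username='nova', password='secret', user_domain_name='Default',
    project_name='service', project_domain_name='Default')

# Requests made through this session send the user token in X-Auth-Token
# and the service user's token in X-Service-Token.
wrapped = service_token.ServiceTokenAuthWrapper(user_auth=user_auth,
                                                service_auth=service_auth)
sess = session.Session(auth=wrapped)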
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to glance_store (master) | #201 |
Related fix proposed to branch: master
Review: https:/
Jeremy Stanley (fungi) wrote : | #202 |
I was contacted privately by an operator who read the advisory and was unable to reproduce the failure in their iSCSI-based deployment. They suspect that the fact they're not relying on multipathd is protecting them from the vulnerability. Is anyone able to confirm whether this affects iSCSI environments without multipathd? If it doesn't, I'll look into issuing an errata update clarifying the scope of the vulnerability further.
Dan Smith (danms) wrote : | #203 |
I think it does *not* depend on multipathd, but as noted in the text, it doesn't apply to *all* iSCSI deployments for various reasons.
Gorka Eguileor (gorka) wrote : | #204 |
I just realized that for iSCSI-based systems only those using "shared targets" are affected, as I mentioned in comment #57. We forgot to mention it in the final errata.
Regarding multipathing: when multipathing is in use there are additional issues that can lead to leaks, and we expect all production environments to use multipathing.
It could also be that they haven't checked it properly, or that their storage system issues the "Power-on or device reset" Unit Attention event that prevents the issue from happening, as I observed on an HPE 3PAR FC system.
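For anyone trying to work out whether their backend falls into the affected "shared targets" category, the Block Storage API exposes a shared_targets field on volumes from microversion 3.48 onward; a hedged sketch with python-cinderclient, using made-up credentials and volume ID:

from cinderclient import client
from keystoneauth1.identity import v3
from keystoneauth1 import session

# Hypothetical credentials; any authenticated session will do.
auth = v3.Password(auth_url='https://keystone.example.com/v3',
                   username='admin', password='secret',
                   user_domain_name='Default',
                   project_name='admin', project_domain_name='Default')
sess = session.Session(auth=auth)

# Microversion 3.48 or later is needed for the shared_targets field.
cinder = client.Client('3.48', session=sess)
vol = cinder.volumes.get('9f1c4a2e-0000-0000-0000-000000000000')  # hypothetical ID
print(vol.shared_targets)  # True when the backend relies on shared targets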
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/zed) | #205 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit cb4682fb8369122
Author: Gorka Eguileor <email address hidden>
Date: Thu Feb 16 15:57:15 2023 +0100
Reject unsafe delete attachment calls
Due to how the Linux SCSI kernel driver works there are some storage
systems, such as iSCSI with shared targets, where a normal user can
access other projects' volume data connected to the same compute host
using the attachments REST API.
This affects both single and multi-pathed connections.
To prevent users from doing this, unintentionally or maliciously,
cinder-api will now reject some delete attachment requests that are
deemed unsafe.
Cinder will process the delete attachment request normally in the
following cases:
- The request comes from an OpenStack service that is sending the
service token that has one of the roles in `service_
- Attachment doesn't have an instance_uuid value
- The instance for the attachment doesn't exist in Nova
- According to Nova the volume is not connected to the instance
- Nova is not using this attachment record
There are 3 operations in the actions REST API endpoint that can be used
for an attack:
- `os-terminate_
- `os-detach`: Detach a volume
- `os-force_detach`: Force detach a volume
In this endpoint we just won't allow anything that is not coming from a
service. This should not be a problem because:
- Cinder backup doesn't use the REST API but RPC calls via RabbitMQ
- Glance doesn't use this interface
Checking whether it's a service or not is done at the cinder-api level
by checking that the service user that made the call has at least one of
the roles in the `service_
retrieved from keystone by the keystone middleware using the value of
the "X-Service-Token" header.
If Cinder is configured with `service_
an attacker provides non-service valid credentials the service will
return a 401 error, otherwise it'll return 409 as if a normal user had
made the call without the service token.
Closes-Bug: #2004555
Change-Id: I612905a1bf4a17
(cherry picked from commit 6df1839bdf28810
Conflicts:
(cherry picked from commit dd6010a9f7bf8cb
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/yoga) | #206 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit a66f4afa22fc5a0
Author: Gorka Eguileor <email address hidden>
Date: Thu Feb 16 15:57:15 2023 +0100
Reject unsafe delete attachment calls
Due to how the Linux SCSI kernel driver works there are some storage
systems, such as iSCSI with shared targets, where a normal user can
access other projects' volume data connected to the same compute host
using the attachments REST API.
This affects both single and multi-pathed connections.
To prevent users from doing this, unintentionally or maliciously,
cinder-api will now reject some delete attachment requests that are
deemed unsafe.
Cinder will process the delete attachment request normally in the
following cases:
- The request comes from an OpenStack service that is sending the
service token that has one of the roles in `service_
- Attachment doesn't have an instance_uuid value
- The instance for the attachment doesn't exist in Nova
- According to Nova the volume is not connected to the instance
- Nova is not using this attachment record
There are 3 operations in the actions REST API endpoint that can be used
for an attack:
- `os-terminate_
- `os-detach`: Detach a volume
- `os-force_detach`: Force detach a volume
In this endpoint we just won't allow most requests not coming from a
service. The rules we apply are the same as for attachment delete
explained earlier, but in this case we may not have the attachment id
and be more restrictive. This should not be a problem for normal
operations because:
- Cinder backup doesn't use the REST API but RPC calls via RabbitMQ
- Glance doesn't use this interface
Checking whether it's a service or not is done at the cinder-api level
by checking that the service user that made the call has at least one of
the roles in the `service_
retrieved from keystone by the keystone middleware using the value of
the "X-Service-Token" header.
If Cinder is configured with `service_
an attacker provides non-service valid credentials the service will
return a 401 error, otherwise it'll return 409 as if a normal user had
made the call without the service token.
Closes-Bug: #2004555
Change-Id: I612905a1bf4a17
(cherry picked from commit 6df1839bdf28810
Conflicts:
(cherry picked from commit dd6010a9f7bf8cb
(cherry picked from commit cb4682fb8369122
Conflicts:
Jeremy Stanley (fungi) wrote : | #207 |
Can someone in Red Hat Security please switch the assigned CVE to published status? A downside of the VMT not getting CVE assignments directly through MITRE is that MITRE apparently also refuses to process requests to switch them to public once we publish our advisories. It would be very nice for this to not still be in "reserved" state, as we're now 48 hours past the original publication.
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance_store 4.4.0 | #208 |
This issue was fixed in the openstack/
Nick Tait (nickthetait) wrote : | #209 |
I did submit the record to MITRE yesterday; it's waiting on them to be reviewed and posted.
Jeremy Stanley (fungi) wrote : | #210 |
Thanks, Nick. I notified MITRE about the publication on Wednesday when we posted it (per our process, this normally works when we were the ones who originally requested the assignment from them), but they responded today telling me to talk to you, so I suppose it's in limbo for the time being.
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/wallaby) | #211 |
Related fix proposed to branch: stable/wallaby
Review: https:/
Nick Tait (nickthetait) wrote : | #212 |
Okay, it's live now: https:/
Zakhar Kirpichenko (kzakhar) wrote : | #213 |
The following packages were updated on Wallaby compute nodes:
python3-nova:amd64 (3:23.2.
python3-
nova-compute-
nova-common:amd64 (3:23.2.
os-brick-
nova-compute-
nova-compute:amd64 (3:23.2.
nova-compute is now unable to detach volumes from instances:
2023-05-13 05:53:00.128 3219193 ERROR oslo_messaging. [traceback truncated; each line of the stack trace repeats this prefix]
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/zed) | #214 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit 8b4b99149a35663
Author: melanie witt <email address hidden>
Date: Wed Feb 15 22:37:40 2023 +0000
Use force=True for os-brick disconnect during delete
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map to volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8
(cherry picked from commit db455548a12beac
(cherry picked from commit efb01985db88d63
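For context, the os-brick call the commit message describes looks roughly like this; a simplified sketch rather than nova's actual detach path, with placeholder connection details.

from os_brick.initiator import connector

# Build a connector for this host; protocol and root helper are placeholders.
conn = connector.InitiatorConnector.factory('iSCSI', 'sudo',
                                            use_multipath=True)

# connection_properties and device_info normally come from the Cinder
# attachment (initialize_connection) data; shown here as placeholders.
connection_properties = {'target_portal': '192.0.2.10:3260',
                         'target_iqn': 'iqn.2004-04.com.example:target0',
                         'target_lun': 1}
device_info = {'path': '/dev/sdx'}

# force=True lets os-brick remove the devices from the host even if the
# flush fails; it still attempts a graceful flush/disconnect first.
conn.disconnect_volume(connection_properties, device_info,
                       force=True, ignore_errors=True)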
Jeremy Stanley (fungi) wrote : | #215 |
Zakhar: Make sure your packages include the nova patch for OSSA-2023-003 errata #1: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/zed) | #216 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit 0d6dd6c67f56c9d
Author: melanie witt <email address hidden>
Date: Tue May 9 03:11:25 2023 +0000
Enable use of service user token with admin context
When the [service_user] section is configured in nova.conf, nova will
have the ability to send a service user token alongside the user's
token. The service user token is sent when nova calls other services'
REST APIs to authenticate as a service, and service calls can sometimes
have elevated privileges.
Currently, nova does not however have the ability to send a service user
token with an admin context. This means that when nova makes REST API
calls to other services with an anonymous admin RequestContext (such as
in nova-manage or periodic tasks), it will not be authenticated as a
service.
This adds a keyword argument to service_
enable callers to provide a user_auth object instead of attempting to
extract the user_auth from the RequestContext.
The cinder and neutron client modules are also adjusted to make use of
the new user_auth keyword argument so that nova calls made with
anonymous admin request contexts can authenticate as a service when
configured.
Related-Bug: #2004555
Change-Id: I14df2d55f4b2f0
(cherry picked from commit 41c64b94b0af333
(cherry picked from commit 1f781423ee4224c
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/xena) | #217 |
Related fix proposed to branch: stable/xena
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/wallaby) | #218 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit a77ea13ef199154
Author: Sean Mooney <email address hidden>
Date: Wed May 10 20:58:47 2023 +0100
always add service_user section to nova.conf
As of I3629b84d3255a8
now requires the service_user section to be configured
to address CVE-2023-2088. This change adds
the service user section to the nova.conf template in
the nova and nova-cell roles.
Related-Bug: #2004555
Signed-off-by: Sven Kieske <email address hidden>
Change-Id: I2189dafca070ac
tags: | added: in-stable-wallaby |
Dan Smith (danms) wrote : | #219 |
Zakhar, which volume driver are you using?
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/xena) | #220 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit 03c12abbcc107bf
Author: Sean Mooney <email address hidden>
Date: Wed May 10 20:58:47 2023 +0100
always add service_user section to nova.conf
As of I3629b84d3255a8
now requires the service_user section to be configured
to address CVE-2023-2088. This change adds
the service user section to the nova.conf template in
the nova and nova-cell roles.
Related-Bug: #2004555
Signed-off-by: Sven Kieske <email address hidden>
Change-Id: I2189dafca070ac
(cherry picked from commit a77ea13ef199154
tags: | added: in-stable-xena |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/yoga) | #221 |
Related fix proposed to branch: stable/yoga
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga) | #222 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 4d8efa2d196f72f
Author: melanie witt <email address hidden>
Date: Wed Feb 15 22:37:40 2023 +0000
Use force=True for os-brick disconnect during delete
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map to volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8
(cherry picked from commit db455548a12beac
(cherry picked from commit efb01985db88d63
(cherry picked from commit 8b4b99149a35663
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/yoga) | #223 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 98c3e3707c08a07
Author: melanie witt <email address hidden>
Date: Tue May 9 03:11:25 2023 +0000
Enable use of service user token with admin context
When the [service_user] section is configured in nova.conf, nova will
have the ability to send a service user token alongside the user's
token. The service user token is sent when nova calls other services'
REST APIs to authenticate as a service, and service calls can sometimes
have elevated privileges.
Currently, nova does not however have the ability to send a service user
token with an admin context. This means that when nova makes REST API
calls to other services with an anonymous admin RequestContext (such as
in nova-manage or periodic tasks), it will not be authenticated as a
service.
This adds a keyword argument to service_
enable callers to provide a user_auth object instead of attempting to
extract the user_auth from the RequestContext.
The cinder and neutron client modules are also adjusted to make use of
the new user_auth keyword argument so that nova calls made with
anonymous admin request contexts can authenticate as a service when
configured.
Related-Bug: #2004555
Change-Id: I14df2d55f4b2f0
(cherry picked from commit 41c64b94b0af333
(cherry picked from commit 1f781423ee4224c
(cherry picked from commit 0d6dd6c67f56c9d
melanie witt (melwitt) wrote : | #224 |
> Looks like it doesn't know about the "force" keyword that's being passed.
Hi Zakhar,
I checked through and found one missing kwarg for the LibvirtNetVolum
I had incorrectly thought the Xena and Wallaby patches were identical, but there is a slight difference. Apologies for that.
I have updated the gerrit patch review with the change:
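The traceback Zakhar hit is the typical symptom of a driver method that has not yet gained the new keyword argument; a tiny illustration (not nova code) of that failure mode:

def disconnect_volume(connection_info, instance):
    # A driver method predating the fix: it does not accept 'force'.
    pass

try:
    # The patched compute code now passes force=True on instance delete.
    disconnect_volume({'data': {}}, 'instance-0001', force=True)
except TypeError as exc:
    print(exc)  # disconnect_volume() got an unexpected keyword argument 'force'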
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/yoga) | #225 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit cb105dc293ff1cd
Author: Sean Mooney <email address hidden>
Date: Wed May 10 20:58:47 2023 +0100
always add service_user section to nova.conf
As of I3629b84d3255a8
now requires the service_user section to be configured
to address CVE-2023-2088. This change adds
the service user section to the nova.conf template in
the nova and nova-cell roles.
Related-Bug: #2004555
Signed-off-by: Sven Kieske <email address hidden>
Change-Id: I2189dafca070ac
(cherry picked from commit a77ea13ef199154
(cherry picked from commit 03c12abbcc107bf
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/zed) | #226 |
Related fix proposed to branch: stable/zed
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/zed) | #227 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit efe6650d09441b0
Author: Sean Mooney <email address hidden>
Date: Wed May 10 20:58:47 2023 +0100
always add service_user section to nova.conf
As of I3629b84d3255a8
now requires the service_user section to be configured
to address CVE-2023-2088. This change adds
the service user section to the nova.conf template in
the nova and nova-cell roles.
Related-Bug: #2004555
Signed-off-by: Sven Kieske <email address hidden>
Change-Id: I2189dafca070ac
(cherry picked from commit a77ea13ef199154
(cherry picked from commit 03c12abbcc107bf
(cherry picked from commit cb105dc293ff1cd
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (master) | #228 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit ddadaa282e72cc4
Author: Sean Mooney <email address hidden>
Date: Wed May 10 20:58:47 2023 +0100
always add service_user section to nova.conf
As of I3629b84d3255a8
now requires the service_user section to be configured
to address CVE-2023-2088. This change adds
the service user section to the nova.conf template in
the nova and nova-cell roles.
Related-Bug: #2004555
Signed-off-by: Sven Kieske <email address hidden>
Change-Id: I2189dafca070ac
(cherry picked from commit a77ea13ef199154
(cherry picked from commit 03c12abbcc107bf
(cherry picked from commit cb105dc293ff1cd
(cherry picked from commit efe6650d09441b0
OpenStack Infra (hudson-openstack) wrote : Related fix merged to glance_store (master) | #229 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit ce86bf38239e396
Author: Brian Rosmaita <email address hidden>
Date: Thu May 11 12:12:51 2023 -0400
Update 'extras' for cinder driver
Raise the min version of os-brick to include the fix for
CVE-2023-2088.
Change-Id: If3dba01d5cbb3a
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to glance_store (stable/2023.1) | #230 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/2023.1
commit 4f4de2348f38a62
Author: Brian Rosmaita <email address hidden>
Date: Wed May 10 15:49:52 2023 -0400
Update 'extras' for cinder driver
Raise the min version of os-brick to include the fix for
CVE-2023-2088.
Change-Id: I4433df9414129a
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to glance_store (stable/zed) | #231 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/zed
commit 02ab740fbf2a2fb
Author: Brian Rosmaita <email address hidden>
Date: Wed May 10 20:13:57 2023 -0400
Update 'extras' for cinder driver
Raise the min version of os-brick to include the fix for
CVE-2023-2088.
Change-Id: I6c55fc943d26a8
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to glance_store (stable/yoga) | #232 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 712eb6df3b79009
Author: Brian Rosmaita <email address hidden>
Date: Wed May 10 20:17:36 2023 -0400
Update 'extras' for cinder driver
Raise the min version of os-brick to include the fix for
CVE-2023-2088.
Change-Id: Ic8bc4d7ae7e38e
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena) | #233 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit b574901500d9364
Author: melanie witt <email address hidden>
Date: Wed Feb 15 22:37:40 2023 +0000
Use force=True for os-brick disconnect during delete
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map to volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
Conflicts:
NOTE(melwitt): The conflicts are because change
Ic314b26695
not in Xena.
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8
(cherry picked from commit db455548a12beac
(cherry picked from commit efb01985db88d63
(cherry picked from commit 8b4b99149a35663
(cherry picked from commit 4d8efa2d196f72f
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/xena) | #234 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit 6cc4e7fb9ac4960
Author: melanie witt <email address hidden>
Date: Tue May 9 03:11:25 2023 +0000
Enable use of service user token with admin context
When the [service_user] section is configured in nova.conf, nova will
have the ability to send a service user token alongside the user's
token. The service user token is sent when nova calls other services'
REST APIs to authenticate as a service, and service calls can sometimes
have elevated privileges.
Currently, nova does not however have the ability to send a service user
token with an admin context. This means that when nova makes REST API
calls to other services with an anonymous admin RequestContext (such as
in nova-manage or periodic tasks), it will not be authenticated as a
service.
This adds a keyword argument to service_
enable callers to provide a user_auth object instead of attempting to
extract the user_auth from the RequestContext.
The cinder and neutron client modules are also adjusted to make use of
the new user_auth keyword argument so that nova calls made with
anonymous admin request contexts can authenticate as a service when
configured.
Related-Bug: #2004555
Change-Id: I14df2d55f4b2f0
(cherry picked from commit 41c64b94b0af333
(cherry picked from commit 1f781423ee4224c
(cherry picked from commit 0d6dd6c67f56c9d
(cherry picked from commit 98c3e3707c08a07
Zakhar Kirpichenko (kzakhar) wrote : | #235 |
I apologize for the late response. My volumes are Ceph RBD; I'm not sure which driver Nova uses internally.
Thanks for your feedback and fixes, everyone!
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance_store 4.3.1 | #236 |
This issue was fixed in the openstack/
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance_store 3.0.1 | #237 |
This issue was fixed in the openstack/
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance_store 4.1.1 | #238 |
This issue was fixed in the openstack/
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.2.0 | #239 |
This issue was fixed in the openstack/nova 25.2.0 release.
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.2.0 | #240 |
This issue was fixed in the openstack/nova 26.2.0 release.
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 22.1.0 | #241 |
This issue was fixed in the openstack/cinder 22.1.0 release.
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.1.0 | #242 |
This issue was fixed in the openstack/nova 27.1.0 release.
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 20.3.0 | #243 |
This issue was fixed in the openstack/cinder 20.3.0 release.
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 21.3.0 | #244 |
This issue was fixed in the openstack/cinder 21.3.0 release.
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master) | #245 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby) | #246 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 5b4cb92aa8adab2
Author: melanie witt <email address hidden>
Date: Wed Feb 15 22:37:40 2023 +0000
Use force=True for os-brick disconnect during delete
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map to volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
Conflicts:
NOTE(melwitt): The conflicts are because change
Ic314b26695
not in Xena.
NOTE(melwitt): The difference from the cherry picked change is because
of the following additional affected volume driver in Wallaby:
* nova/virt/
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8
(cherry picked from commit db455548a12beac
(cherry picked from commit efb01985db88d63
(cherry picked from commit 8b4b99149a35663
(cherry picked from commit 4d8efa2d196f72f
(cherry picked from commit b574901500d9364
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ossa (master) | #247 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 136b24c5ddfaff6
Author: Jeremy Stanley <email address hidden>
Date: Mon May 15 18:52:55 2023 +0000
Add errata 3 for OSSA-2023-003
Since this only impacts the fix for stable/wallaby which is not
under normal maintenance, we'll dispose with the usual errata
announcements.
Change-Id: Ibd0d1d796012fb
Related-Bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/wallaby) | #248 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 48150a6fbab7e2a
Author: melanie witt <email address hidden>
Date: Tue May 9 03:11:25 2023 +0000
Enable use of service user token with admin context
When the [service_user] section is configured in nova.conf, nova will
have the ability to send a service user token alongside the user's
token. The service user token is sent when nova calls other services'
REST APIs to authenticate as a service, and service calls can sometimes
have elevated privileges.
Currently, nova does not however have the ability to send a service user
token with an admin context. This means that when nova makes REST API
calls to other services with an anonymous admin RequestContext (such as
in nova-manage or periodic tasks), it will not be authenticated as a
service.
This adds a keyword argument to service_
enable callers to provide a user_auth object instead of attempting to
extract the user_auth from the RequestContext.
The cinder and neutron client modules are also adjusted to make use of
the new user_auth keyword argument so that nova calls made with
anonymous admin request contexts can authenticate as a service when
configured.
Related-Bug: #2004555
Change-Id: I14df2d55f4b2f0
(cherry picked from commit 41c64b94b0af333
(cherry picked from commit 1f781423ee4224c
(cherry picked from commit 0d6dd6c67f56c9d
(cherry picked from commit 98c3e3707c08a07
(cherry picked from commit 6cc4e7fb9ac4960
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (master) | #249 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/xena) | #250 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit 68fdc323369943f
Author: Gorka Eguileor <email address hidden>
Date: Thu Feb 16 15:57:15 2023 +0100
Reject unsafe delete attachment calls
Due to how the Linux SCSI kernel driver works there are some storage
systems, such as iSCSI with shared targets, where a normal user can
access other projects' volume data connected to the same compute host
using the attachments REST API.
This affects both single and multi-pathed connections.
To prevent users from doing this, unintentionally or maliciously,
cinder-api will now reject some delete attachment requests that are
deemed unsafe.
Cinder will process the delete attachment request normally in the
following cases:
- The request comes from an OpenStack service that is sending the
service token that has one of the roles in `service_
- Attachment doesn't have an instance_uuid value
- The instance for the attachment doesn't exist in Nova
- According to Nova the volume is not connected to the instance
- Nova is not using this attachment record
There are 3 operations in the actions REST API endpoint that can be used
for an attack:
- `os-terminate_
- `os-detach`: Detach a volume
- `os-force_detach`: Force detach a volume
In this endpoint we just won't allow most requests not coming from a
service. The rules we apply are the same as for attachment delete
explained earlier, but in this case we may not have the attachment id
and be more restrictive. This should not be a problem for normal
operations because:
- Cinder backup doesn't use the REST API but RPC calls via RabbitMQ
- Glance doesn't use this interface
Checking whether it's a service or not is done at the cinder-api level
by checking that the service user that made the call has at least one of
the roles in the `service_
retrieved from keystone by the keystone middleware using the value of
the "X-Service-Token" header.
If Cinder is configured with `service_
an attacker provides non-service valid credentials the service will
return a 401 error, otherwise it'll return 409 as if a normal user had
made the call without the service token.
Closes-Bug: #2004555
Change-Id: I612905a1bf4a17
(cherry picked from commit 6df1839bdf28810
Conflicts:
(cherry picked from commit dd6010a9f7bf8cb
(cherry picked from commit cb4682fb8369122
Conflicts:
(cherry picked from commit a66f4afa22fc5a0
Conflicts:
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master) | #251 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 1101402b8fda742
Author: Gorka Eguileor <email address hidden>
Date: Wed May 17 13:42:41 2023 +0200
Doc: Improve service token
This patch extends a bit the documentation for the service token
configuration, since there have been complains about its clarity and
completeness.
Related-Bug: #2004555
Change-Id: Id89497d068c164
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria) | #252 |
Fix proposed to branch: stable/victoria
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/xena) | #253 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/xena
commit 70493735d2f9952
Author: Gorka Eguileor <email address hidden>
Date: Wed Mar 1 13:08:16 2023 +0100
Support force disconnect for FC
This patch adds support for the force and ignore_errors on the
disconnect_
connector.
Related-Bug: #2004555
Change-Id: Ia74ecfba03ba23
(cherry picked from commit 570df49db9de303
Conflicts:
(cherry picked from commit 111b3931a2db1d5
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/wallaby) | #254 |
Related fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/victoria) | #255 |
Related fix proposed to branch: stable/victoria
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/ussuri) | #256 |
Related fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/train) | #257 |
Related fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/wallaby) | #258 |
Related fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/victoria) | #259 |
Related fix proposed to branch: stable/victoria
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/ussuri) | #260 |
Related fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/train) | #261 |
Related fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/wallaby) | #262 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 5dcda6b961fa765
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:29:20 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/wallaby, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the os-brick code.
Change-Id: I6345a5a3a7c08c
Related-bug: #2004555
tags: | added: in-stable-ussuri |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/ussuri) | #263 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/ussuri
commit 2845871c87fc4e6
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:29:20 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/ussuri, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the os-brick code.
Change-Id: Ie54cfc6697b4e5
Related-bug: #2004555
tags: | added: in-stable-victoria |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/victoria) | #264 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/victoria
commit 78a0ea24a586139
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:29:20 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/victoria, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the os-brick code.
Change-Id: I37da3be26c7099
Related-bug: #2004555
tags: | added: in-stable-train |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/train) | #265 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/train
commit 0cc7019eec2b58f
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:29:20 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/train, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the os-brick code.
Change-Id: I6d04c164521b72
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/train) | #266 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/train
commit 299553a4fe281cd
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:01:12 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/train, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the cinder code.
Change-Id: I1621e3d3d9272a
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/victoria) | #267 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/victoria
commit 63d7848a9548180
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:01:12 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/victoria, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the cinder code.
Change-Id: I2866b0ca1511a5
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/ussuri) | #268 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/ussuri
commit 60f705d722fc6b7
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:01:12 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/ussuri, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the cinder code.
Change-Id: I5c55ab7ca6c85d
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/wallaby) | #269 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 2fef6c41fa8c5ea
Author: Brian Rosmaita <email address hidden>
Date: Wed Jun 7 18:01:12 2023 -0400
[stable-
The Cinder project team does not intend to backport a fix for
CVE-2023-2088 to stable/wallaby, so add a warning to the README
so that consumers are aware of the vulnerability of this branch
of the cinder code.
Change-Id: I83b52320762505
Related-bug: #2004555
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 23.0.0.0rc1 | #270 |
This issue was fixed in the openstack/cinder 23.0.0.0rc1 release candidate.
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 28.0.0.0rc1 | #271 |
This issue was fixed in the openstack/nova 28.0.0.0rc1 release candidate.
Since this report concerns a possible security risk, an incomplete
security advisory task has been added while the core security
reviewers for the affected project or projects confirm the bug and
discuss the scope of any vulnerability along with potential
solutions.