Cannot mount old encrypted volume to an instance with Invalid password, cannot unlock any keyslot

Bug #1996622 reported by Jan Wasilewski
This bug affects 6 people
Affects   Status  Importance  Assigned to  Milestone
Barbican  New     Undecided   Unassigned

Bug Description

Description
===========
After upgrading barbican from Ussuri to Yoga, encrypted volumes created before the upgrade can no longer be attached to any instance. The attach fails with: "libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': Invalid password, cannot unlock any keyslot". Encrypted volumes created after the upgrade attach to instances without this error.
So far there is no workaround. I tried detaching and re-attaching the volume, and converting the volume to an image and back to a volume, but neither helped.

Steps to reproduce
==================
1. Have an encrypted volume that was created before the upgrade
2. Execute command:
openstack server add volume my-new-instance my-old-encrypted-volume
3. Check attachments details by:
openstack server show my-new-instance

Expected result
===============
my-old-encrypted-volume is visible in the volumes_attached list. Inside the VM OS, the newly attached drive should be visible.

Actual result
=============
my-old-encrypted-volume is not visible in the volumes_attached list. During the attachment I can see these errors in the nova-compute logs: https://paste.openstack.org/show/bNbPOHiQJOq8OsKZ5Gn2/
Neither the barbican logs nor the cinder logs show any errors. Moreover, I can correctly retrieve the payload of the barbican secret that holds the passphrase for my-old-encrypted-volume with this command:
barbican secret get --payload_content_type application/octet-stream secret-id-and-href --file my_symmetric_key.key

The same procedure, executed for a freshly created volume, works fine: the new encrypted disk is visible inside the instance OS.

Environment
===========
1. Exact version of OpenStack you are running. See the following
# dpkg -l | grep nova
ii nova-api 2:21.2.4-0ubuntu1 all OpenStack Compute - API frontend
ii nova-common 2:21.2.4-0ubuntu1 all OpenStack Compute - common files
ii nova-conductor 2:21.2.4-0ubuntu1 all OpenStack Compute - conductor service
ii nova-novncproxy 2:21.2.4-0ubuntu1 all OpenStack Compute - NoVNC proxy
ii nova-scheduler 2:21.2.4-0ubuntu1 all OpenStack Compute - virtual machine scheduler
ii python3-nova 2:21.2.4-0ubuntu1 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:17.0.0-0ubuntu1 all client library for OpenStack Compute API - 3.x

# dpkg -l | grep barbican
ii barbican-api 2:14.0.0-0ubuntu1~cloud0 all OpenStack Key Management Service - API Server
ii barbican-common 2:14.0.0-0ubuntu1~cloud0 all OpenStack Key Management Service - common files
ii barbican-keystone-listener 2:14.0.0-0ubuntu1~cloud0 all OpenStack Key Management Service - Keystone Listener
ii barbican-worker 2:14.0.0-0ubuntu1~cloud0 all OpenStack Key Management Service - Worker Node
ii python3-barbican 2:14.0.0-0ubuntu1~cloud0 all OpenStack Key Management Service - Python 3 files
ii python3-barbicanclient 5.2.0-0ubuntu1~cloud0 all OpenStack Key Management API client - Python 3.x

2. Which hypervisor did you use?
Libvirt:
# dpkg -l | grep libvirt
ii libvirt-daemon 6.0.0-0ubuntu8.16 amd64 Virtualization daemon
ii libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.16 amd64 Virtualization daemon QEMU connection driver
ii libvirt-daemon-driver-storage-rbd 6.0.0-0ubuntu8.16 amd64 Virtualization daemon RBD storage driver
ii libvirt0:amd64 6.0.0-0ubuntu8.16 amd64 library for interfacing with different virtualization systems
ii python3-libvirt 6.1.0-1 amd64 libvirt Python 3 bindings

3. Which storage type did you use?
iSCSI Huawei Dorado

4. Which networking type did you use?
Neutron linuxbridge

Logs & Configs
==============
An error message from nova-compute log: https://paste.openstack.org/show/bNbPOHiQJOq8OsKZ5Gn2/

Tags: cinder volumes
description: updated
Jan Wasilewski (janwasilewski) wrote:

After further troubleshooting I realized that the issue is related to barbican. It seems that the key value inside Vault received an incorrect LUKS passphrase (I do not know how yet), so when the volume is migrated, nova receives a wrong LUKS passphrase and responds with an error, which is correct behavior. Either adding a new index to Vault was done incorrectly in the past (in the Ussuri version?) and has been fixed in the meantime, so it is no longer visible, or it is not fixed and a key value can somehow be retrieved in "modified" form later. As far as I can see, the key value in Vault has not been modified since the beginning, so it seems it was delivered to nova as a different value. So far I cannot find the cause in the code, but it is definitely not a nova issue.

affects: nova → barbican
Erwan MALIK (erwan-m) wrote:

We got the same kind of issue after upgrading barbican from Train to Xena.

You could check the following:
1. Get the secret ID of the LUKS volume that nova refuses to mount, either with cinder volume show or with:
select id, encryption_key_id from cinder.volumes where id = '<volume_uuid>';

2. List the attributes of your secret (you can compare them with those of a secret linked to a newly encrypted LUKS volume, i.e. one that works):
select * from barbican.secret_store_metadata where secret_id = '<secret_uuid>' and deleted = 0;
# (pay attention to secret_store_metadata.key = 'version')

3. Retrieve the secret's payload (into a file, or with "-p"):
openstack secret get --payload_content_type application/octet-stream <secret_id> --file test.key

4. Check the content of the retrieved payload:
- if you get binary data, it should work (your issue is not the one I describe here)
- if you get base64 data, this is the issue I describe here
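
The check in step 4 can be done by eye, but a small script makes it less ambiguous. The following is only a sketch: the helper names `looks_base64` and `describe_payload` are made up for this example, and it assumes that a payload which decodes cleanly under strict base64 validation is base64 text rather than a raw key.

```python
import base64
import binascii


def looks_base64(payload: bytes) -> bool:
    """Return True if payload is non-empty and strictly valid base64 text."""
    stripped = payload.strip()  # tolerate a trailing newline in the file
    if not stripped:
        return False
    try:
        # validate=True rejects any byte outside the base64 alphabet
        base64.b64decode(stripped, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def describe_payload(payload: bytes) -> str:
    """Summarize whether a retrieved payload is binary or base64 text."""
    if looks_base64(payload):
        decoded = base64.b64decode(payload.strip(), validate=True)
        return (f"base64 text ({len(payload)} bytes, "
                f"decodes to {len(decoded)} bytes)")
    return f"binary data ({len(payload)} bytes)"
```

For the file retrieved in step 3, this could be called as `describe_payload(open("test.key", "rb").read())`. A random binary key is very unlikely to pass the strict check by accident, but it is not impossible, so treat the result as a hint.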

If your retrieved payload is base64-encoded, chances are that you are hit by a double-encoding issue:
- your old secret's symmetric key is stored in base64
- barbican (wrongly) thinks that your secret is stored as binary/plaintext and b64encodes it (at this step the payload is thus base64 of base64 data)
- before providing you with the final payload, barbican b64decodes it... once...
In the end you get a base64-encoded key, and nova is unable to unlock the LUKS volume with it.
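
The three steps above can be reproduced with the standard library alone. This is an illustration of the encoding arithmetic only, not barbican code; the 32-byte key is a made-up placeholder.

```python
import base64

# Placeholder 32-byte symmetric key (assumption for illustration only).
raw_key = bytes(range(32))

# 1. The old secret is already stored base64-encoded.
stored = base64.b64encode(raw_key)

# 2. The legacy path treats it as raw data and encodes it again.
double_encoded = base64.b64encode(stored)

# 3. The payload is decoded once before it is returned.
returned = base64.b64decode(double_encoded)

# The caller receives the base64 text, not the raw key.
assert returned == stored
assert returned != raw_key

# Only a second decode recovers the original key.
assert base64.b64decode(returned) == raw_key
```

The returned value is 44 bytes of base64 text instead of the 32 raw bytes, which is consistent with the "Invalid password, cannot unlock any keyslot" error on the nova side.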

If this is your issue: activate debug mode in the barbican API, then try to migrate the VM with the "old" encrypted volume (the one with the issue).
See whether you get messages such as "Retrieving legacy secret" and/or "Encoding legacy Castellan-generated key".

You could check the code of castellan_secret_store.py (barbican_api if you use kolla) and look for the "get_secret" function (it checks whether your secret has a "version" attribute and calls _ensure_legacy_base64 if it is missing) and for "_ensure_legacy_base64":
"""
    def _ensure_legacy_base64(self, secret):
        """Ensure secret data is base64 encoded

        This method ensures that secrets that were stored prior to the fix
        for Story 2008335 are base64 encoded.
        """
        payload = secret.get_encoded()
        if isinstance(secret, key.Key):
            # Keys generated by Castellan are not base64-encoded.
            # Both symmetric and asymmetric keys returned by Castellan
            # are subclasses of key.Key
            LOG.debug("Encoding legacy Castellan-generated key")
            return base64.b64encode(payload)
        else:
            # Objects stored by Barbican are stored as opaque_data.OpaqueData
            # in Castellan. They should already be base64-encoded so we
            # check here to make sure.
            LOG.debug("Validating base64 encoding")
"""
[link to the code at the xena-eom tag]
https://opendev.org/openstack/barbican/src/tag/xena-eom/barbican/plugin/castellan_secret_store.py#L58-L84

Once you're there you could try this:
- patch this function and replace "return base64.b64encode(payload)" with the same try/except block used in the second branch (it tries to b64decode the payload and returns the payload on success and the b64encode of the payload otherwise...
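
The logic described in that suggestion can be sketched as a standalone function. This is not the barbican source and not a tested patch: the name `ensure_legacy_base64` and the use of strict validation (`validate=True`) are assumptions made for this example.

```python
import base64
import binascii


def ensure_legacy_base64(payload: bytes) -> bytes:
    """Return payload as base64, encoding it only if it is not base64 yet."""
    try:
        # Strict decode: raises if any byte is outside the base64 alphabet
        # or if the padding is wrong.
        base64.b64decode(payload, validate=True)
        return payload  # already base64, leave it untouched
    except (binascii.Error, ValueError):
        # Raw key material: encode it exactly once.
        return base64.b64encode(payload)
```

With this shape, a key that is already stored as base64 is passed through unchanged, so the single b64decode applied later yields the raw key. The trade-off is that a raw key which happens to be valid base64 would be misclassified; for random binary keys that is very unlikely, but it is a heuristic rather than a guarantee.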


Erwan MALIK (erwan-m) wrote:

If you encounter this issue after a migration to Xena and beyond (the code is currently in every branch from Xena to 2024.2 and master), see if you have this fix:
https://opendev.org/openstack/barbican/commit/b9daa100d03afda70e2ed9bfa91d0895fd385229#diff-e642a16db0b278d9ba1ea527def5f47b525e7002
(commit : b9daa100d03afda70e2ed9bfa91d0895fd385229)

It works well for new secrets/LUKS volumes, but it could break operations on old LUKS volumes.
