cinder-volume fails to allocate volume due to missing iscsi InitiatorName

Bug #1825809 reported by Pedro Guimarães
Affects                              Status        Importance  Assigned to  Milestone
OpenStack Cinder Charm               Fix Released  Medium      Unassigned
OpenStack Cinder Pure Storage Charm  Invalid       Undecided   Unassigned

Bug Description

Distro: bionic-rocky
Running OpenStack on top of OpenStack for lab purposes.

This bug is a follow-on to my investigations first described at: https://bugs.launchpad.net/charm-cinder/+bug/1825159/comments/2

I am deploying cinder-volume separately from the other cinder services, as explained at: https://jujucharms.com/cinder/

When attaching a successfully created LVM volume (verified via lsblk) with

openstack server add volume MACHINE VOL

cinder-volume fails because the "initiator" parameter is missing from the RPC call [1].

That happens even if I deploy cinder-volume directly to a machine, with no other charms or services alongside.

Looking at that machine, I can see that tgt is running but iscsid is inactive.
If I restart iscsid, its InitiatorName is generated in /etc/iscsi/initiatorname.iscsi.
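
For reference, a quick way to check that state on the cinder-volume unit (a sketch, assuming the default open-iscsi layout on bionic):

systemctl is-active tgt iscsid
# the InitiatorName only appears here once iscsid has actually started
cat /etc/iscsi/initiatorname.iscsi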

However, even after restarting the services in the following order:
systemctl restart iscsid
systemctl restart tgt
systemctl restart cinder-volume

I still see the same failure as [1].

I can force cinder-volume to work if I inject the InitiatorName from /etc/iscsi/initiatorname.iscsi as the "initiator" parameter on the RPC call.

To test that, I:
(1) stopped both the jujud and cinder-volume services;
(2) created a superuser on rabbitmq [2];
(3) ran openstack server add volume ..., which puts the message on rabbitmq;
(4) popped the original rabbitmq message and re-added it with my injected parameter, by running [3];
(5) restarted cinder-volume.

Steps (3)-(5) need to happen in less than 60s or the attachment_id table token will expire.

Then the volume gets successfully attached to my machine.
I still cannot find a viable way for cinder to pick up the InitiatorName in the first place using only the configuration described on the jujucharms page.
I know we can set specific flags to set the iSCSI initiator, but I would rather not do that outside our standard configs and relations; a manual workaround is sketched below.
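
For completeness, the manual workaround amounts to generating the IQN by hand before starting the services (a sketch, not what the charm does by default; iscsi-iname ships with the open-iscsi package):

# generate a unique IQN and write it where cinder/os-brick expect to read it
echo "InitiatorName=$(iscsi-iname)" | sudo tee /etc/iscsi/initiatorname.iscsi
sudo systemctl restart iscsid tgt cinder-volume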

[1] Full log: https://pastebin.canonical.com/p/4WnrKvKNvp/

2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/volume/manager.py", line 4328, in _connection_create
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server self.driver.validate_connector(connector)
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/volume/drivers/lvm.py", line 835, in validate_connector
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server return self.target_driver.validate_connector(connector)
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/volume/targets/iscsi.py", line 299, in validate_connector
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server raise exception.InvalidConnectorException(missing='initiator')
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server cinder.exception.InvalidConnectorException: Connector doesn't have required information: initiator
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/volume/manager.py", line 4411, in attachment_update
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server connector)
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server File "/usr/lib/python3/dist-packages/cinder/volume/manager.py", line 4330, in _connection_create
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server raise exception.InvalidInput(reason=six.text_type(err))
2019-04-18 19:48:18.231 91736 ERROR oslo_messaging.rpc.server cinder.exception.InvalidInput: Invalid input received: Connector doesn't have required information: initiator

[2] on rabbitmq-server:
sudo rabbitmqctl add_user test test
sudo rabbitmqctl set_user_tags test administrator
sudo rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
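
Note: the HTTP API on port 15672 used in [3] requires the RabbitMQ management plugin; if it is not already enabled on the unit (an assumption about the default setup):

sudo rabbitmq-plugins enable rabbitmq_management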

[3] Run the following script on the rabbitmq-server machine (<email address hidden> is the target queue):

#!/bin/bash

# Pull one message off the target queue without requeueing it, extract the
# JSON payload, and inject an "initiator" key into the nested "connector"
# dict (the IQN is the one from /etc/iscsi/initiatorname.iscsi on the unit).
export payload=$(curl -XPOST -d'{"count":1,"requeue":"false","encoding":"auto"}' http://test:test@localhost:<email address hidden>/get | awk -F"\"payload\":\"" '{print $2}' | awk -F"\",\"payload_encoding\"" '{print $1}' | sed 's%\\\\\\\"connector\\\\\\\": {%\\\\\\\"connector\\\\\\\": {\\\\\\\"initiator\\\\\\\": \\\\\\\"iqn.1993-08.org.debian:01:f7558af36127\\\\\\\", %g')
payload="\"$payload\""
echo "$payload"
# Re-publish the modified payload to the same routing key through the
# RabbitMQ management API so cinder-volume processes it after a restart.
curl -XPOST -d'{"properties": {"expiration":"60000","priority":0,"delivery_mode":2,"headers":{},"content_encoding":"utf-8","content_type":"application/json"}, "routing_key":"<email address hidden>","payload":'"$payload"' ,"payload_encoding":"string"}' http://test:test@localhost:15672/api/exchanges/openstack/amq.default/publish

Revision history for this message
Ryan Beisner (1chb1n) wrote :

In pretty much all charm bug cases, an example bundle is the first thing I reference. Can you please post a sanitized example bundle (reproducer)?

Also, pastebin is ephemeral, and that information will disappear, perhaps even before this bug is resolved. Can you please attach a raw text log to the bug instead?

Thank you.

Revision history for this message
Pedro Guimarães (pguimaraes) wrote :

Ryan, here is the bundle: https://pastebin.canonical.com/p/DD4YhgDTpX/

This is the relevant part of the bundle (rest came from openstack-bundles):
  cinder:
    annotations:
      gui-x: '750'
      gui-y: '0'
    charm: cs:cinder-276
    num_units: 1
    options:
      enabled-services: api,scheduler
      glance-api-version: 2
      openstack-origin: cloud:bionic-rocky
      worker-multiplier: 0.25
#### config-flags: "scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter"
    to:
    - 2
  cinder-volume:
    charm: cs:cinder-276
    num_units: 1
    options:
      enabled-services: volume
      block-device: /dev/vde
      overwrite: 'true'
      glance-api-version: 2
      openstack-origin: cloud:bionic-rocky
      worker-multiplier: 0.25
#### config-flags: "scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter"
    to:
    - 1

For the logs:

2019-04-18 19:47:35.681 91736 DEBUG cinder.volume.manager [req-9221f149-f45c-4947-8801-5499940b7a64 efc745ee079343b0bbb945cb49bf29c3 28a7747fc8dd4f358a58bb95d4cac292 - e05d95fc0fd44b969eaaa66e89ccccde e05d95fc0fd44b969eaaa66e89ccccde] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeRefTask;volume:create' (486503f1-c5d3-470d-b0c1-2becab9f6fea) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python3/dist-packages/taskflow/listeners/logging.py:194
2019-04-18 19:47:35.705 91736 DEBUG oslo_db.sqlalchemy.engines [req-9221f149-f45c-4947-8801-5499940b7a64 efc745ee079343b0bbb945cb49bf29c3 28a7747fc8dd4f358a58bb95d4cac292 - e05d95fc0fd44b969eaaa66e89ccccde e05d95fc0fd44b969eaaa66e89ccccde] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:308
2019-04-18 19:47:35.728 91736 DEBUG cinder.volume.manager [req-9221f149-f45c-4947-8801-5499940b7a64 efc745ee079343b0bbb945cb49bf29c3 28a7747fc8dd4f358a58bb95d4cac292 - e05d95fc0fd44b969eaaa66e89ccccde e05d95fc0fd44b969eaaa66e89ccccde] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeRefTask;volume:create' (486503f1-c5d3-470d-b0c1-2becab9f6fea) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'Volume(_name_id=None,admin_metadata={},attach_status='detached',availability_zone='nova',bootable=False,cluster=<?>,cluster_name=None,consistencygroup=<?>,consistencygroup_id=None,created_at=2019-04-18T19:47:35Z,deleted=False,deleted_at=None,display_description=None,display_name=None,ec2_id=None,encryption_key_id=None,glance_metadata=<?>,group=<?>,group_id=None,host='juju-87a5c8-default-1@LVM#LVM',id=53ef8b0f-8450-48cc-b6ac-a15ef02bc566,launched_at=None,metadata={},migration_status=None,multiattach=False,previous_status=None,project_id='28a7747fc8dd4f358a58bb95d4cac292',provider_auth=None,provider_geometry=None,provider_id=None,provider_location=None,replication_driver_data=None,replication_extended_status=None,replication_status=None,scheduled_at=2019-04-18T19:47:36Z,service_uuid=None,...

Changed in charm-cinder:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Trent Lloyd (lathiat) wrote :

Ran into this today testing charm-cinder-purestorage integration.

Summary of my findings:
- Some operations want to connect to and use the volume directly from the 'cinder' application (and sometimes glance too, I believe) rather than from nova-compute, for example creating a bootable volume for boot-from-volume with purestorage as the volume source.

- On nova-compute the 'iscsid' service is enabled and started by the nova-compute charm

- On cinder, the 'iscsid' service is not enabled or started by charm-cinder-purestorage (or charm-cinder), so /etc/iscsi/initiatorname.iscsi still contains GenerateName=yes and the IQN is not generated until iscsid is started.

- But even if it were started, the iSCSI client cannot function inside an LXD container: the iscsid service fails to start due to ConditionVirtualization=!private-users, because iSCSI access attaches a block volume and connects the kernel to a network service, so it isn't a generally safe option inside LXD (see the check sketched after this list).

- This is likely to work in openstack-on-openstack labs (where cinder is a full VM with a real kernel) and to fail on openstack-on-metal, where cinder runs inside an LXD container.
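
A quick way to confirm that condition on an affected unit (a sketch; standard systemd tooling):

systemd-detect-virt                                  # reports "lxc" inside an LXD container
systemctl cat iscsid.service | grep -i Condition     # shows ConditionVirtualization=!private-users
systemctl status iscsid                              # start is skipped while the condition fails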

This doesn't seem documented anywhere I can find. Going to target this bug at charm-cinder-purestorage as well.

The only likely solution is to deploy glance and cinder to KVM instead of LXD. I'm not sure how well that is supported, particularly with multiple spaces, etc. Need to look into it.

Changed in charm-cinder-purestorage:
status: New → Confirmed
Revision history for this message
Derek Robertson (rober546) wrote :

I'm hitting the exact same issue as Trent described above.

A config option on the cinder charm to enable iscsid would be useful.

I have cinder with two backends: ceph plus an iSCSI array.

If I create a volume on the iscsi array from an image source on the ceph backend, the cinder unit has to map a device to the iscsi array to copy the image across. This fails because iscsid is disabled.
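
For reference, the kind of operation that triggers this looks roughly like the following (image name and volume type are placeholders for a ceph-backed image and the iSCSI-array backend):

openstack volume create --size 10 \
  --image bionic-server-cloudimg \
  --type iscsi-array \
  volume-from-image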

Revision history for this message
Trent Lloyd (lathiat) wrote :

Nobuto pointed out to me that iscsid is normally activated by iscsid.socket when an iscsiadm command is called. During activation, the systemd pre-start check calls /lib/open-iscsi/startup-checks.sh, which generates the IQN.

The problem is that OpenStack wants to read the IQN from that file before it calls an iscsiadm command, so it is a chicken-and-egg problem.

We could possibly call /lib/open-iscsi/startup-checks.sh from one of the charms to generate the IQN (a sketch follows); this also affects other iSCSI charms such as cinder-oceanstor.
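
A sketch of that idea, run on the cinder unit (the helper path is the one mentioned above; it writes the IQN the same way iscsid's pre-start does):

# run iscsid.service's pre-start helper so the IQN exists before
# os-brick reads /etc/iscsi/initiatorname.iscsi
sudo /lib/open-iscsi/startup-checks.sh
grep InitiatorName /etc/iscsi/initiatorname.iscsi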

Revision history for this message
Nobuto Murata (nobuto) wrote :

> Nobuto pointed out to me that iscsid is normally activated by iscsid.socket when an iscsiadm command is called. During activation, the systemd pre-start check calls /lib/open-iscsi/startup-checks.sh, which generates the IQN.
>
> The problem is that OpenStack wants to read the IQN from that file before it calls an iscsiadm command, so it is a chicken-and-egg problem.
>
> We could possibly call /lib/open-iscsi/startup-checks.sh from one of the charms to generate the IQN; this also affects other iSCSI charms such as cinder-oceanstor.

I'm just not sure whether generating a unique initiator name is enough, or whether iscsid is really required to be up and running at all times. If the latter, we should enable it on boot (i.e., systemctl enable iscsid), not just start it at charm installation time; otherwise it will be broken again after a reboot.

As a side note, charm-nova-compute does one restart of iscsid on config-changed, but doesn't enable the systemd unit itself.
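
If iscsid does need to stay up across reboots, the persistent variant would be something like this (a sketch of what "enable on boot" means here):

sudo systemctl enable --now iscsid    # enable at boot and start immediately
systemctl is-enabled iscsid           # should now report "enabled"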

Revision history for this message
Nobuto Murata (nobuto) wrote :

If this is the place where the initiator name is read (I'm not 100% sure), then generating the name via /lib/open-iscsi/startup-checks.sh or iscsid.service would be enough.

https://github.com/openstack/os-brick/blob/a65a3261bc6f25b0c889490107ee866c2ee1d2bb/os_brick/initiator/connectors/iscsi.py#L997-L1010
> if line.startswith('InitiatorName='):
>     return line[line.index('=') + 1:].strip()
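
A shell equivalent of that lookup, handy for sanity-checking a unit (just an illustration of what os-brick reads):

# print the IQN the same way os-brick parses it out of the file
awk -F= '/^InitiatorName=/{print $2}' /etc/iscsi/initiatorname.iscsi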

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder (master)
Changed in charm-cinder:
status: Triaged → In Progress
Revision history for this message
Nobuto Murata (nobuto) wrote :

Subscribing ~field-medium.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder (master)

Reviewed: https://review.opendev.org/c/openstack/charm-cinder/+/810339
Committed: https://opendev.org/openstack/charm-cinder/commit/bde329d973f2d66f800a15d0f4721fc949a609b1
Submitter: "Zuul (22348)"
Branch: master

commit bde329d973f2d66f800a15d0f4721fc949a609b1
Author: Nobuto Murata <email address hidden>
Date: Wed Sep 22 13:53:33 2021 +0900

    Make sure iscsid has a unique InitiatorName

    os_brick may require InitiatorName in /etc/iscsi/initiatorname.iscsi
    before iscsid is invoked via iscsid.socket with iscsiadm. Cloud images
    including MAAS ones have "GenerateName=yes" instead of "InitiatorName="
    on purpose not to clone the initiator name. Let's initialize it so
    Cinder units can be fully ready to accept iSCSI based subordinate and
    storage backend charms.

    Closes-Bug: 1825809
    Change-Id: I413bbb29dd609e0c86ac3691556f37a9fcc13439

Changed in charm-cinder:
status: In Progress → Fix Committed
Revision history for this message
Nobuto Murata (nobuto) wrote :

Now that a patch to initialize the IQN has been merged into charm-cinder as a principal charm, I'm marking the task for the subordinate charm(s) as Invalid since no action is required from the subordinate side.

Changed in charm-cinder-purestorage:
status: Confirmed → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder (stable/21.10)

Fix proposed to branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-cinder/+/819838

Changed in charm-cinder:
milestone: none → 21.10
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder (stable/21.10)

Reviewed: https://review.opendev.org/c/openstack/charm-cinder/+/819838
Committed: https://opendev.org/openstack/charm-cinder/commit/82bc1e227b33b3ba7a548dab5e4bdab638f9529e
Submitter: "Zuul (22348)"
Branch: stable/21.10

commit 82bc1e227b33b3ba7a548dab5e4bdab638f9529e
Author: Nobuto Murata <email address hidden>
Date: Wed Sep 22 13:53:33 2021 +0900

    Make sure iscsid has a unique InitiatorName

    os_brick may require InitiatorName in /etc/iscsi/initiatorname.iscsi
    before iscsid is invoked via iscsid.socket with iscsiadm. Cloud images
    including MAAS ones have "GenerateName=yes" instead of "InitiatorName="
    on purpose not to clone the initiator name. Let's initialize it so
    Cinder units can be fully ready to accept iSCSI based subordinate and
    storage backend charms.

    Closes-Bug: 1825809
    Change-Id: I413bbb29dd609e0c86ac3691556f37a9fcc13439
    (cherry picked from commit bde329d973f2d66f800a15d0f4721fc949a609b1)
