cinder-volume: "Failed to re-export volume, setting to ERROR" with "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected" on service startup

Bug #1987663 reported by Mauricio Faria de Oliveira
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cinder (Ubuntu)
Status tracked in Mantic
Bionic
Won't Fix
Medium
Mauricio Faria de Oliveira
Focal
Confirmed
Medium
Mauricio Faria de Oliveira
Jammy
Confirmed
Medium
Mauricio Faria de Oliveira
Lunar
Confirmed
Medium
Mauricio Faria de Oliveira
Mantic
In Progress
Medium
Mauricio Faria de Oliveira

Bug Description

[Impact]

 * The cinder-volume service might fail to re-export volumes
   in-use on startup if tgt.service isn't fully started yet.

 * This affects the 'lvm' driver with 'tgtadm' target helper
   (which runs 'tgtadm' commands that need the service ready).

 * Snippets from /var/log/cinder/cinder-volume.log:

   Failed to re-export volume, setting to ERROR.
   ...
   Command: tgtadm --lld iscsi --op show --mode target
   ...
   Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
   Transport endpoint is not connected\n'

 * This issue is more common in openstack compute nodes
   with networking (ovs/ovn) that takes long to startup,
   which might delay the startup of tgt.service _after_
   cinder-volume.service.

[Test Steps]

 * Steps to reproduce are detailed in comment #3.
   Summary:

 * Install mysql, rabbitmq-server, keystone, and cinder
   (controller and storage nodes; backup node unneeded).

 * Configure cinder-volume (storage node) for LVM backend
   and tgtadm iSCSI helper (tgt.service).

 * Create a cinder volume, and configure it as 'in-use'.

 * Simulate a start delay on tgt.service with a drop-in.

 * Restart services: cinder-volume.service tgt.service

 * Check sequence of service startup.

 * Check status of the cinder volume:
   'in-use' (expected) or 'error' (bug).

 * Check log file /var/log/cinder/cinder-volume.log for
   'tgtadm: failed to send request hdr to tgt daemon'.

[Regression Potential]

 * The fix introduces systemd unit 'After=' and 'Wants='
   properties for tgt.service in cinder-volume.service,
   thus might delay the boot process (multi-user.target).

     $ systemctl show cinder-volume.service | grep WantedBy=
     WantedBy=multi-user.target

 * However, the boot process already waits on tgt.service
   anyway, thus the difference (if any) should not be big,
   and would provide more correct behavior.

     $ systemctl show tgt.service | grep WantedBy=
     WantedBy=multi-user.target

 * If tgt.service is not present (tgt package not installed)
   _no errors_ occur, as both 'After=' and 'Wants=' are weak
   ordering/dependency properties (man 5 systemd.unit).

[Other Info]

 * The fix uses a systemd service drop-in snippet because
   the service unit is generated by openstack-pkg-tools
   (pkgos-gen-systemd-unit) based on the 'init' service,
   and it only emits 'Wants=' for network-online.target.

 * Changing that in openstack-pkg-tools changes behavior
   in stable releases, and only manifest at build time,
   for many openstack packages that have no issues now.

 * We'll continue to pursue the general improvement in
   Debian, so it comes into Ubuntu development release,
   but for the Ubuntu stable releases, this should do.

[Original Bug Description]

Real-world example in comment #2.

Related branches

Changed in cinder (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
description: updated
Changed in cinder (Debian):
status: Unknown → Fix Released
summary: - "Failed to re-export volume, setting to ERROR" with "tgtadm: failed to
- send request hdr to tgt daemon, Transport endpoint is not connected" on
- cinder-volume service start
+ cinder-volume: "Failed to re-export volume, setting to ERROR" with
+ "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is
+ not connected" on service startup
description: updated
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Download full text (5.2 KiB)

Real-world example:

1) Unit `ovs-vswitchd.service` took 2 minutes to start up.
2) That delayed `network.target` (`ovs-vswitchd.service` has `Before=network.target`).
3) That delayed `tgt.service` too (it has `After=network.target`).

4) BUT that did _not_ delay `cinder-volume.service` (it does _not_ have `After=tgt.service`)
5) So it failed to talk to tgtd with tgtadm, and volume re-export failed.

 $ cat sos_commands/logs/journalctl_--no-pager_--catalog_--boot \
   | grep -E -B1 'Subject: Unit (cinder-volume.service|tgt.service|ovs-vswitchd.service|network.target)'

 Aug 10 06:23:15 <HOST> systemd[1]: Started OpenStack Cinder Volume.
 -- Subject: Unit cinder-volume.service has finished start-up
 --
 <<< ERROR in cinder-volume.log >>>

 Aug 10 06:23:16 <HOST> systemd[1]: Starting Open vSwitch Forwarding Unit...
 -- Subject: Unit ovs-vswitchd.service has begun start-up
 --
 <<< DELAY of 2 minutes >>>

 Aug 10 06:25:17 <HOST> systemd[1]: Started Open vSwitch Forwarding Unit.
 -- Subject: Unit ovs-vswitchd.service has finished start-up
 --
 Aug 10 06:25:17 <HOST> systemd[1]: Reached target Network.
 -- Subject: Unit network.target has finished start-up
 --
 Aug 10 06:25:17 <HOST> systemd[1]: Starting (i)SCSI target daemon...
 -- Subject: Unit tgt.service has begun start-up
 --
 Aug 10 06:25:19 <HOST> systemd[1]: Started (i)SCSI target daemon.
 -- Subject: Unit tgt.service has finished start-up

 <<< NOW tgtd is running >>>

@ var/log/cinder/cinder-volume.log

  9901 2022-08-10 06:23:21.515 3939 INFO cinder.service [-] Starting cinder-volume node (version 12.0.9)
 ...
  9932 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager [req-e67a852a-6cce-4073-a2de-d3487c85d586 - - - - -] Failed to re-export volume, setting to ERROR.: ProcessExecutionError: Unexpected error while running command.
  9933 Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
  9934 Exit code: 107
  9935 Stdout: u''
  9936 Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
  9937 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Traceback (most recent call last):
  9938 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 486, in init_host
  9939 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.driver.ensure_export(ctxt, volume)
  9940 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 826, in ensure_export
  9941 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.target_driver.ensure_export(context, volume, volume_path)
  9942 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/iscsi.py", line 261, in ensure_export
  9943 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager old_name=None, **portals_config)
  9944 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 818, in _wrapper
  9945 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return r.call(f, ...

Read more...

description: updated
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Download full text (13.1 KiB)

Steps to Reproduce (Mantic)

# LXD VM

$ lxc launch --vm --config limits.cpu=2 --config limits.memory=3GiB ubuntu:mantic cinder-mantic
$ lxc shell cinder-mantic

# Apt

apt remove -y unattended-upgrades
apt update

# Network

echo '127.0.0.1 controller' >>/etc/hosts

# MySQL

apt install -y mysql-server python3-pymysql

# RabbitMQ

apt install -y rabbitmq-server

rabbitmqctl add_user openstack RABBIT_PASS
rabbitmqctl set_permissions openstack ".*" ".*" ".*"

# Keystone

cat <<EOF | mysql
CREATE DATABASE keystone;
CREATE USER 'keystone'@'localhost' IDENTIFIED BY 'KEYSTONE_DBPASS';
GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost';
EOF

apt install -y keystone

@ /etc/keystone/keystone.conf

[database]
connection = mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone

[token]
provider = fernet

...

   # mantic has some exceptions ignored in these, but worked fine:

su -s /bin/sh -c "keystone-manage db_sync" keystone # exit code 0
keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
keystone-manage credential_setup --keystone-user keystone --keystone-group keystone
keystone-manage bootstrap --bootstrap-password ADMIN_PASS \
  --bootstrap-admin-url http://controller:5000/v3/ \
  --bootstrap-internal-url http://controller:5000/v3/ \
  --bootstrap-public-url http://controller:5000/v3/ \
  --bootstrap-region-id RegionOne

echo 'ServerName controller' >>/etc/apache2/apache2.conf
systemctl restart apache2.service

# Openstack Client

cat <<EOF >openstack.rc
export OS_USERNAME=admin
export OS_PASSWORD=ADMIN_PASS
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3
EOF

source openstack.rc

apt install -y python3-openstackclient

openstack project create --domain default \
  --description "Service Project" service

+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Service Project |
| domain_id | default |
| enabled | True |
| id | 229ef5671d0b4136b32f1a60584ab725 |
| is_domain | False |
| name | service |
| options | {} |
| parent_id | default |
| tags | [] |
+-------------+----------------------------------+

# openstack project list
+----------------------------------+---------+
| ID | Name |
+----------------------------------+---------+
| 0e204837047a4323b8f57ed23bffa4f8 | admin |
| 229ef5671d0b4136b32f1a60584ab725 | service |
+----------------------------------+---------+

# Cinder

cat <<EOF | mysql
CREATE DATABASE cinder;
CREATE USER 'cinder'@'localhost' IDENTIFIED BY 'CINDER_DBPASS';
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost';
EOF

openstack user create --domain default --password CINDER_USERPASS cinder
openstack role add --project service --user cinder admin

open...

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Download full text (5.9 KiB)

Copy/paste of commands to check before/after:

... Reproducer (15-second startup delay in tgt.service)

 FILE=/etc/systemd/system/tgt.service.d/start-delay.conf
 mkdir -p $(dirname $FILE)
 cat <<EOF > $FILE
 [Service]
 ExecStartPre=$(which sleep) 15
 EOF
 systemctl daemon-reload

Before:
---

... Restart both services at the same time:

 # date; systemctl restart cinder-volume.service tgt.service; date
 Mon Oct 16 21:57:12 UTC 2023
 Mon Oct 16 21:57:27 UTC 2023

... Notice that cinder-volume.service is Started _BEFORE_ tgt.service

 # journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -3
 Oct 16 21:57:12 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.
 Oct 16 21:57:12 cinder-mantic systemd[1]: Starting tgt.service - (i)SCSI target daemon...
 Oct 16 21:57:27 cinder-mantic systemd[1]: Started tgt.service - (i)SCSI target daemon.

.. Log error:

 # grep 'ERROR cinder.volume.manager' /var/log/cinder/cinder-volume.log
 ...
 2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager [None req-ce03264a-6765-41de-8016-a6f27d2685e4 - - - - - -] Failed to re-export volume, setting to ERROR.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
 ...
 2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
 2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Command: tgtadm --lld iscsi --op show --mode target
 2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Exit code: 107
 2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Stdout: ''
 2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Stderr: 'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
 2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager

... Volume error:

 # cinder list
 +--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
 | ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
 +--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
 | 17c4f736-058b-4cfc-9864-a64ab0995957 | error | test-volume | 1 | True | __DEFAULT__ | false | |
 +--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+

... Undo error state:

 # systemctl restart cinder-volume.service
 # cinder reset-state --state in-use test-volume
 # cinder list

 +--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
 | ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
 +--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
 | 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1 | True | __DEFAULT__ | false | |
 +---------...

Read more...

Changed in cinder (Ubuntu Bionic):
status: New → Won't Fix
Changed in cinder (Ubuntu Focal):
status: New → Confirmed
Changed in cinder (Ubuntu Jammy):
status: New → Confirmed
Changed in cinder (Ubuntu Lunar):
status: New → Confirmed
no longer affects: cinder (Debian)
Changed in cinder (Ubuntu Bionic):
importance: Undecided → Medium
Changed in cinder (Ubuntu Focal):
importance: Undecided → Medium
Changed in cinder (Ubuntu Jammy):
importance: Undecided → Medium
Changed in cinder (Ubuntu Lunar):
importance: Undecided → Medium
Changed in cinder (Ubuntu Bionic):
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in cinder (Ubuntu Focal):
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in cinder (Ubuntu Jammy):
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in cinder (Ubuntu Lunar):
assignee: nobody → Mauricio Faria de Oliveira (mfo)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.