[20.05] removal/addition of a unit to the rmq cluster results in: Services not running that should be: cinder-scheduler

Bug #1879491 reported by Dmitrii Shcherbakov
Affects: OpenStack Cinder Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

While testing the remove/add unit functionality for rabbitmq (https://github.com/openstack-charmers/zaza-openstack-tests/pull/287), I hit a test failure caused by the cinder-scheduler service, which stopped running after the rabbitmq cluster changes:

ubuntu@dmitriis-bastion:~/rabbitmq-server$ juju show-status-log cinder/0
Time Type Status Message
19 May 2020 10:41:32Z juju-unit idle
19 May 2020 10:42:17Z juju-unit executing running amqp-relation-changed hook
19 May 2020 10:42:28Z juju-unit idle
19 May 2020 10:42:42Z juju-unit executing running amqp-relation-changed hook
19 May 2020 10:43:11Z juju-unit idle
19 May 2020 10:47:49Z juju-unit executing running amqp-relation-changed hook
19 May 2020 10:47:59Z juju-unit idle
19 May 2020 10:48:50Z juju-unit executing running amqp-relation-changed hook
19 May 2020 10:49:00Z juju-unit idle
19 May 2020 10:49:24Z juju-unit executing running amqp-relation-changed hook
19 May 2020 10:49:52Z juju-unit idle
19 May 2020 10:50:50Z juju-unit executing running amqp-relation-departed hook
19 May 2020 10:51:18Z juju-unit idle
19 May 2020 10:54:12Z juju-unit executing running amqp-relation-joined hook
19 May 2020 10:54:21Z juju-unit executing running amqp-relation-changed hook
19 May 2020 10:54:49Z workload active Unit is ready
19 May 2020 10:55:01Z juju-unit idle
19 May 2020 10:56:32Z juju-unit executing running amqp-relation-changed hook
19 May 2020 10:56:44Z juju-unit idle
19 May 2020 12:04:00Z workload blocked Services not running that should be: cinder-scheduler

/var/log/cinder/cinder-scheduler.log
2020-05-19 10:49:44.540 706 ERROR oslo.messaging._drivers.impl_rabbit [-] [006d7b30-362c-4d50-9120-df014d0fd4c5] AMQP server on 10.5.0.24:5671 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. Client port: None
2020-05-19 10:49:45.554 706 ERROR oslo.messaging._drivers.impl_rabbit [-] [006d7b30-362c-4d50-9120-df014d0fd4c5] AMQP server on 10.5.0.6:5671 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 10 seconds. Client port: None
2020-05-19 10:49:48.866 706 ERROR oslo.messaging._drivers.impl_rabbit [-] [aae6d6ee-38a8-4e9e-bd80-34b5009dcd67] AMQP server on 10.5.0.36:5671 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. Client port: None
...
2020-05-19 10:54:55.469 5934 ERROR oslo_service.service AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.
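
When triaging a log like this, it can help to reduce it to the set of broker endpoints that refused connections and the number of authentication failures. The following is a minimal Python sketch, not part of the charm or of oslo.messaging: the regular expressions are derived only from the records quoted above, the function name `summarize` is hypothetical, and it assumes each log record sits on a single line.

```python
import re

# Patterns derived only from the log records quoted in this report.
UNREACHABLE = re.compile(
    r"AMQP server on (?P<host>[\d.]+):(?P<port>\d+) is unreachable: "
    r"\[Errno (?P<errno>\d+)\] (?P<reason>\w+)"
)
ACCESS_REFUSED = re.compile(r"\(403\) ACCESS_REFUSED")


def summarize(lines):
    """Return (set of unreachable (host, port) pairs, number of 403 refusals)."""
    unreachable = set()
    refused = 0
    for line in lines:
        match = UNREACHABLE.search(line)
        if match:
            unreachable.add((match.group("host"), int(match.group("port"))))
        elif ACCESS_REFUSED.search(line):
            refused += 1
    return unreachable, refused


# Sample records copied from the log excerpt above (one record per line).
sample = [
    "2020-05-19 10:49:44.540 706 ERROR oslo.messaging._drivers.impl_rabbit [-] "
    "[006d7b30-362c-4d50-9120-df014d0fd4c5] AMQP server on 10.5.0.24:5671 is "
    "unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. Client port: None",
    "2020-05-19 10:49:45.554 706 ERROR oslo.messaging._drivers.impl_rabbit [-] "
    "[006d7b30-362c-4d50-9120-df014d0fd4c5] AMQP server on 10.5.0.6:5671 is "
    "unreachable: [Errno 111] ECONNREFUSED. Trying again in 10 seconds. Client port: None",
    "2020-05-19 10:54:55.469 5934 ERROR oslo_service.service AccessRefused: (0, 0): "
    "(403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. "
    "For details see the broker logfile.",
]

endpoints, refusals = summarize(sample)
print(sorted(endpoints), refusals)
```

To run it against the real file, pass an open handle on /var/log/cinder/cinder-scheduler.log to `summarize` instead of `sample`.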

grep rabbit /etc/cinder/cinder.conf
transport_url = rabbit://<pwd-redacted>@10.5.0.36:5672,cinder:<pwd-redacted>@10.5.0.53:5672,cinder:<pwd-redacted>@10.5.0.6:5672/openstack
[oslo_messaging_rabbit]
transport_url = rabbit://cinder:kf8JHG2GjXJV5cy8pFW6cHpzpLCZZr6bywcnMVgt5zWz3BTyRgL8M2KTqZdzMgNd@10.5.0.36:5672,cinder:kf8JHG2GjXJV5cy8pFW6cHpzpLCZZr6bywcnMVgt5zWz3BTyRgL8M2KTqZdzMgNd@10.5.0.53:5672,cinder:kf8JHG2GjXJV5cy8pFW6cHpzpLCZZr6bywcnMVgt5zWz3BTyRgL8M2KTqZdzMgNd@10.5.0.6:5672/openstack
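
To compare the brokers configured here with the addresses and ports that appear in the scheduler log, the multi-host `transport_url` can be split into its endpoints. The sketch below is a hand-written parser for illustration only; it does not use oslo.messaging's own URL handling, the function name `parse_transport_url` is hypothetical, the password is a placeholder, and IPv6 literals are not handled.

```python
def parse_transport_url(url):
    """Split a multi-host transport URL into (scheme, endpoints, virtual_host).

    Each endpoint is a (user, host, port) tuple; passwords are discarded.
    Assumes IPv4 addresses or hostnames, as in the configuration shown above.
    """
    scheme, rest = url.split("://", 1)
    hosts_part, _, vhost = rest.partition("/")
    endpoints = []
    for entry in hosts_part.split(","):
        creds, _, hostport = entry.rpartition("@")
        user = creds.split(":", 1)[0] if creds else None
        host, sep, port = hostport.partition(":")
        endpoints.append((user, host, int(port) if sep else None))
    return scheme, endpoints, vhost


# Same hosts, ports and virtual host as in cinder.conf above; SECRET is a placeholder.
example = (
    "rabbit://cinder:SECRET@10.5.0.36:5672,"
    "cinder:SECRET@10.5.0.53:5672,"
    "cinder:SECRET@10.5.0.6:5672/openstack"
)

scheme, endpoints, vhost = parse_transport_url(example)
for user, host, port in endpoints:
    print(scheme, user, host, port, vhost)
```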

juju status |grep 10.5.0.36
rabbitmq-server/0 active idle 3 10.5.0.36 5672/tcp Unit is ready and clustered
  nrpe/2 active idle 10.5.0.36 icmp,5666/tcp ready
3 started 10.5.0.36 e38d7fac-73bf-46e3-986a-682956fe8705 xenial nova ACTIVE

stat /etc/cinder/cinder.conf
  File: '/etc/cinder/cinder.conf'
  Size: 2260 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 275113 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 115/ cinder) Gid: ( 119/ cinder)
Access: 2020-05-19 10:56:43.190368208 +0000
Modify: 2020-05-19 10:56:43.090368110 +0000
Change: 2020-05-19 10:56:43.090368110 +0000
 Birth: -

See juju-crashdump-488d1258-8486-4fa9-b9ac-5f706fbd4ebc.tar.xz.

Running `sudo systemctl status cinder-scheduler` seems to bring the service back up successfully, so this looks like a timing issue.
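
Until the root cause is addressed, a manual recovery could be scripted roughly as follows. This is a workaround sketch under stated assumptions, not the charm's behavior and not a fix: it assumes that explicitly starting the unit with `systemctl start` is what recovers it, and the helper names `is_active` and `ensure_running` are hypothetical. The `run` parameter exists only so the logic can be exercised without touching a real system.

```python
import subprocess


def is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active --quiet <unit>` exits with status 0."""
    return run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def ensure_running(unit, run=subprocess.run):
    """Start the unit if it is not active; return True if it ends up active."""
    if is_active(unit, run):
        return True
    # Assumption: a plain start is enough to recover the service.
    run(["sudo", "systemctl", "start", unit])
    return is_active(unit, run)
```

On the affected unit this would be called as `ensure_running("cinder-scheduler")`; a False result means the service is still down after the start attempt and the logs need another look.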

Tags: scaleback
Ryan Beisner (1chb1n)
tags: added: scaleback
James Page (james-page)
Changed in charm-cinder:
status: New → Triaged
importance: Undecided → Medium