[bionic-rocky->stein] Rocky to Stein upgrade on bionic results in "Services not running that should be: designate-zone-manager, designate-pool-manager"

Bug #1928451 reported by Przemyslaw Lal
Affects                    Status   Importance  Assigned to  Milestone
OpenStack Designate Charm  Triaged  High        Unassigned

Bug Description

After upgrading Designate from Rocky to Stein on bionic, the Designate leader unit entered the "blocked" state with the following status message: "Services not running that should be: designate-zone-manager, designate-pool-manager".

Although these services were removed in patch [0] for Rocky+ on Bionic, the charm still complains that these two are not running.

Model      Controller        Cloud/Region      Version  SLA          Timestamp
openstack  foundations-maas  foundations-maas  2.8.10   unsupported  09:34:56Z

* snip *

App                  Version  Status   Scale  Charm      Store       Rev  OS      Notes
designate            8.0.1    blocked  3      designate  jujucharms  53   ubuntu
hacluster-designate           active   3      hacluster  jujucharms  76   ubuntu

Unit                      Workload  Agent  Machine    Public address  Ports     Message
designate/3               active    idle   19/lxd/11  10.243.229.157  9001/tcp  Unit is ready
  hacluster-designate/5   active    idle              10.243.229.157            Unit is ready and clustered
designate/4*              blocked   idle   20/lxd/10  10.243.229.170  9001/tcp  Services not running that should be: designate-zone-manager, designate-pool-manager
  hacluster-designate/4*  active    idle              10.243.229.170            Unit is ready and clustered
designate/6               active    idle   1/lxd/9    10.243.229.96   9001/tcp  Unit is ready
  hacluster-designate/6   active    idle              10.243.229.96             Unit is ready and clustered

Machine    State    DNS             Inst id                Series  AZ      Message
1          started  10.243.229.159  7sydsg                 bionic  rack-2  Deployed
1/lxd/9    started  10.243.229.96   juju-5b85c6-1-lxd-9    bionic  rack-2  series upgrade completed: success
19         started  10.243.229.155  detxkx                 bionic  rack-1  Deployed
19/lxd/11  started  10.243.229.157  juju-5b85c6-19-lxd-11  bionic  rack-1  series upgrade completed: success
20         started  10.243.229.156  tyx3t4                 bionic  rack-2  Deployed
20/lxd/10  started  10.243.229.170  juju-5b85c6-20-lxd-10  bionic  rack-2  series upgrade completed: success

Services have status "not-found" in systemd as expected:

$ systemctl status designate-zone-manager designate-pool-manager
● designate-zone-manager.service
   Loaded: not-found (Reason: No such file or directory)
   Active: failed (Result: exit-code) since Fri 2021-05-14 07:50:15 UTC; 1h 48min ago
 Main PID: 997808 (code=exited, status=1/FAILURE)

May 14 07:50:15 juju-5b85c6-20-lxd-10 systemd[1]: designate-zone-manager.service: Service hold-off time over, scheduling restart.
May 14 07:50:15 juju-5b85c6-20-lxd-10 systemd[1]: designate-zone-manager.service: Scheduled restart job, restart counter is at 6.
May 14 07:50:15 juju-5b85c6-20-lxd-10 systemd[1]: Stopped OpenStack Designate DNSaaS zone manager.
May 14 07:50:15 juju-5b85c6-20-lxd-10 systemd[1]: designate-zone-manager.service: Start request repeated too quickly.
May 14 07:50:15 juju-5b85c6-20-lxd-10 systemd[1]: designate-zone-manager.service: Failed with result 'exit-code'.
May 14 07:50:15 juju-5b85c6-20-lxd-10 systemd[1]: Failed to start OpenStack Designate DNSaaS zone manager.

● designate-pool-manager.service
   Loaded: not-found (Reason: No such file or directory)
   Active: failed (Result: exit-code) since Fri 2021-05-14 07:49:24 UTC; 1h 49min ago
 Main PID: 997356 (code=exited, status=1/FAILURE)

May 14 07:49:24 juju-5b85c6-20-lxd-10 systemd[1]: designate-pool-manager.service: Service hold-off time over, scheduling restart.
May 14 07:49:24 juju-5b85c6-20-lxd-10 systemd[1]: designate-pool-manager.service: Scheduled restart job, restart counter is at 5.
May 14 07:49:24 juju-5b85c6-20-lxd-10 systemd[1]: Stopped OpenStack Designate DNSaaS pool manager.
May 14 07:49:24 juju-5b85c6-20-lxd-10 systemd[1]: designate-pool-manager.service: Start request repeated too quickly.
May 14 07:49:24 juju-5b85c6-20-lxd-10 systemd[1]: designate-pool-manager.service: Failed with result 'exit-code'.
May 14 07:49:24 juju-5b85c6-20-lxd-10 systemd[1]: Failed to start OpenStack Designate DNSaaS pool manager.

ubuntu@juju-5b85c6-20-lxd-10:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"

ubuntu@juju-5b85c6-20-lxd-10:~$ dpkg -l | grep designate
ii  designate-agent          1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - agent
ii  designate-api            1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - API server
ii  designate-central        1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - central daemon
ii  designate-common         1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - common files
ii  designate-mdns           1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - mdns
ii  designate-producer       1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - producer
ii  designate-sink           1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - sink
ii  designate-worker         1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - worker
ii  python3-designate        1:8.0.1-0ubuntu1~cloud1  all  OpenStack DNS as a Service - Python 3 libs
ii  python3-designateclient  2.9.0-0ubuntu1           all  client library for the OpenStack Designate API - Python 3.x

Designate packages come from UCA bionic-stein:

ubuntu@juju-5b85c6-20-lxd-10:~$ apt-cache policy designate-worker
designate-worker:
  Installed: 1:8.0.1-0ubuntu1~cloud1
  Candidate: 1:8.0.1-0ubuntu1~cloud1
  Version table:
 *** 1:8.0.1-0ubuntu1~cloud1 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/stein/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.0.1-0ubuntu1.2 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
     1:6.0.0-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

[0] https://review.opendev.org/c/openstack/charm-designate/+/593607/

summary:
- Rocky to Stein upgrade on bionic results in "Services not running that should be: designate-zone-manager, designate-pool-manager"
+ [bioni-rocky->stein] Rocky to Stein upgrade on bionic results in "Services not running that should be: designate-zone-manager, designate-pool-manager"
tags: added: openstack-upgrade
summary:
- [bioni-rocky->stein] Rocky to Stein upgrade on bionic results in "Services not running that should be: designate-zone-manager, designate-pool-manager"
+ [bionic-rocky->stein] Rocky to Stein upgrade on bionic results in "Services not running that should be: designate-zone-manager, designate-pool-manager"
Drew Freiberger (afreiberger) wrote :

I experienced the same issue with a Queens to Rocky upgrade on one of the three designate units.

Alex Kavanagh (ajkavanagh) wrote :

I have also just experienced this on a Queens to Rocky upgrade. Marking the bug as Triaged.

Changed in charm-designate:
importance: Undecided → High
status: New → Triaged
Alex Kavanagh (ajkavanagh) wrote :

As a workaround, I had to manually install designate-zone-manager and designate-pool-manager, and then toggle the debug config option to ensure that any required config template files were updated.

Drew Freiberger (afreiberger) wrote :

When I looked at the code, I found reactive handlers in designate_handlers.py that respond to changes of pools.yaml by calling charmhelpers to restart designate-pool-manager without regard to the OpenStack version. This may be what is causing the status change. I didn't find an equivalent for designate-zone-manager.

@reactive.when('leadership.changed.pool-yaml-hash')
def remote_pools_updated():
    hookenv.log(
        "Pools updated on remote host, restarting pool manager",
        level=hookenv.DEBUG)
    host.service_restart('designate-pool-manager')

@reactive.when_file_changed(designate.POOLS_YAML)
def local_pools_updated():
    hookenv.log(
        "Pools updated locally, restarting pool manager",
        level=hookenv.DEBUG)
    host.service_restart('designate-pool-manager')

The designate-zone-manager and designate-pool-manager services are not meant to be installed on Rocky+. I agree that starting them does work, but the underlying issue is that the DesignateCharmRocky class does not guard all calls to those services.
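
To illustrate the kind of guard meant here, the following is a minimal sketch and not the charm's actual code. The names `RELEASES`, `legacy_services_expected` and `services_to_restart` are hypothetical, the release list is hard-coded for illustration, and the sketch only returns a decision where a real handler would go on to call `host.service_restart`.

```python
# Hypothetical sketch, not taken from charm-designate: decide which services
# a pools.yaml change should restart, depending on the OpenStack release.

# OpenStack release names, oldest first (subset, hard-coded for illustration).
RELEASES = ['mitaka', 'newton', 'ocata', 'pike', 'queens', 'rocky', 'stein']

# Per this bug report, the legacy manager services are removed from Rocky on.
FIRST_RELEASE_WITHOUT_LEGACY = 'rocky'


def legacy_services_expected(release):
    """Return True if the legacy manager services should exist on `release`."""
    cutoff = RELEASES.index(FIRST_RELEASE_WITHOUT_LEGACY)
    return RELEASES.index(release) < cutoff


def services_to_restart(release):
    """Return the services to restart after a pools.yaml change (assumed policy)."""
    if legacy_services_expected(release):
        return ['designate-pool-manager']
    # On Rocky+ no legacy service is touched. Whether another service needs a
    # restart instead is outside the scope of this sketch.
    return []
```

With a check like this in front of both handlers, a pools.yaml change on Rocky+ would no longer try to restart a unit that systemd reports as not-found.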

Alex Kavanagh (ajkavanagh) wrote :

It occurs to me that purging the packages may not remove the files that the reactive framework then tests, which means those handler functions may fail. More to the point, why does the status refer to these services on Rocky+ at all if they are supposed to be purged? I'll have to dig deeper.

Drew Freiberger (afreiberger) wrote :

Also related, this is the root issue on the site where this happened:

https://bugs.launchpad.net/charm-designate/+bug/1928495

Drew Freiberger (afreiberger) wrote :

The blocked status is gone after implementing the workaround in lp#1928495. I think that is the root bug, and that this status error is caused by the charm not updating its state database with the correct OpenStack version, which leads to the wrong charm class being loaded for Rocky+ during non-action-managed upgrades.
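
The suspected mechanism can be modelled with a simplified sketch. This is an assumption-laden illustration, not the charms.openstack implementation: `LegacyCharm`, `RockyCharm`, `select_class` and `status_message` are hypothetical names, and only the two services named in this report are modelled.

```python
# Simplified, hypothetical model of release-based charm class selection.
# It shows how a stale recorded release could yield the blocked status.

RELEASES = ['queens', 'rocky', 'stein']  # oldest first, subset for illustration


class LegacyCharm:
    """Stand-in for the pre-Rocky charm class."""
    release = 'queens'
    expected = ['designate-zone-manager', 'designate-pool-manager']


class RockyCharm:
    """Stand-in for the Rocky+ charm class, which drops the legacy services."""
    release = 'rocky'
    expected = []


CLASSES = [LegacyCharm, RockyCharm]


def select_class(recorded_release):
    """Pick the newest class whose release is not newer than the recorded one."""
    idx = RELEASES.index(recorded_release)
    candidates = [c for c in CLASSES if RELEASES.index(c.release) <= idx]
    return max(candidates, key=lambda c: RELEASES.index(c.release))


def status_message(recorded_release, running):
    """Build the workload status from the selected class and running services."""
    cls = select_class(recorded_release)
    missing = [s for s in cls.expected if s not in running]
    if missing:
        return 'Services not running that should be: ' + ', '.join(missing)
    return 'Unit is ready'
```

In this model, if the recorded release still says 'queens' after the packages have moved to Rocky (as in the Queens to Rocky upgrades mentioned above), `status_message('queens', set())` produces the blocked message from this report, while a correctly recorded 'rocky' or 'stein' reports the unit as ready.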
