post-series-upgrade hook error when series-upgrading principal nova-compute

Bug #1952882 reported by Aurelien Lourot
Affects: OpenStack Ceilometer Agent Charm
Status: Fix Released
Importance: High
Assigned to: Aurelien Lourot

Bug Description

Related to lp:1927277. Can be reproduced by running the series-upgrade tests. [1] Here is what happens:

1. nova-compute (principal) and ceilometer-agent (subordinate) are running on the same bionic machine.
2. A series upgrade of that machine is started. This runs the pre-series-upgrade hooks of both charms, effectively stopping both the nova-compute and the ceilometer-agent-compute services.
3. `juju upgrade-series <machine-number> complete` is run. This runs ceilometer-agent's post-series-upgrade hook, which attempts to start the ceilometer-agent-compute service. That service, however, depends on the nova-compute service, which is still stopped and masked at that point, so the start fails and the hook errors out.
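
The ordering problem in the steps above can be illustrated with a toy model. This is only a sketch: `ToyInit` and `MaskedDependencyError` are hypothetical names, not part of Juju, systemd, or charm-helpers, and the model assumes that pausing a service masks its unit, as the "is masked" message in the unit log suggests.

```python
class MaskedDependencyError(RuntimeError):
    """Raised when a required unit is masked (toy version of systemd's refusal)."""


class ToyInit:
    """Hypothetical stand-in for an init system; not a real systemd API."""

    def __init__(self, requires):
        # Maps a service name to the list of services it depends on.
        self.requires = requires
        self.masked = set()
        self.running = set()

    def pause(self, service):
        # Assumption: pausing stops the service and masks its unit.
        self.running.discard(service)
        self.masked.add(service)

    def resume(self, service):
        # Unmask the service, then refuse to start it if a dependency is masked.
        self.masked.discard(service)
        for dep in self.requires.get(service, ()):
            if dep in self.masked:
                raise MaskedDependencyError(
                    f"Failed to start {service}: Unit {dep} is masked.")
        self.running.add(service)


init = ToyInit({"ceilometer-agent-compute": ["nova-compute"]})

# Step 2: both pre-series-upgrade hooks pause their services.
init.pause("nova-compute")
init.pause("ceilometer-agent-compute")

# Step 3: the subordinate tries to resume while the principal is still paused.
try:
    init.resume("ceilometer-agent-compute")
    subordinate_first = "ok"
except MaskedDependencyError as exc:
    subordinate_first = str(exc)

# Resuming the principal's service first lets the dependent service start.
init.resume("nova-compute")
init.resume("ceilometer-agent-compute")
```

In this model the first resume attempt fails and both services only end up running once nova-compute has been resumed first, which matches the behaviour described in the report.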

Zaza logs:
----------
2021-11-30 18:41:53 [INFO] About to upgrade leader of nova-compute: 15
2021-11-30 18:41:53 [INFO] About to series-upgrade (15)
2021-11-30 18:41:53 [INFO] About to call '['juju', 'run', '--machine=15', '--', 'echo \'DPkg::options { "--force-confdef"; "--force-confnew"; }\' | sudo tee /etc/apt/apt.conf.d/local']'
...
2021-11-30 18:42:02 [INFO] About to call '['juju', 'run', '--machine=15', '--', 'yes | sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" dist-upgrade']'
...
2021-11-30 18:42:16 [INFO] About to call '['juju', 'run', '--machine=15', '--timeout=120m', '--', 'yes | sudo DEBIAN_FRONTEND=noninteractive do-release-upgrade -f DistUpgradeViewNonInteractive']'
...
2021-11-30 19:00:33 [INFO] About to call '['juju', 'upgrade-series', '-m', 'zaza-97c5ac52fcdc', '15', 'complete']'
----------

ceilometer-agent unit log:
----------
2021-11-30 19:01:00 WARNING post-series-upgrade Removed /etc/systemd/system/memcached.service.
2021-11-30 19:01:01 WARNING post-series-upgrade Synchronizing state of memcached.service with SysV service script with /lib/systemd/systemd-sysv-install.
2021-11-30 19:01:01 WARNING post-series-upgrade Executing: /lib/systemd/systemd-sysv-install enable memcached
2021-11-30 19:01:02 WARNING post-series-upgrade Created symlink /etc/systemd/system/multi-user.target.wants/memcached.service → /lib/systemd/system/memcached.service.
2021-11-30 19:01:02 DEBUG post-series-upgrade inactive
2021-11-30 19:01:02 WARNING post-series-upgrade Removed /etc/systemd/system/ceilometer-agent-compute.service.
2021-11-30 19:01:03 WARNING post-series-upgrade Synchronizing state of ceilometer-agent-compute.service with SysV service script with /lib/systemd/systemd-sysv-install.
2021-11-30 19:01:03 WARNING post-series-upgrade Executing: /lib/systemd/systemd-sysv-install enable ceilometer-agent-compute
2021-11-30 19:01:04 WARNING post-series-upgrade Created symlink /etc/systemd/system/multi-user.target.wants/ceilometer-agent-compute.service → /lib/systemd/system/ceilometer-agent-compute.service.
2021-11-30 19:01:04 DEBUG post-series-upgrade inactive
2021-11-30 19:01:04 WARNING post-series-upgrade Failed to start ceilometer-agent-compute.service: Unit nova-compute.service is masked.
2021-11-30 19:01:04 DEBUG post-series-upgrade active
2021-11-30 19:01:04 DEBUG post-series-upgrade inactive
2021-11-30 19:01:04 DEBUG jujuc server.go:211 running hook tool "status-set" for ceilometer-agent/1-post-series-upgrade-5326356148989293630
2021-11-30 19:01:04 WARNING post-series-upgrade Traceback (most recent call last):
2021-11-30 19:01:04 WARNING post-series-upgrade File "/var/lib/juju/agents/unit-ceilometer-agent-1/charm/hooks/post-series-upgrade", line 188, in <module>
2021-11-30 19:01:04 WARNING post-series-upgrade hooks.execute(sys.argv)
2021-11-30 19:01:04 WARNING post-series-upgrade File "/var/lib/juju/agents/unit-ceilometer-agent-1/charm/charmhelpers/core/hookenv.py", line 962, in execute
2021-11-30 19:01:04 WARNING post-series-upgrade self._hooks[hook_name]()
2021-11-30 19:01:04 WARNING post-series-upgrade File "/var/lib/juju/agents/unit-ceilometer-agent-1/charm/hooks/post-series-upgrade", line 159, in post_series_upgrade
2021-11-30 19:01:04 WARNING post-series-upgrade series_upgrade_complete(
2021-11-30 19:01:04 WARNING post-series-upgrade File "/var/lib/juju/agents/unit-ceilometer-agent-1/charm/charmhelpers/contrib/openstack/utils.py", line 2178, in series_upgrade_complete
2021-11-30 19:01:04 WARNING post-series-upgrade resume_unit_helper(configs)
2021-11-30 19:01:04 WARNING post-series-upgrade File "/var/lib/juju/agents/unit-ceilometer-agent-1/charm/hooks/ceilometer_utils.py", line 308, in resume_unit_helper
2021-11-30 19:01:04 WARNING post-series-upgrade _pause_resume_helper(resume_unit, configs)
2021-11-30 19:01:04 WARNING post-series-upgrade File "/var/lib/juju/agents/unit-ceilometer-agent-1/charm/hooks/ceilometer_utils.py", line 320, in _pause_resume_helper
2021-11-30 19:01:04 WARNING post-series-upgrade f(assess_status_func(configs),
2021-11-30 19:01:04 WARNING post-series-upgrade File "/var/lib/juju/agents/unit-ceilometer-agent-1/charm/charmhelpers/contrib/openstack/utils.py", line 1738, in resume_unit
2021-11-30 19:01:04 WARNING post-series-upgrade raise Exception("Couldn't resume: {}".format("; ".join(messages)))
2021-11-30 19:01:04 WARNING post-series-upgrade Exception: Couldn't resume: ceilometer-agent-compute didn't resume cleanly.; Services not running that should be: ceilometer-agent-compute
----------
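
When triaging similar hook errors, the line that matters in the unit log above is the "Failed to start ... is masked" message rather than the traceback. A minimal sketch for pulling it out of a log follows; the helper name `find_masked_dependencies` is hypothetical, and the pattern is derived only from the single message shown in this report, so other systemd wordings would not match.

```python
import re

# Pattern based on the one message seen in this report's unit log.
MASKED_RE = re.compile(
    r"Failed to start (?P<unit>\S+): Unit (?P<dependency>\S+) is masked\.")


def find_masked_dependencies(log_lines):
    """Return (unit, masked_dependency) pairs found in the given log lines."""
    pairs = []
    for line in log_lines:
        match = MASKED_RE.search(line)
        if match:
            pairs.append((match.group("unit"), match.group("dependency")))
    return pairs


# Two lines copied from the ceilometer-agent unit log in this report.
sample = [
    "2021-11-30 19:01:04 DEBUG post-series-upgrade inactive",
    "2021-11-30 19:01:04 WARNING post-series-upgrade Failed to start "
    "ceilometer-agent-compute.service: Unit nova-compute.service is masked.",
]
masked = find_masked_dependencies(sample)
print(masked)
```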

The pause/resume logic should be removed from ceilometer-agent entirely. Since lp:1927277, the principal charm nova-compute has been responsible for pausing and resuming the services of its subordinates.

[1]: https://github.com/openstack-charmers/charmed-openstack-tester

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceilometer-agent (master)
tags: added: aubergine
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceilometer-agent (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceilometer-agent/+/820001
Committed: https://opendev.org/openstack/charm-ceilometer-agent/commit/43a23127949e035506b89afd6fb01b5cfec04b63
Submitter: "Zuul (22348)"
Branch: master

commit 43a23127949e035506b89afd6fb01b5cfec04b63
Author: Aurelien Lourot <email address hidden>
Date: Mon Nov 29 15:17:44 2021 +0100

    Remove pause/resume logic

    This is a subordinate charm and since a recent
    commit [1] it shares a list of its services with
    the principal charm nova-compute, which has now
    the responsibility to pause and resume services. [2]

    The ceilometer-agent-compute service has a
    dependency to the nova-compute service anyway, so
    it is impossible for this charm to resume its
    service if its principal charm nova-compute is
    paused. This is what also led to errors in
    ceilometer-agent's post-series-upgrade hook. This
    hook attempted to resume its service although
    the principal service was still paused. Removing
    this logic entirely solves this issue.

    Validated by running openstack-upgrade and
    series-upgrade tests. [3]

    [1]: https://opendev.org/openstack/charm-ceilometer-agent/commit/be45f779
    [2]: https://opendev.org/openstack/charm-nova-compute/commit/8fb37dc0
    [3]: https://github.com/openstack-charmers/charmed-openstack-tester

    Closes-Bug: #1952882
    Change-Id: Ia22b53b52b541250f7f803c6708968d75e64475c

Changed in charm-ceilometer-agent:
status: In Progress → Fix Committed
Changed in charm-ceilometer-agent:
milestone: none → 22.04
Changed in charm-ceilometer-agent:
status: Fix Committed → Fix Released