M/N upgrades - relax pre-upgrade check for failed actions
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo | Fix Released | Critical | Michele Baldessari | | |
Bug Description
So I'd like to start a discussion about potentially relaxing the pre-upgrade check for failed actions. The reason is the following bug: https:/
Basically, on Mitaka with Ceph, the following failed actions will be present right after the deployment:
Failed Actions:
* openstack-
last-
* openstack-
last-
* openstack-
last-
* openstack-
last-
* openstack-
last-
* openstack-
last-
If the operator takes no action (maybe because they were not using gnocchi & co), the upgrade will fail in the precheck for failed actions.
Should we care about this situation, or do we simply need to fix the above bug and say the operator *must* make sure there are no failed actions?
On one hand I'd prefer a clean fix where it belongs (aka gnocchi/mitaka); on the other hand, a failed action might actually have happened in the distant past while all resources are currently up and running, so isn't it a bit of a big hammer to stop an upgrade because of that?
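To make the "distant past" point concrete, here is a minimal sketch of one way a precheck could ignore old failed actions. It is only an illustration, not the fix adopted for this bug: the function name `recent_failed_actions`, the `max_age` threshold, and the assumption that each failed-action entry is available as a single string carrying a `last-rc-change='...'` field are all hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches the last-rc-change field of a "Failed Actions" entry,
# e.g. last-rc-change='Wed Sep 28 18:58:44 2016'.
_RC_CHANGE = re.compile(r"last-rc-change='([^']+)'")


def recent_failed_actions(entries, now, max_age):
    """Return the failed-action entries whose last-rc-change is within max_age of now.

    Hypothetical helper: `entries` is a list of strings, one per failed action.
    Entries without a parsable timestamp are skipped here; a stricter check
    might prefer to treat them as recent instead.
    """
    recent = []
    for entry in entries:
        match = _RC_CHANGE.search(entry)
        if match is None:
            continue  # no timestamp found on this entry
        changed = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        if now - changed <= max_age:
            recent.append(entry)
    return recent
```

A precheck built this way would only block the upgrade when the returned list is non-empty, so a failure left over from the initial deployment would not stop an upgrade weeks later.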
Changed in tripleo: | |
importance: | Medium → High |
milestone: | none → newton-rc2 |
So I definitely think we should tweak this. I had at least one upgrade job failing because of failed resources:
Failed Actions:
* memcached_monitor_60000 on overcloud-controller-1 'not running' (7): call=41, status=complete, exitreason='none', last-rc-change='Wed Sep 28 18:58:44 2016', queued=0ms, exec=0ms
* mongod_monitor_60000 on overcloud-controller-1 'not running' (7): call=82, status=complete, exitreason='none', last-rc-change='Wed Sep 28 18:58:06 2016', queued=0ms, exec=0ms
....
But we actually had all the resources running:
[root@overcloud-controller-0 ~]# pcs status |grep -i stopped
[root@overcloud-controller-0 ~]#
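The `pcs status | grep -i stopped` check used here suggests what a relaxed precheck could look at: whether anything is stopped right now, rather than whether a failure was ever recorded. Below is a rough sketch under that assumption. The function names are made up for illustration, and the input is assumed to be the captured text output of `pcs status`.

```python
def stopped_lines(pcs_status_text):
    """Return the lines that mention 'stopped', like `grep -i stopped` would."""
    return [
        line
        for line in pcs_status_text.splitlines()
        if "stopped" in line.lower()
    ]


def relaxed_precheck(pcs_status_text):
    """Hypothetical relaxed check: pass when nothing is reported as stopped.

    Entries under "Failed Actions:" are deliberately ignored, since they may
    describe failures from which the resource has already recovered.
    """
    return not stopped_lines(pcs_status_text)
```

With output like the one in this comment (failed actions listed, but the grep for stopped resources returning nothing), `relaxed_precheck` would return `True` and the upgrade would be allowed to proceed.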