M/N upgrades - Race during the upgrade step
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Michele Baldessari |
Bug Description
Currently when we call the major-upgrade step we do the following:
"""
...
if [[ -n $(is_bootstrap_
check_
fi
...
if [[ -n $(is_bootstrap_
migrate_
fi
...
for service in $(services_
manage_
...
done
"""
The problem with the above code is that it is open to the following race condition:
1. The code runs first on a non-bootstrap controller node, so that node starts stopping a bunch of services.
2. Pacemaker notices that the services are down and marks them as stopped.
3. The code then runs on the bootstrap node (controller-0), where the check_clean_cluster function fails and the script exits.
4. Eventually the script on the non-bootstrap controller node also times out and exits, because the cluster never shut down (the shutdown was never actually started, since we failed at step 3).
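To make the ordering problem concrete, here is a toy bash model of the four steps above. It is only a sketch: `simulate_upgrade` and the role names are hypothetical, it does not call Pacemaker or any of the real upgrade helpers, and it assumes the cluster check fails as soon as any service has already been stopped.

```shell
#!/usr/bin/env bash
# Toy model of the race. All names are hypothetical; no real Pacemaker calls.

# simulate_upgrade ROLE...
# Each argument is a node role, given in the order in which the upgrade
# script reaches that node. Prints "ok" if the bootstrap node's check sees
# no stopped services, "check_failed" otherwise.
simulate_upgrade() {
    local stopped=0   # services already stopped by non-bootstrap nodes
    local role
    for role in "$@"; do
        case "$role" in
            bootstrap)
                # Stand-in for the cluster check (assumption: any stopped
                # service makes the cluster look unclean).
                if [ "$stopped" -gt 0 ]; then
                    echo "check_failed"
                    return 1
                fi
                ;;
            non-bootstrap)
                # Stand-in for the loop that stops services on this node.
                stopped=$((stopped + 1))
                ;;
        esac
    done
    echo "ok"
}

simulate_upgrade non-bootstrap bootstrap || true   # racy order from steps 1-3
simulate_upgrade bootstrap non-bootstrap           # bootstrap check runs first
```

The first call reproduces the failure in step 3, and the second shows that the same work succeeds when the bootstrap check happens before any node starts stopping services. This only illustrates the ordering dependency; it is not the fix that was proposed for this bug.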
Fix proposed to branch: master
Review: https://review.openstack.org/395454