HA overcloud, controller replacement or node scale up broken with pcs 0.10+
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo | In Progress | High | Unassigned | | |
Bug Description
Since Stein and RHEL/CentOS 8, the code path that handles the replacement of a controller node in a pacemaker cluster seems broken. The same applies to automatic scale up of the control plane.
When redeploying a stack with a list of new controllers to add to the cluster, the deployment times out and the new controllers are never added into the cluster. When inspecting the journal, one can see that the puppet run on the host (bootstrap node) yields an error when trying to add the new nodes:
# journalctl -t puppet-user
[...]
Aug 06 12:08:22 controller-0 puppet-
Aug 06 12:08:22 controller-0 puppet-
Aug 06 12:08:22 controller-0 puppet-
Aug 06 12:08:22 controller-0 puppet-
Aug 06 12:08:22 controller-0 puppet-
Aug 06 12:08:22 controller-0 puppet-
Aug 06 12:08:22 controller-0 puppet-
Aug 06 12:08:23 controller-0 puppet-
The untruncated error message looks something like:
Error: Host 'vm3' is not known to pcs, try to authenticate the host using 'pcs host auth vm3' command
Error: None of hosts is known to pcs.
Error: Errors have occurred, therefore pcs is unable to continue
Since Stein and RHEL/CentOS 8, the pacemaker cluster that composes the HA control plane is configured with pcs 0.10, which has a breaking change: before adding a node to the cluster, we must now explicitly authenticate the node to all the pcsd instances.
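For illustration, the ordering that pcs 0.10 expects can be sketched as a small shell helper. This is not the code used by puppet-pacemaker. The function name `add_cluster_node` and the variables `PCS`, `PCS_USER` and `PCS_PASSWORD` are hypothetical, and the `hacluster` default user is an assumption about the deployment.

```shell
#!/bin/sh
# Sketch only: authenticate a new node before adding it (pcs 0.10+).
# PCS, PCS_USER and PCS_PASSWORD are hypothetical names for this example.
PCS="${PCS:-pcs}"

add_cluster_node() {
    node="$1"
    # Step 1: make the new host known to pcs; without this, pcs 0.10
    # fails with "Host '<node>' is not known to pcs".
    # The default user "hacluster" is an assumption.
    $PCS host auth "$node" -u "${PCS_USER:-hacluster}" -p "$PCS_PASSWORD" || return 1
    # Step 2: only then add the node to the existing cluster.
    $PCS cluster node add "$node" --start --enable
}
```

Run on a node that is already a cluster member, `add_cluster_node vm3` would authenticate vm3 first and then add it. Setting `PCS="echo pcs"` prints the commands instead of running them, which helps when checking the ordering without a live cluster.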
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
This issue was fixed in the openstack/puppet-pacemaker 0.8.0 release.