When MaxFailPercentage is set, the deployment tolerates the failure of up to that percentage of overcloud nodes.
However, some playbook tasks are executed from the undercloud on behalf of an overcloud node. If that overcloud node is down, the failure is reported against the undercloud host rather than against the node that is actually down.
Example:
FATAL | Discovering nova hosts | undercloud -> 192.168.24.18 | error={"changed": false, "cmd": ["podman", "exec", "nova_compute", "nova-manage", "cell_v2", "discover_hosts", "--by-service"], "delta": "0:00:00.223708", "end": "2020-07-27 22:22:26.422824", "msg": "non-zero return code", "rc": 125, "start": "2020-07-27 22:22:26.199116", "stderr": "Error: no container with name or ID nova_compute found: no such container", "stderr_lines": ["Error: no container with name or ID nova_compute found: no such container"], "stdout": "", "stdout_lines": []}
Here 192.168.24.18 is the compute node that is down.
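To illustrate how the failing line could be attributed to the node that is really affected, here is a minimal Python sketch. It assumes the "FATAL | task | host -> target | error=" layout seen in the single line quoted above; the regular expression and the function name affected_host are hypothetical and are not taken from the tripleo code.

```python
import re

# Layout assumed from the one log line quoted in this report:
#   FATAL | <task name> | <host>[ -> <delegated target>] | error={...}
FATAL_RE = re.compile(
    r"^FATAL \| (?P<task>[^|]+?) \| (?P<host>\S+)(?: -> (?P<target>\S+))? \| error="
)


def affected_host(line):
    """Return the host a FATAL line is really about, or None if it does not match.

    When the task ran on one host on behalf of another ("host -> target"),
    the target is returned, since that is the node that is actually down.
    """
    match = FATAL_RE.match(line)
    if match is None:
        return None
    return match.group("target") or match.group("host")


if __name__ == "__main__":
    sample = (
        "FATAL | Discovering nova hosts | undercloud -> 192.168.24.18 | "
        'error={"changed": false, "rc": 125}'
    )
    print(affected_host(sample))
```

With the sample line, the function returns 192.168.24.18 instead of undercloud.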
The deployment then ends with this confusing summary, which counts the undercloud as a failed node:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 2 ~~~~~~~~~~~~~~~~~
This or these node(s) failed to deploy: overcloud-novacompute-0, undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When deploying the overcloud, we should not count the host the playbooks are run from (the undercloud) as a failed node, and we should make sure it does not appear in the state information.
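A rough sketch of the proposed behaviour, in Python, might look like the following. The function names, the deploy_host parameter and its "undercloud" default are assumptions made for illustration; this is not the existing tripleo implementation, only one way the filtering could be expressed.

```python
def overcloud_failures(failed_hosts, deploy_host="undercloud"):
    """Drop the deploy host from the list of failed hosts, keeping order."""
    return [host for host in failed_hosts if host != deploy_host]


def state_information(failed_hosts, deploy_host="undercloud"):
    """Build the two summary lines using only the overcloud nodes."""
    nodes = overcloud_failures(failed_hosts, deploy_host)
    return "\n".join(
        [
            "Number of nodes which did not deploy successfully: %d" % len(nodes),
            "This or these node(s) failed to deploy: " + ", ".join(nodes),
        ]
    )


if __name__ == "__main__":
    # Hosts taken from the summary quoted in this report.
    print(state_information(["overcloud-novacompute-0", "undercloud"]))
```

With the two hosts from the summary above, only overcloud-novacompute-0 is left and the count drops to 1.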
Logs: https://logserver.rdoproject.org/25/735625/9/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039/8879045/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz