tripleo

MaxFailPercentage: undercloud can be included for an overcloud deploy failure

Bug #1889212 reported by Emilien Macchi on 2020-07-28

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
tripleo	Triaged	Medium	Emilien Macchi	tripleo xena-3

Bug Description

When setting MaxFailPercentage to a certain percentage, we tolerate a certain number of overcloud nodes failing during the deployment.
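
To make the tolerance concrete, here is a minimal sketch of such a threshold check. It is not TripleO's implementation: the helper name `exceeds_max_fail` is hypothetical, and the strictly-greater comparison is an assumption borrowed from how Ansible documents `max_fail_percentage` (the percentage must be exceeded, not equaled).

```python
def exceeds_max_fail(failed_nodes, total_nodes, max_fail_percentage):
    """Return True when the share of failed nodes is above the tolerated percentage.

    Hypothetical helper for illustration only. Assumes the threshold must be
    exceeded, not merely reached, for the deployment to count as failed.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    failed_percentage = 100.0 * failed_nodes / total_nodes
    return failed_percentage > max_fail_percentage
```

Under these assumptions, one failed node out of four with a 25% tolerance is still accepted, while two failed nodes out of four is not.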

However, some playbooks are executed from the undercloud, so if an overcloud node is down, the playbook reports the error as coming from the undercloud node.

Example:

FATAL | Discovering nova hosts | undercloud -> 192.168.24.18 | error={"changed": false, "cmd": ["podman", "exec", "nova_compute", "nova-manage", "cell_v2", "discover_hosts", "--by-service"], "delta": "0:00:00.223708", "end": "2020-07-27 22:22:26.422824", "msg": "non-zero return code", "rc": 125, "start": "2020-07-27 22:22:26.199116", "stderr": "Error: no container with name or ID nova_compute found: no such container", "stderr_lines": ["Error: no container with name or ID nova_compute found: no such container"], "stdout": "", "stdout_lines": []}

192.168.24.18 is the compute node that is down.
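
The two hosts involved can be pulled out of a line like the one above with a small parser. This is only a sketch: the regular expression is a hypothetical pattern written against the `FATAL | task | host -> target | error=...` layout of this one log line, and the function name is an assumption, not part of any TripleO tool.

```python
import re

# Hypothetical pattern for the "FATAL | task | host -> target | error=..." layout.
# The "-> target" part is optional because not every failed task is delegated.
FATAL_RE = re.compile(
    r"^FATAL \| (?P<task>[^|]+?) \| (?P<host>\S+)(?: -> (?P<target>\S+))? \| error="
)


def parse_fatal_line(line):
    """Return (task, reporting_host, target) or None if the line does not match."""
    match = FATAL_RE.match(line)
    if match is None:
        return None
    return match.group("task"), match.group("host"), match.group("target")


# Shortened copy of the log line from this report (error payload trimmed).
line = 'FATAL | Discovering nova hosts | undercloud -> 192.168.24.18 | error={"changed": false}'
print(parse_fatal_line(line))
```

For this line the reporting host is `undercloud` and the target is `192.168.24.18`, which shows why the failure ends up attributed to the undercloud even though the unreachable node is the compute.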

This results in the following confusing summary:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 2 ~~~~~~~~~~~~~~~~~
This or these node(s) failed to deploy: overcloud-novacompute-0, undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When deploying the overcloud, we should not count the source deploy host as a failed node when a playbook fails, and we should make sure it does not appear in the state information.
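
The proposed behaviour can be sketched as a filter applied before the state information is printed. This is an illustration under stated assumptions, not the actual `tripleo_states` code: the function names and the `deploy_host` parameter (defaulting to `"undercloud"`) are hypothetical.

```python
def overcloud_failures(failed_hosts, deploy_host="undercloud"):
    """Drop the deploy host from the failed hosts, preserving order.

    Hypothetical helper: assumes the deploy host is identified by name.
    """
    return [host for host in failed_hosts if host != deploy_host]


def state_summary(failed_hosts, deploy_host="undercloud"):
    """Build a two-line summary that only counts overcloud nodes."""
    nodes = overcloud_failures(failed_hosts, deploy_host)
    return (
        f"Number of nodes which did not deploy successfully: {len(nodes)}\n"
        f"This or these node(s) failed to deploy: {', '.join(nodes)}"
    )


print(state_summary(["overcloud-novacompute-0", "undercloud"]))
```

With the failed hosts from the summary above, only `overcloud-novacompute-0` is counted and listed.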

Tags:

Emilien Macchi (emilienm) on 2020-07-28

Changed in tripleo:
milestone:	none → victoria-1
milestone:	victoria-1 → victoria-2
importance:	Undecided → Medium
status:	New → Triaged

Emilien Macchi (emilienm) on 2020-07-28

Changed in tripleo:
milestone:	victoria-2 → victoria-3

Emilien Macchi (emilienm) on 2020-07-28

Changed in tripleo:
assignee:	nobody → Emilien Macchi (emilienm)
tags:	added: train-backport-potential ussuri-backport-potential

Revision history for this message

Emilien Macchi (emilienm) wrote on 2020-07-28:

BTW logs: https://logserver.rdoproject.org/25/735625/9/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039/8879045/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

OpenStack Infra (hudson-openstack) wrote on 2020-07-28: Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/743549

OpenStack Infra (hudson-openstack) wrote on 2020-07-28: Related fix proposed to python-tripleoclient (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/743556

OpenStack Infra (hudson-openstack) wrote on 2020-07-29: Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/743549
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=f1969830e095401040be66bf245d91d20a08b221
Submitter: Zuul
Branch: master

commit f1969830e095401040be66bf245d91d20a08b221
Author: Emilien Macchi <email address hidden>
Date: Tue Jul 28 10:04:07 2020 -0400

    tripleo_states: change wording

    Change the wording to replace "This or these node(s) failed to deploy"
    by "The following node(s) had failures:"; failures can happen at a
    different level (not necessarily deploy). Update the wording to avoid
    any confusion.

    Change-Id: I80041738df05dbe0da678efa91e861390ad4657e
    Related-Bug: #1889212

OpenStack Infra (hudson-openstack) wrote on 2020-07-29: Related fix proposed to tripleo-ansible (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/743775

OpenStack Infra (hudson-openstack) wrote on 2020-07-29: Related fix proposed to tripleo-ansible (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/743776

OpenStack Infra (hudson-openstack) wrote on 2020-07-30: Related fix merged to tripleo-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/743775
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=584add0ca2a3ee3ac2e9de807de0766ff9a72381
Submitter: Zuul
Branch: stable/ussuri

commit 584add0ca2a3ee3ac2e9de807de0766ff9a72381
Author: Emilien Macchi <email address hidden>
Date: Tue Jul 28 10:04:07 2020 -0400

    tripleo_states: change wording

    Change-Id: I80041738df05dbe0da678efa91e861390ad4657e
    Related-Bug: #1889212
    (cherry picked from commit f1969830e095401040be66bf245d91d20a08b221)

tags:	added: in-stable-ussuri

OpenStack Infra (hudson-openstack) wrote on 2020-07-31: Related fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/743776
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=8d8de17fedac73ee6804e5f2e9a2e22ca30aaf78
Submitter: Zuul
Branch: stable/train

commit 8d8de17fedac73ee6804e5f2e9a2e22ca30aaf78
Author: Emilien Macchi <email address hidden>
Date: Tue Jul 28 10:04:07 2020 -0400

    tripleo_states: change wording

    Change-Id: I80041738df05dbe0da678efa91e861390ad4657e
    Related-Bug: #1889212
    (cherry picked from commit f1969830e095401040be66bf245d91d20a08b221)

tags:	added: in-stable-train

OpenStack Infra (hudson-openstack) wrote on 2020-08-20: Related fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/743556
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=9dec1b2e334e110b03f951c9fd3480f6c3dc8e11
Submitter: Zuul
Branch: master

commit 9dec1b2e334e110b03f951c9fd3480f6c3dc8e11
Author: Emilien Macchi <email address hidden>
Date: Tue Jul 28 10:35:14 2020 -0400

    overcloud_deploy: move horizon url/rc files before config-download

    When a deployment fails, we run the playbooks to generate horizon URL &
    RC files anyway. However it is confusing to have them at the end, after
    the actual trace and an operator with a small screen won't see the
    actual errors easily.

    Let's just move these actions before the config download execution,
    which has no impact anyway; but will improve logging a lot.

    Change-Id: I70bbc40f8e5eb709d9f0f608e936a818e082918b
    Related-Bug: #1889212

OpenStack Infra (hudson-openstack) wrote on 2020-08-20: Related fix proposed to python-tripleoclient (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/747075

OpenStack Infra (hudson-openstack) wrote on 2020-08-20: Related fix proposed to python-tripleoclient (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/747076

OpenStack Infra (hudson-openstack) wrote on 2020-08-20: Related fix merged to python-tripleoclient (stable/ussuri)

Reviewed: https://review.opendev.org/747075
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=1ffc56ed97ec8878a255c96baa16652ef825cf9d
Submitter: Zuul
Branch: stable/ussuri

commit 1ffc56ed97ec8878a255c96baa16652ef825cf9d
Author: Emilien Macchi <email address hidden>
Date: Tue Jul 28 10:35:14 2020 -0400

    overcloud_deploy: move horizon url/rc files before config-download

    Let's just move these actions before the config download execution,
    which has no impact anyway; but will improve logging a lot.

    Change-Id: I70bbc40f8e5eb709d9f0f608e936a818e082918b
    Related-Bug: #1889212
    (cherry picked from commit 9dec1b2e334e110b03f951c9fd3480f6c3dc8e11)

OpenStack Infra (hudson-openstack) wrote on 2020-08-24: Related fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/747076
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=fa0129963b463d7d3da3e7950c92f25d449cf0f7
Submitter: Zuul
Branch: stable/train

commit fa0129963b463d7d3da3e7950c92f25d449cf0f7
Author: Emilien Macchi <email address hidden>
Date: Tue Jul 28 10:35:14 2020 -0400

    overcloud_deploy: move horizon url/rc files before config-download

    Note: this is an unclean backport.

    Let's just move these actions before the config download execution,
    which has no impact anyway; but will improve logging a lot.

    Change-Id: I70bbc40f8e5eb709d9f0f608e936a818e082918b
    Related-Bug: #1889212
    (cherry picked from commit 9dec1b2e334e110b03f951c9fd3480f6c3dc8e11)

Marios Andreou (marios-b) on 2020-11-03

Changed in tripleo:
milestone:	victoria-3 → wallaby-1

Marios Andreou (marios-b) on 2020-12-08

Changed in tripleo:
milestone:	wallaby-1 → wallaby-2

Marios Andreou (marios-b) on 2021-01-29

Changed in tripleo:
milestone:	wallaby-2 → wallaby-3

Marios Andreou (marios-b) on 2021-03-17

Changed in tripleo:
milestone:	wallaby-3 → wallaby-rc1

Marios Andreou (marios-b) on 2021-05-06

Changed in tripleo:
milestone:	wallaby-rc1 → xena-1

Marios Andreou (marios-b) on 2021-06-22

Changed in tripleo:
milestone:	xena-1 → xena-2

Marios Andreou (marios-b) on 2021-07-21

Changed in tripleo:
milestone:	xena-2 → xena-3
