undercloud install fails because gateway is not pingable

Bug #1950528 reported by David Vallee Delisle
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: David Vallee Delisle

Bug Description

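The undercloud install fails with the error in [1], which is apparently caused by change I93cded61ffb862e99fd8043dbf0def3d16079692. The gateway check runs ping against the literal string "[192.168.24.1]" rather than the address itself, so name resolution fails.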

[1]
~~~
PLAY [Server network validation] ***********************************************
2021-11-10 22:37:02.111205 | 525400e3-fcf6-b232-3d7d-000000000085 | TASK | Basic Network Validation
2021-11-10 22:37:02.125392 | 525400e3-fcf6-b232-3d7d-000000000085 | TIMING | Basic Network Validation | undercloud-0 | 0:00:58.738346 | 0.01s
2021-11-10 22:37:02.159786 | 525400e3-fcf6-b232-3d7d-000000000911 | TASK | Collect default network fact
2021-11-10 22:37:02.863612 | 525400e3-fcf6-b232-3d7d-000000000911 | OK | Collect default network fact | undercloud-0
2021-11-10 22:37:02.864657 | 525400e3-fcf6-b232-3d7d-000000000911 | TIMING | tripleo_nodes_validation : Collect default network fact | undercloud-0 | 0:00:59.477615 | 0.70s
2021-11-10 22:37:02.882524 | 525400e3-fcf6-b232-3d7d-000000000912 | TASK | Check Default IPv4 Gateway availability
2021-11-10 22:37:03.074841 | 525400e3-fcf6-b232-3d7d-000000000912 | OK | Check Default IPv4 Gateway availability | undercloud-0
2021-11-10 22:37:03.076232 | 525400e3-fcf6-b232-3d7d-000000000912 | TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | undercloud-0 | 0:00:59.689190 | 0.19s
2021-11-10 22:37:03.094381 | 525400e3-fcf6-b232-3d7d-000000000913 | TASK | Check all networks Gateway availability
2021-11-10 22:37:03.286664 | 525400e3-fcf6-b232-3d7d-000000000913 | FATAL | Check all networks Gateway availability | undercloud-0 | item=['192.168.24.1'] | error={"ansible_loop_var": "gateway_ip", "changed": false, "cmd": ["ping", "-w", "10", "-c", "1", "[192.168.24.1]"], "delta": "0:00:00.003807", "end": "2021-11-10 22:37:03.269991", "gateway_ip": ["192.168.24.1"], "msg": "non-zero return code", "rc": 2, "start": "2021-11-10 22:37:03.266184", "stderr": "ping: [192.168.24.1]: Name or service not known", "stderr_lines": ["ping: [192.168.24.1]: Name or service not known"], "stdout": "", "stdout_lines": []}
2021-11-10 22:37:03.288278 | 525400e3-fcf6-b232-3d7d-000000000913 | TIMING | tripleo_nodes_validation : Check all networks Gateway availability | undercloud-0 | 0:00:59.901236 | 0.19s
2021-11-10 22:37:03.298188 | 525400e3-fcf6-b232-3d7d-000000000913 | TIMING | tripleo_nodes_validation : Check all networks Gateway availability | undercloud-0 | 0:00:59.911153 | 0.20s
~~~
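
The failing argument `[192.168.24.1]` in the `cmd` above is consistent with a one-element list being rendered as a string and then split with shell-like rules, which drops the quotes but keeps the brackets. The sketch below reproduces that shape in plain Python. It only illustrates the symptom and is not the actual Ansible code path; `render_ping_command` is a hypothetical helper, and the assumption that the item is interpolated into a command string is mine.

```python
import shlex


def render_ping_command(gateway_ip):
    """Hypothetical helper: build a ping argv from a formatted command line.

    Assumption: the loop item is interpolated into a command string, and the
    string is then split using shell-like quoting rules.
    """
    return shlex.split(f"ping -w 10 -c 1 {gateway_ip}")


# Loop item is a plain string: ping receives a usable address.
good = render_ping_command("192.168.24.1")

# Loop item is a one-element list (a list nested inside the gateway list):
# formatting yields "['192.168.24.1']" and the split drops the quotes.
bad = render_ping_command(["192.168.24.1"])

print(good[-1])  # 192.168.24.1
print(bad[-1])   # [192.168.24.1]
```

The second result matches the argument in the log, which suggests the loop was iterating over a list of lists rather than a flat list of addresses.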

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)
Changed in tripleo:
status: New → In Progress
Changed in tripleo:
assignee: nobody → David Vallee Delisle (valleedelisle)
Revision history for this message
chandan kumar (chkumar246) wrote :

We are seeing it in the CentOS Stream 9 (CS9) featureset002 (fs02) job: https://logserver.rdoproject.org/67/36267/27/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_1comp-featureset002-master/c376c5e/logs/undercloud/home/zuul-worker/undercloud_install.log.txt.gz

```
Check all networks Gateway availability | undercloud | item=['192.168.24.1'] | error={"ansible_loop_var": "gateway_ip", "changed": false, "cmd": ["ping", "-w", "10", "-c", "1", "[192.168.24.1]"], "delta": "0:00:00.003951", "end": "2021-11-11 05:28:16.938501", "gateway_ip": ["192.168.24.1"], "msg": "non-zero return code", "rc": 2, "start": "2021-11-11 05:28:16.934550", "stderr": "ping: [192.168.24.1]: Name or service not known", "stderr_lines": ["ping: [192.168.24.1]: Name or service not known"], "stdout": "", "stdout_lines": []}
2021-11-11 05:28:16.958506 | fa163e88-0c77-accd-20fc-00000000052f | TIMING | tripleo_nodes_validation : Check all networks Gateway availability | undercloud | 0:00:47.881329 | 0.20s
2021-11-11 05:28:16.964781 | fa163e88-0c77-accd-20fc-00000000052f | TIMING | tripleo_nodes_validation : Check all networks Gateway availability | undercloud | 0:00:47.887618 | 0.21s

PLAY RECAP *********************************************************************
```

tags: added: alert promotion-blocker
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)
Douglas Viroel (dviroel)
Changed in tripleo:
importance: Undecided → Critical
milestone: none → yoga-1
Revision history for this message
David Vallee Delisle (valleedelisle) wrote :

After further investigation, this might be another symptom of the yaql2 change.

We've had similar issues in changes Iba0262fbc2af0903f02c8c28c7ba2b1d935abe7f and Ic5144c58ceb9bd146e2c470725ec7f4b65328c4d (Bug #1947193).

Proposed an untested change.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/817774
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/efc328c66898bfc09c5dcb00fe5f584a45b899ff
Submitter: "Zuul (22348)"
Branch: master

commit efc328c66898bfc09c5dcb00fe5f584a45b899ff
Author: Douglas Viroel <email address hidden>
Date: Fri Nov 12 12:11:27 2021 -0300

    Make PingTestGatewayIPsMap a map of flatten lists

    PingTestGatewayIPsMap elements may contain list of lists, causing failures
    on roles that iterate over them. See [1] and #1950528 for more info.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/817500

    Closes-bug: #1950528
    Change-Id: Idb70c822f01f808871a53689edfa2edf52e59e54
    Signed-off-by: Douglas Viroel <email address hidden>
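
As a rough illustration of what "a map of flattened lists" means for the data, the sketch below flattens every value of a role-to-gateways map. The merged change lives in the Heat templates, and this Python is not that implementation; the helper names, the role names, and the sample addresses other than 192.168.24.1 are assumptions made for the example.

```python
def flatten(items):
    """Recursively flatten nested lists or tuples into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def flatten_gateway_map(gateway_map):
    """Return a copy of the map in which every value is a flat list."""
    return {role: flatten(gateways) for role, gateways in gateway_map.items()}


# Hypothetical input: some values contain lists of lists.
ping_test_gateway_ips_map = {
    "Undercloud": [["192.168.24.1"]],
    "Compute": ["192.168.25.254", ["192.168.24.1"]],
}

flat_map = flatten_gateway_map(ping_test_gateway_ips_map)
print(flat_map["Undercloud"])  # ['192.168.24.1']
```

With flat values, a role that loops over its gateway list receives one address string per iteration instead of a nested list.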

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "David Vallee Delisle <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/817793
Reason: My bad, I didn't realize this change was proposed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/817774

Revision history for this message
Douglas Viroel (dviroel) wrote :

This doesn't affect stable/wallaby because the code has not been backported yet [1]. Updating the wallaby backport to include the fix will ensure the issue does not show up on that branch.

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/816969

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (master)

Change abandoned by "David Vallee Delisle <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/817500

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/816969
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/a266b5a9487fcadee04ec462feaf9b54a6aea4d9
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit a266b5a9487fcadee04ec462feaf9b54a6aea4d9
Author: Harald Jensås <email address hidden>
Date: Thu Oct 28 22:52:45 2021 +0200

    Add ping test for all networks gateway IPs

    Add ping test for gateway IPs on all networks, to ensure
    all gateways are reachable.

    The related Bugzilla reports an issue where some network
    fabrics fail when using the current node ping test, which
    pings the first node in each role. The fabric simply does
    not forward traffic before the gateway has been pinged.

    One can argue that the fabric in question is broken. However,
    with the current implementation the first node in each role
    actually ping tests only against its own address? So adding
    the test to ping the gateway addresses improves the validation
    in general.

    Make PingTestGatewayIPsMap a map of flatten lists

    PingTestGatewayIPsMap elements may contain list of lists, causing failures
    on roles that iterate over them. See [1] and #1950528 for more info.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/817500

    Related RHBZ#1875962
    Closes-bug: #1950528
    Depends-On: I93cded61ffb862e99fd8043dbf0def3d16079692

    Change-Id: I3309f2a0e39ad115930ecd5c0e895816565819e9
    (cherry picked from commit 5d830980ec842c6093a8fd44ef922014bcadf693)
    (cherry picked from commit efc328c66898bfc09c5dcb00fe5f584a45b899ff)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/python-tripleoclient/+/825676

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/python-tripleoclient/+/825676
Committed: https://opendev.org/openstack/python-tripleoclient/commit/0e251ff91e2d84b01f3f9e7655ef5567b8445f84
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 0e251ff91e2d84b01f3f9e7655ef5567b8445f84
Author: Harald Jensås <email address hidden>
Date: Tue Sep 22 00:12:04 2020 +0200

    ctlplane network attributes in overcloud environment

    Set CtlplaneNetworkAttributes parameter in overcloud
    environment. The parameter contains a map with network
    and subnets data.

    CtlplaneNetworkAttributes:
      network:
        dns_domain: ctlplane.localdomain.
        mtu: 1442
        name: ctlplane
        tags: ['192.168.24.0/24', '192.168.25.0/24']
      subnets:
        ctlplane-leaf1:
          cidr: 192.168.25.0/24
          dns_nameservers: ['8.8.8.8', '8.8.4.4']
          gateway_ip: 192.168.25.254
          host_routes:
          - {'destination': '192.168.24.0/24', 'nexthop': '192.168.25.254'}
          ip_version: 4
          name: ctlplane-leaf1

    Also set the CtlplaneNetworkAttributes in the undercloud environment
    from the data in undercloud.conf.

    Also set the CtlplaneNetworkAttributes in the standalone environment.

    Conflicts:
      requirements.txt
      lower-constraints.txt
      tripleoclient/tests/v1/overcloud_deploy/test_overcloud_deploy.py
      tripleoclient/tests/v1/overcloud_update/test_overcloud_update.py

    Related-Bug: #1950528
    Related RHBZ#1875962
    Change-Id: I12f1ea965d489eb36353e988cc3ec947f72a35ad
    (cherry picked from commit 6ced9c71db18c07279d4dfe80842eb631f6b0179)
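
To make the shape of the map in the commit message concrete, the sketch below rebuilds the example as a Python dict and reads the gateway of each subnet. `subnet_gateways` is a hypothetical helper and is not part of python-tripleoclient; the data is copied from the example above.

```python
# The CtlplaneNetworkAttributes example from the commit message, as a dict.
ctlplane_network_attributes = {
    "network": {
        "dns_domain": "ctlplane.localdomain.",
        "mtu": 1442,
        "name": "ctlplane",
        "tags": ["192.168.24.0/24", "192.168.25.0/24"],
    },
    "subnets": {
        "ctlplane-leaf1": {
            "cidr": "192.168.25.0/24",
            "dns_nameservers": ["8.8.8.8", "8.8.4.4"],
            "gateway_ip": "192.168.25.254",
            "host_routes": [
                {"destination": "192.168.24.0/24", "nexthop": "192.168.25.254"},
            ],
            "ip_version": 4,
            "name": "ctlplane-leaf1",
        },
    },
}


def subnet_gateways(attributes):
    """Hypothetical helper: map each subnet name to its gateway IP.

    Subnets without a gateway_ip are skipped.
    """
    return {
        name: subnet["gateway_ip"]
        for name, subnet in attributes.get("subnets", {}).items()
        if subnet.get("gateway_ip")
    }


print(subnet_gateways(ctlplane_network_attributes))
# {'ctlplane-leaf1': '192.168.25.254'}
```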

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/825225
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/a61368ff2814aa4281055deca4fa5a6515cbc308
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit a61368ff2814aa4281055deca4fa5a6515cbc308
Author: Harald Jensås <email address hidden>
Date: Thu Oct 28 22:52:45 2021 +0200

    Add ping test for all networks gateway IPs

    Add ping test for gateway IPs on all networks, to ensure
    all gateways are reachable.

    The related Bugzilla reports an issue where some network
    fabrics fail when using the current node ping test, which
    pings the first node in each role. The fabric simply does
    not forward traffic before the gateway has been pinged.

    One can argue that the fabric in question is broken. However,
    with the current implementation the first node in each role
    actually ping tests only against its own address? So adding
    the test to ping the gateway addresses improves the validation
    in general.

    Make PingTestGatewayIPsMap a map of flatten lists

    PingTestGatewayIPsMap elements may contain list of lists, causing
    failures on roles that iterate over them. See [1] and #1950528 for
    more info.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/817500

    Related RHBZ#1875962
    Closes-bug: #1950528
    Depends-On: https://review.opendev.org/825228
    Change-Id: I3309f2a0e39ad115930ecd5c0e895816565819e9
    (cherry picked from commit efc328c66898bfc09c5dcb00fe5f584a45b899ff)
    (cherry picked from commit 5d830980ec842c6093a8fd44ef922014bcadf693)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/825432
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/720b18fd8cb233aa809dd38ea71d1cf5ae39808f
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 720b18fd8cb233aa809dd38ea71d1cf5ae39808f
Author: Harald Jensås <email address hidden>
Date: Thu Oct 28 22:52:45 2021 +0200

    Add ping test for all networks gateway IPs

    Add ping test for gateway IPs on all networks, to ensure
    all gateways are reachable.

    The related Bugzilla reports an issue where some network
    fabrics fail when using the current node ping test, which
    pings the first node in each role. The fabric simply does
    not forward traffic before the gateway has been pinged.

    One can argue that the fabric in question is broken. However,
    with the current implementation the first node in each role
    actually ping tests only against its own address? So adding
    the test to ping the gateway addresses improves the validation
    in general.

    Make PingTestGatewayIPsMap a map of flatten lists

    PingTestGatewayIPsMap elements may contain list of lists, causing
    failures on roles that iterate over them. See [1] and #1950528 for
    more info.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/817500

    Add attr of networks and subnets to Networks resource

    In the Networks resource templates add the full resource
    attributes to net_attributes_map. This is a partial cherry-pick of
    commit 5b3878580a9768be7c7c39d3062f567eb5e13767.

    Also the parameter {{role.name}}ControlPlaneSubnet is added in
    overcloud.yaml.j2, the parameter was previously only used in
    puppet/role.role.j2.yaml. This is a partial cherry-pick of
    commit 7b8c6b07dad3ddebbf9966fc6c8ba3e8a7b4cd8a.

    Conflicts:
      common/deploy-steps.j2

    Closes-bug: #1950528
    Related RHBZ#1875962
    Depends-On: I93cded61ffb862e99fd8043dbf0def3d16079692
    Depends-On: https://review.opendev.org/825676
    Change-Id: I3309f2a0e39ad115930ecd5c0e895816565819e9
    (cherry picked from commit 5d830980ec842c6093a8fd44ef922014bcadf693)
    (cherry picked from commit efc328c66898bfc09c5dcb00fe5f584a45b899ff)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/825433
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/b22453f52d62c6e4e673fb3d26d9dd8a7a9aee7f
Submitter: "Zuul (22348)"
Branch: stable/train

commit b22453f52d62c6e4e673fb3d26d9dd8a7a9aee7f
Author: Harald Jensås <email address hidden>
Date: Thu Oct 28 22:52:45 2021 +0200

    Add ping test for all networks gateway IPs

    Add ping test for gateway IPs on all networks, to ensure
    all gateways are reachable.

    The related Bugzilla reports an issue where some network
    fabrics fail when using the current node ping test, which
    pings the first node in each role. The fabric simply does
    not forward traffic before the gateway has been pinged.

    One can argue that the fabric in question is broken. However,
    with the current implementation the first node in each role
    actually ping tests only against its own address? So adding
    the test to ping the gateway addresses improves the validation
    in general.

    Make PingTestGatewayIPsMap a map of flatten lists

    PingTestGatewayIPsMap elements may contain list of lists, causing
    failures on roles that iterate over them. See [1] and #1950528 for
    more info.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/817500

    Add attr of networks and subnets to Networks resource

    In the Networks resource templates add the full resource
    attributes to net_attributes_map. This is a partial cherry-pick of
    commit 5b3878580a9768be7c7c39d3062f567eb5e13767.

    Also the parameter {{role.name}}ControlPlaneSubnet is added in
    overcloud.yaml.j2, the parameter was previously only used in
    puppet/role.role.j2.yaml. This is a partial cherry-pick of
    commit 7b8c6b07dad3ddebbf9966fc6c8ba3e8a7b4cd8a.

    Conflicts:
      common/deploy-steps.j2
      network/network.j2

    Closes-bug: #1950528
    Related RHBZ#1875962
    Depends-On: I93cded61ffb862e99fd8043dbf0def3d16079692
    Depends-On: https://review.opendev.org/825676
    Change-Id: I3309f2a0e39ad115930ecd5c0e895816565819e9
    (cherry picked from commit 5d830980ec842c6093a8fd44ef922014bcadf693)
    (cherry picked from commit efc328c66898bfc09c5dcb00fe5f584a45b899ff)
    (cherry picked from commit 720b18fd8cb233aa809dd38ea71d1cf5ae39808f)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.4.6

This issue was fixed in the openstack/tripleo-heat-templates 12.4.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 16.0.0

This issue was fixed in the openstack/tripleo-heat-templates 16.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 13.6.0

This issue was fixed in the openstack/tripleo-heat-templates 13.6.0 release.
