periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-component-master-validation log pollution leads to intermittent failures

Bug #1993262 reported by Jiri Podivin
Affects    Status         Importance   Assigned to    Milestone
tripleo    Fix Released   Critical     Jiri Podivin

Bug Description

The job sometimes fails because logs from pre-deploy validations are intermixed with logs from the follow-up validation tests. This leads to an assertion failure in the "Verify full history output" task.

Trace:
------
2022-10-17 13:06:13.301375 | primary | TASK [validations : Verify full history output] ********************************
2022-10-17 13:06:13.301421 | primary | Monday 17 October 2022 13:06:13 -0400 (0:00:02.556) 1:58:46.484 ********
2022-10-17 13:06:13.385635 | primary | fatal: [undercloud]: FAILED! => {"changed": false, "msg": "The history output length 7 doesn't match the number of expected validations runs 5.\n"}

Log:
----
https://logserver.rdoproject.org/91/43591/11/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-component-master-validation/8bf10e0/job-output.txt
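
The failure mode can be illustrated with a minimal sketch. It assumes, hypothetically, that every validation run writes one JSON file into a shared log directory and that the history is built by listing that directory. The function names and the file layout are illustrative only and are not taken from validations-common or validations-libs.

```python
import json
import tempfile
from pathlib import Path


def record_run(log_dir: Path, name: str) -> None:
    """Write one log file per validation run (hypothetical layout)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    index = len(list(log_dir.glob("*.json")))
    payload = json.dumps({"validation": name})
    (log_dir / f"{index:03d}_{name}.json").write_text(payload)


def history(log_dir: Path) -> list:
    """Return one history entry per log file found in the directory."""
    files = sorted(log_dir.glob("*.json"))
    return [json.loads(p.read_text())["validation"] for p in files]


def verify_history(log_dir: Path, expected_runs: int) -> None:
    """Mimic the CI assertion: history length must equal the run count."""
    found = len(history(log_dir))
    if found != expected_runs:
        raise AssertionError(
            f"The history output length {found} doesn't match "
            f"the number of expected validations runs {expected_runs}."
        )


def simulate(pre_deploy_runs: int, test_runs: int) -> int:
    """Run pre-deploy validations, then test runs, in one shared log dir."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "validations"  # hypothetical location
        for i in range(pre_deploy_runs):
            record_run(log_dir, f"pre-deploy-{i}")
        for i in range(test_runs):
            record_run(log_dir, f"test-{i}")
        return len(history(log_dir))
```

Under these assumptions, two leftover pre-deploy entries plus five test runs give `simulate(2, 5) == 7`, the same kind of mismatch as in the trace above, while an empty directory at the start gives `simulate(0, 5) == 5`.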

OpenStack Infra (hudson-openstack) wrote : Fix proposed to validations-common (master)
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to validations-common (master)

Reviewed: https://review.opendev.org/c/openstack/validations-common/+/861716
Committed: https://opendev.org/openstack/validations-common/commit/b02d478d513a2b35b969ef96f766923714c4a20a
Submitter: "Zuul (22348)"
Branch: master

commit b02d478d513a2b35b969ef96f766923714c4a20a
Author: Jiri Podivin <email address hidden>
Date: Tue Oct 18 10:10:17 2022 +0200

    Adding CI task for clean up of existing validation logs

    Ensuring that validation logs are empty before starting framework tests
    will ensure clean environment actual testing.

    Closes-Bug: #1993262

    Signed-off-by: Jiri Podivin <email address hidden>
    Change-Id: Ia4e2fbbbd02d66522fc5f92441f9b6d3c0ebd840

Changed in tripleo:
status: In Progress → Fix Released
tags: added: promotion-blocker
Marios Andreou (marios-b) wrote :

This is also hitting the zed component lines:

        * https://logserver.rdoproject.org/51/45451/5/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-zed/3d29db4/job-output.txt
        * 2022-10-19 11:52:47.116436 | primary | fatal: [undercloud]: FAILED! => {"changed": false, "msg": "The history output length 3 doesn't match the number of expected validations runs 1.\n"}

        * https://logserver.rdoproject.org/51/45451/5/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-zed/c9f0335/job-output.txt
        * 2022-10-19 12:10:22.112916 | primary | fatal: [undercloud]: FAILED! => {"changed": false, "msg": "The history output length 5 doesn't match the number of expected validations runs 3.\n"}

        * https://logserver.rdoproject.org/51/45451/5/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-compute-zed/b7ce58b/job-output.txt
        * 2022-10-19 12:09:13.436340 | primary | fatal: [undercloud]: FAILED! => {"changed": false, "msg": "The history output length 3 doesn't match the number of expected validations runs 1.\n"}

        * https://logserver.rdoproject.org/51/45451/5/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-component-zed-validation/eccce54/job-output.txt
        * 2022-10-19 12:01:50.009987 | primary | fatal: [undercloud]: FAILED! => {"changed": false, "msg": "The history output length 7 doesn't match the number of expected validations runs 5.\n"}

Changed in tripleo:
importance: High → Critical
status: Fix Released → Triaged
Marios Andreou (marios-b) wrote :

Moving this back to In Progress based on an IRC chat just now with jpodivin.

I think we need to disable validations until we work this out, as it blocks components across branches (verified on master and zed; I expect it to hit wallaby too?).

The fix in [1] merged and is being picked up in the validation component job in [2]:

2022-10-19 21:52:06.613349 | primary | TASK [validations : Remove validations log dir to ensure clean env] ************
2022-10-19 21:52:06.613365 | primary | Wednesday 19 October 2022 21:52:06 -0400 (0:00:00.116) 1:51:32.546 *****
2022-10-19 21:52:08.668568 | primary | changed: [undercloud]

but the problem persists:

2022-10-19 21:54:52.262362 | primary | TASK [validations : Verify full history output] ********************************
2022-10-19 21:54:52.262382 | primary | Wednesday 19 October 2022 21:54:52 -0400 (0:00:02.557) 1:54:18.195 *****
2022-10-19 21:54:52.374344 | primary | fatal: [undercloud]: FAILED! => {"changed": false, "msg": "The history output length 7 doesn't match the number of expected validations runs 5.\n"}

[1] https://review.opendev.org/c/openstack/validations-common/+/861716
[2] https://logserver.rdoproject.org/openstack-component-validation/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-component-master-validation/80cc670/job-output.txt

Marios Andreou (marios-b) wrote :

Posted this:

https://review.rdoproject.org/r/c/rdo-jobs/+/45708 Disable validations from running in component lines for related bug

OpenStack Infra (hudson-openstack) wrote : Fix proposed to validations-common (master)
Changed in tripleo:
status: Triaged → In Progress
Jiri Podivin (jpodivin) wrote :

Turns out I messed up the directory location, which rendered my previously merged patch pointless. The new proposal should work, though.

Marios Andreou (marios-b) wrote :

Adding a milestone. I think the missing milestone must be why this did not show up on the CIX board yet.

Changed in tripleo:
milestone: none → zed-1
milestone: zed-1 → antelope-1
OpenStack Infra (hudson-openstack) wrote : Fix merged to validations-common (master)

Reviewed: https://review.opendev.org/c/openstack/validations-common/+/861986
Committed: https://opendev.org/openstack/validations-common/commit/f9b11a160b76fa8d38b6f0bc4922e84da88e3178
Submitter: "Zuul (22348)"
Branch: master

commit f9b11a160b76fa8d38b6f0bc4922e84da88e3178
Author: Jiri Podivin <email address hidden>
Date: Thu Oct 20 13:00:40 2022 +0200

    Retargetting log removal task

    Existing log dir removal task, added with Ia4e2fbbbd02d66522fc5f92441f9b6d3c0ebd840
    was targetting wrong directory, essentially becoming useless.
    With this change the location will be cleaned up
    before execution of the tests as intened.

    Closes-Bug: #1993262

    Signed-off-by: Jiri Podivin <email address hidden>
    Change-Id: Ic2130702ca459b62a82682c7e85b89359ac6dcbe
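
Why the first cleanup had no effect can be sketched as follows. This is only an illustration under stated assumptions: the directory names `validations` and `validation-logs` are hypothetical placeholders, not the paths used by the CI role, and the helper functions are invented for this example.

```python
import shutil
import tempfile
from pathlib import Path


def clean_log_dir(log_dir: Path) -> None:
    """Remove the directory tree; a missing directory is silently ignored."""
    shutil.rmtree(log_dir, ignore_errors=True)


def count_logs(log_dir: Path) -> int:
    """Count the log files that a history listing would pick up."""
    return len(list(log_dir.glob("*.json"))) if log_dir.is_dir() else 0


def stale_logs_after_cleanup(actual: str, cleaned: str, stale: int) -> int:
    """Create `stale` old logs in `actual`, clean `cleaned`, count leftovers."""
    with tempfile.TemporaryDirectory() as tmp:
        actual_dir = Path(tmp) / actual
        actual_dir.mkdir()
        for i in range(stale):
            (actual_dir / f"stale-{i}.json").write_text("{}")
        # The cleanup completes without error even if it targets the wrong path.
        clean_log_dir(Path(tmp) / cleaned)
        return count_logs(actual_dir)
```

In this sketch, `stale_logs_after_cleanup("validations", "validation-logs", 2)` leaves both stale files in place, while `stale_logs_after_cleanup("validations", "validations", 2)` leaves none. A cleanup step that finishes without error is therefore not evidence that the directory actually read by the history check was emptied.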

Changed in tripleo:
status: In Progress → Fix Released
Marios Andreou (marios-b) wrote :

We are good to go here.

We also re-enabled validations with https://review.rdoproject.org/r/c/rdo-jobs/+/44771

Jiri Podivin (jpodivin) wrote :

Yep. The bug is closed and the fix is released.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/validations-common 1.8.0

This issue was fixed in the openstack/validations-common 1.8.0 release.
