tripleo-ci-centos-8-standalone in stable/wallaby consistently fails because galera resource creation fails

Bug #1997939 reported by Takashi Kajinami
This bug affects 3 people
Affects   Status         Importance   Assigned to    Milestone
tripleo   Fix Released   Critical     Luca Miccini

Bug Description

Description
===========
tripleo-ci-centos-8-standalone in stable/wallaby is now consistently failing.
 https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-standalone&branch=stable%252Fwallaby&skip=0

The standalone deployment appears to fail because puppet cannot create the galera resource.

example:
https://zuul.opendev.org/t/openstack/build/1855d01fe0984c24bc230f36571e07e0

https://zuul.opendev.org/t/openstack/build/1855d01fe0984c24bc230f36571e07e0/log/logs/undercloud/home/zuul/standalone_deploy.log#3227-3228

Warning: /etc/puppet/hiera.yaml: Use of 'hiera.yaml' version 3 is deprecated. It should be converted to version 5
   (file: /etc/puppet/hiera.yaml)
Warning: Undefined variable '::deploy_config_name';
   (file & line not available)
Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://puppet.com/docs/puppet/7.6/deprecated_language.html
   (file & line not available)
Warning: This command is deprecated and will be removed. Please use 'pcs property config' instead.
Warning: This command is deprecated and will be removed. Please use 'pcs constraint config' instead.
Warning: This command is deprecated and will be removed. Please use 'pcs constraint config' instead.
Error: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20221124-90206-ctj9kd resource create galera ocf:heartbeat:galera log='/var/log/mysql/mysqld.log' additional_parameters='--open-files-limit=16384' enable_creation=true wsrep_cluster_address='gcomm://standalone.ctlplane.localdomain' cluster_host_map='standalone:standalone.ctlplane.localdomain' meta master-max=1 ordered=true container-attribute-target=host op promote timeout=300s on-fail=block bundle galera-bundle failed: Error: Validation result from agent (use --force to override):. Too many tries
Error: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Pacemaker::Resource::Ocf[galera]/Pcmk_resource[galera]/ensure: change from 'absent' to 'present' failed: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20221124-90206-ctj9kd resource create galera ocf:heartbeat:galera log='/var/log/mysql/mysqld.log' additional_parameters='--open-files-limit=16384' enable_creation=true wsrep_cluster_address='gcomm://standalone.ctlplane.localdomain' cluster_host_map='standalone:standalone.ctlplane.localdomain' meta master-max=1 ordered=true container-attribute-target=host op promote timeout=300s on-fail=block bundle galera-bundle failed: Error: Validation result from agent (use --force to override):. Too many tries
Notice: Compiled catalog for standalone.localdomain in environment production in 2.20 seconds
Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Pacemaker::Property[galera-role-standalone]/Pcmk_property[property-standalone-galera-role]/ensure: created
Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/Pacemaker::Resource::Bundle[galera-bundle]/Pcmk_bundle[galera-bundle]/ensure: created
Notice: Applied catalog in 80.32 seconds
Changes:
            Total: 2
Events:
          Failure: 1
          Success: 2
            Total: 3
Resources:
           Failed: 1
          Skipped: 141
          Changed: 2
      Out of sync: 3
            Total: 145
Time:
      Pcmk bundle: 14.36
         Last run: 1669328593
   Config retrieval: 2.40
    Pcmk resource: 56.58
    Pcmk property: 8.89
   Transaction evaluation: 80.27
   Catalog application: 80.32
            Total: 80.33
Version:
           Config: 1669328511
           Puppet: 7.6.1

Changed in tripleo:
importance: Undecided → Critical
tags: added: ci promotion-blocker
Changed in tripleo:
milestone: none → antelope-1
Revision history for this message
Jiri Podivin (jpodivin) wrote :

It appears that there is some sort of parsing error at play.

Log:
----
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_185/865601/1/check/tripleo-ci-centos-8-standalone/1855d01/logs/undercloud/var/log/extra/podman/containers/container-puppet-mysql/stdout.log

Trace:
------
include ::tripleo::packages
['Mysql_datadir', 'Mysql_user', 'Mysql_database', 'Mysql_grant', 'Mysql_plugin'].each |String $val| { noop_resource($val) }
exec {'wait-for-settle': command => '/bin/true' }
include tripleo::profile::pacemaker::database::mysql_bundle
Running puppet
+ /usr/bin/puppet apply --summarize --detailed-exitcodes --color=false --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules --tags '"file,file_line,concat,augeas,cron,file"' /etc/config.pp
+ logger -s -t puppet-user
<13>Nov 24 22:20:00 puppet-user: Warning: Found multiple default providers for service: swiftinit, pacemaker, base, pacemaker_xml; using swiftinit
<13>Nov 24 22:20:00 puppet-user: Warning: /etc/puppet/hiera.yaml: Use of 'hiera.yaml' version 3 is deprecated. It should be converted to version 5
<13>Nov 24 22:20:00 puppet-user: (file: /etc/puppet/hiera.yaml)
<13>Nov 24 22:20:00 puppet-user: Warning: Undefined variable '::deploy_config_name';
<13>Nov 24 22:20:00 puppet-user: (file & line not available)
<13>Nov 24 22:20:00 puppet-user: Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://puppet.com/docs/puppet/7.6/deprecated_language.html
<13>Nov 24 22:20:00 puppet-user: (file & line not available)
<13>Nov 24 22:20:00 puppet-user: Could not connect to the CIB: Transport endpoint is not connected
<13>Nov 24 22:20:00 puppet-user: Init failed, could not perform requested operations
<13>Nov 24 22:20:00 puppet-user: -:1: parser error : Document is empty
<13>Nov 24 22:20:00 puppet-user:
<13>Nov 24 22:20:00 puppet-user: ^
<13>Nov 24 22:20:00 puppet-user: Warning: Unknown variable: '::pacemaker::pcs_010'. (file: /etc/puppet/modules/pacemaker/manifests/resource/bundle.pp, line: 159, column: 6)
<13>Nov 24 22:20:00 puppet-user: Notice: Compiled catalog for standalone.localdomain in environment production in 2.37 seconds
<13>Nov 24 22:20:00 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/File[/etc/sysconfig/clustercheck]/ensure: defined content as '{sha256}089a031afe62721961eaec54469382f2409e6208f899d3444da2809306e15024'
<13>Nov 24 22:20:00 puppet-user: Notice: /Stage[main]/Mysql::Server::Config/File[mysql-config-file]/content: content changed '{sha256}059405bb34a8ec5c13ea553f64cbc450868c0af1b65d608c8cc4a601e4dcc2dd' to '{sha256}f619fa6c8c600bf061999529c2eb34880eb2c96fff28ef227f947a0925b4c41a'
<13>Nov 24 22:20:00 puppet-user: Notice: /Stage[main]/Mysql::Server::Installdb/File[/var/log/mariadb/mariadb.log]/ensure: created
<13>Nov 24 22:20:00 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql_bundle/File[/root/.my.cnf]/ensure: defined content as '{sha256}6baae0f7392ddab5cdeb48835ece4f217cb7bc1f6c610069fa0981b41544704f'
<13>Nov 24 22:20:00 puppet-user: Notice: Applied catalog in 0.18 seconds
<13>Nov 24 22:20:00 puppet-user: Changes:
<13>Nov 2...


Jiri Podivin (jpodivin) wrote :

It appears that at least some of the derived jobs are also affected.

https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-standalone-tv-validation

Rabi Mishra (rabi) wrote :
Takashi Kajinami (kajinamit) wrote :

As far as I understand, the error in comment:1 is not the direct cause.

Looking at the puppet error, it seems the pcs command [1] is failing because of a validation error [2].

[1]
pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20221124-90206-ctj9kd resource create galera ocf:heartbeat:galera log='/var/log/mysql/mysqld.log' additional_parameters='--open-files-limit=16384' enable_creation=true wsrep_cluster_address='gcomm://standalone.ctlplane.localdomain' cluster_host_map='standalone:standalone.ctlplane.localdomain' meta master-max=1 ordered=true container-attribute-target=host op promote timeout=300s on-fail=block bundle galera-bundle

[2]
Error: Validation result from agent (use --force to override):. Too many tries"
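
To see which resources were rejected this way, the deploy log can be scanned for the validation message. This is a minimal sketch, assuming a local copy of standalone_deploy.log; the function name failed_pcs_resources is made up for illustration and is not part of any TripleO tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the names of resources whose
# "pcs ... resource create <name>" call was rejected by agent validation.
# Usage: failed_pcs_resources /path/to/standalone_deploy.log
failed_pcs_resources() {
    # Keep only lines carrying the validation message, pull out the
    # "resource create <name>" fragments, and print each name once.
    grep -F 'Validation result from agent' "$1" \
        | grep -oE 'resource create [^ ]+' \
        | awk '{print $3}' \
        | sort -u
}
```

Run against the log linked in the description, this should print galera; an empty result means no line in the log contains the validation message.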

We initially suspected inconsistent package versions between the host and the container, but we later confirmed that the same version is installed on both sides.

Changed in tripleo:
status: New → Triaged
Marios Andreou (marios-b) wrote :

This was filed and marked as a duplicate: https://bugs.launchpad.net/tripleo/+bug/1998080

Luca Miccini (lmiccini2) wrote :

This looks like a legitimate issue with the latest pcs, 0.10.14-6.
Workaround (before running the standalone deployment):

dnf downgrade -y pcs

or pin it to a previous version, like pcs-0.10.14-5.el8.x86_64.
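
The downgrade can be guarded so that it only runs when the installed build is the one named above. This is a rough sketch, not a tested CI change: the function names are hypothetical, it assumes it runs as root on the host, and it matches only 0.10.14-6 because that is the only build this report names (later builds may behave the same way).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for the pcs build named in this bug.
# Accepts either the full "rpm -q pcs" output or a bare version-release.
pcs_is_affected() {
    case "$1" in
        pcs-0.10.14-6.*|0.10.14-6|0.10.14-6.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Downgrade pcs only when the installed build is the affected one.
# Defined here but not called; invoke it before the standalone deployment.
maybe_downgrade_pcs() {
    local installed
    installed="$(rpm -q pcs)" || return 1
    if pcs_is_affected "$installed"; then
        dnf downgrade -y pcs
    fi
}
```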

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/865880

Changed in tripleo:
status: Triaged → In Progress
Luca Miccini (lmiccini2) wrote :

I've tried https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865881 and I could deploy a standalone. Maybe we can give this a go.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/zed)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865935

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)
Changed in tripleo:
assignee: nobody → Luca Miccini (lmiccini2)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865881
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/59300dfb30c5ee9c8186ee4b28f9c57e9b345647
Submitter: "Zuul (22348)"
Branch: master

commit 59300dfb30c5ee9c8186ee4b28f9c57e9b345647
Author: Luca Miccini <email address hidden>
Date: Mon Nov 28 19:38:15 2022 +0100

    Use --force when creating pacemaker resources

    pcs-0.10.14-6 introduced enforce validation at resources creation time.
    Unfortunately this doesn't work in our use case as puppet runs on the
    host but the necessary binaries are installed inside the containers.

    Let's try using --force to workaround.

    Closes-Bug: #1997939

    Change-Id: Id7616ebceb820d9799661c0fbc5f3f234f421ea3
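
The effect described in the commit message, expressed at the pcs command line, is roughly the following. This only illustrates the --force override that the pcs error message itself suggests; it is not the content of the merged patch, which is not shown in this report. The helper name with_force is hypothetical, and placing the flag at the end of the command is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a pcs command line with --force appended,
# unless the flag is already present. It only prints; it never runs pcs.
with_force() {
    case " $* " in
        *" --force "*) printf '%s\n' "$*" ;;
        *) printf '%s\n' "$* --force" ;;
    esac
}

# Example: with_force pcs resource create galera ocf:heartbeat:galera bundle galera-bundle
```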

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865934
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/fde20f941d0452d875ade3c86caf0a7ed34b789f
Submitter: "Zuul (22348)"
Branch: stable/zed

commit fde20f941d0452d875ade3c86caf0a7ed34b789f
Author: Luca Miccini <email address hidden>
Date: Mon Nov 28 19:38:15 2022 +0100

    Use --force when creating pacemaker resources

    pcs-0.10.14-6 introduced enforce validation at resources creation time.
    Unfortunately this doesn't work in our use case as puppet runs on the
    host but the necessary binaries are installed inside the containers.

    Let's try using --force to workaround.

    Closes-Bug: #1997939

    Change-Id: Id7616ebceb820d9799661c0fbc5f3f234f421ea3
    (cherry picked from commit 59300dfb30c5ee9c8186ee4b28f9c57e9b345647)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865935
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/3fc9478a863980e4448434f413ff3e878ccf04ab
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 3fc9478a863980e4448434f413ff3e878ccf04ab
Author: Luca Miccini <email address hidden>
Date: Mon Nov 28 19:38:15 2022 +0100

    Use --force when creating pacemaker resources

    pcs-0.10.14-6 introduced enforce validation at resources creation time.
    Unfortunately this doesn't work in our use case as puppet runs on the
    host but the necessary binaries are installed inside the containers.

    Let's try using --force to workaround.

    Conflicts:
            deployment/rabbitmq/rabbitmq-messaging-rpc-pacemaker-puppet.yaml

    Closes-Bug: #1997939

    Change-Id: Id7616ebceb820d9799661c0fbc5f3f234f421ea3
    (cherry picked from commit 59300dfb30c5ee9c8186ee4b28f9c57e9b345647)
    (cherry picked from commit fde20f941d0452d875ade3c86caf0a7ed34b789f)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/865955
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/b17daedddfde90ea19ed71e46ca25b21503bb83d
Submitter: "Zuul (22348)"
Branch: stable/train

commit b17daedddfde90ea19ed71e46ca25b21503bb83d
Author: Luca Miccini <email address hidden>
Date: Mon Nov 28 19:38:15 2022 +0100

    Use --force when creating pacemaker resources

    pcs-0.10.14-6 introduced enforce validation at resources creation time.
    Unfortunately this doesn't work in our use case as puppet runs on the
    host but the necessary binaries are installed inside the containers.

    Let's try using --force to workaround.

    Conflicts:
            deployment/rabbitmq/rabbitmq-messaging-rpc-pacemaker-puppet.yaml

    (wallaby to train)
    Conflicts:
            deployment/database/redis-pacemaker-puppet.yaml
            deployment/rabbitmq/rabbitmq-messaging-rpc-pacemaker-puppet.yaml

    Closes-Bug: #1997939

    Change-Id: Id7616ebceb820d9799661c0fbc5f3f234f421ea3
    (cherry picked from commit 59300dfb30c5ee9c8186ee4b28f9c57e9b345647)
    (cherry picked from commit fde20f941d0452d875ade3c86caf0a7ed34b789f)
    (cherry picked from commit 3fc9478a863980e4448434f413ff3e878ccf04ab)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 18.0.0

This issue was fixed in the openstack/tripleo-heat-templates 18.0.0 release.
