The jobs fail randomly at different stages, sometimes during deployment and sometimes while running tempest.
In the jobs that fail this way, the pcs resource status is unhealthy:
Failed Resource Actions:
* haproxy-bundle-podman-0_stop_0 on standalone 'error' (1): call=87, status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:50:54Z', queued=0ms, exec=20002ms
* galera-bundle-podman-0_stop_0 on standalone 'error' (1): call=76, status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:50:06Z', queued=0ms, exec=20006ms
* rabbitmq-bundle-podman-0_stop_0 on standalone 'error' (1): call=72, status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:49:34Z', queued=0ms, exec=20005ms
* redis-bundle-podman-0_stop_0 on standalone 'error' (1): call=78, status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:50:06Z', queued=0ms, exec=20001ms
* ovn-dbs-bundle-podman-0_stop_0 on standalone 'error' (1): call=88, status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:50:54Z', queued=0ms, exec=20002ms
* openstack-cinder-backup-podman-0_stop_0 on standalone 'error' (1): call=74, status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:49:34Z', queued=0ms, exec=20004ms
* openstack-cinder-volume-podman-0_stop_0 on standalone 'error' (1): call=82, status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:50:28Z', queued=0ms, exec=20799ms
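To triage the example logs further down, the Failed Resource Actions lines can be pulled apart mechanically. The following is a minimal Python sketch, not part of pcs, that assumes the line format shown above (`<resource>_<op>_<interval> on <node> ... status='...' ... exec=<N>ms`); the names `FailedAction` and `parse_failed_actions` are hypothetical. Applied to the output above, it makes it easy to see that every failed stop ran for roughly 20000ms before being killed.

```python
import re
from typing import NamedTuple


class FailedAction(NamedTuple):
    resource: str
    operation: str
    status: str
    exec_ms: int


# Assumed format, taken from the pcs output above, e.g.:
# * haproxy-bundle-podman-0_stop_0 on standalone 'error' (1): call=87,
#   status='Timed Out', ..., queued=0ms, exec=20002ms
LINE_RE = re.compile(
    r"^\*\s+(?P<resource>\S+?)_(?P<op>[a-z]+)_(?P<interval>\d+)\s+on\s+\S+"
    r".*?status='(?P<status>[^']*)'"
    r".*?exec=(?P<exec>\d+)ms"
)


def parse_failed_actions(text: str) -> list[FailedAction]:
    """Extract failed resource actions from a pcs status dump."""
    actions = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # non-matching lines (headers, other sections) are skipped
            actions.append(
                FailedAction(m["resource"], m["op"], m["status"], int(m["exec"]))
            )
    return actions


if __name__ == "__main__":
    sample = (
        "* galera-bundle-podman-0_stop_0 on standalone 'error' (1): call=76, "
        "status='Timed Out', exitreason='', last-rc-change='2021-07-28 06:50:06Z', "
        "queued=0ms, exec=20006ms\n"
    )
    for action in parse_failed_actions(sample):
        print(action.resource, action.operation, action.status, action.exec_ms)
```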
From the pacemaker logs:
Jul 28 06:49:48 standalone.localdomain pacemaker-controld [389429] (throttle_check_thresholds) info: Moderate CPU load detected: 11.400000
Jul 28 06:49:48 standalone.localdomain pacemaker-controld [389429] (throttle_send_command) info: New throttle mode: medium load (was negligible)
Jul 28 06:49:54 standalone.localdomain pacemaker-execd [389426] (child_timeout_callback) warning: rabbitmq-bundle-podman-0_stop_0 process (PID 567787) timed out
Jul 28 06:49:54 standalone.localdomain pacemaker-execd [389426] (operation_finished) warning: rabbitmq-bundle-podman-0_stop_0[567787] timed out after 20000ms
Jul 28 06:49:54 standalone.localdomain pacemaker-execd [389426] (log_finished) info: rabbitmq-bundle-podman-0 stop (call 72, PID 567787) exited with status 1
Jul 28 06:50:18 standalone.localdomain pacemaker-controld [389429] (throttle_check_thresholds) info: Moderate CPU load detected: 11.430000
Jul 28 06:50:23 podman(galera-bundle-podman-0)[569777]: INFO: 67d41dc0f29c51917d9fb5553925205f58fada6c72f4fedd267e867fdd28221c
Jul 28 06:50:24 podman(galera-bundle-podman-0)[569777]: NOTICE: Cleaning up inactive container, galera-bundle-podman-0.
Jul 28 06:50:26 standalone.localdomain pacemaker-execd [389426] (child_timeout_callback) warning: galera-bundle-podman-0_stop_0 process (PID 569777) timed out
Jul 28 06:50:26 standalone.localdomain pacemaker-execd [389426] (operation_finished) warning: galera-bundle-podman-0_stop_0[569777] timed out after 20000ms
Jul 28 06:50:26 standalone.localdomain pacemaker-execd [389426] (log_finished) info: galera-bundle-podman-0 stop (call 76, PID 569777) exited with status 1
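The excerpt above shows the stop timeouts arriving shortly after pacemaker reports moderate CPU load. A hypothetical helper, `correlate_timeouts`, that pairs each `(operation_finished) ... timed out after Nms` line with the most recent `CPU load detected` reading before it could look like the sketch below. It assumes the log formats shown above and only shows correlation, not cause.

```python
import re

# Assumed formats, taken from the pacemaker log excerpt above.
TIME_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}:\d{2}:\d{2})\s")
LOAD_RE = re.compile(r"CPU load detected: ([\d.]+)")
TIMEOUT_RE = re.compile(
    r"\(operation_finished\).*?: (\S+?)\[\d+\] timed out after (\d+)ms"
)


def correlate_timeouts(log_text):
    """Return (time, operation, timeout_ms, last_cpu_load) per timeout.

    last_cpu_load is the most recent "CPU load detected" value seen earlier
    in the log, or None if no such line preceded the timeout.
    """
    last_load = None
    results = []
    for line in log_text.splitlines():
        ts = TIME_RE.match(line)
        if not ts:
            continue  # skip lines without a syslog-style timestamp
        load = LOAD_RE.search(line)
        if load:
            last_load = float(load.group(1))
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            results.append(
                (ts.group(1), timeout.group(1), int(timeout.group(2)), last_load)
            )
    return results
```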
Example logs:
https://logserver.rdoproject.org/01/34701/1/check/rdoinfo-tripleo-master-testing-centos-8-scenario001-standalone/5ba0b6f/logs/undercloud/var/log/extra/pcs.txt.gz
https://logserver.rdoproject.org/01/34701/1/check/rdoinfo-tripleo-master-testing-centos-8-scenario001-standalone/205cfe2/logs/undercloud/var/log/extra/pcs.txt.gz
https://logserver.rdoproject.org/01/34701/1/check/rdoinfo-tripleo-master-testing-centos-8-scenario001-standalone/7fffd3e/logs/undercloud/var/log/extra/pcs.txt.gz
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/3a8026d/logs/overcloud-controller-0/var/log/extra/pcs.txt.gz
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/d6f41d6/logs/overcloud-controller-0/var/log/extra/pcs.txt.gz
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario001-standalone-master/e7a09e6/logs/undercloud/var/log/extra/pcs.txt.gz
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/791416 triggered this because it changed the timeout setting. Before that patch the timeout was 120s; now it falls back to the default of 20s, which leads to failures when the system is under load.
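A simplified model of the fallback described above, assuming that an operation with no explicit timeout gets the 20s default. The names `effective_timeout` and `stop_fails` and the 45s stop duration used below are hypothetical, and the model ignores any cluster-wide operation defaults.

```python
from typing import Optional

# Values from the report: 20s default fallback, 120s before the change.
DEFAULT_OP_TIMEOUT_S = 20
PREVIOUS_TIMEOUT_S = 120


def effective_timeout(configured_s: Optional[int]) -> int:
    """Timeout enforced for an operation in this simplified model."""
    return configured_s if configured_s is not None else DEFAULT_OP_TIMEOUT_S


def stop_fails(stop_duration_s: float, configured_s: Optional[int]) -> bool:
    """True if a stop taking stop_duration_s would hit the timeout."""
    return stop_duration_s >= effective_timeout(configured_s)
```

With a hypothetical 45s container stop on a loaded node, `stop_fails(45, None)` is true while `stop_fails(45, PREVIOUS_TIMEOUT_S)` is false, which mirrors the behavior change described above.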
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/802696