hanode-relation-changed Job for corosync.service failed because a timeout was exceeded

Bug #1951392 reported by Hua Zhang
This bug affects 1 person
Affects: OpenStack HA Cluster Charm
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Recently, various timeout problems (e.g. [1]) have occurred in the CI system. I analyzed one CI sosreport [2] and found that these timeouts were ultimately caused by a corosync timeout in hacluster_octavia:

$ grep -r 'timeout' 4d46f4e0-bc78-4702-92bf-c7e9f723e51c/octavia_3/var/log/juju/unit-hacluster-octavia-3.log |tail -n1
2021-11-17 11:09:53 WARNING hanode-relation-changed Job for corosync.service failed because a timeout was exceeded.

For more context around this log line, see https://pastebin.ubuntu.com/p/J44tRQcHGP/

I didn't find any further information about corosync in this CI sosreport.
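
The grep above checks a single unit log. As a rough sketch (not part of the charm or of zaza), the same search could be run over every unit log in an extracted crashdump to see which units hit a timeout. The function names `find_timeouts` and `scan_crashdump` are hypothetical, and the `<unit>/var/log/juju/unit-*.log` layout and the "timestamp level origin message" line format are assumed from the path and log line shown above. Unlike the plain grep, lines that do not follow that format are skipped.

```python
import re
from pathlib import Path

# Assumed line format, taken from the log line quoted in this report:
# "<YYYY-MM-DD HH:MM:SS> <LEVEL> <origin> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<origin>\S+) (?P<msg>.*timeout.*)$"
)


def find_timeouts(lines):
    """Return (timestamp, origin, message) for each line mentioning 'timeout'."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            hits.append((match["ts"], match["origin"], match["msg"]))
    return hits


def scan_crashdump(root):
    """Map each unit log under an extracted crashdump to its timeout lines.

    Assumes the directory layout <root>/<unit>/var/log/juju/unit-*.log.
    Logs without any matching line are left out of the result.
    """
    results = {}
    for log in sorted(Path(root).glob("*/var/log/juju/unit-*.log")):
        with log.open(errors="replace") as handle:
            hits = find_timeouts(handle)
        if hits:
            results[str(log)] = hits
    return results
```

Under those assumptions, calling `scan_crashdump("4d46f4e0-bc78-4702-92bf-c7e9f723e51c")` on the extracted crashdump would include the octavia_3 log grepped above.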

[1] https://review.opendev.org/c/openstack/charm-octavia/+/787700/
[2] https://openstack-ci-reports.ubuntu.com/artifacts/5ee/787700/29/check/bionic-ussuri-ha-ovn/5ee01ba/log/juju-crashdump-4d46f4e0-bc78-4702-92bf-c7e9f723e51c.tar.xz

Hua Zhang (zhhuabj)
tags: added: eng sts
tags: added: seg
removed: eng
Hua Zhang (zhhuabj) wrote :

I set up a zaza CI test environment in stsstack today and ran it twice: one run
succeeded and one failed. See https://pastebin.ubuntu.com/p/z8SwYR5cHy/

Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → High
Felipe Reyes (freyes)
tags: added: unstable-test