Cluster fails when 2 controller nodes go down simultaneously | tripleo wallaby
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
puppet-pacemaker | Invalid | Undecided | Unassigned |
Bug Description
I have configured a 3-node pcs cluster for OpenStack.
To test HA, I issue the following commands:
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 5016 -j ACCEPT &&
iptables -A INPUT -p udp -m state --state NEW -m udp --dport 5016 -j ACCEPT &&
iptables -A INPUT ! -i lo -j REJECT --reject-with icmp-host-prohibited &&
iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT &&
iptables -A OUTPUT -p tcp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT -p udp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT ! -o lo -j REJECT --reject-with icmp-host-prohibited
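(For reference, if the test ever needs to be undone by hand, for example on a node that was not fenced, the temporary rules can simply be flushed. This is a generic sketch and assumes no other iptables rules on the host need to be preserved:)
# Flush the temporary test rules from the INPUT and OUTPUT chains
iptables -F INPUT
iptables -F OUTPUT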
When I issue these iptables commands on one node, that node is fenced and forced to reboot, and the cluster keeps working fine.
But when I issue them on two of the controller nodes, the resource bundles fail and don't come back up.
[root@overcloud
Cluster name: tripleo_cluster
Cluster Summary:
* Stack: corosync
* Current DC: overcloud-
* Last updated: Sat Oct 29 03:15:29 2022
* Last change: Sat Oct 29 03:12:26 2022 by root via crm_resource on overcloud-
* 19 nodes configured
* 68 resource instances configured
Node List:
* Node overcloud-
* Node overcloud-
* Online: [ overcloud-
Full List of Resources:
* ip-172.25.201.91 (ocf::heartbeat
* ip-172.25.201.150 (ocf::heartbeat
* ip-172.25.201.206 (ocf::heartbeat
* ip-172.25.201.250 (ocf::heartbeat
* ip-172.25.202.50 (ocf::heartbeat
* ip-172.25.202.90 (ocf::heartbeat
* Container bundle set: haproxy-bundle [172.25.
* haproxy-
* haproxy-
* haproxy-
* haproxy-
* Container bundle set: galera-bundle [172.25.
* galera-bundle-0 (ocf::heartbeat
* galera-bundle-1 (ocf::heartbeat
* galera-bundle-2 (ocf::heartbeat
* galera-bundle-3 (ocf::heartbeat
* Container bundle set: redis-bundle [172.25.
* redis-bundle-0 (ocf::heartbeat
* redis-bundle-1 (ocf::heartbeat
* redis-bundle-2 (ocf::heartbeat
* redis-bundle-3 (ocf::heartbeat
* Container bundle set: ovn-dbs-bundle [172.25.
* ovn-dbs-bundle-0 (ocf::ovn:
* ovn-dbs-bundle-1 (ocf::ovn:
* ovn-dbs-bundle-2 (ocf::ovn:
* ovn-dbs-bundle-3 (ocf::ovn:
* ip-172.25.201.208 (ocf::heartbeat
* Container bundle: openstack-
* openstack-
* Container bundle: openstack-
* openstack-
* Container bundle set: rabbitmq-bundle [172.25.
* rabbitmq-bundle-0 (ocf::heartbeat
* rabbitmq-bundle-1 (ocf::heartbeat
* rabbitmq-bundle-2 (ocf::heartbeat
* rabbitmq-bundle-3 (ocf::heartbeat
* ip-172.25.204.250 (ocf::heartbeat
* ceph-nfs (systemd:
* Container bundle: openstack-
* openstack-
* stonith-
* stonith-
* stonith-
* stonith-
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
PCS requires more than half the nodes to be alive for the cluster to work, it seems.
Correct, pacemaker will shut down services on nodes without quorum. This is by design.
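As a rough illustration (assuming the 3-node cluster described above, not any TripleO-specific behaviour): corosync grants quorum to a partition holding floor(n/2) + 1 votes, i.e. 2 of 3. Cutting off two controllers leaves the survivor with 1 vote, so it loses quorum and pacemaker stops its resources rather than risk split-brain. This can be confirmed on the surviving node with standard corosync/pcs tooling:
# Show vote counts and whether this partition is quorate
corosync-quorumtool -s
# Equivalent view through pcs
pcs quorum status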