HA. mysql cluster failover issues after a connection loss on primary controller management nic
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Fuel for OpenStack | Invalid | High | Sergii Golovatiuk | |
5.1.x | Won't Fix | High | Fuel Library (Deprecated) | |
6.0.x | Invalid | High | Sergii Golovatiuk | |
Bug Description
Scenario:
1. Deploy an HA environment using the 5.1.1 48-RC2 ISO
with nova-flat, Ceph for images and volumes: 3 controllers, 2 computes, 2 ceph-storage nodes
2. Run the network verification tests (they pass)
3. Run OSTF (all tests pass)
4. Simulate a connection loss on the management interface of the primary controller node
by running the following command:
# brctl delif <br> <if>
where 'if' is the node interface attached to the management network
and 'br' is the bridge that 'if' is attached to.
5. Wait 30+ minutes
6. Run OSTF
Result: several tests fail:
- Create volume and boot instance from it
- Check network connectivity from instance without floating IP
- Check network connectivity from instance via floating IP
- Launch instance, create snapshot, launch instance from snapshot
HA
- Mysql node detection failed Please refer to OpenStack logs for more details.
- Check amount of tables in databases is the same on each node
- Check RabbitMQ is available
However, 'Check galera environment state' passes.
- Platform tests also fail.
Almost every test fails because of a timeout or a similar error.
In Horizon, most actions can still be completed, but not reliably, and they almost always take a long time.
For example, creating an instance is slow, and assigning a floating IP to an instance shows an error ('HTTP 504' and 'Unable to assign...') even though the IP is eventually assigned.
Cluster status:
http://
Changed in fuel:
status: Invalid → Confirmed
status: Confirmed → New
Changed in fuel:
status: New → Confirmed
Failover is not instantaneous. It takes some time to bring services back into operation. That is why it is important to have relevant OSTF checks, such as the HA group, and to run them as a mandatory prerequisite for any other checks. If the HA health checks do not pass, you should not expect any of the others to pass either.
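The idea of treating the HA group as a gate can be sketched roughly as follows. This is only an illustration, not how OSTF is implemented: the functions `wait_for_ha` and `run_suite`, the check names, and the 30-minute default are all hypothetical. The sketch polls the HA checks until they all pass or a timeout expires, and only then runs the remaining checks; otherwise it marks them as skipped rather than letting them fail with timeouts.

```python
import time
from typing import Callable, Dict, List

Check = Callable[[], bool]  # hypothetical: a check returns True when healthy


def wait_for_ha(ha_checks: Dict[str, Check],
                timeout_s: float,
                interval_s: float,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Poll the HA checks until all pass or the timeout expires.

    Returns the names of the checks that are still failing (empty if none).
    """
    deadline = clock() + timeout_s
    while True:
        failing = [name for name, check in ha_checks.items() if not check()]
        if not failing or clock() >= deadline:
            return failing
        sleep(interval_s)


def run_suite(ha_checks: Dict[str, Check],
              other_checks: Dict[str, Check],
              timeout_s: float = 1800.0,   # assumed failover window
              interval_s: float = 30.0,
              **timing) -> Dict[str, str]:
    """Run HA checks as a prerequisite; skip everything else if they fail."""
    failing = wait_for_ha(ha_checks, timeout_s, interval_s, **timing)
    results = {name: ("failed" if name in failing else "passed")
               for name in ha_checks}
    if failing:
        # No point running the rest: they would most likely time out.
        results.update({name: "skipped" for name in other_checks})
        return results
    for name, check in other_checks.items():
        results[name] = "passed" if check() else "failed"
    return results
```

With a gate like this, a report after an incomplete failover would show the HA checks as failed and the remaining tests as skipped, which points directly at the cluster state instead of at a long list of unrelated timeouts.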