Node has not become online after repetitive cold reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Unassigned |
Mitaka | Fix Released | High | Unassigned |
Newton | Fix Committed | High | Unassigned |
Bug Description
Detailed bug description:
2016-06-03 07:57:56,726 - ERROR decorators.py:126 -- Traceback (most recent call last):
File "/home/
result = func(*args, **kwargs)
File "/home/
result = func(*args, **kwargs)
File "/home/
'slave-05']))
File "/home/
' after cold start'.
File "/home/
raise ASSERTION_
AssertionError: Node slave-01 has not become online after cold start
Steps to reproduce:
1. Revert snapshot 'prepare_
2. Wait until MySQL Galera is UP on some controller
3. Check Ceph status
4. Run ostf
5. Fill ceph partitions on all nodes up to 30%
6. Check Ceph status
7. Run RALLY
8. 100 times repetitive reboot (failed on the 3rd reboot):
9. Cold restart of all nodes
10. Wait for HA services ready
11. Wait until MySQL Galera is UP on some controller
12. Run ostf
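The reboot loop in steps 8 to 12 can be sketched roughly as follows. This is a minimal illustration, not the actual test code from the framework: `wait_until`, `repetitive_cold_restart`, and the `cold_restart` and `is_online` callables are hypothetical names introduced here, and the timeout values are assumptions. The HA, Galera, and OSTF checks are omitted.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def repetitive_cold_restart(nodes, cold_restart, is_online, count=100,
                            timeout=600.0, interval=10.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Cold-restart all nodes `count` times.

    `cold_restart(nodes)` and `is_online(node)` are hypothetical callables
    supplied by the caller; they stand in for whatever the real test
    environment uses to power-cycle nodes and query their status.
    """
    for iteration in range(1, count + 1):
        cold_restart(nodes)
        for node in nodes:
            online = wait_until(lambda: is_online(node),
                                timeout, interval, sleep, clock)
            # Fail on the first node that stays offline, as in the report.
            assert online, (
                "Node {} has not become online after cold start "
                "(iteration {})".format(node, iteration))
    return count
```

Including the iteration number in the assertion message makes it visible from the log alone on which reboot the run failed.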
Expected results:
All nodes come online.
Actual result:
AssertionError: Node slave-01 has not become online after cold start
Reproducibility:
https:/
Workaround:
-
Impact:
swarm
Changed in fuel:
status: New → Confirmed
tags: added: area-python
tags: added: area-library removed: area-python
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Kyrylo Galanov (kgalanov)
Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Oleksiy Molchanov (omolchanov)
tags: added: swarm-fail
tags: added: on-verification
Changed in fuel:
assignee: Maksym Strukov (unbelll) → nobody
The issue has not been reproduced in CI since build 97. The other environments in this test looked OK after a manual check.
Some improvements should be made to the devops library to make the restart procedure more reliable and repeatable.
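One possible direction for such an improvement is to re-issue the power-on for a node that does not come up within a timeout, instead of failing immediately. The sketch below only illustrates that idea; it is not the fix that was committed, and `start_with_retry`, `power_on`, and `is_online` are hypothetical names that do not refer to the devops library API.

```python
import time


def start_with_retry(node, power_on, is_online, attempts=3,
                     timeout=300.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Power on `node`, retrying if it does not come online in time.

    `power_on(node)` and `is_online(node)` are hypothetical callables
    provided by the caller. Returns the attempt number that succeeded.
    """
    for attempt in range(1, attempts + 1):
        power_on(node)
        deadline = clock() + timeout
        while True:
            if is_online(node):
                return attempt
            if clock() >= deadline:
                break  # give up on this attempt and power on again
            sleep(interval)
    raise AssertionError(
        "Node {} has not become online after {} start attempts".format(
            node, attempts))
```

Returning the attempt number lets the caller log how often a retry was needed, which helps tell a flaky restart apart from a node that is really down.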