All nodes in error state after scaling because one compute node was unreachable
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Maciej Kwiek |
6.1.x | In Progress | High | MOS Maintenance |
7.0.x | In Progress | High | MOS Maintenance |
Bug Description
Fuel 6.1.
Short description:
The customer successfully deployed a cloud with 20+ compute nodes. A day or two later, they tried to add one more compute node. The new compute node was deployed successfully, but the astute task then failed at the 'uploadfile' step: one of the existing compute nodes was unavailable at that time, so the mcollective agent couldn't reach it. After this, astute marked all nodes as "error" and set the cloud status to error.
http://
The customer plans to use a large number of compute nodes, so one of them may well be unreachable when they scale up the cloud. Moreover, the unavailability of one or two compute nodes does not affect the cloud as a whole.
Steps to reproduce:
1. Deploy cloud with 1 controller and 2 compute nodes.
2. Make 1 compute node unreachable.
3. Scale up your cloud with 1 more compute node.
Current result:
After scaling up while 1 compute node is unreachable, all nodes are in error state.
Expected result:
After scaling up while 1 compute node is unreachable, only the unreachable node is in error state.
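To make the difference between the current and expected results concrete, here is a minimal Python sketch, assuming the per-node outcome of the failed step is available as a set of unreachable node names. The function names, node names, and status strings are hypothetical illustrations, not actual Astute or Nailgun code.

```python
# Hypothetical status strings, used only for illustration.
READY = "ready"
ERROR = "error"


def statuses_current(nodes, unreachable):
    """Current behaviour: a single unreachable node fails the whole task,
    so every node in the cluster is marked as error."""
    if unreachable:
        return {node: ERROR for node in nodes}
    return {node: READY for node in nodes}


def statuses_expected(nodes, unreachable):
    """Expected behaviour: only the nodes that could not be reached
    are marked as error; all other nodes keep a healthy status."""
    return {node: (ERROR if node in unreachable else READY) for node in nodes}


if __name__ == "__main__":
    # Hypothetical cluster matching the reproduction steps plus the new node.
    nodes = ["controller-1", "compute-1", "compute-2", "compute-3"]
    unreachable = {"compute-2"}
    print(statuses_current(nodes, unreachable))
    print(statuses_expected(nodes, unreachable))
```

With `compute-2` unreachable, the first function marks all four nodes as error, while the second marks only `compute-2`.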
Changed in fuel: | |
status: | New → Confirmed |
importance: | Undecided → High |
assignee: | nobody → Fuel Python Team (fuel-python) |
milestone: | none → 6.1-updates |
tags: | removed: critical |
tags: | added: tricky |
no longer affects: | fuel/8.0.x |
tags: | added: area-python |
tags: | added: on-verification |
Changed in fuel: | |
status: | Fix Committed → Fix Released |
There is a workaround for this bug: when a node goes offline, remove it (the web UI has an option for removing offline nodes). Once the offline node has been removed, you can deploy any new changes.
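The workaround amounts to a pre-deployment check: find the offline nodes, remove them, and only then deploy. Below is a minimal Python sketch of that check, assuming each node's online state is known as a boolean. The dictionary shape, node names, and function names are hypothetical and are not part of the Fuel API or web UI.

```python
def offline_nodes_to_remove(nodes):
    """Return the names of nodes that are offline and should be removed
    (for example, via the web UI) before deploying new changes.

    `nodes` is a hypothetical mapping of node name -> online flag.
    """
    return [name for name, online in nodes.items() if not online]


def can_deploy(nodes):
    """Deploying is considered safe only when no offline nodes remain."""
    return not offline_nodes_to_remove(nodes)


if __name__ == "__main__":
    cluster = {"controller-1": True, "compute-1": True, "compute-2": False}
    print(offline_nodes_to_remove(cluster))
    print(can_deploy(cluster))
```

Here `compute-2` would be reported for removal and deployment would be held back until it is gone.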