multinode rabbitmq failing upgrades
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
kolla-ansible | Fix Released | High | Radosław Piliszek | |
Train | Fix Committed | High | Radosław Piliszek | |
Ussuri | Fix Committed | High | Radosław Piliszek | |
Victoria | Fix Committed | High | Radosław Piliszek | |
Wallaby | Fix Committed | High | Radosław Piliszek | |
Xena | Fix Released | High | Radosław Piliszek | |
Bug Description
A multinode rabbitmq upgrade may fail depending on the order in which the nodes are stopped and started.
The order can come out wrong at random and cause the run to fail.
Example failure:
ara summary (it shows that 'secondary1' is stopped last, yet the first node to start is 'secondary2'):
Task | Host | Action | Started | Duration | Status |
---|---|---|---|---|---|
Stopping all rabbitmq instances but the first node | secondary1 | kolla_docker | 0:02:38 | 0:00:00 | SKIPPED |
Stopping all rabbitmq instances but the first node | secondary2 | kolla_docker | 0:02:38 | 0:00:07 | CHANGED |
Stopping all rabbitmq instances but the first node | primary | kolla_docker | 0:02:38 | 0:00:09 | CHANGED |
Stopping rabbitmq on the first node | secondary2 | kolla_docker | 0:02:48 | 0:00:00 | SKIPPED |
Stopping rabbitmq on the first node | primary | kolla_docker | 0:02:48 | 0:00:00 | SKIPPED |
Stopping rabbitmq on the first node | secondary1 | kolla_docker | 0:02:48 | 0:00:17 | CHANGED |
Restart rabbitmq container | secondary2 | include_tasks | 0:03:06 | 0:00:00 | OK |
Restart rabbitmq container | secondary1 | include_tasks | 0:03:06 | 0:00:00 | OK |
Restart rabbitmq container | primary | include_tasks | 0:03:06 | 0:00:00 | OK |
Restart rabbitmq container | secondary2 | kolla_docker | 0:03:06 | 0:00:01 | CHANGED |
Waiting for rabbitmq to start | secondary2 | command | 0:03:07 | 0:10:06 | FAILED |
Restart rabbitmq container | secondary1 | kolla_docker | 0:13:14 | 0:00:01 | CHANGED |
Waiting for rabbitmq to start | secondary1 | command | 0:13:15 | 0:00:05 | CHANGED |
Restart rabbitmq container | primary | kolla_docker | 0:13:21 | 0:00:01 | CHANGED |
Waiting for rabbitmq to start | primary | command | 0:13:23 | 0:00:07 | CHANGED |
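The ordering problem visible in the ara summary above can be sketched in a few lines of Python. This is only an illustration, not the kolla-ansible fix: it assumes the rule being violated is "the node stopped last should be started first", and the helper names `starts_last_stopped_first` and `safe_start_order` are hypothetical. The two host lists are read off the summary.

```python
def starts_last_stopped_first(stop_order, start_order):
    """Return True if the first node started is the node that was stopped last."""
    if not stop_order or not start_order:
        return False
    return start_order[0] == stop_order[-1]


def safe_start_order(stop_order):
    """Return a start order that begins with the last-stopped node.

    Assumption: only the position of the last-stopped node matters;
    the remaining nodes keep their relative order.
    """
    if not stop_order:
        return []
    return [stop_order[-1]] + list(stop_order[:-1])


# Orders taken from the ara summary: secondary1 is stopped last,
# but secondary2 is restarted first.
stop_order = ["secondary2", "primary", "secondary1"]
start_order = ["secondary2", "secondary1", "primary"]

if not starts_last_stopped_first(stop_order, start_order):
    print("unsafe start order, suggested:", safe_start_order(stop_order))
```

For the run above the check reports the order as unsafe, which matches 'secondary2' timing out while it waits to start.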
docker logs for the failing rabbitmq: (It shows the order is the actual problem)
2021-05-
Changed in kolla-ansible:
status: Triaged → In Progress
Note that in the most typical (and recommended) 3-node rabbitmq scenario, one or two rabbitmq nodes may fail, depending on the order.
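To make the "one or two" count concrete, here is a small sketch that enumerates every possible start order for three nodes. It rests on a simplified model that is an assumption, not something confirmed in this report: nodes are restarted one at a time, and any node restarted before the last-stopped node fails its wait. The function name `nodes_expected_to_fail` is hypothetical, and 'secondary1' is used as the last-stopped node as in the example failure.

```python
from itertools import permutations


def nodes_expected_to_fail(start_order, last_stopped):
    """Nodes started before the last-stopped node (simplified model, see lead-in)."""
    idx = list(start_order).index(last_stopped)
    return list(start_order[:idx])


nodes = ("primary", "secondary1", "secondary2")
last_stopped = "secondary1"

# Group every possible start order by the number of nodes expected to fail.
orders_by_failures = {}
for order in permutations(nodes):
    failed = nodes_expected_to_fail(order, last_stopped)
    orders_by_failures.setdefault(len(failed), []).append(order)

for count in sorted(orders_by_failures):
    print(count, "failing node(s):", orders_by_failures[count])
```

Under this model, a start order either succeeds outright (the last-stopped node goes first) or loses one or two nodes, depending on how late the last-stopped node is restarted.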