2015-04-20 11:50:19 |
Bogdan Dobrelya |
bug |
|
|
added bug |
2015-04-20 11:51:52 |
Bogdan Dobrelya |
description |
This issue was discovered at the scale lab, when rabbit nodes were running under load.
The issue is that stop_server_process() ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic.
This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. |
This issue was discovered at the scale lab, when rabbit nodes were running under load.
The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic.
This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. |
|
2015-04-20 11:52:02 |
Bogdan Dobrelya |
nominated for series |
|
fuel/6.0.x |
|
2015-04-20 11:52:02 |
Bogdan Dobrelya |
bug task added |
|
fuel/6.0.x |
|
2015-04-20 11:52:02 |
Bogdan Dobrelya |
nominated for series |
|
fuel/5.1.x |
|
2015-04-20 11:52:02 |
Bogdan Dobrelya |
bug task added |
|
fuel/5.1.x |
|
2015-04-20 11:52:09 |
Bogdan Dobrelya |
fuel: milestone |
|
6.1 |
|
2015-04-20 11:52:15 |
Bogdan Dobrelya |
fuel: importance |
Undecided |
Critical |
|
2015-04-20 11:52:19 |
Bogdan Dobrelya |
fuel: assignee |
|
Bogdan Dobrelya (bogdando) |
|
2015-04-20 11:52:23 |
Bogdan Dobrelya |
fuel: status |
New |
In Progress |
|
2015-04-20 11:52:28 |
Bogdan Dobrelya |
fuel/5.1.x: milestone |
|
5.1.2 |
|
2015-04-20 11:52:33 |
Bogdan Dobrelya |
fuel/6.0.x: milestone |
|
6.0.1 |
|
2015-04-20 11:52:36 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
|
Bogdan Dobrelya (bogdando) |
|
2015-04-20 11:52:39 |
Bogdan Dobrelya |
fuel/6.0.x: importance |
Undecided |
Critical |
|
2015-04-20 11:52:41 |
Bogdan Dobrelya |
fuel/5.1.x: importance |
Undecided |
Critical |
|
2015-04-20 11:52:44 |
Bogdan Dobrelya |
fuel/5.1.x: assignee |
|
Bogdan Dobrelya (bogdando) |
|
2015-04-20 11:52:55 |
Bogdan Dobrelya |
fuel/5.1.x: status |
New |
Triaged |
|
2015-04-20 11:52:58 |
Bogdan Dobrelya |
fuel/6.0.x: status |
New |
Triaged |
|
2015-04-20 11:53:58 |
Bogdan Dobrelya |
description |
This issue was discovered at the scale lab, when rabbit nodes were running under load.
The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic.
This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. |
This issue was discovered at the scale lab, when rabbit nodes were running under load.
The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic.
Here is an example log:
http://paste.openstack.org/show/H89Uo8ZdPlMUstlp1Tb5/
This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. |
|
2015-04-20 12:00:09 |
Bogdan Dobrelya |
summary |
RabbitMQ OCF may hang on the stop action as it ignores the stop command exit code |
RabbitMQ OCF may hang on the stop/start actions as it ignores the stop/wait commands exit code |
|
2015-04-20 12:07:35 |
Bogdan Dobrelya |
description |
This issue was discovered at the scale lab, when rabbit nodes were running under load.
The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic.
Here is an example log:
http://paste.openstack.org/show/H89Uo8ZdPlMUstlp1Tb5/
This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. |
This issue was discovered at the scale lab, when rabbit nodes were running under load.
The issues are:
1) stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores the exit code of the "rabbitmqctl stop" command and instead checks the stale rc value left over from the latest pidfile check, which breaks the "stop" action logic.
2) try_to_start_rmq_app() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L740-L744 ignores the exit code of the "rabbitmqctl wait" command and may hang until the resource agent's operation timeout is exceeded, which breaks the "start" action logic.
Here are example logs:
broken stop: http://paste.openstack.org/show/H89Uo8ZdPlMUstlp1Tb5/
broken start: http://paste.openstack.org/show/nHFoeSn21kne22vtBHZS/
These issues appear only when the timeout specified for the stop or wait command is exceeded. That is a usual case under load, hence the critical impact. |
|
2015-04-20 15:58:45 |
Bogdan Dobrelya |
fuel/5.1.x: assignee |
Bogdan Dobrelya (bogdando) |
Fuel Library Team (fuel-library) |
|
2015-04-20 15:58:55 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
Bogdan Dobrelya (bogdando) |
Fuel Library Team (fuel-library) |
|
2015-04-20 16:31:57 |
OpenStack Infra |
fuel: status |
In Progress |
Fix Committed |
|
2015-04-21 08:24:12 |
Dina Belova |
tags |
|
scale |
|
2015-05-04 10:07:03 |
Bogdan Dobrelya |
fuel/5.1.x: status |
Triaged |
Fix Committed |
|
2015-05-04 10:07:07 |
Bogdan Dobrelya |
fuel/6.0.x: status |
Triaged |
Fix Committed |
|
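The first issue in the latest description (a stale rc value being checked instead of the stop command's own exit code) can be illustrated with a minimal shell sketch. This is not the code from the OCF script: the function names stop_ignoring_exit_code and stop_checking_exit_code and the two return constants are hypothetical stand-ins, and the command to run is passed in as arguments so the pattern can be tried with any command.

```shell
#!/bin/sh
# Hypothetical stand-ins for illustration; not taken from the real OCF script.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Buggy pattern: rc holds a value left over from an earlier check,
# and the exit code of the command that was just run is never read.
stop_ignoring_exit_code() {
    rc=0                      # stale value, e.g. from an earlier pidfile check
    "$@" >/dev/null 2>&1      # result of the stop command is discarded
    if [ "$rc" -eq 0 ]; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}

# Corrected pattern: capture the exit code immediately after the command.
stop_checking_exit_code() {
    "$@" >/dev/null 2>&1
    rc=$?                     # must be read before any other command runs
    if [ "$rc" -eq 0 ]; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}
```

As a usage assumption, a call such as `stop_checking_exit_code timeout 60 rabbitmqctl stop` would report a failure when the stop command does not finish in time, since GNU `timeout` exits with status 124 in that case, whereas the first variant would report success regardless.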