Comment 1 for bug 1424959

Revision history for this message
Bogdan Dobrelya (bogdando) wrote : Re: corosync restarts resources during rally tests

According to the logs from the affected env, haproxy instances at node-44, node-49 wasn't restarted so the downtime was temporary and local for node-1. This should be considered as a medium impact I believe.

The instance of haproxy at node-1 was restarted due to timeout of monitor operation:
pacemaker.log:
Feb 20 22:30:50 [23083] node-1.domain.tld lrmd: warning: child_timeout_callback: p_haproxy_monitor_20000 process (PID 12917) timed out
Feb 20 22:30:51 [23083] node-1.domain.tld lrmd: warning: operation_finished: p_haproxy_monitor_20000:12917 - timed out after 10000ms
Feb 20 22:30:51 [23086] node-1.domain.tld crmd: error: process_lrm_event: Operation p_haproxy_monitor_20000: Timed Out (node=node-1.domain.tld, call=238, timeout=10000ms)

The RC is yet unknown but might be related with critical level of IO load:
atop 2015/02/20 22:30:33 - 2015/02/20 22:30:53:
LVM | os-root | busy 95% | read 1782 | write 432390 | KiB/r 35 | KiB/w 3 | MBr/s 3.10 | MBw/s 84.45 | avio 0.04 ms |
DSK | sda | busy 95% | read 1881 | write 3948 | KiB/r 34 | KiB/w 457 | MBr/s 3.16 | MBw/s 88.25 | avio 3.23 ms