Comment 12 for bug 1466020

Alex Schultz (alex-schultz) wrote :

So I think we've traced it down to the following...

"execution expired" means the script never terminates, so as part of the net_probe mcollective task, we call stop_frame_listeners[0], which attempts to send a SIGINT to the running processes[1]. Unfortunately, if the SIGINT does not terminate the running processes, the agent enters an infinite loop[2]. Turning to the SIGINT that is sent to the net_probe code from nailgun, we can see where the INT is handled in the _run function of the Listener[3] class. We then trace the function execution and see that it loops through all the current listeners (forked tcpdump processes) and attempts to terminate them[4]. The issue seems to be that the tcpdump process is not exiting or is itself hanging. I believe this is related to the lack of internet connectivity on the hosts being checked. I found a post on serverfault[5] that points to DNS being an issue, and since we are not skipping DNS resolution as part of our tcpdump command line options[6], I believe this is what is causing it to hang.

[0] https://github.com/stackforge/fuel-astute/blob/master/mcagents/net_probe.rb#L188
[1] https://github.com/stackforge/fuel-astute/blob/master/mcagents/net_probe.rb#L209-L216
[2] https://github.com/stackforge/fuel-astute/blob/master/mcagents/net_probe.rb#L218-L230
[3] https://github.com/stackforge/fuel-web/blob/master/network_checker/network_checker/net_check/api.py#L467
[4] https://github.com/stackforge/fuel-web/blob/master/network_checker/network_checker/net_check/api.py#L472-L480
[5] http://serverfault.com/questions/697854/tcpdump-freezes-and-not-capturing-properly-without-internet-connection
[6] https://github.com/stackforge/fuel-web/blob/master/network_checker/network_checker/net_check/api.py#L539-L542
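
To illustrate the two ideas above, here is a minimal Python sketch. It is not the actual fuel-web or fuel-astute code. The helper names build_tcpdump_cmd and stop_listener, the interface name, and the grace period are all hypothetical. It assumes the hang comes from name resolution, so it passes tcpdump's -n option (do not convert addresses to names), and it bounds the wait after SIGINT by falling back to SIGKILL instead of looping forever.

```python
import signal
import subprocess
import sys


def build_tcpdump_cmd(iface, extra_filter=""):
    """Hypothetical helper: build a tcpdump command that skips name lookups.

    -n tells tcpdump not to convert addresses to names, which avoids
    DNS queries on hosts without working resolvers.
    """
    cmd = ["tcpdump", "-n", "-i", iface]
    if extra_filter:
        cmd.append(extra_filter)
    return cmd


def stop_listener(proc, grace=5.0):
    """Hypothetical helper: stop a child process with a bounded wait.

    Sends SIGINT first; if the child has not exited after `grace`
    seconds, falls back to SIGKILL so the caller cannot spin forever.
    """
    if proc.poll() is not None:
        return "already-exited"
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace)
        return "interrupted"
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL cannot be caught or ignored
        proc.wait()   # reap the child so it does not linger as a zombie
        return "killed"
```

A caller would do something like stop_listener(subprocess.Popen(build_tcpdump_cmd("eth0"))) for each forked listener. Whether a hard kill is acceptable for the real listeners (for example, whether captured data still needs to be flushed) is an open question this sketch does not answer.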