Comment 4 for bug 590623

Sasha Pachev (sasha-pachev) wrote :

An easy way to simulate the problem: on the database host, firewall the outgoing TCP packets that have the PSH flag set on the agent port, like this:

iptables -A OUTPUT -p tcp --sport 9989 --tcp-flags PSH PSH -j DROP

This will allow the connect to succeed, but when the agent tries to reply with data, the reply never arrives and the monitor blocks waiting for it. Network-wise, the behavior is the same as if the agent/database host had run out of memory and were unable to proceed beyond acknowledging the TCP handshake.

The monitor does have a timeout on the connect operation, but not on network I/O.
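To illustrate the difference between a connect timeout and a read timeout, here is a minimal Python sketch. It is not the monitor's code. The helper names (start_silent_listener, query_agent) and the "PING" payload are hypothetical. The listener never calls accept(), so the kernel completes the TCP handshake but no data is ever sent back, which approximates the stalled agent without needing iptables.

```python
import socket


def start_silent_listener():
    """Open a listening socket on an ephemeral local port and never accept().

    The kernel still completes the TCP handshake for queued connections,
    so clients connect successfully but never receive any data.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def query_agent(host, port, connect_timeout, read_timeout):
    """Connect, send a request, and wait for a reply with a bounded read.

    Returns the reply text, "closed" if the peer closed the connection,
    or "timeout" if no data arrived within read_timeout seconds.
    """
    # The timeout given here bounds only the connect step.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Without this line the recv() below could block forever.
        sock.settimeout(read_timeout)
        sock.sendall(b"PING\n")  # hypothetical request payload
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return "timeout"
        return data.decode() if data else "closed"
    finally:
        sock.close()
```

Calling query_agent("127.0.0.1", port, 1.0, 0.2) against the silent listener returns "timeout" instead of hanging. Removing the settimeout() call and switching the socket back to blocking mode would reproduce the hang described above.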

Attached is a proof-of-concept patch that fixes this problem for a likely practical scenario: a host running MySQL and the agent has run out of memory or is otherwise overloaded, and has entered a state where the connection to the agent succeeds but the actual data response never arrives. The patch enables timeouts on reads. The fix still needs to be improved:

- take care of perpetually blocking write operations
- apply the fix to other network I/O in the code

This would require some refactoring of the code to put network I/O in wrappers.
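As a rough idea of what such wrappers could look like, here is a Python sketch. It is an illustration only, not the attached patch, and the names (IOTimeout, send_all, recv_exactly) are hypothetical. Each wrapper enforces one overall deadline across the partial send() or recv() calls, so neither a stalled reader nor a stalled writer can block the caller indefinitely.

```python
import socket
import time


class IOTimeout(Exception):
    """Raised when a wrapped read or write exceeds its deadline."""


def _remaining(deadline):
    """Return the seconds left before deadline, or raise IOTimeout."""
    left = deadline - time.monotonic()
    if left <= 0:
        raise IOTimeout("deadline exceeded")
    return left


def send_all(sock, data, timeout):
    """Send all of data, or raise IOTimeout after timeout seconds in total."""
    deadline = time.monotonic() + timeout
    view = memoryview(data)
    while len(view):
        # Re-arm the socket timeout with whatever time is left.
        sock.settimeout(_remaining(deadline))
        try:
            sent = sock.send(view)
        except socket.timeout:
            raise IOTimeout("write timed out")
        view = view[sent:]


def recv_exactly(sock, nbytes, timeout):
    """Read exactly nbytes, or raise IOTimeout after timeout seconds in total."""
    deadline = time.monotonic() + timeout
    buf = bytearray()
    while len(buf) < nbytes:
        sock.settimeout(_remaining(deadline))
        try:
            chunk = sock.recv(nbytes - len(buf))
        except socket.timeout:
            raise IOTimeout("read timed out")
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)
```

Routing all network I/O through two functions like these means the timeout policy lives in one place, which is the point of the refactoring suggested above.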