ZeroMQ cast timeout ineffective
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.messaging | Invalid | Undecided | Li Ma |
Bug Description
The ZeroMQ driver is supposed to time out on cast() and stop attempting to send a message once the cast timeout expires. However, the ZeroMQ library call send() is effectively non-blocking: it can block, but only while placing the message on the outgoing queue, not until the message is delivered.
Because of this, socket.close() is always called immediately after send(). Closing this early does not drop the message, because the socket is closed with linger=-1.
Because linger is -1 by default and is never overridden, ZeroMQ keeps attempting to send queued messages after we close the socket and release the reference from Python. While Python garbage collects its side, the C side keeps the message alive.
With the present behavior, a socket only finishes closing once its message has been sent successfully. A send that fails leaves a hanging file descriptor and is retried indefinitely.
The solution is not to use Eventlet's timeout, but to set the ZeroMQ linger option correctly. This has the added benefit of removing some of the reliance on Eventlet itself.
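As a rough illustration of the linger-based approach, here is a minimal sketch using pyzmq. It is not the oslo.messaging driver code: the helper names (timeout_to_linger_ms, cast_with_linger), the PUSH socket type, and the address are assumptions chosen for the example. The idea is to convert the cast timeout into a ZMQ_LINGER value in milliseconds and set it before closing, so that libzmq discards an undelivered message once the timeout has passed instead of retrying forever.

```python
import zmq


def timeout_to_linger_ms(timeout_s):
    """Convert a cast timeout in seconds to a ZMQ_LINGER value in milliseconds.

    Hypothetical helper for this sketch. Negative timeouts are clamped to 0
    so that we never accidentally pass -1, which means "linger forever".
    """
    return max(0, int(timeout_s * 1000))


def cast_with_linger(ctx, address, payload, timeout_s):
    """Queue one message, then close the socket with a bounded linger.

    Hypothetical function for this sketch; the socket type (PUSH) and the
    address format are assumptions, not taken from the driver.
    """
    sock = ctx.socket(zmq.PUSH)
    try:
        sock.connect(address)
        # send() only places the message on the outgoing queue.
        sock.send(payload)
    finally:
        # Bound how long libzmq keeps trying to deliver after close().
        sock.setsockopt(zmq.LINGER, timeout_to_linger_ms(timeout_s))
        sock.close()


if __name__ == "__main__":
    ctx = zmq.Context()
    # Assumed address; nothing needs to be listening for the demo.
    cast_with_linger(ctx, "tcp://127.0.0.1:5555", b"hello", timeout_s=0.5)
    # term() returns once pending messages are sent or the linger expires.
    ctx.term()
```

With linger left at -1, the final ctx.term() call would block for as long as the peer stays unreachable; with a bounded linger it returns once the message is delivered or the linger period runs out.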
Changed in oslo: | |
assignee: | nobody → Eric Windisch (ewindisch) |
status: | New → In Progress |
Changed in oslo: | |
status: | Incomplete → In Progress |
affects: | oslo-incubator → oslo.messaging |
Changed in oslo.messaging: | |
assignee: | Eric Windisch (ewindisch) → Li Ma (nick-ma-z) |
Changed in oslo.messaging: | |
status: | In Progress → Invalid |
You say "Someone found this in testing and I'm having them confirm it fixes their problem."
Could you describe the exact symptoms seen by the user?