Client connection handling improvements

Bug #1098409 reported by Keyur
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Gearman
Fix Released
Medium
Brian Aker

Bug Description

There's 2 scenarios that this bug relates to:

1) a client adds 2 or more servers (gearmand-1 and gearmand-2 in that order). It sends a job over to gearmand-1 and all is well. gearmand-1 then dies. The client tries to insert a new job insertions and it fails with LOST_CONNECTION. The client tries again and gets a COULD_NOT_CONNECT, even though gearmand-2 is up. In fact, the job is sent to gearmand-2 but the client never reads the server's response. The server though will merrily process that job.

2) This is similar. A client adds 2 or more servers (gearmand-1 and gearmand-2 in that order). It sends a job over to gearmand-1 and all is well. gearmand-1 then dies. The client tries to insert 2 new jobs. If #1 is not fixed, both will fail. Then gearmand-1 comes back. Now when a new job is tried to be inserted to gearmand-1, the client will go into a poll() and timeout. This is even though the server actually responds correctly and has inserted the job into its queue.

The test case for both is attached.

Patch 1 fixes scenario #1. We do not return an error code from gearman_flush_all(). Instead let the errors bubble up in gearman_wait().

Patch 2 fixes scenario #2. The created_id in the connection and task get out of sync when there are errors. Instead of figuring out all the places to update to keep them in sync, I thought it simpler to reset the connection id's when it is closed.

Revision history for this message
Keyur (keyurdg) wrote :
Revision history for this message
Keyur (keyurdg) wrote :

Patch 1

Revision history for this message
Keyur (keyurdg) wrote :

Patch 2

Brian Aker (brianaker)
Changed in gearmand:
assignee: nobody → Brian Aker (brianaker)
Revision history for this message
Keyur (keyurdg) wrote :

Patches are based off of v1.0.2

Brian Aker (brianaker)
Changed in gearmand:
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
Brian Aker (brianaker) wrote :

BTW sorry about the slow progress on merging this. I pulled in the test case first so that I could spend some time looking for different angles on how not testing the return could go wrong.

At this point I am comfortable enough with this, so I will go on and merge it.

Thanks for your patience.

Brian Aker (brianaker)
Changed in gearmand:
status: In Progress → Fix Committed
milestone: none → 1.0.3
Brian Aker (brianaker)
Changed in gearmand:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.