Gearman

Client connection handling improvements

Bug #1098409 reported by Keyur on 2013-01-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Gearman	Fix Released	Medium	Brian Aker	Gearman 1.0.3

Bug Description

There's 2 scenarios that this bug relates to:

1) a client adds 2 or more servers (gearmand-1 and gearmand-2 in that order). It sends a job over to gearmand-1 and all is well. gearmand-1 then dies. The client tries to insert a new job insertions and it fails with LOST_CONNECTION. The client tries again and gets a COULD_NOT_CONNECT, even though gearmand-2 is up. In fact, the job is sent to gearmand-2 but the client never reads the server's response. The server though will merrily process that job.

2) This is similar. A client adds 2 or more servers (gearmand-1 and gearmand-2 in that order). It sends a job over to gearmand-1 and all is well. gearmand-1 then dies. The client tries to insert 2 new jobs. If #1 is not fixed, both will fail. Then gearmand-1 comes back. Now when a new job is tried to be inserted to gearmand-1, the client will go into a poll() and timeout. This is even though the server actually responds correctly and has inserted the job into its queue.

The test case for both is attached.

Patch 1 fixes scenario #1. We do not return an error code from gearman_flush_all(). Instead let the errors bubble up in gearman_wait().

Patch 2 fixes scenario #2. The created_id in the connection and task get out of sync when there are errors. Instead of figuring out all the places to update to keep them in sync, I thought it simpler to reset the connection id's when it is closed.