sock.recv fails to return in some cases (infinite wait)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL Connector/Python | Triaged | Critical | Geert JM Vanderkelen |
Bug Description
Python version 3.2
connector version: .32 devel 292
MySQL version 5.0.51a-12-log
I've been debugging a very strange issue in an application that uses this connector. It manifests in only some queries. Strangely, any modification to the query (removing or adding whitespace) changes whether it happens.
Using liberal traces, I drilled from fetchall all the way down to connection.py line 162:
On the queries this issue manifests on, fetchall starts in the cursor, several packets are successfully read from the socket, but then recv_plain goes to line 162 and self.sock.recv never returns. By default the socket has no timeout value, and is blocking.
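For context, here is a minimal sketch of how a length-prefixed MySQL packet is typically read from a blocking socket. It is not the connector's actual code: `recv_exact` and `read_packet` are hypothetical names used for illustration. It relies on two facts: a MySQL packet starts with a 4-byte header (3-byte little-endian payload length plus a 1-byte sequence number), and `recv()` may return fewer bytes than requested. A reader that requests more bytes than the server will send blocks forever on a socket with no timeout, which matches the symptom described above.

```python
import socket


def recv_exact(sock, size):
    """Read exactly `size` bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Never ask for more than is still outstanding for this packet.
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection.
            raise socket.error(
                "socket closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_packet(sock):
    """Read one MySQL packet; return (sequence number, payload)."""
    header = recv_exact(sock, 4)
    # 3-byte little-endian payload length, then a 1-byte sequence number.
    length = header[0] | (header[1] << 8) | (header[2] << 16)
    sequence = header[3]
    payload = recv_exact(sock, length)
    return sequence, payload
```

Comparing the lengths the connector requests at each `recv` call against the lengths announced in the packet headers may help show where the read goes out of step.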
I have been using this connector without issue up to this point, so this issue is baffling and strange. I will include an example query that caused this behavior for me. I doubt this will manifest for anyone else, as I'm guessing the data in question may affect it:
SELECT fileid, filechunkid, `sequence`, `data`, file.name, file.created, file.updated, file.hash, file.type FROM `filechunk` JOIN `file` USING (fileid) WHERE fileid IN(16945, 16946, 16947, 16948, 16949, 16950, 16951, 16952, 16953, 16954, 16955, 16956, 16957, 16958, 16959, 16960, 16961, 16962, 16963, 16964, 16965, 16966, 16967, 16968, 16969)
Modifying any whitespace or backticks causes the issue to disappear. This suggests to me that the return packets may include the original query in some way, so that even a slight change in the size of the response shifts some offset.
It is worth noting that the app in question retrieves blobs to replicate. The above query attempts to retrieve 25 blobs of roughly 100 KB each in the data column. I would guess it might be a size issue, but I've also fetched 500 at once with no trouble (though, again, some queries trigger the error). These queries run and return successfully in SQLyog in a reasonable amount of time. The instances in which it happens seem random but are deterministic (reproducible, at least).
I have worked around this by adding in a timeout in my application of 60 seconds. I do not like this solution, plus, due to the nature of this application, it is not an acceptable solution. Any help, suggestions, or workaround would be much appreciated.
Maybe making the socket non-blocking and doing a timed wait on it using select() would be a better solution? Possibly the content of my return bytes is being misinterpreted by some layer, causing an issue in the Python socket?
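The select() idea could look roughly like the sketch below. It is an illustration only, not a patch for the connector: `recv_with_timeout` is a hypothetical helper, and the 60-second default simply mirrors the timeout mentioned above. It turns an infinite wait into an exception, but it does not address whatever causes the read to stall.

```python
import select
import socket


def recv_with_timeout(sock, size, timeout=60.0):
    """Wait up to `timeout` seconds for data, then read at most `size` bytes."""
    # select() returns the sockets that are ready for reading, or empty
    # lists if the timeout expires first.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise socket.timeout("no data within %.1f seconds" % timeout)
    # Data (or end-of-stream) is available, so this recv() will not hang.
    return sock.recv(size)
```

A simpler alternative with similar behavior is calling `sock.settimeout(60)` once after connecting, which makes every blocking `recv` raise `socket.timeout` when no data arrives in time.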
Changed in myconnpy: importance: High → Critical
Changed in myconnpy: milestone: none → 0.4
Changed in myconnpy: status: Confirmed → Triaged
As an addendum, I've tested this on both Windows and Unix with the same result. Additionally, the faulty query returns as expected when I run it through a C++ MySQL wrapper that I use.