Gunicorn logs not helpful in identifying cause of timeouts

Bug #1506025 reported by Tom Haddon

Bug Description

We had an issue where gunicorn/daisy units were repeatedly dying with a timeout error such as this:

2015-10-11 07:33:26 [16981] [CRITICAL] WORKER TIMEOUT (pid:31628)
2015-10-11 07:33:26 [31628] [INFO] Worker exiting (pid: 31628)
2015-10-11 07:33:26 [31715] [INFO] Booting worker with pid: 31715

We tried adding debug logging, but it still didn't yield any useful information about what the worker was timing out connecting to. We eventually tracked it down to Cassandra using strace, but it would have been much better had this been stated in the logs.
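One way to make gunicorn's timeout logging more useful is its documented `worker_abort` server hook, which runs in the worker when the arbiter sends it SIGABRT on timeout; dumping the stack there shows what the worker was blocked on (e.g. a socket read against a Cassandra node). A minimal config sketch, assuming a standard `gunicorn.conf.py`; the timeout value and log wording are illustrative, not daisy's actual settings:

```python
# gunicorn.conf.py -- hypothetical sketch, not daisy's real config.
import traceback

timeout = 30  # seconds of silence before the arbiter aborts a worker

def worker_abort(worker):
    # gunicorn calls this in the worker on SIGABRT, which generally
    # means a timeout. Log the current stack so the blocking call
    # (e.g. a Cassandra read) appears next to the CRITICAL message.
    stack = "".join(traceback.format_stack())
    worker.log.critical("Worker timed out; stack at abort:\n%s", stack)
```

With this in place, each "WORKER TIMEOUT" line would be accompanied by a traceback pointing at the frame the worker was stuck in.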

Revision history for this message
Brian Murray (brian-murray) wrote :

There doesn't seem to be any testing of connectivity to Cassandra, and I think daisy should test for that before starting to accept crash reports.
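Such a startup check could be as simple as a TCP probe of the Thrift port before the app begins accepting reports. A hedged sketch; the function name, host, and default timeout are illustrative (Cassandra's Thrift interface listens on 9160 by default):

```python
import socket

def cassandra_reachable(host="localhost", port=9160, timeout=5.0):
    """Return True if a TCP connection to the Cassandra node succeeds.

    This only proves the port is open, not that Thrift requests will
    complete, but it catches the "node is down" case cheaply at startup.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

daisy could refuse to boot (or log loudly) when this returns False, instead of timing out silently on the first crash report.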

Changed in daisy:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Brian Murray (brian-murray) wrote :

deej pinged me as these timeouts were happening again today. stud indicated he was seeing dropped reads on 2 cassandra nodes.

Revision history for this message
Brian Murray (brian-murray) wrote :

Given that the mojo spec for the error tracker can now set up a Cassandra server, it should be easier to reproduce a Cassandra failure and improve the logging.

Revision history for this message
Brian Murray (brian-murray) wrote :

Is there some way I might replicate the timeout on a Cassandra server? Just stopping Cassandra raises an AllServersUnavailable error (logged as an OOPS), and trying to use netcat to listen on 9160 caused the same error, so I'm at a loss here.
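To reproduce a genuine timeout (as opposed to a refused or immediately-failed connection), the stand-in server has to accept the TCP connection and then go silent, so the client blocks in recv() until its own timeout fires. A minimal sketch of such a hung server; the default port and function name are illustrative:

```python
import socket
import threading

def hung_server(host="127.0.0.1", port=9160):
    """Listen like a Cassandra node, accept connections, never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    held = []  # keep accepted connections open so clients stay blocked

    def loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # server socket closed; stop accepting
            held.append(conn)  # hold the connection, send nothing

    threading.Thread(target=loop, daemon=True).start()
    return srv
```

Pointing daisy's Cassandra client at this instead of a stopped node should produce the hang-then-timeout behaviour rather than AllServersUnavailable.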
