After a DB restart this morning we were seeing the following pattern in the logs:
2009-10-14 09:00:40+0100 [-] Starting scanning cycle.
2009-10-14 09:00:40+0100 [-] Slave Scan Process Initiated.
2009-10-14 09:00:40+0100 [-] Buildd Master has been initialised
2009-10-14 09:00:40+0100 [-] Setting Builders.
2009-10-14 09:00:40+0100 [-] Slave Scan Process Initiated.
2009-10-14 09:00:40+0100 [-] Buildd Master has been initialised
2009-10-14 09:00:40+0100 [-] Setting Builders.
2009-10-14 09:00:40+0100 [-] Slave Scan Process Initiated.
2009-10-14 09:00:40+0100 [-] Buildd Master has been initialised
2009-10-14 09:00:40+0100 [-] Setting Builders.
2009-10-14 09:00:40+0100 [-] Scanning failed with: Already disconnected
2009-10-14 09:00:40+0100 [-] Finishing scanning cycle.
2009-10-14 09:00:40+0100 [-] Scanning cycle finished.
However, the process was still responding to Nagios checks normally, so we could only tell that something was wrong from user feedback.
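The internal log reporting for scan failures is currently very opaque: it records only the error message ("Already disconnected") and gives no indication of where the failure came from. It would help to add a stack trace to the error shown. This should be easy to do with something like the following: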
=== modified file 'lib/lp/buildmaster/manager.py'
--- lib/lp/buildmaster/manager.py 2009-07-26 14:19:49 +0000
+++ lib/lp/buildmaster/manager.py 2009-12-14 20:46:44 +0000
@@ -238,6 +238,7 @@
         """Deal with scanning failures."""
         self.logger.info(
             'Scanning failed with: %s' % error.getErrorMessage())
+        error.printTraceback()
         self.finishCycle()
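To illustrate the difference a stack trace makes, here is a minimal standard-library sketch. It is not the buildd-manager code and does not use Twisted: `scan`, `run_cycle` and `capture_cycle_log` are hypothetical names invented for the example, and the failure is simulated with a plain `RuntimeError`. It logs the same failure twice, once with the message only and once with the traceback attached.

```python
import io
import logging


def scan():
    """Hypothetical stand-in for a scan that fails after a DB restart."""
    raise RuntimeError("Already disconnected")


def run_cycle(logger):
    """Run one scan and report any failure, with and without a traceback."""
    try:
        scan()
    except Exception as error:
        # Message only: comparable to what the current log line carries.
        logger.info("Scanning failed with: %s", error)
        # exc_info=True makes the logging module append the stack trace.
        logger.info("Scanning failure traceback follows", exc_info=True)


def capture_cycle_log():
    """Run one cycle against an in-memory log and return the logged text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("scan-sketch")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        run_cycle(logger)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(capture_cycle_log())
```

The first log line alone says only that something was already disconnected; the second also names the function that raised, which is the information missing from the log excerpt above.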