Comment 2 for bug 343683

Adam Conrad (adconrad) wrote :

In a short discussion on IRC, we came to the conclusion that this (and a whole class of bugs relating to this) could be solved with the following two actions:

1) Builders shouldn't be marked NOT OK immediately after a failed attempt to contact them. Instead, we should give the machine a 5-minute window to come back (so that a short network hiccup, for instance, doesn't take 12 buildds offline and kill their builds), and mark it NOT OK only at the end of that 5-minute grace period.

2) Builders that are marked NOT OK (either manually, or at the end of the above 5-minute window) should have their active jobs reclaimed, so those jobs can be pushed to other, active buildds.
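
As a rough illustration of the two actions above, here is a minimal Python sketch. It is not the actual buildd-manager code: the `Builder` class, its fields, and the `record_contact` and `mark_not_ok` functions are hypothetical names made up for this example, and the job queue is just a plain list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The 5-minute grace period proposed in action 1.
GRACE_PERIOD_SECONDS = 5 * 60


@dataclass
class Builder:
    """Hypothetical stand-in for a builder record (not the real model)."""
    name: str
    ok: bool = True
    # Time of the first failed contact in the current run of failures.
    first_failure_at: Optional[float] = None
    active_jobs: List[str] = field(default_factory=list)


def mark_not_ok(builder: Builder, pending_jobs: List[str]) -> None:
    """Action 2: mark the builder NOT OK and reclaim its active jobs.

    Used both for a manual NOT OK and for an expired grace period.
    """
    builder.ok = False
    pending_jobs.extend(builder.active_jobs)
    builder.active_jobs.clear()


def record_contact(builder: Builder, reachable: bool, now: float,
                   pending_jobs: List[str]) -> None:
    """Action 1: only mark NOT OK after failures spanning the grace period."""
    if reachable:
        # A successful contact clears the failure timer. Whether it should
        # also flip a NOT OK builder back to OK is left open in this sketch.
        builder.first_failure_at = None
        return
    if builder.first_failure_at is None:
        # First failure: start the grace period, but leave the builder OK.
        builder.first_failure_at = now
    if builder.ok and now - builder.first_failure_at >= GRACE_PERIOD_SECONDS:
        mark_not_ok(builder, pending_jobs)
```

With this shape, a short network hiccup only starts and then clears the timer, while a manual NOT OK would call `mark_not_ok` directly, so both paths reclaim jobs the same way.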