We've discussed this further on IRC and I don't think we've converged on a consensus just yet, but I'll lay out my position here for the bug log:
A regression in an autopkgtest in the release pocket means our gate has failed. We absolutely should care about that, and analyze why it happened in order to try to prevent it in the future, with the understanding that we will never have 100% success (due to things like kernels in LTS releases not gating on devel userspace, etc.).
*When* the gate fails, we should acknowledge that this is the new (slightly worse) status quo for the release pocket, and move forward. We should not penalize packages in -proposed for these unrelated regressions; we should not penalize developers who are managing transitions in -proposed by making it critical path that they resolve these regressions; we should not penalize the release team by making them manage override hints for these regressions. Time spent managing override hints is time *not* spent making Ubuntu better; and while time spent fixing the test regressions is improving the quality of Ubuntu (either now or for the future), tying the fixing of autopkgtests to proposed-migration when those test failures do not represent a regression between the release pocket and -proposed is an artificial prioritization of that work, and IMHO not in the spirit of the gate as designed.
AIUI there is a range of practices today across the release team regarding these problems. Some release team members will tend to add 'skiptest' hints when the failure rate for important packages is 'good enough' without necessarily analyzing the individual failures to understand if they are true regressions. Some will dig in and 'badtest' those packages that they confirm have regressed in the release. Some will go further and try to resolve the regressions, even if they've already landed in release. That we have this range of practices today tells me that the p-m ratchet is already not very effective at driving fixes of those in-release test regressions. I therefore think we should solve that elsewhere, and implement a policy for p-m that requires minimal release team management.
Such a policy should eliminate the need for the majority of force-badtest / force-skiptest hints by the release team, and actually allow us to be more *strict* about their use than we have been.
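As a rough illustration of the policy argued for above: a test failure would hold a package in -proposed only when the same test passes against the release pocket alone. This is a minimal sketch in Python, not britney's actual implementation; `Result` and `blocks_migration` are hypothetical names made up for this example.

```python
from enum import Enum


class Result(Enum):
    """Outcome of one autopkgtest run (hypothetical model for this sketch)."""
    PASS = "pass"
    FAIL = "fail"


def blocks_migration(release_result: Result, proposed_result: Result) -> bool:
    """Return True only for a regression between the release pocket and -proposed.

    A test that already fails in the release pocket is treated as the
    (slightly worse) status quo, so it does not hold the package back
    and needs no override hint.
    """
    return release_result is Result.PASS and proposed_result is Result.FAIL


if __name__ == "__main__":
    # Walk all four combinations and show which one would block migration.
    for release in Result:
        for proposed in Result:
            verdict = "blocks" if blocks_migration(release, proposed) else "allows"
            print(f"release={release.value} proposed={proposed.value}: {verdict} migration")
```

Under this rule the only blocking case is pass-in-release, fail-in-proposed; a test that is already failing in release never requires a force-badtest or force-skiptest hint.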