Comment 20 for bug 1742102

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/573239
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=43a84dbc1ebf147d43451610b76c700a31e08f4b
Submitter: Zuul
Branch: stable/queens

commit 43a84dbc1ebf147d43451610b76c700a31e08f4b
Author: Dan Smith <email address hidden>
Date: Mon Jun 4 10:21:37 2018 -0700

    Change consecutive build failure limit to a weigher

    There is concern over the ability of compute nodes to reasonably
    determine which events should count against their consecutive build
    failures. Since a compute may erroneously disable itself in
    response to mundane or otherwise intentional user-triggered events,
    this patch adds a scheduler weigher that considers the build failure
    counter and can negatively weigh hosts with recent failures. This
    avoids taking computes fully out of rotation, instead treating them
    as less likely to be picked for a subsequent scheduling operation.

    This introduces a new conf option to control this weight. The default
    is set high to maintain the existing behavior of picking nodes that
    are not experiencing high failure rates, and resetting the counter as
    soon as a single successful build occurs. This is a minimal visible
    change from the existing behavior with the default configuration.

    The rationale behind the default value for this weigher comes from the
    values likely to be generated by its peer weighers. The RAM and Disk
    weighers will increase the score by the number of available megabytes of
    memory (range in thousands) and disk (range in millions). The default
    value of 1000000 for the build failure weigher will cause competing
    nodes with similar amounts of available disk and a small (less than ten)
    number of failures to become less desirable than those without, even
    with many terabytes of available disk.

    Change-Id: I71c56fe770f8c3f66db97fa542fdfdf2b9865fb8
    Related-Bug: #1742102
    (cherry picked from commit 91e29079a0eac825c5f4fe793cf607cb1771467d)
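
The arithmetic behind the default value can be illustrated with a small sketch. This is a simplified model of the reasoning in the commit message, not nova's scheduler code: the `Host` class and the `score` and `pick` functions are hypothetical names, and the model simply sums raw megabyte values and ignores any normalization the real scheduler may apply.

```python
from dataclasses import dataclass

# Default weight quoted in the commit message above.
BUILD_FAILURE_WEIGHT = 1_000_000


@dataclass
class Host:
    """Hypothetical stand-in for the per-host state a scheduler sees."""
    name: str
    free_ram_mb: int
    free_disk_mb: int
    failed_builds: int  # consecutive build failures, reset on success


def score(host: Host, failure_weight: int = BUILD_FAILURE_WEIGHT) -> int:
    """RAM and disk raise the score; each recent failure lowers it."""
    return (host.free_ram_mb + host.free_disk_mb
            - failure_weight * host.failed_builds)


def pick(hosts, failure_weight: int = BUILD_FAILURE_WEIGHT) -> Host:
    """Choose the highest-scoring host; no host is ever excluded."""
    return max(hosts, key=lambda h: score(h, failure_weight))


# Assumed example values: 16 GB of free RAM each, 2 TB vs 4 TB of free disk.
healthy = Host("healthy", free_ram_mb=16_384, free_disk_mb=2_000_000,
               failed_builds=0)
flaky = Host("flaky", free_ram_mb=16_384, free_disk_mb=4_000_000,
             failed_builds=3)

print(pick([healthy, flaky]).name)                    # healthy
print(pick([healthy, flaky], failure_weight=0).name)  # flaky
```

With the default weight, three failures outweigh an extra two terabytes of free disk, yet the failing host stays in the candidate list and is chosen again if no better host exists. Setting the weight to zero in this model removes the penalty entirely.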