commit bdb5c3b4057b8b3c1ba27c70e237e5fa71064694
Author: Dan Smith <email address hidden>
Date: Mon Jun 4 10:21:37 2018 -0700
Change consecutive build failure limit to a weigher
There is concern over the ability of compute nodes to reasonably
determine which events should count against their consecutive build
failures. Since a compute node may erroneously disable itself in
response to mundane or otherwise intentional user-triggered events,
this patch adds a scheduler weigher that considers the build failure
counter and can negatively weigh hosts with recent failures. This
avoids taking computes fully out of rotation, instead treating them as
less likely to be picked for a subsequent scheduling operation.
This introduces a new conf option to control this weight. The default
is set high to maintain the existing behavior of picking nodes that
are not experiencing high failure rates, and resetting the counter as
soon as a single successful build occurs. This is a minimal visible
change from the existing behavior with the default configuration.
The rationale behind the default value for this weigher comes from the
values likely to be generated by its peer weighers. The RAM and Disk
weighers will increase the score by the number of available megabytes of
memory (range in thousands) and disk (range in millions). The default
value of 1000000 for the build failure weigher will cause competing
nodes with similar amounts of available disk and a small (less than ten)
number of failures to become less desirable than those without, even
with many terabytes of available disk.
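
The arithmetic behind that default can be illustrated with a rough
sketch. This is a simplified model of the reasoning above, not nova's
weigher code: the Host class, the score() and pick() helpers, and the
sample host values are all hypothetical, and the score is just the raw
sum described in this message (free RAM in MB plus free disk in MB,
minus the multiplier times the failure count).

```python
from dataclasses import dataclass

# Default multiplier quoted in the commit message.
BUILD_FAILURE_MULTIPLIER = 1_000_000


@dataclass
class Host:
    """Hypothetical stand-in for a scheduler host state."""
    name: str
    free_ram_mb: int
    free_disk_mb: int
    failed_builds: int


def score(host, multiplier=BUILD_FAILURE_MULTIPLIER):
    # RAM and disk raise the score by their free megabytes;
    # each recent build failure lowers it by `multiplier`.
    return host.free_ram_mb + host.free_disk_mb - multiplier * host.failed_builds


def pick(hosts, multiplier=BUILD_FAILURE_MULTIPLIER):
    # The host with the highest combined score is chosen.
    return max(hosts, key=lambda h: score(h, multiplier))


hosts = [
    Host("clean", free_ram_mb=8192, free_disk_mb=4_000_000, failed_builds=0),
    Host("flaky", free_ram_mb=8192, free_disk_mb=6_000_000, failed_builds=3),
]
print(pick(hosts).name)                 # host chosen with the default multiplier
print(pick(hosts, multiplier=0).name)   # host chosen with the penalty disabled
```

In this model the host with three failures loses to the failure-free
host despite having about two terabytes more free disk, and it is never
removed from the candidate list; it only sorts lower.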
Conflicts:
nova/conf/scheduler.py
nova/test.py
NOTE(danms): The conflict was due to not having changes
Icee137e15f264da59a1bdc1dc1ecfeaac82b98c6 and
I911cc51a226d6af29d63a7a2c69253de870073e9 in Queens.
NOTE(danms): Because IronicHostManager was a thing in pike, this
includes the fix applied late to queens in this commit:
d26dc0ca03e9cc9a04ac02d88ba2d2867340f5cd
Change-Id: I71c56fe770f8c3f66db97fa542fdfdf2b9865fb8
Related-Bug: #1742102
(cherry picked from commit 91e29079a0eac825c5f4fe793cf607cb1771467d)
(cherry picked from commit 43a84dbc1ebf147d43451610b76c700a31e08f4b)
Reviewed: https://review.openstack.org/573248
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bdb5c3b4057b8b3c1ba27c70e237e5fa71064694
Submitter: Zuul
Branch: stable/pike