Evacuation takes too long when destination host has a large number of NICs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Low | Artom Lifshitz |
Queens | Fix Committed | Low | Artom Lifshitz |
Rocky | Fix Committed | Low | Artom Lifshitz |
Stein | Fix Committed | Low | Artom Lifshitz |
Bug Description
Description
===========
Evacuation takes a long time if the destination host has a large number of network interfaces.
Steps to reproduce
==================
1. Have a host down, or force it down.
2. Evacuate instances to a host with a large number of network interfaces.
Expected result
===============
Evacuation completes in a reasonable time frame.
Actual result
=============
Evacuation takes too long.
Additional info
===============
This was initially reported against OSP10/Newton [1]. In that case, based on the included sosreports, the compute host has 1324 network interfaces, and 109 instances are being evacuated. Each evacuated instance triggers one pass over every interface, so in total there are 109 * 1324 = 144316 iterations of the loop in get_machine_ips().
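To make the arithmetic above concrete, here is a minimal sketch in Python. It does not use nova's code: `fake_machine_ips` and `evacuate_all` are hypothetical stand-ins, and the per-interface work is reduced to a counter increment, so it only illustrates how the iteration count scales with instances times interfaces.

```python
# Hypothetical stand-in for illustration; this is not nova's get_machine_ips().
def fake_machine_ips(num_interfaces, counter):
    """Visit every interface once and return a placeholder address list."""
    addresses = []
    for index in range(num_interfaces):
        counter["iterations"] += 1  # one loop iteration per interface
        addresses.append("iface-%d" % index)
    return addresses


def evacuate_all(num_instances, num_interfaces):
    """Simulate one full interface scan per evacuated instance."""
    counter = {"iterations": 0}
    for _ in range(num_instances):
        fake_machine_ips(num_interfaces, counter)
    return counter["iterations"]


# Figures taken from the report: 109 instances, 1324 interfaces.
total = evacuate_all(num_instances=109, num_interfaces=1324)
print(total)
```

The real per-interface cost is an address lookup rather than a counter increment, so the wall-clock impact depends on the host. The sketch only shows the multiplication.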
Changed in nova: | |
assignee: | nobody → Artom Lifshitz (notartom) |
status: | New → In Progress |
Changed in nova: | |
assignee: | Artom Lifshitz (notartom) → Eric Fried (efried) |
Changed in nova: | |
assignee: | Eric Fried (efried) → Artom Lifshitz (notartom) |
tags: | added: evac |
tags: | added: evacuate performance removed: evac |
Changed in nova: | |
importance: | Undecided → Low |
Reviewed: https://review.opendev.org/671471
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=30d8159d4ee51a26a03de1cb134ea64c6c07ffb2
Submitter: Zuul
Branch: master
commit 30d8159d4ee51a26a03de1cb134ea64c6c07ffb2
Author: Artom Lifshitz <email address hidden>
Date: Fri Jul 19 11:35:24 2019 -0400
libvirt: move checking CONF.my_ip to init_host()
Migrations use the libvirt driver's get_host_ip_addr() method to
determine the dest_host field of the migration object.
get_host_ip_addr() checks whether CONF.my_ip is actually assigned to
one of the host's interfaces. It does so by calling
get_machine_ips(), which iterates over all of the host's interfaces.
If the host has many interfaces, this can take a long time, and
introduces needless delays in processing the migration.
get_machine_ips() is only used to print a warning, so this patch moves
the get_machine_ips() call to a single method in init_host(). This
way, a warning is still emitted at compute service startup, and
migration progress is not needlessly slowed down.
This patch also has a chicken and egg problem with the patch on top of
it, which poisons use of netifaces.interfaces() in tests. While this
patch fixes all the tests that break with that poison, it starts
breaking different tests because of the move of get_machine_ips() into
init_host(). Therefore, while not directly related to the bug, this
patch also preventatively mocks or stubs out any use of
get_machine_ips() that will get poisoned with the subsequent patch.
Closes-bug: 1837075
Change-Id: I58a4038b04d5a9c28927d914e71609e4deea3d9f
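The pattern described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: `SketchDriver` is a hypothetical class, the interface scan is injected as a plain callable instead of calling nova's get_machine_ips(), and the address 192.0.2.10 is a documentation placeholder. A `unittest.mock.Mock` stands in for the scan, similar in spirit to how the tests stub it out.

```python
import logging
from unittest import mock

LOG = logging.getLogger(__name__)


class SketchDriver:
    """Hypothetical driver for illustration; not nova's LibvirtDriver."""

    def __init__(self, my_ip, list_machine_ips):
        self.my_ip = my_ip
        # Injected so that tests can replace the interface scan with a stub.
        self._list_machine_ips = list_machine_ips

    def init_host(self):
        """Run the expensive interface scan once, at service startup."""
        if self.my_ip not in self._list_machine_ips():
            LOG.warning("my_ip address (%s) was not found on any interface",
                        self.my_ip)

    def get_host_ip_addr(self):
        """Called once per migration; no interface scan happens here."""
        return self.my_ip


# A stub replaces the real scan, as a unit test would do.
scan = mock.Mock(return_value=["192.0.2.10"])
driver = SketchDriver("192.0.2.10", scan)
driver.init_host()

# Simulate 109 migrations asking for the destination host address.
dest_hosts = [driver.get_host_ip_addr() for _ in range(109)]
print(scan.call_count)
```

With this arrangement the number of scans no longer depends on how many instances are evacuated, and the warning is still emitted at startup when the configured address is missing.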