libvirt driver leaves interface residue after failed start
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Dan Smith |
Newton | Fix Committed | Medium | Lee Yarwood |
Bug Description
When the libvirt driver fails to start a VM for reasons other than a neutron plug timeout, it leaves interfaces from the vif plugging on the system. If a subsequent delete is performed and completes successfully, these will be removed. However, in cases where connectivity is preventing a normal delete, a local delete will be performed at the API level and the interfaces will remain.
In at least one real-world situation I have observed, a script was creating test instances which were failing and leaving residue. After the residue interface count reached about 6,000 on the system, VM creates started failing with "Argument list too long" as libvirt was choking on enumerating the interfaces it had left behind.
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → Medium |
assignee: | nobody → Dan Smith (danms) |
Changed in nova: | |
status: | Confirmed → In Progress |
tags: | added: libvirt |
Reviewed: https://review.openstack.org/408806
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e7f765266e0b94807e019b645c8be89770e7428
Submitter: Jenkins
Branch: master
commit 5e7f765266e0b94807e019b645c8be89770e7428
Author: Dan Smith <email address hidden>
Date: Thu Dec 8 12:25:37 2016 -0800
Cleanup after any failed libvirt spawn
When we go to spawn a libvirt domain, we catch a few types of exceptions
and perform cleanup before failing the operation. For some reason, we
don't do this universally, which means that we leave things like network
devices laying around (from plug_vifs()). If a delete comes later, it
should clean those things up. However, if a subsequent failure prevents
that, and especially if we do a local delete at the API, we'll leak those
interfaces.
As seen in at least one real-world situation, this can cause us to leak
interfaces until we have tens of thousands of them on the system, which
then causes secondary failures.
Since we run the cleanup() routine for certain failures, it certainly
seems appropriate to run it always and not leave residue until a
successful delete is performed.
Closes-Bug: #1648840
Change-Id: Iab5afdf1b5b8d107ea0e5895c24d50712e7dc7b1
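
For readers who want the shape of the change without opening the review, here is a minimal sketch of the idea in the commit message: run cleanup after any failed spawn, not only for selected exception types. It is a toy model, not nova's code. `FakeDriver`, its `interfaces` set, the `tap-` naming, and the `fail_with` parameter are all hypothetical and exist only for illustration; only the names `plug_vifs()` and `cleanup()` come from the commit message.

```python
class FakeDriver:
    """Toy stand-in for a hypervisor driver (hypothetical, not nova's LibvirtDriver)."""

    def __init__(self):
        # Models the network devices that vif plugging creates on the host.
        self.interfaces = set()

    def plug_vifs(self, instance):
        # Plugging happens before the domain starts, so it can outlive a failed start.
        self.interfaces.add(f"tap-{instance}")

    def create_domain(self, instance, fail_with=None):
        # Simulate a start failure of an arbitrary type when asked to.
        if fail_with is not None:
            raise fail_with

    def cleanup(self, instance):
        # Remove whatever plug_vifs() left behind; safe to call more than once.
        self.interfaces.discard(f"tap-{instance}")

    def spawn(self, instance, fail_with=None):
        self.plug_vifs(instance)
        try:
            self.create_domain(instance, fail_with=fail_with)
        except Exception:
            # Clean up on every failure, then re-raise so the caller
            # still sees the original error.
            self.cleanup(instance)
            raise


if __name__ == "__main__":
    driver = FakeDriver()
    try:
        driver.spawn("vm1", fail_with=RuntimeError("domain failed to start"))
    except RuntimeError:
        pass
    print(sorted(driver.interfaces))  # no residue is left for vm1
```

Catching `Exception` broadly and re-raising keeps the failure visible to the caller while making sure the host is not left with residue that only a later successful delete would remove.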