Nova creates duplicate Neutron ports on instance reschedule

Bug #1609526 reported by Major Hayden
This bug affects 20 people
Affects: OpenStack Compute (nova)
Status: Fix Committed
Importance: Medium
Assigned to: Liyingjun

Bug Description

Consider this environment:

* Running stable/mitaka (latest available)
* Four hypervisors
* Two glance nodes (A and B)
* The glance nodes are storing images locally but the image files aren't in sync between both hosts

When I request a new instance, the following happens:

* Instance is scheduled to hypervisor A
* Hypervisor A checks to see if the image is available for use -- SUCCESS
* Hypervisor A calls neutron for a network port -- SUCCESS
* Hypervisor A tries to download image from glance server A -- FAILURE (glance server A doesn't have the image cached on its filesystem)
* Instance is rescheduled to hypervisor B
* Hypervisor B checks to see if the image is available for use -- SUCCESS
* Hypervisor B calls neutron for a network port -- SUCCESS
* Hypervisor B downloads an image from glance server B -- SUCCESS (glance server B has the image on its filesystem)

The instance will come up on hypervisor B with two ports attached to the instance. The second one (requested by hypervisor B) will be up and fully functional. The first port (requested by hypervisor A) will be marked as 'down' and won't be usable.
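
A minimal sketch for spotting affected instances, assuming port records shaped like Neutron port dictionaries with `id`, `device_id`, and `status` keys. The function name `find_stale_ports` and the sample data are hypothetical and only illustrate the symptom described above.

```python
from collections import defaultdict


def find_stale_ports(ports):
    """Map each instance with more than one port to its DOWN port IDs.

    `ports` is a list of dicts with at least `id`, `device_id`, and
    `status` keys (an assumption modelled on Neutron port records).
    """
    by_instance = defaultdict(list)
    for port in ports:
        # Ports without a device_id are not attached to any instance.
        if port.get("device_id"):
            by_instance[port["device_id"]].append(port)

    stale = {}
    for device_id, attached in by_instance.items():
        if len(attached) > 1:
            down = [p["id"] for p in attached if p["status"] == "DOWN"]
            if down:
                stale[device_id] = down
    return stale


# Hypothetical data: instance-1 was rescheduled once, instance-2 was not.
sample_ports = [
    {"id": "port-a", "device_id": "instance-1", "status": "DOWN"},
    {"id": "port-b", "device_id": "instance-1", "status": "ACTIVE"},
    {"id": "port-c", "device_id": "instance-2", "status": "ACTIVE"},
]

print(find_stale_ports(sample_ports))
```

An instance that legitimately has several NICs can also have a port that is down, so the result is a list of candidates to review, not a list of ports that are safe to delete.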

It seems like nova-compute should call neutron to say "I don't need that network port any longer since I can't get what I need to build the rest of the instance" and clean up that port. Without the cleanup, an instance can end up with a lot of ports attached and potentially waste a lot of IPv4 address space.
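
The cleanup suggested above can be sketched as a toy simulation. This is not Nova's code: `FakeNeutron`, `build_on_host`, and `boot_with_reschedule` are hypothetical names, and the in-memory port store only stands in for the real Neutron API to show why a failed build needs to release its port before the reschedule.

```python
class FakeNeutron:
    """Hypothetical in-memory stand-in for the Neutron port API."""

    def __init__(self):
        self.ports = {}
        self._next_id = 0

    def create_port(self, instance_id, host):
        self._next_id += 1
        port_id = f"port-{self._next_id}"
        self.ports[port_id] = {"device_id": instance_id, "host": host}
        return port_id

    def delete_port(self, port_id):
        self.ports.pop(port_id, None)


class BuildFailed(Exception):
    """Raised when a host cannot finish building the instance."""


def build_on_host(neutron, instance_id, host, image_available, cleanup):
    # The port is allocated before the image download, as in the report.
    port_id = neutron.create_port(instance_id, host)
    try:
        if not image_available[host]:
            raise BuildFailed(f"image download failed on {host}")
    except BuildFailed:
        if cleanup:
            # Release the port this host allocated before rescheduling.
            neutron.delete_port(port_id)
        raise
    return port_id


def boot_with_reschedule(neutron, instance_id, hosts, image_available, cleanup):
    """Try each host in order and return the one that built the instance."""
    for host in hosts:
        try:
            build_on_host(neutron, instance_id, host, image_available, cleanup)
            return host
        except BuildFailed:
            continue
    raise BuildFailed("no host could build the instance")


hosts = ["hypervisor-a", "hypervisor-b"]
image_available = {"hypervisor-a": False, "hypervisor-b": True}

for cleanup in (False, True):
    neutron = FakeNeutron()
    boot_with_reschedule(neutron, "vm-1", hosts, image_available, cleanup)
    print(f"cleanup={cleanup}: {len(neutron.ports)} port(s) left")
```

With `cleanup=False` the simulation reproduces the behaviour in this report, where the port from the failed host stays attached. With `cleanup=True` only the port from the successful host remains.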

I wrote more details on this issue here: https://major.io/2016/08/03/openstack-instances-come-online-with-multiple-network-ports-attached/

summary: - nova doesn't clean up network ports when an image fails to download from
+ nova should clean up network ports when an image fails to download from
glance
Revision history for this message
Rui Chen (kiwik-chenrui) wrote : Re: nova should clean up network ports when an image fails to download from glance

Today I hit the same issue in my devstack.

stack@szxbzci0004 ~/nova (master *) $ git log -1
commit e9d503a1202fadd5163e343424cf15285f5dc016
Merge: 5426d95 a6ad102
Author: Jenkins <email address hidden>
Date: Thu Sep 1 03:15:49 2016 +0000

    Merge "Update placement config reno"

I have two compute nodes, but one of them (A) has an RBD configuration issue. When libvirt tries to launch the instance there, a LibvirtError is raised and the instance is rescheduled to the other compute node (B), but the Linux bridge isn't cleaned up on compute node A. The instance launches successfully on compute node B, but a port is allocated again, so the instance runs with two ports.

See my operation details:
http://paste.openstack.org/show/565674/

Changed in nova:
status: New → Confirmed
Changed in nova:
assignee: nobody → Zhenyu Zheng (zhengzhenyu)
Revision history for this message
cloudbuilders (operations-8) wrote :

We've come across this problem as well.
We have 4 Glance nodes, with the images mounted on an NFS volume. One of the Glance instances went down, and it failed mounting the NFS when it rebooted. We started having VMs with more than one port assigned (showing more than one IP per VM in Horizon.)

It seems to us that Nova should tell Neutron either to delete the unused port or to update it instead of creating a new one.

Revision history for this message
Maciej Szankin (mszankin) wrote :

Zhenyu Zheng, how is the work going? It has been some time since your last activity. If you are actively working on this item, can you confirm? Otherwise, please unassign yourself.

summary: - nova should clean up network ports when an image fails to download from
- glance
+ Nova creates duplicate Neutron ports on instance reschedule
Changed in nova:
importance: Undecided → Medium
Changed in nova:
assignee: Zhenyu Zheng (zhengzhenyu) → nobody
Revision history for this message
Piyush Srivastava (piyush0101) wrote :

We have run into this issue on Mitaka as well. It's not happening for the same reason, i.e. a glance image failing to download.

For us, one of the hypervisors did not have proper virt enabled, which caused the instance launch to fail on that hypervisor and reschedule on a different one. However, the port that was created while the instance was attempting to launch on the first one was still there and not cleaned up.

The result was two ports attached to the instance, with only one of them in use.

Liyingjun (liyingjun)
Changed in nova:
assignee: nobody → Liyingjun (liyingjun)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/467509

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Jirayut Nimsaeng (winggundamth) wrote :

I'm sorry. It seems I have a problem with my touchpad, so it clicked automatically.

information type: Public → Public Security
information type: Public Security → Private Security
information type: Private Security → Public
Revision history for this message
Sean Dague (sdague) wrote :

Automatically discovered version mitaka in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.mitaka
Revision history for this message
melanie witt (melwitt) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Li Yingjun (<email address hidden>) on branch: master
Review: https://review.openstack.org/467509
Reason: confirmed, already fixed in https://review.openstack.org/#/c/393805/

Liyingjun (liyingjun)
Changed in nova:
status: In Progress → Fix Committed