Deployment fails when nodes are VMs in VMware and vSwitch has multiple links

Bug #1596054 reported by Ed Balduf
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: Medium
Assigned to: Fuel Sustaining
Milestone: 10.0

Bug Description

Detailed bug description:
 When you deploy Fuel on VMs in vSphere and the vSwitches are set up with multiple physical links, the Linux bridge (br-ex) on the public network gets confused by the vSwitch sending ARPs back to it. The details of this issue are described here: https://communities.vmware.com/thread/262520?start=0&tstart=0 When this occurs, haproxy cannot get a successful ARP response from the external gateway, so Horizon is unreachable. In a multi-controller deployment, ARPs from the other controller nodes have a similar problem, which causes the deployment to fail completely.

Steps to reproduce:
 1. A VMware dvSwitch with 2 physical links to 2 different physical switches is required. Enable Promiscuous mode, MAC address changes and Forged transmits.
 2. In my case VLANs are used to set up port groups, and each VM is assigned 5 interfaces, one in each port group, so that the Fuel deployment uses 5 /24 networks.
 3. Create a deployment node on one network to PXE boot the nodes, and create 2-5 VMs (your choice) as nodes.
 4. Create a deployment with appropriate settings: 2 nodes for the Horizon problem, 5 nodes (3 controllers, 2 compute) for the bigger problem. The smaller problem with 2 nodes is easier to debug and diagnose.
 5. Deploy.
 6. Attempt to access Horizon.
 7. To confirm the issue, log in to the controller node and look at the ARP table from the haproxy namespace using 'ip netns exec haproxy ip neigh'. You should see your gateway listed with a failed ARP entry.
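The confirmation check can be scripted. The following is a minimal sketch, not part of Fuel: the function name failed_neighbors is hypothetical, and it assumes the neighbour state is the last field of each 'ip neigh' output line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fuel): reads `ip neigh` output on stdin
# and prints the address of every entry whose state is FAILED or INCOMPLETE.
# Assumption: the neighbour state is the last field on each line.
failed_neighbors() {
    awk '$NF == "FAILED" || $NF == "INCOMPLETE" { print $1 }'
}

# Intended use on a controller node (as root):
#   ip netns exec haproxy ip neigh | failed_neighbors
```

If the public gateway address appears in the output, the controller is most likely hitting this bug.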

Expected results:
 Successful deployment

Actual result:
 Inability to access Horizon and/or complete deployment failure, depending on the number of controllers.

Reproducibility:
 100%

Workaround:
 Put the Linux bridge into hub mode by setting the ageing timer to zero with the command 'brctl setageing br-ex 0'. Since this is really an issue with VMware, there is no fix in Fuel, but it would be nice to be able to set the ageing timer within Fuel, so that one does not need to log in at the critical time (during a multi-controller deployment) and set the ageing timer to zero by hand.
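Until Fuel can do this itself, the timing problem might be reduced by a small script started on each controller ahead of time. This is only a sketch: the function name set_hub_mode, the polling interval and the default timeout are assumptions, and it relies on the bridge appearing under /sys/class/net once it has been created.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fuel): wait until the named bridge exists,
# then disable MAC ageing so the bridge floods frames like a hub.
# Returns 1 if the bridge does not appear within the timeout.
set_hub_mode() {
    bridge="$1"
    timeout="${2:-600}"   # seconds to wait; arbitrary default
    waited=0
    while [ ! -d "/sys/class/net/$bridge" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Same command as the manual workaround above.
    brctl setageing "$bridge" 0
}

# Intended use on each controller (as root), started before the deployment
# configures the public network:
#   set_hub_mode br-ex 1800
```

Polling for the interface avoids having to guess the exact moment br-ex is created; the setting is applied by the same brctl command used manually.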

Impact:
 Automation is no longer automatic: the ageing timer has to be set manually at precisely the correct time.

Description of the environment:
 Operating system: Fuel 8
 Versions of components: VMware vSphere 5.5.0 build 1750787, ESXi-5.5.0-20140302001-standard
 Reference architecture:
 Network model: Dell, Arista, Cisco
 Related projects installed: None

Additional information:
 If nothing else this is now documented!

Tags: area-library
Changed in fuel:
importance: Undecided → Medium
status: New → Confirmed
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
milestone: none → 10.0
tags: added: area-library