Macvtap driver/agent migrates instances on an invalid physical network
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Won't Fix | Medium | Andreas Scheuring |
Bug Description
Scenario 1 - Migration on wrong physical network - High Prio
=======
Host1 has physical_interface_mappings = physnet1:eth0
Host2 has physical_interface_mappings = physnet2:eth0
Now live migration of an instance hosted on host1 (connected to physnet1) to host2 succeeds. Libvirt just migrates the whole server with its domain.xml, and the macvtap is plugged on the target's eth0.
The instance now no longer has access to its own network, but instead has access to another physical network. The behavior is documented, however this needs to be fixed!
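For illustration, the macvtap definition that travels inside the domain.xml looks roughly like this (a sketch; 'eth0' is the source device derived from host1's mapping):

    <interface type='direct'>
      <source dev='eth0' mode='bridge'/>
      <model type='virtio'/>
    </interface>

Since the <source dev=...> element is host-specific, migrating the unchanged XML silently attaches the instance to whatever network eth0 serves on the target.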
Scenario 2 - Migration fails - Low Prio
=======
Host1 has physical_interface_mappings = physnet1:eth0
Host2 has physical_interface_mappings = physnet1:eth1
Let's assume a vlan setup and a migration from host1 to host2. Host2 does NOT have an interface eth0. The migration will fail and the instance will remain active on the source, as the nova plug on host2 failed to create a vlan device on eth0.
If you have a flat network, the definition of the libvirt xml will fail on host2 instead.
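What fails underneath is, roughly, the creation of the vlan device on the missing parent interface (an illustrative iproute2 call, not the agent's actual code path; vlan id 100 is made up):

    import subprocess

    # On host2 this raises CalledProcessError ('Cannot find device "eth0"'),
    # because the parent interface eth0 does not exist there.
    subprocess.check_call(['ip', 'link', 'add', 'link', 'eth0',
                           'name', 'eth0.100', 'type', 'vlan', 'id', '100'])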
Two approaches are thinkable:
* Solve the problem (scenarios 1+2)
* Just prevent such an invalid migration (let scenario 1 fail the way scenario 2 fails today)
Solve the problem
=================
#1 Solve it in Nova pre live migration
-------
This would allow migration even though the physical_interface_mappings differ.
a) In pre live migration, nova should change the binding:host_id to the migration target. This triggers port binding and the mechanism driver, which updates the vif_details with the right macvtap source device information. Libvirt can then adapt the migration XML to reflect the changes (a sketch follows after the notes below).
Currently the binding is updated in post migration, after the migration succeeded. Can we already do it in pre_live_migration and, on failure, undo it in the rollback?
- There's no issue for the reference implementations - See the prototype: https:/
- But there might be external SDN controllers that shut down ports on the source host as soon as the host_id is updated. On the other hand, if a controller relies on this mechanism, it already brings the port up a little too late today, as the host_id update is only sent after live migration succeeded.
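A minimal sketch of what a) could look like on the nova side, assuming python-neutronclient; the helper names and the rollback handling are illustrative, not actual nova code:

    # 'neutron' is a neutronclient.v2_0.client.Client instance.

    def pre_live_migration_update_binding(neutron, port_id, dest_host):
        # Moving the binding to the target host triggers port binding and
        # the mech driver, which recomputes vif_details (e.g. the macvtap
        # source device) for the target.
        return neutron.update_port(
            port_id, {'port': {'binding:host_id': dest_host}})

    def rollback_binding(neutron, port_id, source_host):
        # On migration failure, point the binding back at the source host.
        neutron.update_port(
            port_id, {'port': {'binding:host_id': source_host}})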
b) The alternative would be to allow a port to be bound to multiple hosts simultaneously: in pre_live_migration, nova would add a binding for the target host, and in post_live_migration it would remove the original binding.
This would require (a rough API sketch follows this list):
- simultaneous port binding; this will be achieved by https:/
- allowing such a binding for compute ports as well
- updating the APIs to reflect multiple port bindings
  - Create / Update / Show Port
- host_id is not reflected for DVR ports today [1]
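What such a multiple-bindings API could look like from nova's perspective (sketched as an assumption from the idea above, not a committed API; endpoint shapes and the URL are illustrative):

    import requests

    NEUTRON_URL = 'http://controller:9696/v2.0'  # assumed endpoint
    HEADERS = {'X-Auth-Token': '<token>'}

    def add_target_binding(port_id, dest_host):
        # pre_live_migration: create an additional, inactive binding on
        # the migration target.
        requests.post('%s/ports/%s/bindings' % (NEUTRON_URL, port_id),
                      json={'binding': {'host': dest_host}},
                      headers=HEADERS)

    def switch_to_target_binding(port_id, dest_host, source_host):
        # post_live_migration: activate the target binding and remove the
        # original binding on the source.
        requests.put('%s/ports/%s/bindings/%s/activate'
                     % (NEUTRON_URL, port_id, dest_host), headers=HEADERS)
        requests.delete('%s/ports/%s/bindings/%s'
                        % (NEUTRON_URL, port_id, source_host),
                        headers=HEADERS)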
#2 Moved to Prevent section
-------
#3 Device renaming in the macvtap agent
-------
This would allow migration even though the physical_interface_mappings differ.
Instead of a host-specific mapping like
physical_interface_mappings = physnet1:eth0
use a device name that is identical on every host.
On agent startup, the agent could rename the associated device to "physnet1" (or to some other generic value) that is consistent across all hosts! A sketch follows below.
We would need to document that this interface must not be used by any other application that relies on the interface name.
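A sketch of what the rename at agent startup could look like, using plain iproute2 via subprocess (illustrative only; a real agent would go through its own interface-management layer):

    import subprocess

    def rename_physnet_device(old_name, physnet):
        # An interface must be down while it is renamed.
        subprocess.check_call(['ip', 'link', 'set', 'dev', old_name, 'down'])
        subprocess.check_call(['ip', 'link', 'set', 'dev', old_name,
                               'name', physnet])
        subprocess.check_call(['ip', 'link', 'set', 'dev', physnet, 'up'])

    # e.g. rename_physnet_device('eth0', 'physnet1')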
#4 Use generic vlan device names
-------
This solves the problem only for vlan networks! For flat networks it would still exist.
Today, the agent generates vlan device names like this: for eth1, eth1.<vlan-id>. We could get rid of this pattern and use <network-uuid>.vlan instead, where <network-uuid> is the first 10 characters of the network id.
But this would not solve the issue for flat networks; for those, the device renaming proposed in #3 would still be required.
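A sketch of the proposed naming scheme; the assumption is that 10 UUID characters plus '.vlan' keep the name within the 15-character limit for Linux interface names:

    def generic_vlan_device_name(network_id):
        # e.g. '0f5a42e0-1d3c-...' -> '0f5a42e0-1.vlan' (10 + 5 = 15 chars)
        return '%s.vlan' % network_id[:10]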
Prevent invalid migration
=======
#1 Let Port binding fail
-------
The idea is to detect an invalid migration in the mechanism driver and let port binding fail.
This approach has two problems:
a) Port binding happens AFTER the migration happened. In post live migration nova requests updating the binding:host_id to the target, but by then the instance is already running on the target host. The binding will fail, but the migration happened anyway.
--> But at least the instance would be in error state and the user is aware of that! In addition, we might drop all traffic related to this instance.
b) Detecting a migration in the mech driver is difficult. The idea is to use the PortContext (a sketch follows after the patch links below).
--> In the worst case, use the profile information added with https:/
see patch https:/
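A sketch of how the detection could look inside the macvtap mechanism driver; _get_macvtap_source() is a hypothetical helper, and the exact PortContext fields are assumptions - see the linked patch for the real logic:

    def _is_invalid_migration(self, context, agent, segment):
        # context is an ML2 PortContext; context.original still carries
        # the binding of the source host.
        orig = context.original
        if not orig or not orig.get('binding:host_id'):
            return False  # initial binding, not a migration
        old_source = (orig.get('binding:vif_details') or {}).get(
            'macvtap_source')
        # Hypothetical helper: derive the source device the target host
        # would use from its interface mapping.
        new_source = self._get_macvtap_source(agent, segment)
        # A differing source device means the instance would land on a
        # different physical network -> let port binding fail.
        return old_source is not None and old_source != new_source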
#2 Let agent detect invalid migration (not working)
-------
An invalid migration could be detected in the agent, which would then refrain from setting the device status to up.
But this is too late: the agent only detects the device after the migration has already started, and there is no way to stop it anymore.
see patch https:/
#3 Solve it in nova post live migration
-------
The idea is that nova starts the migration and then listens for the plug_vif (network-vif-plugged) event that neutron emits after the agent reported the device as up. Nova also waits for the port binding to occur. If either runs into a timeout or fails, the migration should be rolled back (if still possible) or the instance should be set into error state and the network locked down (which is the default for ovs - not sure about others right now). A sketch of the waiting pattern follows below.
There are some patch sets out there that try to achieve something similar, but for the ovs-hybrid plug only. For other plug types it's much more complicated, as the agent only reports the device up after it appeared on the target (after the migration already started): https:/
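A sketch of the waiting pattern, assuming nova's existing wait_for_instance_event() helper; the deadline, the surrounding method names and the rollback handling are simplifications:

    # Inside a hypothetical compute-manager code path:
    events = [('network-vif-plugged', vif['id'])
              for vif in instance.get_network_info()]
    try:
        with self.virtapi.wait_for_instance_event(instance, events,
                                                  deadline=300):
            self._do_live_migration(context, instance, dest_host)
    except Exception:
        # Timeout or failure: roll back if still possible, otherwise set
        # the instance to ERROR and lock down its network.
        self._rollback_live_migration(context, instance, dest_host)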
#4 Prohibit agent start with invalid mapping
-------
Do not allow different mappings at all.
How to trigger the validation?
* Have an RPC call from the agent to the neutron plugin at agent start.
--> Less resource consumption, but an extra rpc call
* Use the regular agent status reports.
--> Checking on every status report consumes a lot of resources (db query and potential mech_driver calls)
What to compare?
* Have a master interface mapping config option on the plugin/mech_driver; all agent mappings must match that mapping.
--> If the server changes the master mapping, there's no way to notify the agents (or it would have to be implemented)
--> Config option duplication
* Query the master mapping from the server and compare on the agent side, or ignore the local mapping entirely if a master one has been configured.
* Compare against the existing mappings in the database. The first agent that sends its mapping via status reports defines the valid mapping.
--> We need explicit table locking (locking rows is not sufficient) to avoid races, especially when the first agents are being added.
Where to do the validation/gather data for validation?
* In the mech driver
--> Most natural way, but requires a new mech_driver interface
--> Also a new plugin interface is required
* In the rpc callbacks class
--> As the validation depends on the mech_driver, we would have mech_driver-specific code there, but we would get around new interfaces
* In the agent
Proposal (a rough agent-side sketch follows this list):
* The agent has a new config option "safe_migration"
* If it is set, the agent queries the server's master mapping on agent start via RPC.
* If it does not match the local mapping, the agent terminates.
* The RPC call is made to the plugin, which then triggers all mechanism drivers. Those have a generic method like 'get_plugin_
* If this method is not present, the code will just continue (to not break other drivers)
* The plugin returns a dict of mech_driver to master mapping.
* If the master mapping on the server got changed but the agents haven't been restarted, the local mapping will not be validated against the new master mapping again (that would require an agent restart)
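A rough agent-side sketch of the proposal; the option name "safe_migration" and the RPC method name are placeholders matching the (truncated) names above:

    import sys

    from oslo_config import cfg
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def validate_mappings_on_start(plugin_rpc, context, local_mappings):
        # 'safe_migration' would have to be registered as an agent option.
        if not cfg.CONF.macvtap.safe_migration:
            return
        # A single RPC call at agent start; cheaper than validating on
        # every status report.
        master = plugin_rpc.get_plugin_interface_mappings(context)
        for driver, mapping in master.items():
            if mapping and mapping != local_mappings:
                LOG.error("Local mapping %s does not match master mapping "
                          "%s of driver %s; terminating.",
                          local_mappings, mapping, driver)
                sys.exit(1)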
References
==========
[1] curl -g -i -X GET http://
{"port": {"status": "ACTIVE", "binding:host_id": "", "description": "", "allowed_