STX-Openstack: Failed to activate binding for port for live migration

Bug #2012389 reported by Lucas de Ataides Barreto
This bug affects 1 person
Affects: StarlingX
Status: In Progress
Importance: Undecided
Assigned to: Thales Elero Cervi
Milestone: (not set)

Bug Description

Brief Description
-----------------
Live migrating a VM with hw_cpu_policy='dedicated' failed with "Error: Failed to activate binding for port...". The failure appears to be intermittent: a manual retry of the same migration succeeded.

Severity
--------
Minor: System/Feature is usable with minor issue

Steps to Reproduce
------------------
1. Create an image with hw_cpu_policy='dedicated'
2. Boot a VM with said image and any flavor
3. Live migrate this VM
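
For readers who want to script the steps above, here is a minimal sketch that only builds the corresponding `openstack` CLI invocations. It is not the test suite's actual code. The image file, disk format, names, flavor, and network are placeholder assumptions, and the VM in this report was booted from a volume, which the sketch does not reproduce.

```python
from typing import List


def repro_commands(image_file: str, image_name: str, flavor: str,
                   network: str, server_name: str) -> List[List[str]]:
    """Return the CLI commands for the three reproduction steps.

    All argument values are placeholders chosen by the caller; qcow2/bare
    are assumed formats, not something stated in the report.
    """
    return [
        # Step 1: image carrying the dedicated CPU policy property.
        ["openstack", "image", "create",
         "--disk-format", "qcow2", "--container-format", "bare",
         "--file", image_file,
         "--property", "hw_cpu_policy=dedicated",
         image_name],
        # Step 2: boot a VM from that image with any flavor.
        ["openstack", "server", "create",
         "--image", image_name, "--flavor", flavor,
         "--network", network, "--wait", server_name],
        # Step 3: live migrate it and wait for the result.
        ["openstack", "server", "migrate",
         "--live-migration", "--wait", server_name],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a host with OpenStack credentials loaded.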

Expected Behavior
------------------
VM is live migrated

Actual Behavior
----------------
VM failed to live migrate with "Failed to activate binding for port..." error

Reproducibility
---------------
Intermittent - Passed on retest

System Configuration
--------------------
Bare metal AIO-DX

Branch/Pull Time/Commit
-----------------------
https://mirror.starlingx.cengn.ca/mirror/starlingx/master/debian/monolithic/20230319T060000Z/

Last Pass
---------
Last tested on CentOS: https://lists.starlingx.io/pipermail/starlingx-discuss/2022-October/013415.html

Timestamp/Logs
--------------

+-------------------------------------+------------------------------------------------------------+
| Field | Value |
+-------------------------------------+------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | controller-1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | controller-1 |
| OS-EXT-SRV-ATTR:instance_name | instance-0000001b |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | 2023-03-21T07:12:08.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | tenant1-mgmt-net=192.168.99.91; tenant1-net1=172.16.1.248 |
| config_drive | |
| created | 2023-03-21T07:12:00Z |
| fault | {'code': 500, 'created': '2023-03-21T07:17:04Z', 'message': 'Failed to activate binding for port 5d7670a1-f18d-4b82-9c2a-1ec1dee6a835 and host controller-1.', 'details': 'Traceback (most recent call last):\n File "/var/lib/openstack/lib/python3.9/site-packages/nova/compute/manager.py", line 205, in decorated_function\n return function(self, context, *args, **kwargs)\n File "/var/lib/openstack/lib/python3.9/site-packages/nova/compute/manager.py", line 8578, in _post_live_migration\n self.network_api.migrate_instance_start(ctxt,\n File "/var/lib/openstack/lib/python3.9/site-packages/nova/network/neutron.py", line 2887, in migrate_instance_start\n self.activate_port_binding(context, vif[\'id\'], dest_host)\n File "/var/lib/openstack/lib/python3.9/site-packages/nova/network/neutron.py", line 1479, in activate_port_binding\n raise exception.PortBindingActivationFailed(\nnova.exception.PortBindingActivationFailed: Failed to activate binding for port 5d7670a1-f18d-4b82-9c2a-1ec1dee6a835 and host controller-1.\n'} |
| flavor | cpu_pol (51dbd000-f322-4b08-8eb2-9e25f93f9072) |
| hostId | 8df03ede866f18689a70b641f84155076b498ded0c8aeffda418f731 |
| id | c8598b68-97e4-4ac6-b82d-38c93d39a8f1 |
| image | N/A (booted from volume) |
| key_name | keypair-tenant1 |
| name | tenant1-cpu_pol_dedicated_2-3 |
| project_id | 4a5485e632da44dea38bfbaac660ac66 |
| properties | |
| security_groups | name='default' |
| | name='default' |
| status | ERROR |
| updated | 2023-03-21T07:17:04Z |
| user_id | c61a798d61784091bb9c3c6486b82812 |
| volumes_attached | id='0d2ef2e5-c02b-4e76-aba9-8f512540384d' |
+-------------------------------------+------------------------------------------------------------+

+----+--------------------------------------+--------------+--------------+----------------+--------------+---------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+----------------------------------+----------------------------------+
| Id | UUID | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host | Status | Instance UUID | Old Flavor | New Flavor | Created At | Updated At | Type | Project ID | User ID |
+----+--------------------------------------+--------------+--------------+----------------+--------------+---------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+----------------------------------+----------------------------------+
| 11 | 85db3a9a-478a-42fe-8b57-e19a80e32772 | controller-0 | controller-1 | controller-0 | controller-1 | 192.168.206.3 | error | c8598b68-97e4-4ac6-b82d-38c93d39a8f1 | 37 | 37 | 2023-03-21T07:16:51.000000 | 2023-03-21T07:17:04.000000 | live-migration | fd7c2dcce263460ba13375fc415a1f27 | 418e88e8e0694dbe9e4f2e294d7d4b43 |
+----+--------------------------------------+--------------+--------------+----------------+--------------+---------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+----------------------------------+----------------------------------+

Test Activity
-------------
Regression Testing

Workaround
----------
Retry the live migration
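
The retry workaround can be automated with a small wrapper. This is a hedged sketch, not part of StarlingX or Nova: `migrate` is a hypothetical caller-supplied function that triggers one live migration (including any state cleanup the deployment needs) and returns True on success. The attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable


def retry_live_migration(migrate: Callable[[], bool],
                         attempts: int = 3,
                         delay_s: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call migrate() until it succeeds; return the number of attempts used.

    Raises RuntimeError if every attempt fails. `sleep` is injectable so the
    helper can be exercised without real waiting.
    """
    for attempt in range(1, attempts + 1):
        if migrate():
            return attempt
        if attempt < attempts:
            # Give the destination host time to recover before retrying.
            sleep(delay_s)
    raise RuntimeError(
        f"live migration still failing after {attempts} attempts")
```

A retry only hides the symptom, so it is worth logging how many attempts were needed to keep the intermittent failure visible.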

description: updated
Changed in starlingx:
assignee: nobody → Thales Elero Cervi (tcervi)
status: New → In Progress
Thales Elero Cervi (tcervi) wrote :

As per this LP description, the issue is intermittent. Indeed, this VM migration error message, 'Failed to activate binding for port 5d7670a1-f18d-4b82-9c2a-1ec1dee6a835 and host controller-1.', is usually related to services on the target host (e.g. network agents, hypervisor) being intermittently unavailable.

I executed the case_8_test_cpu_pol_vm_actions[2-dedicated-image-volume] 20 times in a loop and got no errors. Also, this LP description states that the scenario passed when retested (executed alone).

All this to say that the migration issue looks to me like a side effect of something else that left controller-1 in a bad state. I know that the Regression Suite (which executes this case_8_test_cpu_pol_vm_actions) runs after the Sanity Suite, and case_8_test_cpu_pol_vm_actions seems to be the first test case that attempts a VM migration.
Also, the Sanity Suite's last test case is "test_lock_with_vms". After this test, controller-1 may be taking too long to reestablish its services after the unlock or, even worse, the services may be failing to come back at all.

I will dig deeper into this and update this LP accordingly.
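
One way to test the hypothesis above is to check, before the first migration, whether the destination host's services have come back after the unlock. The sketch below is an illustration under assumptions, not code from the test suites: it expects lists of dicts shaped like the JSON output of `openstack compute service list -f json` and `openstack network agent list -f json`, and the key names ("Host", "Binary", "Status", "State", "Agent Type", "Alive") are assumed to match that output.

```python
from typing import Any, Dict, Iterable, List


def not_ready_reasons(host: str,
                      compute_services: Iterable[Dict[str, Any]],
                      network_agents: Iterable[Dict[str, Any]]) -> List[str]:
    """Return reasons why `host` looks unfit as a migration target.

    An empty list means nova-compute is enabled and up, and every network
    agent reported for the host is alive. Key names are assumptions.
    """
    reasons: List[str] = []

    computes = [s for s in compute_services
                if s.get("Host") == host and s.get("Binary") == "nova-compute"]
    if not computes:
        reasons.append(f"no nova-compute service reported for {host}")
    for svc in computes:
        if svc.get("Status") != "enabled" or svc.get("State") != "up":
            reasons.append(
                f"nova-compute on {host} is "
                f"{svc.get('Status')}/{svc.get('State')}")

    agents = [a for a in network_agents if a.get("Host") == host]
    if not agents:
        reasons.append(f"no network agents reported for {host}")
    for agent in agents:
        if not agent.get("Alive"):
            reasons.append(f"{agent.get('Agent Type')} on {host} is not alive")

    return reasons
```

Polling this check after "test_lock_with_vms" and before the first migration test would show whether the failure correlates with controller-1 services still recovering.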
