This might explain what's happening during a cold migration.
Conductor creates a legacy filter_properties dict here:
https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L172
If the spec has an instance_group it will call here:
https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L397
and _to_legacy_group_info sets these values in the filter_properties dict:
return {'group_updated': True,
        'group_hosts': set(self.instance_group.hosts),
        'group_policies': set([self.instance_group.policy]),
        'group_members': set(self.instance_group.members)}
Note there is no group_uuid.
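Concretely, for an anti-affinity group the legacy dict would look something like this (values made up for illustration):

    filter_properties = {
        'group_updated': True,
        'group_hosts': {'compute-1', 'compute-2'},
        'group_policies': {'anti-affinity'},
        'group_members': {'ae6f8afe-9c64-4aaf-90e8-be8175fee8e4'},
        # no 'group_uuid' key anywhere in here
    }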
Those filter_properties are passed to the prep_resize method on the dest compute:
https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L304
zigo said he hit this:
https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4272
(10:03:07 AM) zigo: 2019-05-28 15:02:35.534 30706 ERROR nova.compute.manager [instance: ae6f8afe-9c64-4aaf-90e8-be8175fee8e4] nova.exception.UnableToMigrateToSelf: Unable to migrate instance (ae6f8afe-9c64-4aaf-90e8-be8175fee8e4) to current host (clint1-compute-5.infomaniak.ch).
which will trigger a reschedule here:
https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4348
The _reschedule_resize_or_reraise method will set up the parameters for the resize_instance compute task RPC API (conductor) method:
https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4378-L4379
Note that in Rocky the RequestSpec is not passed back to conductor on the reschedule, only the filter_properties:
https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L1452
We only started passing the RequestSpec from compute to conductor on reschedule starting in Stein: https://review.opendev.org/#/c/582417/
Without the request spec we get here in conductor:
https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L307
Note that we pass in the filter_properties but no instance_group to RequestSpec.from_components.
And because there is no instance_group but there are filter_properties, we call _populate_group_info here:
https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L442
Which means we get into this block that sets the RequestSpec.instance_group with no uuid:
https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L228
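Roughly (a sketch from memory, not the exact Rocky code; field names approximate), the legacy branch in _populate_group_info does something like:

    # rough sketch: rebuild the group object from the legacy group_* keys
    if filter_properties.get('group_updated') is True:
        self.instance_group = objects.InstanceGroup(
            policy=list(filter_properties['group_policies'])[0],
            hosts=list(filter_properties['group_hosts']),
            members=list(filter_properties['group_members']))
        # nothing in the legacy dict carries the group uuid, so
        # InstanceGroup.uuid is never set here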
Then we eventually RPC cast off to prep_resize on the next host to try for the cold migration and save the request_spec changes here:
https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L356
Which is why later attempts to migrate the instance using that request spec blow up when loading it from the DB: spec.instance_group.uuid was never set.
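To summarize, the whole thing boils down to a lossy round-trip through the legacy dict. A minimal standalone sketch (plain Python, not nova code, names made up) of the information loss:

    # standalone illustration only, not nova code
    class InstanceGroup(object):
        def __init__(self, uuid=None, policy=None, hosts=None, members=None):
            self.uuid = uuid
            self.policy = policy
            self.hosts = hosts or []
            self.members = members or []

    group = InstanceGroup(uuid='some-group-uuid', policy='anti-affinity',
                          hosts=['compute-1'],
                          members=['ae6f8afe-9c64-4aaf-90e8-be8175fee8e4'])

    # RequestSpec -> legacy filter_properties (what _to_legacy_group_info
    # produces; the uuid never makes it into the dict)
    legacy = {'group_updated': True,
              'group_hosts': set(group.hosts),
              'group_policies': set([group.policy]),
              'group_members': set(group.members)}

    # legacy filter_properties -> rebuilt instance_group on the reschedule
    rebuilt = InstanceGroup(policy=list(legacy['group_policies'])[0],
                            hosts=list(legacy['group_hosts']),
                            members=list(legacy['group_members']))

    assert rebuilt.uuid is None
    # In nova the uuid field is simply never set, so once the spec is saved
    # like this, later loads that touch spec.instance_group.uuid blow up.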