I've managed to push the code towards the same code path that the logs show. I used Queens 17.0.2, as with that version the line numbers in my log match the line numbers in the attached log. I used the nova functional test environment under Python 2.7 and 3.5 for the reproduction.
* define two compute hosts (host1, host2)
* boot an instance on host1
* make sure that host2 is less desirable for the scheduler as a migration target by consuming resources from it, but keep enough resources free that it can still be an allocation candidate
* make sure that allow_resize_to_same_host = True so that the scheduler will also consider host1 as a migration target
* make sure that the virt driver on host1 does not have the capability to migrate to the same host
* call migrate without forcing a host
Now the following happens:
* the scheduler gets allocation candidates in the following order (host1, host2)
* the conductor tries to migrate the instance to host1, but that fails on host1 in compute.manager._prep_resize with UnableToMigrateToSelf as the virt driver does not have the capability [1]
* UnableToMigrateToSelf is handled in the exception block in prep_resize, which calls _reschedule_resize_or_reraise [2]
* that does the reschedule, and the conductor now selects host2 and does the allocation successfully, so _reschedule returns True
* this means that nova ends up sending a resize.error notification (about the failure to migrate to host1) at [3]
* and this leads to the 'inspect.trace()[-1]' call in [4] that fails for the bug author but does not fail for me. inspect.trace() should return a non-empty list [5][6] if called from an exception handling context. We are in an except block as we are executing [2]. This is also proven by the fact that sys.exc_info() returns a result other than (None, None, None) at [7], which is printed at [8] and is visible both in the bug reporter's logs and in mine.
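To make the expectation about inspect.trace() concrete, here is a minimal standalone sketch. It is not nova code: the exception class and the functions prep_resize, last_trace_frame and handle are hypothetical stand-ins that only mimic the shape of the code path above. It shows that inside an except block both sys.exc_info() and inspect.trace() see the exception, while outside of any handler inspect.trace() returns an empty list, so indexing it with [-1] would raise IndexError.

```python
import inspect
import sys


class UnableToMigrateToSelf(Exception):
    """Hypothetical stand-in for the nova exception of the same name."""


def prep_resize():
    # Stand-in for the compute manager code that raises on host1.
    raise UnableToMigrateToSelf("host1")


def last_trace_frame():
    # Same expression as in [4]; raises IndexError if the list is empty.
    return inspect.trace()[-1]


def handle():
    try:
        prep_resize()
    except UnableToMigrateToSelf:
        # We are inside the except block, so the exception is still "current".
        exc_type = sys.exc_info()[0]
        frame = last_trace_frame()
        return exc_type, frame


exc_type, frame = handle()
print(exc_type.__name__)  # name of the exception class seen by sys.exc_info()
print(frame[3])           # function name of the innermost frame in the traceback
print(inspect.trace())    # called outside of any except block
```

The last call is the interesting one: with no exception being handled, there is no traceback to walk and the list is empty. That would be consistent with the failure in the bug report only if the exception context were somehow lost before [4] runs, which I could not reproduce.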
So I'm clueless about what is happening in the reporter's environment.
@Vladislav: Could you provide all three compute logs and the conductor log? Could you please leave a bit more context in the logs before the first ERROR line?
@Vladislav: What is your exact environment? Which version of Queens? Do you have any custom nova code modifications on top of the upstream Queens version?
[1] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4085
[2] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4162
[3] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4221
[4] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/notifications/objects/exception.py#L42
[5] https://docs.python.org/2.7/library/inspect.html#inspect.trace
[6] https://docs.python.org/3.5/library/inspect.html#inspect.trace
[7] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4159
[8] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L1313