OpenStack Compute (nova)

Series ocata

Activity log for bug #1707071

Date	Who	What changed	Old value	New value	Message
2017-07-27 21:35:09	Dan Smith	bug			added bug
2017-07-27 22:31:44	Matt Riedemann	tags		compute placement resource-tracker
2017-07-27 22:31:50	Matt Riedemann	nominated for series		nova/ocata
2017-07-27 22:31:50	Matt Riedemann	bug task added		nova/ocata
2017-07-27 22:31:55	Matt Riedemann	nova: status	New	Confirmed
2017-07-27 22:32:15	Matt Riedemann	nova: importance	Undecided	Medium
2017-07-27 22:33:23	Matt Riedemann	description	As far back as Ocata, compute nodes that manage allocations will end up overwriting allocations from other compute nodes when doing a migration. This stems from the fact that the Resource Tracker was designed to manage a per-compute-node set of accounting, but placement is per-instance accounting. When we try to create/update/delete allocations for instances on compute nodes from the existing resource tracker code paths, we end up deleting allocations that apply to other compute nodes in the process. For example, when an instance A is running against compute1, there is an allocation for its resources against that node. When migrating that instance to compute2, the target compute (or scheduler) may create allocations for instance A against compute2, which overwrite those for compute1. Then, compute1's periodic healing task runs, and deletes the allocation for instance A against compute2, replacing it with one for compute1. When migration completes, compute2 heals again and overwrites the allocation with one for the new home of the instance. Then, compute1 may the allocation it thinks it owns, followed finally by another heal on compute2. While this is going on, the scheduler (via placement) does not have a consistent view of resources to make proper decisions. In order to fix this, we need a combination of changes: 1. There should be allocations against both compute nodes for an instance during a migration 2. Compute nodes should respect the double claim, and not delete allocations for instances it used to own, if the allocation has no resources for its resource provider 3. Compute nodes should not delete allocations for instances unless they own the instance _and_ the instance is in DELETED/SHELVED_OFFLOADED state	As far back as Ocata, compute nodes that manage allocations will end up overwriting allocations from other compute nodes when doing a migration. 
This stems from the fact that the Resource Tracker was designed to manage a per-compute-node set of accounting, but placement is per-instance accounting. When we try to create/update/delete allocations for instances on compute nodes from the existing resource tracker code paths, we end up deleting allocations that apply to other compute nodes in the process.
For example, when an instance A is running against compute1, there is an allocation for its resources against that node. When migrating that instance to compute2, the target compute (or scheduler) may create allocations for instance A against compute2, which overwrite those for compute1. Then, compute1's periodic healing task runs, and deletes the allocation for instance A against compute2, replacing it with one for compute1. When migration completes, compute2 heals again and overwrites the allocation with one for the new home of the instance. Then, compute1 may delete the allocation it thinks it owns, followed finally by another heal on compute2. While this is going on, the scheduler (via placement) does not have a consistent view of resources to make proper decisions.
In order to fix this, we need a combination of changes:
1. There should be allocations against both compute nodes for an instance during a migration
2. Compute nodes should respect the double claim, and not delete allocations for instances it used to own, if the allocation has no resources for its resource provider
3. Compute nodes should not delete allocations for instances unless they own the instance _and_ the instance is in DELETED/SHELVED_OFFLOADED state
2017-07-27 22:34:00	Matt Riedemann	nova/ocata: status	New	Confirmed
2017-07-27 22:34:02	Matt Riedemann	nova/ocata: importance	Undecided	Medium
2017-07-28 13:45:06	Chris Dent	bug			added subscriber Chris Dent
2017-07-28 15:38:10	Jay Pipes	nova: assignee		Jay Pipes (jaypipes)
2017-07-28 16:10:13	OpenStack Infra	nova: status	Confirmed	In Progress
2017-08-01 21:34:40	Matt Riedemann	tags	compute placement resource-tracker	compute pike-rc-potential placement resource-tracker
2017-08-02 19:39:12	OpenStack Infra	nova: assignee	Jay Pipes (jaypipes)	Dan Smith (danms)
2017-08-02 20:31:35	OpenStack Infra	nova: assignee	Dan Smith (danms)	Matt Riedemann (mriedem)
2017-08-03 14:12:25	OpenStack Infra	nova: assignee	Matt Riedemann (mriedem)	Chris Dent (cdent)
2017-08-03 15:21:56	OpenStack Infra	nova: assignee	Chris Dent (cdent)	Dan Smith (danms)
2017-08-03 16:56:39	OpenStack Infra	nova: assignee	Dan Smith (danms)	Jay Pipes (jaypipes)
2017-08-03 19:32:58	OpenStack Infra	nova: assignee	Jay Pipes (jaypipes)	Dan Smith (danms)
2017-08-04 16:11:17	OpenStack Infra	nova: assignee	Dan Smith (danms)	Jay Pipes (jaypipes)
2017-08-04 18:19:04	OpenStack Infra	nova: assignee	Jay Pipes (jaypipes)	Dan Smith (danms)
2017-08-08 16:21:07	OpenStack Infra	nova: assignee	Dan Smith (danms)	Jay Pipes (jaypipes)
2017-08-11 02:29:54	OpenStack Infra	nova: status	In Progress	Fix Released
2017-08-11 12:44:11	Matt Riedemann	tags	compute pike-rc-potential placement resource-tracker	compute placement resource-tracker
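The overwrite race in the bug description, and the three rules proposed to fix it, can be illustrated with a toy model. This is a minimal sketch that assumes only what the description states: placement accounts per instance, and writing an instance's allocations replaces everything that instance holds across all compute nodes. FakePlacement, naive_heal, fixed_heal, should_delete, and the lowercase vm_state strings are hypothetical names made up for illustration. They are not nova or placement APIs, and fixed_heal is one possible reading of rules 1 to 3, not the change that was actually merged.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical model: compute node (resource provider) -> {resource class: amount}
Allocations = Dict[str, Dict[str, int]]
DELETABLE_STATES = {"deleted", "shelved_offloaded"}  # assumed vm_state strings


class FakePlacement:
    """In-memory stand-in for placement, keyed per consumer (instance).

    put() replaces *all* of a consumer's allocations across every provider,
    i.e. per-instance accounting rather than per-node accounting.
    """

    def __init__(self) -> None:
        self._allocs: Dict[str, Allocations] = {}

    def put(self, consumer: str, allocs: Allocations) -> None:
        self._allocs[consumer] = {rp: dict(res) for rp, res in allocs.items()}

    def get(self, consumer: str) -> Allocations:
        # Return a copy so callers cannot mutate the stored state by accident.
        return {rp: dict(res) for rp, res in self._allocs.get(consumer, {}).items()}


@dataclass
class Instance:
    uuid: str
    host: str  # compute node that currently owns the instance
    vm_state: str = "active"


def naive_heal(p: FakePlacement, node: str, inst: Instance, res: Dict[str, int]) -> None:
    # Pre-fix pattern: write only this node's view, wiping other providers' shares.
    p.put(inst.uuid, {node: res})


def fixed_heal(p: FakePlacement, node: str, inst: Instance, res: Dict[str, int]) -> None:
    if inst.host != node:
        return  # not the owner: leave the owner's (or scheduler's) claim alone
    current = p.get(inst.uuid)
    if node in current:
        return  # our share already exists, possibly as half of a double claim
    current[node] = res  # add our share while keeping every other provider's entry
    p.put(inst.uuid, current)


def should_delete(node: str, inst: Instance, allocs: Allocations) -> bool:
    if node not in allocs:
        return False  # rule 2: nothing against our provider, so not ours to delete
    # Rule 3: only the owning node, and only for DELETED/SHELVED_OFFLOADED instances.
    return inst.host == node and inst.vm_state in DELETABLE_STATES


RES = {"VCPU": 2, "MEMORY_MB": 2048}


def demo_naive() -> Allocations:
    p = FakePlacement()
    inst = Instance("A", host="compute1")
    p.put("A", {"compute1": RES})
    p.put("A", {"compute2": RES})         # target/scheduler claim replaces compute1's
    naive_heal(p, "compute1", inst, RES)  # compute1's periodic heal replaces it back
    return p.get("A")


def demo_fixed() -> Tuple[Allocations, Allocations]:
    p = FakePlacement()
    inst = Instance("A", host="compute1")
    p.put("A", {"compute1": RES, "compute2": RES})  # rule 1: double claim
    fixed_heal(p, "compute1", inst, RES)
    fixed_heal(p, "compute2", inst, RES)
    during = p.get("A")
    inst.host = "compute2"         # migration completes
    p.put("A", {"compute2": RES})  # hypothetical step dropping the source claim
    fixed_heal(p, "compute1", inst, RES)
    fixed_heal(p, "compute2", inst, RES)
    return during, p.get("A")
```

In this model, demo_naive() ends with only compute1's entry, so the claim against compute2 is lost, while demo_fixed() keeps both entries for the duration of the migration and only compute2's entry afterward. fixed_heal uses an unguarded read-modify-write; a real implementation would also need to detect concurrent writers, which later placement microversions address with consumer generations.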