[Ocata] resource tracker does not validate placement allocation
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | Undecided | Unassigned |
Ocata | Confirmed | Low | Unassigned |
Bug Description
On stable/ocata we hit a serious scheduler problem that forced us to upgrade to a newer release. I could not find any existing report for it, so I am leaving this here for whoever runs into the same issue later.
The problem we encountered is as follows:
- The conductor tries to schedule 2 instances onto one compute node.
- At that point the compute node has enough resources in compute_nodes, so the scheduler chooses it.
- The resource tracker in nova-compute claims the resources against placement.
- Placement answers one of the requests with 409, because there were several concurrent requests.
- [BUG here] The resource tracker in nova-compute does not check the return code from placement, so the allocation is only increased by the share of one instance (see the sketch after this list).
- After that, compute_nodes in the scheduler is full, but the allocation in placement still has free capacity.
- [User sees weirdness here] Since the scheduler (which consults placement) still saw free capacity, another instance could be built on a compute node that is actually full. The result is that the compute node is over-provisioned.
- OOM occurs. (We are tight on memory; with a different resource policy an admin would see a different side effect.)
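To make the failure mode concrete, here is a minimal sketch, not the actual nova code, of what a correct claim should do: treat a 409 from placement as a failed claim rather than ignoring it. The endpoint, headers, and body shape below are assumptions loosely based on the Ocata-era placement allocations API.

```python
import requests

PLACEMENT_URL = "http://placement.example.com"  # hypothetical endpoint
HEADERS = {"x-auth-token": "<token>", "content-type": "application/json"}


def claim_allocation(consumer_uuid, rp_uuid, resources):
    """PUT an allocation for one instance and fail the claim on 409.

    The body format follows the early (Ocata-era) placement allocations
    shape; all names here are illustrative assumptions.
    """
    body = {
        "allocations": [
            {"resource_provider": {"uuid": rp_uuid}, "resources": resources}
        ]
    }
    resp = requests.put(
        "%s/allocations/%s" % (PLACEMENT_URL, consumer_uuid),
        json=body,
        headers=HEADERS,
    )
    if resp.status_code == 409:
        # Concurrent update: another request changed the provider's state.
        # The bug described above is that this outcome was silently ignored;
        # a correct tracker must fail (or retry) the claim instead.
        return False
    resp.raise_for_status()
    return True
```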
I found that this is already fixed from Pike onward, where the scheduler makes the allocation first and nova-compute just checks compute_nodes. But it was very hard for me to find the root cause, and it required a lot of digging into the scheduler's history, so I hope this report is helpful for anyone who hits the same problem.
I am not sure it should be fixed, since Ocata is quite old, though we could fix it by changing the function (nova/scheduler
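For comparison, a minimal sketch of the Pike-and-later ordering described above: the scheduler claims the allocation in placement before casting to the compute node, so a lost race simply moves on to the next candidate. All names here (placement.claim, rp_uuid, candidate_hosts) are illustrative assumptions, not the real nova identifiers.

```python
import uuid


def schedule_and_build(placement, instance_resources, candidate_hosts):
    """Claim in placement first; only build on a host whose claim succeeded."""
    consumer_uuid = str(uuid.uuid4())
    for host in candidate_hosts:
        # A failed claim (e.g. 409) means another request won the race on
        # this provider; try the next candidate instead of over-committing.
        if placement.claim(consumer_uuid, host.rp_uuid, instance_resources):
            return host, consumer_uuid
    raise RuntimeError("No valid host: all placement claims failed")
```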
Thanks.
I checked, and on stable/ocata nova does ignore the error from placement in the reported case, so I marked this Confirmed for Ocata. The same issue is not valid for newer branches. Ocata is in extended maintenance, so the project does not focus on fixing issues there, but you can still persuade your OpenStack vendor to fix the problem upstream.