This bug came up again in an internal report at my organization, so I took another look at it.
I noticed that we create build requests *after* the initial quota limit check in compute/api (which I think is the right order, so we can bail early if there's no room), so counting build requests in addition to instances (with UUID de-duplication) isn't quite enough to handle the issue. I think we also need to relocate the CONF.quota.recheck_quota logic from conductor to compute/api, after we create the build request (see the sketch below). This would also result in a bit nicer code, keeping all of the quota checking in compute/api.
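To make the proposed ordering concrete, here is a minimal sketch of how the flow in compute/api could look. All of the helper names (check_quota, create_build_request, recheck_quota, and so on) are hypothetical stand-ins, not the actual Nova signatures; the point is only the sequencing: initial check, create the build request, then recheck.

    # Hypothetical sketch of the proposed compute/api flow; none of
    # these helpers are real Nova APIs.

    class OverQuota(Exception):
        pass


    def check_quota(project_id, requested):
        """Initial limit check: count instances plus build requests,
        de-duplicating by instance UUID, against the project limit."""
        # ... real code would query instances and build_requests ...
        return True


    def create_build_request(project_id, flavor):
        """Persist the build request so it is visible to (and counted
        by) any concurrent quota check."""
        return {"project_id": project_id, "flavor": flavor}


    def delete_build_request(build_request):
        """Roll back the build request if the recheck fails."""


    def recheck_quota(project_id):
        """Recount usage to catch racing requests that slipped past
        the initial check."""
        return True


    def boot_instance(project_id, flavor, recheck_enabled=True):
        # 1. Bail early if there is clearly no room.
        if not check_quota(project_id, requested=1):
            raise OverQuota()

        # 2. Create the build request *after* the initial check; it
        #    now contributes to the usage seen by concurrent rechecks.
        build_request = create_build_request(project_id, flavor)

        # 3. Recheck here in compute/api instead of in conductor
        #    (gated on CONF.quota.recheck_quota in the real code), so
        #    all quota logic lives in one place.
        if recheck_enabled and not recheck_quota(project_id):
            delete_build_request(build_request)
            raise OverQuota()

        return build_request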
Regarding the handling of VCPU and MEMORY_MB, I had another idea: correlate request specs with build requests and use the flavor info in each request spec to count VCPU and MEMORY_MB. But that would be inefficient, since we'd have to loop over each request spec's flavor info in Python, versus a single efficient SQL aggregate query if we added VCPU and MEMORY_MB columns to the build_requests table.
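Roughly, the trade-off between the two approaches looks like this (the vcpus and memory_mb columns below are hypothetical; the real build_requests schema doesn't have them today, and the request spec shape is simplified):

    # Rough comparison of the two counting approaches; schema and
    # request spec layout are hypothetical simplifications.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE build_requests
                    (project_id TEXT, instance_uuid TEXT,
                     vcpus INTEGER, memory_mb INTEGER)""")
    conn.executemany(
        "INSERT INTO build_requests VALUES (?, ?, ?, ?)",
        [("p1", "uuid-1", 2, 2048), ("p1", "uuid-2", 4, 4096)])

    # Approach 1: correlate request specs with build requests and loop
    # over each request spec's flavor in Python. This means fetching
    # and deserializing every request spec just to sum two fields.
    def count_via_request_specs(request_specs):
        vcpus = sum(s["flavor"]["vcpus"] for s in request_specs)
        memory_mb = sum(s["flavor"]["memory_mb"] for s in request_specs)
        return vcpus, memory_mb

    # Approach 2: with vcpus/memory_mb columns on build_requests, the
    # database does the aggregation in one query.
    row = conn.execute(
        """SELECT COALESCE(SUM(vcpus), 0), COALESCE(SUM(memory_mb), 0)
           FROM build_requests WHERE project_id = ?""", ("p1",)).fetchone()
    print(row)  # (6, 6144)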