azure failed provisioning: conflict with a concurrent request

Bug #1973829 reported by Kevin W Monroe
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Ian Booth

Bug Description

Bootstrap Azure with Juju 2.9.29:

-----
$ juju bootstrap azure/centralus arc-ubu --model-default 'logging-config=<root>=DEBUG'
Creating Juju controller "arc-ubu" on azure/centralus
Looking for packaged Juju agent version 2.9.29 for amd64
Located Juju agent version 2.9.29-ubuntu-amd64 at https://jujuagents.blob.core.windows.net/juju-agents/agents/agent/2.9.29/juju-2.9.29-linux-amd64.tgz
Launching controller instance(s) on azure/centralus...
 - machine-0 (arch=amd64 mem=3.5G cores=1)
Installing Juju agent on bootstrap instance
Fetching Juju Dashboard 0.8.1
Waiting for address
Attempting to connect to 40.77.5.186:22
Attempting to connect to 192.168.16.4:22
Connected to 40.77.5.186
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 192.168.16.4 to verify accessibility...

Bootstrap complete, controller "arc-ubu" is now available
Controller machines are in the "controller" model
Initial model "default" added
-----

Deploy something with a few units:

-----
$ juju deploy ubuntu -n 10
Located charm "ubuntu" in charm-hub, revision 19
Deploying "ubuntu" from charm-hub charm "ubuntu", revision 19 in channel stable on focal
-----

At least one machine usually fails to provision. Snippet from the debug-log (full log attached):

-----
controller-0: 16:30:16 INFO juju.worker.instancepoller machine "7" (instance ID "machine-7") instance status changed from {"allocating" "starting"} to {"provisioning error" "The request failed due to conflict with a concurrent request. To resolve it, please refer to https://aka.ms/activitylog to get more details on the conflicting requests."}
-----

When I look at the activity log in the Azure portal, I see that "Create Deployment" has failed on the "Create or Update Availability Set" sub-task (see attached JSON).

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9-next
status: New → Triaged
importance: Undecided → High
Ian Booth (wallyworld) wrote :

Marking as critical as this seems like a regression due to the parallelisation of machine provisioning.

https://github.com/juju/juju/pull/13499

Changed in juju:
milestone: 2.9-next → 2.9.33
importance: High → Critical
John A Meinel (jameinel)
Changed in juju:
milestone: 2.9.33 → 2.9.32
Ian Booth (wallyworld)
Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld) wrote :

It was actually this PR that introduced the problem:

https://github.com/juju/juju/pull/13598

The issue turns out to be bigger than reported here. We use the fact that InstanceConfig.Controller is set to indicate that the machine is a controller, and many providers rely on this to determine things such as default constraints. So setting it for non-controller machines has affected deployments on ec2, oracle, openstack and equinix as well.

So TL;DR: either we must not set InstanceConfig.Controller unless the machine is a controller, or we must introduce a new boolean attribute to communicate that intent.
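
To illustrate the pitfall described above, here is a minimal Go sketch. It is not Juju's actual code: the struct layout, the `IsController` method, the `defaultMemMB` helper and all numbers are assumptions made up for illustration. It only shows how a provider that treats a non-nil `Controller` field as "this is a controller" will pick controller defaults for any machine where that field is populated.

```go
package main

import "fmt"

// ControllerConfig is a hypothetical stand-in for controller-only settings.
type ControllerConfig struct {
	APIPort int
}

// InstanceConfig is a simplified, hypothetical model of the struct named above.
type InstanceConfig struct {
	MachineID string
	// Controller doubles as a flag: non-nil is read as "this is a controller".
	Controller *ControllerConfig
}

// IsController applies the convention described in the comment above.
func (c InstanceConfig) IsController() bool {
	return c.Controller != nil
}

// defaultMemMB is a hypothetical provider helper; the numbers are illustrative.
func defaultMemMB(c InstanceConfig) int {
	if c.IsController() {
		return 3584
	}
	return 1792
}

// describe summarises how a provider would treat the machine.
func describe(c InstanceConfig) string {
	return fmt.Sprintf("machine %s: controller=%t, default mem=%dMB",
		c.MachineID, c.IsController(), defaultMemMB(c))
}

func main() {
	controller := InstanceConfig{MachineID: "0", Controller: &ControllerConfig{APIPort: 17070}}
	worker := InstanceConfig{MachineID: "7"}
	// The regression pattern: a workload machine with Controller populated.
	mislabelled := InstanceConfig{MachineID: "8", Controller: &ControllerConfig{}}

	for _, c := range []InstanceConfig{controller, worker, mislabelled} {
		fmt.Println(describe(c))
	}
}
```

In this sketch machine 8 is a workload machine but is sized like a controller, which is the kind of miscategorisation the TL;DR warns about. A separate explicit boolean would decouple "has controller settings" from "is a controller".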

Ian Booth (wallyworld) wrote :

Partial fix; it addresses the miscategorisation of all machines as controllers:

https://github.com/juju/juju/pull/14119

Ian Booth (wallyworld) wrote :

This is the fix for parallel deployments:
https://github.com/juju/juju/pull/14128

Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Kevin W Monroe (kwmonroe) wrote :

Ack, the issue is larger than Azure. Confirming that juju-2.9.32-917a8f1 has fixed my initial problem. Thanks!

Changed in juju:
status: Fix Committed → Fix Released