juju add-machine manual on fresh EC2 instance hangs, juju 2.8.11

Bug #1931910 reported by Sheldon Ruiz
This bug affects 1 person
Affects         Status      Importance  Assigned to       Milestone
Canonical Juju  Incomplete  Undecided   Vitaly Antonenko

Bug Description

I can consistently reproduce a hang when manually adding a new EC2 instance to a fresh controller and model on EC2, where the controller instance is focal-based. I am using juju 2.8.11.

For reference, the controller was bootstrapped like so:
juju bootstrap --config vpc-id-force=true --config vpc-id=some-vpc --constraints "instance-type=a1.large arch=arm64" --bootstrap-series=focal --to "subnet=some-subnet" aws/us-east-1 ec2controller --debug

I can SSH to both the controller and the new instance before attempting to add the instance to my model. I also noticed that the juju-db service (snap.juju-db.daemon.service) on the controller was disabled:
snap-juju\x2ddb-30.mount enabled enabled
jujud-machine-0.service enabled enabled
snap.juju-db.daemon.service disabled enabled

After re-enabling the service I was finally able to add the machine without a hang, but I'm not sure whether the disabled service was the actual cause.
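
For anyone checking a controller for the same condition, here is a minimal sketch that filters a unit listing for units that are disabled even though their vendor preset is enabled. It assumes the three lines above came from `systemctl list-unit-files` (columns: unit, state, vendor preset). The function name `disabled_but_preset_enabled` is made up for illustration, and the `systemctl enable --now` line in the comments is only an assumption about how the service could be re-enabled, not a confirmed fix for the hang.

```shell
# Print units whose state is "disabled" while the vendor preset is "enabled".
# Input: lines of the form "<unit> <state> <vendor-preset>" on stdin.
disabled_but_preset_enabled() {
    awk '$2 == "disabled" && $3 == "enabled" { print $1 }'
}

# Assumed usage on the controller (requires systemd):
#   systemctl list-unit-files --no-legend | grep juju | disabled_but_preset_enabled
# A unit reported here could then be re-enabled with, for example:
#   sudo systemctl enable --now snap.juju-db.daemon.service
```

Fed the three lines shown above, the filter reports only snap.juju-db.daemon.service.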

Debug log from add-machine (192.168.99.62 is the instance IP, 192.168.99.27 is the controller):

juju add-machine -m ec2controller:ec2model ssh:ubuntu@192.168.99.62 --verbose --debug

16:54:47 INFO juju.cmd supercommand.go:54 running juju [2.8.11 0 5e99fae0eff8e18081a8f734eab4680378b08608 gc go1.14.15]
16:54:47 DEBUG juju.cmd supercommand.go:55 args: []string{"/snap/juju/16475/bin/juju", "add-machine", "-m", "ec2controller:ec2model", "ssh:ubuntu@192.168.99.62", "--verbose", "--debug"}
16:54:47 INFO juju.juju api.go:67 connecting to API addresses: [192.168.99.27:17070 252.99.96.1:17070]
16:54:47 DEBUG juju.api apiclient.go:1116 successfully dialed "wss://192.168.99.27:17070/model/a9616bc6-74bc-4e6e-8d4f-2d9f6b1ba425/api"
16:54:47 INFO juju.api apiclient.go:648 connection established to "wss://192.168.99.27:17070/model/a9616bc6-74bc-4e6e-8d4f-2d9f6b1ba425/api"
16:54:47 INFO juju.juju api.go:67 connecting to API addresses: [192.168.99.27:17070 252.99.96.1:17070]
16:54:47 DEBUG juju.api apiclient.go:1116 successfully dialed "wss://192.168.99.27:17070/model/a9616bc6-74bc-4e6e-8d4f-2d9f6b1ba425/api"
16:54:47 INFO juju.api apiclient.go:648 connection established to "wss://192.168.99.27:17070/model/a9616bc6-74bc-4e6e-8d4f-2d9f6b1ba425/api"
16:54:47 INFO juju.cmd.juju.machine add.go:244 load config
16:54:47 INFO juju.juju api.go:67 connecting to API addresses: [192.168.99.27:17070 252.99.96.1:17070]
16:54:47 DEBUG juju.api apiclient.go:1116 successfully dialed "wss://192.168.99.27:17070/model/a9616bc6-74bc-4e6e-8d4f-2d9f6b1ba425/api"
16:54:47 INFO juju.api apiclient.go:648 connection established to "wss://192.168.99.27:17070/model/a9616bc6-74bc-4e6e-8d4f-2d9f6b1ba425/api"
16:54:47 INFO juju.juju api.go:304 API endpoints changed from [252.99.96.1:17070 192.168.99.27:17070] to [192.168.99.27:17070 252.99.96.1:17070]
16:54:47 INFO cmd authkeys.go:114 Adding contents of "/home/sherui01/.local/share/juju/ssh/juju_id_rsa.pub" to authorized-keys
16:54:47 INFO cmd authkeys.go:114 Adding contents of "/home/sherui01/.ssh/id_rsa.pub" to authorized-keys
16:54:47 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:43 initialising "192.168.99.62", user "ubuntu"
16:54:47 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
16:54:49 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:54 ubuntu user is already initialised
16:54:49 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:167 Checking if 192.168.99.62 is already provisioned
16:54:49 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
16:54:50 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:102 Detecting series and characteristics on 192.168.99.62
16:54:50 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
16:54:51 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:158 series: focal, characteristics: arch=amd64 cores=48 mem=189141M
16:59:17 ERROR juju.api monitor.go:59 health ping timed out after 30s
16:59:17 ERROR juju.api monitor.go:59 health ping timed out after 30s
16:59:17 ERROR juju.api monitor.go:59 health ping timed out after 30s
Timeout, server 192.168.99.62 not responding.
17:00:28 ERROR juju.environs.manual.sshprovisioner provisioner.go:23 provisioning failed, removing machine 0: subprocess encountered error code 255
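
The log above is silent between 16:54:51 and 16:59:17, roughly four and a half minutes, which is where the hang occurs. Below is a small sketch for locating such gaps in a saved copy of the debug output. It assumes each log line starts with an HH:MM:SS timestamp as above and that the log does not cross midnight. The function name `largest_log_gap` is hypothetical.

```shell
# Print the largest gap (in seconds) between consecutive timestamped lines,
# followed by the line that came after the gap.
# Lines without a leading HH:MM:SS timestamp are skipped.
largest_log_gap() {
    awk '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > max) { max = now - prev; line = $0 }
            prev = now
            seen = 1
        }
        END { if (seen) printf "%d seconds before: %s\n", max, line }
    '
}

# Assumed usage, with the debug output saved to a file of your choosing:
#   largest_log_gap < add-machine.log
```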

Sheldon Ruiz (sherui01)
summary: - juju add-machine manual on fresh EC2 instance, juju 2.8.11
+ juju add-machine manual on fresh EC2 instance hangs, juju 2.8.11
Sheldon Ruiz (sherui01) wrote :

I've also noticed that sometimes, even after successfully creating a focal controller and model, Juju hangs when I add a charm to the model as well, with no error or other output.

Vitaly Antonenko (anvial) wrote :

Hi Sheldon,

It looks like this bug report is more than a year old. Is it still valid?

Have you tried to reproduce it with a recent version of Juju?

Changed in juju:
status: New → Incomplete
assignee: nobody → Vitaly Antonenko (anvial)