Application status changes right after juju wait-for timeout
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju | Triaged | High | Unassigned | | |
Bug Description
In testrun https:/
```
2022-10-08-08:43:07 root ERROR [localhost] Command failed: juju wait-for unit -m foundations-
2022-10-08-08:43:07 root ERROR [localhost] STDOUT follows:
properties:
workload-message: configuring Nagios checks
workload-status: maintenance
```
The weird thing is that if we look at the status, vault/2 is actually in the expected state:
```
vault/2 blocked idle 8 10.246.166.160 8200/tcp Vault needs to be initialized
canonical-
filebeat/1 active idle 10.246.166.160 Filebeat ready.
hacluster-vault/1 active idle 10.246.166.160 Unit is ready and clustered
landscape-
nrpe/1 active idle 10.246.166.160 icmp,5666/tcp Ready
ntp/1 active idle 10.246.166.160 123/udp chrony: Ready, OK: offset is 0.000003
prometheus-
telegraf/1 active idle 10.246.166.160 9103/tcp Monitoring vault/2 (source version/commit 76901fd)
```
Looking at the status log in the crashdump, vault changes state about 5 seconds before the timeout error is thrown (08:43:02 in the status log versus 08:43:07 in the error output):
```
08 Oct 2022 07:20:29Z juju-unit executing running certificates-
08 Oct 2022 07:20:36Z juju-unit executing running certificates-
08 Oct 2022 07:20:45Z juju-unit executing running certificates-
08 Oct 2022 07:20:52Z juju-unit executing running etcd-relation-
08 Oct 2022 07:21:01Z juju-unit executing running certificates-
08 Oct 2022 07:21:11Z juju-unit executing running shared-
08 Oct 2022 07:21:20Z juju-unit executing running certificates-
08 Oct 2022 07:21:24Z workload waiting 'shared-db' incomplete
08 Oct 2022 07:21:27Z juju-unit idle
08 Oct 2022 07:21:31Z juju-unit executing running shared-
08 Oct 2022 07:21:36Z workload maintenance configuring Nagios checks
08 Oct 2022 07:22:06Z juju-unit executing running shared-
08 Oct 2022 07:22:14Z juju-unit idle
08 Oct 2022 07:23:22Z juju-unit executing running shared-
08 Oct 2022 07:23:29Z juju-unit idle
08 Oct 2022 07:26:46Z juju-unit executing running certificates-
08 Oct 2022 07:26:52Z juju-unit idle
08 Oct 2022 07:29:48Z juju-unit executing running certificates-
08 Oct 2022 07:29:54Z juju-unit idle
08 Oct 2022 08:43:02Z workload blocked Vault needs to be initialized
```
This could be a weird coincidence, except that something very similar happened in testrun https:/
Crashdumps and configs can be found here:
https:/
Changed in juju: | |
status: | New → Triaged |
importance: | Undecided → Medium |
importance: | Medium → High |
tags: | added: solutions-qa-expired |
Changed in juju: | |
status: | Triaged → Invalid |
Changed in juju: | |
status: | Invalid → New |
tags: | added: cdo-qa removed: solutions-qa-expired |
Changed in juju: | |
milestone: | none → 2.9-next |
status: | New → Triaged |
Changed in juju: | |
milestone: | 2.9-next → none |
Looking at strategy.go, it doesn't look right for the run method to fire off a new goroutine that uses time.After.
The timeout should be set up outside the method, and the resulting channel passed in.
As written, there looks to be room for racy behaviour here.