juju models reports an error on half-dead model
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | John A Meinel |
2.3 | Fix Released | High | John A Meinel |
Bug Description
With the new patch that makes 'juju models' query the database directly for all models, we can once again end up in an inconsistent state and return an error from "juju models".
To test this, I set up these scripts.
First the monitoring script:
$ cat monitor.sh
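#!/bin/bash
# pipefail makes the pipeline fail when 'juju models' fails, not only when tee fails
set -o pipefail
set -e
tmpfile=$(mktemp ./out.XXXXXX)
# on failure, append the captured debug output to bad.txt
juju models --debug 2>&1 | tee "$tmpfile" || cat "$tmpfile" >> bad.txt
rm "$tmpfile"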
Then, to stress it, I ran this in 2 windows:
$ watch -n 0.1 ./monitor.sh
Ultimately you can then find the failures in 'bad.txt'.
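To get a quick overview of what landed in 'bad.txt', something like the following could be used. This is only a sketch: the function name `summarize_failures` is made up for illustration, and it assumes each failed run leaves a line starting with "ERROR" in the file, as in the log excerpt in this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of juju): count the distinct ERROR lines
# that monitor.sh appended to bad.txt, most frequent first.
summarize_failures() {
    local file=${1:-bad.txt}
    # Keep only the final ERROR line of each failed run, then group and count.
    grep '^ERROR' "$file" | sort | uniq -c | sort -rn
}
```

Calling `summarize_failures` with no argument reads ./bad.txt; pass a path to inspect another file.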
I then ran these scripts in 2 other windows to stress the system:
$ for i in `seq 50`; do juju add-model ma$i --no-switch; juju status -m ma$i 2>&1 >st.tmp || cat st.tmp; juju destroy-model -y ma$i; done
$ for i in `seq 50`; do juju add-model mb$i --no-switch; juju status -m mb$i 2>&1 >st.tmp || cat st.tmp; juju destroy-model -y mb$i; done
This creates and destroys 2 models concurrently, which makes the failure more likely.
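The two loops above can also be driven from a single script instead of two windows. The following is a sketch under a few assumptions: `churn_models` and the `JUJU` variable are names invented here, each worker writes to its own temp file so the two loops do not share st.tmp, and the redirection is written as `>file 2>&1` so that stderr is captured as well (the loops above use `2>&1 >st.tmp`, which leaves stderr on the terminal).

```shell
#!/bin/bash
# Hypothetical wrapper around the add-model / status / destroy-model loop.
# JUJU can be overridden to point at a different binary; defaults to "juju".
JUJU=${JUJU:-juju}

churn_models() {
    local prefix=$1 count=${2:-50} i
    for i in $(seq "$count"); do
        "$JUJU" add-model "${prefix}${i}" --no-switch
        # Capture stdout and stderr; show them only when status fails.
        "$JUJU" status -m "${prefix}${i}" >"st.${prefix}.tmp" 2>&1 || cat "st.${prefix}.tmp"
        "$JUJU" destroy-model -y "${prefix}${i}"
    done
}
```

To reproduce the concurrent load, run both workers in the background and wait for them: `churn_models ma & churn_models mb & wait`.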
After doing that, I was able to see failures like this:
10:26:40 INFO juju.cmd supercommand.go:56 running juju [2.3.2 gc go1.9.2]
10:26:40 DEBUG juju.cmd supercommand.go:57 args: []string{"juju", "models", "--debug"}
10:26:40 INFO juju.juju api.go:67 connecting to API addresses: [10.67.99.11:17070]
10:26:40 DEBUG juju.api apiclient.go:844 successfully dialed "wss://
10:26:40 INFO juju.api apiclient.go:598 connection established to "wss://
10:26:40 INFO cmd listmodels.go:130 cannot list models: could not find settings/config for models: [138ee6d6-
10:26:40 DEBUG juju.api monitor.go:35 RPC connection died
ERROR cannot list models: could not find settings/config for models: [138ee6d6-
10:26:40 DEBUG cmd supercommand.go:459 error stack:
could not find settings/config for models: [138ee6d6-
github.
github.
github.
github.
github.
On the controller I can see that it is calling the new ListModelSummaries API:
machine-0: 10:27:59 DEBUG juju.apiserver <- [234E] user-admin {"request-
Changed in juju:
milestone: 2.3.2 → 2.4-beta1
Changed in juju:
status: Fix Committed → Fix Released
Note that this failure was also seen as intermittent failures in CI, for example: qa.jujucharms.com/releases/6057/job/model-migration-azure-arm/attempt/1071