I have not seen this problem with the algorithm and Juju 1.24. If you have a snippet from your logs, I'd be interested in seeing what it looks like.
I think I understand what is happening. The wait statement makes two complete passes over all units, using juju run to collect the last line of each unit's juju log file. If the last lines have not changed between passes, no hooks have run on any unit, and we know as best we can that the environment is stable. If they have changed, we keep checking until two complete passes report no changes. If two complete passes cannot complete within 30 seconds, whatever keeps the leadership lease alive may write to the juju logs in the meantime, so the last lines always differ and the wait never settles.
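To make the loop described above concrete, here is a minimal sketch in Python. It is not the actual implementation: wait_until_stable, collect_last_lines and max_passes are names I made up for illustration, and the collector is passed in as a callable so the sketch does not depend on how juju run is really invoked.

```python
from typing import Callable, Dict, Optional


def wait_until_stable(
    collect_last_lines: Callable[[], Dict[str, str]],
    max_passes: Optional[int] = None,
) -> bool:
    """Return True once two consecutive passes see identical last log lines.

    collect_last_lines (hypothetical) performs one complete pass over all
    units and returns a mapping of unit name to the last line of its log.
    max_passes (hypothetical) is a safety limit; None means keep trying.
    """
    previous: Optional[Dict[str, str]] = None
    passes = 0
    while max_passes is None or passes < max_passes:
        current = collect_last_lines()
        passes += 1
        # Identical snapshots mean no unit logged anything between passes.
        if previous is not None and current == previous:
            return True
        previous = current
    return False
```

If something writes a new log line on every unit more often than a pass takes to complete, current never equals previous and the loop only ends when max_passes is reached, which matches the failure I am guessing at.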
I think this is fixable by ignoring these particular log messages, which is probably just a case of adding a 'grep -v' to the juju run command.
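As a rough illustration of that filtering, the sketch below does in Python what a 'grep -v PATTERN | tail -1' pipeline would do. The function name last_relevant_line and the "leadership" pattern in the usage example are assumptions of mine; I have not checked what the real log messages look like.

```python
import re
from typing import Iterable, Optional


def last_relevant_line(log_text: str, ignore_patterns: Iterable[str]) -> Optional[str]:
    """Return the last non-blank log line that matches none of ignore_patterns.

    ignore_patterns are regular expressions; they are placeholders here,
    since the exact leadership messages have not been confirmed.
    """
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    # Walk backwards so we stop at the most recent line worth comparing.
    for line in reversed(log_text.splitlines()):
        if not line.strip():
            continue  # skip blank lines
        if not any(rx.search(line) for rx in compiled):
            return line
    return None  # every line was ignored, or the log was empty


# Usage with made-up log content:
log = "unit-foo-0: ran config-changed hook\nunit-foo-0: leadership lease renewed\n"
print(last_relevant_line(log, ["leadership"]))
```

With this in place the comparison between passes only sees lines written by hooks, so periodic lease messages no longer make the environment look busy.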
I also have a branch that uses Juju 1.24's unit status, but I am suspicious of races. It won't help for Juju 1.23, which has leadership but not unit status.
Please support Bug #1488777, which would let us drop this hack altogether. It's been acknowledged as needed since forever, but never gets scheduled.