machine continuously retries upgrading agent tools when disk is full
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju | Triaged | Low | Unassigned | | |
Bug Description
Logging was added for disk full situations over in LP:1782367. A unit that failed because of disk being full shows this in its machine log as expected:
2018-12-10 14:15:13 ERROR juju.worker.
However, the controller logs show this cryptic message:
2018-12-10 13:39:37 ERROR juju.apiserver tools.go:89 failed to send agent binaries: write tcp 1.2.3.4:
The failing fetch appears to keep retrying despite detecting the out of space condition on the unit itself. The controller shows many retries:
grep -c 'failed to send agent binaries: write tcp 1.2.3.4:
209145
So despite detecting out of space conditions on the unit/machine, everything keeps retrying and chews up controller resources.
Could the agent be made to not retry when it determines there isn't enough free space? Or retry in such a way that it backs off, or just fetches metadata about the new tools version, versus trying to download and then discovering it is out of space?
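One way to picture the "don't download and then discover it is out of space" idea is a pre-flight check of free space against the size reported in the tools metadata. The following is only a minimal Python sketch of that idea, not Juju's actual (Go) implementation: the function name `has_room_for`, the `headroom` parameter, and the assumption that the binary size is known in advance are all hypothetical.

```python
import shutil


def has_room_for(target_dir, required_bytes, headroom=0.1,
                 disk_usage=shutil.disk_usage):
    """Return True if target_dir's filesystem can hold required_bytes.

    headroom is a hypothetical safety margin (10% by default) added on top
    of the download size. disk_usage is injectable so the check can be
    exercised without a real full disk.
    """
    free = disk_usage(target_dir).free
    needed = int(required_bytes * (1 + headroom))
    return free >= needed
```

An agent following this approach would fetch only the metadata (version and size) first, call something like `has_room_for(tools_dir, size)`, and skip the download entirely when it returns False.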
Changed in juju: | |
status: | New → Triaged |
importance: | Undecided → High |
milestone: | none → 2.5.1 |
tags: | added: logging upgrade-juju ux |
tags: | added: canonical-is |
Changed in juju: | |
milestone: | 2.5.1 → 2.5.2 |
Changed in juju: | |
milestone: | 2.5.2 → 2.5.3 |
Changed in juju: | |
milestone: | 2.5.3 → 2.5.4 |
Changed in juju: | |
milestone: | 2.5.4 → 2.5.5 |
Changed in juju: | |
milestone: | 2.5.6 → 2.5.8 |
Changed in juju: | |
milestone: | 2.5.8 → 2.5.9 |
FWIW, I believe all workers now back off when they fail to operate. So it
will still continue to try, but it will try less frequently the more
failures it encounters.
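For illustration only, the back-off behaviour described here can be sketched as a capped exponential delay, combined with the request from the bug description to stop retrying once the disk is known to be full. This is a hypothetical Python sketch, not the Juju worker code: the names `backoff_delays` and `retry_with_backoff`, the default delays, and the choice to treat `ENOSPC` as non-retryable are all assumptions.

```python
import errno
import time


def backoff_delays(initial=1.0, factor=2.0, max_delay=300.0):
    """Yield an endless series of delays that grow by factor up to max_delay."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


def retry_with_backoff(operation, max_attempts=5, sleep=time.sleep,
                       delays=None):
    """Call operation until it succeeds, waiting longer after each failure.

    An OSError with errno ENOSPC (no space left on device) is re-raised
    immediately, since retrying cannot succeed until space is freed.
    """
    delays = delays if delays is not None else backoff_delays()
    for attempt, delay in zip(range(1, max_attempts + 1), delays):
        try:
            return operation()
        except OSError as err:
            if err.errno == errno.ENOSPC:
                raise  # disk full: do not hammer the controller
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
```

With these defaults a transient failure is retried after 1s, 2s, 4s and so on, while a disk-full error surfaces after a single attempt.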
On Tue, Jan 29, 2019 at 2:15 AM Ian Booth <email address hidden> wrote:
> ** Changed in: juju
> Milestone: 2.5.1 => 2.5.2
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1807717
>
> Title:
> machine continuously retries upgrading agent tools when disk is full
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1807717/+subscriptions
>