i/o timeout errors can cause non-atomic service deploys
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Nate Finch |
juju-core | Fix Released | High | Nate Finch |
1.24 | Fix Released | Critical | Nate Finch |
1.25 | Fix Released | High | Nate Finch |

Bug Description
We've recently started seeing "i/o timeout" errors when issuing serviceDeploy API calls from Landscape:
EDIT
##########
To clarify:
landscape ---------> |17070:state server -> mongo:37017|
The timeout is happening inside the juju state server box, between the api and mongo.
It's NOT happening between landscape and juju. It's inside juju.
##########
Aug 7 20:45:06 job-handler-1 INFO Traceback (failure with no frames): <class 'canonical.
...
Aug 7 20:45:06 job-handler-1 INFO Traceback (failure with no frames): <class 'canonical.
...
On retry, we hit the "service already exists" error, so the service has actually been deployed.
However, we have also recently hit a case where the service was only partially deployed:
Aug 17 16:22:18 job-handler-1 INFO Traceback (failure with no frames): <class 'canonical.
service "ceph-radosgw": read tcp 127.0.0.1:37017: i/o timeout
We emit a service deploy call together with the full service configuration, but no service configuration was done in this case. It seems as if service deploy and configuration are not atomic: we'd expect the call to either fail (so we don't hit service-
See bug 1482791 for reference.
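As an illustration of how a caller might cope with this until deploys are atomic, here is a minimal Python sketch. It assumes a hypothetical `client` object with `deploy()` and `set_config()` methods and a hypothetical `DeployError` exception; none of these names come from the real Juju or Landscape APIs. The idea is that "already exists" on a retry means an earlier attempt created the service before timing out, so the configuration is re-applied explicitly rather than assumed.

```python
class DeployError(Exception):
    """Hypothetical error type raised by the illustrative client."""


def deploy_with_config(client, name, charm, config, retries=3):
    """Deploy a service and make sure its configuration ends up applied.

    `client` is a hypothetical object exposing deploy() and set_config();
    it is an assumption for this sketch, not a real Juju or Landscape API.
    """
    last_error = None
    for _ in range(retries):
        try:
            client.deploy(name, charm, config)
            return "deployed"
        except DeployError as exc:
            last_error = exc
            if "already exists" in str(exc):
                # An earlier attempt created the service before timing out,
                # so its configuration may be missing: apply it explicitly.
                client.set_config(name, config)
                return "reconfigured"
            if "i/o timeout" not in str(exc):
                # Unrelated failure: do not retry blindly.
                raise
    raise last_error
```

This only papers over the problem on the client side; it does not make the underlying call atomic, and `set_config()` could itself time out.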
Changed in juju-core: | |
status: | New → Triaged |
importance: | Undecided → High |
milestone: | none → 1.26.0 |
tags: | added: cisco landscape |
Changed in juju-core: | |
milestone: | 1.26.0 → none |
Changed in juju-core: | |
milestone: | none → 1.26.0 |
Changed in juju-core: | |
status: | Triaged → In Progress |
description: | updated |
Changed in juju-core: | |
status: | In Progress → Triaged |
assignee: | Nate Finch (natefinch) → nobody |
Changed in juju-core: | |
assignee: | nobody → Nate Finch (natefinch) |
status: | Triaged → In Progress |
tags: | added: kanban-cross-team |
tags: | removed: kanban-cross-team |
Changed in juju-core: | |
status: | In Progress → Fix Committed |
Changed in juju-core: | |
status: | Fix Committed → Fix Released |
affects: | juju-core → juju |
Changed in juju: | |
milestone: | 2.0-alpha1 → none |
milestone: | none → 2.0-alpha1 |
Changed in juju-core: | |
assignee: | nobody → Nate Finch (natefinch) |
importance: | Undecided → High |
status: | New → Fix Released |
Indeed, it is not atomic: it can fail midway for various reasons, such as "charm not found" (which leaves the service existing but "empty").
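To make the failure mode concrete, the following Python sketch shows one way an all-or-nothing deploy could behave. It uses a plain dictionary as a stand-in for the state database, and the function name, `store`, and `known_charms` are all hypothetical; this is not how Juju implements deploys or how the fix was made. The service record is created first (the existing but "empty" state), and if any later step fails the record is removed so that a retry does not hit "already exists".

```python
def deploy_atomically(store, name, charm, config, known_charms):
    """Create a service record, removing it again if a later step fails.

    `store` is an in-memory dict standing in for the state database;
    all names here are illustrative assumptions.
    """
    if name in store:
        raise ValueError('service "%s" already exists' % name)

    # At this point the service exists but is "empty".
    store[name] = {"charm": None, "config": {}}
    try:
        if charm not in known_charms:
            raise LookupError("charm not found: %s" % charm)
        store[name]["charm"] = charm
        store[name]["config"] = dict(config)
    except Exception:
        # Undo the partial deploy so a retry starts from a clean state.
        del store[name]
        raise
    return store[name]
```

With cleanup on failure, the caller sees either a fully configured service or no service at all, which is the behaviour the bug description asks for.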