_wait_all_computers gives up too soon
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Autopilot Log Analyser | Fix Committed | Undecided | Francis Ginther |
Landscape Server | Fix Released | Medium | Chad Smith |
Bug Description
This was seen in CI: https:/
Landscape server: 17.01~bzr10865+
This is very similar to lp:1660000, in which a slowly progressing rabbitmq-server charm prevents landscape-client from making forward progress. In this case, however, rabbitmqctl was not affected by the known bug and the charm was progressing as fast as possible; it simply had a lot to do.
Charm hooks execute serially, so as long as rabbitmq-server is busy processing hooks, the landscape-client subordinate hooks don't get a chance to run, and the client doesn't get far enough to register until it's too late.
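The starvation effect can be illustrated with a toy model. This is only a sketch: the `Hook` class, the hook names, and the durations are hypothetical and are not taken from juju or from the attached logs. It assumes nothing more than what is described above, namely that hooks on a machine run strictly one after another.

```python
from dataclasses import dataclass


@dataclass
class Hook:
    unit: str        # unit the hook belongs to
    name: str        # hypothetical hook name, for illustration only
    duration_s: int  # hypothetical run time in seconds


def completion_times(hooks):
    """Run hooks strictly in order; return (hook, finish time in seconds)."""
    clock = 0
    finished = []
    for hook in hooks:
        clock += hook.duration_s
        finished.append((hook, clock))
    return finished


def registers_in_time(hooks, registering_hook, deadline_s):
    """True if the named hook finishes at or before the deadline."""
    for hook, finish in completion_times(hooks):
        if hook.name == registering_hook:
            return finish <= deadline_s
    return False


# Hypothetical queue: the principal charm's hooks run first,
# and the subordinate's registering hook is last in line.
queue = [
    Hook("rabbitmq-server/0", "busy-hook-a", 3000),
    Hook("rabbitmq-server/0", "busy-hook-b", 600),
    Hook("landscape-client/21", "register", 5),
]

print(registers_in_time(queue, "register", deadline_s=3600))
print(registers_in_time(queue, "register", deadline_s=3700))
```

In this model the subordinate's hook is quick on its own, but it still misses a one-hour deadline because everything queued ahead of it has to finish first.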
Here's what happened:
09:48:30 first retry of _wait_all_computers (job-handler.log)
09:49:28 start amqp-relation-
10:39:36 amqp-relation-
10:39:39 landscape-
10:40:00 another amqp-relation-
10:48:02 Last retry of _wait_all_computers (job-handler.log)
10:49:02 amqp-relation-
10:49:05 landscape-client/21 runs container-
10:49:06 landscape-client/21 registers computer (unit-landscape
The last retry of _wait_all_computers occurred just over a minute before the client was actually registered.
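For reference, the margins can be worked out from the three timestamps in the timeline above. This is a small sketch; the variable names are illustrative, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

# Timestamps copied from the timeline above.
first_retry = datetime.strptime("09:48:30", FMT)
last_retry = datetime.strptime("10:48:02", FMT)
registered = datetime.strptime("10:49:06", FMT)

retry_window = last_retry - first_retry  # how long retries were observed
missed_by = registered - last_retry      # gap between last retry and registration
needed = registered - first_retry        # window that would have covered registration

print("retry window:", retry_window)
print("missed by:", missed_by)
print("needed:", needed)
```

The retries span a little under an hour, and a window only about a minute longer would have covered the registration in this run.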
Logs from the rabbitmq-server/0 unit and the job-handler are attached.
information type: | Proprietary → Public |
Changed in autopilot-log-analyser: | |
assignee: | nobody → Francis Ginther (fginther) |
status: | New → In Progress |
Changed in autopilot-log-analyser: | |
status: | In Progress → Fix Committed |
Changed in landscape: | |
milestone: | 17.01 → 17.02 |
Changed in landscape: | |
status: | New → In Progress |
importance: | Undecided → Medium |
assignee: | nobody → Chad Smith (chad.smith) |
Changed in landscape: | |
status: | In Progress → Fix Committed |
Changed in landscape: | |
status: | Fix Committed → Fix Released |
It waits an hour. How long are you suggesting it should wait?