controller restart meant sidecar charm k8s workloads restarts
Bug #2036594 reported by Tom Haddon
This bug affects 10 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Harry Pidcock |
3.1 | Triaged | High | Unassigned |
3.2 | Triaged | High | Unassigned |
Bug Description
We recently restarted a controller in order to run mgopurge, to try to address some performance issues with the controllers (for instance, `juju status` taking more than 2 minutes on particular models). Here's what was done (sorry, Canonical internal only): https:/
In doing so, we saw k8s models attached to this controller get pods rescheduled. We assume this is because pebble was having problems contacting the controller during the restarts. Here's a charm log from the time of the incident: https:/
The controller and model version is juju 2.9.44.
tags: added: canonical-is
description: updated
Changed in juju: status: New → Confirmed
Changed in juju: importance: Medium → High
I've been able to reproduce this locally. I deployed juju 3.1.5 on microk8s and then deployed a sidecar charm into a model (in my case I've been testing with discourse-k8s). With the application working fine, running `/opt/pebble stop jujud` in the api-server container of the controller-0 pod is enough to get the charm container restarted.
Here are the logs from the charm container, from the point I run `/opt/pebble stop jujud` until it's killed:
2023-09-19T15:25:45.468Z [container-agent] 2023-09-19 15:25:45 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
2023-09-19T15:25:45.468Z [container-agent] 2023-09-19 15:25:45 INFO juju.worker.logger logger.go:136 logger worker stopped
2023-09-19T15:25:45.468Z [container-agent] 2023-09-19 15:25:45 INFO juju.worker.uniter uniter.go:338 unit "discourse-k8s/0" shutting down: catacomb 0xc00054e000 is dying
2023-09-19T15:25:51.971Z [pebble] Check "liveness" failure 1 (threshold 3): received non-20x status code 404
2023-09-19T15:25:51.972Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 404
2023-09-19T15:26:01.972Z [pebble] Check "liveness" failure 2 (threshold 3): received non-20x status code 404
2023-09-19T15:26:01.972Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 404
2023-09-19T15:26:04.589Z [container-agent] 2023-09-19 15:26:04 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [b7ee1c] "unit-discourse-k8s-0" cannot open api: unable to connect to API: dial tcp 10.152.183.49:17070: connect: connection refused
2023-09-19T15:26:11.970Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 404
2023-09-19T15:26:11.970Z [pebble] Check "readiness" failure threshold 3 hit, triggering action
2023-09-19T15:26:11.970Z [pebble] Check "liveness" failure 3 (threshold 3): received non-20x status code 404
2023-09-19T15:26:11.970Z [pebble] Check "liveness" failure threshold 3 hit, triggering action
2023-09-19T15:26:21.970Z [pebble] Check "readiness" failure 4 (threshold 3): received non-20x status code 404
2023-09-19T15:26:21.970Z [pebble] Check "liveness" failure 4 (threshold 3): received non-20x status code 404
2023-09-19T15:26:25.552Z [container-agent] 2023-09-19 15:26:25 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [b7ee1c] "unit-discourse-k8s-0" cannot open api: unable to connect to API: dial tcp 10.152.183.49:17070: connect: connection refused
2023-09-19T15:26:31.970Z [pebble] Check "liveness" failure 5 (threshold 3): received non-20x status code 404
2023-09-19T15:26:31.970Z [pebble] Check "readiness" failure 5 (threshold 3): received non-20x status code 404
2023-09-19T15:26:41.970Z [pebble] Check "liveness" failure 6 (threshold 3): received non-20x status code 404
2023-09-19T15:26:41.970Z [pebble] Check "readiness" failure 6 (threshold 3): received non-20x status code 404
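To make the timeline in the log above easier to read, here is a minimal Python sketch that measures how long the charm container had between losing its API connection and the pebble "liveness" check reaching its failure threshold. The sample lines are copied from the log; the function name `time_to_threshold` and the regular expression are illustrative assumptions, not part of Juju or Pebble.

```python
import re
from datetime import datetime

# A subset of lines copied from the charm container log in this report.
LOG = """\
2023-09-19T15:25:45.468Z [container-agent] 2023-09-19 15:25:45 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
2023-09-19T15:25:51.971Z [pebble] Check "liveness" failure 1 (threshold 3): received non-20x status code 404
2023-09-19T15:26:01.972Z [pebble] Check "liveness" failure 2 (threshold 3): received non-20x status code 404
2023-09-19T15:26:11.970Z [pebble] Check "liveness" failure 3 (threshold 3): received non-20x status code 404
2023-09-19T15:26:11.970Z [pebble] Check "liveness" failure threshold 3 hit, triggering action
"""

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
# Matches lines such as: <timestamp> [pebble] Check "liveness" failure 2 (threshold 3)
FAILURE = re.compile(
    r'^(\S+) \[pebble\] Check "(\w+)" failure (\d+) \(threshold (\d+)\)'
)
BROKEN = "api connection broken unexpectedly"


def time_to_threshold(log, check):
    """Seconds from the API connection breaking until `check` fails
    as many times as its threshold, or None if that never happens."""
    broken_at = None
    for line in log.splitlines():
        if broken_at is None and BROKEN in line:
            broken_at = datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)
            continue
        m = FAILURE.match(line)
        if m is None or broken_at is None or m.group(2) != check:
            continue
        if m.group(3) == m.group(4):  # failure count reached the threshold
            hit_at = datetime.strptime(m.group(1), TS_FORMAT)
            return (hit_at - broken_at).total_seconds()
    return None


if __name__ == "__main__":
    print(time_to_threshold(LOG, "liveness"))
```

On these lines the gap comes out at roughly 26.5 seconds, which suggests that any controller outage longer than about three check intervals would be enough to trigger the check action seen here.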