Upgrading a sidecar charm with one workload container to one with no workload containers results in workload container still defined

Bug #1991955 reported by Tom Haddon
This bug affects 1 person
Affects: Canonical Juju
Status: In Progress
Assigned to: Harry Pidcock

Bug Description

To reproduce this:

1. `juju deploy nginx-ingress-integrator --channel=edge nginx-not-upgraded`
2. Confirm revision 31 is deployed and inspect kubectl to confirm the pod has 1/1 containers.
3. `juju deploy nginx-ingress-integrator --channel=edge --revision=29 --resource placeholder-image='google/pause'`
4. Confirm revision 29 is deployed and inspect kubectl to confirm the pod has 2/2 containers.
5. `juju refresh nginx-ingress-integrator`
6. Confirm the charm is now running revision 31, but still has 2/2 containers.
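
For steps 2, 4 and 6, the container list can be read from the pod spec instead of the READY column. The following is a minimal sketch, assuming the JSON comes from something like `kubectl get pod <name> -n <namespace> -o json`. The helper name and the container names in the sample are made up for illustration.

```python
import json


def container_names(pod_json: str) -> list[str]:
    """Return the names of the containers declared in a pod's spec."""
    pod = json.loads(pod_json)
    return [c["name"] for c in pod["spec"]["containers"]]


# Hypothetical, trimmed-down pod document; the real output has many more fields.
sample = json.dumps(
    {"spec": {"containers": [{"name": "charm"}, {"name": "placeholder"}]}}
)

names = container_names(sample)
print(len(names), names)
```

If the refresh behaved as expected, the list after step 6 would contain one entry fewer than after step 4.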

Here's some output showing the problem. In this case `ingress-edge` was deployed fresh, while `nginx-ingress-integrator` was deployed with revision 29 and then upgraded.
mthaddon@finistere:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
i-test microk8s-localhost microk8s/localhost 2.9.34 unsupported 16:45:53+02:00

App Version Status Scale Charm Channel Rev Address Exposed Message
ingress-edge active 1 nginx-ingress-integrator edge 31 no
nginx-ingress-integrator active 1 nginx-ingress-integrator edge 31 no Ingress with service IP(s):

Unit Workload Agent Address Ports Message
ingress-edge/0* active idle
nginx-ingress-integrator/0* active idle Ingress with service IP(s):

mthaddon@finistere:~$ microk8s kubectl get pods -n i-test
modeloperator-85bb89747-mvwfm 1/1 Running 0 22m
nginx-ingress-integrator-0 2/2 Running 0 10m
ingress-edge-0 1/1 Running 0 3m47s

Tags: canonical-is
Tom Haddon (mthaddon)
tags: added: canonical-is
John A Meinel (jameinel) wrote :

This should get fixed in the 2.9 series. We did implement support for upgrading from a pod spec charm to a sidecar charm, but it seems we need to look at how sidecar charms themselves upgrade when the topology changes, and ensure that new versions of the charm get the new topology.

Changed in juju:
importance: Undecided → High
milestone: none → 2.9-next
status: New → Triaged
assignee: nobody → Harry Pidcock (hpidcock)
Ian Booth (wallyworld) wrote :

This is likely fixed now. Please re-open if still an issue.

Changed in juju:
status: Triaged → Incomplete
assignee: Harry Pidcock (hpidcock) → nobody
milestone: 2.9-next → none
Tom Haddon (mthaddon) wrote :

I've just tested this on 2.9.42 and it still leaves me with a pod with two containers:
juju deploy nginx-ingress-integrator --channel=edge --revision=29 --resource placeholder-image='google/pause'
# Look at microk8s containers, two found
juju refresh nginx-ingress-integrator
# Look at microk8s containers, still 2, should be 1

Changed in juju:
status: Incomplete → Confirmed
Harry Pidcock (hpidcock)
Changed in juju:
assignee: nobody → Harry Pidcock (hpidcock)
milestone: none → 2.9.44
Harry Pidcock (hpidcock)
Changed in juju:
status: Confirmed → In Progress
Harry Pidcock (hpidcock) wrote :

This issue is due to how Juju patches the StatefulSet. I'm exploring options that would let us maintain the ability for other entities (such as charms) to alter the StatefulSet and, where possible, have those changes persist. But it may be that we must overwrite the StatefulSet, which would require entities that wish to mutate the StatefulSet to reapply their changes, which is not ideal.
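
To illustrate the trade-off described in this comment, here is a simplified sketch of the two approaches. It is not Juju's code. Kubernetes strategic merge patches do merge the `containers` list by `name`, but whether Juju's patch follows exactly that path is an assumption here. The function names, image names and the `resources` field are hypothetical.

```python
def merge_containers_by_name(live, desired):
    """Merge `desired` into `live`, matching entries on "name".

    Entries that exist only in `live` are kept, which is the behaviour
    that leaves a removed workload container behind.
    """
    merged = {c["name"]: dict(c) for c in live}
    for c in desired:
        merged.setdefault(c["name"], {}).update(c)
    return list(merged.values())


def replace_containers(live, desired):
    """Overwrite the list: stale containers go, but so do external edits."""
    return [dict(c) for c in desired]


# Hypothetical live state: a charm container that another entity has tuned,
# plus a workload container that the new charm revision no longer declares.
live = [
    {"name": "charm", "image": "charm:old", "resources": {"limits": {"memory": "1Gi"}}},
    {"name": "placeholder", "image": "google/pause"},
]
desired = [{"name": "charm", "image": "charm:new"}]

print([c["name"] for c in merge_containers_by_name(live, desired)])
print([c["name"] for c in replace_containers(live, desired)])
```

In this model the merge keeps both the externally added `resources` field and the stale `placeholder` container, while the overwrite drops both.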

Changed in juju:
milestone: 2.9.44 → 2.9.45
Tom Haddon (mthaddon) wrote :

We've just been bitten by this again as part of an indico upgrade on staging. Revision 126 of the charm specifies some resources that no longer exist in revision 133. If you try to deploy revision 126 now it will completely fail with something like https://paste.ubuntu.com/p/t7PjNqmgJX/.

In our staging deployment the charm upgrade cannot proceed because the statefulset still defines containers that no longer exist, and juju is trying to pull secrets for those images, which no longer exist either. We therefore have one pod stuck in "Terminating" status. Killing the pod doesn't help: it just respawns and we see the same `failed to introduce pod indico-1: already exists...` message in the logs for the charm-init container.

We're not sure if we can just edit the statefulset manually to allow the deployment to proceed.

Ian Booth (wallyworld) wrote :

I ran a test and confirmed that editing the statefulset pod template to remove the (I think 6) prometheus containers that are no longer in the new charm, plus the relevant (prometheus container) image pull secrets, seems to do the trick.
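
The manual edit described above amounts to filtering two lists in the StatefulSet pod template. The sketch below assumes the manifest has been loaded into a dict (for example from `kubectl get statefulset <name> -o json`). The function name and every container and secret name in the sample are hypothetical, not taken from the indico charm.

```python
import copy


def prune_pod_template(statefulset, keep_containers, drop_pull_secrets):
    """Return a copy of the manifest without stale containers and pull secrets."""
    result = copy.deepcopy(statefulset)
    pod_spec = result["spec"]["template"]["spec"]
    # Keep only the containers the new charm revision still declares.
    pod_spec["containers"] = [
        c for c in pod_spec.get("containers", []) if c["name"] in keep_containers
    ]
    # Drop the image pull secrets that belonged to the removed containers.
    if "imagePullSecrets" in pod_spec:
        pod_spec["imagePullSecrets"] = [
            s for s in pod_spec["imagePullSecrets"] if s["name"] not in drop_pull_secrets
        ]
    return result


# Hypothetical manifest fragment with made-up names.
manifest = {
    "spec": {"template": {"spec": {
        "containers": [{"name": "charm"}, {"name": "app"}, {"name": "exporter"}],
        "imagePullSecrets": [{"name": "app-secret"}, {"name": "exporter-secret"}],
    }}}
}

pruned = prune_pod_template(manifest, {"charm", "app"}, {"exporter-secret"})
print(pruned["spec"]["template"]["spec"])
```

The function returns a new dict and leaves the input untouched, so the result can be reviewed before anything is applied to the cluster.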

My steps were to deploy stable on focal (rev 126) and then upgrade to edge (rev 134). juju status plus logs contained various errors, e.g. status showed:

Unit Workload Agent Address Ports Message
indico/0* error idle crash loop backoff: back-off 5m0s restarting failed container=indico pod=indico-0_m(fc8e8b2c-a1e9-4409-a7c7-589dd7fe9...

After editing the statefulset as above, status showed:

Unit Workload Agent Address Ports Message
indico/0* waiting idle Waiting for redis-broker availability

Changed in juju:
milestone: 2.9.45 → 2.9.46