[2.8/stable] caas-unit-init hangs forever while initializing the workload pod

Bug #1887228 reported by Vladimir Grevtsev
This bug affects 4 people
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

tl;dr: Trying to deploy an Operator framework charm on top of microk8s with Juju 2.8, but the init container never reaches the "Ready" state, leaving the workload pod stuck in the "Init:0/1" status forever. The same steps with Juju 2.7 work fine, so I think it's a Juju issue.

Steps to reproduce:

1. create an aws vm, t2.xlarge, AMI ID ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20200611 (ami-0d359437d1756caa8) (Bionic)

ubuntu@ip-172-31-40-244:~$ sudo snap install microk8s --classic
microk8s v1.18.4 from Canonical✓ installed
ubuntu@ip-172-31-40-244:~$ sudo snap install juju --classic
juju 2.8.0 from Canonical✓ installed
ubuntu@ip-172-31-40-244:~$ sudo snap list
Name              Version    Rev    Tracking         Publisher   Notes
amazon-ssm-agent  2.3.714.0  1566   latest/stable/…  aws✓        classic
core              16-2.45    9289   latest/stable    canonical✓  core
core18            20200427   1754   latest/stable    canonical✓  base
juju              2.8.0      12370  latest/stable    canonical✓  classic
microk8s          v1.18.4    1503   latest/stable    canonical✓  classic
ubuntu@ip-172-31-40-244:~$

ubuntu@ip-172-31-40-244:~$ git clone https://github.com/charmed-lma/charm-k8s-prometheus
Cloning into 'charm-k8s-prometheus'...
remote: Enumerating objects: 60, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 795 (delta 25), reused 36 (delta 14), pack-reused 735
Receiving objects: 100% (795/795), 181.29 KiB | 914.00 KiB/s, done.
Resolving deltas: 100% (470/470), done.
ubuntu@ip-172-31-40-244:~$ cd charm-k8s-prometheus/
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ git submodule update --init --recursive
Submodule 'mod/jinja' (https://github.com/pallets/jinja) registered for path 'mod/jinja'
Submodule 'mod/operator' (https://github.com/canonical/operator) registered for path 'mod/operator'
Cloning into '/home/ubuntu/charm-k8s-prometheus/mod/jinja'...
Cloning into '/home/ubuntu/charm-k8s-prometheus/mod/operator'...
Submodule path 'mod/jinja': checked out 'b5f454559fd7143c0fc5342141a1c569d8ffcf7a'
Submodule path 'mod/operator': checked out 'beca3da58af1485fc1143839ef2caed9fb4afbdb'
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ sudo microk8s.enable dns dashboard registry storage metrics-server ingress
Enabling DNS
Applying manifest
serviceaccount/coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
clusterrole.rbac.authorization.k8s.io/coredns created
clusterrolebinding.rbac.authorization.k8s.io/coredns created
Restarting kubelet
DNS is enabled
Enabling Kubernetes Dashboard
Enabling Metrics-Server
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/microk8s-admin created
Metrics-Server is enabled
Applying manifest
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created

If RBAC is not enabled access the dashboard using the default token retrieved with:

token=$(microk8s kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
microk8s kubectl -n kube-system describe secret $token

In an RBAC enabled setup (microk8s enable RBAC) you need to create a user with restricted
permissions as shown in:
https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/creating-sample-user.md

The registry will be created with the default size of 20Gi.
You can use the "size" argument while enabling the registry, eg microk8s.enable registry:size=30Gi
Enabling default storage class
deployment.apps/hostpath-provisioner created
storageclass.storage.k8s.io/microk8s-hostpath created
serviceaccount/microk8s-hostpath created
clusterrole.rbac.authorization.k8s.io/microk8s-hostpath created
clusterrolebinding.rbac.authorization.k8s.io/microk8s-hostpath created
Storage will be available soon
Applying registry manifest
namespace/container-registry created
persistentvolumeclaim/registry-claim created
deployment.apps/registry created
service/registry created
The registry is enabled
Enabling default storage class
deployment.apps/hostpath-provisioner unchanged
storageclass.storage.k8s.io/microk8s-hostpath unchanged
serviceaccount/microk8s-hostpath unchanged
clusterrole.rbac.authorization.k8s.io/microk8s-hostpath unchanged
clusterrolebinding.rbac.authorization.k8s.io/microk8s-hostpath unchanged
Storage will be available soon
Addon metrics-server is already enabled.
Enabling Ingress
namespace/ingress created
serviceaccount/nginx-ingress-microk8s-serviceaccount created
clusterrole.rbac.authorization.k8s.io/nginx-ingress-microk8s-clusterrole created
role.rbac.authorization.k8s.io/nginx-ingress-microk8s-role created
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-microk8s created
rolebinding.rbac.authorization.k8s.io/nginx-ingress-microk8s created
configmap/nginx-load-balancer-microk8s-conf created
configmap/nginx-ingress-tcp-microk8s-conf created
configmap/nginx-ingress-udp-microk8s-conf created
daemonset.apps/nginx-ingress-microk8s-controller created
Ingress is enabled
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ sudo usermod -a -G microk8s $(whoami)
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ sudo chown -f -R $USER ~/.kube
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ logout
Connection to 18.184.45.5 closed.

<logged back in>

ubuntu@ip-172-31-40-244:~$ juju bootstrap microk8s mk8s
Since Juju 2 is being run for the first time, downloaded the latest public cloud information.
Creating Juju controller "mk8s" on microk8s/localhost
Creating k8s resources for controller "controller-mk8s"
Starting controller pod
Bootstrap agent now started
Contacting Juju controller at 10.152.183.61 to verify accessibility...

Bootstrap complete, controller "mk8s" is now available in namespace "controller-mk8s"

Now you can run
 juju add-model <model-name>
to create a new model to deploy k8s workloads.
ubuntu@ip-172-31-40-244:~$

ubuntu@ip-172-31-40-244:~$ juju create-storage-pool operator-storage kubernetes storage-class=microk8s-hostpath
ubuntu@ip-172-31-40-244:~$ juju add-model lma
Added 'lma' model on microk8s/localhost with credential 'microk8s' for user 'admin'
ubuntu@ip-172-31-40-244:~$ cd charm-k8s-prometheus/
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ juju deploy . --resource prometheus-image=prom/prometheus:v2.18.1 --resource nginx-image=nginx:1.19.0
Deploying charm "local:kubernetes/prometheus-0".
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$

############# Wait for some time to let it settle #########

ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ microk8s.kubectl get all -n lma
NAME                                 READY   STATUS     RESTARTS   AGE
pod/modeloperator-65b967f45d-78vhr   1/1     Running    0          2m10s
pod/prometheus-0                     0/2     Init:0/1   0          110s
pod/prometheus-operator-0            1/1     Running    0          119s

NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/modeloperator          ClusterIP   10.152.183.64    <none>        17071/TCP        2m10s
service/prometheus             ClusterIP   10.152.183.105   <none>        80/TCP,443/TCP   111s
service/prometheus-endpoints   ClusterIP   None             <none>        <none>           111s
service/prometheus-operator    ClusterIP   10.152.183.52    <none>        30666/TCP        119s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/modeloperator   1/1     1            1           2m10s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/modeloperator-65b967f45d   1         1         1       2m10s

NAME                                   READY   AGE
statefulset.apps/prometheus            0/1     110s
statefulset.apps/prometheus-operator   1/1     119s

ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ microk8s.kubectl describe pod/prometheus-0 -n lma
Name: prometheus-0
Namespace: lma
Priority: 0
Node: ip-172-31-40-244/172.31.40.244
Start Time: Sat, 11 Jul 2020 12:50:15 +0000
Labels: controller-revision-hash=prometheus-d68d6b4c7
              juju-app=prometheus
              statefulset.kubernetes.io/pod-name=prometheus-0
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
              juju.io/controller: cddc39a7-4a74-4c27-8ada-b14c0b0987a6
              juju.io/model: 23c5a8d7-a3b6-4e23-8a50-550c2f3ced59
              juju.io/unit: prometheus/0
              seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Pending
IP: 10.1.11.12
IPs:
  IP: 10.1.11.12
Controlled By: StatefulSet/prometheus
Init Containers:
  juju-pod-init:
    Container ID: containerd://60314274727d6c2652764db63c413d2cb7329d1afd7b60d415762f8bea2d36f7
    Image: jujusolutions/jujud-operator:2.8.0
    Image ID: sha256:d7a84ce8e5ca840e5c38b399f5ad706674c29a7da4b31ee4830640db64acc870
    Port: <none>
    Host Port: <none>
    Command:
      /bin/sh
    Args:
      -c
      export JUJU_DATA_DIR=/var/lib/juju
      export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools

      mkdir -p $JUJU_TOOLS_DIR
      cp /opt/jujud $JUJU_TOOLS_DIR/jujud
      initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
      if test -n "$initCmd"; then
      $JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
      else
      exit 0
      fi

    State: Running
      Started: Sat, 11 Jul 2020 12:50:15 +0000
    Ready: False
    Restart Count: 0
    Environment: <none>
    Mounts:
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lddqz (ro)
Containers:
  prometheus:
    Container ID:
    Image: prom/prometheus:v2.18.1
    Image ID:
    Port: <none>
    Host Port: <none>
    Args:
      --config.file=/etc/prometheus/prometheus.yml
      --storage.tsdb.path=/prometheus
      --web.enable-lifecycle
      --web.console.templates=/usr/share/prometheus/consoles
      --web.console.libraries=/usr/share/prometheus/console_libraries
      --log.level=info
      --web.page-title="Charmed LMA - Prometheus TSDB, Collection & Processing"
      --web.max-connections=512
      --storage.tsdb.retention.time=15d
      --alertmanager.notification-queue-capacity=10000
      --alertmanager.timeout=10s
    State: Waiting
      Reason: PodInitializing
    Ready: False
    Restart Count: 0
    Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness: http-get http://:9090/-/ready delay=10s timeout=30s period=10s #success=1 #failure=3
    Environment: <none>
    Mounts:
      /etc/prometheus from prometheus-prom-config-config (rw)
      /prometheus from database-d3acf868 (rw)
      /usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lddqz (ro)
  prometheus-nginx:
    Container ID:
    Image: nginx:1.19.0
    Image ID:
    Ports: 80/TCP, 443/TCP
    Host Ports: 0/TCP, 0/TCP
    State: Waiting
      Reason: PodInitializing
    Ready: False
    Restart Count: 0
    Environment: <none>
    Mounts:
      /etc/nginx/conf.d from prometheus-nginx-config-config (rw)
      /usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lddqz (ro)
Conditions:
  Type Status
  Initialized False
  Ready False
  ContainersReady False
  PodScheduled True
Volumes:
  database-d3acf868:
    Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: database-d3acf868-prometheus-0
    ReadOnly: false
  juju-data-dir:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit: <unset>
  prometheus-prom-config-config:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: prometheus-prom-config-config
    Optional: false
  prometheus-nginx-config-config:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: prometheus-nginx-config-config
    Optional: false
  default-token-lddqz:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-lddqz
    Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling 2m34s (x2 over 2m34s) default-scheduler running "VolumeBinding" filter plugin for pod "prometheus-0": pod has unbound immediate PersistentVolumeClaims
  Normal Scheduled 2m33s default-scheduler Successfully assigned lma/prometheus-0 to ip-172-31-40-244
  Normal Pulled 2m32s kubelet, ip-172-31-40-244 Container image "jujusolutions/jujud-operator:2.8.0" already present on machine
  Normal Created 2m32s kubelet, ip-172-31-40-244 Created container juju-pod-init
  Normal Started 2m32s kubelet, ip-172-31-40-244 Started container juju-pod-init
ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$

ubuntu@ip-172-31-40-244:~/charm-k8s-prometheus$ microk8s.kubectl logs -f -n lma -c juju-pod-init prometheus-0
2020-07-11 12:50:16 INFO juju.cmd supercommand.go:91 running jujud [2.8.0 0 d816abe62fbf6787974e5c4e140818ca08586e44 gc go1.14.4]
2020-07-11 12:50:16 DEBUG juju.cmd supercommand.go:92 args: []string{"/var/lib/juju/tools/jujud", "caas-unit-init", "--debug", "--wait"}

Revision history for this message
Ian Booth (wallyworld) wrote :

The prometheus charm fails for me also, on a microk8s without RBAC.

Deploying other k8s charms works, so there's some interaction there that needs debugging.

Also, since Juju 2.6 you don't need to set operator-storage manually - Juju detects the type of k8s cluster and uses the appropriate storage.

tags: added: k8s
Changed in juju:
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
status: New → Triaged
status: Triaged → In Progress
Revision history for this message
Ian Booth (wallyworld) wrote :

The problem is in the config changed hook of the charm.

The config changed hook calls pod-spec-set and then blocks on:
wait_for_pod_readiness()

However, this causes a deadlock because Juju needs to finish the hook execution to process pod-spec-set. And the hook never exits because it is waiting for something that won't happen.

The reason it worked in 2.7 is because 2.7 did not process pod-spec-set as part of the hook commit - it was done straight away during the hook execution. Juju 2.8 changed the behaviour to align it with other hook calls which affect the model (and potentially other related units) - they are all processed atomically during hook commit. There has been some debate as to whether this behaviour is desired for pod-spec-set or not.
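
To illustrate the deadlock, here is a minimal, hypothetical sketch of the pattern described above (this is not the actual charm-k8s-prometheus code; _build_pod_spec and _pod_is_ready are placeholder helpers standing in for the real spec construction and readiness check):

import time

from ops.charm import CharmBase
from ops.main import main


class PrometheusCharm(CharmBase):
    """Sketch of a charm that deadlocks under Juju 2.8's deferred pod-spec-set."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.config_changed, self._on_config_changed)

    def _on_config_changed(self, event):
        if self.unit.is_leader():
            # Under Juju 2.8 the spec is only applied at hook commit,
            # i.e. after this hook returns.
            self.model.pod.set_spec(self._build_pod_spec())
        # Deadlock: the hook never returns, so the spec is never processed
        # and the workload pod can never become ready.
        self._wait_for_pod_readiness()

    def _build_pod_spec(self):
        return {'version': 2, 'containers': []}  # placeholder for the real spec

    def _wait_for_pod_readiness(self):
        while not self._pod_is_ready():
            time.sleep(5)

    def _pod_is_ready(self):
        return False  # placeholder for a real query against the k8s API


if __name__ == "__main__":
    main(PrometheusCharm)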

Changed in juju:
importance: Undecided → Critical
milestone: none → 2.8.2
Ian Booth (wallyworld)
Changed in juju:
assignee: Yang Kelvin Liu (kelvin.liu) → nobody
importance: Critical → High
status: In Progress → Triaged
Revision history for this message
Ian Booth (wallyworld) wrote :

After more investigation, the issue is not strictly the change in how pod_spec_set is handled, but rather what appears to be a timing issue. The sequence of events is something like:

- juju runs config changed hook but it is skipped as it doesn't exist yet
- juju runs the start hook, which creates symlinks for all the hooks and also calls pod-spec-set
- juju creates prometheus pod
- prometheus pod runs init container which blocks waiting for juju to send info
- juju skips sending info because it has not yet been notified of the pod's run state
- juju runs config changed hook, hook blocks on pod being ready
- k8s event arrives into juju that pod is initialising
- juju cannot finish pod initialisation because it is blocked in config change hook

So the issue is that the cluster event which informs Juju that the pod is initialising - and hence would allow Juju to send the info that finishes the pod init and lets the pod run - arrives after Juju is already blocked inside the config-changed hook waiting for the pod to run.

There may be a subtle difference between 2.7 and 2.8 that surfaces this race - I think Juju used to not wait for the pod init notification, and this caused issues trying to initialise a pod that didn't exist yet. But the change in how pod-spec-set is applied is not the root cause, because the pod spec is fully applied in the start hook before Juju gets blocked inside the config-changed hook.

The charm must not make blocking calls inside a hook until pod initialisation has been completed by Juju; otherwise the pod init container will never exit and the pod will never start.
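
For illustration only, a minimal sketch of the non-blocking shape this suggests (hypothetical code, not the actual fix; _build_pod_spec and _pod_is_ready remain placeholders): set the pod spec, record status and return, then re-check readiness on a later event such as update-status instead of busy-waiting inside the hook.

from ops.charm import CharmBase
from ops.main import main
from ops.model import ActiveStatus, MaintenanceStatus


class PrometheusCharm(CharmBase):
    """Sketch of the non-blocking pattern: never busy-wait inside a hook."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.config_changed, self._on_config_changed)
        self.framework.observe(self.on.update_status, self._on_update_status)

    def _on_config_changed(self, event):
        if self.unit.is_leader():
            self.model.pod.set_spec(self._build_pod_spec())
        # Return immediately so Juju can commit the spec and complete the
        # pod initialisation; readiness is checked on a later hook instead.
        self.unit.status = MaintenanceStatus('waiting for pod to start')

    def _on_update_status(self, event):
        if self._pod_is_ready():
            self.unit.status = ActiveStatus()

    def _build_pod_spec(self):
        return {'version': 2, 'containers': []}  # placeholder for the real spec

    def _pod_is_ready(self):
        return True  # placeholder for a real query against the k8s API


if __name__ == "__main__":
    main(PrometheusCharm)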

This issue would be alleviated if we were at the stage where we fed events from k8s into the charm. But that's not yet implemented.

Revision history for this message
Achilleas Anagnostopoulos (achilleasa) wrote :

The major change between 2.7 and 2.8 is that in 2.7, hook changes are applied _individually_ in a best-effort approach. This means that if any of the change operations fails, juju will still attempt to apply the rest and return the first-seen error. On the other hand, in 2.8, all hook changes (pod-spec included) get applied in a single transaction and all changes get rolled back in case of an error.

AFAICT from https://github.com/juju/juju/blob/2.7/worker/uniter/runner/context/context.go#L878-L883, 2.7 defers setting the podspec until the hook flush time so the issue with 2.8 might be caused by the combination of changes being applied together.

Revision history for this message
Ian Booth (wallyworld) wrote :

RE: comment #4, I don't think pod-spec-set being done as part of the hook flush is the root cause here. I thought it was to start with, but as explained in comment #3, the issue is due to a hook waiting for a pod state that can never be reached, because the hook blocks Juju from completing pod initialisation. If I recall correctly, in Juju 2.7 we did this init completion without confirming the pod was created, and this led to issues the other way: trying to exec into a pod that wasn't there. Now we defer the init until we know the pod is started, but this needs Juju to react to an event from the cluster, and it can't do this while a hook is blocking the unit agent from processing such events. The real solution is to feed events from k8s to the charm, but that's not done yet.

Changed in juju:
milestone: 2.8.2 → 2.8.3
Pen Gale (pengale)
Changed in juju:
milestone: 2.8.4 → 2.9-beta1
Changed in juju:
milestone: 2.9-beta1 → 2.9-rc1
Pen Gale (pengale)
Changed in juju:
milestone: 2.9-rc1 → none
importance: High → Medium
Revision history for this message
John A Meinel (jameinel) wrote :

Going off of these lines:

"
- juju runs config changed hook, hook blocks on pod being ready
- k8s event arrives into juju that pod is initialising
- juju cannot finish pod initialisation because it is blocked in config change hook
"

In 2.8 (IIRC), we changed 'pod-spec-set' to not take effect until the hook exits. Which means that if config-changed is triggering a change to the pod specification, then it will loop forever because the pod isn't actually starting.

Revision history for this message
Pedro Guimarães (pguimaraes) wrote :

There was another bug reported that seems to be the same issue.
There are steps for a possible work-around and an explanation of where the issue may be: https://bugs.launchpad.net/juju/+bug/1935832/comments/1

Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This Medium-priority bug has not been updated in 60 days, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot