controller pod restarted on microk8s as part of bootstrap

Bug #2003055 reported by Tom Haddon
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Triaged
Medium
Unassigned

Bug Description

I've noticed both on 2.9.37 and 3.0.2 that when bootstrapping Juju on microk8s I see a pod restart of the controller-0 pod. Here's the output of `get pods` immediately after a bootstrap in Juju 3.0.2:
```
[...]
Bootstrap complete, controller "microk8s-localhost" is now available in namespace "controller-microk8s-localhost"

Now you can run
        juju add-model <model-name>
to create a new model to deploy k8s workloads.
mthaddon@finistere:~$ microk8s kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-txnvf 1/1 Running 0 7m22s
kube-system coredns-d489fb88-rcrkm 1/1 Running 0 6m36s
kube-system calico-kube-controllers-dfbcc8b44-nd98s 1/1 Running 0 7m22s
kube-system hostpath-provisioner-766849dd9d-c968j 1/1 Running 0 5m6s
ingress nginx-ingress-microk8s-controller-6kjm2 1/1 Running 0 5m6s
container-registry registry-6674bf676f-tdtbh 1/1 Running 0 5m6s
controller-microk8s-localhost controller-0 3/3 Running 1 (41s ago) 4m54s
controller-microk8s-localhost modeloperator-66fc59d5ff-jcr69 1/1 Running 0 24s
```

Looking at the logs in 3.0.2 I see the following:

```
2023-01-17T08:28:16.793Z [pebble] HTTP API server listening on ":38812".
2023-01-17T08:28:16.793Z [pebble] Started daemon.
2023-01-17T08:28:16.796Z [pebble] POST /v1/services 3.011123ms 202
2023-01-17T08:28:16.797Z [pebble] Started default services with change 1.
2023-01-17T08:28:16.798Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0 --controller
2023-01-17T08:28:16.820Z [container-agent] 2023-01-17 08:28:16 INFO juju.cmd supercommand.go:56 running containerAgent [3.0.2 8bf53dc35b25145ef39051fe4136135a3dd53d5d gc go1.19.3]
2023-01-17T08:28:16.820Z [container-agent] starting containeragent unit command
2023-01-17T08:28:16.820Z [container-agent] containeragent unit "unit-controller-0" start (3.0.2 [gc])
2023-01-17T08:28:16.820Z [container-agent] 2023-01-17 08:28:16 INFO juju.cmd.containeragent.unit runner.go:556 start "unit"
2023-01-17T08:28:16.820Z [container-agent] 2023-01-17 08:28:16 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.0.2 have already been run.
2023-01-17T08:28:16.821Z [container-agent] 2023-01-17 08:28:16 INFO juju.worker.probehttpserver server.go:157 starting http server on 127.0.0.1:65301
2023-01-17T08:28:17.232Z [container-agent] 2023-01-17 08:28:17 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [3ef264] "unit-controller-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
2023-01-17T08:28:21.267Z [container-agent] 2023-01-17 08:28:21 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [3ef264] "unit-controller-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
2023-01-17T08:28:25.933Z [container-agent] 2023-01-17 08:28:25 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [3ef264] "unit-controller-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
2023-01-17T08:28:26.797Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 404
2023-01-17T08:28:26.797Z [pebble] Check "liveness" failure 1 (threshold 3): received non-20x status code 404
2023-01-17T08:28:31.018Z [container-agent] 2023-01-17 08:28:31 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [3ef264] "unit-controller-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
2023-01-17T08:28:36.799Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 404
2023-01-17T08:28:36.799Z [pebble] Check "liveness" failure 2 (threshold 3): received non-20x status code 404
2023-01-17T08:28:37.157Z [container-agent] 2023-01-17 08:28:37 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [3ef264] "unit-controller-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
2023-01-17T08:28:45.721Z [container-agent] 2023-01-17 08:28:45 ERROR juju.worker.dependency engine.go:695 "api-caller" manifold worker returned unexpected error: [3ef264] "unit-controller-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
2023-01-17T08:28:46.796Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 404
2023-01-17T08:28:46.796Z [pebble] Check "readiness" failure threshold 3 hit, triggering action
2023-01-17T08:28:46.796Z [pebble] Check "liveness" failure 3 (threshold 3): received non-20x status code 404
2023-01-17T08:28:46.796Z [pebble] Check "liveness" failure threshold 3 hit, triggering action
2023-01-17T08:28:46.796Z [pebble] Service "container-agent" on-check-failure action is "restart", terminating process before restarting
2023-01-17T08:28:46.801Z [pebble] Service "container-agent" exited after check failure, restarting
2023-01-17T08:28:46.801Z [pebble] Service "container-agent" on-check-failure action is "restart", waiting ~500ms before restart (backoff 1)
2023-01-17T08:28:47.340Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0 --controller
2023-01-17T08:28:47.368Z [container-agent] 2023-01-17 08:28:47 INFO juju.cmd supercommand.go:56 running containerAgent [3.0.2 8bf53dc35b25145ef39051fe4136135a3dd53d5d gc go1.19.3]
2023-01-17T08:28:47.368Z [container-agent] starting containeragent unit command
2023-01-17T08:28:47.368Z [container-agent] containeragent unit "unit-controller-0" start (3.0.2 [gc])
2023-01-17T08:28:47.368Z [container-agent] 2023-01-17 08:28:47 INFO juju.cmd.containeragent.unit runner.go:556 start "unit"
```

Tags: canonical-is
Tom Haddon (mthaddon)
summary: - controller pods restarted on microk8s as part of bootstrap
+ controller pod restarted on microk8s as part of bootstrap
tags: added: canonical-is
Revision history for this message
Ian Booth (wallyworld) wrote :

This seems like a timing / order of startup issue with the juju workers and how the probes have been configured. Since everything does come good, it's more annoying than a critical issue, but one we should address for sure.

Changed in juju:
milestone: none → 3.2-rc1
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Tom Haddon (mthaddon) wrote :

A bit more detail on this. I've deployed Juju 3.1.0, and see the same restart of the controller pod. It's the "mongodb" pod that restarting, and looking at the log for that pod I see:

```
$ microk8s kubectl logs -p -n controller-microk8s-localhost controller-0 -c mongodb
{"t":{"$date":"2023-03-03T13:30:08.899+00:00"},"s":"E", "c":"NETWORK", "id":23248, "ctx":"main","msg":"Cannot read certificate file","attr":{"keyFile":"/var/lib/juju/server.pem","error":"error:02001002:system library:fopen:No such file or directory"}}
{"t":{"$date":"2023-03-03T13:30:08.899+00:00"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"main","msg":"Error during global initialization","attr":{"error":{"code":140,"codeName":"InvalidSSLConfiguration","errmsg":"Can not set up PEM key file."}}}
$
```

Changed in juju:
milestone: 3.2-rc1 → 3.2.0
Revision history for this message
Patrizio Bassi (patrizio-bassi) wrote :

i have the same issue on charmed kubernetes:

kubectl logs pod/controller-0 -n controller-paas-klab01-tst2-cloud
Defaulted container "mongodb" out of: mongodb, api-server
{"t":{"$date":"2023-05-23T13:00:01.495Z"},"s":"W", "c":"CONTROL", "id":23322, "ctx":"main","msg":"Option: sslMode is deprecated. Please use tlsMode instead."}
{"t":{"$date":"2023-05-23T13:00:01.495Z"},"s":"W", "c":"CONTROL", "id":23321, "ctx":"main","msg":"Option: This name is deprecated. Please use the preferred name instead.","attr":{"deprecatedName":"sslPEMKeyFile","preferredName":"tlsCertificateKeyFile"}}
{"t":{"$date":"2023-05-23T13:00:01.495Z"},"s":"W", "c":"CONTROL", "id":23321, "ctx":"main","msg":"Option: This name is deprecated. Please use the preferred name instead.","attr":{"deprecatedName":"sslPEMKeyPassword","preferredName":"tlsCertificateKeyFilePassword"}}
{"t":{"$date":"2023-05-23T13:00:01.498+00:00"},"s":"E", "c":"NETWORK", "id":23248, "ctx":"main","msg":"Cannot read certificate file","attr":{"keyFile":"/var/lib/juju/server.pem","error":"error:02001002:system library:fopen:No such file or directory"}}
{"t":{"$date":"2023-05-23T13:00:01.498+00:00"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"main","msg":"Error during global initialization","attr":{"error":{"code":140,"codeName":"InvalidSSLConfiguration","errmsg":"Can not set up PEM key file."}}}

Changed in juju:
milestone: 3.2.0 → 3.2.1
Changed in juju:
milestone: 3.2.1 → 3.2.2
Changed in juju:
milestone: 3.2.2 → 3.2.3
Changed in juju:
milestone: 3.2.3 → 3.2.4
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.