Unit in unknown status - Too little info about what went wrong
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Unassigned |
Bug Description
If you deploy 2 prometheus[1] units and one extra unit of another charm, one of these units ends up in an unknown state, with too little information about what went wrong.
To Reproduce:
- Deploy one prometheus called "external-prometheus":
juju deploy ./*.charm external-prometheus --resource prometheus-image=ubuntu/
- Deploy two prometheus units:
juju deploy ./*.charm prometheus -n 2 --resource prometheus-
And I get the following status:
$ juju status --color --relations
Model Controller Cloud/Region Version SLA Timestamp
cos-lite charm-dev microk8s/localhost 2.9.35 unsupported 17:19:56-03:00
App Version Status Scale Charm Channel Rev Address Exposed Message
external-prometheus 2.33.5 active 1 prometheus-k8s 7 10.152.183.143 no
prometheus 2.33.5 waiting 1/2 prometheus-k8s 8 10.152.183.35 no installing agent
Unit Workload Agent Address Ports Message
external-
prometheus/0* unknown lost agent lost, see 'juju show-status-log prometheus/0'
prometheus/1 active idle 10.1.207.141
Relation provider Requirer Interface Type Message
external-prometheus:
juju debug-log:
unit-prometheus-0: 17:19:04.625 WARNING juju.worker.
unit-prometheus-0: 17:19:04.647 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/
unit-prometheus-0: 17:19:04.663 INFO juju.worker.
unit-prometheus-0: 17:19:04.663 INFO juju.worker.
unit-prometheus-0: 17:19:04.749 INFO juju.worker.uniter unit "prometheus/0" started
unit-prometheus-0: 17:19:04.784 INFO juju.worker.uniter hooks are retried true
unit-prometheus-0: 17:19:04.963 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-prometheus-0: 17:19:05.537 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-prometheus-1: 17:19:05.569 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-prometheus-0: 17:19:06.551 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-prometheus-1: 17:19:07.472 INFO unit.prometheus
unit-prometheus-1: 17:19:07.538 INFO unit.prometheus
unit-prometheus-1: 17:19:07.785 INFO unit.prometheus
unit-prometheus-1: 17:19:09.183 INFO unit.prometheus
unit-prometheus-1: 17:19:09.687 INFO juju.worker.
unit-prometheus-0: 17:19:09.752 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-prometheus-1: 17:19:09.775 INFO juju.worker.uniter found queued "start" hook
unit-prometheus-0: 17:19:10.006 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-prometheus-1: 17:19:10.903 INFO unit.prometheus
unit-prometheus-0: 17:19:11.308 INFO unit.prometheus
unit-prometheus-1: 17:19:12.415 INFO juju.worker.
unit-prometheus-1: 17:19:14.023 INFO juju.worker.
unit-prometheus-0: 17:19:14.767 INFO juju.worker.
unit-prometheus-1: 17:19:15.343 INFO unit.prometheus
unit-prometheus-1: 17:19:16.238 INFO juju.worker.
unit-prometheus-1: 17:19:18.068 INFO juju.worker.
unit-prometheus-0: 17:19:18.517 INFO unit.prometheus
unit-prometheus-0: 17:19:20.051 INFO juju.worker.
unit-prometheus-1: 17:19:20.398 INFO juju.worker.
unit-prometheus-0: 17:19:22.630 INFO juju.worker.
unit-prometheus-0: 17:19:22.840 ERROR juju.worker.
unit-prometheus-0: 17:19:22.851 INFO juju.worker.uniter awaiting error resolution for "relation-joined" hook
unit-prometheus-0: 17:19:23.190 INFO juju.worker.uniter awaiting error resolution for "relation-joined" hook
unit-prometheus-1: 17:20:05.429 INFO juju.worker.
unit-prometheus-1: 17:20:05.464 INFO juju.worker.uniter found queued "leader-elected" hook
unit-prometheus-1: 17:20:07.214 INFO juju.worker.
And the status log:
$ juju show-status-log prometheus/0
Time Type Status Message
14 Oct 2022 17:18:05-03:00 juju-unit executing running prometheus-
14 Oct 2022 17:18:17-03:00 juju-unit error hook failed: "prometheus-
14 Oct 2022 17:18:22-03:00 juju-unit executing running prometheus-
14 Oct 2022 17:18:24-03:00 juju-unit executing running leader-elected hook
14 Oct 2022 17:18:29-03:00 juju-unit executing running prometheus-
14 Oct 2022 17:18:34-03:00 juju-unit executing running database-
14 Oct 2022 17:18:37-03:00 juju-unit executing running config-changed hook
14 Oct 2022 17:18:43-03:00 juju-unit executing running start hook
14 Oct 2022 17:18:46-03:00 juju-unit error hook failed: "start"
14 Oct 2022 17:18:55-03:00 juju-unit error crash loop backoff: back-off 10s restarting failed container=charm pod=prometheus-
14 Oct 2022 17:18:55-03:00 workload maintenance installing charm software
14 Oct 2022 17:19:04-03:00 juju-unit error hook failed: "start"
14 Oct 2022 17:19:10-03:00 juju-unit executing running start hook
14 Oct 2022 17:19:14-03:00 workload unknown
14 Oct 2022 17:19:15-03:00 juju-unit executing running prometheus-
14 Oct 2022 17:19:18-03:00 workload waiting Waiting for resource limit patch to apply
14 Oct 2022 17:19:20-03:00 juju-unit executing running prometheus-
14 Oct 2022 17:19:22-03:00 juju-unit error hook failed: "prometheus-
14 Oct 2022 17:19:36-03:00 juju-unit idle
14 Oct 2022 17:19:36-03:00 workload blocked 0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
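Note that the actual root cause only surfaces in the final workload row of that log, below the hook-failure noise. A small filter like the following pulls out just the workload rows (a sketch: the sample line is copied from the log above; on a live model you would pipe the output of `juju show-status-log prometheus/0` into the same awk instead):

```shell
# Filter a juju status log down to its workload rows.
# The sample line is inlined from the status log above; in each row,
# field 5 is the Type column ("juju-unit" or "workload").
log='14 Oct 2022 17:19:36-03:00  workload  blocked  0/1 nodes are available: 1 Insufficient memory.'
printf '%s\n' "$log" | awk '$5 == "workload"'
```

This prints only the workload-status entries, so the "Insufficient memory" message is not buried among the hook retries.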
And K8s describe pod:
$ microk8s.kubectl -n cos-lite describe pods/prometheus-0
Name: prometheus-0
Namespace: cos-lite
Priority: 0
Service Account: prometheus
Node: <none>
Labels: app.kubernetes.
Annotations: controller.
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/
Init Containers:
charm-init:
Image: jujusolutions/
Port: <none>
Host Port: <none>
Command:
/
Args:
init
-
/
-
0
--data-dir
/var/lib/juju
--bin-dir
/charm/bin
Environment Variables from:
prometheu
Environment:
JUJU_
JUJU_
JUJU_
Mounts:
/charm/bin from charm-data (rw,path=
/
/
/var/lib/juju from charm-data (rw,path=
/
Containers:
charm:
Image: jujusolutions/
Port: <none>
Host Port: <none>
Command:
/
Args:
run
--http
:38812
--verbose
Liveness: http-get http://
Readiness: http-get http://
Environment:
JUJU_
HTTP_
Mounts:
/charm/bin from charm-data (ro,path=
/
/var/lib/juju from charm-data (rw,path=
/
/
/
prometheus:
Image: ubuntu/
Port: <none>
Host Port: <none>
Command:
/
Args:
run
--create-dirs
--hold
--http
:38813
--verbose
Requests:
cpu: 250m
memory: 200Mi
Liveness: http-get http://
Readiness: http-get http://
Environment:
JUJU_
PEBBLE_
Mounts:
/
/
/
/
Conditions:
Type Status
PodScheduled False
Volumes:
prometheus-
Type: PersistentVolum
ClaimName: prometheus-
ReadOnly: false
charm-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpira
ConfigMapName: kube-root-ca.crt
ConfigMapOp
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.
Tolerations: node.kubernetes
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 5s default-scheduler 0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Warning FailedScheduling 4s default-scheduler 0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
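Those FailedScheduling events mean the pod's requests (250m CPU, 200Mi memory, per the spec above) do not fit in what the node still has unreserved. A back-of-the-envelope check of the memory side looks like this (only the 200Mi request comes from the report; the allocatable and already-requested figures are placeholders, to be read from `microk8s.kubectl describe node` on a real cluster):

```shell
# Placeholder figures: request_mi (200Mi) is the pod's memory request
# from the spec above; the other two stand in for what
# `kubectl describe node` would report on the congested microk8s node.
request_mi=200
allocatable_mi=3933
already_requested_mi=3800
free_mi=$((allocatable_mi - already_requested_mi))
if [ "$free_mi" -lt "$request_mi" ]; then
  echo "cannot schedule: need ${request_mi}Mi, node has only ${free_mi}Mi unreserved"
fi
```

If the real numbers confirm memory pressure, freeing memory on the node (or lowering the charm's resource requests) lets the scheduler place the pod.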
More info about this issue can be found here: https:/
Changed in juju:
  milestone: 2.9.37 → 2.9.38
Changed in juju:
  milestone: 2.9.38 → 2.9.39
Changed in juju:
  milestone: 2.9.39 → 2.9.40
Changed in juju:
  milestone: 2.9.40 → 2.9.41
Changed in juju:
  milestone: 2.9.41 → 2.9.42
Changed in juju:
  milestone: 2.9.42 → 2.9.43
Changed in juju:
  milestone: 2.9.43 → 2.9.44
Changed in juju:
  milestone: 2.9.44 → 2.9.45
Changed in juju:
  milestone: 2.9.45 → 2.9.46
I will set this bug to invalid because I see that the parallel discussion on GitHub [1] has been closed.
[1] https://github.com/canonical/prometheus-k8s-operator/issues/389