So you say:
"In alertmanager, stop is not even observed."
But then you say:
"INFO juju.model:model.py:2690 Waiting for model:
alertmanager/0 [idle] error: hook failed: "stop""
However, that clearly indicates that stop was observed, as you got an error in the hook event.
Plausibly (given the logs that you linked to) you meant to say:
"In traefik, stop is not even observed."
I can't tell, because I do see the statuses getting to:
"
INFO juju.model:model.py:2690 Waiting for model:
grafana/0 [idle] error: hook failed: "stop"
loki/0 [idle] error: hook failed: "stop"
traefik/0 [idle] active:
INFO juju.model:model.py:2690 Waiting for model:
grafana/0 [idle] error: hook failed: "stop"
loki/0 [idle] error: hook failed: "stop"
INFO juju.model:model.py:2690 Waiting for model:
grafana/0 [idle] error: hook failed: "stop"
loki/0 [idle] error: hook failed: "stop"
"
That seems to say that traefik did its thing, was then happy, and the problem is only that both grafana and loki are in an error state.
Now, I don't know why those are in an error state. From what you linked in the charm, there doesn't seem to be much to go wrong (all I'm doing is setting the unit status).
However, there is a *lot* of other code that gets executed while running that stop hook, which could be failing. For example:
    class GrafanaCharm(CharmBase):
        ...
        def __init__(self, *args):
            ...
            self.containers = {
                "workload": self.unit.get_container(self.name),
                "replication": self.unit.get_container("litestream"),
            }
^- is there something problematic while trying to grab containers while tearing down?
            ...
            self.metrics_endpoint = MetricsEndpointProvider(
                charm=self,
                jobs=self._scrape_jobs,
                refresh_event=[
                    self.on.grafana_pebble_ready,  # pyright: ignore
                    self.on.update_status,
                ],
            )
^- is MetricsEndpointProvider running into anything? (I'm guessing you're passing in the events that you're asking it to act on, but since you're passing in 'self' here, it could be doing lots of things in '__init__', possibly even registering an on.stop handler.)
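To make that last concern concrete, here is a toy sketch in plain Python (no ops import). Every name in it (ToyEvent, ToyCharm, ToyEndpointProvider) is hypothetical, and it is not a claim about what MetricsEndpointProvider actually does. It only shows how an object that receives 'charm=self' is able to attach a handler to an event the charm never passed to it, so that a failure in that handler surfaces as a failed stop hook.

```python
class ToyEvent:
    """Hypothetical stand-in for a framework event source."""

    def __init__(self, name):
        self.name = name
        self.handlers = []


class ToyEndpointProvider:
    """Hypothetical library object that is handed the whole charm."""

    def __init__(self, charm, refresh_events):
        # The events the caller explicitly asked for.
        for event in refresh_events:
            charm.observe(event, self._refresh)
        # Because it holds the charm, it can also reach events nobody passed in.
        charm.observe(charm.on["stop"], self._cleanup)

    def _refresh(self):
        pass

    def _cleanup(self):
        # Assumed failure mode, purely for illustration.
        raise RuntimeError("workload container is already gone")


class ToyCharm:
    """Hypothetical charm whose own stop handler only sets the unit status."""

    def __init__(self):
        self.on = {name: ToyEvent(name) for name in ("stop", "update_status")}
        self.unit_status = "active"
        self.endpoint = ToyEndpointProvider(
            charm=self, refresh_events=[self.on["update_status"]]
        )
        self.observe(self.on["stop"], self._on_stop)

    def observe(self, event, handler):
        event.handlers.append(handler)

    def emit(self, name):
        # Run handlers in registration order; the first exception aborts the hook.
        for handler in self.on[name].handlers:
            handler()

    def _on_stop(self):
        self.unit_status = "maintenance"
```

In this sketch the charm's own stop handler is trivial, yet emitting "stop" still raises, because the provider's handler was registered first and fails before the charm's handler runs.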
What we'd really need is to see more of why the hooks themselves failed, which doesn't seem to be exposed by the CI suite.
All it says is "I didn't become idle", with no drilling down into "here's the thing that didn't become idle, and what rationale it has for not being happy".
That might be something that should be addressed in python-libjuju's `wait_for_idle`, though that is really a very heavy lift for such a simple function.
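In the meantime, a minimal sketch of something the test suite could do on its own, without any change to python-libjuju: scan the captured "Waiting for model" output for units in error and pull out which hook failed. The function name failed_hooks and the regular expressions are my own, and they assume the line format stays exactly as in the logs quoted above.

```python
import re

# Matches lines such as: grafana/0 [idle] error: hook failed: "stop"
_STATUS_LINE = re.compile(
    r"^\s*(?P<unit>[a-z0-9-]+/\d+)\s+\[(?P<agent>[^\]]+)\]\s+"
    r"(?P<status>\w+):\s*(?P<message>.*)$"
)
_HOOK_FAILED = re.compile(r'hook failed: "(?P<hook>[^"]+)"')


def failed_hooks(log_text):
    """Return {unit: hook} for every unit reported in error with a failed hook."""
    failures = {}
    for line in log_text.splitlines():
        match = _STATUS_LINE.match(line)
        if not match or match["status"] != "error":
            continue  # header lines and healthy units are skipped
        hook = _HOOK_FAILED.search(match["message"])
        if hook:
            failures[match["unit"]] = hook["hook"]
    return failures


SAMPLE = """INFO juju.model:model.py:2690 Waiting for model:
grafana/0 [idle] error: hook failed: "stop"
loki/0 [idle] error: hook failed: "stop"
traefik/0 [idle] active:
"""

if __name__ == "__main__":
    for unit, hook in failed_hooks(SAMPLE).items():
        print(f"{unit}: {hook} hook failed")
```

With the unit names in hand, the CI job could then collect something like `juju debug-log --replay --include unit-grafana-0` or `juju show-status-log grafana/0` for each one, which is where the actual traceback from the hook would be expected to show up.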