Cinder-ceph stuck waiting: Ceph broker request incomplete
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Cinder-Ceph charm | New | Undecided | Unassigned |
OpenStack Glance Charm | New | Undecided | Unassigned |
Bug Description
In testrun: https:/
One cinder-ceph unit (cinder-ceph/0) gets stuck in the waiting state with the message "Ceph broker request incomplete":
```
cinder/0 active idle 3/lxd/2 10.246.166.187 8776/tcp Unit is ready
cinder-ceph/2 active idle 10.246.166.187 Unit is ready
cinder-
filebeat/55 active idle 10.246.166.187 Filebeat ready.
hacluster-
landscape-
logrotated/50 active idle 10.246.166.187 Unit is ready.
nrpe/61 active idle 10.246.166.187 icmp,5666/tcp Ready
public-
telegraf/54 active idle 10.246.166.187 9103/tcp Monitoring cinder/0 (source version/commit cc7fa21)
cinder/1* active idle 4/lxd/2 10.246.167.106 8776/tcp Unit is ready
cinder-ceph/0* waiting idle 10.246.167.106 Ceph broker request incomplete
cinder-
filebeat/42 active idle 10.246.167.106 Filebeat ready.
hacluster-
landscape-
logrotated/36 active idle 10.246.167.106 Unit is ready.
nrpe/46 active idle 10.246.167.106 icmp,5666/tcp Ready
public-
telegraf/42 active idle 10.246.167.106 9103/tcp Monitoring cinder/1 (source version/commit cc7fa21)
cinder/2 active idle 5/lxd/2 10.246.166.236 8776/tcp Unit is ready
cinder-ceph/1 active idle 10.246.166.236 Unit is ready
cinder-
filebeat/48 active idle 10.246.166.236 Filebeat ready.
hacluster-
landscape-
logrotated/42 active idle 10.246.166.236 Unit is ready.
nrpe/52 active idle 10.246.166.236 icmp,5666/tcp Ready
public-
telegraf/48 active idle 10.246.166.236 9103/tcp Monitoring cinder/2 (source version/commit cc7fa21)
```
The relation with ceph is rendered correctly, and in the logs we only see this message indicating a problem:
```
2022-05-31 12:00:59 DEBUG unit.cinder-
```
Link to crashdumps:
https:/
Hi,
I ran into this issue as well. In my case, for some reason the ceph config on the faulty cinder-ceph unit wasn't generated correctly and was lacking the necessary entries to connect to ceph.
I fixed that by manually running a config-changed hook (juju run -u cinder-ceph/0 hooks/config-changed) on the affected unit.
That generated the config (I restarted all cinder services on the unit as well), but the unit was still stuck in waiting.
My guess from skimming the code: it is waiting for a response from ceph that was never really sent. I commented out the check whether the response had already been sent (https://opendev.org/openstack/charm-cinder-ceph/src/commit/a973d9351ed6123d2be4dce909acca91bcca245d/charmhelpers/contrib/storage/linux/ceph.py#L2220), thereby forcing it to create a new request when I manually ran the ceph-relation-changed hook. I think just removing the "broker-rsp-cinder-ceph-0" relation data for ceph-mon might also have worked, without hacking the code.
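To make the guess above easier to follow, here is a rough, self-contained sketch of how a unit might decide whether its broker request is complete. It is not the charmhelpers code: the function names are hypothetical, and the "request-id" and "exit-code" fields are my assumption about the response format. Only the "broker-rsp-cinder-ceph-0" key naming is taken from this report.

```python
import json


def response_key(unit_name):
    # Key under which ceph-mon is assumed to publish the response for a unit,
    # e.g. "cinder-ceph/0" -> "broker-rsp-cinder-ceph-0" (naming as seen above).
    return "broker-rsp-" + unit_name.replace("/", "-")


def request_state(sent_request_id, mon_relation_data, unit_name):
    # Hypothetical helper: classify the broker request as seen from the unit.
    # mon_relation_data stands in for the relation data set by a ceph-mon unit.
    raw = mon_relation_data.get(response_key(unit_name))
    if raw is None:
        return "no-response"      # nothing was ever answered
    rsp = json.loads(raw)
    if rsp.get("request-id") != sent_request_id:
        return "stale-response"   # answer belongs to an older request
    return "complete" if rsp.get("exit-code") == 0 else "failed"


# Illustrative data only: a response left over from an earlier request.
mon_data = {
    "broker-rsp-cinder-ceph-0": json.dumps({"request-id": "old", "exit-code": 0}),
}
print(request_state("new", mon_data, "cinder-ceph/0"))
```

If the unit believes its request was already sent but no matching response ever arrives, it would stay in "waiting" indefinitely under this model, which matches the symptom described here.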
Anyway, I hope this helps any future travelers coming across this issue.