Actions do not run correctly on new OSDs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | New | Undecided | Unassigned |
Bug Description
Hi,
I have a Ceph Cluster with 3 ceph-mons (rev 58) and 3 ceph-osd (rev 312).
In the initial cluster I had one osd-device per OSD, this is a testing env so I used folder based OSDs:
$ juju config ceph-osd osd-devices
/srv/osd
I've added one more osd-device per OSD and everything was fine; I used crush-initial-
$ juju config ceph-osd osd-devices=
$ juju ssh ceph-mon/0 sudo ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.42339 root default
-3 0.14169 host juju-a21053-4
1 hdd 0.14169 osd.1 up 1.00000 1.00000
5 hdd 0 osd.5 up 1.00000 1.00000
-5 0.14090 host juju-a21053-5
0 hdd 0.14090 osd.0 up 1.00000 1.00000
4 hdd 0 osd.4 up 1.00000 1.00000
-7 0.14079 host juju-a21053-6
2 hdd 0.14079 osd.2 up 1.00000 1.00000
3 hdd 0 osd.3 up 1.00000 1.00000
Now I wanted to take the new OSDs (IDs 3, 4, and 5) out. It only worked for the one with ID 5; for the rest, the action failed to recognize them:
$ juju run-action ceph-osd/0 --wait osd-out osds=5
unit-ceph-osd-0:
UnitId: ceph-osd/0
id: "66"
results:
message: osd-out action was successfully executed for ceph OSD devices [5]
outputs: "marked out osd.5. \n"
status: completed
timing:
completed: 2021-10-22 11:16:20 +0000 UTC
enqueued: 2021-10-22 11:16:14 +0000 UTC
started: 2021-10-22 11:16:18 +0000 UTC
$ juju run-action ceph-osd/0 --wait osd-out osds=3
unit-ceph-osd-0:
UnitId: ceph-osd/0
id: "68"
message: 'invalid ceph OSD device id: 3'
results: {}
status: failed
timing:
completed: 2021-10-22 11:16:23 +0000 UTC
enqueued: 2021-10-22 11:16:22 +0000 UTC
started: 2021-10-22 11:16:22 +0000 UTC
$ juju run-action ceph-osd/0 --wait osd-out osds=4
unit-ceph-osd-0:
UnitId: ceph-osd/0
id: "70"
message: 'invalid ceph OSD device id: 4'
results: {}
status: failed
timing:
completed: 2021-10-22 11:16:28 +0000 UTC
enqueued: 2021-10-22 11:16:26 +0000 UTC
started: 2021-10-22 11:16:27 +0000 UTC
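For context on the failures above: the pattern may be that the osd-out action only accepts OSD IDs hosted on the target unit itself (per the osd tree, osd.5 is the new OSD on one host, while osd.3 and osd.4 live on the other two). A minimal sketch of that kind of per-unit ID validation, assuming the conventional /var/lib/ceph/osd/ceph-&lt;id&gt; directory layout — this is an illustration of the suspected behaviour, not the charm's actual code:

```python
import os

def local_osd_ids(osd_dir="/var/lib/ceph/osd"):
    """Return the OSD IDs registered on this machine, based on the
    conventional /var/lib/ceph/osd/ceph-<id> directory layout."""
    ids = set()
    for entry in os.listdir(osd_dir):
        # Entries look like "ceph-3"; keep only the numeric suffix.
        _cluster, sep, osd_id = entry.partition("-")
        if sep and osd_id.isdigit():
            ids.add(int(osd_id))
    return ids

def validate_osd(osd_id, osd_dir="/var/lib/ceph/osd"):
    """Reject IDs not present locally, mirroring the action's error."""
    if osd_id not in local_osd_ids(osd_dir):
        raise ValueError("invalid ceph OSD device id: %d" % osd_id)
```

Under that assumption, running the action on ceph-osd/0 for an OSD that lives on another unit's machine would produce exactly the "invalid ceph OSD device id" message seen above.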
However, I can see the OSDs with "ceph osd find":
$ juju ssh ceph-mon/0 sudo ceph osd find osd.3
{
    "osd": 3,
    "ip": "10.0.8.
    "osd_fsid": "9cbe1c64-
    "crush_location": {
        "host": "juju-a21053-6",
        "root": "default"
    }
}
$ juju ssh ceph-mon/0 sudo ceph osd find osd.5
{
    "osd": 5,
    "ip": "10.0.8.
    "osd_fsid": "3741ec57-
    "crush_location": {
        "host": "juju-a21053-4",
        "root": "default"
    }
}
$ juju ssh ceph-mon/0 sudo ceph osd find 4
{
    "osd": 4,
    "ip": "10.0.8.
    "osd_fsid": "fdd8546d-
    "crush_location": {
        "host": "juju-a21053-5",
        "root": "default"
    }
}
Using "osd-out osds=osd.3" has the same result, no osd found.
Other actions have the same result:
$ juju run-action ceph-osd/0 --wait stop osds=4
unit-ceph-osd-0:
UnitId: ceph-osd/0
id: "100"
message: 'Action ''stop'' failed: Some services are not present on this unit: [''ceph-
results: {}
status: failed
timing:
completed: 2021-10-22 11:29:33 +0000 UTC
enqueued: 2021-10-22 11:29:31 +0000 UTC
started: 2021-10-22 11:29:33 +0000 UTC