Eg: when thread 1 is deleting volume A, then thread 2 that deleted volume B is trying to update the usage and when it queried the database volume A was still "available" but then when it queries Ceph about that volume it's no longer there, so basically the logic to update the usage statistics is racy.
_get_usage_info is run in a native thread via RBDVolumeProxy, native thread uses blocking mode, it means once _get_usage_info throws an exception ImageNotFound, the green thread which spawns this native thread won't be able to yield in time so that all other green threads will be blocked as well, at this time, any image operation like delete/query/modify/create will become impossible, then you will see all kinds of strange questions.
When using rbd_exclusive_ cinder_ pool default value False, there is a racy between _get_usage_info and delete_volume.
2018-08-29 06:57:41.604 1586622 DEBUG cinder. volume. drivers. rbd [req-f885ea4a- 3e98-4aad- bb3f-102240aebc a1 10cbcda6a7854fa 79cfc37dc1945cb 6d 5d5d0f0ab738467 f8ca813dd41432a fa - a51502c6e125414 fbba0cc95decd86 c5 a51502c6e125414 fbba0cc95decd86 c5] deleting rbd volume volume- 2d5951be- 25bf-4313- a706-593664f6cd 2e delete_volume /usr/lib/ python2. 7/dist- packages/ cinder/ volume/ drivers/ rbd.py: 977 volume. drivers. rbd [req-f885ea4a- 3e98-4aad- bb3f-102240aebc a1 10cbcda6a7854fa 79cfc37dc1945cb 6d 5d5d0f0ab738467 f8ca813dd41432a fa - a51502c6e125414 fbba0cc95decd86 c5 a51502c6e125414 fbba0cc95decd86 c5] error opening rbd image volume- 005a04e7- a113-4ebb- bd77-1d9d3221d8 f2: ImageNotFound: [errno 2] error opening image volume- 005a04e7- a113-4ebb- bd77-1d9d3221d8 f2 at snapshot None volume. drivers. rbd Traceback (most recent call last): volume. drivers. rbd File "/usr/lib/ python2. 7/dist- packages/ cinder/ volume/ drivers/ rbd.py" , line 147, in __init__ volume. drivers. rbd read_only= read_only) volume. drivers. rbd File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ ceph-B2ToPL/ ceph-12. 2.4/obj- x86_64- linux-gnu/ src/pybind/ rbd/pyrex/ rbd.c:13545) volume. drivers. rbd ImageNotFound: [errno 2] error opening image volume- 005a04e7- a113-4ebb- bd77-1d9d3221d8 f2 at snapshot None volume. drivers. rbd volume. drivers. rbd [req-f885ea4a- 3e98-4aad- bb3f-102240aebc a1 10cbcda6a7854fa 79cfc37dc1945cb 6d 5d5d0f0ab738467 f8ca813dd41432a fa - a51502c6e125414 fbba0cc95decd86 c5 a51502c6e125414 fbba0cc95decd86 c5] Image volume- 005a04e7- a113-4ebb- bd77-1d9d3221d8 f2 is not found. _get_usage_info /usr/lib/ python2. 7/dist- packages/ cinder/ volume/ drivers/ rbd.py: 409
2018-08-29 06:57:41.610 1586622 ERROR cinder.
2018-08-29 06:57:41.610 1586622 ERROR cinder.
2018-08-29 06:57:41.610 1586622 ERROR cinder.
2018-08-29 06:57:41.610 1586622 ERROR cinder.
2018-08-29 06:57:41.610 1586622 ERROR cinder.
2018-08-29 06:57:41.610 1586622 ERROR cinder.
2018-08-29 06:57:41.610 1586622 ERROR cinder.
2018-08-29 06:57:41.612 1586622 DEBUG cinder.
Eg: when thread 1 is deleting volume A, then thread 2 that deleted volume B is trying to update the usage and when it queried the database volume A was still "available" but then when it queries Ceph about that volume it's no longer there, so basically the logic to update the usage statistics is racy.
_get_usage_info is run in a native thread via RBDVolumeProxy, native thread uses blocking mode, it means once _get_usage_info throws an exception ImageNotFound, the green thread which spawns this native thread won't be able to yield in time so that all other green threads will be blocked as well, at this time, any image operation like delete/ query/modify/ create will become impossible, then you will see all kinds of strange questions.