Activity log for bug #1789828

Date Who What changed Old value New value Message
2018-08-30 06:41:24 Hua Zhang bug added bug
2018-08-30 06:44:40 Dominique Poulain bug added subscriber Dominique Poulain
2018-08-30 07:01:00 Hua Zhang description When using rbd_exclusive_cinder_pool default value False, there is a racy between _get_usage_info and delete_volume. 2018-08-29 06:57:41.604 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] deleting rbd volume volume-2d5951be-25bf-4313-a706-593664f6cd2e delete_volume /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:977 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] error opening rbd image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2: ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd Traceback (most recent call last): 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 147, in __init__ 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd read_only=read_only) 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ceph-B2ToPL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rbd/pyrex/rbd.c:13545) 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd 2018-08-29 06:57:41.612 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] Image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 is not found. _get_usage_info /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:409 Eg: when thread 1 is deleting volume A, then thread 2 that deleted volume B is trying to update the usage and when it queried the database volume A was still "available" but then when it queries Ceph about that volume it's no longer there, so basically the logic to update the usage statistics is racy. _get_usage_info is run in a native thread via RBDVolumeProxy, native thread uses blocking mode, it means once _get_usage_info throws an exception ImageNotFound, the green thread which spawns this native thread won't be able to yield in time so that all other green threads will be blocked as well, at this time, any image operation like delete/query/modify/create will become impossible, then you will see all kinds of strange questions. When using rbd_exclusive_cinder_pool default value False, there is a racy between _get_usage_info and delete_volume. 2018-08-29 06:57:41.604 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] deleting rbd volume volume-2d5951be-25bf-4313-a706-593664f6cd2e delete_volume /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:977 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] error opening rbd image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2: ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd Traceback (most recent call last): 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 147, in __init__ 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd read_only=read_only) 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ceph-B2ToPL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rbd/pyrex/rbd.c:13545) 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None 2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd 2018-08-29 06:57:41.612 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] Image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 is not found. _get_usage_info /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:409 Eg: when thread 1 is deleting volume A, then thread 2 that deleted volume B is trying to update the usage and when it queried the database volume A was still "available" but then when it queries Ceph about that volume it's no longer there, so basically the logic to update the usage statistics is racy. _get_usage_info is run in a native thread via RBDVolumeProxy, native thread uses blocking mode, it means once _get_usage_info throws an exception ImageNotFound, the green thread which spawns this native thread won't be able to yield in time so that all other green threads will be blocked as well, at this time, any image operation like delete/query/modify/create will become impossible, then you will see all kinds of strange problems. How to reproduce: 1, set up a test env with the default rbd_exclusive_cinder_pool=False 2, create 100 test volumes 3, delete 100 test volumes - openstack volume list| egrep -v "^\+-+|ID"| awk '{print $2}'| xargs openstack volume delete 4, then the exception ImageNotFound should can be seen from cinder-volume.log (if not see, can create/delete more volumes), at this time the time will obviously increase when deleting a volume.
2018-08-30 13:21:54 Sean McGinnis tags ceph drivers rbd
2018-08-30 13:25:15 Felipe Reyes bug added subscriber Felipe Reyes
2018-08-31 03:45:01 Xav Paice tags ceph drivers rbd canonical-bootstack ceph drivers rbd
2018-08-31 03:45:24 Xav Paice bug added subscriber Canonical IS BootStack
2018-09-03 15:55:14 Dmitrii Shcherbakov bug added subscriber Dmitrii Shcherbakov
2018-09-03 19:59:23 Edward Hope-Morley summary racy between _get_usage_info and delete_volume race between threaded rbd operations
2018-09-05 11:47:16 Edward Hope-Morley bug task added charm-cinder
2018-09-05 11:47:27 Edward Hope-Morley charm-cinder: milestone 18.11
2018-09-05 11:47:36 Edward Hope-Morley charm-cinder: importance Undecided Medium
2018-09-05 11:47:40 Edward Hope-Morley charm-cinder: assignee Edward Hope-Morley (hopem)
2018-09-05 11:48:42 Edward Hope-Morley affects charm-cinder charm-cinder-ceph
2018-09-05 11:48:42 Edward Hope-Morley charm-cinder-ceph: milestone 18.11
2018-09-05 11:48:42 Edward Hope-Morley charm-cinder-ceph: assignee Edward Hope-Morley (hopem)
2018-09-05 11:48:54 Edward Hope-Morley charm-cinder-ceph: milestone 18.11
2018-09-05 11:48:59 Edward Hope-Morley charm-cinder-ceph: assignee Edward Hope-Morley (hopem)
2018-09-05 13:57:01 OpenStack Infra charm-cinder-ceph: status New In Progress
2018-09-13 17:30:48 Edward Hope-Morley tags canonical-bootstack ceph drivers rbd canonical-bootstack ceph drivers rbd stable-backport
2018-09-14 16:19:20 OpenStack Infra charm-cinder-ceph: status In Progress Fix Committed
2018-11-20 00:10:32 David Ames charm-cinder-ceph: status Fix Committed Fix Released