race between threaded rbd operations

Bug #1789828 reported by Hua Zhang
This bug affects 5 people
Affects                       Status         Importance   Assigned to          Milestone
Cinder                        New            Undecided    Unassigned
OpenStack Cinder-Ceph charm   Fix Released   Medium       Edward Hope-Morley   18.11

Bug Description

When using the default value of rbd_exclusive_cinder_pool (False), there is a race between _get_usage_info and delete_volume.

2018-08-29 06:57:41.604 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] deleting rbd volume volume-2d5951be-25bf-4313-a706-593664f6cd2e delete_volume /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:977
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] error opening rbd image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2: ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 147, in __init__
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd read_only=read_only)
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ceph-B2ToPL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rbd/pyrex/rbd.c:13545)
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd
2018-08-29 06:57:41.612 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] Image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 is not found. _get_usage_info /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:409

For example: thread 1 is deleting volume A while thread 2, having just deleted volume B, is updating the usage statistics. When thread 2 queried the database, volume A was still "available", but by the time it asks Ceph about that volume the image is already gone. In other words, the logic that updates the usage statistics is racy.
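
To make the race concrete, here is a minimal sketch of the pattern (hypothetical helper names, not the actual driver code): the usage helper walks a list of volumes taken from the Cinder DB and then opens each image in Ceph, so any image deleted between those two steps raises ImageNotFound.

    # Hypothetical sketch of the racy usage scan; names are illustrative.
    import rbd

    def get_usage_info(ioctx, db_volume_names):
        """Sum provisioned bytes for every volume the DB says exists."""
        total_bytes = 0
        for name in db_volume_names:  # DB snapshot taken at time T
            try:
                with rbd.Image(ioctx, name, read_only=True) as image:
                    total_bytes += image.size()
            except rbd.ImageNotFound:
                # Another thread deleted the image after the DB was read;
                # skipping it is what the fix referenced later does.
                continue
        return total_bytes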

_get_usage_info runs its image calls in a native thread via RBDVolumeProxy. Native threads use blocking mode, so once _get_usage_info raises ImageNotFound, the green thread that spawned the native call cannot yield in time and all other green threads end up blocked as well. At that point any image operation (delete/query/modify/create) becomes impossible, and all kinds of strange problems follow.
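
For reference, here is a simplified sketch of the calling pattern described above (illustrative only, not the actual driver code): librbd objects are handed to eventlet's tpool so their blocking C calls run in native OS threads, and an exception raised by such a call (e.g. rbd.ImageNotFound) surfaces in the green thread that is waiting on it.

    # Illustrative green-thread/native-thread pattern; not cinder code.
    from eventlet import tpool
    import rados
    import rbd

    def open_proxied_image(conf_path, pool_name, image_name):
        client = rados.Rados(conffile=conf_path)
        client.connect()
        ioctx = client.open_ioctx(pool_name)
        # Raises rbd.ImageNotFound if another thread already deleted it.
        image = rbd.Image(ioctx, image_name, read_only=True)
        # Wrap the image so every later method call (size(), etc.) is
        # dispatched to a native thread instead of blocking the hub.
        return tpool.Proxy(image)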

How to reproduce:

1. Set up a test env with the default rbd_exclusive_cinder_pool=False.
2. Create 100 test volumes.
3. Delete the 100 test volumes (a Python alternative is sketched below):
   openstack volume list | egrep -v "^\+-+|ID" | awk '{print $2}' | xargs openstack volume delete
4. The ImageNotFound exception should then be visible in cinder-volume.log (if not, create and delete more volumes); at this point the time taken to delete a volume increases noticeably.
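
For anyone scripting the reproduction, the create/delete steps could also be driven with openstacksdk. The following is only an illustration (it assumes a 'mycloud' entry in clouds.yaml and that volumes are waited on until 'available' before deleting):

    # Illustrative bulk create/delete using openstacksdk; equivalent to
    # the CLI steps above. All names here are made up for the example.
    import openstack

    conn = openstack.connect(cloud='mycloud')

    # Step 2: create 100 small test volumes.
    for i in range(100):
        conn.block_storage.create_volume(size=1, name='race-test-%03d' % i)

    # (wait for the volumes to reach 'available' before continuing)

    # Step 3: delete them all at once; the per-delete usage scan then
    # races with volumes disappearing underneath it.
    for vol in list(conn.block_storage.volumes()):
        if vol.name and vol.name.startswith('race-test-'):
            conn.block_storage.delete_volume(vol, ignore_missing=True)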

Hua Zhang (zhhuabj)
description: updated
Revision history for this message
Edward Hope-Morley (hopem) wrote :

So this is mostly correct, but what I've seen is that the green thread waiting on the native thread never gets re-scheduled in the case where the threaded operation raises an exception and doesn't explicitly handle it. In other words the green thread calling the native thread gets stuck, but other green threads still operate as normal. This simple test script seems to demonstrate the same behaviour - https://pastebin.ubuntu.com/p/JXgzhppKwr/

tags: added: ceph drivers rbd
Xav Paice (xavpaice)
tags: added: canonical-bootstack
Revision history for this message
Gorka Eguileor (gorka) wrote :

This looks like a duplicate of https://bugs.launchpad.net/cinder/+bug/1765845 which has already been fixed.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

@gorka that patch fixed a separate instance of this issue, i.e. it catches ImageNotFound if it is raised at [0] but not if it is raised at [1], which is where this issue is still occurring.

[0] https://github.com/openstack/cinder/blob/12.0.3/cinder/volume/drivers/rbd.py#L408 (queens)
[1] https://github.com/openstack/cinder/blob/12.0.3/cinder/volume/drivers/rbd.py#L144 (queens)
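
A schematic illustration of the two raise sites (not the actual cinder code): a handler that only wraps the work done on an already-open image catches ImageNotFound when that later work raises it, but not when the exception is raised while the image is still being opened, because the open happens before the handler's scope is entered.

    # Schematic only; shows why the placement of the handler matters.
    import rbd

    def usage_for(ioctx, name):
        with rbd.Image(ioctx, name, read_only=True) as image:  # open: can raise
            try:
                return image.size()                            # later call: can raise
            except rbd.ImageNotFound:
                return 0   # catches the later call only; the open escapes

    def usage_for_covering_both(ioctx, name):
        try:
            with rbd.Image(ioctx, name, read_only=True) as image:
                return image.size()
        except rbd.ImageNotFound:
            return 0       # covers the open as well as the later calls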

summary: - racy between _get_usage_info and delete_volume
+ race between threaded rbd operations
Revision history for this message
Edward Hope-Morley (hopem) wrote :

@zhhuabj see my comment above. I believe that the error you saw has actually been resolved and backported to ocata (i.e. the bug ref'd by @gorka), but there is still a related bug in the rbd driver which is not yet fixed. I am able to reproduce this issue 100% of the time by deploying Queens, creating 100 volumes and then deleting them all at once. The error in the logs is https://pastebin.ubuntu.com/p/zBFHPffzgh/ which is the same exception that you mentioned but raised in a different place.

Revision history for this message
Hua Zhang (zhhuabj) wrote :

@hopem, yeah, I think you're totally right: all the places calling RBDVolumeProxy have the same problem. This is caused by mixing green threads and native threads, and the race between _get_usage_info and delete_volume is just one instance of it, so thank you very much for amending my previous title from 'racy between _get_usage_info and delete_volume' to 'race between threaded rbd operations', which reflects the actual problem more closely.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Ok, I think I have a more accurate picture of what is going on here. For context, we observed this issue in an environment where a very large number (several thousand) of volumes were deleted at once, and it manifested as all rpc threads being consumed by the tasks that ensue. The *real* problem here is simply that when deleting an RBD volume, each rpc thread servicing a request will (by default) list and query *every* volume. This has two consequences. First, the _get_usage_info loop [0] races with deletes: each time it gets a list of volumes, by the time it queries each entry in that list the volume may already have been deleted. Second, it performs this query of all volumes on every delete, all the time, which simply doesn't scale. The first issue (images not found) has been fixed as previously mentioned, but the second has not. There is a way to stop this behaviour altogether, which is to enable rbd_exclusive_cinder_pool [1] so that provisioned_capacity_gb stats are not queried and updated on deletes (or in the periodic task, for that matter). Apart from the caveat described in the config description I don't see any pitfall since, so long as the pool is only used by cinder, allocated_capacity_gb should not get out of sync with provisioned_capacity_gb, although it is probably worth keeping an eye on this.

An improvement to volume/manager.py might be to allow a driver to differentiate between stats gathering triggered by the periodic task [2] and stats gathering triggered by an operation such as a delete [3], so that it can decide whether or not to do it. A sketch of the driver-side effect of rbd_exclusive_cinder_pool follows the reference links below.

[0] https://github.com/openstack/cinder/blob/12.0.3/cinder/volume/drivers/rbd.py#L402
[1] https://github.com/openstack/cinder/blob/12.0.3/cinder/volume/drivers/rbd.py#L106
[2] https://github.com/openstack/cinder/blob/12.0.3/cinder/volume/manager.py#L2544
[3] https://github.com/openstack/cinder/blob/12.0.3/cinder/volume/manager.py#L887
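
The behavioural difference with rbd_exclusive_cinder_pool enabled can be sketched roughly as follows (attribute names are hypothetical, this is not the real _update_volume_stats code): the driver trusts its own allocation accounting instead of opening every image in the pool after each delete.

    # Rough sketch of the decision described above; attribute names are
    # hypothetical and only illustrate the trade-off.
    def provisioned_capacity_gb(driver):
        if driver.configuration.rbd_exclusive_cinder_pool:
            # The pool is used only by this cinder backend, so the running
            # total the driver already tracks is good enough; no per-image
            # scan, hence no race with concurrent deletes.
            return driver._tracked_provisioned_gb
        # Otherwise fall back to the expensive (and racy) full scan.
        return driver._get_usage_info()

Operationally the workaround amounts to setting rbd_exclusive_cinder_pool = True in the RBD backend section of cinder.conf, which is what the charm change below ends up doing by default.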

Changed in charm-cinder:
milestone: none → 18.11
importance: Undecided → Medium
assignee: nobody → Edward Hope-Morley (hopem)
affects: charm-cinder → charm-cinder-ceph
Changed in charm-cinder-ceph:
assignee: Edward Hope-Morley (hopem) → nobody
milestone: 18.11 → none
milestone: none → 18.11
assignee: nobody → Edward Hope-Morley (hopem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder-ceph (master)

Fix proposed to branch: master
Review: https://review.openstack.org/600048

Changed in charm-cinder-ceph:
status: New → In Progress
tags: added: stable-backport
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder-ceph (master)

Reviewed: https://review.openstack.org/600048
Committed: https://git.openstack.org/cgit/openstack/charm-cinder-ceph/commit/?id=b04b5f3b5f00f67eef2b76fbb5b64aba2b80719d
Submitter: Zuul
Branch: master

commit b04b5f3b5f00f67eef2b76fbb5b64aba2b80719d
Author: Edward Hope-Morley <email address hidden>
Date: Wed Sep 5 14:31:15 2018 +0100

    Add support for rbd_exclusive_cinder_pool

    As of the queens release cinder supports this config
    option which, if enabled, stops cinder from querying all
    volumes in a pool every time it does a delete in order to
    get accurate pool usage stats. The problem is that this
    causes tons of non-fatal race conditions and slows down deletes
    to the point where the rpc thread pool fills up, blocking
    further requests. Our charms do not configure a shared pool by
    default and we are not aware of anyone doing this in the field,
    so this patch enables this option by default.

    Change-Id: I5377e2886a6e206d30bd7dc38a7e43a085aa524c
    Closes-Bug: 1789828

Changed in charm-cinder-ceph:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder-ceph (stable/18.08)

Fix proposed to branch: stable/18.08
Review: https://review.openstack.org/602676

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder-ceph (stable/18.08)

Reviewed: https://review.openstack.org/602676
Committed: https://git.openstack.org/cgit/openstack/charm-cinder-ceph/commit/?id=3fb70295afadec26dc2d0fdae822bc0184b87e87
Submitter: Zuul
Branch: stable/18.08

commit 3fb70295afadec26dc2d0fdae822bc0184b87e87
Author: Edward Hope-Morley <email address hidden>
Date: Wed Sep 5 14:31:15 2018 +0100

    Add support for rbd_exclusive_cinder_pool

    As of the queens release cinder supports this config
    option which, if enabled, stops cinder from querying all
    volumes in a pool every time it does a delete in order to
    get accurate pool usage stats. The problem is that this
    causes tons of non-fatal race conditions and slows down deletes
    to the point where the rpc thread pool fills up, blocking
    further requests. Our charms do not configure a shared pool by
    default and we are not aware of anyone doing this in the field,
    so this patch enables this option by default.

    Change-Id: I5377e2886a6e206d30bd7dc38a7e43a085aa524c
    Closes-Bug: 1789828
    (cherry picked from commit b04b5f3b5f00f67eef2b76fbb5b64aba2b80719d)

David Ames (thedac)
Changed in charm-cinder-ceph:
status: Fix Committed → Fix Released