409 Conflict Container DELETE failed during glance image deletion

Bug #1899495 reported by Rajiv Mucheli
This bug affects 1 person
Affects: python-swiftclient
Status: In Progress
Importance: Undecided
Assigned to: Tim Burke
Milestone: (none listed)

Bug Description

Hi,

OpenStack Glance release: Train
OpenStack Swift release: Ussuri

In my production environment, Swift is the backend for Glance. During image deletion, the error message below is generated sporadically:

Container DELETE failed: https://objectstore-3:443/v1/AUTH_bf6877e6de1b4323973cba0889a1280f/glance_a28274f7-47f4-4fa4-9ff6-2258ffdde142 409 Conflict [first 60 chars of response] b'<html><h1>Conflict</h1><p>There was a conflict when trying t'

I looked into the issues below, but they are old, so I assume they are fixed:

https://bugs.launchpad.net/swift/+bug/1546865
https://bugs.launchpad.net/swift/+bug/1253478
https://bugs.launchpad.net/swift/+bug/1233111
https://bugs.launchpad.net/swift/+bug/1207108

When I increased the Glance chunk size from the default of 200 MB to 500 MB, there were several of these error messages. After restoring it to the default of 200 MB, the error count dropped, but the errors still occur.

Please let me know if further information is required and how this can be fixed.

Regards,
Rajiv

kiran pawar (kpdev) wrote :

I have tested this with Glance images of 1, 5, 10 and 20 GB.
I uploaded all images with chunk sizes of 100, 200 and 500 MB and then deleted them. All delete operations succeeded without any 409 error.
Are the reproduction steps correct?

Rajiv Mucheli (rajiv.mucheli) wrote :

Hi Kiran,

Thanks for testing. Could you test with bigger images, e.g. VMDKs of roughly 200 GB or more?

As you are aware, HTTP 409 is returned when a request conflicts with the current state of the resource (e.g. it is busy or still in use). Could this be related to how Swift and Glance handle serial versus parallel operations?

Regards,
Rajiv

Tim Burke (1-tim-z) wrote :

I've got a suspicion that the SLO delete at https://github.com/openstack/glance_store/blob/68200a88/glance_store/_drivers/swift/store.py#L1110-L1112 may be returning an error in the body -- in Swift, it piggy-backs off the bulk-delete functionality (https://github.com/openstack/swift/blob/2.28.0/swift/common/middleware/slo.py#L1639-L1642) which can stuff failures in the body: https://github.com/openstack/swift/blob/2.28.0/swift/common/middleware/bulk.py#L228-L273

I *think* what we need is to have swiftclient notice the `multipart-manifest=delete` in the query string (and probably tack on an `Accept: application/json` header), parse the body to check for errors around https://github.com/openstack/python-swiftclient/blob/3.12.0/swiftclient/client.py#L1653-L1664, and raise if there are any.
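
A minimal sketch of that body check, assuming the response was requested as JSON and follows Swift's documented bulk-delete report format (`Response Status`, `Errors`, `Number Deleted`, ...). The names `BulkDeleteError` and `check_bulk_delete_body` are hypothetical and are not part of python-swiftclient; this is not the proposed fix itself.

```python
import json


class BulkDeleteError(Exception):
    """Hypothetical exception; not part of python-swiftclient."""

    def __init__(self, status, errors):
        super().__init__("bulk delete failed: %s %r" % (status, errors))
        self.status = status
        self.errors = errors


def check_bulk_delete_body(body):
    """Raise BulkDeleteError if a JSON bulk-delete report lists failures."""
    report = json.loads(body)
    status = report.get("Response Status", "")
    errors = report.get("Errors", [])
    # The outer HTTP status can be 2xx even when individual deletes failed,
    # so the summary inside the body has to be inspected as well.
    if errors or not status.startswith("2"):
        raise BulkDeleteError(status, errors)
    return report
```

A caller would pass the raw response body of the `multipart-manifest=delete` request to `check_bulk_delete_body` and let the exception propagate like any other client error.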

affects: swift → python-swiftclient
Changed in python-swiftclient:
assignee: nobody → Tim Burke (1-tim-z)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-swiftclient (master)
Changed in python-swiftclient:
status: New → In Progress
Rajiv Mucheli (rajiv.mucheli) wrote :

Hi Tim,

Thanks for looking into this, but Glance uses DLO, not SLO. Please see below:

https://github.com/openstack/glance_store/blob/stable/wallaby/glance_store/_drivers/swift/store.py#L1038-L1043

Tim Burke (1-tim-z) wrote :

Oh, so it does.

Stepping back a bit, it looks like the only container delete in glance_store is in MultiTenantStore.delete: https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/swift/store.py#L1507-L1512

It seems like the idea is to delete a specific image, and it's just trying to clean up the container in case it's now empty. You said it was sporadic, but it seems like if you've got multiple images in a container, deleting the first one should reliably trip the 409. Probably want to get the opinion of someone better acquainted with glance, but I think we might just need that last line to be something like

    try:
        # try to clean up the container, too
        connection.delete_container(location.store_location.container)
    except swiftclient.ClientException as e:
        if e.http_status != http_client.CONFLICT:
            raise
        # else, must not be empty yet
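
The same pattern can be tried outside glance_store as a self-contained sketch. Here `ClientException` and `FakeConnection` are stand-ins written for illustration (assumptions, not the real swiftclient classes), and `cleanup_container` is a hypothetical helper name.

```python
from http import client as http_client


class ClientException(Exception):
    """Stand-in for swiftclient.ClientException (assumption for this sketch)."""

    def __init__(self, msg, http_status=None):
        super().__init__(msg)
        self.http_status = http_status


class FakeConnection:
    """Hypothetical connection whose container may still hold other images."""

    def __init__(self, object_count):
        self.object_count = object_count
        self.deleted = False

    def delete_container(self, container):
        # Mimic Swift refusing to delete a non-empty container.
        if self.object_count > 0:
            raise ClientException("Container DELETE failed",
                                  http_status=http_client.CONFLICT)
        self.deleted = True


def cleanup_container(connection, container):
    """Best-effort delete; return True if the container was removed."""
    try:
        connection.delete_container(container)
    except ClientException as e:
        if e.http_status != http_client.CONFLICT:
            raise
        return False  # container is not empty yet, leave it in place
    return True
```

With this shape, a 409 from a container that still holds other images is treated as expected, while any other status still surfaces as an error.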

Rajiv Mucheli (rajiv.mucheli) wrote :

Yes, we did try adding another thread for deletion (where DEFAULT_CONTAINER_DELETE_ATTEMPTS is 5):

https://github.com/sapcc/glance_store/blob/stable/xena-m3/glance_store/_drivers/swift/store.py#L1617-L1639

but the 409 Conflict issue persists.
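
For reference, a bounded retry of this kind can be sketched roughly as follows. This is not the code from the linked branch; `delete_with_retries`, `delete_fn` and `is_conflict` are hypothetical names, and the linear backoff is an assumption. Retrying can only help when the 409 is transient; if the container still holds other images, every attempt will fail the same way.

```python
import time


def delete_with_retries(delete_fn, is_conflict, attempts=5, delay=0.0):
    """Call delete_fn up to `attempts` times, retrying only on conflicts.

    Returns the number of the attempt that succeeded; re-raises the last
    exception if all attempts conflict, or immediately on any other error.
    """
    for attempt in range(1, attempts + 1):
        try:
            delete_fn()
            return attempt
        except Exception as exc:
            if not is_conflict(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff between attempts
```

In a real caller, `delete_fn` would wrap the container DELETE and `is_conflict` would check the exception's HTTP status for 409.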

Secondly, based on https://opendev.org/openstack/glance/src/branch/stable/xena/api-ref/source/v2/images-images-v2.inc#L682-L712

According to the above, the response code should be HTTP 409, but I get an unexpected HTTP 500. Any suggestions as to why?
