old async pendings result in ghost listings
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | In Progress | Medium | Matthew Oliver |
Bug Description
If you leave a container disk unmounted for too long there's a really good chance you're going to end up with ghost listings [1] - this is mainly because async pendings for really old transactions are super edge-casey.
One way I've seen this happen is with write_affinity turned on. You can get an async for a PUT on some random handoff in the local region and the DELETE over on the primary. The updater will keep trying that container and getting that 507 until you fix your rings - long after the container-
Once the new disk(s) come online and the rings go out, BOTH updaters on their separate nodes send their respective updates to the new database, and there are a bunch of possible edge-case outcomes (sketched below):
1) if the DELETE async fails [2] or is lost, the PUT will create the ghost listing
2) if the DELETE async 204's but then gets reclaimed, the PUT will create the ghost listing
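To make the race concrete, here is a toy model of outcome 2 (this is not the real ContainerBroker / merge_items code; the dict-based "DB" and the merge_update/reclaim helpers are simplified stand-ins):

```python
RECLAIM_AGE = 604800  # one week, Swift's default reclaim_age

def merge_update(db, name, timestamp, deleted):
    # Keep whichever row for this object has the newer timestamp.
    current = db.get(name)
    if current is None or timestamp > current[0]:
        db[name] = (timestamp, deleted)

def reclaim(db, now):
    # Drop tombstone (deleted=True) rows older than reclaim_age.
    for name in [n for n, (ts, d) in db.items() if d and ts < now - RECLAIM_AGE]:
        del db[name]

db = {}
put_ts, delete_ts = 1000.0, 2000.0   # two very old async pendings
now = delete_ts + RECLAIM_AGE + 1    # the rings only get fixed much later

# The DELETE lands first and 204's, then its tombstone gets reclaimed...
merge_update(db, 'obj', delete_ts, deleted=True)
reclaim(db, now)
# ...so when the old PUT async finally arrives there is no tombstone left to beat it.
merge_update(db, 'obj', put_ts, deleted=False)
print(db)  # {'obj': (1000.0, False)} -- a listing row for an object that was deleted
```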
I'm not sure exactly how common this is, but we know ghost listings are a problem and this is one scenario where it can definitely happen (see unittest).
I think we just need to be more careful when doing a container update for a PUT with a timestamp that's older than reclaim age.
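One possible shape for that check (purely illustrative; `should_send_container_update` and where it would hook into the updater are assumptions, not an actual patch):

```python
import time

RECLAIM_AGE = 604800  # should match the cluster's reclaim_age

def should_send_container_update(op, update_timestamp, reclaim_age=RECLAIM_AGE):
    # A PUT whose timestamp is already older than reclaim_age can race with a
    # reclaimed DELETE tombstone and resurrect the row, so don't blindly send
    # it (a real fix might instead verify the object still exists on disk).
    if op == 'PUT' and time.time() - update_timestamp > reclaim_age:
        return False
    return True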
1. Ghost Listings are rows in the container database replicas for deleted objects.
2. if the new device doesn't have a container db yet, the updates get a 404 (related lp bug #1328735)
Changed in swift:
importance: Undecided → High

Changed in swift:
importance: High → Medium

Changed in swift:
status: Triaged → Confirmed
I think this does not only happen when write_affinity is enabled. If you send PUT requests multiple times and then DELETE the object, you can end up with a DELETE request in async_pending on node-A and a PUT request in async_pending on node-B. If the drive stays unmounted until the object is fully removed (reclaimed) from the container DB, and the drive is then mounted back (or a ring change happens) and the DELETE request finishes before the PUT requests, you'll see this issue.
* Force the object-updater to check the handoff location when it gets a 507 from the primary location, and send the update there when the container DB is on a handoff. If not, just skip it, leave a warning message and go on to the next run. I think the container-replicator will take care of the DB and sync it to the handoff place.
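A rough sketch of that suggestion, assuming the updater has the container ring at hand (`do_request` is a hypothetical stand-in for however the updater actually issues the HTTP update; this is not the real object-updater code):

```python
import logging
from itertools import islice

def send_container_update(container_ring, account, container, obj, do_request):
    # Try the primary container nodes first.
    part, primaries = container_ring.get_nodes(account, container)
    statuses = [do_request(node, part, account, container, obj)
                for node in primaries]
    if 507 not in statuses:
        return all(status // 100 == 2 for status in statuses)
    # A primary is unmounted: see if a handoff already holds the container DB.
    for node in islice(container_ring.get_more_nodes(part), len(primaries)):
        status = do_request(node, part, account, container, obj)
        if status // 100 == 2:
            return True
        if status == 404:
            continue  # no DB on this handoff; don't create one here
    logging.warning('container update for %s/%s/%s skipped; waiting for the '
                    'container-replicator to place the DB',
                    account, container, obj)
    return False
```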