rmtree in obj replication is racy
Bug #1408061 reported by John Dickinson
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Confirmed | Medium | Unassigned |
Bug Description
In object replication update_deleted(), there's a call to shutil.rmtree to delete the entire partition directory tree after the partition has been successfully pushed to the primary nodes.
However, it's possible[1] that a new object has been placed in the partition directory after the suffixes are calculated but before the rmtree is called. The code would delete this new object.
[1] For example, if the internal client-facing network is down but the replication network is working just fine.
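To make the race concrete, here is a minimal sketch of the problematic pattern. This is not Swift's actual update_deleted() code; the function name and the sync_to_primaries callback are hypothetical stand-ins. The point is that the directory listing and the rmtree are separated in time, so an object written in between is silently destroyed.

```python
import os
import shutil


def replicate_handoff_partition(part_path, sync_to_primaries):
    # 1. Snapshot what we think is in the partition (the suffix dirs).
    suffixes = os.listdir(part_path)

    # 2. Push those suffixes to the primary nodes; this can take a while.
    if not sync_to_primaries(part_path, suffixes):
        return  # a primary failed, so keep the handoff data

    # 3. RACE: a client PUT can land a brand-new object under part_path
    #    right now, e.g. if the client-facing network can reach this
    #    handoff node but the proxy cannot reach the primaries.

    # 4. The whole tree is removed, including any object from step 3.
    shutil.rmtree(part_path)
```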
Changed in swift:
status: New → Confirmed
importance: Undecided → Medium
So I guess one thing we could do is lock the part dir throughout the process, such that anyone trying to land an object here would get an error and go try the next handoff.

Or, we can look at some of the work I'm doing on the EC version of this and, once that's fully baked, look at applying the same kind of scheme here. On the EC side we're not killing the entire part dir at once; we're actually removing .data files as they're sync'd (for a totally different reason), and then once we're done we attempt to remove the dir, but if it's not empty we just move on. The reason it wouldn't be empty isn't the race mentioned here; it's the normal case where multiple fragment indexes are stored in the same place. However, that logic would also eliminate the issue called out here of accidentally deleting something that landed just because we could contact all the primaries and therefore assumed the proxy could too.

I kinda like the latter option (put this on hold till the reconstructor is done and then borrow from it).
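For reference, here is a minimal sketch, assuming hypothetical names (cleanup_synced_partition, synced_files), of that per-file cleanup idea: each file is removed only after it has been confirmed synced, and the directories are then removed with os.rmdir, which leaves anything that appeared in the meantime alone rather than rmtree'ing it away.

```python
import errno
import os


def cleanup_synced_partition(part_path, synced_files):
    # Delete only the files that were confirmed synced to the primaries.
    for path in synced_files:
        try:
            os.unlink(path)
        except OSError as err:
            if err.errno != errno.ENOENT:
                raise

    # Walk bottom-up and try to drop now-empty directories. A directory
    # that isn't empty (for example because a new object just landed, or
    # because another fragment index legitimately lives there) is left
    # in place instead of being removed.
    for dirpath, _dirnames, _filenames in os.walk(part_path, topdown=False):
        try:
            os.rmdir(dirpath)
        except OSError as err:
            if err.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
```

Because os.rmdir refuses to delete a non-empty directory, an object that lands during replication survives; the cost is occasionally leaving a handoff partition directory behind until a later replication pass cleans it up.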