Container Sync might lose the right x-sync-point2, resulting in objects never being synced
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Confirmed | Medium | Unassigned |
Bug Description
Container sync scenario: error path
Consider the case where syncing the row at line #357 fails (link to the line at the end of the report), and next_sync_point keeps a pointer to the problematic row.
In lines #362-363 we fetch the next row, then update and persist sync_point_2 (so the persisted sync_point_2 > next_sync_point).
The condition at line #344 is still satisfied, so we go on to sync more objects.
Line #364, outside the while loop, is aware of next_sync_point and is meant to persist it as sync_point_2 so that the failed objects are retried.
Now assume the process dies just before executing line #364 (the node fails or the container sync daemon is stopped).
next_sync_point holds a pointer to the failed object, which was never replicated and is therefore missing from the target container even though it should be there. Because next_sync_point is not persisted, that pointer is lost.
When the service restarts, sync_point_2 (read at line #322) is further ahead, so the failed object(s) that next_sync_point pointed to will not be retried (next_sync_point < persisted sync_point_2). I suspect this can leave objects that are never synced to the target(?).
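The sequence above can be reproduced with a small, simplified model. This is a sketch under stated assumptions, not Swift's code: FakeBroker, sync_pass, CrashBeforeRollback and run_scenario are hypothetical names, plain integers stand in for row IDs, the sync_point1 bound and the time limit are omitted, and the crash is simulated by raising an exception just before the post-loop rollback. In this model next_sync_point records the sync point just before the first failed row, so fetching the rows after it returns the failed row again.

```python
class CrashBeforeRollback(Exception):
    """Hypothetical stand-in for the node failing or the daemon being stopped."""


class FakeBroker:
    """Hypothetical in-memory broker; only sync_point2 survives a 'crash'."""

    def __init__(self, row_ids):
        self.row_ids = sorted(row_ids)
        self.sync_point2 = -1  # the persisted value

    def get_items_since(self, point, limit):
        # Rows with an ID strictly greater than `point`.
        return [r for r in self.row_ids if r > point][:limit]

    def set_sync_point2(self, point):
        self.sync_point2 = point


def sync_pass(broker, sync_row, crash_before_rollback=False):
    """Simplified loop: persist after every row, roll back after the loop."""
    sync_point2 = broker.sync_point2
    next_sync_point = None  # kept only in memory
    while True:
        rows = broker.get_items_since(sync_point2, 1)
        if not rows:
            break
        row = rows[0]
        if not sync_row(row) and next_sync_point is None:
            # Remember the point just before the first failed row.
            next_sync_point = sync_point2
        sync_point2 = row
        broker.set_sync_point2(sync_point2)  # persisted even past a failure
    if crash_before_rollback:
        raise CrashBeforeRollback()  # the in-memory next_sync_point is lost
    if next_sync_point is not None:
        broker.set_sync_point2(next_sync_point)  # the rollback (line #364 above)


def run_scenario(crash_before_rollback):
    """Row 2 fails on the first pass; the target then recovers and the daemon restarts.

    Returns (persisted sync_point2 after the first pass, rows synced overall)."""
    synced, failing = set(), {2}

    def sync_row(row):
        if row in failing:
            return False
        synced.add(row)
        return True

    broker = FakeBroker([1, 2, 3, 4])
    try:
        sync_pass(broker, sync_row, crash_before_rollback)
    except CrashBeforeRollback:
        pass
    persisted = broker.sync_point2
    failing.clear()  # the target container is reachable again
    sync_pass(broker, sync_row)  # the daemon restarts from the persisted point
    return persisted, sorted(synced)


print(run_scenario(crash_before_rollback=False))  # (1, [1, 2, 3, 4])
print(run_scenario(crash_before_rollback=True))   # (4, [1, 3, 4])
```

Without the crash, the persisted point is rolled back to 1 and row 2 is retried after the restart; with the crash, it stays at 4 and row 2 is skipped for good.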
For the failure scenario, all the replicas have to fail before setting back x_container_
It would be better to persist a value only when it is the right value; keeping values only in memory risks losing critical information. Also, if a sync fails, it might be a good idea to return from the method rather than continue with the next rows.
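The second suggestion, returning from the method on the first failure, can be sketched with a simplified model. The names FakeBroker, sync_pass_stop_on_failure and run_scenario are hypothetical, and this is not Swift's code. sync_point2 is persisted only after a row has synced, so there is never an in-memory value that must be written back later.

```python
class FakeBroker:
    """Hypothetical in-memory broker; only sync_point2 survives a restart."""

    def __init__(self, row_ids):
        self.row_ids = sorted(row_ids)
        self.sync_point2 = -1  # the persisted value

    def get_items_since(self, point, limit):
        # Rows with an ID strictly greater than `point`.
        return [r for r in self.row_ids if r > point][:limit]

    def set_sync_point2(self, point):
        self.sync_point2 = point


def sync_pass_stop_on_failure(broker, sync_row):
    """Return False at the first failed row, leaving the persisted point before it."""
    sync_point2 = broker.sync_point2
    while True:
        rows = broker.get_items_since(sync_point2, 1)
        if not rows:
            return True  # caught up
        row = rows[0]
        if not sync_row(row):
            return False  # nothing at or past the failed row was persisted
        sync_point2 = row
        broker.set_sync_point2(sync_point2)  # only ever a "right" value


def run_scenario():
    """Row 2 fails on the first pass; the target then recovers."""
    synced, failing = set(), {2}

    def sync_row(row):
        if row in failing:
            return False
        synced.add(row)
        return True

    broker = FakeBroker([1, 2, 3, 4])
    first = sync_pass_stop_on_failure(broker, sync_row)
    persisted = broker.sync_point2
    failing.clear()  # the target container is reachable again
    second = sync_pass_stop_on_failure(broker, sync_row)
    return first, persisted, second, sorted(synced)


print(run_scenario())  # (False, 1, True, [1, 2, 3, 4])
```

A crash at any point leaves the persisted value at the last successfully synced row. The trade-off is that a row that keeps failing stops every pass at that row, so the rows after it wait until it succeeds.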
This issue does not exist in an in-review patch by Eran, which fixes it:
https:/
https:/
https:/
https:/
https:/
tags: added: container sync
The line references are out of date with respect to the master branch, but I believe I can see the bug as described in these lines [1]:
sync of row x fails, so next_sync_point is set to x
rows y and then z sync successfully, and the broker's sync_point2 is updated to y, then to z
the intention is that once the while loop completes, the broker's sync_point2 is rolled back to the in-memory next_sync_point; but if the process dies before that happens, the broker's sync_point2 is never rolled back to the failed row.
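One way to meet that intention without depending on the post-loop rollback, in the spirit of persisting a value only when it is the right one, is to stop advancing the persisted sync_point2 once a row has failed while still attempting the remaining rows. The sketch below is a simplified model under assumptions, not Swift's code: FakeBroker, Crash, sync_pass_clamped and run_scenario are hypothetical names, and the crash_after parameter simulates the process dying right after a given row.

```python
class Crash(Exception):
    """Hypothetical stand-in for the process dying mid-pass."""


class FakeBroker:
    """Hypothetical in-memory broker; only sync_point2 survives a crash."""

    def __init__(self, row_ids):
        self.row_ids = sorted(row_ids)
        self.sync_point2 = -1  # the persisted value

    def get_items_since(self, point, limit):
        # Rows with an ID strictly greater than `point`.
        return [r for r in self.row_ids if r > point][:limit]

    def set_sync_point2(self, point):
        self.sync_point2 = point


def sync_pass_clamped(broker, sync_row, crash_after=None):
    """Keep syncing after a failure, but never persist a point past the failed row."""
    sync_point2 = broker.sync_point2
    next_sync_point = None
    while True:
        rows = broker.get_items_since(sync_point2, 1)
        if not rows:
            break
        row = rows[0]
        if not sync_row(row) and next_sync_point is None:
            next_sync_point = sync_point2  # point just before the failed row
        sync_point2 = row
        if next_sync_point is None:
            # Advance the persisted point only while no row has failed,
            # so after a failure it already equals next_sync_point.
            broker.set_sync_point2(sync_point2)
        if row == crash_after:
            raise Crash()


def run_scenario():
    """Row x=2 fails, rows y=3 and z=4 succeed, then the process dies."""
    synced, failing = set(), {2}

    def sync_row(row):
        if row in failing:
            return False
        synced.add(row)
        return True

    broker = FakeBroker([1, 2, 3, 4])
    try:
        sync_pass_clamped(broker, sync_row, crash_after=4)
    except Crash:
        pass
    persisted = broker.sync_point2
    failing.clear()  # the target container is reachable again
    sync_pass_clamped(broker, sync_row)  # restart
    return persisted, sorted(synced)


print(run_scenario())  # (1, [1, 2, 3, 4])
```

Even though the process dies before the loop completes, the persisted point stays just before row x, so x is retried on restart. The cost is that rows y and z are sent again on the next pass.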
[1] https://github.com/openstack/swift/blob/671254224a4a4710e7556535ee68bd999536ab8d/swift/container/sync.py#L396-L407