Container sync stops if object server is down
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Object Storage (swift) | Fix Released | Undecided | Darrell Bishop | |
Bug Description
Container sync stops syncing a container if the object-server is down (refusing connection or timeout).
container/sync.py container_
The relevant code looks like the following. The suggested fix is marked with + (i.e., catch Exception and Timeout *here*, inside the per-node loop, rather than in the outer loop):
for node in nodes:
+ except (Exception, Timeout), err:
+ exc = err
if timestamp < looking_
I have not submitted a fix because I'm not sure I understand the implications of not getting the absolute latest object copy. I would have thought it's ok if we had a good response from *any* object-server and used it to sync the remote side. At worst, it's an old copy and the remote will ignore it because the timestamp is older. And if this container-server's database is slightly out of date, then as soon as replication brings in the latest object, it will trigger a sync again.
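To make the suggestion concrete, here is a minimal standalone sketch of the idea, not the actual container/sync.py code. The names `fetch_newest`, `fetch`, and `NodeError` are hypothetical and only stand in for the real per-node object request. The point is that a failure on one node is recorded and skipped, the newest copy from any responding node is kept, and the saved exception is raised only if no node returned a copy. Note the sketch only catches `Exception`; the fix above also names `Timeout` explicitly.

```python
class NodeError(Exception):
    """Hypothetical error standing in for a refused connection or timeout."""


def fetch_newest(nodes, fetch):
    """Return (timestamp, body) of the newest copy any node returned.

    `fetch` is a hypothetical callable: fetch(node) -> (timestamp, body),
    raising on failure. A failing node is skipped instead of aborting the
    loop; the last error is re-raised only if every node failed.
    """
    newest = None
    last_exc = None
    for node in nodes:
        try:
            timestamp, body = fetch(node)
        except Exception as err:  # per-node catch, as suggested in the report
            last_exc = err
            continue
        # Keep whichever successful response carries the latest timestamp.
        if newest is None or timestamp > newest[0]:
            newest = (timestamp, body)
    if newest is None:
        if last_exc is not None:
            raise last_exc
        raise NodeError("no nodes to query")
    return newest
```

With this shape, one object-server being down no longer stops the sync as long as at least one other node answers.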
I see a related issue with 404 not-found in bug #1068423. In that case, although we might have retrieved two copies, we raise the exc and give up because one copy is missing.
Also note, the handoff nodes are not used. For bug #1068423 the missing copy might simply be on a handoff node and not yet replicated.
Changed in swift: | |
assignee: | nobody → Donagh McCabe (donagh-mccabe) |
status: | New → In Progress |
Changed in swift: | |
assignee: | Donagh McCabe (donagh-mccabe) → Darrell Bishop (darrellb) |
Changed in swift: | |
milestone: | none → 1.7.5 |
status: | Fix Committed → Fix Released |
On re-reading the code I see my comment about bug #1068423 is wrong -- it only applies if all copies are missing.