reconstructor too aggressive to revert to handoff node

Bug #1653169 reported by clayg
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

During a rebalance that's adding a few disks, the reconstructor's revert jobs can get rejected for concurrency by the object replication server. This is normal, expected behavior that limits the number of incoming streams on the receiving disks/nodes.

When the reconstructor encounters any failure from the ssync_sender, it tries to move on and revert to another handoff node instead.

I believe the reconstructor should only revert to a handoff if the primary responds with a 507 (see lp bug #1510342).
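
For illustration, a minimal sketch of the handling proposed here, assuming a caller-supplied send_revert(node) that ships one fragment via ssync and returns an HTTP status code; the names are hypothetical and not Swift's actual reconstructor API:

    from http import HTTPStatus

    def revert_fragment(send_revert, primary_node, handoff_nodes):
        """Revert one fragment: try the primary, use a handoff only on 507.

        send_revert(node) is a caller-supplied callable (hypothetical here)
        that ships the fragment via ssync and returns an HTTP status code.
        """
        status = send_revert(primary_node)
        if status == HTTPStatus.OK:
            return True  # delivered to the right place
        if status == HTTPStatus.INSUFFICIENT_STORAGE:
            # 507: the primary's disk is unmounted or full, so a handoff is
            # the only way to get the fragment off this node right now.
            return any(send_revert(node) == HTTPStatus.OK
                       for node in handoff_nodes)
        # Timeouts or connection-limit rejections (common during a rebalance):
        # do not push the fragment to another handoff; just retry next pass.
        return False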

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/425441
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=eadb01b8af3cfdea801441744c360c200b08b8cc
Submitter: Jenkins
Branch: master

commit eadb01b8af3cfdea801441744c360c200b08b8cc
Author: Clay Gerrard <email address hidden>
Date: Wed Jan 25 11:40:54 2017 -0800

    Do not revert fragments to handoffs

    We're already a handoff - just wait until we can ship it to the right
    primary location.

    If we timeout talking to a couple of nodes (or more likely get rejected
    for connection limits because of contention during a rebalance) we can
    actually end up making *more* work if we move the part to another node.
    I've seen clusters get stuck on rebalance just passing parts around
    handoffs for *days*.

    Known-Issues:

    If we see a 507 from a primary and we're not in the handoff list (we're
    an old primary post rebalance) it'd probably be not so terrible to try
    to revert it to the first handoff if it's not already holding a part.
    But that's more work and sounds more like lp bug #1510342

    Closes-Bug: #1653169

    Change-Id: Ie351d8342fc8e589b143f981e95ce74e70e52784
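
For context, the merged change takes the stricter line described in the commit message: revert jobs only ever target the assigned primaries, and any failure simply waits for the next pass. A rough sketch of the before/after behavior, using hypothetical names (build_revert_targets is not Swift's actual API):

    def build_revert_targets(primary_nodes, handoff_nodes, fix_applied=True):
        """Choose where a handoff node may ship a reverted fragment."""
        if fix_applied:
            # After the fix: only the assigned primaries are candidates; if
            # they are all busy or down, keep the fragment and retry on the
            # next reconstructor pass.
            return list(primary_nodes)
        # Before the fix: any failure fell through to other handoffs, which
        # could shuffle partitions between handoffs indefinitely.
        return list(primary_nodes) + list(handoff_nodes)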

Changed in swift:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.0

This issue was fixed in the openstack/swift 2.13.0 release.
