reconstructor too aggressive to revert to handoff node

Bug #1653169 reported by clayg
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

During a rebalance that's adding a few disks, the reconstructor's revert jobs can get rejected for concurrency by the object replication server. This is normal, expected behavior that limits the number of incoming streams on the receiving disks/nodes.

When the reconstructor encounters any failure from the ssync_sender, it tries to move on and revert to another handoff node instead.

I believe the reconstructor should only revert to a handoff if the primary responds with a 507 (see lp bug #1510342).
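
For illustration, a minimal sketch of the handling proposed here, assuming a caller-supplied send_revert(node) that ships one fragment via ssync and returns an HTTP status code; the names are hypothetical and not Swift's actual reconstructor API:

    from http import HTTPStatus

    def revert_fragment(send_revert, primary_node, handoff_nodes):
        """Revert one fragment: try the primary, use a handoff only on 507.

        send_revert(node) is a caller-supplied callable (hypothetical here)
        that ships the fragment via ssync and returns an HTTP status code.
        """
        status = send_revert(primary_node)
        if status == HTTPStatus.OK:
            return True  # delivered to the right place
        if status == HTTPStatus.INSUFFICIENT_STORAGE:
            # 507: the primary's disk is unmounted or full, so a handoff is
            # the only way to get the fragment off this node right now.
            return any(send_revert(node) == HTTPStatus.OK
                       for node in handoff_nodes)
        # Timeouts or connection-limit rejections (common during a rebalance):
        # do not push the fragment to another handoff; just retry next pass.
        return False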

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/425441
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=eadb01b8af3cfdea801441744c360c200b08b8cc
Submitter: Jenkins
Branch: master

commit eadb01b8af3cfdea801441744c360c200b08b8cc
Author: Clay Gerrard <email address hidden>
Date: Wed Jan 25 11:40:54 2017 -0800

    Do not revert fragments to handoffs

    We're already a handoff - just wait until we can ship it to the right
    primary location.

    If we timeout talking to a couple of nodes (or more likely get rejected
    for connection limits because of contention during a rebalance) we can
    actually end up making *more* work if we move the part to another node.
    I've seen clusters get stuck on rebalance just passing parts around
    handoffs for *days*.

    Known-Issues:

    If we see a 507 from a primary and we're not in the handoff list (we're
    an old primary post rebalance) it'd probably be not so terrible to try
    to revert it to the first handoff if it's not already holding a part.
    But that's more work and sounds more like lp bug #1510342

    Closes-Bug: #1653169

    Change-Id: Ie351d8342fc8e589b143f981e95ce74e70e52784
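
For context, the merged change takes the stricter line described in the commit message: revert jobs only ever target the assigned primaries, and any failure simply waits for the next pass. A rough sketch of the before/after behavior, using hypothetical names (build_revert_targets is not Swift's actual API):

    def build_revert_targets(primary_nodes, handoff_nodes, fix_applied=True):
        """Choose where a handoff node may ship a reverted fragment."""
        if fix_applied:
            # After the fix: only the assigned primaries are candidates; if
            # they are all busy or down, keep the fragment and retry on the
            # next reconstructor pass.
            return list(primary_nodes)
        # Before the fix: any failure fell through to other handoffs, which
        # could shuffle partitions between handoffs indefinitely.
        return list(primary_nodes) + list(handoff_nodes)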

Changed in swift:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.0

This issue was fixed in the openstack/swift 2.13.0 release.
