DB Exception during rescheduling

Bug #1497980 reported by Eugene Nikanorov
This bug affects 1 person
Affects   Status        Importance  Assigned to       Milestone
neutron   Fix Released  Medium      Eugene Nikanorov

Bug Description

The following trace can be seen on Kilo code during router failover:

 28608 ERROR neutron.db.l3_agentschedulers_db [req-a4af4755-6bf4-4082-bf0f-f5ad12e341ac ] Exception encountered during router rescheduling.
 28608 TRACE neutron.db.l3_agentschedulers_db Traceback (most recent call last):
 28608 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 121, in reschedule_routers_from_down_agents
 28608 TRACE neutron.db.l3_agentschedulers_db self.reschedule_router(context, binding.router_id)
 28608 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 263, in reschedule_router
 28608 TRACE neutron.db.l3_agentschedulers_db self._unbind_router(context, router_id, agent['id'])
 28608 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_dvrscheduler_db.py", line 357, in _unbind_router
 28608 TRACE neutron.db.l3_agentschedulers_db self.unbind_snat_servicenode(context, router_id)
 28608 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_dvrscheduler_db.py", line 317, in unbind_snat_servicenode
 28608 TRACE neutron.db.l3_agentschedulers_db binding = self.unbind_snat(context, router_id)
 28608 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/neutron/db/l3_dvrscheduler_db.py", line 265, in unbind_snat
 28608 TRACE neutron.db.l3_agentschedulers_db binding = query.one()
 28608 TRACE neutron.db.l3_agentschedulers_db File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2378, in one
 28608 TRACE neutron.db.l3_agentschedulers_db "Multiple rows were found for one()")
 28608 TRACE neutron.db.l3_agentschedulers_db MultipleResultsFound: Multiple rows were found for one()
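The failure comes from `query.one()`, which raises `MultipleResultsFound` whenever the query matches more than one row. The sketch below reproduces that behavior with the standard library only. The `snat_bindings` table, the `one()` helper and `unbind_snat_kilo()` are hypothetical stand-ins, not Neutron or SQLAlchemy code; the exception classes only mimic SQLAlchemy's.

```python
import sqlite3


class MultipleResultsFound(Exception):
    """Local stand-in for SQLAlchemy's MultipleResultsFound."""


class NoResultFound(Exception):
    """Local stand-in for SQLAlchemy's NoResultFound."""


def one(rows):
    """Mimic Query.one(): return the single row or raise."""
    if not rows:
        raise NoResultFound("No row was found for one()")
    if len(rows) > 1:
        raise MultipleResultsFound("Multiple rows were found for one()")
    return rows[0]


def make_db():
    """Create a hypothetical, simplified SNAT binding table in memory."""
    conn = sqlite3.connect(":memory:")
    # The composite key lets one router be bound to several agents.
    conn.execute(
        "CREATE TABLE snat_bindings ("
        " router_id TEXT NOT NULL,"
        " l3_agent_id TEXT NOT NULL,"
        " PRIMARY KEY (router_id, l3_agent_id))"
    )
    return conn


def unbind_snat_kilo(conn, router_id):
    """Unbind logic that assumes exactly one binding per router."""
    rows = conn.execute(
        "SELECT router_id, l3_agent_id FROM snat_bindings WHERE router_id = ?",
        (router_id,),
    ).fetchall()
    binding = one(rows)  # raises if the 1-to-1 assumption is violated
    conn.execute(
        "DELETE FROM snat_bindings WHERE router_id = ? AND l3_agent_id = ?",
        binding,
    )
    return binding


conn = make_db()
# Repeated reschedulings left two SNAT bindings for the same router.
conn.executemany(
    "INSERT INTO snat_bindings VALUES (?, ?)",
    [("r1", "agent-a"), ("r1", "agent-b")],
)
try:
    unbind_snat_kilo(conn, "r1")
except MultipleResultsFound as exc:
    print(exc)  # Multiple rows were found for one()
```

Because nothing removes the extra row, every later rescheduling attempt hits the same exception, which matches the "always fail" impact described in the report.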

User impact: if this condition is hit (multiple SNAT bindings for a router), rescheduling will always fail, potentially preventing routers from failing over.

tags: added: l3-ipam-dhcp
tags: added: db
Changed in neutron:
status: New → In Progress
Kevin Benton (kevinbenton) wrote :
Carl Baldwin (carl-baldwin) wrote :

I'm interested in how this got into this situation. In Kilo, the router should only ever be scheduled to one agent. Is the fix for this a band-aid masking another problem?

Carl Baldwin (carl-baldwin) wrote :

My previous comment was written with the context of dvr snat in mind specifically. I didn't make that clear.
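Carl's point is that a DVR SNAT router should have at most one binding in Kilo, while the data model does not enforce that. A minimal sketch of checking the invariant, assuming the same hypothetical, simplified `snat_bindings` table (not Neutron's real schema): any router returned by the query is one that would make `one()` fail.

```python
import sqlite3


def routers_with_duplicate_snat_bindings(conn):
    """Return (router_id, count) for every router bound to more than one agent."""
    return conn.execute(
        "SELECT router_id, COUNT(*) FROM snat_bindings"
        " GROUP BY router_id HAVING COUNT(*) > 1"
        " ORDER BY router_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical table: the composite key permits several agents per router.
conn.execute(
    "CREATE TABLE snat_bindings ("
    " router_id TEXT NOT NULL,"
    " l3_agent_id TEXT NOT NULL,"
    " PRIMARY KEY (router_id, l3_agent_id))"
)
conn.executemany(
    "INSERT INTO snat_bindings VALUES (?, ?)",
    [("r1", "agent-a"), ("r1", "agent-b"), ("r2", "agent-a")],
)
print(routers_with_duplicate_snat_bindings(conn))  # [('r1', 2)]
```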

Eugene Nikanorov (enikanorov) wrote :

Carl, I understand your concern. We found this during some heavy issues with messaging and agents.
Multiple router reschedulings occurred and left the bindings in a bad state; after that, rescheduling stopped working because it failed every time the server tried to reschedule a router.

In general, failover tests are a pain for us: most of them are manual, and they still uncover quite a few hidden issues with low reproducibility.
For this particular case, I decided to at least fix the state in which the server could not reschedule at all.
Kind of a band-aid, yes.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/221692
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=30bcbf3df8d38f7940233ba60554d93dea4a0692
Submitter: Jenkins
Branch: master

commit 30bcbf3df8d38f7940233ba60554d93dea4a0692
Author: Eugene Nikanorov <email address hidden>
Date: Wed Sep 9 14:40:17 2015 +0400

    Change router unbinding logic to be consistent with data model

    Model allows router to be bound to different agents
    Code should not make assumptions that the correspondence is 1-to-1

    Closes-Bug: #1497980
    Change-Id: Ieda9fc6e2d5a85194f2d022ea092cefb55183750
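The commit message states the intent: stop assuming a 1-to-1 router-to-agent correspondence. A minimal sketch of that idea, again using a hypothetical, simplified `snat_bindings` table: unbinding operates on however many rows match instead of calling `one()`. The optional `agent_id` filter is an assumption inspired by the `_unbind_router(context, router_id, agent['id'])` call in the trace; this is not the code from the actual commit.

```python
import sqlite3


def make_db():
    """Create a hypothetical, simplified SNAT binding table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snat_bindings ("
        " router_id TEXT NOT NULL,"
        " l3_agent_id TEXT NOT NULL,"
        " PRIMARY KEY (router_id, l3_agent_id))"
    )
    return conn


def unbind_snat(conn, router_id, agent_id=None):
    """Remove SNAT bindings for a router without assuming there is only one.

    When agent_id is given, only that agent's binding is removed; otherwise
    all bindings of the router are removed. Returns the removed rows, which
    may be an empty list.
    """
    sql = "SELECT router_id, l3_agent_id FROM snat_bindings WHERE router_id = ?"
    params = [router_id]
    if agent_id is not None:
        sql += " AND l3_agent_id = ?"
        params.append(agent_id)
    bindings = conn.execute(sql, params).fetchall()
    # Delete every matched row; an empty list is simply a no-op.
    conn.executemany(
        "DELETE FROM snat_bindings WHERE router_id = ? AND l3_agent_id = ?",
        bindings,
    )
    return bindings


conn = make_db()
conn.executemany(
    "INSERT INTO snat_bindings VALUES (?, ?)",
    [("r1", "agent-a"), ("r1", "agent-b")],
)
print(unbind_snat(conn, "r1", "agent-a"))  # [('r1', 'agent-a')]
print(unbind_snat(conn, "r1"))             # [('r1', 'agent-b')]
print(unbind_snat(conn, "r1"))             # []
```

Returning the removed rows, rather than a single binding, keeps the function usable even when the table already holds an unexpected number of bindings for the router.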

Changed in neutron:
status: In Progress → Fix Committed
tags: added: kilo-backport-potential liberty-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/233114

Akihiro Motoki (amotoki)
tags: added: liberty-backport-potential
removed: liberty-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/233114
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9a061c1a8bfeb72d17dec1b8f04e211efd76407c
Submitter: Jenkins
Branch: stable/liberty

commit 9a061c1a8bfeb72d17dec1b8f04e211efd76407c
Author: Eugene Nikanorov <email address hidden>
Date: Wed Sep 9 14:40:17 2015 +0400

    Change router unbinding logic to be consistent with data model

    Model allows router to be bound to different agents
    Code should not make assumptions that the correspondence is 1-to-1

    Closes-Bug: #1497980
    Change-Id: Ieda9fc6e2d5a85194f2d022ea092cefb55183750
    (cherry picked from commit 30bcbf3df8d38f7940233ba60554d93dea4a0692)

tags: added: in-stable-liberty
tags: removed: liberty-backport-potential
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b1

This issue was fixed in the openstack/neutron 8.0.0.0b1 development milestone.

Changed in neutron:
status: Fix Committed → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.1

This issue was fixed in the openstack/neutron 7.0.1 release.
