Neutron waits sequentially for read_timeout seconds on each connection in its connection pool. With the default pool_size of 10 and a read_timeout of 60 seconds, it takes about 10 minutes for the Neutron server to become available again after the VIP is moved.
This is log output from neutron-server after the VIP has been moved:
2014-03-05 17:48:23.844 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:49:23.887 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:50:24.055 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:51:24.067 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:52:24.079 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:53:24.115 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:54:24.123 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:55:24.131 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:56:24.143 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 17:57:24.163 9899 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
Here is the log output after pool_size was changed to 7 and read_timeout to 30 seconds:
2014-03-05 18:50:25.300 15731 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 18:50:55.331 15731 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 18:51:25.351 15731 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 18:51:55.387 15731 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 18:52:25.415 15731 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 18:52:55.427 15731 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 18:53:25.439 15731 WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2014-03-05 18:53:25.549 15731 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
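For reference, the pool_size and read_timeout above were set along these lines (a sketch only: the [database] option name and the read_timeout URL argument are assumptions about this deployment and may differ by release; read_timeout is passed through to the MySQLdb driver, and DB_VIP/NEUTRON_DBPASS are placeholders):

    [database]
    connection = mysql://neutron:NEUTRON_DBPASS@DB_VIP/neutron?read_timeout=30
    # max_pool_size maps to the SQLAlchemy pool_size referred to above
    max_pool_size = 7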
The issue seems to stem from the checkout pool event handler in openstack/common/db/sqlalchemy/session.py. Each time a connection is requested from the pool, this event handler checks that the connection is still alive by running a trivial query. In our case that query times out after read_timeout seconds and the connection is recycled in the pool, after which the pool hands out the next (equally dead) connection. This happens for each connection in the pool, one at a time.
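For context, the handler follows the standard SQLAlchemy ping-on-checkout recipe. A simplified sketch of that pattern (not the exact oslo code; the connection URL is a placeholder and the error-code list is illustrative):

    import sqlalchemy
    from sqlalchemy import event, exc

    def ping_listener(dbapi_conn, connection_rec, connection_proxy):
        # Runs on every pool checkout: verify the raw DBAPI connection is
        # still usable by issuing a trivial query.
        try:
            dbapi_conn.cursor().execute('select 1')
        except dbapi_conn.OperationalError as ex:
            if ex.args[0] in (2006, 2013, 2014, 2045, 2055):
                # DisconnectionError tells the pool to drop only this one
                # connection and retry the checkout with the next pooled
                # connection -- which then blocks for read_timeout seconds
                # as well, hence the serial one-per-connection delays.
                raise exc.DisconnectionError("MySQL server has gone away")
            raise

    engine = sqlalchemy.create_engine('mysql://neutron:NEUTRON_DBPASS@DB_VIP/neutron')
    event.listen(engine, 'checkout', ping_listener)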
If we dispose of all connections in the pool as soon as any one of them fails, the problem goes away, but that does not seem like a good solution. So far in my testing I have only encountered this issue with Neutron, even though all the services share this same code; I'm not sure why that is.
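A rough sketch of what that workaround amounts to (using the private _pool attribute on the checkout proxy is an assumption about where the dispose call would live, not necessarily where it belongs):

    from sqlalchemy import exc

    def ping_listener(dbapi_conn, connection_rec, connection_proxy):
        try:
            dbapi_conn.cursor().execute('select 1')
        except dbapi_conn.OperationalError as ex:
            if ex.args[0] in (2006, 2013, 2014, 2045, 2055):
                # Throw away every pooled connection, not just the one being
                # checked out, so the remaining dead connections are never
                # handed out and timed out one by one.
                connection_proxy._pool.dispose()
                raise exc.DisconnectionError("MySQL server has gone away")
            raise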