The issue seems to stem from the checkout pool event handler in openstack/common/db/sqlalchemy/session.py. Each time a connection is requested from the pool, this event handler checks that the connection is alive by running a query. In our case, this query times out after 'read_timeout' seconds and the connection is recycled in the pool. This happens for each connection in the pool, one at a time, so the stall repeats once per pooled connection.
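To make the one-at-a-time behaviour concrete, here is a toy model of a pool that pings on checkout. It is only a sketch and not the code in session.py: FakeConnection, checkout, total_stall and the 10-second READ_TIMEOUT are all hypothetical names and values chosen for illustration, and the timeout is counted rather than actually waited for.

```python
# Toy model only; none of these names come from session.py or SQLAlchemy.
READ_TIMEOUT = 10  # assumed stand-in for 'read_timeout', in seconds


class FakeConnection:
    def __init__(self, alive):
        self.alive = alive

    def ping(self):
        """Simulate the liveness query; a dead connection times out."""
        if not self.alive:
            raise TimeoutError(READ_TIMEOUT)


def checkout(pool):
    """Check out one connection, pinging it first.

    Returns (connection, seconds_waited). A dead connection costs one full
    READ_TIMEOUT and only that connection is replaced; the rest of the pool
    is left untouched.
    """
    conn = pool.pop(0)
    waited = 0
    try:
        conn.ping()
    except TimeoutError:
        waited = READ_TIMEOUT
        conn = FakeConnection(alive=True)  # recycle just this connection
    return conn, waited


def total_stall(pool_size):
    """Simulated seconds lost when every pooled connection has gone stale."""
    pool = [FakeConnection(alive=False) for _ in range(pool_size)]
    total = 0
    for _ in range(pool_size):
        conn, waited = checkout(pool)
        total += waited
        pool.append(conn)  # hand the now-healthy connection back
    return total
```

In this model the total delay grows linearly with the pool size, which matches what I am seeing: each stale connection has to hit its own timeout before it is replaced.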
If we dispose of all connections in the pool when any connection fails, the problem goes away, but this does not seem like a good solution. So far in my testing I have only encountered this issue with neutron, even though all services share this same code. I'm not sure why that's the case.
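For comparison, the dispose-everything workaround can be modelled the same way. Again this is only a sketch under assumptions: stall_with_dispose and the 10-second READ_TIMEOUT are hypothetical, and the real change would have to go through the pool itself rather than a list of flags.

```python
# Toy model only; names and values are illustrative assumptions.
READ_TIMEOUT = 10  # assumed stand-in for 'read_timeout', in seconds


def stall_with_dispose(pool_size):
    """Simulated seconds lost when the first failed ping discards the whole pool."""
    stale = [True] * pool_size  # True means the connection is dead
    total = 0
    for _ in range(pool_size):
        if stale[0]:
            total += READ_TIMEOUT          # one ping times out
            stale = [False] * pool_size    # dispose: every connection is replaced
    return total
```

Here only the first checkout pays the timeout, regardless of pool size. The trade-off in this model is that one failure throws away every connection, including any that might still have been usable.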