MySQL Gone Away Generated

Bug #706405 reported by Rick Harris
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Rick Harris

Bug Description

SQLAlchemy doesn't appear to be re-establishing the connection to the database automatically if the connection drops or times out for whatever reason.

This causes a MySQL Gone Away error to be generated.

http://paste.openstack.org/show/545/

Thierry Carrez (ttx) wrote :

Using pool_recycle was supposed to make sure we refresh the connection before MySQL expires it... but that does not cover MySQL connection drops.

Maybe our general use of expire_on_commit=False inhibits proper connection recycling?
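
As a rough illustration of what pool_recycle is meant to do, here is a simplified standard-library model. It is not SQLAlchemy's actual pool code; `RecyclingSlot`, `connect`, and `clock` are hypothetical names used only for this sketch. The idea is that a connection older than the recycle interval is replaced at checkout, before MySQL has a chance to expire it.

```python
import time


class RecyclingSlot:
    """Holds one connection and replaces it once it is older than `recycle` seconds."""

    def __init__(self, connect, recycle, clock=time.monotonic):
        self._connect = connect    # hypothetical zero-argument connection factory
        self._recycle = recycle    # maximum age in seconds; -1 disables recycling
        self._clock = clock        # injectable so the age check can be tested
        self._conn = None
        self._created = None

    def checkout(self):
        now = self._clock()
        too_old = (
            self._conn is not None
            and self._recycle > -1
            and now - self._created > self._recycle
        )
        if self._conn is None or too_old:
            # Open a fresh connection and restart its age counter.
            self._conn = self._connect()
            self._created = now
        return self._conn
```

Note that an age check like this only guards against idle timeouts. If the server drops a connection that is still younger than the recycle interval, checkout would hand back the dead connection anyway, which matches the limitation described above.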

Vish Ishaya (vishvananda) wrote : Re: [Bug 706405] Re: MySQL Gone Away Generated

I also thought we had a fix for this; perhaps the db migration merge caused a regression.

Vish

On Jan 24, 2011, at 8:21 AM, Thierry Carrez wrote:

> Using pool_recycle was supposed to make sure we refresh the connection
> before MySQL expires it... but that does not cover MySQL connection
> drops.
>
> Maybe our general use of expire_on_commit=False inhibits proper
> connection recycling?

Devin Carlen (devcamcar) wrote :

Looks like we're getting this in our test lab as well.

On Jan 24, 2011, at 10:04 AM, Vish Ishaya wrote:

> I also thought we had a fix for this; perhaps the db migration merge
> caused a regression.
>
> Vish

Thierry Carrez (ttx) wrote :

@Vish: we added pool_recycle to force connection recycling before MySQL times out unused connections. But here we are just talking about errors generated if, for example, you restart MySQL.

It does indeed generate "MySQL Gone Away" errors, but it also reestablishes the connection as soon as the DB is available again, so setting importance to Medium.

Someone knowing SQLAlchemy better than I do should look into this. Apparently the way we use it prevents the pool mechanism from working correctly (connections are not returned to the pool afaict). We use autocommit=True, but also session.begin(), which inhibits autocommit behavior (so maybe some session.commit() calls are missing).

Changed in nova:
importance: Undecided → Medium
status: New → Confirmed

Vish Ishaya (vishvananda) wrote :

Apparently MySQL gone away does not actually cause a retry. We can protect ourselves somewhat using the strategy here:

http://<email address hidden>/msg15079.html

I don't know the implications of having the server completely go down in this situation.

We can probably also catch the db errors in a wrapper (like I do here: https://code.launchpad.net/~vishvananda/nova/friendly-db) and retry a number of times, or automatically disable the service and keep trying to reconnect every x minutes.
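
A minimal sketch of the retry-in-a-wrapper idea, for illustration only. This is not the code from the friendly-db branch; `DBConnectionError` and `retry_on_disconnect` are hypothetical names, and the exception class stands in for whatever error the database driver actually raises when the server has gone away.

```python
import functools
import time


class DBConnectionError(Exception):
    """Hypothetical stand-in for the driver's 'MySQL server has gone away' error."""


def retry_on_disconnect(max_attempts=3, delay=0.0):
    """Retry the wrapped call when the (assumed) disconnect error is raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DBConnectionError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the original error
                    time.sleep(delay)  # wait before trying again
        return wrapper
    return decorator
```

A wrapper like this only helps if each attempt obtains a fresh connection; retrying on the same dead connection would fail the same way. It also assumes `max_attempts` is at least 1.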

It definitely isn't a trivial fix though, so this will probably have to happen in Cactus.

Vish

On Jan 25, 2011, at 6:01 AM, Thierry Carrez wrote:

> @Vish: we added pool_recycle to force connection recycling before MySQL
> times out unused connections. But here we are just talking about errors
> generated if, for example, you restart MySQL.
>
> It does indeed generate "MySQL Gone Away" errors, but it also
> reestablishes the connection as soon as the DB is available again, so
> setting importance to Medium.
>
> Someone knowing SQLAlchemy better than I do should look into this.
> Apparently the way we use it prevents the pool mechanism from working
> correctly (connections are not returned to the pool afaict). We use
> autocommit=True, but also session.begin(), which inhibits autocommit
> behavior (so maybe some session.commit() calls are missing).

Rick Harris (rconradharris) wrote :

Figured out what's going on here.

Our flag that controls SQLAlchemy's pool_recycle feature, `sql_idle_timeout`, was defined as a string, not an integer.

Since numbers always compare as less than strings in Python 2, a connection's age never exceeds the timeout, which makes the timeout in effect infinite.
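
To make the failure mode concrete, here is a small sketch of an age check of the kind described, plus the integer coercion that avoids the problem. The function names are hypothetical and this is not nova's or SQLAlchemy's actual code. In Python 2, comparing a number with a string silently evaluates as "number is smaller"; in Python 3 the same comparison raises a TypeError instead, so the mistake would surface immediately.

```python
def should_recycle(age_seconds, recycle):
    """Return True when a connection older than `recycle` seconds should be replaced."""
    return recycle > -1 and age_seconds > recycle


def parse_idle_timeout(raw_value):
    """Coerce a flag value (possibly a string such as "3600") to integer seconds."""
    return int(raw_value)


# With the flag coerced to an integer, the age check behaves as intended.
timeout = parse_idle_timeout("3600")
old_connection_is_recycled = should_recycle(7200, timeout)
fresh_connection_is_kept = not should_recycle(60, timeout)
```

Passing the raw string "3600" as `recycle` is the bug in miniature: Python 2 would return False for every age, while Python 3 refuses the comparison outright.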

Changed in nova:
assignee: nobody → Rick Harris (rconradharris)
status: Confirmed → In Progress
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released