Engine doesn't retry a DB transaction on "Too many connections" error

Bug #1837532 reported by Renat Akhmerov
This bug affects 1 person
Affects: Mistral
Status: Fix Released
Importance: Undecided
Assigned to: Renat Akhmerov

Bug Description

The "Too many connections" error can be considered retriable because it is usually temporary. However, when a DB driver raises this error, the layers above the driver (sqlalchemy, oslo.db) don't always convert the exception to one of the known exceptions such as sqlalchemy.exc.OperationalError, i.e. the original exception type can be passed through unchanged. This is probably a bug in SQLAlchemy, but more investigation is needed.

This is a traceback from a real run:

2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue [-] Failed to run transactional engine operation.: OperationalError: (pymysql.err.OperationalError) (1040, u'Too many connections') (Background on this error at: http://sqlalche.me/e/e3q8)
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue Traceback (most recent call last):
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue File "/usr/lib/python2.7/site-packages/mistral/engine/post_tx_queue.py", line 119, in _process_tx_queue
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue func(*args)
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 285, in _check
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue wf_handler.check_and_complete(self.wf_ex.id)
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue result = f(*args, **kwargs)
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue File "/usr/lib/python2.7/site-packages/mistral/engine/workflow_handler.py", line 92, in check_and_complete
2019-06-15 17:34:52.157 2818 ERROR mistral.engine.post_tx_queue wf_ex = db_api.load_workflow_execution(wf_ex_id)

....

The log contains no corresponding "DB error detected, operation will be retried" message, which would have been evidence that the operation was retried.
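For illustration only, here is a minimal sketch of how such an error could be recognized as retriable regardless of which layer's exception type reaches the caller. It is not the code from the merged fix. The names is_retriable and retry_on_db_error are hypothetical, and the sketch assumes that the driver error carries the MySQL code 1040 as its first argument (as the pymysql error in the traceback does) and that a wrapping exception exposes the driver error through an orig attribute or __cause__.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)

# MySQL server error code for "Too many connections" (see the traceback above).
TOO_MANY_CONNECTIONS = 1040


def is_retriable(exc):
    """Return True if exc, or an exception it wraps, is error 1040."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        # Driver-level errors carry (code, message) in args.
        if exc.args and exc.args[0] == TOO_MANY_CONNECTIONS:
            return True
        # Fall back to the message text when the type was not converted.
        if "Too many connections" in str(exc):
            return True
        # Assumption: a wrapper keeps the driver error in `orig` or __cause__.
        exc = getattr(exc, "orig", None) or exc.__cause__
    return False


def retry_on_db_error(attempts=3, delay=0.0):
    """Hypothetical decorator: rerun func while the error looks temporary."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    # Give up on the last attempt or on unrelated errors.
                    if attempt == attempts or not is_retriable(e):
                        raise
                    LOG.warning(
                        "DB error detected, operation will be retried: %s", e
                    )
                    time.sleep(delay)
        return wrapper
    return decorator
```

Checking both the error code and the message text is deliberately redundant: it keeps the check working whether the exception arrives as the driver's own type or wrapped by a higher layer.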

description: updated
Changed in mistral:
assignee: nobody → Renat Akhmerov (rakhmerov)
milestone: none → train-1
Changed in mistral:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/672390

OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/stein)

Reviewed: https://review.opendev.org/672390
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=19ddb052b3f7d78dd22c354682cb9090ce6d2001
Submitter: Zuul
Branch: stable/stein

commit 19ddb052b3f7d78dd22c354682cb9090ce6d2001
Author: Renat Akhmerov <email address hidden>
Date: Tue Jul 23 16:59:57 2019 +0700

    Retry a DB transaction on "Too many connections" error

    * Writing a unit test is very problematic but the fix has been
      tested manually.

    Closes-Bug: #1837532

    Change-Id: I4fa15994a7359a5f90a0a4671d47b19fe928cf33

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (master)

Reviewed: https://review.opendev.org/672245
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=4a9d55a1b041007fc1b2340c5a143ffa2eb3f31d
Submitter: Zuul
Branch: master

commit 4a9d55a1b041007fc1b2340c5a143ffa2eb3f31d
Author: Renat Akhmerov <email address hidden>
Date: Tue Jul 23 16:59:57 2019 +0700

    Retry a DB transaction on "Too many connections" error

    * Writing a unit test is very problematic but the fix has been
      tested manually.

    Closes-Bug: #1837532

    Change-Id: I4fa15994a7359a5f90a0a4671d47b19fe928cf33

Changed in mistral:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 9.0.0.0b1

This issue was fixed in the openstack/mistral 9.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 8.1.0

This issue was fixed in the openstack/mistral 8.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/710170

OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/rocky)

Reviewed: https://review.opendev.org/710170
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=4ad66f140bc7cce9c1c7805a22d9d3c5600642d6
Submitter: Zuul
Branch: stable/rocky

commit 4ad66f140bc7cce9c1c7805a22d9d3c5600642d6
Author: Renat Akhmerov <email address hidden>
Date: Tue Jul 23 16:59:57 2019 +0700

    Retry a DB transaction on "Too many connections" error

    * Writing a unit test is very problematic but the fix has been
      tested manually.

    Closes-Bug: #1837532

    Change-Id: I4fa15994a7359a5f90a0a4671d47b19fe928cf33

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral rocky-eol

This issue was fixed in the openstack/mistral rocky-eol release.
