Retry connection to mongodb on AutoReconnect exceptions

Bug #1309555 reported by Ionuț Arțăriși
This bug affects 4 people
Affects     Status        Importance  Assigned to     Milestone
Ceilometer  Fix Released  Medium      Igor Degtiarov
Juno        Fix Released  Undecided   Unassigned

Bug Description

pymongo raises an AutoReconnect exception[1] when a ReplicaSet is used. This is supposed to be intercepted by client libraries which should decide if the operation can be retriggered or not. pymongo will handle reconnecting to a different Replica Set member transparently.

I have only encountered this exception when using ReplicaSets, but the upstream documentation doesn't mention them so it's possible that it is raised in other scenarios as well.

AFAICS this needs to be handled on each mongodb query in ceilometer. Read-only operations can just be re-triggered, but for write operations it might be more difficult.

[1] http://api.mongodb.org/python/current/api/pymongo/errors.html#pymongo.errors.AutoReconnect

Revision history for this message
Ionuț Arțăriși (mapleoin) wrote :

One dumb solution for read-only cases, which works fine here, looks like this:

Replace:

    meters = list(self.db.resource.find(q))

with:

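    # Needs `import time` and `from pymongo.errors import AutoReconnect`.
    while True:
        try:
            meters = list(self.db.resource.find(q))
        except AutoReconnect:
            # Give the replica set a moment to settle, then retry the read.
            time.sleep(1)
        else:
            break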

This could be wrapped in a reusable function, but I do not have enough ceilometer knowledge to know how to handle database queries involving writes.
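A minimal sketch of such a reusable wrapper, for read-only queries only. The names retry_on_autoreconnect, max_retries, delay, sleep and get_meters are hypothetical and not taken from ceilometer, and unlike the loop above it gives up after a bounded number of attempts instead of retrying forever. It assumes Python 3; the stand-in exception class exists only so the sketch runs where pymongo is not installed.

```python
import functools
import time

try:
    from pymongo.errors import AutoReconnect
except ImportError:
    # Stand-in (assumption) so the sketch runs without pymongo installed.
    class AutoReconnect(Exception):
        pass


def retry_on_autoreconnect(max_retries=5, delay=1.0, sleep=time.sleep):
    """Retry the wrapped call when AutoReconnect is raised.

    Hypothetical helper. Only safe for operations that can be repeated
    without side effects, such as read-only queries.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except AutoReconnect:
                    attempts += 1
                    if attempts > max_retries:
                        # Out of retries: let the caller see the error.
                        raise
                    sleep(delay)
        return wrapper
    return decorator


@retry_on_autoreconnect(max_retries=3)
def get_meters(db, q):
    # Same read-only query as in the snippet above.
    return list(db.resource.find(q))
```

Writes are deliberately left out: if the connection drops after the server has applied a write but before the acknowledgement arrives, a blind retry could apply it twice.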

Revision history for this message
Eoghan Glynn (eglynn) wrote :

Agreed, the native mongo replication feature can be used with the standard pymongo client, but fully leveraging mongo replication is not entirely transparent to the client-side.

Switching the ceilometer mongodb storage driver over to use the pymongo.MongoReplicaSetClient would give us the flexibility to control, for example, the preference for reading from master or slaves, or write concerns around blocking for acknowledgements.

Also, as stated above, we probably do need some explicit handling of AutoReconnect exceptions.
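As a rough illustration of the read preference and write concern knobs mentioned above, these can also be expressed as standard MongoDB connection URI options (replicaSet, readPreference, w). The helper name build_replica_set_uri, the host names, the replica set name "rs0" and the default values are all assumptions made for this sketch, not ceilometer configuration. It assumes Python 3.

```python
from urllib.parse import quote, urlencode


def build_replica_set_uri(hosts, replica_set, database="ceilometer",
                          read_preference="secondaryPreferred",
                          w="majority"):
    """Build a MongoDB connection URI for a replica set (hypothetical helper).

    readPreference controls which members serve reads; w controls how many
    members must acknowledge a write before it is reported as successful.
    """
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
        "w": w,
    })
    return "mongodb://%s/%s?%s" % (",".join(hosts), quote(database), options)


# Example host names and replica set name are made up for illustration.
uri = build_replica_set_uri(["db1:27017", "db2:27017"], "rs0")
print(uri)
```

The resulting string could then be passed to the pymongo client constructor, for example pymongo.MongoClient(uri).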

Changed in ceilometer:
status: New → Triaged
Changed in ceilometer:
assignee: nobody → Igor Degtiarov (idegtiarov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/122387

Changed in ceilometer:
status: Triaged → In Progress
Dina Belova (dbelova)
Changed in ceilometer:
importance: Undecided → Medium
milestone: none → juno-rc1
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: juno-rc1 → kilo-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/122387
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=21d882c96cbbaeb8b78ff91e06e3615be97bff07
Submitter: Jenkins
Branch: master

commit 21d882c96cbbaeb8b78ff91e06e3615be97bff07
Author: Igor Degtiarov <email address hidden>
Date: Thu Oct 23 14:05:38 2014 +0300

    [MongoDB] Fix bug with reconnection to new master node

    Fixes bug with raising AutoReconnect exception when MongoDB ReplicaSet
    loses connection to primary node.

    Change-Id: Id0e81ba60b28d09adff6a10d04b412f25257d8ce
    Closes-Bug: #1309555

Changed in ceilometer:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/137133

Thierry Carrez (ttx)
Changed in ceilometer:
status: Fix Committed → Fix Released
JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack
tags: added: juno-backport-potential
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: kilo-1 → 2015.1.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (stable/juno)

Reviewed: https://review.openstack.org/137133
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=b3307b1d6cd38572a78ca9f072b2e7ccd8cbd398
Submitter: Jenkins
Branch: stable/juno

commit b3307b1d6cd38572a78ca9f072b2e7ccd8cbd398
Author: Igor Degtiarov <email address hidden>
Date: Thu Oct 23 14:05:38 2014 +0300

    [MongoDB] Fix bug with reconnection to new master node

    Fixes bug with raising AutoReconnect exception when MongoDB ReplicaSet
    loses connection to primary node.

    Closes-Bug: #1309555

    Conflicts:
     ceilometer/event/storage/impl_db2.py
     ceilometer/event/storage/impl_mongodb.py
     ceilometer/storage/__init__.py
     ceilometer/storage/mongo/utils.py
     ceilometer/tests/storage/test_pymongo_base.py

    Conflicts are due to refactoring of the storage drivers and this patch
    has been modified to account for the refactor. The test case within the
    file test_pymongo_base.py was removed since equivalent test coverage
    was included in the cherry picked commit in the test_storage_scenarios.py
    file.

    Change-Id: Id0e81ba60b28d09adff6a10d04b412f25257d8ce
    (cherry-picked from commit 21d882c96cbbaeb8b78ff91e06e3615be97bff07)

tags: added: in-stable-juno
Revision history for this message
wekay102200 (wekay102200) wrote :

Hi all, I ran into a problem when using pymongo with MongoDB.

The error is as follows:

Traceback (most recent call last):
  File "/opt/ttwxenv/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/opt/ttwxenv/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/ttwxenv/ttwx-server/src_2015-07-24_16-26-23/feeds/tasks.py", line 35, in process_new_share_like
    feeds.service.send_like_feed(liked_by, share_id)
  File "/opt/ttwxenv/ttwx-server/src_2015-07-24_16-26-23/feeds/service.py", line 48, in send_like_feed
    like = _get_like_info(sender_id, share_id)
  File "/opt/ttwxenv/ttwx-server/src_2015-07-24_16-26-23/feeds/service.py", line 846, in _get_like_info
    like = likes.service.get_share_like(share_id, sender_id)
  File "/opt/ttwxenv/ttwx-server/src_2015-07-24_16-26-23/likes/service.py", line 308, in get_share_like
    like = ShareLike.objects(share_id=share_id, liked_by=liked_by).first()
  File "/opt/ttwxenv/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 256, in first
    result = queryset[0]
  File "/opt/ttwxenv/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 150, in __getitem__
    return queryset._document._from_son(queryset._cursor[key],
  File "/opt/ttwxenv/lib/python2.7/site-packages/pymongo/cursor.py", line 538, in __getitem__
    for doc in clone:
  File "/opt/ttwxenv/lib/python2.7/site-packages/pymongo/cursor.py", line 904, in next
    if len(self.__data) or self._refresh():
  File "/opt/ttwxenv/lib/python2.7/site-packages/pymongo/cursor.py", line 848, in _refresh
    self.__uuid_subtype))
  File "/opt/ttwxenv/lib/python2.7/site-packages/pymongo/cursor.py", line 782, in __send_message
    res = client._send_message_with_response(message, **kwargs)
  File "/opt/ttwxenv/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1051, in _send_message_with_response
    raise AutoReconnect(str(e))
AutoReconnect: [Errno 104] Connection reset by peer

My Python version is 2.7.8, pymongo version is 2.6, and MongoDB version is 2.6.3.
MongoDB uses a replica set with 1 main DB server (the master) and 3 slave DB servers (db1, db2, db3).

How can I resolve this problem?
Thanks, all.

Revision history for this message
Julien Danjou (jdanjou) wrote :

I don't see any trace of Ceilometer in your traceback. Celery?

Revision history for this message
wekay102200 (wekay102200) wrote :

Yes, my task runs in Celery. @Julien Danjou (jdanjou)

Revision history for this message
Julien Danjou (jdanjou) wrote :

You are spamming the Ceilometer bug tracker here. Please go report that bug on Celery.
