tripleo

oslo.cache's pymemcache backend doesn't recover from socket disconnection

Bug #1934130 reported by Damien Ciabrini on 2021-06-30

This bug affects 1 person

Affects   Status     Importance  Assigned to      Milestone
tripleo   Confirmed  Undecided   Damien Ciabrini  tripleo xena-3

Bug Description

When oslo.cache is enabled and configured to target pymemcache (e.g. memcached + TLS-e),
pymemcache manages the sockets that connect to memcached.

With this configuration, pymemcache does not automatically retry on socket error
or socket disconnection. Instead, it closes the invalid socket and raises
an exception up the stack. This makes the oslo.cache call fail, and subsequent
calls keep failing until every bad socket has been hit and closed.
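The missing behaviour can be illustrated with a minimal, self-contained Python sketch. It does not use pymemcache or oslo.cache; FlakyClient and get_with_retry are hypothetical names invented for this illustration, and the client simply models "close the bad socket and raise" as described above.

```python
class FlakyClient:
    """Hypothetical stand-in for a cache client holding pooled sockets."""

    def __init__(self, stale_sockets):
        # Number of pooled sockets whose peer has already gone away.
        self.stale_sockets = stale_sockets

    def get(self, key):
        if self.stale_sockets > 0:
            # Model of the reported behaviour: drop the bad socket, then raise.
            self.stale_sockets -= 1
            raise ConnectionError("socket closed by peer")
        return "value-for-" + key


def get_with_retry(client, key, attempts):
    """Retry the call on connection errors, up to `attempts` tries in total."""
    last_error = None
    for _ in range(attempts):
        try:
            return client.get(key)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next socket
    raise last_error
```

Without the wrapper, the caller sees one failure per stale socket. With it, a single call succeeds as long as `attempts` exceeds the number of stale sockets.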

This can consistently be triggered by:

  1. Running "openstack service list" on the overcloud to create
     connections to memcached.

  2. Restarting memcached with "systemctl restart tripleo_memcached" to
     force the connected sockets to close one side of their connection.
     This leaves <x> open sockets on the controller:
     the keystone service still has its side of each socket
     open.

  3. The next call to "openstack service list" fails because
     pymemcache hits a half-closed socket, closes its side, and
     raises an exception.

  4. The keystone service recovers only once the remaining <x>-1
     half-closed sockets have been hit and closed.
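The arithmetic behind these steps can be modelled with a small Python simulation. This is only a sketch under stated assumptions: Pool and failures_until_recovery are hypothetical names, each request is assumed to use exactly one pooled socket, and a fresh socket is assumed to be opened when the pool is empty. It does not call pymemcache or keystone.

```python
class Pool:
    """Hypothetical model of a socket pool after memcached was restarted."""

    def __init__(self, stale_count):
        # Every pooled socket is half-closed: the server side is gone.
        self.sockets = ["stale"] * stale_count

    def call(self):
        # Assumption: reuse a pooled socket, or open a fresh one if none is left.
        sock = self.sockets.pop() if self.sockets else "fresh"
        if sock == "stale":
            # The bad socket is closed (not returned to the pool) and the call fails.
            raise ConnectionError("half-closed socket")
        self.sockets.append(sock)  # a healthy socket goes back to the pool
        return "ok"


def failures_until_recovery(stale_count):
    """Count the failed calls seen before the first successful one."""
    pool = Pool(stale_count)
    failures = 0
    while True:
        try:
            pool.call()
            return failures
        except ConnectionError:
            failures += 1
```

Under these assumptions, failures_until_recovery(x) equals x: one failure in step 3 plus the remaining x-1 in step 4.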

Marios Andreou (marios-b) on 2021-07-21

Changed in tripleo:
milestone:	xena-2 → xena-3
