oslo.cache's pymemcache backend doesn't recover from socket disconnection
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
tripleo | Confirmed | Undecided | Damien Ciabrini | |
Bug Description
When oslo.cache is enabled and configured to use the pymemcache backend (e.g. memcached + TLS-e),
pymemcache manages the sockets that connect to memcached.
With this configuration, pymemcache does not automatically retry on socket error
or socket disconnection. Instead, it closes the invalid socket and raises
an exception up the stack. This makes the oslo.cache call fail, and subsequent
calls also fail until all bad sockets have been hit and closed.
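The failure pattern described above can be illustrated with a small, self-contained sketch. It does not use pymemcache or oslo.cache: `FakePool`, `StaleSocketError` and `get_with_retry` are hypothetical names invented for this illustration, and the pool is assumed to hand out connections in FIFO order. It only shows why each stale socket costs one failed call when there is no retry, and how a retry loop around the call could absorb those failures.

```python
class StaleSocketError(ConnectionError):
    """Hypothetical stand-in for the error raised on a dead socket."""


class FakePool:
    """Toy pool: each entry is True (healthy) or False (half-closed)."""

    def __init__(self, size):
        self.conns = [True] * size

    def server_restarted(self):
        # Every pooled connection becomes half-closed at once.
        self.conns = [False] * len(self.conns)

    def get(self, key):
        conn_ok = self.conns.pop(0)
        # A bad connection is discarded and replaced by a fresh one;
        # a healthy one is simply returned to the pool.
        self.conns.append(True)
        if not conn_ok:
            # No retry here: the error goes straight to the caller.
            raise StaleSocketError("socket closed by peer")
        return f"value-for-{key}"


def get_with_retry(pool, key, attempts):
    """Retry the call so stale sockets are drained within one request."""
    for attempt in range(attempts):
        try:
            return pool.get(key)
        except StaleSocketError:
            if attempt == attempts - 1:
                raise
```

With a pool of three stale connections, three plain `pool.get()` calls fail before one succeeds, while `get_with_retry(pool, key, attempts=4)` succeeds on the first request. Where such a retry would belong (pymemcache, oslo.cache, or the caller) is left open here.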
This can consistently be triggered by:
1. run "openstack service list" on the overcloud to create connections to memcached
2. restart memcached with "systemctl restart tripleo_memcached" to
force the server to close its side of the connected sockets.
This will leave <x> open sockets on the controller:
the keystone service will still have its side of each socket
open.
3. the next call to "openstack service list" will fail because
pymemcache will hit a half-closed socket, close its side, and
raise an exception
4. the keystone service will recover only once the remaining <x>-1 half-closed sockets
have been hit and closed.
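The half-closed state from step 2 can be observed with the Python standard library alone. This is a minimal sketch, assuming `socket.socketpair()` as a stand-in for the TCP connection between keystone and memcached; the function and variable names are made up for the illustration. After the peer closes its end, the local socket object is still open, and the problem only becomes visible when the socket is next used.

```python
import socket


def demo_half_closed():
    # socketpair() stands in for the keystone <-> memcached connection.
    keystone_side, memcached_side = socket.socketpair()

    # Simulate the memcached restart: the server closes its end.
    memcached_side.close()

    # The client-side socket object is still open at this point.
    still_open = keystone_side.fileno() != -1

    # Reading from it returns an empty bytes object (end of stream),
    # which is how the client discovers the connection is dead.
    keystone_side.settimeout(2.0)
    data = keystone_side.recv(1024)

    keystone_side.close()
    return still_open, data
```

Calling `demo_half_closed()` returns `(True, b"")`: the socket was still open locally, but the read reports that the peer is gone.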
Changed in tripleo:
milestone: xena-2 → xena-3