Removing the offline memcache servers from the config and restarting nova-api and keystone is enough to bring nova back up to speed. Of course, this is a manual, temporary change that applies only while a single controller is left, but so is the crm policy change, so I believe it is OK as a workaround for now.
Still, this issue should be visible even with just 1 controller down, so we should try to implement a better approach to memcache.
We can fully imitate the current memcache setup with HAProxy and overcome this issue by creating X endpoints, one per controller. Each endpoint will point to a single controller. However, all endpoints will stay reachable even if some controllers are down, because HAProxy itself keeps listening, so clients will get quick rejects from HAProxy instead of connections that time out.
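To make the proposal a bit more concrete, here is a rough sketch of how the per-controller endpoints could be generated. Everything specific in it is an assumption for illustration only: the VIP address, the controller names and addresses, the choice of one frontend port per controller starting at 11311, and the helper names `render_memcache_listeners` and `client_server_list`. It only renders HAProxy `listen` sections in TCP mode with a health check; it is not taken from the actual deployment manifests.

```python
# Sketch only: render one HAProxy "listen" section per controller so that
# every memcache endpoint stays reachable on the VIP even when its
# controller is down. All addresses, ports, and names are hypothetical.


def render_memcache_listeners(vip, controllers, base_port=11311, backend_port=11211):
    """Return HAProxy config text with one TCP listener per controller.

    controllers is a list of (name, address) tuples. Each listener binds
    its own port on the VIP and forwards to exactly one memcached backend.
    """
    sections = []
    for index, (name, address) in enumerate(controllers):
        frontend_port = base_port + index
        sections.append(
            f"listen memcached-{name}\n"
            f"    bind {vip}:{frontend_port}\n"
            f"    mode tcp\n"
            # "check" lets HAProxy mark the backend down, so clients get a
            # fast failure instead of waiting for a connect timeout.
            f"    server {name} {address}:{backend_port} check\n"
        )
    return "\n".join(sections)


def client_server_list(vip, controllers, base_port=11311):
    """Return the comma-separated server list clients would be given."""
    return ",".join(f"{vip}:{base_port + i}" for i in range(len(controllers)))


if __name__ == "__main__":
    nodes = [("node-1", "10.0.0.1"), ("node-2", "10.0.0.2"), ("node-3", "10.0.0.3")]
    print(render_memcache_listeners("10.0.0.100", nodes))
    print(client_server_list("10.0.0.100", nodes))
```

With this layout the services would list the VIP endpoints instead of the controller addresses, so the client-side server list (and therefore key distribution) would not have to change when a controller goes offline.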