keystone cache should be shared between HA units

Bug #1771114 reported by Trent Lloyd
This bug affects 4 people
Affects                   Status         Importance  Assigned to         Milestone
Charm Helpers             Invalid        Undecided   Unassigned
OpenStack Charm Guide     Fix Released   Undecided   Unassigned
OpenStack Keystone Charm  Fix Committed  Medium      Edward Hope-Morley
  2023.1                  Fix Committed  Undecided   Unassigned

Bug Description

[Problem]

Currently, when you change a user's role assignments in an HA Keystone configuration, the change may be reflected inconsistently.

The reason for this is that each keystone unit uses its own individual memcache, and not all of those memcache servers have their cache invalidated when a role is removed.
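The effect can be illustrated with a toy model. The following is a minimal sketch, not keystone code: the `KeystoneUnit` class, its methods, and the plain dicts standing in for the database and for each unit's memcached are all assumptions made for illustration.

```python
class KeystoneUnit:
    """Toy model of one keystone unit with its own private cache."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend  # shared database (a dict of sets here)
        self.cache = {}         # stands in for the unit-local memcached

    def get_roles(self, user):
        # Serve from the local cache when possible, else read the database.
        if user not in self.cache:
            self.cache[user] = frozenset(self.backend.get(user, ()))
        return self.cache[user]

    def add_role(self, user, role):
        self.backend.setdefault(user, set()).add(role)
        self.cache.pop(user, None)  # only this unit's cache is invalidated

    def remove_role(self, user, role):
        self.backend.get(user, set()).discard(role)
        self.cache.pop(user, None)  # peers never hear about the change


database = {"test": {"Member"}}
units = [KeystoneUnit(f"keystone/{i}", database) for i in range(3)]

# Grant Admin, then let every unit cache the resulting role set.
units[0].add_role("test", "Admin")
for unit in units:
    unit.get_roles("test")

# The removal is processed by a single unit.
units[0].remove_role("test", "Admin")

answers = {unit.name: sorted(unit.get_roles("test")) for unit in units}
for name, roles in answers.items():
    print(name, roles)
```

In this model only `keystone/0` reports the role as removed; the other two units keep answering from their stale local entries, which matches the round-robin inconsistency described here.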

[Reproduction]

- Deploy a xenial-mitaka through queens environment with 3 keystone units and a VIP

openstack project create test
openstack user create test --password test --project test --domain admin_domain

- Download an OpenStack v3 RC file from openstack dashboard for 'admin' and 'test'

* As 'admin' user
source admin-openrc.sh
openstack network create admin1

* As 'test' user
source test-openrc.sh
openstack network create test1
openstack network list # should show only 'test1'

* As 'admin' user
source admin-openrc.sh
openstack role add --user test --project test Admin

* As 'test' user
source test-openrc.sh
openstack network list # do this a few times, should now show both 'test1' and 'admin1'
openstack network list
openstack network list

* As 'admin' user
openstack role remove --user test --project test Admin

* As 'test' user
source test-openrc.sh
openstack network list # do this a few times; the list is sometimes inconsistent, showing either only 'test1' or both 'test1' and 'admin1', depending on whether the keystone endpoint that 'neutron' hits had its cache invalidated.
openstack network list
openstack network list

* Restart 'memcached' on each of the keystone servers
systemctl restart memcached

* Repeat test, inconsistency goes away.
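The "do this a few times" steps can be made less manual by sampling the command repeatedly and tallying the distinct results. This is a rough sketch that assumes the `openstack` client is on PATH and the test-openrc.sh credentials are already sourced in the environment; `run_network_list` and `sample` are hypothetical helper names, not part of any OpenStack tooling.

```python
import subprocess
from collections import Counter


def run_network_list():
    """Return the visible network names as a sorted tuple."""
    result = subprocess.run(
        ["openstack", "network", "list", "-f", "value", "-c", "Name"],
        check=True,
        capture_output=True,
        text=True,
    )
    return tuple(sorted(line for line in result.stdout.splitlines() if line))


def sample(runner, attempts=10):
    """Call runner repeatedly and count how often each distinct answer appears."""
    return Counter(runner() for _ in range(attempts))


# Example (needs a live cloud, so it is left commented out):
# for names, count in sample(run_network_list).items():
#     print(count, names)
```

More than one distinct answer in the tally, for example both `('test1',)` and `('admin1', 'test1')`, would indicate the inconsistency; a single answer is what a consistent deployment should give.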

You can further try deleting the test user/project, re-adding it, and then re-using the old test-openrc.sh, which has the user and project IDs hard-coded. Those IDs will partially work again, depending on whether the cache was invalidated on that keystone host or not. Roles are not the only inconsistency.

[Possible Fixes]
 - Disable memcached on HA installations
 - Use a peered memcached solution (memcached itself does not have this built-in but other implementations and forks do)
 - Switch to redis (which supports peered implementations)
 - Set a faster memcached expiry and/or try to send keystone requests to a single server instead of round-robin

Trent Lloyd (lathiat) wrote :

memcached support was added in the following bug:
https://bugs.launchpad.net/charm-keystone/+bug/1722541

description: updated
Trent Lloyd (lathiat)
Changed in charm-keystone:
status: New → Confirmed
Felipe Reyes (freyes)
tags: added: sts
Changed in charm-keystone:
milestone: none → 18.08
Edward Hope-Morley (hopem) wrote :

@lathiat I've tried this out for myself and I do not see the same behaviour - https://paste.ubuntu.com/p/TP2BxQF6kM/. Any idea what I might be doing differently?

Changed in charm-keystone:
status: Confirmed → Incomplete
James Page (james-page)
Changed in charm-keystone:
milestone: 18.08 → 18.11
Trent Lloyd (lathiat) wrote :

@hopem You said at the top of that paste "Test done on Mitaka with 3 units of neutron-api in HA" - the HA service should be Keystone and not neutron-api?

Trent Lloyd (lathiat)
Changed in charm-keystone:
status: Incomplete → New
tags: added: cpe-onsite
James Page (james-page)
Changed in charm-keystone:
milestone: 18.11 → 19.04
David Ames (thedac)
Changed in charm-keystone:
milestone: 19.04 → 19.07
Changed in charm-keystone:
importance: Undecided → Medium
status: New → Triaged
David Ames (thedac)
Changed in charm-keystone:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-keystone:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-keystone:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-keystone:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-keystone:
milestone: 20.08 → none
Phat (letonphat1988) wrote :

@lathiat are there any solutions for this case? I have the same issue, and it is critical for me because my service is Swift storage, which relies on the keystone middleware to authenticate every file request (so I need to scale out more HA units for token verification). Caching affects API behaviour, and we can't be sure when responses will become consistent.

Trent Lloyd (lathiat) wrote :

@Phat (letonphat1988)

As far as I am aware, there is currently no solution to this. It only affects keystone deployed by the juju charm 'keystone' in an HA configuration.

If you're an Ubuntu Advantage customer I'd encourage you to open a support case and reference this bug link

Nicolas Bock (nicolasbock) wrote :
Changed in charm-keystone:
assignee: nobody → Nicolas Bock (nicolasbock)
Changed in charm-keystone:
status: Triaged → In Progress
Muhammad Ahmad (ahmadfsbd) wrote :

Having exactly the same issue on stable/yoga deployment where keystone is deployed in HA. When adding or removing a role to a user, the results seem inconsistent.

For example: if a user has a 'member' role in a project and you add 'admin' role on top of it, 'openstack network list' will show networks from the project only sometimes ('member') and networks from all projects other times ('admin'). Same happens with removing a role.

Disabling the local memcached.service on keystone units and modifying keystone.conf to not use memcache fixes this issue.

Edward Hope-Morley (hopem) wrote :

We've been looking into this some more today and see little evidence to suggest that configuring multiple servers is the right way to go. The behaviour today is that the charms set a default expiration_time of 600s, and while modules can set their own cache_time to override this, the charms do not configure this for [role]. This means that when you add/remove a role assignment, the cache local to the api host that processed the request will be up-to-date and the peer hosts will not be for up to 600s. See https://bugs.launchpad.net/charm-keystone/+bug/1899117 for a similar discussion; the solution there was to make the global expiration_time configurable such that, if set to something low, the role cache_time will also use it. That should drastically reduce the impact of the problem, but since reducing the global cache time could have performance side effects, I suggest we make role cache_time configurable via the charm (similar to how catalog cache_time is already configurable).
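The relationship between the global expiration_time, a per-section cache_time, and the resulting staleness window can be sketched as follows. This is a simplified model under stated assumptions: `effective_cache_time`, `TTLCache` and `seconds_stale` are hypothetical names, the cache is a plain time-to-live store rather than keystone's actual caching layer, and the 30s value is an arbitrary example.

```python
GLOBAL_EXPIRATION_TIME = 600  # seconds, the charm default quoted above


def effective_cache_time(section_cache_time, expiration_time=GLOBAL_EXPIRATION_TIME):
    """Return the TTL a section uses: its own cache_time, else the global value."""
    return expiration_time if section_cache_time is None else section_cache_time


class TTLCache:
    """Tiny time-based cache driven by an explicit clock value."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (value, stored_at)

    def get(self, key, now, loader):
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]
        value = loader()  # entry missing or expired: reload from the source
        self.entries[key] = (value, now)
        return value


def seconds_stale(role_cache_time):
    """How long a peer unit keeps returning a removed role in this toy model."""
    db = {"roles": ("Admin", "Member")}
    peer = TTLCache(effective_cache_time(role_cache_time))
    peer.get("test", 0, lambda: db["roles"])  # peer caches the roles at t=0
    db["roles"] = ("Member",)                 # role removed via another unit
    t = 0
    while "Admin" in peer.get("test", t, lambda: db["roles"]):
        t += 1
    return t


print(seconds_stale(None))  # [role] cache_time unset: inherits the global 600s
print(seconds_stale(30))    # hypothetical lower [role] cache_time
```

With the role cache time unset, the peer stays stale for the full global window; a lower per-section value shortens that window without touching the global setting, which is the trade-off argued for here.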

Changed in charm-keystone:
status: In Progress → New
assignee: Nicolas Bock (nicolasbock) → nobody
Edward Hope-Morley (hopem) wrote :

Fix goes directly into keystone charm so no charm-helpers patch required.

Changed in charm-helpers:
status: New → Invalid
Changed in charm-keystone:
status: New → In Progress
Changed in charm-keystone:
assignee: nobody → Edward Hope-Morley (hopem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (master)
Ghadi Rahme (ghadi-rahme) wrote :

Also, after doing some research, it seems that the way memcached is implemented within keystone is neither the recommended way nor the way it should be used. In its current form, if you have keystone in HA you also have a memcached for every instance of keystone. But memcached does not replicate between multiple instances, and caching services should not be expected to have replication either; that is the role of the database.

There is more information from the memcached documentation warning about replication in memcached:
https://github.com/memcached/memcached/wiki/ProgrammingFAQ#how-do-you-handle-replication

And also for failover:
https://github.com/memcached/memcached/wiki/ProgrammingFAQ#how-do-you-handle-failover

The documentation also warns about scaling out beyond one instance of memcached (albeit in this case the example given was for using it as a queue but the same principle still applies in this scenario when reading from memcached):
https://github.com/memcached/memcached/wiki/ProgrammingFAQ#why-cant-we-use-memcached-as-a-queue-server

The way memcached should be implemented is to only have one instance of memcached for all the keystone instances instead of having one memcached instance for each.
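As a rough illustration of that single-instance layout, the sketch below hands the same cache object to every unit. `SharedCacheUnit` and the dicts standing in for the database and the one memcached instance are assumptions for illustration only, and the model says nothing about failover or about the load on a single cache server.

```python
class SharedCacheUnit:
    """Toy keystone unit that reads and invalidates one shared cache."""

    def __init__(self, backend, cache):
        self.backend = backend  # shared database (a dict of sets here)
        self.cache = cache      # the same object is passed to every unit

    def get_roles(self, user):
        if user not in self.cache:
            self.cache[user] = frozenset(self.backend.get(user, ()))
        return self.cache[user]

    def remove_role(self, user, role):
        self.backend.get(user, set()).discard(role)
        self.cache.pop(user, None)  # invalidation is visible to all units


shared_database = {"test": {"Admin", "Member"}}
shared_cache = {}
shared_units = [SharedCacheUnit(shared_database, shared_cache) for _ in range(3)]

# Every unit reads once, so the shared entry is populated.
for unit in shared_units:
    unit.get_roles("test")

# One unit processes the removal and invalidates the shared entry.
shared_units[0].remove_role("test", "Admin")

shared_answers = [sorted(unit.get_roles("test")) for unit in shared_units]
print(shared_answers)
```

Because the invalidation lands in the one cache that all units consult, every unit returns the updated role set in this model.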

Felipe Reyes (freyes) wrote :

The charm-guide needs to be updated with a new entry in the release notes for 2023.2 about this new config option.

Changed in charm-guide:
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (master)

Reviewed: https://review.opendev.org/c/openstack/charm-keystone/+/885465
Committed: https://opendev.org/openstack/charm-keystone/commit/0cb787bb9d2e8a5c87821646f2387ae1f2dcd8a0
Submitter: "Zuul (22348)"
Branch: master

commit 0cb787bb9d2e8a5c87821646f2387ae1f2dcd8a0
Author: Edward Hope-Morley <email address hidden>
Date: Wed Jun 7 14:14:14 2023 +0100

    Make role-cache-expiration configurable

    We use a default expiration_time (dogpile-expiration-time)
    of 600s which means that role assignments will take up to
    this amount of time before all caches are updated to
    reflect changes. This may not be suitable for some clouds
    that make frequent changes to role assignments and lowering
    the global value is not recommended so this overrides the
    [role] cache_time to a more appropriate value and also
    makes it configurable. We leave default value as None so
    that the global value is still inherited but this at least
    allows it to be customised.

    Change-Id: I49e46e010c543f831959581b2122f59068f2c07b
    Closes-Bug: #1771114

Changed in charm-keystone:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/charm-keystone/+/885991

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-guide (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/charm-guide/+/886106

Changed in charm-guide:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-guide (master)

Reviewed: https://review.opendev.org/c/openstack/charm-guide/+/886106
Committed: https://opendev.org/openstack/charm-guide/commit/7c6618e69a2cfcd63450e5c264d4371d600e2675
Submitter: "Zuul (22348)"
Branch: master

commit 7c6618e69a2cfcd63450e5c264d4371d600e2675
Author: Edward Hope-Morley <email address hidden>
Date: Wed Jun 14 16:58:28 2023 +0100

    Add release note for new keystone role cache config

    Change-Id: I7c1f1e16ee1ac41318f16bac7ba8a134a200c003
    Closes-Bug: #1771114

Changed in charm-guide:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/charm-keystone/+/885991
Committed: https://opendev.org/openstack/charm-keystone/commit/64e5347b4b4ba2ad3b8fa53dbc7fa6dae8c42880
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 64e5347b4b4ba2ad3b8fa53dbc7fa6dae8c42880
Author: Edward Hope-Morley <email address hidden>
Date: Wed Jun 7 14:14:14 2023 +0100

    Make role-cache-expiration configurable

    Change-Id: I49e46e010c543f831959581b2122f59068f2c07b
    Closes-Bug: #1771114
    (cherry picked from commit 0cb787bb9d2e8a5c87821646f2387ae1f2dcd8a0)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/charm-keystone/+/886363

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/charm-keystone/+/886363
Committed: https://opendev.org/openstack/charm-keystone/commit/afdb29fb3f539a0bc3e628484e35669a43cd93e7
Submitter: "Zuul (22348)"
Branch: stable/zed

commit afdb29fb3f539a0bc3e628484e35669a43cd93e7
Author: Edward Hope-Morley <email address hidden>
Date: Wed Jun 7 14:14:14 2023 +0100

    Make role-cache-expiration configurable

    Change-Id: I49e46e010c543f831959581b2122f59068f2c07b
    Closes-Bug: #1771114
    (cherry picked from commit 0cb787bb9d2e8a5c87821646f2387ae1f2dcd8a0)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/charm-keystone/+/886494

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/charm-keystone/+/886494
Committed: https://opendev.org/openstack/charm-keystone/commit/74fe8858a5f19f8bf49bbd451fdec9feaa2283b9
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 74fe8858a5f19f8bf49bbd451fdec9feaa2283b9
Author: Edward Hope-Morley <email address hidden>
Date: Wed Jun 7 14:14:14 2023 +0100

    Make role-cache-expiration configurable

    Change-Id: I49e46e010c543f831959581b2122f59068f2c07b
    Closes-Bug: #1771114
    (cherry picked from commit 0cb787bb9d2e8a5c87821646f2387ae1f2dcd8a0)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-keystone (master)

Change abandoned by "Edward Hope-Morley <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-keystone/+/777197
Reason: For the reasons explained in https://bugs.launchpad.net/charm-keystone/+bug/1771114 this is no longer expected to be the way to resolve the problem and a different approach was merged. This patchset has also not had any updates in over a year so I will go ahead and abandon this patch.
