Bug #1809469 “keystone_fernet incorrectly calculates rotation sc...” : Series pike : Bugs : kolla-ansible

John Garbutt (johngarbutt) wrote on 2019-05-09:

#1

So I believe the tokens Keystone hands out last 1 hour (not sure on that), and with three controllers the default behaviour is to rotate every 8 hours:

ssh ctrl1 sudo cat /etc/kolla/keystone-fernet/crontab
0 0 * * * /usr/bin/fernet-rotate.sh
ssh ctrl2 sudo cat /etc/kolla/keystone-fernet/crontab
0 8 * * * /usr/bin/fernet-rotate.sh
ssh ctrl3 sudo cat /etc/kolla/keystone-fernet/crontab
0 16 * * * /usr/bin/fernet-rotate.sh

For each of these, fernet-rotate is giving you roughly the behaviour noted here:
https://docs.openstack.org/keystone/pike/admin/identity-fernet-token-faq.html#how-should-i-approach-key-distribution

The logs show the correct things happening:
May 9th 2019, 09:00:02.000 INFO ctrl2 keystone Excess key to purge: /etc/keystone/fernet-keys/139
May 9th 2019, 01:00:02.000 INFO ctrl1 keystone Excess key to purge: /etc/keystone/fernet-keys/138
May 8th 2019, 17:00:03.000 INFO ctrl3 keystone Excess key to purge: /etc/keystone/fernet-keys/137

However, we still see these logs from keystone:
May 9th 2019, 09:06:34.000 WARNING ctrl1 keystone This is not a recognized Fernet token <snip> TokenNotFound

This suggests that, after the rotations above, some clients still hold a token they believe is valid, but Keystone no longer recognizes it.

Possibly we need to set keystone CONF.fernet_tokens.max_active_keys?

cfg.IntOpt(
    'max_active_keys',
    default=3,
    min=1,
    help=utils.fmt("""
This controls how many keys are held in rotation by `keystone-manage
fernet_rotate` before they are discarded. The default value of 3 means that
keystone will maintain one staged key (always index 0), one primary key (the
highest numerical index), and one secondary key (every other index). Increasing
this value means that additional secondary keys will be kept in the rotation.
"""))

John Garbutt (johngarbutt) wrote on 2019-05-09:

#2

so token timeout is 1 day...

[token]
revoke_by_id = False
provider = fernet
expiration = 86400

but we have already rotated out too many keys by then...

we need to update max_active_keys to match the number of controllers.
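
To see why keys disappear too early, here is a minimal sketch of the rotation behaviour described in the `max_active_keys` help text quoted in comment #1. It is a simplified model, not keystone's code: the function names are made up for illustration, and it assumes each rotation promotes the staged key (index 0) to a new highest index and then purges the oldest secondary keys until only `max_active_keys` remain.

```python
def rotate(keys, max_active_keys):
    """Model one fernet_rotate run on a list of key indices.

    Simplified model (an assumption, not keystone's code): index 0 is the
    staged key, the highest index is the primary, the rest are secondary.
    Returns (new_keys, purged_keys).
    """
    new_primary = max(keys) + 1
    # The staged key is promoted to primary; a fresh staged key 0 replaces it.
    active = sorted(k for k in keys if k != 0) + [new_primary]
    purged = []
    # Keep at most max_active_keys keys in total, counting the staged key.
    while len(active) > max_active_keys - 1:
        purged.append(active.pop(0))
    return [0] + active, purged


def rotations_survived(max_active_keys):
    """Count rotations until the key that starts as primary is purged."""
    if max_active_keys < 2:
        raise ValueError("this model needs at least a staged and a primary key")
    keys = list(range(max_active_keys))  # 0 is staged, highest is primary
    watched = max(keys)
    rotations = 0
    while watched in keys:
        keys, _ = rotate(keys, max_active_keys)
        rotations += 1
    return rotations


print(rotate([0, 1, 2], 3))      # ([0, 2, 3], [1])
print(rotations_survived(4))     # 3
```

In this model a primary key is purged on rotation number `max_active_keys - 1`. With 4 keys and a rotation every 8 hours, that is about 24 hours after the key became primary, so a token issued late in that key's primary window can become unreadable before its one-day expiration.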

John Garbutt (johngarbutt) wrote on 2019-05-09:

#3

So looks like this is incorrect:
https://github.com/openstack/kolla-ansible/blob/2dd69e9140b1ce1bd248c5c09217fb3a6502a9fc/ansible/roles/keystone/templates/keystone.conf.j2#L37

I think we want +2 there

John Garbutt (johngarbutt) wrote on 2019-05-09:

#4

Actually it is more complicated, due to:

# This controls the number of seconds that a token can be retrieved for beyond
# the built-in expiry time. This allows long running operations to succeed.
# Defaults to two days. (integer value)
#allow_expired_window = 172800

So we have three days of needing to read the tokens.

In that time we have 9 key rotations with three controllers, plus we want a staging key out there, plus one for wiggle room.
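
As a rough check of that arithmetic, assuming the 8-hour rotation seen in the crontabs in comment #1. The function name and the choice to round a partial rotation up are assumptions made for this sketch.

```python
import math


def required_max_active_keys(token_expiration, allow_expired_window,
                             rotation_interval):
    """Keys needed so a token stays readable for its whole lifetime.

    All arguments are in seconds. One key per rotation that can happen while
    a token may still be read, plus the staged key, plus one spare.
    """
    readable_for = token_expiration + allow_expired_window
    rotations = math.ceil(readable_for / rotation_interval)
    return rotations + 2


# expiration = 86400, allow_expired_window = 172800, one rotation every 8 hours
print(required_max_active_keys(86400, 172800, 8 * 60 * 60))  # 11
```

So with the current schedule the three-controller deployment would need 11 keys: 9 for the rotations, one staged, and one spare.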

Mark Goddard (mgoddard) on 2019-05-15

Changed in kolla-ansible:
importance:	Undecided → High

OpenStack Infra (hudson-openstack) wrote on 2019-05-16: Change abandoned on kolla-ansible (master)

#5

Change abandoned by John Garbutt (<email address hidden>) on branch: master
Review: https://review.opendev.org/657967

OpenStack Infra (hudson-openstack) wrote on 2019-05-16: Related fix proposed to kolla-ansible (master)

#6

Related fix proposed to branch: master
Review: https://review.opendev.org/659619

Mark Goddard (mgoddard) on 2019-05-28

summary:

- keystone_fernet container runs token rotate on multiple hosts
+ keystone_fernet incorrectly calculates rotation schedule

OpenStack Infra (hudson-openstack) wrote on 2019-06-06: Related fix merged to kolla-ansible (master)

#7

Reviewed: https://review.opendev.org/659619
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=25ac955a4e2645da29f8c7b807f0bac5afb43838
Submitter: Zuul
Branch: master

commit 25ac955a4e2645da29f8c7b807f0bac5afb43838
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469

Changed in kolla-ansible:
status:	New → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2019-06-06: Fix merged to kolla-ansible (master)

#8

Reviewed:  https://review.opendev.org/659293
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=6c1442c385450004dd253f3f464fe4336194be99
Submitter: Zuul
Branch:    master

commit 6c1442c385450004dd253f3f464fe4336194be99
Author: Mark Goddard <mark@stackhpc.com>
Date:   Thu May 16 17:26:45 2019 +0100

Fix keystone fernet key rotation scheduling
    
    Right now every controller rotates fernet keys. This is nice because
    should any controller die, we know the remaining ones will rotate the
    keys. However, we are currently over-rotating the keys.
    
    When we over-rotate keys, we get logs like this:
    
     This is not a recognized Fernet token <token> TokenNotFound
    
    Most clients can recover and get a new token, but some clients (like
    Nova passing tokens to other services) can't do that because they don't
    have the password needed to generate a new token.
    
    With three controllers, the keystone-fernet crontab shows the once-a-day
    rotation correctly staggered across the three controllers:
    
    ssh ctrl1 sudo cat /etc/kolla/keystone-fernet/crontab
    0 0 * * * /usr/bin/fernet-rotate.sh
    ssh ctrl2 sudo cat /etc/kolla/keystone-fernet/crontab
    0 8 * * * /usr/bin/fernet-rotate.sh
    ssh ctrl3 sudo cat /etc/kolla/keystone-fernet/crontab
    0 16 * * * /usr/bin/fernet-rotate.sh
    
    Currently with three controllers we have this keystone config:
    
    [token]
    expiration = 86400 (although, keystone default is one hour)
    allow_expired_window = 172800 (this is the keystone default)
    
    [fernet_tokens]
    max_active_keys = 4
    
    Currently, kolla-ansible configures key rotation according to the following:
    
       rotation_interval = token_expiration / num_hosts
    
    This means we rotate keys more quickly the more hosts we have, which doesn't
    make much sense.
    
    Keystone docs state:
    
       max_active_keys =
         ((token_expiration + allow_expired_window) / rotation_interval) + 2
    
    For details see:
    https://docs.openstack.org/keystone/stein/admin/fernet-token-faq.html
    
    Rotation is based on pushing out a staging key, so that if any server
    starts using that key, the other servers will already consider it valid.
    Then each server in turn starts using the staging key, demoting the
    existing primary key to a secondary key. Eventually the secondary keys
    are pruned, once no token in the wild would need to be decrypted with
    them. So this all makes sense.
    
    This change adds new variables for fernet_token_allow_expired_window and
    fernet_key_rotation_interval, so that we can calculate the correct
    number of active keys. We now set the default rotation interval so as
    to minimise the number of active keys to 3 - one primary, one
    secondary, one buffer.
    
    This change also fixes the fernet cron job generator, which was broken
    in the following cases:
    
    * requesting an interval of more than 1 day resulted in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, resulted in no jobs
    
    It should now be possible to request any interval up to a week divided
    by the number of hosts.
    
    Change-Id: I10c82dc5f83653beb60ddb86d558c5602153341a
    Closes-Bug: #1809469
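
One way to picture the scheduling described in this commit message is the sketch below. It is not the kolla-ansible generator script. The function name, the use of the cron day-of-week field, and the choice to skip a slot at the end of the week rather than rotate early are all assumptions made for illustration. Each host rotates once every `interval * hosts` minutes, offset by `interval * host_index`, so that across all hosts one rotation happens per interval.

```python
WEEK_MINUTES = 7 * 24 * 60
DAY_MINUTES = 24 * 60


def rotation_cron_entries(host_index, total_hosts, interval_minutes,
                          command="/usr/bin/fernet-rotate.sh"):
    """Return crontab lines for one host (hypothetical helper).

    The schedule repeats weekly, so a host's own period must fit in a week.
    """
    if not 0 <= host_index < total_hosts:
        raise ValueError("host_index must be in range(total_hosts)")
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    host_period = interval_minutes * total_hosts
    if host_period > WEEK_MINUTES:
        raise ValueError("interval exceeds one week divided by the host count")

    offset = interval_minutes * host_index
    entries = []
    t = offset
    # Stop early enough that the gap to next week's first slot is never
    # shorter than host_period: under-rotating is safer than over-rotating.
    while t + host_period <= offset + WEEK_MINUTES:
        day, rest = divmod(t, DAY_MINUTES)
        hour, minute = divmod(rest, 60)
        entries.append(f"{minute} {hour} * * {day} {command}")
        t += host_period
    return entries


# Three controllers, one rotation every 8 hours overall.
for line in rotation_cron_entries(1, 3, 8 * 60)[:2]:
    print(line)
# 0 8 * * 0 /usr/bin/fernet-rotate.sh
# 0 8 * * 1 /usr/bin/fernet-rotate.sh
```

With three hosts and an 8-hour interval this yields one entry per day at 00:00, 08:00 and 16:00 on the three hosts, matching the crontabs shown above. Intervals longer than a day or not a multiple of 60 minutes also produce entries, since every slot is written out explicitly.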

OpenStack Infra (hudson-openstack) wrote on 2019-06-18: Related fix proposed to kolla-ansible (stable/stein)

#9

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/666086

OpenStack Infra (hudson-openstack) wrote on 2019-06-18: Related fix proposed to kolla-ansible (stable/rocky)

#10

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/666087

OpenStack Infra (hudson-openstack) wrote on 2019-06-18: Related fix proposed to kolla-ansible (stable/queens)

#11

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/666088

OpenStack Infra (hudson-openstack) wrote on 2019-06-18: Fix proposed to kolla-ansible (stable/stein)

#12

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/666090

OpenStack Infra (hudson-openstack) wrote on 2019-06-18: Fix proposed to kolla-ansible (stable/rocky)

#13

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/666093

OpenStack Infra (hudson-openstack) wrote on 2019-06-18: Fix proposed to kolla-ansible (stable/queens)

#14

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/666095

OpenStack Infra (hudson-openstack) wrote on 2019-06-20: Related fix merged to kolla-ansible (stable/stein)

#15

Reviewed: https://review.opendev.org/666086
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=1a6e9f7e927ebb3c2f021befc3630f4279dbceb1
Submitter: Zuul
Branch: stable/stein

commit 1a6e9f7e927ebb3c2f021befc3630f4279dbceb1
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469
    (cherry picked from commit 25ac955a4e2645da29f8c7b807f0bac5afb43838)

tags:

added: in-stable-stein

OpenStack Infra (hudson-openstack) wrote on 2019-06-20: Fix merged to kolla-ansible (stable/stein)

#16

Reviewed:  https://review.opendev.org/666090
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8e627c1eef5e7ef047cd3860d162a0e2a800e5ab
Submitter: Zuul
Branch:    stable/stein

commit 8e627c1eef5e7ef047cd3860d162a0e2a800e5ab
Author: Mark Goddard <mark@stackhpc.com>
Date:   Thu May 16 17:26:45 2019 +0100

Fix keystone fernet key rotation scheduling
    
    Change-Id: I10c82dc5f83653beb60ddb86d558c5602153341a
    Closes-Bug: #1809469
    (cherry picked from commit 6c1442c385450004dd253f3f464fe4336194be99)

OpenStack Infra (hudson-openstack) wrote on 2019-06-24: Related fix merged to kolla-ansible (stable/rocky)

#17

Reviewed: https://review.opendev.org/666087
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=c3e5ab0dc3b87c6ddae78a8f29d268ebe840638d
Submitter: Zuul
Branch: stable/rocky

commit c3e5ab0dc3b87c6ddae78a8f29d268ebe840638d
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469
    (cherry picked from commit 25ac955a4e2645da29f8c7b807f0bac5afb43838)

tags:

added: in-stable-rocky

OpenStack Infra (hudson-openstack) wrote on 2019-06-24: Fix merged to kolla-ansible (stable/rocky)

#18

Reviewed:  https://review.opendev.org/666093
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=d66e95d1d96c9f5aee52740df53d6e784c7b8194
Submitter: Zuul
Branch:    stable/rocky

commit d66e95d1d96c9f5aee52740df53d6e784c7b8194
Author: Mark Goddard <mark@stackhpc.com>
Date:   Thu May 16 17:26:45 2019 +0100

Fix keystone fernet key rotation scheduling
    
    Change-Id: I10c82dc5f83653beb60ddb86d558c5602153341a
    Closes-Bug: #1809469
    (cherry picked from commit 6c1442c385450004dd253f3f464fe4336194be99)

tags:

added: in-stable-queens

OpenStack Infra (hudson-openstack) wrote on 2019-06-24: Related fix merged to kolla-ansible (stable/queens)

#19

Reviewed: https://review.opendev.org/666088
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=ec2aa48c1713187dcb4ebfc836e45b8cfe5329c4
Submitter: Zuul
Branch: stable/queens

commit ec2aa48c1713187dcb4ebfc836e45b8cfe5329c4
Author: Mark Goddard <email address hidden>
Date: Thu May 16 14:01:39 2019 +0100

Add unit test for keystone fernet cron generator

    Before making changes to this script, document its behaviour with a unit
    test.

    There are two major issues:

    * requesting an interval of more than 1 day results in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, results in no jobs

    Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
    Related-Bug: #1809469
    (cherry picked from commit 25ac955a4e2645da29f8c7b807f0bac5afb43838)

OpenStack Infra (hudson-openstack) wrote on 2019-06-24: Fix merged to kolla-ansible (stable/queens)

#20

Reviewed:  https://review.opendev.org/666095
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=d5cef35a7fbb8349ed6cbd5862b24caed52ff7a4
Submitter: Zuul
Branch:    stable/queens

commit d5cef35a7fbb8349ed6cbd5862b24caed52ff7a4
Author: Mark Goddard <mark@stackhpc.com>
Date:   Thu May 16 17:26:45 2019 +0100

    Fix keystone fernet key rotation scheduling
    
    Right now every controller rotates fernet keys. This is nice because
    should any controller die, we know the remaining ones will rotate the
    keys. However, we are currently over-rotating the keys.
    
    When we over-rotate keys, we get logs like this:
    
     This is not a recognized Fernet token <token> TokenNotFound
    
    Most clients can recover and get a new token, but some clients (like
    Nova passing tokens to other services) can't do that because they don't
    have the password needed to generate a new token.
    
    With three controllers, the keystone-fernet crontab shows the once-a-day
    rotation correctly staggered across the three controllers:
    
    ssh ctrl1 sudo cat /etc/kolla/keystone-fernet/crontab
    0 0 * * * /usr/bin/fernet-rotate.sh
    ssh ctrl2 sudo cat /etc/kolla/keystone-fernet/crontab
    0 8 * * * /usr/bin/fernet-rotate.sh
    ssh ctrl3 sudo cat /etc/kolla/keystone-fernet/crontab
    0 16 * * * /usr/bin/fernet-rotate.sh
    
    Currently with three controllers we have this keystone config:
    
    [token]
    expiration = 86400 (although, keystone default is one hour)
    allow_expired_window = 172800 (this is the keystone default)
    
    [fernet_tokens]
    max_active_keys = 4
    
    Currently, kolla-ansible configures key rotation according to the following:
    
       rotation_interval = token_expiration / num_hosts
    
    This means we rotate keys more quickly the more hosts we have, which doesn't
    make much sense.
    
    Keystone docs state:
    
       max_active_keys =
         ((token_expiration + allow_expired_window) / rotation_interval) + 2
    
    For details see:
    https://docs.openstack.org/keystone/stein/admin/fernet-token-faq.html
    
    Rotation is based on pushing out a staging key, so that should any server
    start using that key, the other servers will consider it valid. Then each
    server in turn starts using the staging key, each in turn demoting the
    existing primary key to a secondary key. Eventually you prune the
    secondary keys when there is no token in the wild that would need to be
    decrypted using that key. So this all makes sense.
    
    This change adds new variables for fernet_token_allow_expired_window and
    fernet_key_rotation_interval, so that we can calculate the correct
    number of active keys. We now set the default rotation interval so as
    to minimise the number of active keys to 3 - one primary, one
    secondary, one buffer.
    
    This change also fixes the fernet cron job generator, which was broken
    in the following cases:
    
    * requesting an interval of more than 1 day resulted in no jobs
    * requesting an interval of more than 60 minutes, unless an exact
      multiple of 60 minutes, resulted in no jobs
    
    It should now be possible to request any interval up to a week divided
    by the number of hosts.
    
    Change-Id: I10c82dc5f83653beb60ddb86d558c5602153341a
    Closes-Bug: #1809469
    (cherry picked from commit 6c1442c385450004dd253f3f464fe4336194be99)
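
To see how a rotation interval that yields only 3 active keys follows from the formula in the commit message, the sketch below solves that formula for the interval. The function names are hypothetical, the input values are the example ones from the commit message (24 hour expiration, 48 hour expired window), and the exact default expression used by kolla-ansible is not shown in this thread, so treat this as a check of the arithmetic rather than of the implementation.

```python
import math


def interval_for_target_keys(token_expiration, allow_expired_window, target_keys=3):
    """Solve max_active_keys = (expiration + window) / interval + 2 for interval."""
    if target_keys < 3:
        raise ValueError("the formula always needs at least 3 keys")
    return math.ceil((token_expiration + allow_expired_window) / (target_keys - 2))


def max_active_keys(token_expiration, allow_expired_window, interval):
    """Formula quoted from the keystone docs; rounding up is an assumption."""
    return math.ceil((token_expiration + allow_expired_window) / interval) + 2


token_expiration = 86400       # seconds, example value from the commit message
allow_expired_window = 172800  # seconds, example value from the commit message

interval = interval_for_target_keys(token_expiration, allow_expired_window)
keys = max_active_keys(token_expiration, allow_expired_window, interval)
print(interval, interval / 86400, keys)  # 259200 3.0 3
```

Under these assumptions, rotating once every 3 days (expiration plus expired window) is the shortest interval that needs only 3 keys: one primary, one secondary and one staged.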

OpenStack Infra (hudson-openstack) wrote on 2019-07-18: Fix included in openstack/kolla-ansible 8.0.0.0rc2

#21

This issue was fixed in the openstack/kolla-ansible 8.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote on 2019-09-09: Fix included in openstack/kolla-ansible 6.2.2

#22

This issue was fixed in the openstack/kolla-ansible 6.2.2 release.

OpenStack Infra (hudson-openstack) wrote on 2019-09-09: Fix included in openstack/kolla-ansible 7.1.2

#23

This issue was fixed in the openstack/kolla-ansible 7.1.2 release.

OpenStack Infra (hudson-openstack) wrote on 2019-11-11: Fix included in openstack/kolla-ansible 9.0.0.0rc1

#24

This issue was fixed in the openstack/kolla-ansible 9.0.0.0rc1 release candidate.

Affects        Status        Importance  Assigned to   Milestone
kolla-ansible  Fix Released  High        Unassigned    kolla-ansible 8.0.0 "Stein"
Pike           New           High        Unassigned    kolla-ansible 5.0.6 "pike"
Queens         Fix Released  High        Mark Goddard  kolla-ansible 6.2.1 "queens"
Rocky          Fix Released  High        Mark Goddard  kolla-ansible 7.1.1 "rocky"
Stein          Fix Released  High        Mark Goddard  kolla-ansible 8.0.0 "Stein"
