Monitor expiration of OVN certs

Bug #1979539 reported by Giuseppe Petralia
This bug affects 7 people
Affects                                 Status         Importance  Assigned to         Milestone
OpenStack Neutron API OVN Plugin Charm  New            Undecided   Unassigned
charm-openstack-service-checks          Invalid        High        Unassigned
charm-ovn-central                       Fix Committed  Undecided   Edward Hope-Morley
  22.09                                 In Progress    Undecided   Unassigned
  23.03                                 In Progress    Undecided   Unassigned
charm-ovn-chassis                       In Progress    Undecided   Edward Hope-Morley
vault-charm                             New            Undecided   Unassigned

Bug Description

There is currently no monitoring for the expiry of certificates used by ovn-chassis, ovn-central and neutron-api-plugin-ovn:

* /etc/ovn/cert_host
* /etc/neutron/plugins/ml2/cert_host

If these certificates are left to expire, neutron-server can't talk to the OVN NB database and the Neutron API becomes unreachable. The NRPE relation could be extended to issue a warning or a critical alert based on configurable thresholds, which could default to 30 and 15 days respectively.
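
A minimal sketch of such a threshold check is shown here, assuming the expiry date is obtained from the `openssl x509 -enddate` command. The function names and the message format are illustrative only and are not taken from any charm; the 30 and 15 day defaults are the ones proposed above.

```python
import subprocess
from datetime import datetime, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_not_after(enddate_line):
    """Parse a line such as 'notAfter=Jun 23 13:07:00 2024 GMT'.

    Month names are parsed with %b, so this assumes a C/English locale.
    """
    value = enddate_line.strip().split("=", 1)[1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def read_not_after(cert_path):
    """Ask the openssl CLI for the expiry date of a PEM certificate."""
    out = subprocess.check_output(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        text=True,
    )
    return parse_not_after(out)


def evaluate(not_after, now, warn_days=30, crit_days=15):
    """Map the remaining lifetime to a Nagios status code and message."""
    days_left = (not_after - now).days
    if days_left <= crit_days:
        return CRITICAL, "CRITICAL: certificate expires in %d days" % days_left
    if days_left <= warn_days:
        return WARNING, "WARNING: certificate expires in %d days" % days_left
    return OK, "OK: certificate expires in %d days" % days_left
```

An already expired certificate yields a negative number of days and is therefore reported as critical. Reading `/etc/ovn/cert_host` with `read_not_after` requires sufficient privileges to open the file.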

Tags: bseng-1277
Frode Nordahl (fnordahl) wrote :

Hello, Giuseppe, and thank you for your bug report.

Providing a way to easily monitor certificate lifetime in a system that makes use of PKI for authentication and authorization is indeed important.

The OVN charms make use of Juju to integrate with an application providing the Certificate Authority and automation for Certificate Issuance, for example Vault.

Would it perhaps make sense to implement monitoring of lifetime of issued certificates in the charm/application responsible for managing the certificates instead?

Giuseppe Petralia (peppepetra) wrote :

Hi Frode, thanks for your reply.

If I read correctly, you are suggesting implementing the check in the application that provides the certs, which in our OVN deployments is always Vault.

I see some pros and cons of this choice:

Pros:
* We get out-of-the-box monitoring for all certs provided by Vault, i.e. OpenStack API certs, Octavia certs, OVN certs etc.

Cons:
* We are only monitoring the certs that are in Vault. If the distribution of these certs fails (see LP#1940549), our monitoring will miss it: it will consider all certs renewed, while the certs on the ovn-chassis or ovn-central units are still the old ones because Vault failed to provide the updated ones.

To be on the safe side, I think we should always monitor what is actually being used by the applications, so I would like to see these checks in the OVN charms.

Andrea Ieri (aieri)
Changed in charm-openstack-service-checks:
status: New → Triaged
importance: Undecided → High
tags: added: bseng-1277
Andrea Ieri (aieri)
summary: - Add Nrpe check for monitoring expiration of certs
+ Monitor expiration of OVN certs
Changed in charm-ovn-central:
assignee: nobody → Edward Hope-Morley (hopem)
Edward Hope-Morley (hopem) wrote :
Changed in charm-ovn-central:
status: New → In Progress
Edward Hope-Morley (hopem) wrote :

Having looked into this a bit further, I think it might not actually be possible. NRPE checks run as the nagios user, and the OVN certs have the following permissions:

# ll /etc/ovn/
total 24
dr-xr-xr-x 2 root root 4096 Jun 23 13:07 ./
drwxr-xr-x 101 root root 4096 Jul 3 12:44 ../
-rw-r----- 1 root root 1532 Jun 23 13:07 cert_host
-rw-r----- 1 root root 1674 Jun 23 13:07 key_host
-rw-r--r-- 1 root root 1244 Jul 3 10:50 ovn-central.crt
-rw-r----- 1 root root 211 Jun 23 13:07 ovn-northd-db-params.conf

So an NRPE check cannot read the cert file unless its permissions are opened up, which I assume is not what anyone wants to do.
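
As an illustration of why the read fails, here is a small sketch of the classic Unix owner/group/other permission check applied to the modes in the listing above. It ignores ACLs and capabilities, and the uid and gid used for the nagios user in the usage note are hypothetical values.

```python
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a process with the given uid/gids may read the file.

    Follows the classic Unix rule: exactly one of the owner, group or
    other permission classes applies, in that order. root bypasses it.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

For example, with `cert_host` at mode `0o640` owned by root:root, `can_read(0o640, 0, 0, 110, {118})` is `False` for a non-root user outside the root group, while `ovn-central.crt` at `0o644` gives `can_read(0o644, 0, 0, 110, {118})` as `True`.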

Edward Hope-Morley (hopem) wrote :

It has been brought to my attention that we do have some charms with NRPE checks for root-owned resources. They are implemented using a cron job that runs as root and performs the check itself, while the NRPE check only inspects the output of the cron job, e.g. rabbitmq-server:

https://github.com/openstack/charm-rabbitmq-server/blob/3c155e2bdaeaec5090111749a2cf366b55875575/hooks/rabbit_utils.py#L1630

Another option is to use the update-status hook, which runs by default every 5 minutes, to perform the check then store the output in a location that the nrpe check can read.
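
Both options share the same pattern: a privileged producer records the result, and the unprivileged NRPE check only reads it. A minimal sketch of that pattern follows, assuming a JSON state file; the function names, file format and the 600 second staleness limit are illustrative assumptions, not what the rabbitmq-server or OVN charms actually do.

```python
import json
import os
import tempfile
import time

UNKNOWN = 3  # Nagios exit code for "check could not be evaluated"


def write_state(path, code, message, now=None):
    """Producer side (root cron job or update-status hook)."""
    payload = {
        "code": code,
        "message": message,
        "timestamp": time.time() if now is None else now,
    }
    # Write to a temporary file and rename it, so that the reader never
    # sees a partially written state file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    os.chmod(tmp, 0o644)  # world-readable so the nagios user can read it
    os.replace(tmp, path)


def read_state(path, max_age=600, now=None):
    """Consumer side (NRPE check running as the nagios user)."""
    try:
        with open(path) as handle:
            payload = json.load(handle)
    except (OSError, ValueError):
        return UNKNOWN, "UNKNOWN: cannot read state file"
    age = (time.time() if now is None else now) - payload["timestamp"]
    if age > max_age:
        # The producer has stopped running; do not trust an old result.
        return UNKNOWN, "UNKNOWN: state is %d seconds old" % age
    return payload["code"], payload["message"]
```

The staleness check matters in this design: without it, a broken cron job would leave the last good result in place and the NRPE check would keep reporting it indefinitely.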

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-central (master)

Reviewed: https://review.opendev.org/c/x/charm-ovn-central/+/887411
Committed: https://opendev.org/x/charm-ovn-central/commit/e3502b2c9c5f65ee8c0f3ed1aff8ef66c94de2f1
Submitter: "Zuul (22348)"
Branch: master

commit e3502b2c9c5f65ee8c0f3ed1aff8ef66c94de2f1
Author: Edward Hope-Morley <email address hidden>
Date: Fri Jun 30 17:27:22 2023 +0100

    Add ovn cert nrpe check

    Certs are root readable so we use a cron job to perform
    the check and save state for an nrpe check to read and
    send back to nagios.

    Closes-Bug: #1979539
    Change-Id: I7c7cd238ddf3fd9f92bfa5879d19d78c091cf2ac

Changed in charm-ovn-central:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (stable/22.09)

Fix proposed to branch: stable/22.09
Review: https://review.opendev.org/c/x/charm-ovn-central/+/895021

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (stable/23.03)

Fix proposed to branch: stable/23.03
Review: https://review.opendev.org/c/x/charm-ovn-central/+/895022

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-chassis (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/896409

Changed in charm-ovn-chassis:
status: New → In Progress
Changed in charm-ovn-chassis:
assignee: nobody → Edward Hope-Morley (hopem)
Eric Chen (eric-chen) wrote :

We decided not to implement this in charm-openstack-service-checks, so I am marking it as Invalid.

Changed in charm-openstack-service-checks:
status: Triaged → Invalid