Openstack Integrator should have nrpe checks that monitor status of openstack components supporting k8s workloads
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Openstack Integrator Charm | Won't Fix | Wishlist | Unassigned | |
charm-openstack-service-checks | Fix Released | Medium | Robert Gildein | |
Bug Description
As mentioned in lp#1853668, issues on the backend of the OpenStack underlay can cause odd or failing service access for Kubernetes workloads.
The openstack integrator charm should have monitoring hooks added for nrpe-external-
For instance, if there is a loadbalancer running in support of a service endpoint, its status and the statuses of its loadbalancer pool members should be monitored and reported up to Kubernetes and/or nagios in some way that can be exposed to operators of multi-tiered clouds.
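To make the request concrete, here is a minimal sketch of what such a check could compute. It is not taken from either charm. The function name `check_loadbalancer` and the dict layout are hypothetical, the status field names and values follow the Octavia API as I understand it, and the mapping of each state to a nagios severity is my own assumption.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_loadbalancer(lb, members):
    """Return (exit_code, message) for one load balancer and its pool members.

    `lb` and each entry of `members` are plain dicts (hypothetical layout)
    carrying "name" plus Octavia-style status fields.
    """
    code = OK
    problems = []

    # Assumption: a load balancer that is not provisioned is the worst case.
    if lb.get("provisioning_status") != "ACTIVE":
        code = CRITICAL
        problems.append("loadbalancer {} provisioning_status is {}".format(
            lb.get("name"), lb.get("provisioning_status")))

    # Assumption: a non-ONLINE operating status is reported as a warning.
    if lb.get("operating_status") != "ONLINE":
        code = max(code, WARNING)
        problems.append("loadbalancer {} operating_status is {}".format(
            lb.get("name"), lb.get("operating_status")))

    for member in members:
        # Assumption: NO_MONITOR (no health monitor configured) is not flagged.
        if member.get("operating_status") not in ("ONLINE", "NO_MONITOR"):
            code = max(code, WARNING)
            problems.append("member {} operating_status is {}".format(
                member.get("name"), member.get("operating_status")))

    if code == OK:
        return OK, "OK: loadbalancer {} and {} member(s) are healthy".format(
            lb.get("name"), len(members))
    prefix = "CRITICAL" if code == CRITICAL else "WARNING"
    return code, "{}: {}".format(prefix, "; ".join(problems))
```

A real plugin would fetch `lb` and `members` from the cloud, print the message, and exit with the returned code so that nrpe can report it.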
Changed in charm-openstack-integrator: | |
importance: | Undecided → Wishlist |
status: | New → Triaged |
Changed in charm-openstack-integrator: | |
assignee: | nobody → Robert Gildein (rgildein) |
status: | Triaged → In Progress |
Changed in charm-openstack-service-checks: | |
importance: | Undecided → Medium |
Changed in charm-openstack-service-checks: | |
status: | In Progress → Fix Committed |
Changed in charm-openstack-integrator: | |
status: | In Progress → Won't Fix |
Changed in charm-openstack-service-checks: | |
milestone: | none → 22.10 |
Changed in charm-openstack-service-checks: | |
status: | Fix Committed → Fix Released |
Changed in charm-openstack-service-checks: | |
status: | Fix Released → Fix Committed |
Changed in charm-openstack-service-checks: | |
status: | Fix Committed → Fix Released |
I think the best approach to fix this bug would be to add `layer:nagios` and create Python scripts in the templates folder. Each .py file will contain one of the checks.
Here is my idea of what the nrpe check for all OpenStack networks should look like.
When the `nrpe-external-master.available` flag exists, the `check_openstack_networks.py` file will be installed as a nagios plugin. This file checks all OpenStack networks to see if they are in the ACTIVE state. If a network is in the DOWN state, it raises a warning, and if another problem occurs (a problem parsing the networks from the OpenStack output, etc.), it raises a critical error.
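The decision logic of the networks check could be sketched roughly as follows. This is an illustration only, not the proposed `check_openstack_networks.py` itself. It assumes the networks arrive as a JSON list whose entries have `Name` and `Status` keys (a hypothetical layout), and it treats any state other than ACTIVE or DOWN as critical, which is my own reading of "another problem".

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_networks(raw_json):
    """Return (exit_code, message) for a JSON list of OpenStack networks.

    Assumes each entry has "Name" and "Status" keys (hypothetical layout).
    """
    try:
        networks = json.loads(raw_json)
        statuses = {str(net["Name"]): net["Status"] for net in networks}
    except (ValueError, KeyError, TypeError) as error:
        # Parsing problems are critical, as described in the comment above.
        return CRITICAL, "CRITICAL: could not parse networks: {!r}".format(error)

    down = sorted(name for name, status in statuses.items() if status == "DOWN")
    # Assumption: states other than ACTIVE or DOWN count as "another problem".
    other = sorted(name for name, status in statuses.items()
                   if status not in ("ACTIVE", "DOWN"))

    if other:
        return CRITICAL, "CRITICAL: networks in unexpected state: {}".format(
            ", ".join(other))
    if down:
        return WARNING, "WARNING: networks DOWN: {}".format(", ".join(down))
    return OK, "OK: {} network(s) ACTIVE".format(len(statuses))
```

The plugin itself would print the message and call `sys.exit` with the returned code, which is how nagios reads the result.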
After verifying the correctness of my approach, I will provide more information about other checks.