Requesting nrpe check for no active routers in HA neutron

Bug #1883959 reported by Adam Dyess
This bug affects 1 person
Affects: OpenStack Neutron Gateway Charm
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

I ran into an issue where I had 5 neutron-gateways in my model, yet none of them had an active l3-agent. This occurred because we were trying to work around a bug where neutron-l3-agents crash when they are told to join a rabbitmq-server unit that has not yet been clustered (https://bugs.launchpad.net/charm-rabbitmq-server/+bug/1796886). The workaround we applied was to prevent the l3-agents from talking to the rabbit cluster at all:

juju run -a neutron-gateway -- iptables -A OUTPUT -p tcp --dport 5672 -j DROP

While this was a clever way to keep the l3-agents from crashing, the workaround carried some risks:
1) The openstack service checks showed 'neutron agents dropped' while the rabbit port was blocked, even though the agents were still routing traffic. So we ignored this check.
2) When the l3-agents were restarted (possibly by relation changes from the new rabbitmq unit), none of the ha qrouters went active, and they had no IP addresses on their ha interfaces.

After the new rabbitmq-server unit had joined the cluster, we removed the rule with
juju run -a neutron-gateway -- iptables -D OUTPUT -p tcp --dport 5672 -j DROP
and L3 services were quickly restored.

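We could have spotted the L3 service outage earlier if there had been an NRPE check indicating that no active l3 services were available. This sounds similar to the openstack service checks, but it is slightly different in essence: those checks report agents that have dropped, whereas this check would report that no agent is actively serving the routers.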
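
As a rough illustration of the requested check (not an existing charm feature), here is a minimal Python sketch of a Nagios/NRPE-style plugin that goes CRITICAL when any HA router has no l3-agent in the active state. The function names are hypothetical. It assumes that the openstack CLI is installed with admin credentials in the environment, that `openstack router list -f json` exposes "ID" and "HA" columns, and that `openstack network agent list --router <id> --long -f json` exposes an "HA State" column; these column names should be verified against the deployed client version.

```python
import json
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_json(args):
    """Run an openstack CLI subcommand and return its parsed JSON output."""
    out = subprocess.check_output(["openstack"] + list(args) + ["-f", "json"])
    return json.loads(out)


def evaluate(agents_by_router):
    """Return (exit_code, message) for {router_id: [agent rows]}.

    Assumption: each agent row has an "HA State" key whose value is
    "active" for the agent currently serving the router.
    """
    if not agents_by_router:
        return OK, "OK: no HA routers found"
    inactive = sorted(
        router_id
        for router_id, agents in agents_by_router.items()
        if not any(
            str(agent.get("HA State", "")).lower() == "active"
            for agent in agents
        )
    )
    total = len(agents_by_router)
    if inactive:
        return CRITICAL, "CRITICAL: %d of %d HA routers have no active l3-agent: %s" % (
            len(inactive), total, ", ".join(inactive))
    return OK, "OK: all %d HA routers have an active l3-agent" % total


def main():
    """Query neutron through the CLI, print one status line, return the exit code."""
    try:
        # Assumption: the "HA" column is a JSON boolean (visible to admin users).
        routers = run_json(["router", "list"])
        ha_ids = [r["ID"] for r in routers if r.get("HA")]
        agents_by_router = {
            rid: run_json(["network", "agent", "list", "--router", rid, "--long"])
            for rid in ha_ids
        }
    except Exception as exc:  # CLI missing, auth failure, unexpected output
        print("UNKNOWN: could not query neutron: %s" % exc)
        return UNKNOWN
    code, message = evaluate(agents_by_router)
    print(message)
    return code
```

A script entry point would call `sys.exit(main())` so that NRPE sees the exit code. Keeping `evaluate` separate from the CLI calls means the decision logic can be exercised with sample data, without a live cloud. Since the check queries the neutron API rather than the agents' rabbit connection, it would be expected to keep reporting while port 5672 is blocked, but that has not been verified here.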

Liam Young (gnuoy)
Changed in charm-neutron-gateway:
status: New → Confirmed
importance: Undecided → Wishlist