I reviewed the code path and the upgrade in my reproducer. Upgrading neutron-gateway first
and neutron-api afterwards doesn't work: a mismatch in the migrations/RPC versions
causes the HA port to fail to be created/updated, so the keepalived process cannot be
spawned, and finally the state-change monitor fails to find the PID for that keepalived process.
If I upgrade neutron-api, run the migrations to head and then upgrade the gateways, everything seems correct.
I upgraded from the following versions:
root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep keepalived
ii keepalived 1:1.3.9-1ubuntu0.18.04.2 amd64 Failover and monitoring daemon for LVS clusters
root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep neutron-common
ii neutron-common 2:15.3.3-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - common
--> To
root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep neutron-common
ii neutron-common 2:16.3.2-0ubuntu3~cloud0 all Neutron is a virtual network service for Openstack - common
I created a router with HA enabled as follows:
$ openstack router list
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
| ID | Name | Status | State | Project | Distributed | HA |
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
| 09fa811f-410c-4360-8cae-687e7e73ff21 | provider-router | ACTIVE | UP | 6f5aaf5130764305a5d37862e3ff18ce | False | True |
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
===> Prior to the upgrade I can list the keepalived processes linked to the HA router
root 22999 0.0 0.0 91816 3052 ? Ss 19:17 0:00 keepalived -P -f /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21/keepalived.conf -p /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived -r /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived-vrrp -D
root 23001 0.0 0.1 92084 4088 ? S 19:17 0:00 keepalived -P -f /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21/keepalived.conf -p /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived -r /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived-vrrp -D
===> After upgrading -- None is returned, and in fact the keepalived processes aren't spawned
after neutron-* is upgraded.
Pre-upgrade:
Jun 24 19:17:07 juju-da864d-1927868-5 Keepalived[22997]: Starting Keepalived v1.3.9 (10/21,2017)
Jun 24 19:17:07 juju-da864d-1927868-5 Keepalived[22999]: Starting VRRP child process, pid=23001
Post-upgrade -- Not started
Jun 24 19:30:41 juju-da864d-1927868-5 Keepalived[22999]: Stopping
Jun 24 19:30:42 juju-da864d-1927868-5 Keepalived_vrrp[23001]: Stopped
Jun 24 19:30:42 juju-da864d-1927868-5 Keepalived[22999]: Stopped Keepalived v1.3.9 (10/21,2017)
The reason those keepalived processes are not re-spawned is:
1) The ml2 process starts handling the router devices by making an RPC call for the device details. This
call fails because the two sides use different oslo target versions.
Therefore the neutron-api migrations must be applied before the gateways are upgraded.
9819:2021-06-24 19:31:09.935 31744 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] Starting to process devices in:{'current': {'87cfdd45-fea7-4c06-aa13-174cb71b294f', 'b8e18ba0-c65b-498e-9a8b-34c0fcc42d07', '926b7377-30f4-4b2c-9064-8aab3918a385'}, 'added': {'87cfdd45-fea7-4c06-aa13-174cb71b294f'}, 'removed': set(), 'updated': set(), 're_added': set()} rpc_loop /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:2685
9821:2021-06-24 19:31:10.028 31744 ERROR neutron.agent.rpc [req-14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] Failed to get details for device 87cfdd45-fea7-4c06-aa13-174cb71b294f: oslo_messaging.rpc.client.RemoteError: Remote error: InvalidTargetVersion Invalid target version 1.1
9869:2021-06-24 19:31:10.510 31744 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] retrying failed devices {'87cfdd45-fea7-4c06-aa13-174cb71b294f'} _update_port_info_failed_devices_stats /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1674
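The failure in step 1 is a version negotiation problem. As a rough illustration only (this is not neutron's or oslo.messaging's actual code), RPC layers of this kind typically accept a request only when the major versions match and the requested minor version is not newer than what the receiving side implements. The helper name and the versions used below are assumptions made for the example.

```python
def version_is_compatible(implemented, requested):
    """Return True if a side implementing `implemented` can serve `requested`.

    Simplified major.minor rule; hypothetical helper for illustration only.
    """
    imp_major, imp_minor = (int(p) for p in implemented.split(".")[:2])
    req_major, req_minor = (int(p) for p in requested.split(".")[:2])
    if imp_major != req_major:
        # A different major version is never compatible.
        return False
    # Within the same major version, older requests are fine, newer are not.
    return req_minor <= imp_minor


# Assumed versions for illustration: one side only knows 1.0, the other asks for 1.1.
print(version_is_compatible("1.0", "1.1"))  # False
print(version_is_compatible("1.1", "1.0"))  # True
```

Under this rule the side with the newer code can still serve older callers, but not the other way round, which is consistent with the upgrade only working in one order.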
2) Then the l3 HA router creation mechanism can't process the HA router because the HA port id 87cfdd45-fea7-4c06-aa13-174cb71b294f is down
and keepalived cannot be spawned [0] [1]
[0] https://github.com/openstack/neutron/blob/1ad9ca56b07ffdc9f7e0bc6a62af61961b9128eb/neutron/agent/l3/ha_router.py#L519
[1] https://github.com/openstack/neutron/blob/1ad9ca56b07ffdc9f7e0bc6a62af61961b9128eb/neutron/agent/linux/keepalived.py#L455
1971:2021-06-24 19:31:15.034 32459 DEBUG neutron.agent.l3.ha_router [-] Processing HA router with HA port: {'id': '87cfdd45-fea7-4c06-aa13-174cb71b294f', 'name': 'HA port tenant 6f5aaf5130764305a5d37862e3ff18ce', 'network_id': '1a2e73c3-1587-4417-be96-40fde935474b', 'tenant_id': '', 'mac_address': 'fa:16:3e:e2:e0:56', 'admin_state_up': True, 'status': 'DOWN', 'device_id': '09fa811f-410c-4360-8cae-687e7e73ff21', 'device_owner': 'network:router_ha_interface', 'fixed_ips': [{'subnet_id': '6f8bfdbf-ca04-4847-ac83-f4bd90c089b6', 'ip_address': '169.254.193.135', 'prefixlen': 18}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'juju-da864d-1927868-5', 'binding:vif_type': 'ovs', 'binding:vif_details': {'connectivity': 'l2', 'port_filter': True, 'ovs_hybrid_plug': True, 'datapath_type': 'system', 'bridge_name': 'br-int'}, 'port_security_enabled': False, 'dns_name': '', 'dns_assignment': [{'ip_address': '169.254.193.135', 'hostname': 'host-169-254-193-135', 'fqdn': 'host-169-254-193-135.1927868.stsstack.qa.1ss.'}], 'dns_domain': '', 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2021-06-24T19:16:35Z', 'updated_at': '2021-06-24T19:30:59Z', 'revision_number': 5, 'project_id': '', 'subnets': [{'id': '6f8bfdbf-ca04-4847-ac83-f4bd90c089b6', 'cidr': '169.254.192.0/18', 'gateway_ip': None, 'dns_nameservers': [], 'ipv6_ra_mode': None, 'subnetpool_id': None}], 'extra_subnets': [], 'address_scopes': {'4': None, '6': None}, 'mtu': 1500} process /usr/lib/python3/dist-packages/neutron/agent/l3/ha_router.py:513
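To make step 2 concrete, here is a minimal sketch of the gating as I understand it from the behaviour described above: keepalived is only enabled when the router has an HA port and that port is active. The function name is hypothetical, and the 'ACTIVE' status value is assumed to be the counterpart of the 'DOWN' seen in the log; this is not a copy of the neutron code.

```python
PORT_STATUS_ACTIVE = "ACTIVE"  # assumed counterpart of the 'DOWN' status in the log


def should_enable_keepalived(ha_port):
    """Hypothetical helper: start keepalived only for an existing, ACTIVE HA port."""
    return bool(ha_port) and ha_port.get("status") == PORT_STATUS_ACTIVE


# Trimmed copy of the HA port from the log line above.
ha_port = {
    "id": "87cfdd45-fea7-4c06-aa13-174cb71b294f",
    "status": "DOWN",
    "device_owner": "network:router_ha_interface",
}
print(should_enable_keepalived(ha_port))  # False
```

With the port stuck in DOWN because of the failed RPC call in step 1, a check like this never passes, so keepalived is never started for the router.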
3) Since the port is down, the keepalived process cannot be started, and the 'neutron-keepalived-state-change' agent fails with:
11166:2021-06-24 20:12:53.600 8839 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'neutron-keepalived-state-change', '--router_id=09fa811f-410c-4360-8cae-687e7e73ff21', '--namespace=qrouter-09fa811f-410c-4360-8cae-687e7e73ff21', '--conf_dir=/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21', '--log-file=/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21/neutron-keepalived-state-change.log', '--monitor_interface=ha-87cfdd45-fe', '--monitor_cidr=169.254.0.203/24', '--pid_file=/var/lib/neutron/external/pids/09fa811f-410c-4360-8cae-687e7e73ff21.monitor.pid.neutron-keepalived-state-change-monitor', '--state_path=/var/lib/neutron', '--user=113', '--group=117'] create_process /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:88
11167:2021-06-24 20:12:55.379 8839 DEBUG neutron.agent.l3.ha_router [-] Router 09fa811f-410c-4360-8cae-687e7e73ff21 neutron-keepalived-state-change-monitor pid 8961 spawn_state_change_monitor /usr/lib/python3/dist-packages/neutron/agent/l3/ha_router.py:428
11182:2021-06-24 20:12:55.611 8839 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived; Error: [Errno 2] No such file or directory: '/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived' get_value_from_file /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:263
11214:2021-06-24 20:12:56.172 8839 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived; Error: [Errno 2] No such file or directory: '/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived' get_value_from_file /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:263
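The "Unable to access ... pid.keepalived" messages are what a PID lookup looks like when keepalived never wrote its pid file. The sketch below illustrates that pattern under my own assumptions: `read_pid` is a hypothetical helper, not the `get_value_from_file` implementation from neutron, and the file name is made up for the example.

```python
import os
import tempfile


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or invalid."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Missing file (keepalived was never spawned) or unparsable content.
        return None


with tempfile.TemporaryDirectory() as conf_dir:
    pid_file = os.path.join(conf_dir, "router.pid.keepalived")
    print(read_pid(pid_file))  # None: nothing has written the pid file yet

    with open(pid_file, "w") as f:
        f.write("23001\n")
    print(read_pid(pid_file))  # 23001
```

A lookup that returns None here matches the "None is returned" symptom after the upgrade: the monitor keeps polling for a pid file that only exists once keepalived has actually been started.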