OVN Kolla OpenStack deployment with Docker containers.
With 300 VMs deployed on a compute node and some synthetic network load between the machines generated by pings and netperf, restarting openvswitch_vswitchd breaks the meter implementation in the openvswitch kernel module, and the flows can't be added back.
This does not happen with the default 5.4.0.121.122 kernel, only on the linux-image-generic-hwe-20.04 5.13.0.44.49~20.04.28. It also does not happen until sufficient machines / load is present on the system (with 50 machines the behavior is not present).
The OVS bridges also cannot be added to the system manually but OVS adds them to the database even though the operation fails:
ovs-vswitchd.log
---
2022-06-29T12:12:51.874Z|00071|bridge|INFO|bridge br-int: added interface br-int on port 65534
2022-06-29T12:12:51.877Z|00072|bridge|INFO|bridge br-int: using datapath ID 000026586eff73c0
2022-06-29T12:12:51.877Z|00073|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2022-06-29T12:13:16.157Z|00074|bridge|INFO|bridge br-int: deleted interface br-int on port 65534
2022-06-29T12:13:31.074Z|00075|netlink_socket|INFO|netlink dump request error (No such file or directory)
2022-06-29T12:13:31.074Z|00076|dpif|WARN|failed to enumerate system datapaths: No such file or directory
2022-06-29T12:13:31.074Z|00077|dpif|WARN|failed to create datapath ovs-system: No such file or directory
2022-06-29T12:13:31.074Z|00078|ofproto_dpif|ERR|failed to open datapath of type system: No such file or directory
2022-06-29T12:13:31.074Z|00079|ofproto|ERR|failed to open datapath br-int: No such file or directory
2022-06-29T12:13:31.074Z|00080|bridge|ERR|failed to create bridge br-int: No such file or directory
2022-06-29T12:15:32.189Z|00081|netlink_socket|INFO|netlink dump request error (No such file or directory)
2022-06-29T12:15:32.189Z|00082|dpif|WARN|failed to enumerate system datapaths: No such file or directory
2022-06-29T12:15:32.189Z|00083|dpif|WARN|failed to create datapath ovs-system: No such file or directory
2022-06-29T12:15:32.189Z|00084|ofproto_dpif|ERR|failed to open datapath of type system: No such file or directory
2022-06-29T12:15:32.189Z|00085|ofproto|ERR|failed to open datapath br-int: No such file or directory
2022-06-29T12:15:32.189Z|00086|bridge|ERR|failed to create bridge br-int: No such file or directory
2022-06-29T12:18:43.488Z|00087|netlink_socket|INFO|netlink dump request error (No such file or directory)
2022-06-29T12:18:43.488Z|00088|dpif|WARN|failed to enumerate system datapaths: No such file or directory
2022-06-29T12:18:43.488Z|00089|dpif|WARN|failed to create datapath ovs-system: No such file or directory
2022-06-29T12:18:43.488Z|00090|ofproto_dpif|ERR|failed to open datapath of type system: No such file or directory
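For triage, the log excerpt above can be summarised by severity. This is a minimal Python sketch, assuming the pipe-separated layout seen in these lines (timestamp, sequence number, module, level, message); the helper names `parse_ovs_log_line` and `count_by_level` are hypothetical and not part of OVS.

```python
from collections import Counter

def parse_ovs_log_line(line):
    """Split one log line of the form timestamp|seq|module|level|message.

    Returns None for lines that do not have all five fields.
    """
    parts = line.rstrip("\n").split("|", 4)  # maxsplit keeps '|' inside the message
    if len(parts) != 5:
        return None
    timestamp, seq, module, level, message = parts
    return {
        "timestamp": timestamp,
        "seq": int(seq),
        "module": module,
        "level": level,
        "message": message,
    }

def count_by_level(lines):
    """Count parsed records per severity level (INFO, WARN, ERR, ...)."""
    records = (parse_ovs_log_line(line) for line in lines)
    return Counter(rec["level"] for rec in records if rec is not None)

# Three lines copied from the excerpt in this report.
SAMPLE = """2022-06-29T12:13:31.074Z|00075|netlink_socket|INFO|netlink dump request error (No such file or directory)
2022-06-29T12:13:31.074Z|00076|dpif|WARN|failed to enumerate system datapaths: No such file or directory
2022-06-29T12:13:31.074Z|00080|bridge|ERR|failed to create bridge br-int: No such file or directory
"""

if __name__ == "__main__":
    print(count_by_level(SAMPLE.splitlines()))
```

Run against the full ovs-vswitchd.log, the ERR count gives a quick indication of how often the datapath could not be opened.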
Mitigation steps:
The only way we've found so far to bring it back into operation without rebooting the host is to reload the kernel module. To do this we have to back up the ovsdb conf.db, start all the containers and delete the bridges so that we can reload the openvswitch module. Then we restore the conf.db and start the containers again sequentially.
Behavior:
Before restart:
(openvswitch-vswitchd) # ovs-ofctl meter-features br-int -O OpenFlow15
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:200000 max_bands:1 max_color:0
band_types: drop
capabilities: kbps pktps burst stats
After restart:
(openvswitch-vswitchd) # ovs-ofctl meter-features br-int -O OpenFlow15
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:0 max_bands:0 max_color:0
band_types:
capabilities:
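The before/after difference can be checked mechanically. This is a minimal Python sketch, assuming the `ovs-ofctl meter-features` output format shown above; the helper names `parse_meter_features` and `meters_broken` are hypothetical, and treating `max_meter:0` as "broken" is a heuristic taken from this report rather than a documented OVS rule.

```python
import re

def parse_meter_features(text):
    """Extract max_meter, max_bands and max_color from meter-features output."""
    fields = {}
    for key, value in re.findall(r"(max_meter|max_bands|max_color):(\d+)", text):
        fields[key] = int(value)
    return fields

def meters_broken(text):
    """Heuristic: a datapath advertising zero meters is considered broken."""
    return parse_meter_features(text).get("max_meter", 0) == 0

# Outputs copied from this report.
BEFORE = """OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:200000 max_bands:1 max_color:0
band_types: drop
capabilities: kbps pktps burst stats
"""

AFTER = """OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:0 max_bands:0 max_color:0
band_types:
capabilities:
"""

if __name__ == "__main__":
    print("before restart broken:", meters_broken(BEFORE))
    print("after restart broken:", meters_broken(AFTER))
```

The text passed in would normally come from running the `ovs-ofctl` command above inside the openvswitch_vswitchd container; how that output is captured is left out here.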
The following messages appear in the openvswitch-vswitchd and ovn-controller logs, and flows cannot be added back:
ovs-vswitchd.log
---
2022-06-29T12:43:49.188Z|00006|dpif(handler1)|WARN|system@ovs-system: failed to put[create] (Invalid argument) ufid:73695f68-6778-4980-bb74-29b528036b57 recirc_id(0),dp_hash(0),skb_priority(0),in_port(303),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=5c:45:27:f9:9a:02,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=214,pcp=7),encap()
2022-06-29T12:43:49.204Z|00036|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2022-06-29T12:43:49.204Z|00037|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2022-06-29T12:43:49.204Z|00038|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2022-06-29T12:43:49.204Z|00039|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2022-06-29T12:43:49.208Z|00040|dpif|WARN|system@ovs-system: failed to query port patch-br-int-to-provnet-90238e1e-fbf5-45e4-bc6c-6110823d58ed: Invalid argument
2022-06-29T12:43:49.209Z|00041|dpif|WARN|system@ovs-system: failed to query port patch-br-int-to-provnet-b0a964fc-13a9-48ce-9a1b-7cfe8fc2b979: Invalid argument
[...]
2022-06-29T12:44:56.778Z|00585|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00586|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00587|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00588|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00589|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00590|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00591|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00592|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00593|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00594|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
ovn-controller.log
---
2022-06-29T12:44:56.785Z|00165|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x388f): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x388f): ADD meter=1565 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00166|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3890): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3890): ADD meter=3427 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00167|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3891): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3891): ADD meter=3766 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00168|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3892): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3892): ADD meter=1225 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00169|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3893): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3893): ADD meter=2853 kbps stats bands=
type=drop rate=250000
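To see which meters were rejected, the error replies can be matched to the METER_MOD requests by xid. This is a minimal Python sketch, assuming each OFPT_ERROR entry in ovn-controller.log is followed by the OFPT_METER_MOD it refers to, as in the excerpt above; `rejected_meters` is a hypothetical helper name.

```python
import re

# xid of every INVALID_METER error reply
ERROR_RE = re.compile(
    r"OFPT_ERROR \(OF[\d.]+\) \(xid=(0x[0-9a-f]+)\): OFPMMFC_INVALID_METER"
)
# xid and meter id of every METER_MOD ADD request echoed in the log
MOD_RE = re.compile(
    r"OFPT_METER_MOD \(OF[\d.]+\) \(xid=(0x[0-9a-f]+)\): ADD meter=(\d+)"
)

def rejected_meters(log_text):
    """Return the sorted meter ids whose ADD request got an INVALID_METER reply."""
    failed_xids = set(ERROR_RE.findall(log_text))
    return sorted(
        int(meter) for xid, meter in MOD_RE.findall(log_text) if xid in failed_xids
    )

# Two entries copied from the ovn-controller.log excerpt in this report.
SAMPLE = """2022-06-29T12:44:56.785Z|00165|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x388f): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x388f): ADD meter=1565 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00166|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3890): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3890): ADD meter=3427 kbps stats bands=
type=drop rate=250000
"""

if __name__ == "__main__":
    print(rejected_meters(SAMPLE))
```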
Open vSwitch reports compatibility for 2.16.x with kernel versions 3.16 to 5.8. Ref: https://docs.openvswitch.org/en/latest/faq/releases/
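The two kernels from this report can be compared against that range. This is a minimal Python sketch, assuming the 3.16 to 5.8 range quoted above and comparing only major.minor; `kernel_major_minor` and `in_supported_range` are hypothetical helper names, and the report does not establish that this range alone explains the failure.

```python
import re

# Kernel range for OVS 2.16.x as quoted in this report (inclusive).
SUPPORTED_RANGE = ((3, 16), (5, 8))

def kernel_major_minor(version):
    """Return (major, minor) from a version string such as '5.13.0.44.49~20.04.28'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return int(match.group(1)), int(match.group(2))

def in_supported_range(version, supported=SUPPORTED_RANGE):
    """True if the kernel's major.minor lies inside the inclusive range."""
    low, high = supported
    return low <= kernel_major_minor(version) <= high

if __name__ == "__main__":
    for version in ("5.4.0.121.122", "5.13.0.44.49~20.04.28"):
        print(version, in_supported_range(version))
```

Under these assumptions the default 5.4 kernel falls inside the range and the HWE 5.13 kernel falls outside it.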