Comment 0 for bug 1980527

Stefan Lupsa (stefanlupsacbsl) wrote: 5.13.0.44 openvswitch kernel module meter implementation breaks on ovs-vswitchd restart

Behavior:

OVN Kolla OpenStack deployment with Docker containers.

With 300 VMs deployed on a compute node and some synthetic network load between them (pings and netperf), restarting openvswitch_vswitchd breaks the meter implementation of the openvswitch kernel module, and the flows cannot be added back.

This does not happen with the default 5.4.0.121.122 kernel, only with linux-image-generic-hwe-20.04 5.13.0.44.49~20.04.28. It also only happens once enough machines / load are present on the system (with 50 machines the behavior does not occur).

Before restart:
(openvswitch-vswitchd)# ovs-ofctl meter-features br-int -O OpenFlow15
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:200000 max_bands:1 max_color:0
band_types: drop
capabilities: kbps pktps burst stats

After restart:
(openvswitch-vswitchd)# ovs-ofctl meter-features br-int -O OpenFlow15
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:0 max_bands:0 max_color:0
band_types:
capabilities:
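The before/after difference above can be checked mechanically. This is a minimal sketch, assuming the `ovs-ofctl meter-features br-int -O OpenFlow15` output is captured as text exactly as printed above; `parse_meter_features` and `meters_look_broken` are hypothetical helper names, and treating `max_meter:0` as "broken" is a heuristic for this report, not a signal defined by OVS.

```python
import re

def parse_meter_features(text: str) -> dict:
    """Parse the text printed by `ovs-ofctl meter-features <bridge> -O OpenFlow15`."""
    features = {}
    # Numeric limits, e.g. "max_meter:200000 max_bands:1 max_color:0".
    for key, value in re.findall(r"(max_\w+):(\d+)", text):
        features[key] = int(value)
    # Space-separated lists after "band_types:" and "capabilities:" (may be empty).
    for key in ("band_types", "capabilities"):
        match = re.search(rf"^\s*{key}:(.*)$", text, re.MULTILINE)
        features[key] = match.group(1).split() if match else []
    return features

def meters_look_broken(text: str) -> bool:
    """Heuristic: the datapath advertises no meters at all."""
    return parse_meter_features(text).get("max_meter", 0) == 0

AFTER_RESTART = """OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:0 max_bands:0 max_color:0
band_types:
capabilities:
"""

print(meters_look_broken(AFTER_RESTART))  # True
```

In practice the text would come from running the same `ovs-ofctl` command inside the openvswitch-vswitchd container before and after the restart.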

The following errors appear in the ovs-vswitchd and ovn-controller logs and prevent the flows from being added back:
ovs-vswitchd.log
---
2022-06-29T12:43:49.188Z|00006|dpif(handler1)|WARN|system@ovs-system: failed to put[create] (Invalid argument) ufid:73695f68-6778-4980-bb74-29b528036b57 recirc_id(0),dp_hash(0),skb_priority(0),in_port(303),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=5c:45:27:f9:9a:02,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=214,pcp=7),encap()
2022-06-29T12:43:49.204Z|00036|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2022-06-29T12:43:49.204Z|00037|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2022-06-29T12:43:49.204Z|00038|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2022-06-29T12:43:49.204Z|00039|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2022-06-29T12:43:49.208Z|00040|dpif|WARN|system@ovs-system: failed to query port patch-br-int-to-provnet-90238e1e-fbf5-45e4-bc6c-6110823d58ed: Invalid argument
2022-06-29T12:43:49.209Z|00041|dpif|WARN|system@ovs-system: failed to query port patch-br-int-to-provnet-b0a964fc-13a9-48ce-9a1b-7cfe8fc2b979: Invalid argument
[...]
2022-06-29T12:44:56.778Z|00585|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00586|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00587|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00588|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00589|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00590|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00591|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00592|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00593|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
2022-06-29T12:44:56.778Z|00594|connmgr|INFO|br-int<->unix#3: sending OFPMMFC_INVALID_METER error reply to OFPT_METER_MOD message
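A minimal sketch for spotting these symptoms in ovs-vswitchd.log. It matches only the message substrings quoted above; `summarize_vswitchd_log` and the counter keys are hypothetical names, and other OVS versions may word these messages differently.

```python
from collections import Counter

# Substrings copied from the ovs-vswitchd.log excerpt in this report.
BROKEN_METER_MSG = "The kernel module has a broken meter implementation."

def summarize_vswitchd_log(lines):
    """Count meter-related failure messages in ovs-vswitchd.log lines."""
    counts = Counter()
    for line in lines:
        if BROKEN_METER_MSG in line:
            counts["broken_meter_probe"] += 1
        elif "OVS_METER_CMD_SET failed" in line:
            counts["meter_set_failed"] += 1
        elif "OFPMMFC_INVALID_METER" in line:
            counts["invalid_meter_replies"] += 1
        elif "failed to put[create]" in line:
            counts["flow_put_failed"] += 1
    return counts

# Usage (path is an assumption):
# with open("ovs-vswitchd.log") as f:
#     print(summarize_vswitchd_log(f))
```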

ovn-controller.log
---
2022-06-29T12:44:56.785Z|00165|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x388f): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x388f): ADD meter=1565 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00166|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3890): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3890): ADD meter=3427 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00167|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3891): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3891): ADD meter=3766 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00168|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3892): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3892): ADD meter=1225 kbps stats bands=
type=drop rate=250000
2022-06-29T12:44:56.785Z|00169|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x3893): OFPMMFC_INVALID_METER
OFPT_METER_MOD (OF1.5) (xid=0x3893): ADD meter=2853 kbps stats bands=
type=drop rate=250000
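For triage it can help to list which meter IDs ovn-controller failed to add. This sketch assumes the two-line layout shown above (an `OFPT_METER_MOD ... ADD meter=N` line followed by a `type=... rate=...` band line); `rejected_meters` is a hypothetical helper, and the regexes only cover the `ADD` form seen in this log.

```python
import re

METER_MOD_RE = re.compile(r"OFPT_METER_MOD .*\(xid=(0x[0-9a-f]+)\): ADD meter=(\d+)")
BAND_RE = re.compile(r"type=(\w+) rate=(\d+)")

def rejected_meters(log_text: str):
    """Return (xid, meter_id, band_type, rate) for each meter ADD echoed in an error."""
    results = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        mod = METER_MOD_RE.search(line)
        if not mod:
            continue
        # The band description is printed on the following line.
        band = BAND_RE.search(lines[i + 1]) if i + 1 < len(lines) else None
        results.append((
            mod.group(1),
            int(mod.group(2)),
            band.group(1) if band else None,
            int(band.group(2)) if band else None,
        ))
    return results
```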

The OVS bridges also cannot be added back manually: the operation fails, but OVS still adds the bridges to the database:

ovs-vswitchd.log
---
2022-06-29T12:12:51.874Z|00071|bridge|INFO|bridge br-int: added interface br-int on port 65534
2022-06-29T12:12:51.877Z|00072|bridge|INFO|bridge br-int: using datapath ID 000026586eff73c0
2022-06-29T12:12:51.877Z|00073|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2022-06-29T12:13:16.157Z|00074|bridge|INFO|bridge br-int: deleted interface br-int on port 65534
2022-06-29T12:13:31.074Z|00075|netlink_socket|INFO|netlink dump request error (No such file or directory)
2022-06-29T12:13:31.074Z|00076|dpif|WARN|failed to enumerate system datapaths: No such file or directory
2022-06-29T12:13:31.074Z|00077|dpif|WARN|failed to create datapath ovs-system: No such file or directory
2022-06-29T12:13:31.074Z|00078|ofproto_dpif|ERR|failed to open datapath of type system: No such file or directory
2022-06-29T12:13:31.074Z|00079|ofproto|ERR|failed to open datapath br-int: No such file or directory
2022-06-29T12:13:31.074Z|00080|bridge|ERR|failed to create bridge br-int: No such file or directory
2022-06-29T12:15:32.189Z|00081|netlink_socket|INFO|netlink dump request error (No such file or directory)
2022-06-29T12:15:32.189Z|00082|dpif|WARN|failed to enumerate system datapaths: No such file or directory
2022-06-29T12:15:32.189Z|00083|dpif|WARN|failed to create datapath ovs-system: No such file or directory
2022-06-29T12:15:32.189Z|00084|ofproto_dpif|ERR|failed to open datapath of type system: No such file or directory
2022-06-29T12:15:32.189Z|00085|ofproto|ERR|failed to open datapath br-int: No such file or directory
2022-06-29T12:15:32.189Z|00086|bridge|ERR|failed to create bridge br-int: No such file or directory
2022-06-29T12:18:43.488Z|00087|netlink_socket|INFO|netlink dump request error (No such file or directory)
2022-06-29T12:18:43.488Z|00088|dpif|WARN|failed to enumerate system datapaths: No such file or directory
2022-06-29T12:18:43.488Z|00089|dpif|WARN|failed to create datapath ovs-system: No such file or directory
2022-06-29T12:18:43.488Z|00090|ofproto_dpif|ERR|failed to open datapath of type system: No such file or directory
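The repeated bridge-creation failures can be extracted in the same way. This sketch assumes the `timestamp|sequence|module|level|message` layout of the lines above; `bridge_create_failures` is a hypothetical name.

```python
import re

BRIDGE_FAIL_RE = re.compile(
    r"^([^|]+)\|\d+\|bridge\|ERR\|failed to create bridge (\S+): (.+)$"
)

def bridge_create_failures(lines):
    """Return (timestamp, bridge, reason) for each 'failed to create bridge' error."""
    failures = []
    for line in lines:
        match = BRIDGE_FAIL_RE.match(line.strip())
        if match:
            failures.append(match.groups())
    return failures
```

Each retry of the same bridge shows up as a separate tuple, which makes the repeated ENOENT failures on `ovs-system` easy to see at a glance.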

Mitigation steps:
The only way we have found to restore operation without rebooting the host is to reload the kernel module. To do this, we back up the ovsdb conf.db, start all the containers, and delete the bridges so that the openvswitch module can be reloaded. We then restore conf.db and start the containers back up sequentially.
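The mitigation steps above can be written down as an ordered command list. This is a dry-run sketch that only builds and prints commands. The container names (`openvswitch_db`, `openvswitch_vswitchd`, `ovn_controller`), the conf.db path inside the DB container, the bridge names, and the explicit `docker stop` before the module reload are all assumptions to adapt to the actual deployment. If dependent modules (for example vport modules) are loaded, `modprobe -r openvswitch` may refuse to unload until they are removed first.

```python
import shlex

def mitigation_plan(bridges, containers,
                    vsctl_container="openvswitch_vswitchd",  # assumed name
                    db_path="/var/lib/openvswitch/conf.db",   # assumed path
                    backup="conf.db.bak"):
    """Build (but do not run) the recovery steps as argv lists.

    `containers` is the start order, DB container first (hypothetical Kolla names).
    """
    db_container = containers[0]
    plan = []
    # 1. Back up the OVSDB database from the DB container.
    plan.append(["docker", "cp", f"{db_container}:{db_path}", backup])
    # 2. Ensure all containers are running so ovs-vsctl can reach ovsdb-server.
    plan += [["docker", "start", c] for c in containers]
    # 3. Delete the bridges so nothing keeps using the kernel datapath.
    plan += [["docker", "exec", vsctl_container, "ovs-vsctl", "--if-exists", "del-br", br]
             for br in bridges]
    # 4. Stop the containers in reverse order (assumed; implied by "start back").
    plan += [["docker", "stop", c] for c in reversed(containers)]
    # 5. Reload the openvswitch kernel module.
    plan.append(["modprobe", "-r", "openvswitch"])
    plan.append(["modprobe", "openvswitch"])
    # 6. Restore the original conf.db, which still lists the deleted bridges.
    plan.append(["docker", "cp", backup, f"{db_container}:{db_path}"])
    # 7. Start the containers again, one at a time, in the given order.
    plan += [["docker", "start", c] for c in containers]
    return plan

if __name__ == "__main__":
    for argv in mitigation_plan(["br-int"],
                                ["openvswitch_db", "openvswitch_vswitchd", "ovn_controller"]):
        print(shlex.join(argv))
```

Keeping the plan as data rather than executing it lets an operator review the order, and wait for each container to come up before starting the next, when applying it by hand.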

The Open vSwitch release FAQ lists 2.16.x as compatible with Linux kernel versions 3.16 to 5.8. Ref: https://docs.openvswitch.org/en/latest/faq/releases/
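A quick way to compare a host kernel against the range quoted above. This sketch encodes only the 3.16 to 5.8 range cited for 2.16.x and compares major.minor numbers; `kernel_major_minor` and `in_supported_range` are hypothetical helpers. `platform.release()` from the Python standard library returns the running kernel's release string (e.g. `5.13.0-44-generic` on the affected HWE kernel).

```python
import platform
import re

# Range quoted in this report for Open vSwitch 2.16.x (major, minor), inclusive.
OVS_2_16_KERNEL_RANGE = ((3, 16), (5, 8))

def kernel_major_minor(release: str) -> tuple:
    """Extract (major, minor) from strings like '5.13.0-44-generic' or '5.4.0.121'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def in_supported_range(release: str, supported=OVS_2_16_KERNEL_RANGE) -> bool:
    low, high = supported
    return low <= kernel_major_minor(release) <= high

if __name__ == "__main__":
    rel = platform.release()
    print(rel, "within quoted range:", in_supported_range(rel))
```

With these definitions the default 5.4 kernel falls inside the quoted range, while the 5.13 HWE kernel falls outside it.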