We are deploying Focal Wallaby for a customer.
Neutron package version: 2:18.2.0-0ubuntu1~cloud0; GLIBC: 2.31-0ubuntu9.7
When running rally/tempest tests that create VMs, the following symptoms occur (commands to observe them are sketched after this list):
1) A huge increase in the size of, and write load on, /var/lib/openvswitch/conf.db
(If ovsdb-server is restarted while the OVS database is a few GB in size, the unit can fail to start.)
2) Very high CPU usage on the following processes:
* neutron-ovn-metadata-agent
* nova-compute
* ovn-controller
* ovsdb-server
3) The Nova compute node may face severe delays and may time out when creating any instance (Nova or an Octavia amphora) on it.
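A minimal sketch of shell commands that should surface symptoms 1) and 2); the path and process names are taken from above, and exact process names as shown by top may vary per deployment:

$ ls -lh /var/lib/openvswitch/conf.db                # watch the on-disk size of the OVS database grow
$ top -b -n 1 | grep -E 'neutron-ovn-metadata|nova-compute|ovn-controller|ovsdb-server'   # spot the high-CPU processes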
A temporary workaround is to restart the ovn-controller service (sketched below).
The issue then reproduces again after some time on a different hypervisor.
So far it has been reproducible only on a customer deployment with many nova-compute units.
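A minimal sketch of the temporary workaround, assuming the standard Ubuntu service name shipped by the ovn-host package; the compaction step is our own addition to reclaim conf.db space and is not part of the restart itself:

$ sudo systemctl restart ovn-controller
$ sudo ovs-appctl -t ovsdb-server ovsdb-server/compact   # optional: manually compact the local conf.db if it has grown to several GB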
Ovn-controller.log on the hypervisor:
2022-03-04T12:54:43.065Z|00479|binding|INFO|Changing chassis for lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from comp04.maas to comp18.maas
2022-03-04T12:54:43.065Z|00480|binding|INFO|cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53: Claiming fa:16:3e:15:1f:a6 10.218.131.106/18
2022-03-04T12:54:43.077Z|00481|binding|INFO|Releasing lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from this chassis.
2022-03-04T12:54:46.798Z|00482|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00483|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00484|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00485|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
Full log of ovn-controller available here: https://private-fileshare.canonical.com/~alitvinov/random/ovn-controller.txt