We are deploying Focal Wallaby for a customer.
Neutron package version: 2:18.2.0-0ubuntu1~cloud0; GLIBC: 2.31-0ubuntu9.7
When running rally/tempest tests that create VMs, the following symptoms occur (commands to observe them are sketched after this list):
1) A huge increase in the size of, and write load on, /var/lib/openvswitch/conf.db
(If ovsdb-server is restarted while the OVS database is a few GB in size, the unit can fail to start.)
2) Very high CPU usage on the following processes:
* neutron-ovn-metadata-agent
* nova-compute
* ovn-controller
* ovsdb-server
3) The Nova compute node may face severe delays and may time out when creating any instance (Nova or an Octavia amphora) on it.
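A minimal sketch of shell commands that should surface symptoms 1) and 2); the path and process names are taken from above, and exact process names as shown by top may vary per deployment:

$ ls -lh /var/lib/openvswitch/conf.db                # watch the on-disk size of the OVS database grow
$ top -b -n 1 | grep -E 'neutron-ovn-metadata|nova-compute|ovn-controller|ovsdb-server'   # spot the high-CPU processes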
A temporary workaround is to restart the ovn-controller service (sketched below).
The issue then reproduces again after some time on a different hypervisor.
So far it has been reproducible only on a customer deployment with many nova-compute units.
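A minimal sketch of the temporary workaround, assuming the standard Ubuntu service name shipped by the ovn-host package; the compaction step is our own addition to reclaim conf.db space and is not part of the restart itself:

$ sudo systemctl restart ovn-controller
$ sudo ovs-appctl -t ovsdb-server ovsdb-server/compact   # optional: manually compact the local conf.db if it has grown to several GB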
Ovn-controller.log on the hypervisor:
2022-03-04T12:54:43.065Z|00479|binding|INFO|Changing chassis for lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from comp04.maas to comp18.maas
2022-03-04T12:54:43.065Z|00480|binding|INFO|cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53: Claiming fa:16:3e:15:1f:a6 10.218.131.106/18
2022-03-04T12:54:43.077Z|00481|binding|INFO|Releasing lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from this chassis.
2022-03-04T12:54:46.798Z|00482|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00483|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00484|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00485|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
Full log of ovn-controller available here: https://private-fileshare.canonical.com/~alitvinov/random/ovn-controller.txt