* in good run we have:
2022-01-23T19:28:26.309Z|00140|jsonrpc|DBG|unix#3: send notification, method="update2", params=["a220fbde-7c82-11ec-9a03-4fce359daaf8",{"Datapath_Binding":{"2c98db7e-84ad-4782-9d87-24eafae1700e":{"insert":{"tunnel_key":1,"external_ids":["map",[["logical-switch","5b801987-900b-4b79-aa3b-05c5fdc422eb"],["name","neutron-2d726047-4990-446b-9f1b-9dc6403adb3e"],["name2","network1"]]]}}}}]
2022-01-23T19:28:26.330Z|00141|poll_loop|DBG|wakeup due to [POLLIN] on fd 17 (/tmp/tmp8aro35_w/ovnsb_db.sock<->) at lib/stream-fd.c:157
2022-01-23T19:28:26.330Z|00142|jsonrpc|DBG|unix#3: received request, method="transact", params=["OVN_Southbound",{"uuid-name":"rowdebd1810_12e4_4ae8_9bcd_a18781f1b986","table":"MAC_Binding","op":"insert","row":{"ip":"100.0.0.21","datapath":["uuid","2c98db7e-84ad-4782-9d87-24eafae1700e"]}}], id=1260
2022-01-23T19:28:26.330Z|00143|jsonrpc|DBG|unix#3: send reply, result=[{"uuid":["uuid","e514d19c-7371-4d1b-9958-25ffd612b68b"]}], id=1260
2022-01-23T19:28:26.330Z|00144|jsonrpc|DBG|unix#4: send notification, method="update3", params=[["monid","OVN_Southbound"],"00000000-0000-0000-0000-000000000000",{"MAC_Binding":{"e514d19c-7371-4d1b-9958-25ffd612b68b":{"insert":{"ip":"100.0.0.21","datapath":["uuid","2c98db7e-84ad-4782-9d87-24eafae1700e"]}}}}]
2022-01-23T19:28:26.460Z|00145|poll_loop|DBG|wakeup due to [POLLIN] on fd 18 (/tmp/tmp8aro35_w/ovnsb_db.sock<->) at lib/stream-fd.c:157
but in failed run there is only:
2022-01-23T19:28:46.868Z|00156|jsonrpc|DBG|tcp:127.0.0.1:33296: send notification, method="update2", params=["ae7bd9c6-7c82-11ec-9a03-4fce359daaf8",{"Connection":{"83c29962-a6e2-4f3c-9fc7-888cf46540ed":{"modify":{"is_connected":true,"status":["map",[["bound_port","42597"],["n_connections","2"],["sec_since_connect","0"],["sec_since_disconnect","0"]]]}}},"Datapath_Binding":{"15003f80-2b91-44d6-8b9d-f8d217285fb3":{"insert":{"tunnel_key":1,"external_ids":["map",[["logical-switch","70caa880-3fed-4f9b-9558-f624c0d2c469"],["name","neutron-51ac15e4-24cc-4348-85c3-fe3a51cfcaec"],["name2","network1"]]]}}}}]
2022-01-23T19:28:49.369Z|00157|poll_loop|DBG|wakeup due to 2500-ms timeout at ovsdb/ovsdb-server.c:296 (0% CPU usage)
2022-01-23T19:28:51.872Z|00158|poll_loop|DBG|wakeup due to 2503-ms timeout at ovsdb/ovsdb-server.c:296 (0% CPU usage)
2022-01-23T19:28:54.373Z|00159|poll_loop|DBG|wakeup due to 2501-ms timeout at ovsdb/ovsdb-server.c:296 (0% CPU usage)
and nothing else.
In the NB log, there is:
* good run:
2022-01-23T19:28:26.457Z|00144|jsonrpc|DBG|unix#5: send notification, method="update2", params=["a220fbdc-7c82-11ec-9a03-4fce359daaf8",{"Logical_Switch":{"16b0beb1-7c31-40dd-8c76-1ece7d81d20b":{"insert":{"name":"neutron-c314b414-279d-4457-a8d2-eeab1d2d78ad","external_ids":["map",[["neutron:availability_zone_hints",""],["neutron:mtu","1442"],["neutron:network_name","net1"],["neutron:revision_number","1"]]],"other_config":["map",[["mcast_flood_unregistered","false"],["mcast_snoop","false"],["vlan-passthru","false"]]]}}}}]
2022-01-23T19:28:26.457Z|00145|jsonrpc|DBG|unix#5: send reply, result=[{"uuid":["uuid","16b0beb1-7c31-40dd-8c76-1ece7d81d20b"]}], id=1261
2022-01-23T19:28:26.457Z|00146|jsonrpc|DBG|unix#6: send notification, method="update3", params=[["monid","OVN_Northbound"],"00000000-0000-0000-0000-000000000000",{"Logical_Switch":{"16b0beb1-7c31-40dd-8c76-1ece7d81d20b":{"insert":{"name":"neutron-c314b414-279d-4457-a8d2-eeab1d2d78ad","external_ids":["map",[["neutron:availability_zone_hints",""],["neutron:mtu","1442"],["neutron:network_name","net1"],["neutron:revision_number","1"]]],"other_config":["map",[["mcast_flood_unregistered","false"],["mcast_snoop","false"],["vlan-passthru","false"]]]}}}}]
2022-01-23T19:28:26.769Z|00147|poll_loop|DBG|wakeup due to [POLLIN] on fd 17 (/tmp/tmp8aro35_w/ovnnb_db.sock<->) at lib/stream-fd.c:157 (0% CPU usage)
2022-01-23T19:28:26.769Z|00148|jsonrpc|DBG|unix#5: received request, method="transact", params=["OVN_Northbound",{"uuid-name":"row157d7b26_ebea_47d7_8917_161d2f11a246","table":"DHCP_Options","op":"insert","row":{"options":["map",[["dns_server","{10.10.10.10}"],["lease_time","43200"],["mtu","1442"],["router","20.0.0.1"],["server_id","20.0.0.1"],["server_mac","12:34:56:78:31:3f"]]],"cidr":"20.0.0.0/24","external_ids":["map",[["neutron:revision_number","0"],["subnet_id","2cf02242-c45f-4ccd-8219-7268a94379e7"]]]}}], id=1262
2022-01-23T19:28:26.769Z|00149|jsonrpc|DBG|unix#5: send notification, method="update2", params=["a220fbdc-7c82-11ec-9a03-4fce359daaf8",{"DHCP_Options":{"76a8f045-8669-46d5-a50b-cd391544efbe":{"insert":{"options":["map",[["dns_server","{10.10.10.10}"],["lease_time","43200"],["mtu","1442"],["router","20.0.0.1"],["server_id","20.0.0.1"],["server_mac","12:34:56:78:31:3f"]]],"cidr":"20.0.0.0/24","external_ids":["map",[["neutron:revision_number","0"],["subnet_id","2cf02242-c45f-4ccd-8219-7268a94379e7"]]]}}}}]
2022-01-23T19:28:26.769Z|00150|jsonrpc|DBG|unix#5: send reply, result=[{"uuid":["uuid","76a8f045-8669-46d5-a50b-cd391544efbe"]}], id=1262
2022-01-23T19:28:26.770Z|00151|jsonrpc|DBG|unix#6: send notification, method="update3", params=[["monid","OVN_Northbound"],"00000000-0000-0000-0000-000000000000",{"DHCP_Options":{"76a8f045-8669-46d5-a50b-cd391544efbe":{"insert":{"options":["map",[["dns_server","{10.10.10.10}"],["lease_time","43200"],["mtu","1442"],["router","20.0.0.1"],["server_id","20.0.0.1"],["server_mac","12:34:56:78:31:3f"]]],"cidr":"20.0.0.0/24","external_ids":["map",[["neutron:revision_number","0"],["subnet_id","2cf02242-c45f-4ccd-8219-7268a94379e7"]]]}}}}]
2022-01-23T19:28:27.314Z|00152|poll_loop|DBG|wakeup due to [POLLIN] on fd 17 (/tmp/tmp8aro35_w/ovnnb_db.sock<->) at lib/stream-fd.c:157 (0% CPU usage)
* but in failed run there is only:
2022-01-23T19:28:46.862Z|00169|jsonrpc|DBG|tcp:127.0.0.1:60220: send notification, method="update3", params=[["monid","OVN_Northbound"],"00000000-0000-0000-0000-000000000000",{"Logical_Switch":{"70caa880-3fed-4f9b-9558-f624c0d2c469":{"insert":{"name":"neutron-51ac15e4-24cc-4348-85c3-fe3a51cfcaec","external_ids":["map",[["neutron:availability_zone_hints",""],["neutron:mtu","1442"],["neutron:network_name","network1"],["neutron:revision_number","1"]]],"other_config":["map",[["mcast_flood_unregistered","false"],["mcast_snoop","false"],["vlan-passthru","false"]]]}}}}]
2022-01-23T19:28:49.128Z|00170|poll_loop|DBG|wakeup due to 2266-ms timeout at ovsdb/ovsdb-server.c:296 (0% CPU usage)
2022-01-23T19:28:51.629Z|00171|poll_loop|DBG|wakeup due to 2501-ms timeout at ovsdb/ovsdb-server.c:296 (0% CPU usage)
2022-01-23T19:28:54.130Z|00172|poll_loop|DBG|wakeup due to 2501-ms timeout at ovsdb/ovsdb-server.c:296
From other things which I noticed, in the failed test testrun log, there is error like:
Just after network was created. In the good runs, similar error is also sometimes visible but it's later during the test.
If that could be root cause, then it seems like some kind of race condition but I'm not sure about that really.
I today compared logs from the failed (https:/ /storage. gra.cloud. ovh.net/ v1/AUTH_ dcaab5e32b234d5 6b626f72581e364 4c/zuul_ opendev_ logs_f76/ 825521/ 3/gate/ neutron- functional- with-uwsgi/ f76cef6/ controller/ logs/dsvm- functional- logs/neutron. tests.functiona l.plugins. ml2.drivers. ovn.mech_ driver. ovsdb.test_ ovsdb_monitor. TestNBDbMonitor OverTcp. test_floatingip _mac_bindings/) and good (https:/ /storage. gra.cloud. ovh.net/ v1/AUTH_ dcaab5e32b234d5 6b626f72581e364 4c/zuul_ opendev_ logs_f76/ 825521/ 3/gate/ neutron- functional- with-uwsgi/ f76cef6/ controller/ logs/dsvm- functional- logs/neutron. tests.functiona l.plugins. ml2.drivers. ovn.mech_ driver. ovsdb.test_ ovsdb_monitor. TestNBDbMonitor .test_floatingi p_mac_bindings/) runs. Here's what I found:
- In the SB logs, there is difference like:
* in good run we have: 23T19:28: 26.309Z| 00140|jsonrpc| DBG|unix# 3: send notification, method="update2", params= ["a220fbde- 7c82-11ec- 9a03-4fce359daa f8",{"Datapath_ Binding" :{"2c98db7e- 84ad-4782- 9d87-24eafae170 0e":{"insert" :{"tunnel_ key":1, "external_ ids":[" map",[[ "logical- switch" ,"5b801987- 900b-4b79- aa3b-05c5fdc422 eb"],[" name"," neutron- 2d726047- 4990-446b- 9f1b-9dc6403adb 3e"],[" name2", "network1" ]]]}}}} ] 23T19:28: 26.330Z| 00141|poll_ loop|DBG| wakeup due to [POLLIN] on fd 17 (/tmp/tmp8aro35 _w/ovnsb_ db.sock< ->) at lib/stream-fd.c:157 23T19:28: 26.330Z| 00142|jsonrpc| DBG|unix# 3: received request, method="transact", params= ["OVN_Southboun d",{"uuid- name":" rowdebd1810_ 12e4_4ae8_ 9bcd_a18781f1b9 86","table" :"MAC_Binding" ,"op":" insert" ,"row": {"ip":" 100.0.0. 21","datapath" :["uuid" ,"2c98db7e- 84ad-4782- 9d87-24eafae170 0e"]}}] , id=1260 23T19:28: 26.330Z| 00143|jsonrpc| DBG|unix# 3: send reply, result= [{"uuid" :["uuid" ,"e514d19c- 7371-4d1b- 9958-25ffd612b6 8b"]}], id=1260 23T19:28: 26.330Z| 00144|jsonrpc| DBG|unix# 4: send notification, method="update3", params= [["monid" ,"OVN_Southboun d"],"00000000- 0000-0000- 0000-0000000000 00",{"MAC_ Binding" :{"e514d19c- 7371-4d1b- 9958-25ffd612b6 8b":{"insert" :{"ip": "100.0. 0.21"," datapath" :["uuid" ,"2c98db7e- 84ad-4782- 9d87-24eafae170 0e"]}}} }] 23T19:28: 26.460Z| 00145|poll_ loop|DBG| wakeup due to [POLLIN] on fd 18 (/tmp/tmp8aro35 _w/ovnsb_ db.sock< ->) at lib/stream-fd.c:157
2022-01-
2022-01-
2022-01-
2022-01-
2022-01-
2022-01-
but in failed run there is only:
2022-01- 23T19:28: 46.868Z| 00156|jsonrpc| DBG|tcp: 127.0.0. 1:33296: send notification, method="update2", params= ["ae7bd9c6- 7c82-11ec- 9a03-4fce359daa f8",{"Connectio n":{"83c29962- a6e2-4f3c- 9fc7-888cf46540 ed":{"modify" :{"is_connected ":true, "status" :["map" ,[["bound_ port"," 42597"] ,["n_connection s","2"] ,["sec_ since_connect" ,"0"],[ "sec_since_ disconnect" ,"0"]]] }}},"Datapath_ Binding" :{"15003f80- 2b91-44d6- 8b9d-f8d217285f b3":{"insert" :{"tunnel_ key":1, "external_ ids":[" map",[[ "logical- switch" ,"70caa880- 3fed-4f9b- 9558-f624c0d2c4 69"],[" name"," neutron- 51ac15e4- 24cc-4348- 85c3-fe3a51cfca ec"],[" name2", "network1" ]]]}}}} ] 23T19:28: 49.369Z| 00157|poll_ loop|DBG| wakeup due to 2500-ms timeout at ovsdb/ovsdb- server. c:296 (0% CPU usage) 23T19:28: 51.872Z| 00158|poll_ loop|DBG| wakeup due to 2503-ms timeout at ovsdb/ovsdb- server. c:296 (0% CPU usage) 23T19:28: 54.373Z| 00159|poll_ loop|DBG| wakeup due to 2501-ms timeout at ovsdb/ovsdb- server. c:296 (0% CPU usage)
2022-01-
2022-01-
2022-01-
and nothing else.
In the NB log, there is:
* good run:
2022-01- 23T19:28: 26.457Z| 00144|jsonrpc| DBG|unix# 5: send notification, method="update2", params= ["a220fbdc- 7c82-11ec- 9a03-4fce359daa f8",{"Logical_ Switch" :{"16b0beb1- 7c31-40dd- 8c76-1ece7d81d2 0b":{"insert" :{"name" :"neutron- c314b414- 279d-4457- a8d2-eeab1d2d78 ad","external_ ids":[" map",[[ "neutron: availability_ zone_hints" ,""],[" neutron: mtu","1442" ],["neutron: network_ name"," net1"], ["neutron: revision_ number" ,"1"]]] ,"other_ config" :["map" ,[["mcast_ flood_unregiste red","false" ],["mcast_ snoop", "false" ],["vlan- passthru" ,"false" ]]]}}}} ] 23T19:28: 26.457Z| 00145|jsonrpc| DBG|unix# 5: send reply, result= [{"uuid" :["uuid" ,"16b0beb1- 7c31-40dd- 8c76-1ece7d81d2 0b"]}], id=1261 23T19:28: 26.457Z| 00146|jsonrpc| DBG|unix# 6: send notification, method="update3", params= [["monid" ,"OVN_Northboun d"],"00000000- 0000-0000- 0000-0000000000 00",{"Logical_ Switch" :{"16b0beb1- 7c31-40dd- 8c76-1ece7d81d2 0b":{"insert" :{"name" :"neutron- c314b414- 279d-4457- a8d2-eeab1d2d78 ad","external_ ids":[" map",[[ "neutron: availability_ zone_hints" ,""],[" neutron: mtu","1442" ],["neutron: network_ name"," net1"], ["neutron: revision_ number" ,"1"]]] ,"other_ config" :["map" ,[["mcast_ flood_unregiste red","false" ],["mcast_ snoop", "false" ],["vlan- passthru" ,"false" ]]]}}}} ] 23T19:28: 26.769Z| 00147|poll_ loop|DBG| wakeup due to [POLLIN] on fd 17 (/tmp/tmp8aro35 _w/ovnnb_ db.sock< ->) at lib/stream-fd.c:157 (0% CPU usage) 23T19:28: 26.769Z| 00148|jsonrpc| DBG|unix# 5: received request, method="transact", params= ["OVN_Northboun d",{"uuid- name":" row157d7b26_ ebea_47d7_ 8917_161d2f11a2 46","table" :"DHCP_ Options" ,"op":" insert" ,"row": {"options" :["map" ,[["dns_ server" ,"{10.10. 10.10}" ],["lease_ time"," 43200"] ,["mtu" ,"1442" ],["router" ,"20.0. 0.1"],[ "server_ id","20. 0.0.1"] ,["server_ mac","12: 34:56:78: 31:3f"] ]],"cidr" :"20.0. 0.0/24" ,"external_ ids":[" map",[[ "neutron: revision_ number" ,"0"],[ "subnet_ id","2cf02242- c45f-4ccd- 8219-7268a94379 e7"]]]} }], id=1262 23T19:28: 26.769Z| 00149|jsonrpc| DBG|unix# 5: send notification, method="update2", params= ["a220fbdc- 7c82-11ec- 9a03-4fce359daa f8",{"DHCP_ Options" :{"76a8f045- 8669-46d5- a50b-cd391544ef be":{"insert" :{"options" :["map" ,[["dns_ server" ,"{10.10. 10.10}" ],["lease_ time"," 43200"] ,["mtu" ,"1442" ],["router" ,"20.0. 0.1"],[ "server_ id","20. 0.0.1"] ,["server_ mac","12: 34:56:78: 31:3f"] ]],"cidr" :"20.0. 0.0/24" ,"external_ ids":[" map",[[ "neutron: revision_ number" ,"0"],[ "subnet_ id","2cf02242- c45f-4ccd- 8219-7268a94379 e7"]]]} }}}] 23T19:28: 26.769Z| 00150|jsonrpc| DBG|unix# 5: send reply, result= [{"uuid" :["uuid" ,"76a8f045- 8669-46d5- a50b-cd391544ef be"]}], id=1262 23T19:28: 26.770Z| 00151|jsonrpc| DBG|unix# 6: send notification, method="update3", params= [["monid" ,"OVN_Northboun d"],"00000000- 0000-0000- 0000-0000000000 00",{"DHCP_ Options" :{"76a8f045- 8669-46d5- a50b-cd391544ef be":{"insert" :{"options" :["map" ,[["dns_ server" ,"{10.10. 10.10}" ],["lease_ time"," 43200"] ,["mtu" ,"1442" ],["router" ,"20.0. 0.1"],[ "server_ id","20. 0.0.1"] ,["server_ mac","12: 34:56:78: 31:3f"] ]],"cidr" :"20.0. 0.0/24" ,"external_ ids":[" map",[[ "neutron: revision_ number" ,"0"],[ "subnet_ id","2cf02242- c45f-4ccd- 8219-7268a94379 e7"]]]} }}}] 23T19:28: 27.314Z| 00152|poll_ loop|DBG| wakeup due to [POLLIN] on fd 17 (/tmp/tmp8aro35 _w/ovnnb_ db.sock< ->) at lib/stream-fd.c:157 (0% CPU usage)
2022-01-
2022-01-
2022-01-
2022-01-
2022-01-
2022-01-
2022-01-
2022-01-
* but in failed run there is only:
2022-01- 23T19:28: 46.862Z| 00169|jsonrpc| DBG|tcp: 127.0.0. 1:60220: send notification, method="update3", params= [["monid" ,"OVN_Northboun d"],"00000000- 0000-0000- 0000-0000000000 00",{"Logical_ Switch" :{"70caa880- 3fed-4f9b- 9558-f624c0d2c4 69":{"insert" :{"name" :"neutron- 51ac15e4- 24cc-4348- 85c3-fe3a51cfca ec","external_ ids":[" map",[[ "neutron: availability_ zone_hints" ,""],[" neutron: mtu","1442" ],["neutron: network_ name"," network1" ],["neutron: revision_ number" ,"1"]]] ,"other_ config" :["map" ,[["mcast_ flood_unregiste red","false" ],["mcast_ snoop", "false" ],["vlan- passthru" ,"false" ]]]}}}} ] 23T19:28: 49.128Z| 00170|poll_ loop|DBG| wakeup due to 2266-ms timeout at ovsdb/ovsdb- server. c:296 (0% CPU usage) 23T19:28: 51.629Z| 00171|poll_ loop|DBG| wakeup due to 2501-ms timeout at ovsdb/ovsdb- server. c:296 (0% CPU usage) 23T19:28: 54.130Z| 00172|poll_ loop|DBG| wakeup due to 2501-ms timeout at ovsdb/ovsdb- server. c:296
2022-01-
2022-01-
2022-01-
From other things which I noticed, in the failed test testrun log, there is error like:
2022-01-23 19:28:46.875 54853 ERROR ovsdbapp. backend. ovs_idl. vlog [-] tcp:127. 0.0.1:42597: error parsing update: ovsdb error: Modify non-existing row: ovs.db.error.Error: ovsdb error: Modify non-existing row
Just after network was created. In the good runs, similar error is also sometimes visible but it's later during the test.
If that could be root cause, then it seems like some kind of race condition but I'm not sure about that really.