periodic featureset 35 wallaby times out running tempest (2 hours)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
tripleo | Fix Released | Critical | Unassigned | |
Bug Description
At [1][2][3][4] the periodic-
2021-08-05 00:28:05.041522 | primary | TASK [os_tempest : Execute tempest tests] *******
2021-08-05 00:28:05.041528 | primary | Thursday 05 August 2021 00:28:05 +0000 (0:00:00.048) 1:50:45.136 *******
2021-08-05 02:22:27.036537 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.
Can't quickly see anything useful in the tempest run logs [5], and tempestconf looks to have completed OK [6].
[1] https:/
[2] https:/
[3] https:/
[4] https:/
[5] https:/
[6] https:/
Marios Andreou (marios-b) wrote (last edit ): | #1 |
Marios Andreou (marios-b) wrote : | #2 |
I can't see any major difference in the nodes between a good log [1] and a timed-out one [2], except that the timed-out one has less free memory (but the same total):
[1] * MemTotal: 8150828 kB
MemFree: 1240728 kB
[2] * MemTotal: 8150828 kB
MemFree: 293704 kB
Similarly, the cpuinfo log looks the same: good at [3], bad at [4].
In the errors log I see an issue reaching rabbit on controller-1, with retries; I don't know if that is directly related:
2021-08-05 18:18:42.101 ERROR /var/log/
[1] https:/
[2] https:/
[3] https:/
[4] https:/
[5] https:/
Marios Andreou (marios-b) wrote : | #3 |
It also seems to be inconsistent :/
Among the TIMED_OUT results we also have a couple of successes from Saturday 7th:
* https:/
* 3 hrs 49 mins 3 secs 2021-08-07 22:11:54 SUCCESS
* 3 hrs 48 mins 3 secs 2021-08-07 16:36:46 SUCCESS
But they are taking close to 4 hours, so pretty close to the timeout, which is 4 hours (inherited from https:/
So why is it taking almost 2 hours to run tempest? That seems excessive;
it used to take closer to 1 hour.
Marios Andreou (marios-b) wrote : | #4 |
Based on comment #1 and the attached screenshot, this started around 3rd August. I compared two 'good runs': one from 2/3 August that took close to 3 hours [1], and a recent one from yesterday, 9th August [2].
From [1]
3 hrs 9 mins 1 sec 2021-08-02 22:22:27
Ran: 1416 tests in 3475.6904 sec.
- Passed: 1295
- Skipped: 121
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 0
Sum of execute time for each test: 8074.3680 sec.
- Worker 0 (428 tests) => 0:57:48.876694
- Worker 1 (334 tests) => 0:50:33.366087
- Worker 2 (367 tests) => 0:39:59.810439
- Worker 3 (287 tests) => 0:49:15.021685
From [2]
Ran: 1416 tests in 7587.3411 sec.
- Passed: 1295
- Skipped: 121
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 0
Sum of execute time for each test: 16297.8125 sec.
- Worker 0 (362 tests) => 1:59:08.312166
- Worker 1 (356 tests) => 1:08:44.041797
- Worker 2 (426 tests) => 2:06:16.196664
- Worker 3 (272 tests) => 1:21:51.781254
As can be seen in [2], the same tests take about twice as long to complete. You can see more about the timings in the stackviz logs [3] ('good' ~1 hour tempest run) and [4] ('bad' ~2 hours tempest run).
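To put a number on "twice as long", the two tempest summaries quoted in this comment can be parsed and compared. This is only a sketch: the `parse_summary` helper and the `GOOD`/`BAD` names are made up for illustration, and the regular expressions assume the summary layout shown above.

```python
import re
from datetime import timedelta

# Summary text copied from the two runs quoted in this comment.
GOOD = """Ran: 1416 tests in 3475.6904 sec.
Sum of execute time for each test: 8074.3680 sec.
- Worker 0 (428 tests) => 0:57:48.876694
- Worker 1 (334 tests) => 0:50:33.366087
- Worker 2 (367 tests) => 0:39:59.810439
- Worker 3 (287 tests) => 0:49:15.021685"""

BAD = """Ran: 1416 tests in 7587.3411 sec.
Sum of execute time for each test: 16297.8125 sec.
- Worker 0 (362 tests) => 1:59:08.312166
- Worker 1 (356 tests) => 1:08:44.041797
- Worker 2 (426 tests) => 2:06:16.196664
- Worker 3 (272 tests) => 1:21:51.781254"""


def parse_summary(text):
    """Extract test count, wall time, summed test time and per-worker time."""
    ran = re.search(r"Ran: (\d+) tests in ([\d.]+) sec", text)
    total = re.search(r"Sum of execute time for each test: ([\d.]+) sec", text)
    workers = {}
    for m in re.finditer(r"Worker (\d+) \(\d+ tests\) => (\d+):(\d+):([\d.]+)", text):
        wid, hours, minutes, seconds = m.groups()
        workers[int(wid)] = timedelta(
            hours=int(hours), minutes=int(minutes), seconds=float(seconds)
        )
    return {
        "tests": int(ran.group(1)),
        "wall": float(ran.group(2)),
        "sum": float(total.group(1)),
        "workers": workers,
    }


good = parse_summary(GOOD)
bad = parse_summary(BAD)

print(f"wall clock ratio:     {bad['wall'] / good['wall']:.2f}x")
print(f"summed test time:     {bad['sum'] / good['sum']:.2f}x")
print(f"slowest worker, good: {max(good['workers'].values())}")
print(f"slowest worker, bad:  {max(bad['workers'].values())}")
```

Both runs execute the same 1416 tests, so the ratios reflect slower tests rather than a larger test list.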
Sagi (Sergey) Shnaidman (sshnaidm) wrote : | #5 |
Ronelle Landy (rlandy) wrote : | #6 |
From https:/
Also noticed a difference in the openvswitch versions from August 3rd:
network-
openvswitch-
openvswitch2.
-------
network-
openvswitch-
openvswitch2.
https:/
matching that new build.
Maybe we downgrade openvswitch and see if we do better?
chandan kumar (chkumar246) wrote : | #7 |
Since openvswitch2.
So the OVS update might not be the culprit.
Martin Kopec (mkopec) wrote : | #8 |
Many tests just take longer, e.g.:
test_dhcp6_
116.8 seconds -> 206.8 seconds
test_dualnet_
134.2 seconds -> 245.8 seconds
test_dualnet_
161.2 seconds -> 254.8 seconds
It seems like all requests, especially GET ones, are taking much longer. Comparison of requests within test_dualnet_
$ cut -d" " -f12- good_r
200 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
201 POST https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
201 POST https://[2001:db8:
201 POST https://[2001:db8:
202 POST https://[2001:db8:
200 GET https://[2001:db8:
200 GET https://[2001:db8:
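The per-test timings quoted at the top of this comment can be turned into slowdown factors with a few lines of Python. This is a sketch only: the test names are left truncated exactly as they appear above, and `slowdowns` is a hypothetical helper name.

```python
# (name as shown above, seconds before, seconds after); names are truncated
# in the bug report and are deliberately not completed here.
timings = [
    ("test_dhcp6_", 116.8, 206.8),
    ("test_dualnet_", 134.2, 245.8),
    ("test_dualnet_", 161.2, 254.8),
]


def slowdowns(rows):
    """Return (name, after / before) for each timing row."""
    return [(name, after / before) for name, before, after in rows]


for name, factor in slowdowns(timings):
    print(f"{name:<15} {factor:.2f}x slower")
```

All three tests come out between roughly 1.5x and 1.9x slower, which is in line with the overall doubling of the tempest run time.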
yatin (yatinkarel) wrote : | #9 |
So it's not just wallaby, xena is also impacted. Since https:/
For example:-
PASSING JOB:-
$ grep -nr "GET /v2.1/os-
0.273092
0.259492
0.238887
$ grep -nr "DELETE /v2.1/os-
0.368750
0.063184
0.254873
0.391544
0.331399
0.244026
0.425756
0.453150
0.236778
0.270068
0.586434
0.037038
0.045581
0.648366
0.236613
0.221590
FAILING JOB:-
$ grep -nr "GET /v2.1/os-
3.625276
2.823670
4.931641
2.061295
$ grep -nr "DELETE /v2.1/os-
3.348766
2.595817
2.260266
2.017600
1.669031
3.165000
3.684208
2.686081
1.148234
2.810574
1.752639
2.078275
2.003472
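A quick way to summarize the numbers above is to compare the mean and median per method. The sketch below assumes the grep output is per-request durations in seconds (the grep commands are truncated here, so that is an assumption), and the `slowdown` helper is a made-up name.

```python
from statistics import mean, median

# Values copied from the grep output above (assumed to be seconds per request).
passing = {
    "GET": [0.273092, 0.259492, 0.238887],
    "DELETE": [
        0.368750, 0.063184, 0.254873, 0.391544, 0.331399, 0.244026,
        0.425756, 0.453150, 0.236778, 0.270068, 0.586434, 0.037038,
        0.045581, 0.648366, 0.236613, 0.221590,
    ],
}
failing = {
    "GET": [3.625276, 2.823670, 4.931641, 2.061295],
    "DELETE": [
        3.348766, 2.595817, 2.260266, 2.017600, 1.669031, 3.165000,
        3.684208, 2.686081, 1.148234, 2.810574, 1.752639, 2.078275,
        2.003472,
    ],
}


def slowdown(method):
    """Ratio of mean request time, failing job over passing job."""
    return mean(failing[method]) / mean(passing[method])


for method in passing:
    print(
        f"{method:<6} mean {mean(passing[method]):.3f}s -> {mean(failing[method]):.3f}s "
        f"({slowdown(method):.1f}x), "
        f"median {median(passing[method]):.3f}s -> {median(failing[method]):.3f}s"
    )
```

Note that even the fastest request in the failing job is slower than the slowest one in the passing job, so this looks like a constant per-request overhead rather than a few outliers.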
Ronelle Landy (rlandy) wrote : | #10 |
https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master) | #11 |
Fix proposed to branch: master
Review: https:/
Changed in tripleo: | |
status: | Triaged → In Progress |
Grzegorz Grasza (xek) wrote : | #12 |
I tested a different way of disabling FQDNs in memcache server list configuration here:
https:/
The first successful run is without any change and the second one switches to IPs without reverting the large patch.
The second run finished faster by 22 minutes.
Slawek Kaplonski (slaweq) wrote : | #13 |
I looked at logs from the job https:/
In nsswitch.conf file there is:
hosts: files dns myhostname
So name resolution should first be done using the /etc/hosts file, and in this file there are entries for controllers like overcloud-
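One way to check whether name resolution itself is slow on a node is to time the system resolver directly. The sketch below uses Python's `socket.getaddrinfo`, which on glibc systems goes through the same nsswitch.conf order (files, dns, myhostname). The `time_lookup` helper and the default port 11211 (the usual memcached port) are assumptions for illustration, not something taken from the job.

```python
import socket
import time


def time_lookup(host, port=11211, repeat=5):
    """Time repeated getaddrinfo() calls for host; returns seconds per call."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    # Replace "localhost" with a controller FQDN from /etc/hosts on the node;
    # an IP literal needs no lookup and gives the baseline.
    for target in ("localhost", "127.0.0.1"):
        samples = time_lookup(target)
        print(f"{target:<12} max {max(samples) * 1000:.3f} ms")
```

If the FQDN lookups are consistently as fast as the IP literal, resolution through /etc/hosts is probably not where the time goes, and the client library's handling of the resolved addresses would be the next thing to look at.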
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master) | #14 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 1ce490716d3ff0a
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200
Environment for switching to using IPs for memcached
Related-Bug: #1939023
Change-Id: Iaadee6be4e1eaf
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/wallaby) | #15 |
Related fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/victoria) | #16 |
Related fix proposed to branch: stable/victoria
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri) | #17 |
Related fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train) | #18 |
Related fix proposed to branch: stable/train
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/wallaby) | #19 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 2456e5930119030
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200
Environment for switching to using IPs for memcached
Related-Bug: #1939023
Change-Id: Iaadee6be4e1eaf
(cherry picked from commit 1ce490716d3ff0a
tags: | added: in-stable-wallaby |
tags: | added: in-stable-victoria |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/victoria) | #20 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/victoria
commit 4cbc970d15066aa
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200
Environment for switching to using IPs for memcached
Related-Bug: #1939023
Change-Id: Iaadee6be4e1eaf
(cherry picked from commit 1ce490716d3ff0a
Slawek Kaplonski (slaweq) wrote : | #21 |
Today I checked logs from the job https:/
I found that there are some tests which run for a very long time, e.g. tempest.
I compared this with the upstream (u/s) job, where the same test took about 18 seconds.
I then checked the tempest logs to see what took so long in that test, and here is what I found:
zgrep test_associate_
2021-09-07 12:11:47.017 321761 INFO tempest.
2021-09-07 12:11:47.019 321761 INFO tempest.
2021-09-07 12:11:53.102 321761 INFO tempest.
2021-09-07 12:12:01.016 321761 INFO tempest.
2021-09-07 12:12:05.347 321761 INFO tempest.
2021-09-07 12:12:09.281 321761 INFO tempest.lib...
Bogdan Dobrelya (bogdando) wrote : | #22 |
@Slawek, did your testing show different results from what was brought in https:/
Bogdan Dobrelya (bogdando) wrote : | #23 |
I can't see the extra env file to switch memcached to use IPs there https:/
could you please adjust the job and retry it with environments/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train) | #24 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/train
commit 3d637e176178e94
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200
Environment for switching to using IPs for memcached
Related-Bug: #1939023
Change-Id: Iaadee6be4e1eaf
(cherry picked from commit 1ce490716d3ff0a
tags: | added: in-stable-train |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/ussuri) | #25 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/ussuri
commit 750877f25e27b0f
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:20:06 2021 +0200
Environment for switching to using IPs for memcached
Related-Bug: #1939023
Change-Id: Iaadee6be4e1eaf
(cherry picked from commit 1ce490716d3ff0a
tags: | added: in-stable-ussuri |
yatin (yatinkarel) wrote : | #26 |
<< could you please adjust the job and retry it with environments/
https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master) | #27 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 4b2454350289fe2
Author: Grzegorz Grasza <email address hidden>
Date: Wed Aug 25 09:23:52 2021 +0200
Use IPs instead of FQDNs in memcached with IPv6
Change-Id: I34c6a4d9e64e13
Resolves-Bug: #1939023
Depends-On: https:/
Changed in tripleo: | |
status: | In Progress → Fix Released |
Attila Fazekas (afazekas) wrote : | #28 |
Probably you want to switch to a different memcached library:
https:/
The current one does not handle a name that resolves to an IPv6 address,
but works with ipv6: addresses when configured by IP.
Slawek Kaplonski (slaweq) wrote : | #29 |
I tried to reproduce the issue today but was not able to, so I could not investigate it. When I ran it on a test patch, tempest finished in about 4300 seconds:
======
Totals
======
Ran: 1425 tests in 4297.8985 sec.
- Passed: 1303
- Skipped: 121
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 1
Sum of execute time for each test: 8342.5592 sec.
I also checked builds history https:/
Next I compared time of the test execution in the fast (https:/
Lee Yarwood (lyarwood) wrote : | #30 |
As discussed downstream this appears to be the result of the environments/
Ultimately we either need to remove this environment *or* if that's not possible, increase timeouts for the individual test and overall test run.
Bogdan Dobrelya (bogdando) wrote : | #31 |
Related to this issue, we should switch the environments/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master) | #32 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master) | #33 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master) | #34 |
Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master) | #35 |
Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master) | #36 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master) | #37 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (stable/wallaby) | #38 |
Change abandoned by "Takashi Kajinami <email address hidden>" on branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master) | #39 |
Change abandoned by "chandan kumar <email address hidden>" on branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master) | #40 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master) | #41 |
Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https:/
Reason: I don't think this is needed, let's tweak on a fs/job basis
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master) | #42 |
Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master) | #43 |
Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master) | #44 |
Change abandoned by "Bogdan Dobrelya <email address hidden>" on branch: master
Review: https:/
As can be seen in the attached screenshot from [1], the successful runs on this job usually take closer to ~3 hours. The timeouts started on 3rd August.
We *are* running a lot of tempest tests here [2], but that list of tests has not been altered recently and used to complete well within the timeout.
Compared to a green run at [3], the tempest tests usually take ~1 hour to run:
* 2021-08-03 00:18:37.735157 | primary | TASK [os_tempest : Execute tempest tests] *******
2021-08-03 00:18:37.735168 | primary | Tuesday 03 August 2021 00:18:37 +0000 (0:00:00.041) 1:39:33.632 ********
2021-08-03 01:16:39.798815 | primary | ok: [undercloud]
but as per this bug they are now timing out after 2 hours.
[1] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby
[2] https://github.com/openstack/tripleo-quickstart/blob/444fcff6b17b77778382cd0be5a45f7b85a7b7ca/config/general_config/featureset035.yml#L175-L179
[3] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/d385cc6/job-output.txt
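The "~1 hour" and "2 hours" figures in this description can be checked from the job-output timestamps quoted in the bug (the start of the os_tempest task and the following line in each run). This is a minimal sketch; `elapsed` is a hypothetical helper and the timestamp format is assumed from the excerpts.

```python
from datetime import datetime

# Timestamp layout used at the start of each job-output line.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed(start, end):
    """Return the time between two job-output timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Green run: task start to "ok: [undercloud]".
green = elapsed("2021-08-03 00:18:37.735157", "2021-08-03 01:16:39.798815")
# Timed-out run: task start to RESULT_TIMED_OUT.
timed_out = elapsed("2021-08-05 00:28:05.041522", "2021-08-05 02:22:27.036537")

print("green run tempest task:    ", green)
print("timed-out run tempest task:", timed_out)
```

The green run spends just under an hour in the tempest task, while the timed-out run had already spent almost two hours in it when the job was killed.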