Summary of findings:
The gate-horizon-selenium job starts Xvfb on display :99, then uses selenium and firefox to run 2 horizon selenium tests, followed by 7 dashboard tests (117 unit tests are skipped). During one of these tests, selenium suddenly hangs, apparently in the get() call, and the job times out after 30 minutes.
When the job hangs, it is not always on the same test: sometimes it hangs on the first horizon test, sometimes on the last dashboard test, and sometimes on one in between.
Many executions of this job have succeeded, so the hang is sporadic.
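One way to confirm that the hang really is inside get(), and to fail within seconds instead of at the 30-minute job timeout, is to run the call under a watchdog. The sketch below is a hypothetical helper: `call_with_watchdog` and `CallHung` are not part of selenium or horizon. It assumes CPython, because it uses `sys._current_frames()` to print the stack of the stuck thread.

```python
import sys
import threading
import traceback


class CallHung(Exception):
    """Raised when a wrapped call does not return within the time limit."""


def call_with_watchdog(func, *args, timeout=60.0, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread and fail fast if it hangs.

    Returns func's result, re-raises its exception, or raises CallHung
    (after printing the hung thread's current stack) if it has not
    returned within `timeout` seconds. The hung thread cannot be killed;
    because it is a daemon thread, it will not keep the process alive.
    """
    outcome = {}

    def runner():
        try:
            outcome["result"] = func(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # Print where the worker is stuck (e.g. somewhere inside get()).
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            traceback.print_stack(frame, file=sys.stderr)
        name = getattr(func, "__name__", repr(func))
        raise CallHung(f"{name} did not return within {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["result"]
```

For example, `call_with_watchdog(driver.get, url, timeout=60)`, where `driver` is the WebDriver instance and `url` is the page being loaded. This only turns a silent hang into a fast failure with a stack trace. Selenium's own `set_page_load_timeout()` is a related setting, though whether it covers this particular hang is untested.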
I found the following instances of failures of this job:
- http://logs.openstack.org/03/92703/1/check/gate-horizon-selenium/dd448f1
- http://logs.openstack.org/05/91905/1/check/gate-horizon-selenium/be82d92
- http://logs.openstack.org/06/92006/1/check/gate-horizon-selenium/80bd2ba
- http://logs.openstack.org/16/90716/4/check/gate-horizon-selenium/f898b0a
- http://logs.openstack.org/42/93142/1/check/gate-horizon-selenium/682963d
- http://logs.openstack.org/58/92958/1/check/gate-horizon-selenium/d4a9170
- http://logs.openstack.org/58/92958/3/check/gate-horizon-selenium/a4761bd
The errors have occurred on several different build systems (jenkins01, 03, 05, and 07), so the problem is not isolated to a particular build system.
The jobs that ran previously on each affected slave succeeded, so the hang does not appear to be caused by a leftover resource.
There have been no cases of multiple instances of this job running on the same system at the same time, so it does not appear to be a resource contention issue.
I have not been able to reproduce the hanging behavior locally. I have attempted (unsuccessfully) to induce it by:
- Running multiple Xvfb instances
- Killing Xvfb before the tests start
- Killing Xvfb while the tests are running
- Using an invalid URL (by modifying the source)
In all cases, the tests failed immediately instead of hanging.
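Because the hang is sporadic, a local reproduction attempt probably needs many repetitions, with each run classified as passing, failing immediately, or hanging. The following is a minimal sketch, assuming the selenium tests can be launched as a single command. `classify_run` and `stress` are hypothetical helpers.

```python
import subprocess
from collections import Counter


def classify_run(cmd, timeout):
    """Run cmd once and return 'pass', 'fail', or 'hang'."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return "hang"
    return "pass" if completed.returncode == 0 else "fail"


def stress(cmd, runs, timeout):
    """Repeat cmd `runs` times and tally the outcomes."""
    return Counter(classify_run(cmd, timeout) for _ in range(runs))
```

For example, `stress(["./run_selenium_tests.sh"], runs=50, timeout=600)`, where the script name is a placeholder for whatever command launches the tests. On timeout, `subprocess.run` kills only the direct child, so a firefox process started by the tests may need separate cleanup.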
Google searches show that others have seen python+firefox+selenium hang sporadically on get(), but none with the exact combination of versions that we use. A version-specific bug of this kind is the most likely cause of the problem, but without a reproducible test case it is difficult to verify either that this is the cause or that upgrading versions will fix it.