Tempest silently hides crash due to OOM
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Rally | New | Undecided | Unassigned |
Bug Description
I was running the full Tempest suite via Rally. It looked like the suite passed successfully:
[...]
2020-07-09 10:48:32.672 16426 INFO default [-] {0} tempest.
2020-07-09 10:49:22.303 16426 INFO rally.task.context [-] Verification a73cc1fa-
2020-07-09 10:49:22.303 16426 INFO rally.task.context [-] Verification a73cc1fa-
2020-07-09 10:49:22.304 16426 INFO rally.task.context [-] Verification a73cc1fa-
2020-07-09 10:49:32.420 16426 INFO rally.task.context [-] Verification a73cc1fa-
2020-07-09 10:49:32.467 16426 INFO rally.api [-] Verification (UUID=6160b3bb-
29e-b92c-
======
Totals
======
Ran: 1558 tests in 3693.127 sec.
- Success: 579
- Skipped: 138
- Expected failures: 0
- Unexpected success: 0
- Failures: 0
Using verification (UUID=6160b3bb-
The HTML report matches these numbers: 579 tests in "success" status and 138 in "skipped" status.
However, if you look more closely at the test numbers, they don't add up at all: 1558 - 579 - 138 = 841, so 841 tests are missing! With `rally verify show 6160b3bb-
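A consistency check like the one done by hand above could be automated: every test that "Ran" should land in exactly one status bucket. The sketch below is hypothetical (the function `missing_tests` is not part of Rally or Tempest) and assumes the "Totals" block has exactly the layout shown in the log excerpt above.

```python
import re

# Hypothetical helper, not Rally's API: count executed tests that have no
# recorded status in a "Totals" block formatted like the excerpt above.
def missing_tests(totals_text: str) -> int:
    ran_match = re.search(r"Ran:\s*(\d+)\s+tests", totals_text)
    if ran_match is None:
        raise ValueError("no 'Ran: N tests' line found")
    ran = int(ran_match.group(1))
    # Sum every "- <status>: <count>" line (Success, Skipped, Failures, ...).
    accounted = sum(
        int(n) for n in re.findall(r"^- [^:]+:\s*(\d+)\s*$", totals_text, re.MULTILINE)
    )
    return ran - accounted

TOTALS = """\
Ran: 1558 tests in 3693.127 sec.
- Success: 579
- Skipped: 138
- Expected failures: 0
- Unexpected success: 0
- Failures: 0
"""

if __name__ == "__main__":
    gap = missing_tests(TOTALS)
    if gap:
        print(f"{gap} tests have no recorded result")  # 841 tests have no recorded result
```

Treating a non-zero gap as a failure would have turned this silent truncation into a visible error, regardless of what caused the run to stop.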
I found the reason the tests stopped running at that point: the system ran out of memory and the kernel's OOM killer killed a Python process:
[Thu Jul 9 10:49:20 2020] Out of memory: Killed process 16431 (python) total-vm:2074092kB, anon-rss:1509788kB, file-rss:0kB, shmem-rss:0kB, UID:1000
[Thu Jul 9 10:49:20 2020] oom_reaper: reaped process 16431 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
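To correlate such kernel messages with a test run, a runner could extract the killed PID from lines like the first one above and compare it with the PIDs of its own workers. This is a minimal sketch assuming the message format in this excerpt; other kernel versions may word it differently, and `parse_oom_line` and `OomKill` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class OomKill(NamedTuple):
    pid: int
    name: str
    anon_rss_kb: int

# Matches the "Out of memory: Killed process <pid> (<name>) ... anon-rss:<n>kB"
# format seen in the excerpt above (an assumption; formats vary by kernel).
_OOM_RE = re.compile(r"Killed process (\d+) \(([^)]*)\).*?anon-rss:(\d+)kB")

def parse_oom_line(line: str) -> Optional[OomKill]:
    m = _OOM_RE.search(line)
    if m is None:
        return None
    return OomKill(int(m.group(1)), m.group(2), int(m.group(3)))

if __name__ == "__main__":
    line = ("[Thu Jul 9 10:49:20 2020] Out of memory: Killed process 16431 (python) "
            "total-vm:2074092kB, anon-rss:1509788kB, file-rss:0kB, shmem-rss:0kB, UID:1000")
    kill = parse_oom_line(line)
    worker_pids = {16431}  # hypothetical: PIDs the runner spawned
    if kill is not None and kill.pid in worker_pids:
        print(f"worker {kill.pid} ({kill.name}) was OOM-killed at {kill.anon_rss_kb} kB RSS")
```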
Unfortunately, Tempest hides this failure and makes it look as if the test suite completed successfully.
It looks like the framework running Tempest (Rally) should have handled the out-of-memory error. I'd say that the whole Tempest process (not just a single Tempest test) was killed, which is why Tempest didn't show any traceback: it couldn't, because it was killed before it had the chance.
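On POSIX systems, a parent that launches a child with Python's `subprocess` sees a negative return code `-N` when the child is terminated by signal `N`; the OOM killer uses SIGKILL, so the code would be `-9`. The sketch below assumes the runner launches the Tempest worker as a child process; it is not Rally's actual code, and `run_worker` is a hypothetical name.

```python
import signal
import subprocess
import sys
from typing import Sequence

def run_worker(cmd: Sequence[str]) -> int:
    """Run a test worker; raise if a signal (e.g. SIGKILL from the OOM killer) ended it."""
    proc = subprocess.run(list(cmd))
    if proc.returncode < 0:
        # POSIX only: a negative return code -N means the child died from signal N.
        sig = signal.Signals(-proc.returncode)
        raise RuntimeError(f"worker killed by {sig.name}; results are incomplete")
    return proc.returncode

if __name__ == "__main__":
    # Simulate an OOM kill: the child sends SIGKILL to itself.
    try:
        run_worker([sys.executable, "-c",
                    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"])
    except RuntimeError as exc:
        print(exc)  # worker killed by SIGKILL; results are incomplete
```

Raising here, instead of reporting whatever partial results were collected, is the behavior this report asks for: a killed worker should make the verification fail rather than look like a pass.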
I'm going to change the project to Rally.