UnresponsiveClient.does_not_hang_server hangs server

Bug #1586382 reported by Andreas Pokorny
This bug affects 3 people
Affects        Status    Importance  Assigned to  Milestone
Mir            Triaged   High        Unassigned
mir (Ubuntu)   Triaged   High        Unassigned

Bug Description

10:35:12 [ RUN ] ServerWithoutActiveOutputs.creates_valid_client_surface
10:35:12 [ OK ] ServerWithoutActiveOutputs.creates_valid_client_surface (51 ms)
10:35:12 [----------] 1 test from ServerWithoutActiveOutputs (51 ms total)
10:35:12
10:35:12 [----------] 2 tests from ServerStartup
10:35:12 [ RUN ] ServerStartup.creates_endpoint_on_filesystem
10:35:12 [ OK ] ServerStartup.creates_endpoint_on_filesystem (63 ms)
10:35:12 [ RUN ] ServerStartup.after_server_sigkilled_can_start_new_instance
10:35:13 [ OK ] ServerStartup.after_server_sigkilled_can_start_new_instance (110 ms)
10:35:13 [----------] 2 tests from ServerStartup (173 ms total)
10:35:13
10:35:13 [----------] 1 test from ServerStartupReliability
10:35:13 [ RUN ] ServerStartupReliability.can_start_with_low_entropy
10:35:13 [ OK ] ServerStartupReliability.can_start_with_low_entropy (38 ms)
10:35:13 [----------] 1 test from ServerStartupReliability (38 ms total)
10:35:13
10:35:13 [----------] 3 tests from DebugAPI
10:35:13 [ RUN ] DebugAPI.translates_surface_coordinates_to_screen_coordinates
10:35:13 [ OK ] DebugAPI.translates_surface_coordinates_to_screen_coordinates (56 ms)
10:35:13 [ RUN ] DebugAPI.is_unavailable_when_server_not_started_with_debug
10:35:13 [ OK ] DebugAPI.is_unavailable_when_server_not_started_with_debug (52 ms)
10:35:13 [ RUN ] DebugAPI.is_overrideable
10:35:13 [ OK ] DebugAPI.is_overrideable (53 ms)
10:35:13 [----------] 3 tests from DebugAPI (164 ms total)
10:35:13
10:35:13 [----------] 1 test from UnresponsiveClient
10:35:13 [ RUN ] UnresponsiveClient.does_not_hang_server
10:47:00 Build timed out (after 30 minutes). Marking the build as aborted.
10:47:00 Build was aborted
10:47:00 Archiving artifacts
10:47:00 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
10:47:00 Finished: ABORTED

This has happened at least twice this week on krillin.

Tags: testsfail

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Only happens in this branch AFAIK:
https://code.launchpad.net/~kdub/mir/fix-1577967/+merge/294283

So possibly not a bug in lp:mir

Changed in mir:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Mir because there has been no activity for 60 days.]

Changed in mir:
status: Incomplete → Expired
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/device_type=krillin/1397/consoleFull

11:52:24 [ RUN ] UnresponsiveClient.does_not_hang_server
11:52:24 Detected attempt to close a bad file-descriptor.
11:52:24 This usually indicates a double-close bug.
11:52:24 The bad file descriptor was: 31
12:07:39 Build timed out (after 35 minutes). Marking the build as aborted.
12:07:39 Build was aborted
12:07:39 Archiving artifacts
12:07:39 Terminated
12:07:40 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
12:07:40 Finished: ABORTED
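
For context, the "bad file-descriptor" message in the log is what a double close looks like at the system-call level. The following is a minimal sketch, not Mir code: `errno_of_second_close` is a hypothetical helper that closes the same descriptor twice and reports the error from the second call, which is `EBADF` in a single-threaded program.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical helper for illustration only; not part of Mir.
// Closes one end of a pipe twice and returns the errno set by the second
// close(), or 0 if the second close unexpectedly succeeded, or -1 if the
// pipe could not be created.
int errno_of_second_close()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    close(fds[1]);                 // write end is not needed here
    close(fds[0]);                 // first close: succeeds

    errno = 0;
    int const rc = close(fds[0]);  // second close: descriptor is no longer open
    return rc == -1 ? errno : 0;
}
```

In a multi-threaded process the second close is more dangerous than an error return: another thread may have been handed the same descriptor number in the meantime, in which case the stray close silently closes that thread's descriptor instead.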

Changed in mir:
status: Expired → Confirmed
importance: Undecided → Medium
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Also yesterday (from bug 1615512):

06:27:29 [ RUN ] UnresponsiveClient.does_not_hang_server
06:27:29 Detected attempt to close a bad file-descriptor.
06:27:29 This usually indicates a double-close bug.
06:27:29 The bad file descriptor was: 31
06:42:50 Build timed out (after 35 minutes). Marking the build as aborted.
06:42:50 Build was aborted

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/1392/device_type=krillin/consoleFull

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

This is blocking CI, raising priority.

Changed in mir:
importance: Medium → Critical
tags: added: ci-blocker
Changed in mir:
assignee: nobody → Alexandros Frantzis (afrantzis)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Yeah, confirmed: the only way to get a green light from CI right now is to delete the UnresponsiveClient tests:

https://code.launchpad.net/~vanvugt/mir/remove-UnresponsiveClient-test/+merge/303889

Changed in mir:
milestone: none → 0.25.0
Revision history for this message
Chris Halse Rogers (raof) wrote :

Has anyone been able to reproduce this outside of CI? I tried, and fixed what *I* hit, but I couldn't hit this problem.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I *think* so yes. For the past few days 'make test' has hung indefinitely on acceptance tests on my desktop.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm assuming it's the same issue...

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Correction: hung indefinitely on more than 50% of attempts on acceptance tests on my desktop.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Workaround landed so this isn't a CI blocker any more.

Sadly bug 1616291 is still blocking autolandings.

tags: removed: ci-blocker
Changed in mir:
importance: Critical → High
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

> Has anyone been able to reproduce this outside of CI? I tried, and fixed what *I* hit,
> but I couldn't hit this problem.

I am able to very easily reproduce this with --gtest_repeat=N locally, and I have found the core of the problem. Unfortunately, this bug is non-trivial to fix since it requires refactoring of fd ownership on the client side (which I have already started working on).
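
The comment above does not describe the actual refactoring, so the following is only a sketch of what single ownership of a file descriptor commonly looks like in C++. `OwnedFd` is a hypothetical class invented for this illustration and is not Mir's implementation. The idea is that exactly one object is responsible for closing a descriptor, and moving the object transfers that responsibility, so a double close cannot happen through this type.

```cpp
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner of a file descriptor; not Mir's actual class.
class OwnedFd
{
public:
    explicit OwnedFd(int fd = -1) noexcept : fd_{fd} {}

    // Copying is disabled so two objects can never own the same descriptor.
    OwnedFd(OwnedFd const&) = delete;
    OwnedFd& operator=(OwnedFd const&) = delete;

    // Moving transfers ownership and leaves the source empty (-1).
    OwnedFd(OwnedFd&& other) noexcept : fd_{std::exchange(other.fd_, -1)} {}
    OwnedFd& operator=(OwnedFd&& other) noexcept
    {
        if (this != &other)
        {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }

    ~OwnedFd() { reset(); }

    int get() const noexcept { return fd_; }
    bool valid() const noexcept { return fd_ >= 0; }

    // Closes the descriptor (if any) and marks this owner empty, so calling
    // reset() again, or running the destructor afterwards, does nothing.
    void reset() noexcept
    {
        if (fd_ >= 0)
            ::close(fd_);
        fd_ = -1;
    }

private:
    int fd_;
};
```

With a type like this, code that only needs to use a descriptor takes the raw `int` from `get()`, while code that must close it takes the `OwnedFd` by value, which makes the ownership transfer visible at the call site.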

Changed in mir:
status: Confirmed → In Progress
Changed in mir:
milestone: 0.25.0 → 0.26.0
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Probably not in progress(?)

The offending test was disabled back in August because it was failing too often, and nobody has been able to fix it yet.

Changed in mir:
milestone: 0.26.0 → none
status: In Progress → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Note to self and others: This is still the second-hottest CI failure despite the test having been disabled to work around it.

Changed in mir:
assignee: Alexandros Frantzis (afrantzis) → nobody
Revision history for this message
Michał Sawicz (saviq) wrote :

Syncing task from Mir.

Changed in mir (Ubuntu):
importance: Undecided → High
status: New → Triaged