github.com/juju/juju/state go test timeout

Bug #1625768 reported by Curtis Hovey
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Medium
Andrew Wilkins
juju-core
Won't Fix
Undecided
Unassigned
1.25
Won't Fix
Undecided
Unassigned

Bug Description

As seen at
    http://reports.vapour.ws/releases/issue/562a396a749a561a0619d324
there is a spike in timeouts within the unit tests. We expect to see 1 every other day. we now see 10+ a day!

The time out is seen in xenial across several architectures. The go test is using a 20 minute timeout except for race tests which has a 40 minute timeout. There appears to be a lock being held in github.com/juju/juju/state.

An example looks like

goroutine 1 [chan receive, 20 minutes]:
testing.RunTests(0x38b4d68, 0x4a3ede0, 0x1, 0x1, 0x2143547698badc01)
  /usr/lib/go-1.6/src/testing/testing.go:583 +0x8d2
testing.(*M).Run(0xc82002cf08, 0xc820076280)
  /usr/lib/go-1.6/src/testing/testing.go:515 +0x81
main.main()
  github.com/juju/juju/state/_test/_testmain.go:56 +0x117

goroutine 17 [syscall, 20 minutes, locked to thread]:
runtime.goexit()
  /usr/lib/go-1.6/src/runtime/asm_amd64.s:1998 +0x1

...

goroutine 13 [syscall, 20 minutes]:
os/signal.signal_recv(0x6)
  /usr/lib/go-1.6/src/runtime/sigqueue.go:116 +0x132
os/signal.loop()
  /usr/lib/go-1.6/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
  /usr/lib/go-1.6/src/os/signal/signal_unix.go:28 +0x37

..

We have a sample of recent failures
    http://reports.vapour.ws/releases/4409/job/run-unit-tests-race/attempt/1889
    http://reports.vapour.ws/releases/4408/job/run-unit-tests-xenial-amd64/attempt/1342
    http://reports.vapour.ws/releases/4410/job/run-unit-tests-xenial-arm64/attempt/1353

Curtis Hovey (sinzui)
description: updated
Tim Penhey (thumper)
Changed in juju:
assignee: nobody → Tim Penhey (thumper)
Changed in juju:
assignee: Tim Penhey (thumper) → Alexis Bruemmer (alexis-bruemmer)
Tim Penhey (thumper)
Changed in juju:
assignee: Alexis Bruemmer (alexis-bruemmer) → Tim Penhey (thumper)
Revision history for this message
Tim Penhey (thumper) wrote :

https://github.com/juju/juju/pull/6296

I've addressed the slowest three state tests. Given how incredibly slow they were, I'm hoping this addresses this issue.

Changed in juju:
status: Triaged → In Progress
Revision history for this message
Tim Penhey (thumper) wrote :

I don't think we'll ever get a full solution to this until we replace the DB, but seems good enough for now.

Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Curtis Hovey (sinzui) wrote :

We are still seeing failures often enough to prevent a bless.

Changed in juju:
status: Fix Committed → In Progress
Revision history for this message
Christian Muirhead (2-xtian) wrote :

Reed and I are experimenting with removing ssl from the connection to mongo for tests - Dave Cheney mentioned that SSL handshaking made up a lot of the test runtime.

Revision history for this message
Christian Muirhead (2-xtian) wrote :

(For state tests at least.)
We don't know how much of an improvement it's likely to make (if any) yet.

Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-rc2 → none
Changed in juju:
milestone: none → 2.0.0
Changed in juju:
status: In Progress → Triaged
importance: Critical → High
assignee: Tim Penhey (thumper) → Alexis Bruemmer (alexis-bruemmer)
milestone: 2.0.0 → 2.1.0
Changed in juju-core:
status: New → Won't Fix
Revision history for this message
Tim Penhey (thumper) wrote :

Mostly this is cert generation for the arm64, which is very slow.

Changed in juju:
status: Triaged → In Progress
assignee: Alexis Bruemmer (alexis-bruemmer) → Tim Penhey (thumper)
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.1.0 → 2.2.0-alpha1
Revision history for this message
Anastasia (anastasia-macmood) wrote :

According to the latest report, the issue is still there on both 2.1 and develop despite cert generation improvements.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

I ran the state tests with -cpuprofile, and it pointed to debugf (among other things) in the mgo package. Some time ago there was a change to github.com/juju/testing.MgoSuite to call mgo.SetLogger and mgo.SetDebug, presumably to simplify isolating mgo-related test failures. I disabled them, and the time taken to run the package tests dropped by ~100s.

PR to disable them by default is here: https://github.com/juju/testing/pull/120

Revision history for this message
Anastasia (anastasia-macmood) wrote :
Changed in juju:
status: In Progress → Fix Committed
assignee: Tim Penhey (thumper) → Andrew Wilkins (axwalk)
milestone: 2.2.0-alpha1 → 2.1.0
Revision history for this message
Aaron Bentley (abentley) wrote :

We are still seeing this, on develop and 2.1 as of today.

Changed in juju:
status: Fix Committed → Triaged
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Removing 2.1 milestone as we will not be addressing this issue in 2.1.

Also reducing it's Importance as a test related bug.

Changed in juju:
milestone: 2.1-rc2 → none
importance: High → Medium
Revision history for this message
Andrew Wilkins (axwalk) wrote :

OK, so it looks like there is a bug in the SingularSuite.TestExpire test. I've just ran it in a loop locally and experienced a similar timeout.

Revision history for this message
Andrew Wilkins (axwalk) wrote :
Andrew Wilkins (axwalk)
Changed in juju:
milestone: none → 2.1-rc2
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.