worker/lease manager_expire_test.go:206: ExpireSuite.TestClaim_ExpiryInFuture is flaky

Bug #1815471 reported by John A Meinel
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Medium
John A Meinel
2.5
Fix Released
Medium
John A Meinel

Bug Description

Running under:
  go test -race -count=200 -failfast -test.timeout=20m -check.v

I got this test to fail:

FAIL: manager_expire_test.go:206: ExpireSuite.TestClaim_ExpiryInFuture

[LOG] 0:00.000 CRITICAL juju.clock timer 2073-03-03 02:00:00.000000005 -0840 -0840 not ready: 1h0m0s
[LOG] 0:00.000 TRACE lease_test [] next expire decided on 1h0m0s 2073-03-03 02:00:00.000000005 -0840 -0840
[LOG] 0:00.000 TRACE lease_test [] redis/0 asked for lease redis, no lease found, claiming for 1m0s
[LOG] 0:00.001 TRACE lease_test [] redis/0 claiming lease redis for 1m0s successful
[LOG] 0:00.001 TRACE lease_test [] ensuring we wake up before 1m0s at 2073-03-03 01:01:00.000000005 -0840 -0840
[LOG] 0:00.001 TRACE lease_test [] tick at 2073-03-03 01:01:03.000000004 -0840 -0840, running expiry checks
[LOG] 0:00.001 TRACE lease_test [] expire looking for leases to expire
[LOG] 0:00.001 WARNING lease_test [] reporting error to main loop: Store method called after test complete: Refresh []
[LOG] 0:00.001 TRACE lease_test [] exiting main loop with error: Store method called after test complete: Refresh []
fixture_test.go:131:
    c.Check(err, jc.ErrorIsNil)
... value *errors.Err = &errors.unformatter{message:"", cause:(*errors.Err)(0xc000328fa0), previous:(*errors.Err)(0xc0003292c0), file:"github.com/juju/juju/worker/lease/manager.go", line:167} ()
... error stack:
        github.com/juju/juju/worker/lease/util_test.go:149: Store method called after test complete: Refresh []
        github.com/juju/juju/worker/lease/manager.go:524:
        github.com/juju/juju/worker/lease/manager.go:183:
        github.com/juju/juju/worker/lease/manager.go:167:

In a successful run the log is supposed to look like:
PASS: manager_expire_test.go:27: ExpireSuite.SetUpTest 0.000s

[LOG] 0:00.000 CRITICAL juju.clock timer 2073-03-03 02:00:00.000000005 -0840 -0840 not ready: 1h0m0s
[LOG] 0:00.000 TRACE lease_test [] next expire decided on 1h0m0s 2073-03-03 02:00:00.000000005 -0840 -0840
[LOG] 0:00.000 TRACE lease_test [] redis/0 asked for lease redis, no lease found, claiming for 1m0s
[LOG] 0:00.000 TRACE lease_test [] redis/0 claiming lease redis for 1m0s successful
[LOG] 0:00.000 TRACE lease_test [] ensuring we wake up before 1m0s at 2073-03-03 01:01:00.000000005 -0840 -0840
[LOG] 0:00.000 TRACE lease_test [] tick at 2073-03-03 01:01:03.000000006 -0840 -0840, running expiry checks
[LOG] 0:00.000 TRACE lease_test [] expire looking for leases to expire
[LOG] 0:00.000 TRACE lease_test [] checking expiry on 1 leases
[LOG] 0:00.000 DEBUG lease_test [] expired 1 leases: redis
[LOG] 0:00.000 TRACE lease_test [] evaluating 0 blocks
[LOG] 0:00.001 TRACE lease_test [] next expire decided on 1h0m0s 2073-03-03 02:01:03.000000006 -0840 -0840
[LOG] 0:00.001 TRACE lease_test [] exiting main loop with error: catacomb 0xc0000e0a00 is dying
START: <autogenerated>:1: ExpireSuite.TearDownTest

It seems there is some sort of race with the Done() condition/Refresh is being called at an unexpected time.
Given that both of them do seem to be waking up at 01:01:03 and it is expecting to Refresh and then expire, it seems odd that it would sometimes claim that Refresh shouldn't be called.

Needs investigation.

Revision history for this message
John A Meinel (jameinel) wrote :

I don't think this is a production issue, just a flaky test, as such it shouldn't block 2.5.1

Changed in juju:
milestone: none → 2.6-beta1
Revision history for this message
John A Meinel (jameinel) wrote :

http://ci.jujucharms.com/job/make-check-juju/2863/testReport/junit/github/com_juju_juju_worker_lease/TestPackage/

Says that NoExpiry_LongEnough is also flaky:
goroutine 555 [semacquire, 24 minutes]:
sync.runtime_Semacquire(0xc00031cd98)
 /snap/go/3129/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00031cd90)
 /snap/go/3129/src/sync/waitgroup.go:130 +0x64
github.com/juju/juju/worker/lease_test.(*Fixture).RunTest(0xc0003d2f60, 0xc000624d20, 0xc0005829c8)
 /home/ubuntu/go/src/github.com/juju/juju/worker/lease/fixture_test.go:143 +0x536
github.com/juju/juju/worker/lease_test.(*ExpireSuite).TestStartup_NoExpiry_LongEnough(0xc000123450, 0xc000624d20)
 /home/ubuntu/go/src/github.com/juju/juju/worker/lease/manager_expire_test.go:111 +0x4ab
reflect.Value.call(0xaba1c0, 0xc000123450, 0x5613, 0xacfdd2, 0x4, 0xc000582f70, 0x1, 0x1, 0x1008380, 0x99e92179, ...)
 /snap/go/3129/src/reflect/value.go:447 +0x454
reflect.Value.Call(0xaba1c0, 0xc000123450, 0x5613, 0xc0004d2f70, 0x1, 0x1, 0xc000624e10, 0xc0004d2ee7, 0xc0004d2f70)
 /snap/go/3129/src/reflect/value.go:308 +0xa4
github.com/juju/juju/vendor/gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1(0xc000624d20)
 /home/ubuntu/go/src/github.com/juju/juju/vendor/gopkg.in/check.v1/check.go:772 +0x64f
github.com/juju/juju/vendor/gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1(0xc000116100, 0xc000624d20, 0xc00057f400)
 /home/ubuntu/go/src/github.com/juju/juju/vendor/gopkg.in/check.v1/check.go:666 +0x7c
created by github.com/juju/juju/vendor/gopkg.in/check%2ev1.(*suiteRunner).forkCall
 /home/ubuntu/go/src/github.com/juju/juju/vendor/gopkg.in/check.v1/check.go:663 +0x1ff

Not only that, but it deadlocks until the test suite expires. We need to confirm why it isn't failing as it should be if something gets blocked.

Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/9742 fixes ~5 tests that seem flaky in worker/lease.

Revision history for this message
John A Meinel (jameinel) wrote :
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.