worker/uniter: tests panic using golang tip

Bug #1114726 reported by Dave Cheney
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju-core
Fix Released
Critical
Dave Cheney

Bug Description

Revision https://code.google.com/p/go/source/detail?r=cf8e097deec5ad5c61bb88fca6c9013c125621e5 causes worker/uniter tests to panic in a way that hints at memory corruption

actual: []string{"install", "config-changed", "start", "db-relation-joined mysql/0 db:0", "db-relation-changed mysql/0 db:0", "config-changed"}
step 1
uniter_test.custom{f:(func(*gocheck.C, *uniter_test.context))(0x448a20)}
uniter_test.go:725:
    ctx.run(c, t.steps)
uniter_test.go:1448:
    c.Assert(ctx.relation.Destroy(), IsNil)
... value *errors.errorString = &errors.errorString{s:"cannot destroy relation \"u:db mysql:server\": ObjectIDs must be exactly 12 bytes long (got 23)"} ("cannot destroy relation \"u:db mysql:server\": ObjectIDs must be exactly 12 bytes long (got 23)")

[LOG] 2.81923 JUJU:DEBUG state/watcher: got request: watcher.reqUnwatch{key:watcher.watchKey{c:"relationscopes", id:interface {}(nil)}, ch:(chan<- watcher.Change)(0xc2008d9de0)}
[LOG] 2.81925 JUJU:DEBUG state/watcher: got request: watcher.reqUnwatch{key:watcher.watchKey{c:"settings", id:"r#0#provider#mysql/0"}, ch:(chan<- watcher.Change)(0xc2008d9d20)}
[LOG] 2.81926 JUJU:DEBUG worker/uniter: ModeAbide exiting
[LOG] 2.81927 JUJU worker/uniter: unit "u/0" shutting down: tomb: dying
[LOG] 2.81927 JUJU:DEBUG state/presence: stopping pinger for "u#u/0" with seq=2
[LOG] 2.81930 JUJU:DEBUG state/watcher: got request: watcher.reqUnwatch{key:watcher.watchKey{c:"relations", id:interface {}(nil)}, ch:(chan<- watcher.Change)(0xc2013263c0)}
[LOG] 2.81932 JUJU:DEBUG state/watcher: got request: watcher.reqUnwatch{key:watcher.watchKey{c:"settings", id:"s#u"}, ch:(chan<- watcher.Change)(0xc20077ca80)}
[LOG] 2.81933 JUJU:DEBUG state/watcher: got request: watcher.reqUnwatch{key:watcher.watchKey{c:"services", id:"u"}, ch:(chan<- watcher.Change)(0xc20077ca20)}
[LOG] 2.81934 JUJU:DEBUG state/watcher: got request: watcher.reqUnwatch{key:watcher.watchKey{c:"units", id:"u/0"}, ch:(chan<- watcher.Change)(0xc20077c900)}
[LOG] 2.81935 JUJU worker/uniter: filter error: tomb: dying
[LOG] 2.82020 JUJU environs/dummy: reset environment
[LOG] 2.82041 JUJU api: error receiving request: read tcp 127.0.0.1:39899: use of closed network connection
OOPS: 21 passed, 4 FAILED
--- FAIL: TestPackage (77.00 seconds)
FAIL
FAIL launchpad.net/juju-core/worker/uniter 77.111s

(with -gcflags -N, and a little debugging)

panic: error string is 833225874128 in length
goroutine 832 [running]:
launchpad.net/gocheck.(*suiteRunner).callDone(0xc200ebc8c0, 0xc200596a00, 0x7fd70e396100, 0x7fd70e396fb8)
        /home/dfc/src/launchpad.net/gocheck/gocheck.go:661 +0x315
launchpad.net/juju-core/state.(*State).AddCharm(0xc2004ab880, 0xc20011c980, 0xc200eccdb0, 0x7fd712575478, 0xc200cbf3f0, ...)
        /home/dfc/src/launchpad.net/juju-core/state/state.go:293 +0x370
launchpad.net/juju-core/worker/uniter_test.addCharm.step(0xc200eccdb0, 0x7fd712575478, 0xc200596a00, 0xc2004393c0, 0x1, ...)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/uniter_test.go:835 +0x84f
launchpad.net/juju-core/worker/uniter_test.(*addCharm).step(0xc20021dac0, 0xc200596a00, 0xc2004393c0, 0xc20021dad0, 0x1, ...)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/context_test.go:0 +0x9b
launchpad.net/juju-core/worker/uniter_test.step(0xc200596a00, 0xc2004393c0, 0xc2004f4120, 0xc20021dac0, 0x7fd712575478, ...)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/uniter_test.go:780 +0xe1
launchpad.net/juju-core/worker/uniter_test.createCharm.step(0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/uniter_test.go:814 +0x6af
launchpad.net/juju-core/worker/uniter_test.(*createCharm).step(0xc20026d030, 0xc200596a00, 0xc2004393c0, 0xc2004cf960, 0x1, ...)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/context_test.go:0 +0xa1
launchpad.net/juju-core/worker/uniter_test.step(0xc200596a00, 0xc2004393c0, 0xc20026d000, 0xc20026d030, 0x1, ...)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/uniter_test.go:780 +0xe1
launchpad.net/juju-core/worker/uniter_test.(*context).run(0xc2004393c0, 0xc200596a00, 0xc2002c7a50, 0x5, 0x5, ...)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/uniter_test.go:118 +0x1e3
launchpad.net/juju-core/worker/uniter_test.(*UniterSuite).TestUniter(0xc20012e080, 0xc200596a00, 0x7fd712575100, 0x7fd712575fb8)
        /home/dfc/src/launchpad.net/juju-core/worker/uniter/uniter_test.go:725 +0x4e4
reflect.Value.call(0xa29978, 0xc20012e080, 0x1538, 0xa52458, 0x4, ...)
        /home/dfc/go/src/pkg/reflect/value.go:532 +0x13f3
reflect.Value.Call(0xa29978, 0xc20012e080, 0x1538, 0xc20022b400, 0x1, ...)
        /home/dfc/go/src/pkg/reflect/value.go:345 +0x9b
launchpad.net/gocheck.func·006(0xc2003facd8, 0x4d4fe7, 0xc200596a00, 0xc200596a00, 0x413360, ...)
        /home/dfc/src/launchpad.net/gocheck/gocheck.go:738 +0x4df
launchpad.net/gocheck.func·004(0xc2003face0, 0xc2003facf0, 0xc2003face8, 0x0, 0x0, ...)
        /home/dfc/src/launchpad.net/gocheck/gocheck.go:633 +0x77
created by launchpad.net/gocheck.(*suiteRunner).forkCall
        /home/dfc/src/launchpad.net/gocheck/gocheck.go:634 +0x23c

OOPS: 24 passed, 1 PANICKED
--- FAIL: TestPackage (18.61 seconds)
FAIL
FAIL launchpad.net/juju-core/worker/uniter 18.731s

Revision history for this message
Dave Cheney (dave-cheney) wrote :
Revision history for this message
Dave Cheney (dave-cheney) wrote :

First the good news. I think I have isolated the pattern in our code (not unique, this is valid syntax) which is causing the problem. The simplest explanation is this diff (there are other cases inside state and charm, but they follow a similar pattern)

lucky(~/src/launchpad.net/juju-core/state) % bzr diff relation.go
=== modified file 'state/relation.go'
--- state/relation.go 2013-01-25 18:33:16 +0000
+++ state/relation.go 2013-02-04 05:37:13 +0000
@@ -202,7 +202,7 @@
                        Update: D{{"$inc", D{{"relationcount", -1}}}},
                })
        }
- cDoc := &cleanupDoc{
+ cDoc := cleanupDoc{
                Id: bson.NewObjectId(),
                Kind: "settings",
                Prefix: fmt.Sprintf("r#%d#", r.Id()),
@@ -210,9 +210,10 @@
        return append(ops, txn.Op{
                C: r.st.cleanups.Name,
                Id: cDoc.Id,
- Insert: cDoc,
+ Insert: &cDoc,
        }), nil
 }
+

The bad news is I still have to convert this into a test case and get it fixed. If my theory is right, this is a problem with the escape analysis which was never detected because &T{} structs were always heap allocated regardless of their escape status. Once we fixed the latter bug, the former came to the surface.

Revision history for this message
Dave Cheney (dave-cheney) wrote :

Dmorsing and Remy_o have found the problem, but the fix is non trivial. I have proposed a rollback of the original commit.

Revision history for this message
Dave Cheney (dave-cheney) wrote :
Changed in juju-core:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.