mgo-txn-resumer cannot resume transaction because missing document
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
Bug Description
Hi,
On a controller running juju 2.2.4, these are the last lines of the controller log (see the machine logs at the bottom).
The document is indeed gone from MongoDB:
juju:PRIMARY> db.leases.
0
This has led (I believe) to very large txn queues in documents related to this model (which is apparently still running version 2.2.2):
{ "_id" : "d3ab9c43-
{ "_id" : "d3ab9c43-
{ "_id" : "d3ab9c43-
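For anyone wanting to check which documents carry the largest queues, here is a rough sketch. It assumes the usual mgo/txn layout, where each document keeps its pending transaction tokens in a "txn-queue" array. The helper name largestTxnQueues and the sample documents are made up for illustration; in the mongo shell the input would come from a find() on the collection of interest, projected down to the "txn-queue" field.

```javascript
// Hypothetical helper: rank documents by the length of their txn-queue array.
// Assumes the mgo/txn document shape: {_id: ..., "txn-queue": [...]}.
function largestTxnQueues(docs, limit) {
  var sized = docs.map(function (doc) {
    var queue = doc["txn-queue"] || []; // a document may have no queue field at all
    return { id: doc._id, queued: queue.length };
  });
  // Longest queues first.
  sized.sort(function (a, b) { return b.queued - a.queued; });
  return sized.slice(0, limit);
}

// Made-up sample data, not taken from the affected controller.
var sample = [
  { _id: "b", "txn-queue": ["t1_n1"] },
  { _id: "a", "txn-queue": ["t1_n1", "t2_n2", "t3_n3"] },
  { _id: "c" }
];
var top = largestTxnQueues(sample, 2);
```

On a real collection, fetching only the "txn-queue" field keeps the result set manageable, though with very large queues even that can be slow.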
Also, the txn-resumer isn't working, so there are currently 4.5M txns in the DB:
juju:PRIMARY> db.txns.count()
4550581
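A breakdown of that count by state may help tell how many of these txns are finished and how many are still pending. The sketch below labels the rows that a grouping query such as db.txns.aggregate([{$group: {_id: "$s", count: {$sum: 1}}}]) would return. The state codes are the ones defined in the mgo/txn package; the helper name labelStateCounts and the sample counts are assumptions for illustration.

```javascript
// State codes as defined in the mgo/txn package (field "s" of a txn document).
var TXN_STATES = {
  1: "preparing",
  2: "prepared",
  3: "aborting",
  4: "applying",
  5: "aborted",
  6: "applied"
};

// Hypothetical helper: turn rows shaped like {_id: <state>, count: <n>}
// into an object keyed by state name.
function labelStateCounts(rows) {
  var out = {};
  rows.forEach(function (row) {
    var name = TXN_STATES[row._id] || ("unknown(" + row._id + ")");
    out[name] = (out[name] || 0) + row.count;
  });
  return out;
}

// Made-up counts for illustration only.
var counts = labelStateCounts([
  { _id: 6, count: 10 },
  { _id: 1, count: 2 },
  { _id: 5, count: 3 },
  { _id: 9, count: 1 }
]);
```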
This is also, I think, creating large load spikes from time to time on the controller.
How can we fix this situation?
Thanks
Machine logs:
2017-10-26 10:18:24 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:25 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:26 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:27 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:27 ERROR juju.worker.
2017-10-26 10:18:28 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:29 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:30 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:31 ERROR juju.worker.
2017-10-26 10:18:31 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:32 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:33 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:34 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:34 ERROR juju.worker.
2017-10-26 10:18:35 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:36 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:37 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:38 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:38 ERROR juju.worker.
2017-10-26 10:18:39 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:40 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:41 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:42 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:43 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:43 ERROR juju.worker.
2017-10-26 10:18:44 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:45 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:46 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:47 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:48 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:49 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:50 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:50 ERROR juju.worker.
2017-10-26 10:18:51 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:52 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:53 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:54 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:55 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:55 ERROR juju.worker.
2017-10-26 10:18:56 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:57 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:58 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:18:59 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:19:00 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:19:01 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:19:02 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:19:03 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:19:03 ERROR juju.worker.
2017-10-26 10:19:04 ERROR juju.worker runner.go:392 exited "txnlog": EOF
2017-10-26 10:19:05 ERROR juju.worker runner.go:392 exited "txnlog": EOF
tags: | added: canonical-is |
tags: | added: cpe-onsite |
tags: | added: 4010 |
tags: | removed: 4010 |
This got worked around by applying this change:
db.txns.update({_id: ObjectId("59ef3f595f5ce87b42d1e774")}, {$unset: {"n": 1}, $set: {"s": 1}})
And then running a full mgopurge (with the controllers stopped) to handle the big txn-queues.
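For context, in mgo/txn the "s" field is the txn state and "n" is its nonce, so the update puts the txn back into state 1 (preparing) with no nonce. Below is a cautious sketch of the same change wrapped in a dry-run check. The function name resetTxn, its apply flag, and the stand-in collection are hypothetical; coll is only assumed to offer findOne and update the way db.txns does in the mongo shell. This has not been run against a real controller.

```javascript
// Hypothetical wrapper around the workaround: look the txn up first and
// only write when `apply` is true.
function resetTxn(coll, id, apply) {
  var query = { _id: id };
  var before = coll.findOne(query);
  if (!before) {
    return { found: false, applied: false };
  }
  // Same change as the workaround: drop the nonce, set state to 1 (preparing).
  var change = { $unset: { n: 1 }, $set: { s: 1 } };
  if (apply) {
    coll.update(query, change);
  }
  return { found: true, applied: !!apply, before: before, change: change };
}

// Stand-in collection so the sketch can be exercised without a database.
var calls = [];
var fakeTxns = {
  findOne: function (q) {
    return q._id === "txn-1" ? { _id: "txn-1", s: 2, n: "abcd" } : null;
  },
  update: function (q, u) { calls.push({ query: q, update: u }); }
};
var dryRun = resetTxn(fakeTxns, "txn-1", false);
var missing = resetTxn(fakeTxns, "nope", true);
var applied = resetTxn(fakeTxns, "txn-1", true);
```

In the shell the call would be along the lines of resetTxn(db.txns, ObjectId("..."), false), inspecting the returned before document prior to rerunning with apply set to true.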