juju.txns mongo collection getting out of control

Bug #1676427 reported by Junien Fridrick
This bug affects 3 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Menno Finlay-Smits

Bug Description

Hi,

We have a juju2 environment (now running juju 2.1.1) where the juju.txns mongo collection is getting out of control:

juju:PRIMARY> db.txns.count()
34831682 <= nearly 35M
juju:PRIMARY> db.txns.stats()["size"]
16801507852 <= that's 16GB
juju:PRIMARY> db.txns.stats()["storageSize"]
5176877056 <= that's 5GB
juju:PRIMARY> db.txns.stats()["totalIndexSize"]
459661312 <= that's 450MB

On this environment (where HA is enabled, so we have 3 controllers), MongoDB is regularly killed by Linux's OOM-killer. I have enabled DEBUG logging for juju.txn, and I'm fairly convinced that the OOM kills happen while all the txns are being gathered for pruning. In the logs, we see the following:

2017-03-27 10:45:57 INFO juju.txn prune.go:43 txns after last prune: 206688, txns now = 34715286, pruning required: true
2017-03-27 10:45:57 DEBUG juju.txn prune.go:148 loading all completed transactions

But we never see "found %d completed transactions across %d collections", and the code suggests we should (https://github.com/juju/txn/blob/master/prune.go#L148).

According to what's stored in the juju.txns.prune collection, our last successful pruning dates back to March 10th:

juju:PRIMARY> db.txns.prune.find({"_id": "last"}).pretty()
{ "_id" : "last", "id" : ObjectId("58c2b1fbc2b5ab5367292bf5") }
juju:PRIMARY> db.txns.prune.find({"_id": ObjectId("58c2b1fbc2b5ab5367292bf5")}).pretty()
{
        "_id" : ObjectId("58c2b1fbc2b5ab5367292bf5"),
        "started" : ISODate("2017-03-10T13:59:36.751Z"),
        "completed" : ISODate("2017-03-10T14:02:35.747Z"),
        "txns-before" : 592869,
        "txns-after" : 206688
}

This is rather concerning, as the problem will only get worse.

In the same prune.go file, we can read the following (https://github.com/juju/txn/blob/master/prune.go#L137):

// This set could potentially contain many entries, however even
// 500,000 entries requires only ~44MB of memory. Given that the
// memory hit is short-lived this is probably acceptable.

This means that 35M entries require roughly 70 * 44MB ≈ 3GB of RAM, and we simply don't have that kind of free memory on our controllers (which have 8GB each).

Even after the last prune, there were 206k transactions remaining. Isn't that a bit high for a controller hosting ~60 models? I took a look at the oldest txn object, and it's the transaction that creates the controller model, adds the "admin" user, etc. In essence, the very first transaction. Isn't it supposed to be purged at some point?
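(For reference, I looked it up roughly like this; since txn ids are ObjectIds, sorting on _id ascending should return the oldest transaction first:)

juju:PRIMARY> db.txns.find().sort({"_id": 1}).limit(1).pretty()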

The controller model also has some very old transactions in its txn-queue, the first of which updates "available-tools" for the controller model to "2.0.1", so it's quite old. Is that one supposed to still exist as well?
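(Again for reference, I inspected it with something along these lines; I'm assuming the controller model document lives in the models collection under its default "controller" name:)

juju:PRIMARY> db.models.findOne({"name": "controller"}, {"name": 1, "txn-queue": 1})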

To recap my questions:

1) Could you make juju / mgo _not_ read every existing transaction into memory? I think this should very much be avoided.

1a) How do we get our environment to purge transactions? Do you have a workaround or a manual MongoDB query we could apply?

2) Shouldn't the juju.txns collection be capped? As I understand it, it only gets pruned when the number of current transactions is twice the number of transactions left after the last prune, which allows unlimited (and potentially exponential) growth.

3) Is it normal for some transactions to stay in the MongoDB database forever?

4) What removes transactions from a document's txn-queue? I believe some transactions are still in the juju.txns collection because they're still referenced in some txn-queue.

Thanks

Tags: canonical-is
Junien Fridrick (axino)
tags: added: canonical-is
Changed in juju:
importance: Undecided → High
assignee: nobody → Menno Smits (menno.smits)
assignee: Menno Smits (menno.smits) → nobody
status: New → Triaged
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

> 1) Could you make juju / mgo _not_ read every existing transaction into memory? I think this should very much be avoided.

Agreed that the current approach doesn't scale and needs to be fixed.

I think we should store the working set in a temporary MongoDB collection. This can be done efficiently using MongoDB's aggregation pipeline features (the $out operator is particularly useful) and the bulk update features. Note that although it'll be fairly efficient, it does trade memory usage for disk I/O. For this reason, transaction pruning is still something that we don't want to do too often.
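Roughly, the idea is something along these lines (just a sketch, not the final implementation; the temporary collection name is made up, and the state codes are from memory of mgo/txn, where 5 = aborted and 6 = applied):

juju:PRIMARY> db.txns.aggregate([
...     { $match: { s: { $in: [5, 6] } } },   // completed (aborted or applied) txns
...     { $project: { _id: 1 } },             // only the ids are needed for the working set
...     { $out: "txns.prunetemp" }            // write the working set to a temporary collection
... ], { allowDiskUse: true })

The pruner can then compare that temporary collection against the txn ids still referenced from txn-queue fields and bulk-remove the unreferenced ones, without ever holding the full set in memory.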

> 1a) How do we get our environment to purge transactions? Do you have a workaround or a manual MongoDB query we could apply?

One of the things the mgopurge tool does is purge completed transactions. You need to shut down the controllers to run it, so it's pretty invasive.

https://github.com/juju/mgopurge/releases

The better solution is to make Juju do a better job of purging itself.

> 2) Shouldn't the juju.txns collection be capped? As I understand it, it only gets pruned when the number of current transactions is twice the number of transactions left after the last prune, which allows unlimited (and potentially exponential) growth.

It can't be capped in the usual sense: as long as a transaction is still referenced by a document somewhere, it can't be removed.

> 3) Is it normal for some transactions to stay in the MongoDB database forever?

Yes, that is expected. The last transaction to touch a document stays in the document's txn-queue field.

Also, when a transaction is "assert only" (i.e. it doesn't change anything), the txn-queue field is only appended to, with previous completed transactions being left behind. We should fix this behaviour in mgo/txn, as it can lead to excessive numbers of transactions being left behind which can't be cleaned up.
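A rough way to spot documents that have accumulated long txn-queues (purely illustrative; machines is just an example collection and 50 is an arbitrary threshold):

juju:PRIMARY> db.machines.find({"txn-queue.50": {$exists: true}}).count()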

I've just filed this: https://bugs.launchpad.net/juju/+bug/1676673

> 4) What removes transactions from a document's txn-queue? I believe some transactions are still in the juju.txns collection because they're still referenced in some txn-queue.

Yep, that's correct. A previous transaction id is removed from a document's txn-queue when the next data-modifying transaction touches it. Some documents are never touched after creation, so the creating transaction hangs around. I suspect changing this behaviour in mgo/txn will be non-trivial.
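If you want to check whether a particular txn is still referenced, a per-collection query along these lines should show it (the id is just a placeholder; as far as I know the txn-queue entries are "<txn id>_<nonce>" strings):

juju:PRIMARY> var id = "<txn ObjectId hex>"   // placeholder for the txn id you want to check
juju:PRIMARY> db.machines.find({"txn-queue": {$regex: "^" + id}}, {"txn-queue": 1}).pretty()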

Aside from the above, I think we should make the pruning schedule more aggressive. As you've noted the current approach of only pruning when the transaction count has doubled is not really working. There is a balance to be found though - given that pruning is an expensive operation, we don't want to do it too often.

Changed in juju:
milestone: none → 2.2-beta2
Changed in juju:
assignee: nobody → Menno Smits (menno.smits)
Revision history for this message
John A Meinel (jameinel) wrote :

We could really use a copy of the data so that as we iterate on possible fixes we can make sure we're addressing the core issues. Is it possible to get a copy out-of-band?

Changed in juju:
status: Triaged → Incomplete
Revision history for this message
Junien Fridrick (axino) wrote :

A copy of the mongo database?

Junien Fridrick (axino)
Changed in juju:
status: Incomplete → Triaged
John A Meinel (jameinel)
Changed in juju:
status: Triaged → In Progress
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Yes, it would be very helpful if we could get our hands on a dump of the database (taken with mongodump).

Revision history for this message
James Troup (elmo) wrote : Re: [Bug 1676427] Re: juju.txns mongo collection getting out of control

Menno Smits <email address hidden> writes:

> Yes, it would be very helpful if we could get our hands on a dump of the
> database (taken with mongodump).

Do you guys have a database sanitization tool?

--
James

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@James Troup (elmo),

The PR for the script is here: https://github.com/juju/juju/pull/7172

I believe that this has been shared with Junien (axino).

Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

@elmo: jam created one yesterday to help with this. It'll get merged today for future occasions.

See: https://github.com/juju/juju/pull/7172

AFAIK jam and axino used it yesterday to create a dump that we can use. I'm hoping to get my hands on it today.

In the meantime I'm working on the alternative (low-memory) txn pruning algorithm.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

The meat of the fix is here: https://github.com/juju/txn/pull/22

This greatly reduces both the memory used during pruning (now negligible) and the time taken (by an order of magnitude).

Revision history for this message
John A Meinel (jameinel) wrote :

Note that https://github.com/juju/juju/pull/7180 at least helps with "txns getting out of control" by removing one more of the places where we create transactions on an ongoing basis that don't actually contribute anything meaningful.
(see bug #1677619)

no longer affects: juju/2.1
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

This PR to Juju pulls in the new pruning algorithm and makes pruning occur more often (after 10% growth in txns): https://github.com/juju/juju/pull/7183

Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Junien Fridrick (axino) wrote :

Thanks for this. I tested the new mgopurge on my test instance with a copy of the production MongoDB database, and it completed all the stages in ~1h15 without using too much RAM :) Excellent!

So I guess we have 2 options to stabilize our controller:
 - you swiftly release 2.1.3 and we upgrade to it

OR

 - while waiting for 2.2, we set up cron runs of "mgopurge -yes -stages prune" to prune the DB, and we disable the built-in pruning by setting txns-after to a silly value in the most recent juju.txns.prune document, e.g.: db.txns.prune.update({_id: ObjectId("58c2b1fbc2b5ab5367292bf5")}, { $set : {"txns-after": 40000000}})

What do you think?

Revision history for this message
John A Meinel (jameinel) wrote :

The word through the grapevine is that you should go the 'mgopurge' route. I don't think you'd actually have to disable the built-in pruning, as if there isn't much to purge it shouldn't consume a lot of memory.

The fact that you have ~400k transactions that aren't getting purged is also something we want to understand. It may be something we need to teach the pruner/mgopurge about, or it might be something like the quick patch to mgo/txn that makes assert-only transactions still clean up the txn-queue.

Revision history for this message
John A Meinel (jameinel) wrote :

Note also bug #1678587 [NEW] juju.txns.stash never cleaned

I'm working on patching both the pruner and mgopurge to clean out txns.stash. That should catch at least another 300k of the stale (completed but not purged) transactions.

John
=:->

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Additional PR against develop (2.2) landed: https://github.com/juju/juju/pull/7180

Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/7224 is another PR against 2.2

On my system, running 'peer-xplod' with 50 units and doing about 1,200 txns per minute, a prune after 2 hours ends up with:

2017-04-11 17:24:10 INFO juju.txn prune.go:135 txns after last prune: 108, txns now: 149915, pruning: too many new transactions
2017-04-11 17:25:08 INFO juju.txn prune.go:156 txn pruning complete after 58.189718631s. txns now: 763, inspected 56 collections, 496 docs (293 cleaned)
   removed 109 stash docs and 149940 txn docs

So that found 293 documents referencing completed transactions (a bit spurious, as it's the same documents being updated over and over again), and it was able to prune 150,000 transactions from the 'txns' collection and 109 from txns.stash; meanwhile, during the ~60s spent pruning, an extra 760 transactions showed up that it didn't touch.

Changed in juju:
status: Fix Committed → Fix Released