controller leaking mongodb connections

Bug #1581069 reported by Martin Hilton
This bug affects 6 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Tim Penhey

Bug Description

Version: jujud [2.0-beta6.1 gc go1.6]

We have a controller being used by a number of users to create models. After some time, new models could no longer be created. The error returned was "could not create state for new model: cannot connect to mongodb: no reachable servers". Investigation on the controller machine shows a large number of open connections from jujud to mongodb (see https://pastebin.canonical.com/156345/).

Tags: canonical-is
Revision history for this message
Roger Peppe (rogpeppe) wrote :

Further analysis of the log shows that there are 449 connections to MongoDB
and 64 connections to the API server, of which 27 are local to the controller node. That gives an average of 7 MongoDB sockets for each API connection.
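
A tally like the one above can be reproduced from connection listings. The following is a minimal sketch, not the method used in this report: it assumes `netstat -tn`-style input, and it assumes the controller's MongoDB listens on port 37017 and the API server on port 17070. The names `Tally`, `CountConnections` and `SocketsPerAPIConn` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Assumed ports; adjust to match the controller being inspected.
const (
	mongoPort = "37017"
	apiPort   = "17070"
)

// Tally holds counts of established TCP connections.
type Tally struct {
	Mongo int // client-side sockets whose foreign port is mongoPort
	API   int // server-side sockets whose local port is apiPort
}

// portOf returns the text after the last ':' in an address, or "".
func portOf(addr string) string {
	i := strings.LastIndex(addr, ":")
	if i < 0 {
		return ""
	}
	return addr[i+1:]
}

// CountConnections scans `netstat -tn`-style lines with the columns
// Proto, Recv-Q, Send-Q, Local, Foreign, State. Counting MongoDB by
// foreign port and the API by local port counts each loopback
// connection once, even though netstat lists both ends of it.
func CountConnections(netstatOutput string) Tally {
	var t Tally
	sc := bufio.NewScanner(strings.NewReader(netstatOutput))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 6 || f[5] != "ESTABLISHED" {
			continue // header lines and non-established sockets
		}
		if portOf(f[4]) == mongoPort {
			t.Mongo++
		} else if portOf(f[3]) == apiPort {
			t.API++
		}
	}
	return t
}

// SocketsPerAPIConn is the average number of MongoDB sockets per API
// connection, or 0 when there are no API connections.
func SocketsPerAPIConn(t Tally) float64 {
	if t.API == 0 {
		return 0
	}
	return float64(t.Mongo) / float64(t.API)
}

func main() {
	// The figures quoted in the comment above.
	fmt.Printf("%.1f\n", SocketsPerAPIConn(Tally{Mongo: 449, API: 64}))
}
```

With the figures from the comment above (449 and 64), the ratio works out to just over 7.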

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.0-beta7
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Please provide logs for the controller machines and the MongoDB logs in /var/log/syslog on the controller machines.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta7 → 2.0-beta8
Revision history for this message
Martin Hilton (martin-hilton) wrote :
Revision history for this message
Roger Peppe (rogpeppe) wrote :

I suspect that the underlying reason here might be that the code
calls mgo.Session.Copy every time it does anything, which means
that every concurrent operation will use its own session.

That is, this issue may not be an actual leak at all - just
an excessive use of mongo sessions.

There's an associated issue with using Copy all the time
like that too, which is that consistency between sequential
operations is not guaranteed, as the copied session may
talk to a different server.
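
The effect described above can be illustrated with a toy model. This sketch does not use mgo and is not Juju's code: `pool`, `session` and `peakSockets` are hypothetical stand-ins, and the model assumes that every copied session holds its own socket until it is closed. It compares the peak number of open sockets when each concurrent operation copies the session against the peak when all operations share one session.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a toy stand-in for a driver's socket pool; it only counts.
type pool struct {
	mu   sync.Mutex
	open int
	peak int
}

func (p *pool) acquire() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.open++
	if p.open > p.peak {
		p.peak = p.open
	}
}

func (p *pool) release() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.open--
}

// session is a hypothetical session that owns exactly one socket.
type session struct{ p *pool }

func newSession(p *pool) *session {
	p.acquire()
	return &session{p: p}
}

// Copy models the copy-per-operation pattern: a new session with its
// own socket, which the caller must Close.
func (s *session) Copy() *session { return newSession(s.p) }

// Close returns the session's socket to the pool.
func (s *session) Close() { s.p.release() }

// peakSockets runs n operations that all overlap in time and reports
// the highest number of sockets that were open at once.
func peakSockets(n int, copyPerOp bool) int {
	p := &pool{}
	root := newSession(p)
	defer root.Close()

	var started, done sync.WaitGroup
	release := make(chan struct{})
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			s := root
			if copyPerOp {
				s = root.Copy()
				defer s.Close()
			}
			_ = s          // a real operation would query through s here
			started.Done() // signal that this operation is in flight
			<-release      // hold the session until all have started
		}()
	}
	started.Wait()
	close(release)
	done.Wait()

	p.mu.Lock()
	peak := p.peak
	p.mu.Unlock()
	return peak
}

func main() {
	fmt.Println("copy per operation:", peakSockets(7, true))
	fmt.Println("shared session:    ", peakSockets(7, false))
}
```

In this model, seven overlapping operations peak at eight sockets with copy-per-operation (seven copies plus the root session) and at one socket when the session is shared. The count of seven is arbitrary. The model says nothing about the consistency concern, which depends on which server each copied session ends up talking to.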

Changed in juju-core:
milestone: 2.0-beta8 → 2.0-beta9
William Reade (fwereade)
Changed in juju-core:
assignee: nobody → William Reade (fwereade)
Revision history for this message
William Reade (fwereade) wrote :

I have a potential mitigation proposed at http://reviews.vapour.ws/r/4981/ -- but I'd be surprised if it made any serious difference.

I'm looking into what it'd take to associate sessions with API activity and/or the various long-running internal tasks. I have nothing very concrete yet, but we've actually built out quite a lot of the necessary infrastructure already. The most critical step -- extracting infrastructure tasks from *State instances -- is evidently *large*, but relatively comprehensible, and I think not high-risk.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta9 → 2.0-beta10
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta10 → 2.0-beta11
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta11 → 2.0-beta12
Changed in juju-core:
milestone: 2.0-beta12 → 2.0-beta13
Changed in juju-core:
milestone: 2.0-beta13 → 2.0-beta12
status: Triaged → Fix Released
Revision history for this message
Tim Penhey (thumper) wrote :

We have landed something that we believe will have either fixed or improved the situation.

Could you please retest and let us know?

Changed in juju-core:
status: Fix Released → Incomplete
Revision history for this message
Cheryl Jennings (cherylj) wrote :

William's patch landed in beta9. Can you retest with the latest beta?

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta12 → 2.0-beta13
Changed in juju-core:
milestone: 2.0-beta13 → none
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Roger, @Martin,

Could you please re-test with a newer beta (currently beta 14)?
We believe this was fixed a couple of betas back :D

affects: juju-core → juju
Changed in juju:
assignee: William Reade (fwereade) → nobody
Junien Fridrick (axino)
Changed in juju:
status: Incomplete → Confirmed
Revision history for this message
Junien Fridrick (axino) wrote :

Hi,

I believe we have hit this bug. We have a shared controller running juju 2.1.1, with around 40 models. The number of connections to mongodb was above 4000 for a while; it's now back to nearly 3000:

juju:PRIMARY> db.serverStatus()["connections"]
{ "current" : 2929, "available" : 49071, "totalCreated" : NumberLong(6974) }

There are only a dozen or so queries in progress at any time (as reported by db.currentOp()), so I believe these connections are unnecessarily wasting resources.
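
To put the figures above in proportion, here is a small sketch of the arithmetic. It assumes that "current" plus "available" approximates the server's connection limit, and it takes "around 40" models and "a dozen or so" queries as 40 and 12. The type `ConnStats` and its methods are hypothetical names.

```go
package main

import "fmt"

// ConnStats mirrors the three fields of the
// db.serverStatus()["connections"] output quoted above.
type ConnStats struct {
	Current      int
	Available    int
	TotalCreated int
}

// Limit is the approximate connection limit: open plus still available.
func (c ConnStats) Limit() int { return c.Current + c.Available }

// Per divides the open connections by n (models, in-progress queries).
// n must be greater than zero.
func (c ConnStats) Per(n int) float64 {
	return float64(c.Current) / float64(n)
}

func main() {
	s := ConnStats{Current: 2929, Available: 49071, TotalCreated: 6974}
	fmt.Printf("limit: %d\n", s.Limit())
	fmt.Printf("connections per model (assuming 40): %.0f\n", s.Per(40))
	fmt.Printf("connections per in-progress query (assuming 12): %.0f\n", s.Per(12))
}
```

Under those assumptions, that is roughly 73 open connections per model and roughly 244 per in-progress query, against a limit of 52000.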

Junien Fridrick (axino)
tags: added: canonical-is
Changed in juju:
status: Confirmed → Triaged
assignee: nobody → Tim Penhey (thumper)
milestone: none → 2.2.0
Revision history for this message
Tim Penhey (thumper) wrote :

Found and fixed the last underlying leak (that we know of):
  https://github.com/juju/juju/pull/7134

Changed in juju:
status: Triaged → In Progress
Tim Penhey (thumper)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
milestone: 2.2-rc1 → 2.2-beta1
Changed in juju:
status: Fix Committed → Fix Released