presence collection grows without bound
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | John A Meinel |
juju-core | Won't Fix | Low | Unassigned |
Bug Description
presence.pings and presence.beings both grow without cleaning up old data.
beings maps an Entity to a unique sequence id for a live Pinger, and pings tracks the actual "what sequence was active at what time".
For "liveness" checking we only ever look at "slot" and "slot-period", where our period is 30s (was it alive in the previous slot, or the current slot?).
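As a rough illustration of the two-slot check described above, here is a minimal Python sketch. It is not the Juju implementation (which is written in Go); the function names `time_slot` and `is_alive`, and the idea of representing an entity's pings as a set of slot timestamps, are assumptions made for the example. Only the 30s period comes from the description.

```python
from __future__ import annotations

PERIOD = 30  # seconds; the period stated in the bug description


def time_slot(now: float, period: int = PERIOD) -> int:
    """Round a Unix timestamp down to the start of its slot."""
    seconds = int(now)
    return seconds - seconds % period


def is_alive(ping_slots: set[int], now: float, period: int = PERIOD) -> bool:
    """Report liveness using only the current slot and the previous one.

    ping_slots is a hypothetical set of slot start times in which the
    entity's Pinger was seen. Anything older than "slot - period" is
    never consulted, which is why old ping data is dead weight.
    """
    slot = time_slot(now, period)
    return slot in ping_slots or (slot - period) in ping_slots
```

With `now = 95` the current slot is 90, so a ping recorded in slot 60 still counts as alive, while one recorded in slot 30 does not.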
Certainly we don't need to keep the liveness ping from 20 days ago, perhaps we could expire them after 1 day? (we don't seem to *need* the data from more than 1 minute ago.)
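A possible shape for that expiry, as a hedged Python sketch rather than the actual fix: ping records are modelled as a plain dict keyed by slot start time, and `prune_pings` with its `keep_seconds` parameter is a hypothetical helper. The one-day retention is the value floated above, not a confirmed setting.

```python
from __future__ import annotations

ONE_DAY = 24 * 60 * 60  # retention suggested in the description, in seconds


def prune_pings(pings: dict[int, dict], now: float,
                keep_seconds: int = ONE_DAY, period: int = 30) -> int:
    """Delete ping records whose slot is older than the retention window.

    pings is a hypothetical in-memory stand-in for presence.pings,
    keyed by slot start time. Returns the number of records removed.
    """
    current_slot = int(now) - int(now) % period
    cutoff = current_slot - keep_seconds
    # Collect first, then delete, so the dict is not mutated mid-iteration.
    stale = [slot for slot in pings if slot < cutoff]
    for slot in stale:
        del pings[slot]
    return len(stale)
```

Because liveness only reads the current and previous slot, any retention window comfortably larger than one minute leaves the check unaffected.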
For "beings" we only actually need to keep the set of active Pingers, and there *should* only be one Pinger for a given entity. Potentially, if there was some confusion and a net split, so that the agent reconnected before we noticed the disconnect, we could have a couple. Even so, we don't need to keep 500 old possible sequence numbers for a given Entity.
Again, for "liveness" we only trust the latest sequence; we just allow the old ones because we don't ever want 2 pingers on the same Sequence (overflow would cause corruption).
However, it seems straightforward to just limit the maximum number of sequence numbers that we track for a given entity, and if we get a Ping request for a sequence that we don't know about, we could just ignore it. (log a warning?)
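The proposal could be sketched roughly as follows. This is an illustrative Python model, not Juju code: the `Beings` class, its method names, and the default cap of 3 are all assumptions. Sequence numbers come from a counter that only increases, so two Pingers never share a sequence even after old ones are dropped.

```python
from __future__ import annotations

import logging
from collections import defaultdict, deque

logger = logging.getLogger("presence")


class Beings:
    """Hypothetical in-memory model of presence.beings with a per-entity cap."""

    def __init__(self, max_per_entity: int = 3) -> None:
        self._max = max_per_entity
        # entity key -> tracked sequence numbers, oldest first
        self._seqs: dict[str, deque[int]] = defaultdict(deque)
        self._next = 0

    def start_pinger(self, key: str) -> int:
        """Allocate a fresh sequence for key and drop the oldest beyond the cap."""
        self._next += 1
        seqs = self._seqs[key]
        seqs.append(self._next)
        while len(seqs) > self._max:
            seqs.popleft()
        return self._next

    def accept_ping(self, key: str, seq: int) -> bool:
        """Accept a ping only for a sequence still tracked for key."""
        if seq in self._seqs.get(key, ()):
            return True
        logger.warning("ignoring ping for unknown sequence %d of %s", seq, key)
        return False

    def latest(self, key: str) -> int | None:
        """Return the newest sequence for key, the only one trusted for liveness."""
        seqs = self._seqs.get(key)
        return seqs[-1] if seqs else None
```

A Pinger holding an evicted sequence simply has its pings ignored (with a warning), which bounds the per-entity state without ever reusing a sequence number.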
Changed in juju-core:
status: Triaged → Won't Fix
Changed in juju-core:
status: Won't Fix → New
importance: Low → High
tags: added: canonical-is
Changed in juju:
status: Triaged → In Progress
assignee: nobody → Menno Smits (menno.smits)
Changed in juju:
assignee: Menno Smits (menno.smits) → John A Meinel (jameinel)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Hi,
Not purging the presence collections leads to MongoDB performance issues that I believe are one of the causes of the high CPU usage by jujud and mongod, leading to overall performance issues.
We have a juju 2.1.1 controller running about 40 models, and a lot of time is spent reading the presence.beings collections: https://paste.ubuntu.com/24145949/
Some queries are returning over 20k items, and over 3MB of data! https://paste.ubuntu.com/24145956/
The distribution of items per model is the following: https://paste.ubuntu.com/24145961/
I believe fixing this will make juju2 more stable (and I don't really understand why this was triaged as "Won't fix").
Also, the presence.beings collection doesn't have any index by default (on our database, we have added one as recommended by axw).
Thanks!