prometheus takes way too long to join relations with telegraf

Bug #1756964 reported by Jason Hobbs
This bug affects 2 people
Affects: Prometheus Charm
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

We have 64 telegraf units in our bundle, and two relations between telegraf and prometheus: juju-info and prometheus-client.

juju status:
http://paste.ubuntu.com/p/sCvBSVRXXN/

bundle:
http://paste.ubuntu.com/p/67Qsrvmynd/

It looks like each telegraf unit is joining with prometheus twice, once for each relation, I assume. It doesn't seem like each telegraf unit should need to join juju-info with prometheus.

Either way, hooks are still firing three and a half hours after deploy, because each hook takes about a minute, and there are 64 units and 4 hooks. This causes our deploys to time out.
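
For reference, a rough back-of-the-envelope check of those numbers. This is only a sketch: it assumes the 4 hooks fire once per telegraf unit and that prometheus runs them strictly one after another, and it uses the approximate "about a minute" figure from above. The function and constant names are made up for illustration.

```python
# Rough estimate only; every input is an approximate figure from this report.
UNITS = 64            # telegraf units in the bundle
HOOKS_PER_UNIT = 4    # assumption: the 4 hooks run once per telegraf unit
MINUTES_PER_HOOK = 1  # "about a minute" per hook


def serialized_hook_minutes(units, hooks_per_unit, minutes_per_hook):
    """Total wall-clock minutes if every hook runs one after another."""
    return units * hooks_per_unit * minutes_per_hook


total = serialized_hook_minutes(UNITS, HOOKS_PER_UNIT, MINUTES_PER_HOOK)
hours, minutes = divmod(total, 60)
print(f"{total} minutes = {hours} h {minutes} min")
```

Under those assumptions the hooks alone need a bit over four hours, which fits with them still firing at the three and a half hour mark.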

Jason Hobbs (jason-hobbs) wrote :

Attached a crashdump from the run.

affects: collectd-charm → prometheus-charm
tags: added: canonical-bootstack
Peter Sabaini (peter-sabaini) wrote :

I've had what might be a similar issue, but when removing prometheus. It hung endlessly on removal, and on the unit I could see it looping on relation-get calls towards telegraf.

Jacek Nykis (jacekn)
Changed in prometheus-charm:
importance: Undecided → High
status: New → Triaged