juju 1.24 poor performance

Bug #1478232 reported by Ante Karamatić
This bug affects 1 person
Affects     Status    Importance  Assigned to  Milestone
juju-core   Triaged   High        Unassigned
1.24        Triaged   High        Unassigned

Bug Description

Using juju 1.24.3, I'm unable to finish a deployment that works fine with 1.22. I've retried the deployment at least 3 times in the last 24h and had no success. Observed problems:
 - juju debug-log stalls for long periods; the end result is debug-log printing messages up to 10 minutes old
 - juju status takes 3-5 minutes to return output
 - ran juju add-unit 10 minutes ago; juju has still not requested a new node
 - 3 units from a 190-node deployment are marked as pending, but were never requested from MAAS
 - forcefully terminating nodes does nothing

In general, the whole juju environment seems unusable. I'm not sure how to proceed. The deployment was started at 8:22AM; at 12:03 it's still not done. With 1.22 it took around 2h.

The environment has 192 nodes, of which 174 are used. This seems like a continuation of bug 1474195. While the memory leak is not there any more, the load on the juju state server is still rather high (11-14), and the logs and juju db have consumed ~8GB of disk in those 4h:
root@verifiable-sheet:~# du -hs /var/lib/juju/
3.0G /var/lib/juju/
root@verifiable-sheet:~# du -hs /var/log/juju/
2.3G /var/log/juju/
root@verifiable-sheet:~# du -hs /var/log/syslog
2.7G /var/log/syslog
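
To see how quickly these directories grow during a deployment, the du checks above could be sampled at intervals and compared. A minimal sketch, assuming a POSIX shell on the state server; the `sample_usage` function name is hypothetical and not part of juju:

```shell
#!/bin/sh
# Hypothetical helper (not part of juju): print one timestamped
# disk-usage sample per path, as "<UTC time> <size in KB> <path>".
sample_usage() {
    for path in "$@"; do
        # du -sk prints "<kilobytes><TAB><path>"; keep only the size.
        size_kb=$(du -sk "$path" 2>/dev/null | cut -f1)
        # Print NA when the path cannot be read.
        printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${size_kb:-NA}" "$path"
    done
}

# Paths taken from the du output above.
sample_usage /var/lib/juju /var/log/juju /var/log/syslog
```

Running this every few minutes and appending the output to a file would show which of the three paths accounts for most of the growth.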

Deployment is done with juju-deployer (version 0.4.3-0ubuntu1~ubuntu14.04.1~ppa1).

Tags: sts cpec
Ante Karamatić (ivoks)
description: updated
Changed in juju-core:
importance: Undecided → High
assignee: nobody → Menno Smits (menno.smits)
status: New → Triaged
Menno Finlay-Smits (menno.smits) wrote :

I've been doing controlled performance comparisons between 1.22.6 and 1.24.3 using a MAAS environment today and I haven't been able to find a significant difference. My test deployed 10 machines, each with 10 containers on them (with units being added with some parallelism), and both Juju versions completed in almost exactly the same time. This at least gives me some idea about where the problem /doesn't/ lie.

It's likely that there's some aspect of the charms or the additional scale of what you're deploying that's triggering the issue.

Can you please give me more detailed instructions of how to reproduce?

The state server logs would also be very helpful.

Menno Finlay-Smits (menno.smits) wrote :

I wonder if bug 1478024 could be related to what you're seeing.

Changed in juju-core:
assignee: Menno Smits (menno.smits) → nobody
Ante Karamatić (ivoks) wrote :

Here's the yaml used in this deployment. It has three stages: base, services, scale. Judging by the state of the nodes, it looks like even the 'services' stage hasn't settled down.

tags: added: sts
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: none → 1.25.0
Menno Finlay-Smits (menno.smits) wrote :

It's highly likely that the leadership rework which fixed 1478024 will also fix the performance issues reported here. These fixes went into 1.24.4/5. Please report here if you're still seeing these perf problems when using 1.24.5.
