juju 1.24 poor performance
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | Triaged | High | Unassigned | |
| 1.24 | Triaged | High | Unassigned | |
Bug Description
Using juju 1.24.3 I'm unable to finish a deployment that works fine with 1.22. I've retried the deployment at least 3 times in the last 24h and had no success. Observed problems (the commands involved are sketched after this list):
- juju debug-log stalls for long periods; the end result is debug-log printing messages up to 10 minutes old
- juju status takes 3-5 minutes to return output
- ran juju add-unit 10 minutes ago; juju has still not requested a new node
- 3 units from the 190-node deployment are marked as pending, but were never requested from MAAS
- forcefully terminating nodes does nothing
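For reference, a rough sketch of the commands behind the observations above (the service and machine names are placeholders, not taken from this deployment):

```
# Tail the environment's consolidated log; here it lags by up to 10 minutes.
juju debug-log

# Report environment state; here it takes 3-5 minutes instead of seconds.
juju status

# Request an additional unit (and therefore a new MAAS node).
juju add-unit <service>

# Forcefully remove a machine; in this environment it appears to do nothing.
juju destroy-machine --force <machine-id>
```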
In general, the whole juju environment seems unusable and I'm not sure how to proceed. The deployment was started at 8:22AM; at 12:03 it's still not done. With 1.22 it took around 2h.
The environment has 192 nodes, of which 174 are used. This seems like a continuation of bug 1474195. While the memory leak is no longer present, the load on the juju state server is still rather high (11-14), and the logs and juju DB have consumed ~8GB of disk in those 4h:
```
root@verifiable
3.0G /var/lib/juju/
root@verifiable
2.3G /var/log/juju/
root@verifiable
2.7G /var/log/syslog
```
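A sketch of how such figures can be gathered on the state server, assuming the truncated prompts above were du invocations (the exact commands aren't preserved in the report):

```
# Load average on the state server (reported as 11-14 above).
uptime

# On-disk size of Juju's database/agent data, Juju logs, and syslog.
du -sh /var/lib/juju/
du -sh /var/log/juju/
du -sh /var/log/syslog
```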
Deployment is done with juju-deployer (version 0.4.3-0ubuntu1~
description: updated
Changed in juju-core:
importance: Undecided → High
assignee: nobody → Menno Smits (menno.smits)
status: New → Triaged
Changed in juju-core:
assignee: Menno Smits (menno.smits) → nobody
tags: added: sts
Changed in juju-core:
milestone: none → 1.25.0
I've been doing controlled performance comparisons between 1.22.6 and 1.24.3 using a MAAS environment today and I haven't been able to find a significant difference. My test deployed 10 machines, each with 10 containers on them (with units being added with some parallelism), and both Juju versions completed in almost exactly the same time. This at least gives me some idea about where the problem /doesn't/ lie.
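For comparison, here is a rough sketch of what such a test might look like, assuming the stock ubuntu charm and LXC placement (the exact charm and placement used in the test aren't stated):

```
#!/bin/bash
# Hypothetical reconstruction of the comparison test: 10 machines,
# each hosting 10 container-based units, added with some parallelism.

# Bring up 10 host machines (numbered 1..10; machine 0 is the state server).
for m in $(seq 1 10); do
    juju add-machine
done

# Deploy a simple charm, placing its first unit in an LXC container on machine 1 ...
juju deploy ubuntu --to lxc:1

# ... then add the remaining units into containers, with some parallelism.
for m in $(seq 1 10); do
    for c in $(seq 1 10); do
        # Skip the slot already taken by the initial unit.
        [ "$m" -eq 1 ] && [ "$c" -eq 1 ] && continue
        juju add-unit ubuntu --to lxc:$m &
    done
    wait
done
```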
It's likely that there's some aspect of the charms or the additional scale of what you're deploying that's triggering the issue.
Can you please give me more detailed instructions on how to reproduce?
The state server logs would also be very helpful.