Autopilot Log Analyser

Bug #1655716
Comment #6

Francis Ginther (fginther) wrote on 2017-01-12: Re: bootstrap fails with various EOF responses from autopilot newton openstack deployment

I did some live debugging and found that the EOFs appear to be the result of nova getting timeouts from keystone. One of the keystone units was co-located on the same physical machine as the juju bootstrap unit. We've discovered from lp:1655169 that there is intense memory pressure on the bootstrap unit, so it's possible that keystone is slow because of swap thrashing. The bootstrap unit was also co-located with mysql, which has high memory demands of its own. That is unlikely to help.

One thing I haven't quite figured out yet is why this occurs just after the autopilot cloud is deployed. At that point in time, memory usage isn't yet high enough to push anything into swap (as determined by looking at landscape's memory graphs), so swap thrashing doesn't make sense. A couple of possible causes are that the cloud is not yet ready (for example, the keystone units aren't ready to accept requests yet) or that another performance issue is at play (such as high disk usage).
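
As a rough illustration of how swap activity could be checked directly on the bootstrap machine instead of inferred from memory graphs, the sketch below compares two samples of the `pswpin` and `pswpout` counters that Linux exposes in `/proc/vmstat`. It is not the tooling used in this investigation; the function names (`parse_vmstat`, `swap_rate`, `read_vmstat`) are hypothetical and chosen only for this example.

```python
def parse_vmstat(text):
    """Extract the pswpin/pswpout counters from /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, interval_s):
    """Pages swapped in and out per second between two samples."""
    return {
        key: (after[key] - before[key]) / interval_s
        for key in ("pswpin", "pswpout")
    }


def read_vmstat(path="/proc/vmstat"):
    """Take one sample from the running kernel (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Taking one `read_vmstat()` sample, waiting a few seconds, taking another, and passing both to `swap_rate` gives the swap-in and swap-out rates for that window. Rates that stay at or near zero while the EOFs occur would support the observation that swap thrashing is not the cause at that point in time.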

Will keep trying to collect more data and isolate the problem.