Comment 16 for bug 1707971

Jason Hobbs (jason-hobbs) wrote on 2017-08-21: Re: [Bug 1707971] Re: MAAS becomes unstable after rack controller restart

OK, well, I thought limiting to one IP on our machine would have the same
effect. Let me know when there is a build we can test.

On Mon, Aug 21, 2017 at 10:31 AM, Andres Rodriguez <email address hidden>
wrote:

> The reason you are probably still experiencing the issue is that we do
> not have a 2.2 or 2.3 build with this fix yet.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1707971
>
> Title:
> MAAS becomes unstable after rack controller restart
>
> Status in MAAS:
> Fix Committed
> Status in MAAS 2.2 series:
> Fix Committed
>
> Bug description:
> Problem
> =======
> We have an HA setup - 3 region API nodes and 2 rack controllers. When
> we restart a rack controller, the MAAS API becomes
> unresponsive/unstable/slow for a varying period of time. Sometimes it
> never responds to an API request, sometimes the UI shows as disconnected,
> sometimes an API request takes 30+ seconds to get a response, other times
> less than a second.
>
> Here's a 'zones read' call that fails once and then succeeds. This is
> done immediately after restarting both rack controllers:
> http://paste.ubuntu.com/25221159/
>
> The amount of time it stays this way varies - we currently have a 5
> minute sleep after restarting maas-rackd before trying to set up
> networks through the API and that isn't always long enough - we
> sometimes get API calls disconnected without a response.
>
> Also, the racks sometimes never show up as fully connected again. They
> show up as 8% connected here:
> http://paste.ubuntu.com/25221156/
>
> The logs are full of questionable stuff, "Successfully configured DNS"
> is repeated over and over:
> 2017-08-01 16:35:28 maasserver.region_controller: [info] Successfully
> configured DNS.
> 2017-08-01 16:35:30 maasserver.region_controller: [info] Successfully
> configured DNS.
> 2017-08-01 16:35:32 maasserver.region_controller: [info] Successfully
> configured DNS.
>
> So are errors like this:
> Failed to register rack controller '4shpr4' into the database.
> Connection will be dropped.
>
> And repeated messages like this:
> Aug 1 16:37:05 infra1 maas.rpc.rackcontrollers: [info] Existing rack
> controller 'infra2' has connected to region 'infra1'.
> Aug 1 16:37:12 infra1 maas.rpc.rackcontrollers: [info] Existing rack
> controller 'infra2' has connected to region 'infra1'.
> Aug 1 16:37:22 infra1 maas.service_monitor: [info] Service 'ntp' has
> been restarted. Its current state is 'on' and 'running'.
> Aug 1 16:37:52 infra1 maas.service_monitor: [info] Service 'ntp' has
> been restarted. Its current state is 'on' and 'running'.
>
> And this:
> 2017-08-01 16:37:39 provisioningserver.rpc.clusterservice: [critical]
> Failed to contact region. (While requesting RPC info at b'http://
> [::ffff:10.245.208.33]/MAAS/rpc/').
>
>
> Expected Behavior
> =================
> - Restarting a rack controller should not affect region controller API
> availability. We should be able to restart rack controllers and immediately
> use the API.
> - Restarted rack controllers should not remain in a 'degraded' 8%
> connected state.
>
> We're using 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1707971/+subscriptions
>
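
The bug description mentions a fixed 5 minute sleep after restarting maas-rackd that is not always long enough. As a rough sketch only, and not something taken from MAAS itself, a polling loop that waits for several consecutive successful API calls might serve better than a fixed sleep. Everything here is an assumption for illustration: the function name wait_until_stable, its parameters, and the probe callable, which stands in for whatever API call the caller chooses (for example a wrapper around the 'zones read' call) and returns True on success.

```python
import time
from typing import Callable


def wait_until_stable(
    probe: Callable[[], bool],
    required_successes: int = 3,
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it succeeds `required_successes` times in a row.

    `probe` is a hypothetical caller-supplied check; it should return True
    when an API request succeeded. Exceptions count as failures.
    Returns True once the streak is reached, False if `timeout` expires.
    """
    deadline = clock() + timeout
    streak = 0
    while True:
        try:
            ok = bool(probe())
        except Exception:
            # A dropped connection or error response resets the streak.
            ok = False
        streak = streak + 1 if ok else 0
        if streak >= required_successes:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Requiring a streak rather than a single success is a deliberate choice here, since the report shows calls that succeed and fail intermittently during the unstable period.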
