Ok well I thought limiting to one IP on our machine would have the same
effect. Let me know when there is a build we can test.
On Mon, Aug 21, 2017 at 10:31 AM, Andres Rodriguez <email address hidden>
wrote:
> The reason you are probably still experiencing the issue is that we do
> not have a 2.2 or 2.3 build with this fix yet.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1707971
>
> Title:
> MAAS becomes unstable after rack controller restart
>
> Status in MAAS:
> Fix Committed
> Status in MAAS 2.2 series:
> Fix Committed
>
> Bug description:
> Problem
> =======
> We have an HA setup - 3 region API nodes and 2 rack controllers. When
> we restart a rack controller, the MAAS API becomes
> unresponsive/unstable/slow for a varying period of time. Sometimes it
> never responds to an API request, sometimes the UI shows as disconnected,
> sometimes an API request takes 30+ seconds to get responded to, other times
> less than a second.
>
> Here's a 'zones read' call that fails once and then succeeds. This is
> done immediately after restarting both rack controllers:
> http://paste.ubuntu.com/25221159/
>
> The amount of time it stays this way varies - we currently have a 5
> minute sleep after restarting maas-rackd before trying to set up
> networks through the API and that isn't always long enough - we
> sometimes get API calls disconnected without a response.
>
> Also, the racks sometimes never show up as fully connected again. They
> show up as 8% connected here:
> http://paste.ubuntu.com/25221156/
>
> The logs are full of questionable stuff, "Successfully configured DNS"
> is repeated over and over:
> 2017-08-01 16:35:28 maasserver.region_controller: [info] Successfully
> configured DNS.
> 2017-08-01 16:35:30 maasserver.region_controller: [info] Successfully
> configured DNS.
> 2017-08-01 16:35:32 maasserver.region_controller: [info] Successfully
> configured DNS.
>
> So are errors like this:
> Failed to register rack controller '4shpr4' into the database.
> Connection will be dropped.
>
> And repeated messages like this:
> Aug 1 16:37:05 infra1 maas.rpc.rackcontrollers: [info] Existing rack
> controller 'infra2' has connected to region 'infra1'.
> Aug 1 16:37:12 infra1 maas.rpc.rackcontrollers: [info] Existing rack
> controller 'infra2' has connected to region 'infra1'.
> Aug 1 16:37:22 infra1 maas.service_monitor: [info] Service 'ntp' has
> been restarted. Its current state is 'on' and 'running'.
> Aug 1 16:37:52 infra1 maas.service_monitor: [info] Service 'ntp' has
> been restarted. Its current state is 'on' and 'running'.
>
> And this:
> 2017-08-01 16:37:39 provisioningserver.rpc.clusterservice: [critical]
> Failed to contact region. (While requesting RPC info at
> b'http://[::ffff:10.245.208.33]/MAAS/rpc/').
>
>
> Expected Behavior
> =================
> - Restarting a rack controller should not affect region controller API
> availability. We should be able to restart rack controllers and immediately
> use the API.
> - Restarted rack controllers should not remain in a 'degraded' 8%
> connected state.
>
> We're using 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1707971/+subscriptions
>