Master Cluster fails to connect after importing multiple images and multiple subarchs in 1.7 and 1.8
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Blake Rouse |
1.8 | Fix Released | Critical | Blake Rouse |
Bug Description
Maas Version: 1.7.5+bzr3369-
Boot Images: http://
Problem:
Last night, while monitoring our MAAS server, I was notified that the master cluster had disconnected. Looking through the logs, I found that the following scrolls over and over after restarting the maas-clusterd service:
2015-07-08 12:48:44-0400 [ClusterClient,
Traceback (most recent call last):
File "/usr/lib/
self.
File "/usr/lib/
deferred.
File "/usr/lib/
callbackKe
File "/usr/lib/
self.
--- <exception caught here> ---
File "/usr/lib/
current.result = callback(
File "/usr/lib/
aBox.
File "/usr/lib/
proto.
File "/usr/lib/
self.
File "/usr/lib/
raise TooLong(False, True, v, k)
twisted.
2015-07-08 12:48:44-0400 [ClusterClient,
==> /var/log/
INFO 2015-07-08 12:48:44,186 twisted RegionServer connection lost (HOST:IPv4Addre
ERROR 2015-07-08 12:48:44,187 django.request Internal Server Error: /MAAS/clusters/
Traceback (most recent call last):
File "/usr/lib/
response = response.render()
File "/usr/lib/
self.content = self.rendered_
File "/usr/lib/
content = template.
File "/usr/lib/
return self._render(
File "/usr/lib/
return self.nodelist.
File "/usr/lib/
bit = self.render_
File "/usr/lib/
return node.render(
File "/usr/lib/
return compiled_
File "/usr/lib/
return self.nodelist.
File "/usr/lib/
bit = self.render_
File "/usr/lib/
return node.render(
File "/usr/lib/
result = block.nodelist.
File "/usr/lib/
bit = self.render_
File "/usr/lib/
return node.render(
File "/usr/lib/
nodelist.
File "/usr/lib/
return self.render_
File "/usr/lib/
output = template.
File "/usr/lib/
return self._render(
File "/usr/lib/
return self.nodelist.
File "/usr/lib/
bit = self.render_
File "/usr/lib/
return node.render(
File "/usr/lib/
six.
File "/usr/lib/
obj = self.var.
File "/usr/lib/
value = self._resolve_
File "/usr/lib/
current = current()
File "/usr/lib/
images = get_boot_
File "/usr/lib/
return func(*args, **kwargs)
File "/usr/lib/
return call.wait(
File "/usr/lib/
result.
File "<string>", line 2, in raiseException
ConnectionDone: Connection was closed cleanly.
ERROR 2015-07-08 12:48:46,083 maasserver Unable to get RPC connection for cluster 'Cluster master' (87685582-
ERROR 2015-07-08 12:48:46,084 maasserver Unable to get RPC connection for cluster 'Cluster master' (87685582-
ERROR 2015-07-08 12:48:46,086 maasserver Unable to get RPC connection for cluster 'Cluster master' (87685582-
ERROR 2015-07-08 12:48:46,089 maasserver Unable to get RPC connection for cluster 'Cluster master' (87685582-
The master cluster had been operating fine for the last few weeks without any issues; this appeared out of the blue. Because the errors appear to be related to images, I removed the Wily images and reimported. After doing this, the master cluster reconnected and synced as expected.
I cleared the contents of /var/lib/
For now I have removed the 15.10 images and am filing this bug, hoping to get to the bottom of the problem.
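A note on the `raise TooLong(False, True, v, k)` line in the cluster traceback: Twisted's AMP protocol prefixes each value with a 16-bit length, so a single value cannot exceed 65535 bytes, and the flags passed here (not a key, is a value) indicate a value went over that limit. One plausible reading, consistent with the problem going away once images were removed, is that the boot image listing grew past the limit as more releases and subarchitectures were imported. The sketch below only illustrates that arithmetic. The field names, the release and subarchitecture lists, and the use of JSON as a stand-in for AMP's own encoding are all assumptions for illustration, not MAAS's actual wire format.

```python
import json

# AMP length-prefixes each value with 2 bytes, so one value is limited
# to 65535 bytes (twisted.protocols.amp.MAX_VALUE_LENGTH).
MAX_VALUE_LENGTH = 0xFFFF


def make_images(releases, arches, subarches, labels, purposes):
    """Build a hypothetical boot image listing, one dict per combination."""
    return [
        {
            "osystem": "ubuntu",
            "release": release,
            "architecture": arch,
            "subarchitecture": subarch,
            "label": label,
            "purpose": purpose,
        }
        for release in releases
        for arch in arches
        for subarch in subarches
        for label in labels
        for purpose in purposes
    ]


def encoded_size(images):
    """Bytes needed to send the whole listing as one value.

    JSON is used here only as a stand-in for the real AMP encoding.
    """
    return len(json.dumps(images).encode("utf-8"))


def fits_in_amp_value(images):
    """True if the listing would fit in a single AMP value."""
    return encoded_size(images) <= MAX_VALUE_LENGTH


# One release, one architecture, one subarchitecture: a tiny listing.
small = make_images(
    ["trusty"], ["amd64"], ["generic"], ["release"],
    ["install", "commissioning", "xinstall"],
)

# Several releases, architectures and subarchitectures (illustrative names).
large = make_images(
    ["precise", "trusty", "utopic", "vivid", "wily"],
    ["amd64", "i386", "arm64", "armhf"],
    ["generic", "hwe-p", "hwe-q", "hwe-r", "hwe-s",
     "hwe-t", "hwe-u", "hwe-v", "hwe-w", "xgene-uboot"],
    ["release"],
    ["install", "commissioning", "xinstall"],
)

print(len(small), fits_in_amp_value(small))
print(len(large), fits_in_amp_value(large))
```

The point is that the listing size grows with the product of releases, architectures, subarchitectures and purposes, so adding one more release with many subarchitectures can be enough to push a previously working cluster over the limit.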
Related branches
- Blake Rouse (community): Approve
-
Diff: 20 lines (+2/-1), 1 file modified: src/provisioningserver/rpc/cluster.py (+2/-1)
- Blake Rouse (community): Approve
-
Diff: 20 lines (+2/-1), 1 file modified: src/provisioningserver/rpc/cluster.py (+2/-1)
- Blake Rouse (community): Approve
-
Diff: 20 lines (+2/-1), 1 file modified: src/provisioningserver/rpc/cluster.py (+2/-1)
- Gavin Panella (community): Approve
-
Diff: 442 lines (+222/-36), 8 files modified:
src/maasserver/bootresources.py (+13/-2)
src/maasserver/clusterrpc/boot_images.py (+25/-5)
src/maasserver/clusterrpc/tests/test_boot_images.py (+75/-1)
src/maasserver/tests/test_bootresources.py (+65/-23)
src/provisioningserver/rpc/cluster.py (+24/-0)
src/provisioningserver/rpc/clusterservice.py (+9/-0)
src/provisioningserver/rpc/tests/test_clusterservice.py (+9/-4)
src/provisioningserver/rpc/tests/test_docs.py (+2/-1)
- Blake Rouse (community): Approve
-
Diff: 373 lines (+165/-44), 8 files modified:
src/maasserver/api/tests/test_events.py (+2/-2)
src/maasserver/clusterrpc/boot_images.py (+43/-35)
src/maasserver/clusterrpc/tests/test_boot_images.py (+75/-1)
src/maasserver/utils/version.py (+1/-1)
src/provisioningserver/rpc/cluster.py (+24/-0)
src/provisioningserver/rpc/clusterservice.py (+9/-0)
src/provisioningserver/rpc/tests/test_clusterservice.py (+9/-4)
src/provisioningserver/rpc/tests/test_docs.py (+2/-1)
- Blake Rouse (community): Approve
-
Diff: 615 lines (+162/-172), 7 files modified:
src/maasserver/clusterrpc/boot_images.py (+43/-35)
src/maasserver/clusterrpc/tests/test_boot_images.py (+75/-1)
src/maasserver/utils/version.py (+0/-131)
src/provisioningserver/rpc/cluster.py (+24/-0)
src/provisioningserver/rpc/clusterservice.py (+9/-0)
src/provisioningserver/rpc/tests/test_clusterservice.py (+9/-4)
src/provisioningserver/rpc/tests/test_docs.py (+2/-1)
tags: added: hs-arm64
tags: removed: hs-arm64
Changed in maas:
milestone: none → 1.9.0
Changed in maas:
status: Triaged → Fix Committed
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
tags: added: amp
no longer affects: maas/1.7
Changed in maas:
status: Fix Committed → Fix Released
Since I updated to MAAS 1.7.5, version 1.8.0+bzr4001-0ubuntu2~trusty1 has hit the stable PPA. I'm still a little hesitant to do a somewhat massive upgrade, since this MAAS controller manages much of our lab hardware.