Commissioning x86_64 node never completes, sitting at grub prompt, pserv py tbs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Raphaël Badin |
1.8 | Fix Released | Critical | Raphaël Badin |
python-tx-tftp | New | Undecided | Gavin Panella |
python-tx-tftp (Ubuntu) | Fix Released | Undecided | Unassigned |
Trusty | Fix Released | Undecided | Unassigned |
Utopic | Won't Fix | Undecided | Unassigned |
Vivid | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
When TFTP booting with UEFI, the TFTP server would produce a stack trace when terminating the transfer. This led to boot issues on some UEFI machines.
[Test Case]
1. Install MAAS
2. Set up UEFI on the machine to PXE boot from MAAS
3. UEFI boot the machine; without the fix, it fails because tftp crashes.
4. With the fix, UEFI boot the machine; it succeeds because tftp doesn't crash.
[Regression Potential]
Minimal. This has been tested and QA'd, and proven to work as expected.
ubuntu 14.04LTS + MaaS 1.5 on x86_64
Controller:
esxi vm xeon + vmnet3/ixgbe
Nodes:
supermicro twinblades
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
128GB RAM
2@ ige
2@ ixgbe <<< used for PXE booting
Trying to add physical nodes configured for Trusty Tahr amd64. IPMI powerctl cycles the node, tftp's two boot files, then commissioning goes out to lunch:
15:12:11.465976 IP 0.0.0.0.bootpc > 255.255.
15:12:11.468982 IP 172.30.
15:12:11.475270 IP 172.30.255.101.1294 > 172.30.193.38.tftp: 41 RRQ "bootx64.efi" octet tsize 0 blksize 1468
15:12:11.535326 IP 172.30.255.101.1295 > 172.30.193.38.tftp: 33 RRQ "bootx64.efi" octet blksize 1468
15:12:12.024716 IP 172.30.255.101.1296 > 172.30.193.38.tftp: 33 RRQ "/grubx64.efi" octet blksize 512
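For readers decoding the capture above, here is a minimal sketch of how a TFTP read request is laid out on the wire: a 2-byte opcode, then NUL-terminated filename and mode (RFC 1350), then optional NUL-terminated name/value option pairs such as tsize and blksize (RFC 2347). The function names `build_rrq` and `parse_rrq` are made up for this illustration and are not part of MAAS or python-tx-tftp.

```python
import struct

OP_RRQ = 1  # opcode of a read request (RFC 1350)


def build_rrq(filename, mode="octet", options=None):
    """Encode a read request; options are appended as RFC 2347 pairs."""
    fields = [filename, mode]
    for name, value in (options or {}).items():
        fields += [name, str(value)]
    # Every field is an ASCII string terminated by a single NUL byte.
    body = b"".join(field.encode("ascii") + b"\x00" for field in fields)
    return struct.pack("!H", OP_RRQ) + body


def parse_rrq(packet):
    """Decode a read request into (filename, mode, options)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_RRQ:
        raise ValueError("not a read request: opcode %d" % opcode)
    parts = packet[2:].split(b"\x00")
    if parts[-1] != b"":
        raise ValueError("last field is not NUL-terminated")
    fields = [part.decode("ascii") for part in parts[:-1]]
    if len(fields) < 2 or len(fields) % 2:
        raise ValueError("expected filename, mode and option pairs")
    options = dict(zip(fields[2::2], fields[3::2]))
    return fields[0], fields[1], options


# The first request in the capture: tcpdump reports it as 41 bytes.
first = build_rrq("bootx64.efi", options={"tsize": 0, "blksize": 1468})
print(len(first), parse_rrq(first))
```

The encoded lengths line up with the sizes tcpdump prints (41, 33 and 33 bytes), which suggests the three requests in the capture are well-formed and differ only in their options.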
These tb's coincide with above traffic and node sitting at the grub prompt indefinitely:
2014-05-08 15:12:11-0700 [-] Starting protocol <tftp.bootstrap
2014-05-08 15:12:11-0700 [RemoteOriginRe
Traceback (most recent call last):
File "/usr/lib/
return context.
File "/usr/lib/
return self.currentCon
File "/usr/lib/
return func(*args,**kw)
File "/usr/lib/
why = selectable.doRead()
--- <exception caught here> ---
File "/usr/lib/
self.
File "/usr/lib/
datagram = TFTPDatagramFac
File "/usr/lib/
return datagram_
File "/usr/lib/
raise InvalidErrorcod
tftp.errors.
2014-05-08 15:12:11-0700 [RemoteOriginRe
2014-05-08 15:12:11-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1295): <RRQDatagram(
2014-05-08 15:12:11-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1295): <RRQDatagram(
2014-05-08 15:12:11-0700 [-] RemoteOriginRea
2014-05-08 15:12:11-0700 [-] RemoteOriginRea
2014-05-08 15:12:11-0700 [-] Starting protocol <tftp.bootstrap
2014-05-08 15:12:11-0700 [-] Starting protocol <tftp.bootstrap
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [-] (UDP Port 43143 Closed)
2014-05-08 15:12:12-0700 [-] (UDP Port 43143 Closed)
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap
2014-05-08 15:12:12-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1296): <RRQDatagram(
2014-05-08 15:12:12-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1296): <RRQDatagram(
2014-05-08 15:12:12-0700 [-] RemoteOriginRea
2014-05-08 15:12:12-0700 [-] RemoteOriginRea
2014-05-08 15:12:12-0700 [-] Starting protocol <tftp.bootstrap
2014-05-08 15:12:12-0700 [-] Starting protocol <tftp.bootstrap
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [RemoteOriginRe
2014-05-08 15:12:12-0700 [-] (UDP Port 56400 Closed)
2014-05-08 15:12:12-0700 [-] (UDP Port 56400 Closed)
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap
2014-05-08 15:12:13-0700 [-] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/
self.config, oldstdout, oldstderr, self.profiler, reactor)
File "/usr/lib/
reactor.run()
File "/usr/lib/
self.
File "/usr/lib/
self.
--- <exception caught here> ---
File "/usr/lib/
call.
File "/usr/lib/
self.
File "/usr/lib/
return self.socket.
exceptions.
2014-05-08 15:12:13-0700 [-] Logged OOPS id OOPS-4ad4c1419556eb88cc72311fd54f737b: AttributeError: 'Port' object has no attribute 'socket'
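The first traceback above ends while decoding an incoming datagram, with an exception whose name is truncated in the log. One possible reading, stated here as an assumption rather than a diagnosis, is that the client sent a TFTP ERROR datagram carrying a code the parser did not recognise. RFC 1350 defines error codes 0 to 7, and RFC 2347 adds code 8 for a transfer terminated during option negotiation. The sketch below is not python-tx-tftp's code and not the MAAS patch; `parse_error`, `UnknownErrorCode` and the table names are hypothetical and only show how a parser limited to the RFC 1350 table would reject such a datagram.

```python
import struct

OP_ERROR = 5  # opcode of an ERROR datagram (RFC 1350)

# Error codes defined by RFC 1350.
RFC1350_CODES = {
    0: "Not defined, see error message",
    1: "File not found",
    2: "Access violation",
    3: "Disk full or allocation exceeded",
    4: "Illegal TFTP operation",
    5: "Unknown transfer ID",
    6: "File already exists",
    7: "No such user",
}

# RFC 2347 adds one code for aborting after option negotiation.
RFC2347_CODES = {8: "Transfer terminated due to option negotiation"}


class UnknownErrorCode(ValueError):
    """Hypothetical exception for codes missing from the lookup table."""


def parse_error(packet, known_codes):
    """Decode an ERROR datagram into (code, message)."""
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != OP_ERROR:
        raise ValueError("not an ERROR datagram: opcode %d" % opcode)
    if code not in known_codes:
        raise UnknownErrorCode(code)
    # The message is a NUL-terminated string; fall back to the table text.
    message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
    return code, message or known_codes[code]


# An illustrative datagram with code 8 and an empty message.
abort = struct.pack("!HH", OP_ERROR, 8) + b"\x00"

try:
    parse_error(abort, RFC1350_CODES)
except UnknownErrorCode as exc:
    print("rejected code", exc)

print(parse_error(abort, {**RFC1350_CODES, **RFC2347_CODES}))
```

Under that assumption, extending the lookup table is enough for the parser to accept the datagram instead of raising.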
Nodes and controller are on the same untagged subnet but there is an lldp'd link between the bladeserver's onboard xgb switches and the controller's connected xgb Arista.
root@pre-
Desired=
| Status=
|/ Err?=(none)
||/ Name Version Architecture Description
+++-===
ii maas 1.5+bzr2252-
ii maas-cli 1.5+bzr2252-
ii maas-cluster-
ii maas-common 1.5+bzr2252-
ii maas-dhcp 1.5+bzr2252-
ii maas-dns 1.5+bzr2252-
ii maas-region-
ii maas-region-
ii python-django-maas 1.5+bzr2252-
ii python-maas-client 1.5+bzr2252-
ii python-
Repro:
This is a pretty standard initial configuration afaict, following the provided instructions. I notice there are no grub.cfg-* anywhere, only the grub.cfg template. Could that be why none of the nodes are doing anything once they're in the grub shell?
root@pre-
# MAAS GRUB2 pre-loader configuration file
# Load based on MAC address first.
configfile (pxe)/grub/
# Failed to load based on MAC address.
# Load amd64 by default, UEFI only supported by 64-bit
configfile (pxe)/grub/
root@pre-
total 4
-rw-r--r-- 1 root root 270 May 6 18:23 grub.cfg
root@pre-
/boot/grub/grub.cfg
/usr/share/
/var/lib/
The controller VM is connected to an unrouted internal private network and to the external lab network, which is not used by MaaS. Nodes are only connected to the private network. The controller manages tftp, dhcp and dns, and the ip helper points to its private IP.
Nodes are configured for 'Default Ubuntu Release' Trusty Tahr. Boot images:
4 trusty amd64 generic commissioning release May 6, 2014, 6:23 p.m.
7 trusty amd64 generic install release May 6, 2014, 6:23 p.m.
3 trusty amd64 generic xinstall release May 6, 2014, 6:23 p.m.
5 trusty i386 generic commissioning release May 6, 2014, 6:23 p.m.
12 trusty i386 generic install release May 6, 2014, 6:23 p.m.
9 trusty i386 generic xinstall release May 6, 2014, 6:23 p.m.
6 precise amd64 generic commissioning release May 6, 2014, 6:23 p.m.
11 precise amd64 generic install release May 6, 2014, 6:23 p.m.
10 precise amd64 generic xinstall release May 6, 2014, 6:23 p.m.
2 precise i386 generic commissioning release May 6, 2014, 6:23 p.m.
8 precise i386 generic install release May 6, 2014, 6:23 p.m.
1 precise i386 generic xinstall release May 6, 2014, 6:23 p.m.
Related branches
- Gavin Panella (community): Approve
Diff: 110 lines (+49/-2), 4 files modified
src/provisioningserver/monkey.py (+13/-0)
src/provisioningserver/plugin.py (+5/-1)
src/provisioningserver/tests/test_monkey.py (+22/-1)
src/provisioningserver/tests/test_plugin.py (+9/-0)
- Raphaël Badin (community): Approve
Diff: 110 lines (+49/-2), 4 files modified
src/provisioningserver/monkey.py (+13/-0)
src/provisioningserver/plugin.py (+5/-1)
src/provisioningserver/tests/test_monkey.py (+22/-1)
src/provisioningserver/tests/test_plugin.py (+9/-0)
Changed in maas:
status: Triaged → Incomplete
assignee: Andres Rodriguez (andreserl) → nobody
Changed in maas:
status: Expired → Confirmed
Changed in maas:
milestone: none → 1.7.2
tags: removed: patch
Changed in maas:
assignee: Gavin Panella (allenap) → Andres Rodriguez (andreserl)
Changed in maas:
milestone: 1.7.2 → 1.7.3
description: updated
no longer affects: maas/1.7
Looks related to the recent UEFI work, passing over to Andres.