MAAS deployment failures on server with Redfish
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Incomplete | High | Unassigned |
3.3 | Triaged | High | Unassigned |
Bug Description
On an Ampere Altra based server (flavio) in the Server Certification lab, deployments are failing when the node is configured to use Redfish. The following appears in the node's deployment log in the web UI:
Fri, 03 Feb. 2023 15:19:33 Failed to power on node - Power on for the node failed: Failed to complete power action: Redfish request failed with response status code: 400.
Fri, 03 Feb. 2023 15:19:33 Node changed status - From 'Deploying' to 'Failed deployment'
Fri, 03 Feb. 2023 15:19:33 Marking node failed - Power on for the node failed: Failed to complete power action: Redfish request failed with response status code: 400.
Fri, 03 Feb. 2023 15:18:28 Powering on
Fri, 03 Feb. 2023 15:18:27 Node - Started deploying 'flavio'.
Fri, 03 Feb. 2023 15:18:27 Deploying
I'm attaching additional MAAS log files to this bug report.
The deployment fails after the node has powered on but while it's still in POST; it doesn't even get to the point where it PXE-boots for the first time.
Two other nodes on our network also use Redfish and appear to be unaffected. The affected server has an ARM64 CPU and is running the latest firmware. I'm 95% certain that it worked when it was first installed a few months ago. We first noticed this problem on 17 January, 2023. Our current MAAS version is 3.2.6-12016-
As a workaround, we can set the server to use IPMI rather than Redfish.
For some reason, `set_pxe_boot` returned 400, even though we send the same valid static JSON to all machines. This looks like a vendor-specific issue, although further investigation is required.
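To help narrow this down outside MAAS, here is a minimal sketch of the kind of request `set_pxe_boot` is expected to issue, plus a helper for reading the BMC's explanation of a 400. It assumes the static JSON is the standard Redfish one-time boot override (`BootSourceOverrideEnabled` / `BootSourceOverrideTarget`) sent as a PATCH to the system resource. The function names, the `/redfish/v1/Systems/<id>` path layout, and the use of HTTP Basic auth are illustrative assumptions, not taken from the MAAS source or the attached logs.

```python
import base64
import json
import urllib.request


def build_pxe_boot_request(base_url, system_id, user, password):
    """Build (but do not send) a Redfish PATCH asking for a one-time PXE boot.

    Hypothetical helper: the payload is the standard Redfish boot override,
    assumed here to match what MAAS sends.
    """
    payload = {
        "Boot": {
            "BootSourceOverrideEnabled": "Once",
            "BootSourceOverrideTarget": "Pxe",
        }
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/redfish/v1/Systems/{system_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def extended_error_messages(body):
    """Extract human-readable messages from a Redfish error response body.

    Redfish error responses normally carry an "error" object whose
    "@Message.ExtendedInfo" list explains why the request was rejected.
    """
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        return []  # not JSON, or not a JSON object
    messages = [
        info.get("Message", "")
        for info in error.get("@Message.ExtendedInfo", [])
    ]
    messages = [m for m in messages if m]
    fallback = error.get("message")
    return messages or ([fallback] if fallback else [])
```

Sending the request with `urllib.request.urlopen`, catching `urllib.error.HTTPError`, and passing `exc.read()` to `extended_error_messages` should show which property the BMC objects to, if the firmware populates the extended info. A BMC with a self-signed certificate will likely also need a custom SSL context.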