I seem to have at least one system with a BMC that handles ('/usr/sbin/ipmipower', '-W', 'opensesspriv', '--driver-type', 'LAN_2_0', '-h', '1.2.3.4', '-u', 'maas', '-p', 'xxxxxxxxx', '--cycle', '--on-if-off') poorly. Based on its name, I'd expect that command to have no effect if the power is already on, and to power the node on if it's off. This node doesn't handle it that way.
What happens is that if the power is on when this command is issued, it cycles off for a bit, then back on. The period of this cycle is just beyond the max delay between the command and the --stat check in maas.drivers.power (it usually takes this node about 15 seconds for power to settle back into the "on" state; the max delay is 12 seconds).
So MAAS issues --cycle --on-if-off, waits 4 seconds, then issues --stat and sees that power is off.
It then does the same thing with an 8 second wait, and again with a 12 second wait. Each --cycle pushes out the time at which the node will be 'on' by about 15 seconds. As a result, MAAS never sees the power transition to on, and concludes that it failed to power on the node.
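To make the timing concrete, here is a minimal sketch of the behavior described above. It is a simplified model, not MAAS code: it assumes every --cycle --on-if-off restarts a fixed settle timer on the BMC (about 15 seconds on this node) and that --stat is checked exactly once, at the end of each wait. The function name power_on_observed is made up for illustration.

```python
def power_on_observed(wait_times, settle_seconds):
    """Return the 1-based attempt at which --stat would report "on", or None.

    Simplified model (assumption): each --cycle --on-if-off restarts the
    BMC's settle timer, so earlier waits never count toward a later attempt.
    """
    for attempt, wait in enumerate(wait_times, start=1):
        # The --stat check only sees "on" if this single wait covers
        # the whole settle period.
        if wait >= settle_seconds:
            return attempt
    return None


if __name__ == "__main__":
    for waits in [(4, 8, 12), (4, 8, 16), (4, 8, 24)]:
        print(waits, "->", power_on_observed(waits, settle_seconds=15))
```

Under this model, no attempt with waits of (4, 8, 12) can succeed against a 15 second settle time, while any schedule whose final wait is at least 15 seconds succeeds on the last attempt.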
It seems like this is a defect (or perhaps simply a missing feature?) in this server's BMC firmware. However, I suspect this may not be uncommon in the wild, and perhaps MAAS could be more resilient in the face of such behavior.
One simple workaround might be to increase the final delay time in IPMIPowerDriver.wait_times to be fairly large; for example wait_times = (4, 8, 24). There are probably more elegant ways to deal with this, though, perhaps with more frequent --stat checks between --cycle commands, so the full wait isn't taken if the BMC responds quickly.
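The "more frequent --stat checks" idea could look roughly like the sketch below. This is not the actual IPMIPowerDriver code: wait_for_power_on, issue_power_on, query_power_state and poll_interval are hypothetical names, and the two callables stand in for whatever runs ipmipower with --cycle --on-if-off and --stat. The sleep and clock parameters exist only so the loop can be exercised without real delays.

```python
import time


def wait_for_power_on(issue_power_on, query_power_state,
                      wait_times=(4, 8, 24), poll_interval=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Issue the power command, then poll the state until the wait expires.

    Returns True as soon as the state reads "on", so a BMC that responds
    quickly does not pay for the full wait. Returns False if every wait
    in wait_times expires first.
    """
    for wait in wait_times:
        issue_power_on()              # hypothetical: --cycle --on-if-off
        deadline = clock() + wait
        while True:
            remaining = deadline - clock()
            if remaining <= 0:
                break                 # this wait is used up; retry
            # Sleep before checking so the BMC has time to start acting.
            sleep(min(poll_interval, remaining))
            if query_power_state() == "on":   # hypothetical: --stat
                return True
    return False
```

One caveat with polling: if the BMC is slow to begin the cycle, an early --stat could still report "on" just before the power drops, so a real implementation might need a minimum delay before the first check.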
--
I can attach some info about this specific BMC if that's helpful --- what's the best way to obtain this? The commissioning output doesn't seem to have anything that looks relevant.
nturner@maas1:/usr/lib/python3/dist-packages/provisioningserver/drivers/power$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==================================-============-=================================================
ii maas 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS server common files
ii maas-dhcp 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS DHCP server
ii maas-dns 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS DNS server
ii maas-proxy 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.0.0~rc3+bzr5180-0ubuntu2~16.04.1 all MAAS server provisioning libraries (Python 3)
nturner@maas1:/usr/lib/python3/dist-packages/provisioningserver/drivers/power$
In the short term, would it be possible to increase the final IPMIPowerDriver.wait_times value?
Something like (4, 8, 16) isn't a huge change and would probably be good enough for many systems, and has a nice exponential shape to it...