ipmitool "timing" flags are not working as expected, causing failure to manage power of baremetal nodes
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Ironic Conductor Charm | Fix Released | Undecided | Unassigned | |
| ironic (Ubuntu) | New | Undecided | Unassigned | |
Bug Description
In a focal-ussuri cloud environment with some packet loss between the ironic-conductor and the BMC network, I'm seeing intermittent ipmitool failures caused by timeouts.
The root issue is that running:
ipmitool -R 12 -N 5 <command>
causes ipmitool to hang for 60 seconds (12 commands are sent even though the session is never properly established) and then time out within the ironic-conductor application. This leaves the node in the "clean failed" state when transitioning it from 'manage' to 'provide'.
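As a rough check on the 60-second figure: under a simplified model in which each of the `-R` attempts waits the full `-N` timeout before giving up, the worst-case blocking time of a single ipmitool invocation is their product. The function name below is hypothetical, and actual ipmitool timing may vary by version and interface.

```python
def worst_case_hang_seconds(retries: int, timeout_s: int) -> int:
    """Simplified upper bound on how long one ipmitool call can block.

    Assumes each of `retries` attempts waits the full `timeout_s`
    before giving up (a model, not ipmitool's documented behaviour).
    """
    return retries * timeout_s


# Values from this report: -R 12 -N 5
print(worst_case_hang_seconds(12, 5))  # 60
```

This matches the roughly 60-second hang observed, which is long enough for the conductor to give up on the operation.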
Ultimately, it appears that Ussuri runs code that checks whether ipmitool accepts the -R and -N flags. If it does, Ironic does not retry ipmitool itself and instead relies on ipmitool to perform all of the retries.
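The choice described here can be sketched as follows. This is not Ironic's actual code: `plan_ipmitool_calls` and its parameters (including `delegate_retries_to_ipmitool`, a stand-in for an operator-facing switch) are hypothetical. The point is the trade-off: when ipmitool supports `-R`/`-N` and retries are delegated, the conductor makes one long-running call; otherwise it makes several separate calls and retries itself.

```python
from typing import List, Tuple


def plan_ipmitool_calls(retries: int, timeout_s: int,
                        supports_retry_flags: bool,
                        delegate_retries_to_ipmitool: bool) -> Tuple[List[str], int]:
    """Return (extra ipmitool flags, number of separate ipmitool runs).

    Hypothetical helper modelling the choice between letting ipmitool
    retry internally and having the caller re-run ipmitool itself.
    """
    if supports_retry_flags and delegate_retries_to_ipmitool:
        # One process; ipmitool handles all retries internally.
        return ["-R", str(retries), "-N", str(timeout_s)], 1
    # No retry flags passed (ipmitool falls back to its own defaults per run);
    # the caller re-runs ipmitool as separate processes instead.
    return [], retries


print(plan_ipmitool_calls(12, 5, True, True))   # (['-R', '12', '-N', '5'], 1)
print(plan_ipmitool_calls(12, 5, True, False))  # ([], 12)
```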
This has been addressed in the mainline code by adding an operator-configurable option 'use_ipmitool_
https:/
In my environment, ipmitool must be re-run as multiple separate invocations to avoid failure.
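A minimal sketch of this workaround, assuming Python's `subprocess` module and treating a non-zero exit status or a per-call timeout as a failed attempt. Each retry is a fresh ipmitool process, so a session that never came up is abandoned rather than retried inside one long-running process. The function name, the injectable `runner` parameter, and the failure criteria are assumptions for illustration, not Ironic's implementation.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_separate_retries(
    argv: Sequence[str],
    attempts: int,
    per_call_timeout_s: float,
    delay_s: float = 0.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run `argv` up to `attempts` times, each as a separate process."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            # Fresh process per attempt; subprocess.run kills it on timeout.
            result = runner(list(argv), capture_output=True, text=True,
                            timeout=per_call_timeout_s)
        except subprocess.TimeoutExpired as exc:
            last_error = exc  # this attempt hung; abandon it and try again
        else:
            if result.returncode == 0:
                return result
            last_error = subprocess.CalledProcessError(
                result.returncode, list(argv), result.stdout, result.stderr)
        if attempt < attempts:
            time.sleep(delay_s)  # optional pause between separate runs
    raise last_error
```

For example, `argv` could be `["ipmitool", "-I", "lanplus", "-H", "<bmc-address>", "-U", "<user>", "-P", "<password>", "power", "status"]` with `attempts=3` and `per_call_timeout_s=15`; these values are placeholders, not tuned recommendations. Injecting `runner` keeps the retry logic testable without a real BMC.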
Can we please backport this functionality into focal-ussuri?
https:/
Also, please expose charm configuration to allow the operator to set "[ipmi] use_ipmitool_
tags: added: sts
Changed in charm-ironic-conductor: milestone: none → 22.04
Changed in charm-ironic-conductor: status: Fix Committed → Fix Released
FYI, the commit with the option is available in Victoria and later.