[Bug 1943765] Re: ipmitool "timing" flags are not working as expected causing failure to manage power of baremetal nodes
OpenStack Infra
1943765 at bugs.launchpad.net
Wed Feb 23 19:49:59 UTC 2022
Reviewed: https://review.opendev.org/c/openstack/charm-ironic-conductor/+/810610
Committed: https://opendev.org/openstack/charm-ironic-conductor/commit/73a5b90d4026b5acf2cefe1f1057d078c8e923e4
Submitter: "Zuul (22348)"
Branch: master
commit 73a5b90d4026b5acf2cefe1f1057d078c8e923e4
Author: Hemanth Nakkina <hemanth.nakkina at canonical.com>
Date: Thu Sep 23 16:49:14 2021 +0530
Add support for new option use-ipmitool-retries
Add new configuration option use-ipmitool-retries to the charm.
Closes-Bug: #1943765
Change-Id: I2d11198d1955f3b96d27163683ac0947639d2f74
** Changed in: charm-ironic-conductor
Status: In Progress => Fix Committed
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ironic in Ubuntu.
https://bugs.launchpad.net/bugs/1943765
Title:
ipmitool "timing" flags are not working as expected causing failure to
manage power of baremetal nodes
Status in OpenStack Ironic Conductor Charm:
Fix Committed
Status in ironic package in Ubuntu:
New
Bug description:
In a focal-ussuri cloud environment where there is some packet loss
between the ironic-conductor and the BMC network, I'm experiencing
random ipmitool failures caused by timeouts.
The root issue I'm having is that using:
ipmitool -R 12 -N 5 <command>
causes ipmitool to hang for 60 seconds (12 commands are sent even
though the session is never properly established) and then time out
within the ironic-conductor application. The node then ends up in the
"clean failed" state when 'provide' is run on it from 'manageable'.
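Because -R sets the number of retries and -N the number of seconds
between retransmissions, a session that never comes up costs roughly
their product, which matches the 60-second hang described above. A
minimal sketch, assuming every retry waits one full -N interval
(worst_case_hang_seconds is a hypothetical helper, not part of ipmitool
or Ironic):

```python
def worst_case_hang_seconds(retries: int, interval: int) -> int:
    """Hypothetical estimate of how long `ipmitool -R retries -N interval`
    can block when the session never starts.

    Assumes each retry waits one full -N interval before the next one.
    """
    return retries * interval


# The invocation from the report: ipmitool -R 12 -N 5 <command>
print(worst_case_hang_seconds(12, 5))  # 60
```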
The underlying cause appears to be the following code in Ussuri: it
detects that ipmitool accepts the -R and -N flags and then, instead of
retrying ipmitool from within Ironic, relies on ipmitool to perform all
of the retries:
https://opendev.org/openstack/ironic/src/branch/stable/ussuri/ironic/drivers/modules/ipmitool.py#L538-L546
This has been addressed on master by adding an operator-configurable
option, 'use_ipmitool_retries', which either lets ipmitool perform the
retries via the -R flag or falls back to having Ironic execute ipmitool
multiple separate times:
https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/ipmitool.py#L494
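The two retry strategies can be contrasted with a small model. This is
a hypothetical sketch, not Ironic's actual implementation:
build_timing_args, run_ipmi_command, and the retry_timeout/interval
defaults (60 and 5 seconds, assumed here so that retry_timeout //
interval reproduces the "-R 12 -N 5" from the report) are all
illustrative, and invoke stands in for a real ipmitool call that
returns True on success.

```python
import time
from typing import Callable, List, Tuple


def build_timing_args(use_ipmitool_retries: bool,
                      retry_timeout: int, interval: int) -> List[str]:
    """Hypothetical: pick the -R/-N flags for a single ipmitool invocation."""
    if use_ipmitool_retries:
        # ipmitool does all the retrying inside one long-lived invocation.
        retries = max(retry_timeout // interval, 1)
    else:
        # One attempt per invocation; the caller re-runs ipmitool instead.
        retries = 1
    return ["-R", str(retries), "-N", str(interval)]


def run_ipmi_command(invoke: Callable[[List[str]], bool],
                     use_ipmitool_retries: bool,
                     retry_timeout: int = 60,
                     interval: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[bool, int]:
    """Hypothetical: return (succeeded, number of separate ipmitool runs)."""
    timing_args = build_timing_args(use_ipmitool_retries, retry_timeout, interval)
    if use_ipmitool_retries:
        # A single process: if its session never starts, there is no second chance.
        return invoke(timing_args), 1

    max_runs = max(retry_timeout // interval, 1)
    for run in range(1, max_runs + 1):
        # Each run is a fresh ipmitool process, so it opens a fresh session.
        if invoke(timing_args):
            return True, run
        if run < max_runs:
            sleep(interval)
    return False, max_runs


if __name__ == "__main__":
    calls = []

    def flaky_invoke(args: List[str]) -> bool:
        # Hypothetical BMC behind a lossy link: the first two sessions fail.
        calls.append(args)
        return len(calls) > 2

    print(run_ipmi_command(flaky_invoke, use_ipmitool_retries=False,
                           sleep=lambda s: None))  # (True, 3)
```

In this model, a session that fails to start dooms the single
long-lived invocation, while separate invocations get a fresh session
each time, which is why the report asks for Ironic-side retries.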
In my environment, ipmitool has to be re-run as multiple separate
invocations to avoid failure.
Can we please backport this functionality into focal-ussuri?
https://opendev.org/openstack/ironic/commit/1de3db3b16f3e0475e506e540ca5d5ed6edb4cbf
Also, please expose a charm configuration option that allows the
operator to set "[ipmi] use_ipmitool_retries = False".
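On the charm side, this request amounts to mapping a boolean charm
option onto the [ipmi] section of ironic.conf. The sketch below assumes
the option is named use-ipmitool-retries (as in the commit message
above) and is rendered by a hypothetical render_ipmi_section helper;
the real charm may use a template instead, and the True default is an
assumption. It renders the section and reads it back with configparser
to check the value Ironic would see.

```python
import configparser
from typing import Dict


def render_ipmi_section(charm_config: Dict[str, object]) -> str:
    """Hypothetical: render the [ipmi] section from the charm's options."""
    # Assumed default: True, i.e. keep ipmitool's own -R/-N retries
    # unless the operator opts out.
    use_retries = bool(charm_config.get("use-ipmitool-retries", True))
    return "[ipmi]\nuse_ipmitool_retries = {}\n".format(use_retries)


rendered = render_ipmi_section({"use-ipmitool-retries": False})
print(rendered)

parser = configparser.ConfigParser()
parser.read_string(rendered)
print(parser.getboolean("ipmi", "use_ipmitool_retries"))  # False
```

With such an option in place, an operator would presumably run
something like `juju config ironic-conductor use-ipmitool-retries=false`
(the application name ironic-conductor is assumed).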