[Bug 2088620] Re: [SRU] Deprecated usage of cpu_util
Robie Basak
2088620 at bugs.launchpad.net
Tue Mar 4 23:16:15 UTC 2025
I see a technical analysis of the change that you're wanting to make
under "Impact", but no actual explanation of an issue with an impact to
users.
For example, why is it that watcher's use of the deprecated ceilometer
metric, cpu_util, which reported cpu utilization as a percentage, is a
problem for users? Why would they want to change this behaviour in a
stable release? Generally, whether some behaviour is "deprecated" or not
in a stable Ubuntu release is not relevant, because we commit to
maintaining that behaviour until the release EOLs. So why does it matter
here?
And assuming I'm correct in understanding that you want to change user
behaviour, how will that affect users relying on the previous behaviour,
and why is that OK? Or if that is not possible, then please explain how
it is not possible.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2088620
Title:
[SRU] Deprecated usage of cpu_util
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive antelope series:
New
Status in Ubuntu Cloud Archive bobcat series:
New
Status in Ubuntu Cloud Archive caracal series:
Fix Released
Status in Ubuntu Cloud Archive dalmatian series:
Fix Released
Status in Ubuntu Cloud Archive epoxy series:
Fix Released
Status in Ubuntu Cloud Archive yoga series:
New
Status in Ubuntu Cloud Archive zed series:
Won't Fix
Status in watcher package in Ubuntu:
Fix Released
Status in watcher source package in Focal:
Confirmed
Status in watcher source package in Jammy:
Confirmed
Status in watcher source package in Noble:
Fix Released
Status in watcher source package in Oracular:
Fix Released
Bug description:
[ Impact ]
* The watcher releases targeted by this SRU use a deprecated
ceilometer metric, cpu_util, which reported cpu utilization as a
percentage. This metric was deprecated in OpenStack Rocky in favor of
the equivalent Gnocchi rate calculation [1].
* Upstream Watcher continued to use cpu_util until the commit at [2]
landed on master for 2024.1. That commit removes the deprecated metric
and performs the cpu calculation correctly. The calculation is
summarized in the next bullet point.
* The gnocchi calculation uses the cumulative cpu time in ns
(reported by the cpu metric), taken as a rate (the difference in
cumulative time between the last two samples), to find the total cpu
time consumed during the previous sampling period. Dividing that cpu
time by the duration of the period (in ns) multiplied by the number of
vcpus, then multiplying by 100, gives the cpu utilization as a
percentage:
cpu_usage = [cpu_time / (period * 10^9 * nvcpus)] * 100
A sample calculation is provided in the original commit message.
* I cherry-picked the commit to stable/2023.2 [3], but the other
branches have gone unmaintained.
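The calculation described in the bullets above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in this bug, not the code from the upstream commit; the function names and the sample numbers (a 2-vcpu instance sampled every 300 s) are hypothetical.

```python
def cpu_time_deltas(cumulative_ns):
    """Rate step: difference between consecutive cumulative cpu-time samples (ns)."""
    return [later - earlier
            for earlier, later in zip(cumulative_ns, cumulative_ns[1:])]


def cpu_usage_percent(cpu_time_ns, period_s, nvcpus):
    """cpu_usage = [cpu_time / (period * 10^9 * nvcpus)] * 100"""
    if period_s <= 0 or nvcpus <= 0:
        raise ValueError("period_s and nvcpus must be positive")
    return cpu_time_ns / (period_s * 10**9 * nvcpus) * 100.0


# Hypothetical cumulative cpu time samples in ns, one per 300 s period.
samples_ns = [0, 150 * 10**9, 450 * 10**9]
deltas = cpu_time_deltas(samples_ns)

# Utilization of a hypothetical 2-vcpu instance for each period.
usage = [cpu_usage_percent(d, period_s=300, nvcpus=2) for d in deltas]
print(usage)  # [25.0, 50.0]
```

In the first period the instance consumed 150 s of cpu time out of an available 300 s * 2 vcpus = 600 s, hence 25%.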
[ Test Plan ]
* Deploy OpenStack Yoga on Jammy with the watcher and gnocchi services.
* Launch a server and take note of its resource id. Then find the
gnocchi cpu metric associated with the instance.
* Create a watcher audit based on a goal that previously depended on
instance cpu utilization, for example the workload_balancing goal with
the workload_balance strategy [4]:
openstack optimize audit create -t CONTINUOUS -i 60 -g workload_balancing -s workload_balance --auto-trigger
Without the patch, instance_cpu_usage appears as None in the audits.
With the patch, the correct cpu utilization percentage appears in
watcher-decision-engine.log.
* Wait for at least one sampling period to elapse and check
/var/log/watcher/watcher-decision-engine.log for entries showing
"instance_cpu_usage" - this is the cpu utilization as a percentage.
* To verify the percentage with a manual calculation, run gnocchi
measures show <metric uuid> --aggregation "rate:mean" and compute
instance_cpu_usage = 100 * [<value> / (period * 10^9 * nvcpus)],
using the cpu time <value> from the corresponding sampling period.
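For the manual check in the last step, a small helper along these lines may save some arithmetic. It is a sketch only: the function names, the tolerance, and the example numbers are hypothetical, and the rate:mean value and the logged percentage are assumed to have been copied by hand from the gnocchi output and from watcher-decision-engine.log.

```python
import math


def expected_cpu_usage(rate_mean_ns, period_s, nvcpus):
    """instance_cpu_usage = 100 * [<value> / (period * 10^9 * nvcpus)]"""
    return 100.0 * rate_mean_ns / (period_s * 10**9 * nvcpus)


def matches_log(rate_mean_ns, period_s, nvcpus, logged_percent, abs_tol=0.01):
    """Return True if the logged percentage agrees with the manual calculation."""
    expected = expected_cpu_usage(rate_mean_ns, period_s, nvcpus)
    return math.isclose(expected, logged_percent, abs_tol=abs_tol)


# Hypothetical values: 12 s of cpu time (in ns) in a 60 s period on 1 vcpu.
print(expected_cpu_usage(12 * 10**9, period_s=60, nvcpus=1))   # 20.0
print(matches_log(12 * 10**9, 60, 1, logged_percent=20.0))     # True
```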
[ What can go wrong ]
* While this is replacing a deprecated methodology and metric and
should lead to improvements, any custom strategies relying on cpu_util
may be affected.
[1] https://docs.openstack.org/releasenotes/ceilometer/rocky.html
[2] https://review.opendev.org/c/openstack/watcher/+/898791
[3] https://review.opendev.org/c/openstack/watcher/+/934181
[4] https://docs.openstack.org/watcher/rocky/strategies/workload_balance.html
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2088620/+subscriptions