[Bug 2088620] Re: [SRU] Deprecated usage of cpu_util

Bryan Fraschetti 2088620 at bugs.launchpad.net
Tue May 20 20:19:36 UTC 2025


** Description changed:

  [ Impact ]
  
    * The watcher releases targeted by this SRU are using a deprecated
- ceilometer metric, cpu_util, which reported cpu utilization as a
- percentage. This metric was deprecated in Openstack Rocky in favor of
- the Gnocchi rate calculation equivalent [1].
+ ceilometer metric, cpu_util, which previously reported cpu utilization
+ as a percentage. This metric was deprecated in OpenStack Rocky in favor
+ of "the gnocchi rate calculation equivalent" [1], meaning that the cpu
+ utilization value should instead be obtained by performing a
+ calculation on gnocchi's rates. The ceilometer metric cpu_util was
+ then fully removed in Victoria.
  
    * Upstream Watcher continued to use cpu_util until the commit at [2]
- landed on master for 2024.1. This commit correctly performs the cpu
- calculation and removes the deprecated metric. The calculation is
- summarized in the next bullet point and there is an example calculation
- in the original commit
+ landed on master for 2024.1. Since ceilometer no longer provides a
+ cpu_util metric, polling this metric returns "None". As a result, all
+ Watcher strategies that rely on cpu_util, particularly those relating
+ to workload balancing and migration of VMs to under-utilized hosts,
+ are non-functional from Victoria, when the metric was removed, until
+ Caracal (2024.1).
  
-   * The gnocchi calculation uses the cumulative cpu time in ns (reported
- by the cpu metric), taken as a rate (the difference in cumulative time
- over the last two sampling intervals) to find the total cpu time during
- the previous sampling period. Dividing the cpu time in one interval by
- the duration of the interval multiplied by the number of vcpus provides
- the cpu utilization as a percentage: cpu_usage = [cpu_time / (period *
- 10^9 * nvcpus)] * 100%. A sample calculation is provided in the original
- commit message.
+   * The commit at [2] performs the cpu calculation with gnocchi as
+ intended. The calculation is summarized in the next bullet point, and
+ an example calculation is given in the original commit message.
+ 
+   * Gnocchi takes the cumulative cpu time in ns (reported by the
+ ceilometer metric "cpu") and consumes it as a rate, i.e. the difference
+ in cumulative cpu time between two consecutive samples, which gives the
+ total cpu time spent during the previous sampling period. Dividing the
+ cpu time in one interval by the duration of the interval (in ns)
+ multiplied by the number of vcpus gives the cpu utilization as a
+ percentage: cpu_usage = [cpu_time / (period * 10^9 * nvcpus)] * 100%.
+ A sample calculation is provided in the original commit message.
  
    * I cherry-picked the fix to stable/2023.2 [3], but the other
  branches have gone unmaintained.
  
  [ Test Plan ]
  
    * Deploy OpenStack Yoga on Jammy with the watcher and gnocchi services
  
    * Launch a server and take note of its resource ID. Then find the
  gnocchi cpu metric associated with the instance.
  
-   * Create a watcher audit based on a goal that previously depended on instance cpu utilization. For example the workload_balance goal [4]
+   * Create a watcher audit based on a goal whose strategy depends on instance cpu utilization (which Watcher calls instance_cpu_usage). For example, the workload_balance strategy [4], used with the workload_balancing goal, depends on instance_cpu_usage.
      Ex. openstack optimize audit create -t CONTINUOUS -i 60 -g workload_balancing -s workload_balance --auto-trigger
-     Without the patch instance_cpu_usage appears as None in the audits. With the patch you can observe the correct cpu utilization percentage in the watcher-decision-engine.log
+ 
+   * Without the patch, the workload_balance strategy does not work. The
+ audit is created, but it cannot produce a meaningful action plan
+ because instance_cpu_usage is None in the audits. With the patch,
+ Watcher obtains the correct cpu utilization percentage and the
+ strategies work as expected.
  
    * Wait for at least one sampling period to elapse and check
  /var/log/watcher/watcher-decision-engine.log for entries showing
  "instance_cpu_usage" - this is the cpu utilization as a percentage.
  
    * To verify the percentage with a manual calculation, run gnocchi
  measure show <metric uuid> --aggregation "rate:mean" and compute
  instance_cpu_usage = 100 * [<value> / (period * 10^9 * nvcpus)] using
  the cpu time (<value>) from the corresponding sampling period.
  
  [ What can go wrong ]
  
-   * While this is replacing a deprecated methodology and metric and
- should lead to improvements, any custom strategies relying on cpu_util
- may be affected.
+   * While the patch restores functionality by calculating cpu
+ utilization from gnocchi's rate aggregation, if gnocchi is misconfigured
+ or the relevant "cpu" metric is missing, the new calculation may not
+ work as anticipated.
  
  [1] https://docs.openstack.org/releasenotes/ceilometer/rocky.html
  [2] https://review.opendev.org/c/openstack/watcher/+/898791
  [3] https://review.opendev.org/c/openstack/watcher/+/934181
- [4] https://docs.openstack.org/watcher/rocky/strategies/workload_balance.html
+ [4] https://docs.openstack.org/watcher/2024.1/strategies/workload_balance.html

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2088620

Title:
  [SRU] Deprecated usage of cpu_util

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  Fix Committed
Status in Ubuntu Cloud Archive bobcat series:
  Fix Committed
Status in Ubuntu Cloud Archive caracal series:
  Fix Released
Status in Ubuntu Cloud Archive dalmatian series:
  Fix Released
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  Won't Fix
Status in watcher package in Ubuntu:
  Fix Released
Status in watcher source package in Focal:
  Confirmed
Status in watcher source package in Jammy:
  Confirmed
Status in watcher source package in Noble:
  Fix Released
Status in watcher source package in Oracular:
  Fix Released

Bug description:
  [ Impact ]

    * The watcher releases targeted by this SRU are using a deprecated
  ceilometer metric, cpu_util, which previously reported cpu utilization
  as a percentage. This metric was deprecated in OpenStack Rocky in
  favor of "the gnocchi rate calculation equivalent" [1], meaning that
  the cpu utilization value should instead be obtained by performing a
  calculation on gnocchi's rates. The ceilometer metric cpu_util was
  then fully removed in Victoria.

    * Upstream Watcher continued to use cpu_util until the commit at [2]
  landed on master for 2024.1. Since ceilometer no longer provides a
  cpu_util metric, polling this metric returns "None". As a result, all
  Watcher strategies that rely on cpu_util, particularly those relating
  to workload balancing and migration of VMs to under-utilized hosts,
  are non-functional from Victoria, when the metric was removed, until
  Caracal (2024.1).

    * The commit at [2] performs the cpu calculation with gnocchi as
  intended. The calculation is summarized in the next bullet point, and
  an example calculation is given in the original commit message.

    * Gnocchi takes the cumulative cpu time in ns (reported by the
  ceilometer metric "cpu") and consumes it as a rate, i.e. the difference
  in cumulative cpu time between two consecutive samples, which gives the
  total cpu time spent during the previous sampling period. Dividing the
  cpu time in one interval by the duration of the interval (in ns)
  multiplied by the number of vcpus gives the cpu utilization as a
  percentage: cpu_usage = [cpu_time / (period * 10^9 * nvcpus)] * 100%.
  A sample calculation is provided in the original commit message.

    * I cherry-picked the fix to stable/2023.2 [3], but the other
  branches have gone unmaintained.
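
The calculation described in the Impact section can be sketched in Python. This is a minimal illustration under stated assumptions, not Gnocchi's or Watcher's actual code: raw samples are assumed to be (timestamp in seconds, cumulative cpu time in ns) pairs in time order, and the helper names cpu_rates and cpu_usage_percent are hypothetical.

```python
"""Sketch: cpu utilization (%) from Ceilometer's cumulative "cpu" metric.

Assumption: samples are (timestamp in seconds, cumulative cpu time in ns)
pairs in time order. This mirrors the formula above; it is not Gnocchi code.
"""
from typing import List, Tuple

NS_PER_SECOND = 10**9


def cpu_rates(samples: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (period_s, cpu_time_ns) for each pair of consecutive samples.

    cpu_time_ns is the difference in cumulative cpu time, i.e. the cpu time
    spent during that sampling period.
    """
    return [
        (t_cur - t_prev, c_cur - c_prev)
        for (t_prev, c_prev), (t_cur, c_cur) in zip(samples, samples[1:])
    ]


def cpu_usage_percent(cpu_time_ns: int, period_s: int, nvcpus: int) -> float:
    """cpu_usage = [cpu_time / (period * 10^9 * nvcpus)] * 100%."""
    return 100.0 * cpu_time_ns / (period_s * NS_PER_SECOND * nvcpus)


if __name__ == "__main__":
    # Illustrative numbers only: a 2-vCPU instance polled every 300 s.
    samples = [(0, 0), (300, 150_000_000_000), (600, 450_000_000_000)]
    for period_s, cpu_time_ns in cpu_rates(samples):
        print(cpu_usage_percent(cpu_time_ns, period_s, nvcpus=2))
    # 25.0
    # 50.0
```

When starting from gnocchi's rate:mean output rather than raw cumulative samples, the subtraction in cpu_rates has already been applied, so only the cpu_usage_percent step remains.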

  [ Test Plan ]

    * Deploy OpenStack Yoga on Jammy with the watcher and gnocchi services

    * Launch a server and take note of its resource ID. Then find the
  gnocchi cpu metric associated with the instance.

    * Create a watcher audit based on a goal whose strategy depends on instance cpu utilization (which Watcher calls instance_cpu_usage). For example, the workload_balance strategy [4], used with the workload_balancing goal, depends on instance_cpu_usage.
      Ex. openstack optimize audit create -t CONTINUOUS -i 60 -g workload_balancing -s workload_balance --auto-trigger

    * Without the patch, the workload_balance strategy does not work.
  The audit is created, but it cannot produce a meaningful action plan
  because instance_cpu_usage is None in the audits. With the patch,
  Watcher obtains the correct cpu utilization percentage and the
  strategies work as expected.

    * Wait for at least one sampling period to elapse and check
  /var/log/watcher/watcher-decision-engine.log for entries showing
  "instance_cpu_usage" - this is the cpu utilization as a percentage.

    * To verify the percentage with a manual calculation, run gnocchi
  measure show <metric uuid> --aggregation "rate:mean" and compute
  instance_cpu_usage = 100 * [<value> / (period * 10^9 * nvcpus)] using
  the cpu time (<value>) from the corresponding sampling period.
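
The manual check in the last test step can also be scripted. A minimal sketch, assuming the measures were exported as JSON (for example via the client's generic "-f json" output option on the gnocchi measure show command) so that each row carries "timestamp", "granularity" and "value" keys, with granularity taken as the sampling period in seconds. The helpers usage_from_measures and matches_log, and the 0.5 percentage-point tolerance, are hypothetical.

```python
"""Sketch: recompute instance_cpu_usage from exported gnocchi rate:mean measures.

Assumption: the measures are a JSON list of rows such as
{"timestamp": "...", "granularity": 300.0, "value": 60000000000.0},
where "value" is the cpu time (ns) spent during one granularity period.
"""
import json
from typing import Dict

NS_PER_SECOND = 10**9


def usage_from_measures(measures_json: str, nvcpus: int) -> Dict[str, float]:
    """Map each timestamp to 100 * [value / (period * 10^9 * nvcpus)]."""
    usage: Dict[str, float] = {}
    for row in json.loads(measures_json):
        period_s = float(row["granularity"])  # sampling period in seconds
        cpu_time_ns = float(row["value"])     # rate:mean cpu time in ns
        usage[row["timestamp"]] = (
            100.0 * cpu_time_ns / (period_s * NS_PER_SECOND * nvcpus)
        )
    return usage


def matches_log(expected: float, logged: float, tolerance: float = 0.5) -> bool:
    """Hypothetical check: does a logged instance_cpu_usage agree within tolerance?"""
    return abs(expected - logged) <= tolerance


if __name__ == "__main__":
    # Illustrative row only, not real measurements.
    sample = '[{"timestamp": "t0", "granularity": 300.0, "value": 60000000000.0}]'
    print(usage_from_measures(sample, nvcpus=1))  # {'t0': 20.0}
```

Comparing each recomputed value against the instance_cpu_usage entry for the same period in watcher-decision-engine.log should show agreement once the patch is applied.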

  [ What can go wrong ]

    * While the patch restores functionality by calculating cpu
  utilization from gnocchi's rate aggregation, if gnocchi is
  misconfigured or the relevant "cpu" metric is missing, the new
  calculation may not work as anticipated.

  [1] https://docs.openstack.org/releasenotes/ceilometer/rocky.html
  [2] https://review.opendev.org/c/openstack/watcher/+/898791
  [3] https://review.opendev.org/c/openstack/watcher/+/934181
  [4] https://docs.openstack.org/watcher/2024.1/strategies/workload_balance.html

More information about the Ubuntu-openstack-bugs mailing list