NAK: [SRU][J][PULL] hwmon: (coretemp) Fix core count limitation

Kleber Souza kleber.sacilotto.de.souza at canonical.com
Thu Mar 28 17:07:09 UTC 2024


On 22.03.24 14:28, Thibault Ferrante wrote:
> BugLink: https://bugs.launchpad.net/bugs/2058668
> 
> [Impact]
> 
> In Linux 6.8, the coretemp driver supports at most 128 cores per package.
> Cores beyond the 128th lose their core temperature information.
> 
> An upstream patch set adds support for more than 128 cores per package;
> it has been applied to linux-next and then to Noble.
> 
> We should apply the patch set to the Jammy 5.15 kernel so that we can
> properly support systems with a large number of cores per package.
> 
> [Test case]
> 
> Read temperature information from /sys/class/hwmon on a system with more
> than 128 cores per package. We currently lack such a system, so there is
> no proper test case to verify the fix at the moment.
> 
> [Fix]
> 
> The improvement consists of the following series of patches:
> 
> 1a793caf6f69 hwmon: (coretemp) Use dynamic allocated memory for core temp_data
> 18b24a5f9ca3 hwmon: (coretemp) Remove redundant temp_data->is_pkg_data
> 326241f71f3d hwmon: (coretemp) Split package temp_data and core temp_data
> b0b01414a261 hwmon: (coretemp) Abstract core_temp helpers
> 87eb801925a0 hwmon: (coretemp) Remove redundant pdata->cpu_map[]
> 18d8f5583388 hwmon: (coretemp) Replace sensor_device_attribute with device_attribute
> 25f8e01baa05 hwmon: (coretemp) Remove unnecessary dependency of array index
> c8c2074020a8 hwmon: (coretemp) Introduce enum for attr index
> 
> The following patches are also required for a clean backport:
> 
> 34cf8c657cf03 hwmon: (coretemp) Enlarge per package core count limit
> fdaf0c8629d45 hwmon: (coretemp) Fix bogus core_id to attr name mapping
> 4e440abc89458 hwmon: (coretemp) Fix out-of-bounds memory access
> a2930f6dc90f0 hwmon: (coretemp) Delete an obsolete comment
> 6c2b659913ad9 hwmon: (coretemp) Delete tjmax debug message
> 0f8b916bc5b5d hwmon: (coretemp) avoid RDMSR interrupts to isolated CPUs
> fae30e3c203e0 hwmon: (coretemp) Add support for dynamic ttarget
> c0c67f8761cec hwmon: (coretemp) Add support for dynamic tjmax
> 2bc0e6d07ee50 hwmon: (coretemp) rearrange tjmax handing code
> 5c0e64dde80ff hwmon: (coretemp) Remove obsolete temp_data->valid
> 
> Only 5c0e64dde80ff had to be modified, because it deletes a variable
> whose type was changed by a refactoring.
> 
> Although there are many commits, they all touch a single file.
> 
> [Regression potential]
> 
> Possible regressions are hwmon-related: systems could report incorrect
> temperature information, or accessing data under /sys/class/hwmon could
> trigger bugs or crashes.
> 
> The following changes since commit 54bc2c0c9882a:
> 
>    hwmon: (coretemp) Remove obsolete temp_data->valid
> 
> are available in the Git repository at:
> 
>    git://git.launchpad.net/~thibf/+git/jammy-linux bug2058668
> 
> for you to fetch changes up to 1bf7903b38225:
> 
>    hwmon: (coretemp) Use dynamic allocated memory for core temp_data
> 
> ----------------------------------------------------------------
> Marcelo Tosatti (1):
>    hwmon: (coretemp) avoid RDMSR interrupts to isolated CPUs
> 
> Thibault Ferrante (1):
>    hwmon: (coretemp) Remove obsolete temp_data->valid
> 
> Zhang Rui (15):
>    hwmon: (coretemp) rearrange tjmax handing code
>    hwmon: (coretemp) Add support for dynamic tjmax
>    hwmon: (coretemp) Add support for dynamic ttarget
>    hwmon: (coretemp) Delete tjmax debug message
>    hwmon: (coretemp) Fix out-of-bounds memory access
>    hwmon: (coretemp) Fix bogus core_id to attr name mapping
>    hwmon: (coretemp) Enlarge per package core count limit
>    hwmon: (coretemp) Introduce enum for attr index
>    hwmon: (coretemp) Remove unnecessary dependency of array index
>    hwmon: (coretemp) Replace sensor_device_attribute with
>      device_attribute
>    hwmon: (coretemp) Remove redundant pdata->cpu_map[]
>    hwmon: (coretemp) Abstract core_temp helpers
>    hwmon: (coretemp) Split package temp_data and core temp_data
>    hwmon: (coretemp) Remove redundant temp_data->is_pkg_data
>    hwmon: (coretemp) Use dynamic allocated memory for core temp_data
> 
>   drivers/hwmon/coretemp.c | 434 ++++++++++++++++++++++-----------------
>   1 file changed, 246 insertions(+), 188 deletions(-)
> 

My apologies: I made a mistake in the list of patch sets that were
expected to be backported to 5.15. This one is *not* supposed to be
backported to 5.15, only to 6.8 (which has already been done).

I'm sorry for everyone's wasted effort.

Kleber
