NAK: [SRU][J][PULL] hwmon: (coretemp) Fix core count limitation
Kleber Souza
kleber.sacilotto.de.souza at canonical.com
Thu Mar 28 17:07:09 UTC 2024
On 22.03.24 14:28, Thibault Ferrante wrote:
> BugLink: https://bugs.launchpad.net/bugs/2058668
>
> [Impact]
>
> In Linux 6.8 the coretemp driver supports at most 128 cores per package.
> Any cores beyond the first 128 lose their core temperature information.
>
> There is an upstream patch set that adds support for more than 128
> cores per package; it was applied to linux-next, then to Noble.
>
> We should apply the patch set to the Jammy 5.15 kernel so that we can
> properly support systems with a large number of cores per package.
>
> [Test case]
>
> Read temperature info from /sys/class/hwmon on a system with more than
> 128 cores per package. (This means we do not have a proper test case to
> verify the fix at the moment.)
>
> [Fix]
>
> The following patch series makes up this improvement:
>
> 1a793caf6f69 hwmon: (coretemp) Use dynamic allocated memory for core temp_data
> 18b24a5f9ca3 hwmon: (coretemp) Remove redundant temp_data->is_pkg_data
> 326241f71f3d hwmon: (coretemp) Split package temp_data and core temp_data
> b0b01414a261 hwmon: (coretemp) Abstract core_temp helpers
> 87eb801925a0 hwmon: (coretemp) Remove redundant pdata->cpu_map[]
> 18d8f5583388 hwmon: (coretemp) Replace sensor_device_attribute with device_attribute
> 25f8e01baa05 hwmon: (coretemp) Remove unnecessary dependency of array index
> c8c2074020a8 hwmon: (coretemp) Introduce enum for attr index
>
> The following patches are also required for the backport to apply cleanly:
>
> 34cf8c657cf03 hwmon: (coretemp) Enlarge per package core count limit
> fdaf0c8629d45 hwmon: (coretemp) Fix bogus core_id to attr name mapping
> 4e440abc89458 hwmon: (coretemp) Fix out-of-bounds memory access
> a2930f6dc90f0 hwmon: (coretemp) Delete an obsolete comment
> 6c2b659913ad9 hwmon: (coretemp) Delete tjmax debug message
> 0f8b916bc5b5d hwmon: (coretemp) avoid RDMSR interrupts to isolated CPUs
> fae30e3c203e0 hwmon: (coretemp) Add support for dynamic ttarget
> c0c67f8761cec hwmon: (coretemp) Add support for dynamic tjmax
> 2bc0e6d07ee50 hwmon: (coretemp) rearrange tjmax handing code
> 5c0e64dde80ff hwmon: (coretemp) Remove obsolete temp_data->valid
>
> Only 5c0e64dde80ff had to be modified, as it deletes a variable whose
> type changed in an earlier refactoring.
>
> There are a number of commits, but they all change only one file.
>
> [Regression potential]
>
> We may experience hwmon-related regressions: systems could read
> incorrect temperature information, or even hit bugs or crashes when
> accessing data from /sys/class/hwmon.
>
> The following changes since commit 54bc2c0c9882a:
>
> hwmon: (coretemp) Remove obsolete temp_data->valid
>
> are available in the Git repository at:
>
> git://git.launchpad.net/~thibf/+git/jammy-linux bug2058668
>
> for you to fetch changes up to 1bf7903b38225:
>
> hwmon: (coretemp) Use dynamic allocated memory for core temp_data
>
> ----------------------------------------------------------------
> Marcelo Tosatti (1):
> hwmon: (coretemp) avoid RDMSR interrupts to isolated CPUs
>
> Thibault Ferrante (1):
> hwmon: (coretemp) Remove obsolete temp_data->valid
>
> Zhang Rui (15):
> hwmon: (coretemp) rearrange tjmax handing code
> hwmon: (coretemp) Add support for dynamic tjmax
> hwmon: (coretemp) Add support for dynamic ttarget
> hwmon: (coretemp) Delete tjmax debug message
> hwmon: (coretemp) Fix out-of-bounds memory access
> hwmon: (coretemp) Fix bogus core_id to attr name mapping
> hwmon: (coretemp) Enlarge per package core count limit
> hwmon: (coretemp) Introduce enum for attr index
> hwmon: (coretemp) Remove unnecessary dependency of array index
> hwmon: (coretemp) Replace sensor_device_attribute with device_attribute
> hwmon: (coretemp) Remove redundant pdata->cpu_map[]
> hwmon: (coretemp) Abstract core_temp helpers
> hwmon: (coretemp) Split package temp_data and core temp_data
> hwmon: (coretemp) Remove redundant temp_data->is_pkg_data
> hwmon: (coretemp) Use dynamic allocated memory for core temp_data
>
> drivers/hwmon/coretemp.c | 434 ++++++++++++++++++++++-----------------
> 1 file changed, 246 insertions(+), 188 deletions(-)
>
My apologies, I made a mistake in the list of patch sets that were
expected to be backported to 5.15. This one is *not* supposed to be
backported to 5.15, only to 6.8 (which has already been done).
I'm sorry for everyone's wasted effort.
Kleber