APPLIED: [X/B][PATCH 0/2] Improve TSC refinement (and calibration) reliability
Khaled Elmously
khalid.elmously at canonical.com
Thu May 14 04:02:10 UTC 2020
Thanks!
On 2020-05-10 13:24:59, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1877858
>
> [Impact]
> * We recently received a report of TSC refinement going missing across
> multiple reboots of a server with an Intel Skylake-based processor. This was
> only reproducible on Bionic kernels earlier than 5.0.
>
> * After checking kernel commits, we came up with 2 commits that largely improve
> the situation: a786ef152cdc ("x86/tsc: Make calibration refinement more
> robust") [git.kernel.org/linus/a786ef152cdc] and 604dc9170f24 ("x86/tsc: Use
> CPUID.0x16 to calculate missing crystal frequency")
> [git.kernel.org/linus/604dc9170f24]. We hereby request SRU for both of them.
>
> * The first commit improves comments and adjusts an offset to suit more
> recent (faster) machines, but the important part is a retry mechanism in the
> TSC refinement, for cases where it fails because a TSC read was disturbed
> (e.g., by NMIs/SMIs).
>
> * The second commit is an improvement in TSC calibration for Skylake (and some
> other models), by checking a register instead of relying on table-based
> hardcoded values.
>
> * A note for Xenial (kernel 4.4): the second patch would require the inclusion
> of more commits, so given the "maturity" of this release (and the fact that
> kernel 4.15 is available as an HWE kernel for Xenial), I've kept it out of
> Xenial, backporting only the first and more important patch for 4.4.
>
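As a rough illustration of the retry idea described for the first commit, here
is a minimal userspace sketch. It is not the upstream code: the names
sample_with_retry, tsc_sample_fn, and flaky_source are hypothetical, and the
kernel patch may well structure the retry differently (for example, by
rescheduling work rather than looping).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sampler type: returns false when the read was disturbed. */
typedef bool (*tsc_sample_fn)(void *ctx, uint64_t *tsc_out);

/*
 * Try the sampler up to max_attempts times.
 * Returns the number of attempts used on success, or 0 if every attempt failed.
 */
static int sample_with_retry(tsc_sample_fn sample, void *ctx,
                             int max_attempts, uint64_t *tsc_out)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (sample(ctx, tsc_out))
            return attempt;
    }
    return 0;
}

/* Stand-in sampler for demonstration: fails a fixed number of times. */
struct flaky_source {
    int failures_left;
    uint64_t value;
};

static bool flaky_sample(void *ctx, uint64_t *tsc_out)
{
    struct flaky_source *src = ctx;

    if (src->failures_left > 0) {
        src->failures_left--;   /* simulate a disturbed read */
        return false;
    }
    *tsc_out = src->value;
    return true;
}
```

The point of the sketch is only that a single disturbed sample no longer means
giving up on refinement; the attempt limit keeps the retry bounded.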
> [Test case]
> * Unfortunately there is no easy way to test the effectiveness of the
> commits, especially the refinement improvement.
>
> * The user who reported the missing refinements to us was able to test 300
> reboots with a regular Bionic kernel (reproducing the issue at least once),
> whereas with the Bionic kernel plus both proposed commits, the problem did
> not occur.
>
> * Regarding the calibration commit, it was well tested by the community on
> multiple machines, comparing the TSC calibration readings against the tables
> available at instlatx64.atw.hu.
>
> [Regression potential]
> * We consider the regression potential low, especially due to the nature of
> the patches: the first is basically a retry mechanism (plus an adjustment to
> an offset to reflect more recent machines), and the 2nd is an improvement for
> TSC calibration on some platforms (whose values are currently hardcoded in a
> table in the kernel). Also, the patches have been upstream for a while and I
> couldn't find any follow-up fixes for them.
>
> * A hypothetical regression from the 2nd patch could be in the TSC precision
> calculation, which the refinement itself might well compensate for. For the
> first patch, a plain bug in the code is the only hypothetical regression I
> can think of.
>
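For reference, the arithmetic behind the second patch can be sketched as
below. This is an assumption-laden illustration, not the kernel code: it
assumes (per the Intel SDM, not this thread) that CPUID.0x15 reports the
TSC/crystal ratio as numerator/denominator plus an optional crystal frequency
in Hz, and that CPUID.0x16 reports the processor base frequency in MHz. The
function names are hypothetical, and the leaf values are passed in as plain
integers so the sketch runs anywhere.

```c
#include <stdint.h>

/*
 * Derive the crystal frequency in kHz.
 * If the CPU reports the crystal frequency directly, use it; otherwise
 * back-compute it from the base frequency and the TSC/crystal ratio.
 * Returns 0 when the inputs are insufficient.
 */
static uint32_t crystal_khz_from_cpuid(uint32_t denominator, uint32_t numerator,
                                       uint32_t crystal_hz, uint32_t base_mhz)
{
    if (denominator == 0 || numerator == 0)
        return 0;
    if (crystal_hz != 0)
        return crystal_hz / 1000;
    if (base_mhz == 0)
        return 0;
    /* tsc = crystal * numerator / denominator, solved for crystal */
    return (uint32_t)((uint64_t)base_mhz * 1000 * denominator / numerator);
}

/* TSC frequency in kHz from the crystal frequency and the reported ratio. */
static uint32_t tsc_khz_from_crystal(uint32_t crystal_khz,
                                     uint32_t denominator, uint32_t numerator)
{
    if (denominator == 0)
        return 0;
    return (uint32_t)((uint64_t)crystal_khz * numerator / denominator);
}
```

With illustrative inputs of ratio 200/2 and a 2400 MHz base frequency, the
helper yields a 24000 kHz crystal, which maps back to a 2400000 kHz TSC.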
> Daniel Drake (1):
> x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
>
> Daniel Vacek (1):
> x86/tsc: Make calibration refinement more robust
>
> arch/x86/kernel/tsc.c | 77 ++++++++++++++++++++++++-------------------
> 1 file changed, 43 insertions(+), 34 deletions(-)
>
> --
> 2.25.2
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team