ACK: [Oneiric PATCH] x86, tsc: Fix SMI induced variation in quick_pit_calibrate()

Stefan Bader stefan.bader at canonical.com
Wed Mar 28 07:37:22 UTC 2012


On 27.03.2012 16:34, Tim Gardner wrote:
> From: Linus Torvalds <torvalds at linux-foundation.org>
> 
> BugLink: http://bugs.launchpad.net/bugs/965586
> 
> pit_expect_msb() returns success wrongly in the below SMI scenario:
> 
> a. pit_verify_msb() has not yet seen the MSB transition.
> 
> b. we are close to the MSB transition though and got a SMI immediately after
>    returning from pit_verify_msb() which didn't see the MSB transition. PIT MSB
>    transition has happened somewhere during SMI execution.
> 
> c. returned from SMI and we noted down the 'tsc', saw the pit MSB change now and
>    exited the loop to calculate 'deltatsc'. Instead of noting the TSC at the MSB
>    transition, we are way off because of the SMI.  And as the SMI happened
>    between the pit_verify_msb() and before the 'tsc' is recorded in the
>    for loop, 'delattsc' (d1/d2 in quick_pit_calibrate()) will be small and
>    quick_pit_calibrate() will not notice this error.
> 
> Depending on whether SMI disturbance happens while computing d1 or d2, we will
> see the TSC calibrated value smaller or bigger than the expected value. As a
> result, in a cluster we were seeing a variation of approximately +/- 20MHz in
> the calibrated values, resulting in NTP failures.
> 
>   [ As far as the SMI source is concerned, this is a periodic SMI that gets
>     disabled after ACPI is enabled by the OS. But the TSC calibration happens
>     before the ACPI is enabled. ]
> 
> To address this, change pit_expect_msb() so that
> 
>  - the 'tsc' is the TSC in between the two reads that read the MSB
> change from the PIT (same as before)
> 
>  - the 'delta' is the difference in TSC from *before* the MSB changed
> to *after* the MSB changed.
> 
> Now the delta is twice as big as before (it covers four PIT accesses,
> roughly 4us) and quick_pit_calibrate() will loop a bit longer to get
> the calibrated value with in the 500ppm precision. As the delta (d1/d2)
> covers four PIT accesses, actual calibrated result might be closer to
> 250ppm precision.
> 
> As the loop now takes longer to stabilize, double MAX_QUICK_PIT_MS to 50.
> 
> SMI disturbance will showup as much larger delta's and the loop will take
> longer than usual for the result to be with in the accepted precision. Or will
> fallback to slow PIT calibration if it takes more than 50msec.
> 
> Also while we are at this, remove the calibration correction that aims to
> get the result to the middle of the error bars. We really don't know which
> direction to correct into, so remove it.
> 
> Reported-and-tested-by: Suresh Siddha <suresh.b.siddha at intel.com>
> Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
> Signed-off-by: Suresh Siddha <suresh.b.siddha at intel.com>
> Link: http://lkml.kernel.org/r/1326843337.5291.4.camel@sbsiddha-mobl2
> Signed-off-by: H. Peter Anvin <hpa at zytor.com>
> (cherry picked from commit 68f30fbee19cc67849b9fa8e153ede70758afe81)
> 
> Signed-off-by: Tim Gardner <tim.gardner at canonical.com>
> ---
>  arch/x86/kernel/tsc.c |   14 ++++++--------
>  1 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 6cc6922..fc60bd9 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -291,14 +291,15 @@ static inline int pit_verify_msb(unsigned char val)
>  static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
>  {
>  	int count;
> -	u64 tsc = 0;
> +	u64 tsc = 0, prev_tsc = 0;
>  
>  	for (count = 0; count < 50000; count++) {
>  		if (!pit_verify_msb(val))
>  			break;
> +		prev_tsc = tsc;
>  		tsc = get_cycles();
>  	}
> -	*deltap = get_cycles() - tsc;
> +	*deltap = get_cycles() - prev_tsc;
>  	*tscp = tsc;
>  
>  	/*
> @@ -312,9 +313,9 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
>   * How many MSB values do we want to see? We aim for
>   * a maximum error rate of 500ppm (in practice the
>   * real error is much smaller), but refuse to spend
> - * more than 25ms on it.
> + * more than 50ms on it.
>   */
> -#define MAX_QUICK_PIT_MS 25
> +#define MAX_QUICK_PIT_MS 50
>  #define MAX_QUICK_PIT_ITERATIONS (MAX_QUICK_PIT_MS * PIT_TICK_RATE / 1000 / 256)
>  
>  static unsigned long quick_pit_calibrate(void)
> @@ -384,15 +385,12 @@ success:
>  	 *
>  	 * As a result, we can depend on there not being
>  	 * any odd delays anywhere, and the TSC reads are
> -	 * reliable (within the error). We also adjust the
> -	 * delta to the middle of the error bars, just
> -	 * because it looks nicer.
> +	 * reliable (within the error).
>  	 *
>  	 * kHz = ticks / time-in-seconds / 1000;
>  	 * kHz = (t2 - t1) / (I * 256 / PIT_TICK_RATE) / 1000
>  	 * kHz = ((t2 - t1) * PIT_TICK_RATE) / (I * 256 * 1000)
>  	 */
> -	delta += (long)(d2 - d1)/2;
>  	delta *= PIT_TICK_RATE;
>  	do_div(delta, i*256*1000);
>  	printk("Fast TSC calibration using PIT\n");

Seems to implement as the description say. Upstream and cherry picked and bad
time values are bad.

Acked-by: Stefan Bader <stefan.bader at canonical.com>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 900 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20120328/48d430a6/attachment.sig>


More information about the kernel-team mailing list