[ack-ish] Re: [Precise SRU] Call xen_poll_irq with interrupts disabled
Stefan Bader
stefan.bader at canonical.com
Tue Feb 5 14:24:04 UTC 2013
On 05.02.2013 15:19, Stefan Bader wrote:
> On 05.02.2013 14:25, Andy Whitcroft wrote:
>> On Mon, Feb 04, 2013 at 05:39:41PM +0100, Stefan Bader wrote:
>>> This has been sitting around for a while now. The problem is that
>>> I was not yet able to find a convincing explanation *why* this is
>>> fixing things. But testing was very positive. So it's probably better
>>> to carry it as SAUCE for now, at least for Precise.
>>> I need to re-check things with latest kernels and Xen versions.
>>>
>>> -Stefan
>>>
>>> ---
>>>
>>>
>>> From e89064720b518e3c64f0fe56b11edb44125f8b7e Mon Sep 17 00:00:00 2001
>>> From: Stefan Bader <stefan.bader at canonical.com>
>>> Date: Thu, 18 Oct 2012 21:40:37 +0200
>>> Subject: [PATCH] UBUNTU: SAUCE: xen/pv-spinlock: Never enable interrupts in
>>> xen_spin_lock_slow()
>>>
>>> The pv-ops spinlock code for Xen will call xen_spin_lock_slow()
>>> if a lock cannot be obtained quickly by normal spinning. The
>>> slow variant will put the VCPU onto a waiters list, then
>>> enable interrupts (which are event channel messages in this
>>> case) and call a hypervisor function that is supposed to return
>>> when the lock is released.
>>>
>>> Using a test case with a lot of threads running that cause high
>>> utilization of this spinlock code, it can be observed that the
>>> VCPU at the head of the waiters list seems to be doing other
>>> things (no trace of the hv call on its stack) while the waiters
>>> list still has it as the head. So other VCPUs which try to
>>> acquire the lock are stuck in the hv call forever, and sooner or
>>> later that ends up in some circular locking.
>>>
>>> By testing I can see this gets avoided when the interrupts are
>>> not enabled before the hv call (and disabled right after).
>>
>> I think this should read "... avoided if we leave interrupts off before
>> the call and remain off after it."
>
> It could be put that way as well. With the change, the code keeps interrupts
> disabled up to (and across) the point where the poll_irq hypercall is done.
> In the old form it could be (depending on the state of interrupts before the
> spinlock call) that interrupts got enabled just before the hypercall and
> disabled again right after it. So sometimes interrupts were enabled during
> the hypercall.
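>
> To make the difference concrete, here is a rough sketch of the two variants
> (not the literal code, just the relevant lines around the hypercall):
>
>     /* old: interrupts may be on while we wait in the hypervisor */
>     flags = arch_local_save_flags();
>     if (irq_enable)
>             raw_local_irq_enable();
>     xen_poll_irq(irq);            /* SCHEDOP_poll hypercall */
>     raw_local_irq_restore(flags);
>
>     /* new: interrupts stay off across the whole wait */
>     xen_poll_irq(irq);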
>
>>
>>> In theory this could cause a performance regression. Though it
>>> looked like the hypervisor would actually re-enable the interrupts
>>> after some safety measures. So maybe not much regression at all.
>>> At least testing did not yield a significant penalty.
>>>
>>> Xen PVM is the only affected setup because it is the only one
>>> using pv-ops spinlocks in a way that changes the interrupt flags
>>> (KVM maps the variant that could change them to the function
>>> that does not modify those flags).
>>
>> I am taking this to say "KVM which uses a similar scheme also leaves the
>> interrupts off over the equivalent calls.".
>
> Right, there is a pv_lock_ops structure which has two spinlock interfaces:
> one that allows passing on the flags as they were, and another which does
> not care. KVM only uses the don't-care version, which disables interrupts
> and gives you no hint whether they were enabled before or not.
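>
> Sketched from memory (so take the exact member names with a grain of
> salt), the relevant bits look something like:
>
>     struct pv_lock_ops {
>             ...
>             void (*spin_lock)(struct arch_spinlock *lock);
>             void (*spin_lock_flags)(struct arch_spinlock *lock,
>                                     unsigned long flags);
>             ...
>     };
>
>     /* the don't-care mapping simply ignores the flags argument */
>     static void default_spin_lock_flags(arch_spinlock_t *lock,
>                                         unsigned long flags)
>     {
>             arch_spin_lock(lock);
>     }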
>
>
>>
>>> BugLink: http://bugs.launchpad.net/bugs/1011792
>>>
>>> Signed-off-by: Stefan Bader <stefan.bader at canonical.com>
>>> ---
>>> arch/x86/xen/spinlock.c | 23 +++++------------------
>>> 1 file changed, 5 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
>>> index d69cc6c..fa926ab 100644
>>> --- a/arch/x86/xen/spinlock.c
>>> +++ b/arch/x86/xen/spinlock.c
>>> @@ -24,7 +24,6 @@ static struct xen_spinlock_stats
>>> u32 taken_slow_nested;
>>> u32 taken_slow_pickup;
>>> u32 taken_slow_spurious;
>>> - u32 taken_slow_irqenable;
>>>
>>> u64 released;
>>> u32 released_slow;
>>> @@ -197,7 +196,7 @@ static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock
>>> __this_cpu_write(lock_spinners, prev);
>>> }
>>>
>>> -static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
>>> +static noinline int xen_spin_lock_slow(struct arch_spinlock *lock)
>>> {
>>> struct xen_spinlock *xl = (struct xen_spinlock *)lock;
>>> struct xen_spinlock *prev;
>>> @@ -218,8 +217,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
>>> ADD_STATS(taken_slow_nested, prev != NULL);
>>>
>>> do {
>>> - unsigned long flags;
>>> -
>>> /* clear pending */
>>> xen_clear_irq_pending(irq);
>>>
>>> @@ -239,12 +236,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
>>> goto out;
>>> }
>>>
>>> - flags = arch_local_save_flags();
>>> - if (irq_enable) {
>>> - ADD_STATS(taken_slow_irqenable, 1);
>>> - raw_local_irq_enable();
>>> - }
>>> -
>>> /*
>>> * Block until irq becomes pending. If we're
>>> * interrupted at this point (after the trylock but
>>> @@ -256,8 +247,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
>>> */
>>> xen_poll_irq(irq);
>>>
>>> - raw_local_irq_restore(flags);
>>> -
>>> ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
>>> } while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */
>>>
>>> @@ -270,7 +259,7 @@ out:
>>> return ret;
>>> }
>>>
>>> -static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
>>> +static inline void __xen_spin_lock(struct arch_spinlock *lock)
>>> {
>>> struct xen_spinlock *xl = (struct xen_spinlock *)lock;
>>> unsigned timeout;
>>> @@ -302,19 +291,19 @@ static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
>>> spin_time_accum_spinning(start_spin_fast);
>>>
>>> } while (unlikely(oldval != 0 &&
>>> - (TIMEOUT == ~0 || !xen_spin_lock_slow(lock, irq_enable))));
>>> + (TIMEOUT == ~0 || !xen_spin_lock_slow(lock))));
>>>
>>> spin_time_accum_total(start_spin);
>>> }
>>>
>>> static void xen_spin_lock(struct arch_spinlock *lock)
>>> {
>>> - __xen_spin_lock(lock, false);
>>> + __xen_spin_lock(lock);
>>> }
>>>
>>> static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
>>> {
>>> - __xen_spin_lock(lock, !raw_irqs_disabled_flags(flags));
>>> + __xen_spin_lock(lock);
>>> }
>>>
>>> static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>>> @@ -424,8 +413,6 @@ static int __init xen_spinlock_debugfs(void)
>>> &spinlock_stats.taken_slow_pickup);
>>> debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
>>> &spinlock_stats.taken_slow_spurious);
>>> - debugfs_create_u32("taken_slow_irqenable", 0444, d_spin_debug,
>>> - &spinlock_stats.taken_slow_irqenable);
>>>
>>> debugfs_create_u64("released", 0444, d_spin_debug, &spinlock_stats.released);
>>> debugfs_create_u32("released_slow", 0444, d_spin_debug,
>>
>> Based on the overall testing being so good, and on the fact that the
>> change seems to make us more like KVM, this ought to be good.
>>
>> As the fix is not yet upstream and still under discussion there, we might
>> want to 'reduce' this patch a little and just modify xen_spin_lock_slow
>> to essentially accept but ignore the irq_enable flag. It would reduce
>> the code churn until we have an approved fix.
>>
>> Overall I think this has merit, and though I would prefer a smaller code
>> snippet for this, I think we do need this fix; if others are happy
>> with the larger patch, I am happy:
>>
>> Acked-by: Andy Whitcroft <apw at canonical.com>
>>
>> -apw
>>
>
> Yeah, I could basically just comment out the two interrupt-changing lines in
> xen_spin_lock_slow() if that is preferred. It just did not seem to make sense
> to fake the statistics there.
>
> -Stefan
>
>
>
Not compile tested, but something like this (note that flags would then be
set but never used, so the compiler may warn about it):
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d69cc6c..6051359 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -242,7 +242,7 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock
 		flags = arch_local_save_flags();
 		if (irq_enable) {
 			ADD_STATS(taken_slow_irqenable, 1);
-			raw_local_irq_enable();
+			/* raw_local_irq_enable(); */
 		}
 
 		/*
@@ -256,7 +256,7 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock
 		 */
 		xen_poll_irq(irq);
 
-		raw_local_irq_restore(flags);
+		/* raw_local_irq_restore(flags); */
 
 		ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
 	} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */