[Trusty][PATCH] UBUNTU: SAUCE: xen: do not re-use pirq number cached in pci device msi msg data

Dan Streetman dan.streetman at canonical.com
Wed Jan 18 16:08:48 UTC 2017


On Mon, Jan 16, 2017 at 6:34 AM, Stefan Bader
<stefan.bader at canonical.com> wrote:
> On 13.01.2017 22:13, Dan Streetman wrote:
>> BugLink: http://bugs.launchpad.net/bugs/1656381
>>
>> Revert the main part of commit:
>> af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>
>> That commit introduced reading the pci device's msi message data to see
>> if a pirq was previously configured for the device's msi/msix, and re-use
>> that pirq.  At the time, that was the correct behavior.  However, a
>> later change to Qemu caused it to call into the Xen hypervisor to unmap
>> all pirqs for a pci device, when the pci device disables its MSI/MSIX
>> vectors; specifically the Qemu commit:
>> c976437c7dba9c7444fb41df45468968aaa326ad
>> ("qemu-xen: free all the pirqs for msi/msix when driver unload")
>>
>> Once Qemu added this pirq unmapping, it was no longer correct for the
>> kernel to re-use the pirq number cached in the pci device msi message
>> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
>> pirqs when the pci device disables its MSI/MSIX vectors.
>
> In Trusty release we have qemu-2.0.0. I wonder whether adding the change to the
> kernel (hm, actually same would apply to xenial via hwe path) would cause
> regressions when not using the cloud-archive...

Yes, but applying this patch to the Trusty kernel is completely
unrelated to running a Trusty-based Xen hypervisor; this change affects
only the guest kernel.  That is, if we don't apply this patch to the
Trusty kernel, patched X/Y/Z kernels will still see problems when
running under a Trusty-based Xen hypervisor (i.e. qemu 2.0.0 or
earlier), while an unpatched Trusty kernel will see problems when
running under a newer (Vivid or later, or UCA Kilo or later) Xen
hypervisor.

Applying this to Trusty as well as X/Y/Z means all of them will behave
correctly under a qemu 2.1.0 or later Xen hypervisor, and all of them
will misbehave under a qemu 2.0.0 or earlier Xen hypervisor.

> But then, changing the 2.0.0 qemu to add this unmapping might cause other
> unexpected problems...

I opened bug 1657489 to track fixing Trusty qemu.
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1657489

I included a matrix there of hypervisor version vs. guest kernel
version, which I'll reproduce here.

1) qemu not patched (2.0.0 and earlier), guest kernel not patched:
CORRECT behavior
hypervisor: Trusty qemu or UCA Icehouse qemu
guest: all without patch from bug 1656381
failure: none

2) qemu not patched (2.0.0 and earlier), guest kernel patched:
INCORRECT behavior
hypervisor: Trusty qemu or UCA Icehouse qemu
guest: all with patch from bug 1656381
failure: MSI interrupts will fail to be configured for any device that
disables and then re-enables its MSI; only the first MSI enable for a
device will work. For example, unloading a driver will result in
failure to enable MSI when the driver is reloaded.

3) qemu patched (2.1.0 and later), guest kernel not patched: INCORRECT behavior
hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
guest: all without patch from bug 1656381
failure: MSI interrupts in the guest may not be correctly mapped if
device B enables its MSI after device A has disabled its MSI; when
device A re-enables its MSI, some of its interrupts will fail to be
configured correctly. NVMe shows this repeatedly with multiple NVMe
controllers; usually only 1 NVMe controller will finish initialization
correctly.

4) qemu patched (2.1.0 and later), guest kernel patched: CORRECT behavior
hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
guest: all with patch from bug 1656381
failure: none


>
> -Stefan
>
>>
>> This bug is causing failures to initialize multiple NVMe controllers
>> under Xen, because the NVMe driver sets up a single MSIX vector for
>> each controller (concurrently), and then after using that to talk to
>> the controller for some configuration data, it disables the single MSIX
>> vector and re-configures all the MSIX vectors it needs.  So the MSIX
>> setup code tries to re-use the cached pirq from the first vector
>> for each controller, but the hypervisor has already given away that
>> pirq to another controller, and its initialization fails.
>>
>> This is discussed in more detail at:
>> https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
>>
>> Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>> Signed-off-by: Dan Streetman <dan.streetman at canonical.com>
>> (backported from X/Y/Z patch for context/function name changes only)
>>
>> ---
>>  arch/x86/pci/xen.c | 23 +++++++----------------
>>  1 file changed, 7 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> index 5eee495..6969c76 100644
>> --- a/arch/x86/pci/xen.c
>> +++ b/arch/x86/pci/xen.c
>> @@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>>               return 1;
>>
>>       list_for_each_entry(msidesc, &dev->msi_list, list) {
>> -             __read_msi_msg(msidesc, &msg);
>> -             pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>> -                     ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> -             if (msg.data != XEN_PIRQ_MSI_DATA ||
>> -                 xen_irq_from_pirq(pirq) < 0) {
>> -                     pirq = xen_allocate_pirq_msi(dev, msidesc);
>> -                     if (pirq < 0) {
>> -                             irq = -ENODEV;
>> -                             goto error;
>> -                     }
>> -                     xen_msi_compose_msg(dev, pirq, &msg);
>> -                     __write_msi_msg(msidesc, &msg);
>> -                     dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>> -             } else {
>> -                     dev_dbg(&dev->dev,
>> -                             "xen: msi already bound to pirq=%d\n", pirq);
>> +             pirq = xen_allocate_pirq_msi(dev, msidesc);
>> +             if (pirq < 0) {
>> +                     irq = -ENODEV;
>> +                     goto error;
>>               }
>> +             xen_msi_compose_msg(dev, pirq, &msg);
>> +             __write_msi_msg(msidesc, &msg);
>> +             dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>               irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>>                                              (type == PCI_CAP_ID_MSIX) ?
>>                                              "msi-x" : "msi",
>>
>
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
>
