NACK: [SRU][P][PATCH 0/1] I/O performance regression on NVMes under same bridge (dual port nvme) (LP: #2115738)
Massimiliano Pellizzer
massimiliano.pellizzer at canonical.com
Thu Jul 24 20:03:09 UTC 2025
On Thu, 24 Jul 2025 at 09:43, Cengiz Can <cengiz.can at canonical.com> wrote:
>
> On 18-07-25 15:46:10, Massimiliano Pellizzer wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2115738
> >
> > [ Impact ]
> >
> > iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
> >
> > The iotlb_sync_map iommu ops allows drivers to perform necessary cache
> > flushes when new mappings are established. For the Intel iommu driver,
> > this callback specifically serves two purposes:
> >
> > - To flush caches when a second-stage page table is attached to a device
> > whose iommu is operating in caching mode (CAP_REG.CM==1).
> > - To explicitly flush internal write buffers to ensure updates to memory-
> > resident remapping structures are visible to hardware (CAP_REG.RWBF==1).
> >
> > However, in scenarios where neither caching mode nor the RWBF flag is
> > active, the cache_tag_flush_range_np() helper, which is called in the
> > iotlb_sync_map path, effectively becomes a no-op.
> >
> > Despite being a no-op, cache_tag_flush_range_np() involves iterating
> > through all cache tags of the IOMMUs attached to the domain, protected
> > by a spinlock. This unnecessary execution path introduces overhead,
> > leading to a measurable I/O performance regression. On systems with NVMes
> > under the same bridge, performance was observed to drop from approximately
> > 6150 MiB/s to 4985 MiB/s.
> >
> > Introduce a flag in the dmar_domain structure. This flag will only be set
> > when iotlb_sync_map is required (i.e., when CM or RWBF is set). The
> > cache_tag_flush_range_np() is called only for domains where this flag is
> > set. This flag, once set, is immutable, given that there won't be mixed
> > configurations in real-world scenarios where some IOMMUs in a system
> > operate in caching mode while others do not. Theoretically, the
> > immutability of this flag does not impact functionality.
> >
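The gating described above can be modelled in a few lines. This is only a toy
userspace sketch, not the kernel code: iotlb_sync_map, cache_tag_flush_range_np,
CM and RWBF are names taken from the commit message, while Iommu, Domain,
attach and tag_walks are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Iommu:
    # Hypothetical stand-ins for the CAP_REG.CM and CAP_REG.RWBF capability bits.
    caching_mode: bool = False
    rwbf: bool = False


@dataclass
class Domain:
    # Models the new per-domain flag; once set it is never cleared.
    iotlb_sync_map: bool = False
    cache_tags: list = field(default_factory=list)
    # Instrumentation only: counts how often the tag list is walked.
    tag_walks: int = 0


def attach(domain: Domain, iommu: Iommu) -> None:
    """Attach an IOMMU and set the flag if it needs sync-on-map flushes."""
    domain.cache_tags.append(iommu)
    if iommu.caching_mode or iommu.rwbf:
        domain.iotlb_sync_map = True


def cache_tag_flush_range_np(domain: Domain) -> None:
    """Walk every cache tag; the real driver does this under a spinlock."""
    domain.tag_walks += 1
    for _iommu in domain.cache_tags:
        pass  # a real flush would happen here only when CM or RWBF is set


def iotlb_sync_map(domain: Domain) -> None:
    """Skip the tag walk entirely unless the domain's flag is set."""
    if domain.iotlb_sync_map:
        cache_tag_flush_range_np(domain)
```

In this model a domain whose IOMMUs have neither CM nor RWBF never enters the
tag walk, which is the overhead the commit message attributes the regression to.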
> > [ Fix ]
> >
> > Backport the following commit:
> > - 12724ce3fe1a iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF
> > modes
> > from linux-next to Plucky.
> >
> > [ Test Plan ]
>
> Acked-by: Cengiz Can <cengiz.can at canonical.com>
>
> >
> > Run fio against two NVMes under the same PCI bridge (dual-port NVMe):
> >
> > $ sudo fio --readwrite=randread --blocksize=4k --iodepth=32 --numjobs=8
> > --time_based --runtime=40 --ioengine=libaio --direct=1 --group_reporting
> > --new_group --name=job1 --filename=/dev/nvmeXnY --new_group --name=job2
> > --filename=/dev/nvmeWnZ
> >
> > Verify that the throughput reached with the two NVMes under the same bridge
> > matches the throughput they reach when they are not under the same bridge.
> >
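One possible way to make the comparison above less manual is sketched here. It
assumes the fio command is re-run with --output-format=json, and that each
reported group has a "jobname" and a "read"."bw" value in KiB/s. The helper
names and the 5% tolerance are illustrative assumptions, not part of the test
plan.

```python
import json


def read_bw_mib(fio_json_text: str) -> dict:
    """Map each reported job name to its read bandwidth in MiB/s.

    Assumes fio's JSON layout: a top-level "jobs" list whose entries carry
    "jobname" and "read"["bw"] (KiB/s).
    """
    report = json.loads(fio_json_text)
    return {job["jobname"]: job["read"]["bw"] / 1024.0 for job in report["jobs"]}


def relative_drop(baseline: float, measured: float) -> float:
    """Fraction of the baseline bandwidth that was lost."""
    return (baseline - measured) / baseline


def within_tolerance(baseline: float, measured: float, tolerance: float = 0.05) -> bool:
    """True if measured is no more than `tolerance` below baseline.

    The 5% default is an arbitrary assumption; pick a value that matches the
    run-to-run noise of the system under test.
    """
    return relative_drop(baseline, measured) <= tolerance
```

Applied to the figures in the Impact section, relative_drop(6150, 4985) is
roughly 0.19, so the reported regression is a loss of about 19% of the
baseline bandwidth.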
> > [ Regression Potential ]
> >
> > This fix affects the Intel IOMMU (VT-d) driver.
> > An issue with this fix may introduce problems such as
> > incorrect omission of required IOTLB cache or write buffer flushes
> > when attaching devices to a domain.
> > This could result in memory remapping structures not being visible
> > to hardware in configurations that actually require synchronization.
> > As a consequence, devices performing DMA may exhibit data corruption,
> > access violations, or inconsistent behavior due to stale or incomplete
> > translations being used by the hardware.
> >
> >
> > --
> > kernel-team mailing list
> > kernel-team at lists.ubuntu.com
> > https://lists.ubuntu.com/mailman/listinfo/kernel-team
Nacked because this patch introduced a regression upstream that has
been addressed here:
- https://lore.kernel.org/all/20250721051657.1695788-1-baolu.lu@linux.intel.com/
I will resend the patch together with that fix.
--
Massimiliano Pellizzer