ACK: [SRU][P][PATCH 0/1] I/O performance regression on NVMes under same bridge (dual port nvme) (LP: #2115738)

Tim Whisonant tim.whisonant at canonical.com
Sat Jul 19 01:59:07 UTC 2025


On Fri, Jul 18, 2025 at 03:46:10PM +0200, Massimiliano Pellizzer wrote:
> BugLink: https://bugs.launchpad.net/bugs/2115738
> 
> [ Impact ]
> 
> iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
> 
> The iotlb_sync_map iommu op allows drivers to perform necessary cache
> flushes when new mappings are established. For the Intel iommu driver,
> this callback serves two specific purposes:
> 
> - To flush caches when a second-stage page table is attached to a device
>   whose iommu is operating in caching mode (CAP_REG.CM==1).
> - To explicitly flush internal write buffers to ensure updates to memory-
>   resident remapping structures are visible to hardware (CAP_REG.RWBF==1).
> 
> However, in scenarios where neither caching mode nor the RWBF flag is
> active, the cache_tag_flush_range_np() helper, which is called in the
> iotlb_sync_map path, effectively becomes a no-op.
> 
> Despite being a no-op, cache_tag_flush_range_np() still iterates through
> all cache tags of the IOMMUs attached to the domain while holding a
> spinlock. This unnecessary execution path introduces overhead, leading
> to a measurable I/O performance regression. On systems with NVMes under
> the same bridge, throughput was observed to drop from ~6150 MiB/s to
> ~4985 MiB/s.
> 
> Introduce a flag in the dmar_domain structure. This flag will only be set
> when iotlb_sync_map is required (i.e., when CM or RWBF is set). The
> cache_tag_flush_range_np() is called only for domains where this flag is
> set. This flag, once set, is immutable, given that there won't be mixed
> configurations in real-world scenarios where some IOMMUs in a system
> operate in caching mode while others do not. Theoretically, the
> immutability of this flag does not impact functionality.
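
For anyone skimming the thread, the gating described above can be modelled
roughly as follows. This is only a user-space sketch under my own
assumptions, not the code from the commit: every model_* name is
hypothetical, the capability bits are reduced to two booleans, and the
locked cache-tag walk is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the capability bits of one IOMMU. */
struct model_iommu {
	bool caching_mode;	/* models CAP_REG.CM */
	bool rwbf;		/* models CAP_REG.RWBF */
};

/* Hypothetical stand-in for the relevant part of dmar_domain. */
struct model_domain {
	bool iotlb_sync_map;		/* sticky: set on attach, never cleared */
	unsigned long tag_walks;	/* counts simulated cache-tag walks */
};

/* A flush on map is only needed when CM or RWBF is set. */
static bool model_need_sync_map(const struct model_iommu *iommu)
{
	return iommu->caching_mode || iommu->rwbf;
}

/* Attaching can set the flag but never clears it. */
static void model_attach(struct model_domain *d,
			 const struct model_iommu *iommu)
{
	assert(d != NULL && iommu != NULL);
	d->iotlb_sync_map |= model_need_sync_map(iommu);
}

/* Stand-in for cache_tag_flush_range_np(): the costly locked walk. */
static void model_flush_range_np(struct model_domain *d)
{
	d->tag_walks++;
}

/* Map path: skip the walk entirely unless the domain needs it. */
static void model_iotlb_sync_map(struct model_domain *d)
{
	if (d->iotlb_sync_map)
		model_flush_range_np(d);
}
```

In this model, a domain attached only to IOMMUs with CM and RWBF both clear
never performs the walk on the map path, while a domain that has seen a
CM or RWBF IOMMU keeps performing it on every map from then on.
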
> 
> [ Fix ]
> 
> Backport the following commit from linux-next to Plucky:
> - 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for
>   non-caching/non-RWBF modes")
> 
> [ Test Plan ]
> 
> Run fio against two NVMes under the same PCI bridge (dual-port NVMe):
> 
> $ sudo fio --readwrite=randread --blocksize=4k --iodepth=32 --numjobs=8 \
>     --time_based --runtime=40 --ioengine=libaio --direct=1 --group_reporting \
>     --new_group --name=job1 --filename=/dev/nvmeXnY --new_group --name=job2 \
>     --filename=/dev/nvmeWnZ
> 
> Verify that the throughput reached with the two NVMes under the same bridge
> matches the throughput they reach when they are not under the same bridge.
> 
> [ Regression Potential ]
> 
> This fix affects the Intel IOMMU (VT-d) driver.
> A bug in this fix could cause required IOTLB cache or write buffer
> flushes to be skipped when attaching devices to a domain.
> Updates to memory-resident remapping structures would then not be
> visible to hardware in configurations that actually require
> synchronization.
> As a consequence, devices performing DMA may exhibit data corruption,
> access violations, or inconsistent behavior due to stale or incomplete
> translations being used by the hardware.
> 

Acked-by: Tim Whisonant <tim.whisonant at canonical.com>


