[SRU][P][PATCH 0/1] I/O performance regression on NVMes under same bridge (dual port nvme) (LP: #2115738)

Massimiliano Pellizzer massimiliano.pellizzer at canonical.com
Fri Jul 18 13:46:10 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2115738

[ Impact ]

iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes

The iotlb_sync_map iommu op allows drivers to perform the necessary cache
flushes when new mappings are established. For the Intel IOMMU driver,
this callback specifically serves two purposes:

- To flush caches when a second-stage page table is attached to a device
  whose IOMMU is operating in caching mode (CAP_REG.CM==1).
- To explicitly flush internal write buffers to ensure updates to memory-
  resident remapping structures are visible to hardware (CAP_REG.RWBF==1).
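As a rough illustration, the two cases above reduce to checks on the IOMMU
capability register. The helper below is purely hypothetical (the backported
commit structures this differently); cap_caching_mode() and cap_rwbf() are
the driver's existing capability accessors, and use_first_level is assumed
to indicate a first-stage page table:

/* Hypothetical helper, for illustration only: when does the Intel
 * driver actually need to do work in its iotlb_sync_map path? */
static bool domain_need_iotlb_sync_map(struct intel_iommu *iommu,
                                       struct dmar_domain *domain)
{
        /* CAP_REG.CM == 1: second-stage mappings must be flushed on map */
        if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
                return true;

        /* CAP_REG.RWBF == 1: write buffers must be flushed explicitly */
        if (cap_rwbf(iommu->cap))
                return true;

        return false;
}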

However, in scenarios where neither caching mode nor the RWBF flag is
active, the cache_tag_flush_range_np() helper, which is called in the
iotlb_sync_map path, effectively becomes a no-op.

Despite being a no-op, cache_tag_flush_range_np() still iterates through
all cache tags of the IOMMUs attached to the domain, protected by a
spinlock. This unnecessary execution path introduces overhead, leading to
a measurable I/O performance regression. On systems with NVMes under the
same bridge, performance was observed to drop from approximately
6150 MiB/s down to 4985 MiB/s.
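Heavily simplified, the pre-fix hot path looks roughly like the sketch below
(this is not a verbatim copy of drivers/iommu/intel/cache.c; names and
structure are abridged for illustration). The point is that the spinlock is
taken and the whole list walked on every map call, even when every tag is
skipped:

/* Simplified sketch of the pre-fix behaviour, for illustration only. */
static void cache_tag_flush_range_np_sketch(struct dmar_domain *domain,
                                            unsigned long start,
                                            unsigned long end)
{
        struct cache_tag *tag;
        unsigned long flags;

        spin_lock_irqsave(&domain->cache_lock, flags);
        list_for_each_entry(tag, &domain->cache_tags, node) {
                /* CM == 0: nothing to invalidate, yet the lock and the
                 * list walk have already been paid for on this map call. */
                if (!cap_caching_mode(tag->iommu->cap))
                        continue;

                /* caching mode: issue the IOTLB invalidation for the range */
        }
        spin_unlock_irqrestore(&domain->cache_lock, flags);
}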

Introduce a flag in the dmar_domain structure. This flag is set only when
iotlb_sync_map is required (i.e., when CM or RWBF is set), and
cache_tag_flush_range_np() is called only for domains where the flag is
set. The flag, once set, is immutable, given that there won't be mixed
configurations in real-world scenarios where some IOMMUs in a system
operate in caching mode while others do not. Even in theory, the
immutability of this flag does not impact functionality.
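A minimal sketch of the resulting shape, assuming the new dmar_domain field
is called iotlb_sync_map (the actual field name and call sites in commit
12724ce3fe1a may differ): the flag is latched once at attach time using the
two conditions outlined above, and the map-time callback then checks it
before touching any cache tags.

/* Illustrative only: the flag is set at attach time when CM or RWBF
 * applies, and is never cleared afterwards. */
static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
                                      unsigned long iova, size_t size)
{
        struct dmar_domain *dmar_domain = to_dmar_domain(domain);

        if (dmar_domain->iotlb_sync_map)
                cache_tag_flush_range_np(dmar_domain, iova,
                                         iova + size - 1);

        return 0;
}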

[ Fix ]

Backport the following commit from linux-next to Plucky:
- 12724ce3fe1a iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes

[ Test Plan ]

Run fio against two NVMe devices under the same PCI bridge (dual-port NVMe):

$ sudo fio --readwrite=randread --blocksize=4k --iodepth=32 --numjobs=8 \
    --time_based --runtime=40 --ioengine=libaio --direct=1 --group_reporting \
    --new_group --name=job1 --filename=/dev/nvmeXnY \
    --new_group --name=job2 --filename=/dev/nvmeWnZ

Verify that the throughput reached with the two NVMe devices under the same
bridge matches what would be reached if the two devices were not under the
same bridge.

[ Regression Potential ]

This fix affects the Intel IOMMU (VT-d) driver. An issue with this fix
could cause required IOTLB cache or write-buffer flushes to be incorrectly
omitted when devices are attached to a domain. This could result in
memory-resident remapping structures not being visible to the hardware in
configurations that actually require synchronization. As a consequence,
devices performing DMA may exhibit data corruption, access violations, or
inconsistent behavior due to stale or incomplete translations being used
by the hardware.



