[SRU][N][PATCH 0/1] nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
Ioanna Alifieraki
ioanna-maria.alifieraki at canonical.com
Tue May 6 13:11:50 UTC 2025
BugLink: https://bugs.launchpad.net/bugs/2106381
[Impact]
A user reported a bug in the NVMe over TCP target driver (nvmet-tcp) affecting aarch64 systems.
On weakly ordered architectures, the compiler or the CPU can reorder the loads and stores of
queue->cmd and queue->rcv_state, which can cause responses to be dropped and I/O to hang.
The bug has been fixed upstream in [1], which was merged in 6.14.
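The ordering problem can be illustrated with a simplified userspace model. This is a sketch, not the driver code: the names model_queue, model_writer, model_reader and the MODEL_RECV_* states are hypothetical, and C11 <stdatomic.h> release/acquire operations stand in for the kernel's smp_store_release()/smp_load_acquire(), with relaxed atomics roughly standing in for WRITE_ONCE()/READ_ONCE(). Once the reader observes the new state through an acquire load that reads the value written by the release store, the writer's earlier store to cmd is guaranteed to be visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's receive states. */
enum model_rcv_state { MODEL_RECV_DATA, MODEL_RECV_PDU };

struct model_cmd { int id; };

/* Hypothetical, simplified model of queue->cmd and queue->rcv_state. */
struct model_queue {
	_Atomic(struct model_cmd *) cmd;
	atomic_int rcv_state;
};

/* Writer: clear cmd first, then publish the new state with release
 * semantics (roughly WRITE_ONCE() followed by smp_store_release()). */
static void *model_writer(void *arg)
{
	struct model_queue *q = arg;

	atomic_store_explicit(&q->cmd, NULL, memory_order_relaxed);
	atomic_store_explicit(&q->rcv_state, MODEL_RECV_PDU,
			      memory_order_release);
	return NULL;
}

/* Reader: spin until the new state is observed with acquire semantics
 * (roughly smp_load_acquire()); the writer's earlier store to cmd is
 * then guaranteed to be visible. */
static struct model_cmd *model_reader(struct model_queue *q)
{
	while (atomic_load_explicit(&q->rcv_state, memory_order_acquire) !=
	       MODEL_RECV_PDU)
		; /* busy-wait for the writer */
	return atomic_load_explicit(&q->cmd, memory_order_relaxed);
}

/* Runs one writer/reader pair. Returns 1 if the reader saw the cleared
 * cmd, 0 if it saw a stale pointer, -1 if the thread could not start. */
static int model_run_once(void)
{
	static struct model_cmd busy = { .id = 1 };
	struct model_queue q;
	struct model_cmd *seen;
	pthread_t t;

	atomic_init(&q.cmd, &busy);
	atomic_init(&q.rcv_state, MODEL_RECV_DATA);
	if (pthread_create(&t, NULL, model_writer, &q) != 0)
		return -1;
	seen = model_reader(&q);
	pthread_join(t, NULL);
	return seen == NULL ? 1 : 0;
}
```

If both stores and loads were plain (unordered) accesses, an arm64 reader could observe MODEL_RECV_PDU while still reading the stale cmd pointer; the release/acquire pair rules that out. On arm64 these operations map to real acquire/release instructions (LDAR/STLR), whereas on x86 the hardware already provides this ordering.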
[Test Plan]
The bug is reproducible on arm64.
Set up NVMe over TCP.
Using an arm64 machine as the target, run a fio test with the following config:
[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/path/to/storage
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite
[Where problems could occur]
To fix the bug, the patch accesses queue->cmd with READ_ONCE()/WRITE_ONCE()
and queue->rcv_state with smp_load_acquire()/smp_store_release().
The patch modifies only the nvmet-tcp driver (drivers/nvme/target/tcp.c), so any
potential regressions would affect setups using NVMe over TCP.
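The reader side of the fix can be sketched in the same simplified userspace model. This is an illustration under assumptions, not the patch itself: model_cmd_awaiting_pdu() and the MODEL_RECV_* states are hypothetical, and the real driver applies additional conditions. It shows the two techniques the patch names: taking a single snapshot of queue->cmd (relaxed atomic load, roughly READ_ONCE()) and reading queue->rcv_state with an acquire load (roughly smp_load_acquire()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's receive states. */
enum model_rcv_state { MODEL_RECV_PDU, MODEL_RECV_DATA };

struct model_cmd { int id; };

/* Hypothetical, simplified model of queue->cmd and queue->rcv_state. */
struct model_queue {
	_Atomic(struct model_cmd *) cmd;
	atomic_int rcv_state;
};

/* Hypothetical check: is `cmd` the command currently being received
 * while the queue is still in the PDU state? */
static bool model_cmd_awaiting_pdu(struct model_queue *q,
				   const struct model_cmd *cmd)
{
	/* Load q->cmd exactly once (roughly READ_ONCE()), so the compiler
	 * cannot fuse, drop, or re-issue the load behind our back. */
	struct model_cmd *queue_cmd =
		atomic_load_explicit(&q->cmd, memory_order_relaxed);

	if (cmd != queue_cmd)
		return false;

	/* Acquire load that pairs with the writer's release store
	 * (roughly smp_load_acquire()). */
	return atomic_load_explicit(&q->rcv_state, memory_order_acquire) ==
	       MODEL_RECV_PDU;
}
```

Keeping the snapshot in a local variable means every later use in the function sees the same pointer value, even if another thread updates q->cmd concurrently.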
[Other Info]
The user is able to reproduce the issue with kernels 5.19 (no longer supported), 6.8, and 6.11.
For 6.11 Oracular the patch is pulled in as part of upstream stable patchset 2025-04-15, LP #2107437, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2107437
[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme?id=a16f88964c647103dad7743a484b216d488a6352
Meir Elisha (1):
nvmet-tcp: Fix a possible sporadic response drops in weakly ordered
arch
drivers/nvme/target/tcp.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--
2.34.1