[SRU][J:linux-bluefield][PATCH v1 0/1] tcp: fix forever orphan socket caused by tcp_abort
Stav Aviram
saviram at nvidia.com
Sun Jul 6 13:15:37 UTC 2025
BugLink: https://bugs.launchpad.net/bugs/2114965
SRU Justification:
[Impact]
In BFB version DOCA_2.6.0_BSP_4.6.0_Ubuntu_22.04-2.20240114, deleting a
container by removing its kubelet YAML from /etc/kubelet.d sometimes
never completes. The deletion waits for the container to disappear from
the crictl ps output, but the container remains in the Running state
indefinitely. This behavior is seen with container version 2.dev.50 and
FW 32.40.0324.
The issue appears to stem from a kernel bug affecting orphaned TCP
sockets stuck in a zero-window state. These sockets are never closed and
their timers are not rescheduled, so they become "forever orphans" that
prevent resource cleanup.
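One rough way to look for such sockets on an affected system is to scan
/proc/net/tcp for rows whose inode field is 0 (the socket is no longer
attached to a file, i.e. it is orphaned) and whose timer_active field is
0 (no timer pending). The C sketch below is only an illustrative
heuristic and is not part of the patch. The function name and the state
constants are hypothetical, and the field layout is assumed to follow
the kernel's proc_net_tcp documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Numeric TCP states as printed in the "st" column (assumed values). */
enum { MODEL_TCP_TIME_WAIT = 0x06, MODEL_TCP_NEW_SYN_RECV = 0x0C };

/*
 * Heuristic: returns true if one row of /proc/net/tcp looks like an
 * orphaned socket (inode 0) with no timer pending (timer_active 0).
 * The header row and malformed rows return false.
 */
bool looks_like_stuck_orphan(const char *row)
{
    unsigned int state, timer_active;
    unsigned long inode;

    /* sl: local rem st tx_queue:rx_queue tr:tm->when retrnsmt uid timeout inode */
    int n = sscanf(row,
                   " %*d: %*x:%*x %*x:%*x %x %*x:%*x %x:%*x %*x %*u %*u %lu",
                   &state, &timer_active, &inode);
    if (n != 3)
        return false;

    /* TIME_WAIT and request sockets also report inode 0; skip them. */
    if (state == MODEL_TCP_TIME_WAIT || state == MODEL_TCP_NEW_SYN_RECV)
        return false;

    return inode == 0 && timer_active == 0;
}
```

Each line read from /proc/net/tcp can be passed to
looks_like_stuck_orphan(). A row that stays flagged across repeated
reads is a candidate for the behavior described above. This is a hint
only and does not prove that the socket is affected by this bug.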
[Fix]
Backporting the upstream commit:
bac76cf89816bff06c4ec2f3df97dc34e150a1c4 ("tcp: fix forever orphan socket caused by tcp_abort")
This commit removes the conditional check on SOCK_DEAD in tcp_abort(), so
that orphaned sockets are also closed instead of stalling indefinitely.
The commit had to be backported rather than applied as-is because the
error handling and logging methods in this tree differ from the upstream
code.
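The effect of the removed check can be illustrated with a small
userspace model. This is a sketch under stated assumptions, not kernel
code: the struct, all function names, and the simplified timer are
hypothetical. The assumption that the timers stop making progress once
the write queue has been purged follows the description in the upstream
commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct sock. */
struct model_sock {
    bool dead;    /* models SOCK_DEAD (the socket is orphaned) */
    bool closed;  /* models the socket having been torn down */
    int  queued;  /* models segments waiting in the write queue */
    int  err;     /* models sk->sk_err */
};

/*
 * Models a retransmit/probe timer: it returns early when nothing is
 * queued, so it is not rescheduled and never times the socket out.
 */
static void model_timer_fire(struct model_sock *sk)
{
    if (sk->closed || sk->queued == 0)
        return;
    sk->closed = true;  /* simplified: the timer eventually gives up */
}

/* Models tcp_abort; with_fix == false keeps the SOCK_DEAD check. */
static void model_abort(struct model_sock *sk, int err, bool with_fix)
{
    if (with_fix || !sk->dead) {
        sk->err = err;
        sk->closed = true;
    }
    sk->queued = 0;     /* the write queue is purged in both variants */
}

/* Returns whether an orphaned socket with queued data ends up closed. */
bool model_orphan_closed(bool with_fix)
{
    struct model_sock sk = { .dead = true, .closed = false,
                             .queued = 3, .err = 0 };

    model_abort(&sk, ECONNABORTED, with_fix);
    model_timer_fire(&sk);  /* any later timer run */
    return sk.closed;
}
```

In this model, model_orphan_closed(false) returns false: the abort skips
the orphan, purges its queue, and the timer then has nothing left to act
on. model_orphan_closed(true) returns true because the close no longer
depends on SOCK_DEAD.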
[Test Case]
Compile tested on linux-bluefield-5.15 on the master-next branch.
Further testing includes reproducing the issue by removing the pod's
YAML from /etc/kubelet.d and monitoring container termination using
crictl ps. With the patch applied, the container should no longer
remain stuck in Running state.
[Regression Potential]
The patch targets a specific edge case in TCP socket handling, and after
backporting, it is as close as possible to the original upstream commit.
However, since the change removes a check that previously prevented
tcp_abort from closing SOCK_DEAD sockets, there is a small risk that
other kernel paths still rely on the earlier behavior. This could
theoretically lead to unexpected side effects in the force-close logic
if assumptions about socket state are violated. In addition, the
backport is not an exact match for the original commit, so unexpected
behavior is possible in edge cases related to socket teardown.

Xueming Feng (1):
  tcp: fix forever orphan socket caused by tcp_abort

 net/ipv4/tcp.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

--
2.34.1