[Bug 1940908] Re: resolved: closes listening socket too rapidly and sends Destination port unreachable
TJ
1940908 at bugs.launchpad.net
Wed Aug 25 20:07:14 UTC 2021
I've cherry-picked the upstream patches and built the package in my bug-
fixes PPA:
https://launchpad.net/~tj/+archive/ubuntu/bugfixes
Verified it solves the issue even in the face of a 1000ms delay being
imposed by the router using:
## example traffic control to slow down UDP port 53 traffic from a
specific upstream DNS server being forwarded by router for egress from
the LOCAL bridge device.
# tc qdisc add dev LOCAL root handle 1:0 prio
# tc qdisc add dev LOCAL parent 1:2 handle 10: netem delay 1000ms
# tc filter add dev LOCAL protocol ipv6 parent 1: prio 1 u32 match ip6 src fddc:7e00:e001:ee00::1/64 match ip6 sport 53 0xffff flowid 10:1
# tc filter add dev LOCAL protocol ipv6 parent 1: prio 1 u32 match ip6 dst fddc:7e00:e001:ee00::1/64 match ip6 dport 53 0xffff flowid 10:1
# tc -s qdisc ls dev LOCAL
qdisc prio 1: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 4643351 bytes 7676 pkt (dropped 0, overlimits 0 requeues 0)
backlog 138b 1p requeues 0
qdisc netem 10: parent 1:2 limit 1000 delay 1s
Sent 2682417 bytes 3245 pkt (dropped 0, overlimits 0 requeues 0)
backlog 138b 1p requeues 0
## prio[rity] creates 3 bands (classes :1 :2 :3) by default. Interactive/immediate packets (UDP 53 DNS) should have Type Of Service (TOS 0x1000) set in the IP packet header by the resolvers. Default priomap puts those packets in the 2nd band (:2 for Interactive/Minimise delay). The netem delay qdisc is attached to $parent:2 with handle 10: (major:minor - minor defaults to 0). u32 (unsigned 32-bit) filters that match the UDP port 53 traffic direct it to the handle of the netem qdisc (flowid 10:1 - :1 being the first leaf) where the 1000ms delay is imposed.
# tcpdump -vvvni enp2s0 "(ip6 and port 53) or (icmp6[icmp6type] = 1 and icmp6[icmp6code] = 4)"
...
21:01:49.232778 IP6 (flowlabel 0xc8a82, hlim 64, next-header UDP (17) payload length: 56) fddc:7e00:e001:ee00:fa75:a4ff:fef3:42b4.59484 > fddc:7e00:e001:ee00::1.53: [bad udp cksum 0x7528 -> 0x9b42!] 25832+ [1au] AAAA? packages.ubuntu.com. ar: . OPT UDPsize=512 (48)
21:01:49.232862 IP6 (flowlabel 0x9137e, hlim 64, next-header UDP (17) payload length: 56) fddc:7e00:e001:ee00:fa75:a4ff:fef3:42b4.43177 > fddc:7e00:e001:ee00::1.53: [bad udp cksum 0x7528 -> 0x5114!] 61129+ [1au] AAAA? packages.ubuntu.com. ar: . OPT UDPsize=512 (48)
21:01:49.319885 IP6 (flowlabel 0x5decb, hlim 63, next-header UDP (17) payload length: 84) fddc:7e00:e001:ee00::1.53 > fddc:7e00:e001:ee00:fa75:a4ff:fef3:42b4.43177: [udp sum ok] 61129 q: AAAA? packages.ubuntu.com. 1/0/1 packages.ubuntu.com. [10m] AAAA 2a01:7e00:e001:ee64::5bbd:5e25 ar: . OPT UDPsize=1232 (76)
21:01:49.319920 IP6 (flowlabel 0x45773, hlim 63, next-header UDP (17) payload length: 84) fddc:7e00:e001:ee00::1.53 > fddc:7e00:e001:ee00:fa75:a4ff:fef3:42b4.59484: [udp sum ok] 25832 q: AAAA? packages.ubuntu.com. 1/0/1 packages.ubuntu.com. [10m] AAAA 2a01:7e00:e001:ee64::5bbd:5e25 ar: . OPT UDPsize=1232 (76)
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1940908
Title:
resolved: closes listening socket too rapidly and sends Destination
port unreachable
Status in systemd package in Ubuntu:
Incomplete
Bug description:
Affects Ubuntu 18.04 through 21.04 (fixes are in systemd v248)
With systemd v245 (and v247) and systemd-resolved we're seeing
frequent problems due to resolved rapidly closing the socket on which
it sends out a query before the server has answered. The server
answers and then resolved sends an ICMP Destination Unreachable (Port
Unreachable) response!
This breaks name lookups frequently. In our case the DNS server is
reached via a Wireguard tunnel over a satellite link and latencies can
vary.
A typical example captured via tcpdump:
07:22:03.446919 IP6 fddc:7e00:e001:ee00:fffe:f875:a4f3:42b4.45338 > fddc:7e00:e001:ee00::1.53: 2963+ [1au] AAAA? contile-images.services.mozilla.com. (64)
07:22:03.501089 IP6 fddc:7e00:e001:ee00::1.53 > fddc:7e00:e001:ee00:fffe:f875:a4f3:42b4.45338: 2963 1/0/1 AAAA 2a01:7e00:e001:ee64::2278:7366 (92)
07:22:03.501152 IP6 fddc:7e00:e001:ee00:fffe:f875:a4f3:42b4 > fddc:7e00:e001:ee00::1: ICMP6, destination unreachable, unreachable port, fddc:7e00:e001:ee00:fffe:f875:a4f3:42b4 udp port 45338, length 148
The time difference between query and reply here is only 0.054170
seconds, and there is no way to alter the timeout in resolved.
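
For anyone wanting to re-check the timing, here is a minimal Python sketch
that recomputes the gaps from the three tcpdump timestamps quoted above. The
helper name capture_delta_usec is made up for illustration and is not part
of any tool; it assumes all timestamps fall on the same day.

```python
from datetime import datetime, timedelta

def capture_delta_usec(first: str, second: str) -> int:
    """Return the gap between two tcpdump HH:MM:SS.ffffff stamps in microseconds.

    Hypothetical helper; assumes both stamps are from the same day.
    """
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    # Dividing by a one-microsecond timedelta yields an exact integer.
    return delta // timedelta(microseconds=1)

# Timestamps copied from the capture in the bug description.
query_sent = "07:22:03.446919"
reply_seen = "07:22:03.501089"
icmp_sent = "07:22:03.501152"

print("query -> reply:", capture_delta_usec(query_sent, reply_seen), "usec")
print("reply -> ICMP :", capture_delta_usec(reply_seen, icmp_sent), "usec")
```

The first gap is the 0.054170 seconds mentioned above; the second is the time
between the reply arriving and the ICMP6 port unreachable being sent.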
There are recent upstream commits to fix this which ought to be
cherry-picked. See:
https://github.com/systemd/systemd/issues/17421
https://github.com/systemd/systemd/pull/17535
https://github.com/systemd/systemd/commit/e03d156f78cb5a0cac85d1e1310d89fdfa4f1b88
If I am reading the code correctly the timeout is very short:
src/resolve/resolved-dns-transaction.c:22:#define DNS_TIMEOUT_USEC (SD_RESOLVED_QUERY_TIMEOUT_USEC / DNS_TRANSACTION_ATTEMPTS_MAX)
src/resolve/resolved-def.h:79:#define SD_RESOLVED_QUERY_TIMEOUT_USEC (120 * USEC_PER_SEC)
src/resolve/resolved-dns-transaction.h:212:#define DNS_TRANSACTION_ATTEMPTS_MAX 24
So that is 120 s / 24 = 5 s (5,000,000 microseconds) per attempt with,
as inferred, up to 24 attempts (I don't see multiple duplicate requests
on the wire, so I'm not sure DNS_TRANSACTION_ATTEMPTS_MAX affects this).
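
As a sanity check on the arithmetic, the three macros quoted above can be
mirrored in a few lines of Python. This is only a sketch of the calculation,
not resolved's code; USEC_PER_SEC is assumed to be 1,000,000 (microseconds
per second), and OBSERVED_REPLY_USEC is a name introduced here for the
query-to-reply gap seen in the capture.

```python
# Mirror of the #defines quoted above, for checking the arithmetic only.
USEC_PER_SEC = 1_000_000  # assumption: microseconds per second

SD_RESOLVED_QUERY_TIMEOUT_USEC = 120 * USEC_PER_SEC
DNS_TRANSACTION_ATTEMPTS_MAX = 24
DNS_TIMEOUT_USEC = SD_RESOLVED_QUERY_TIMEOUT_USEC // DNS_TRANSACTION_ATTEMPTS_MAX

# Query-to-reply gap from the tcpdump capture (0.054170 s), in microseconds.
OBSERVED_REPLY_USEC = 54_170

print("per-attempt timeout:", DNS_TIMEOUT_USEC, "usec")
print("per-attempt timeout:", DNS_TIMEOUT_USEC / USEC_PER_SEC, "s")
print("reply within timeout:", OBSERVED_REPLY_USEC < DNS_TIMEOUT_USEC)
```

By these numbers the reply in the capture arrived well inside the computed
per-attempt value.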