[Bug 1961697] Re: Transaction ID collisions cause slow DNS lookups in getaddrinfo

Michael Hudson-Doyle 1961697 at bugs.launchpad.net
Thu Mar 10 21:08:26 UTC 2022


So for the SRU (stable release update) we ideally want a nice,
self-contained, Ubuntu-based test case. Is that possible here? It reads
to me as if it's a bit non-deterministic; is that true?

@kjtsanaktsidis, do you think you could write up reproduction
instructions? If not, would you be able to test the proposed glibc in
your environment? We'll be patching Focal first, FWIW.
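Since the collision is non-deterministic, a reproduction would probably have to loop. Below is a minimal sketch in Python, under stated assumptions: it calls socket.getaddrinfo() (a thin wrapper over the C library's getaddrinfo()) with AF_UNSPEC, so that glibc asks for both A and AAAA records, and counts lookups that stall for about as long as the 5-second hang described in this bug. The helper name time_lookups, the 200-attempt default and the 4-second threshold are illustrative assumptions, not values from the report.

```python
import socket
import time


def time_lookups(host, attempts=200, slow_threshold=4.0):
    """Resolve `host` `attempts` times; return (durations, slow_count).

    Hypothetical helper. The default attempt count and threshold are
    assumptions: a lookup hit by the ID collision is expected to stall for
    roughly the 5 seconds seen in the report, so any lookup taking at least
    `slow_threshold` seconds is counted as slow.
    """
    durations = []
    slow_count = 0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # AF_UNSPEC: when the name is resolved via DNS, glibc asks for
            # both A and AAAA records.
            socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
        except socket.gaierror:
            # A failed lookup is still timed; only latency matters here.
            pass
        elapsed = time.monotonic() - start
        durations.append(elapsed)
        if elapsed >= slow_threshold:
            slow_count += 1
    return durations, slow_count
```

For example, one could call time_lookups("example.com") in an affected container before and after installing the proposed glibc and compare the slow counts. How many attempts it takes to hit a collision in a given environment is unknown.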

** Description changed:

- When resolving DNS names with getaddrinfo(), I have seen this hang for 5
- seconds and then retry and succeed. The issue is that glibc will issue
- both an A and an AAAA query on the same socket, and in some circumstances
- they can be sent with the same DNS transaction ID as well.
+ [impact]
+ When resolving DNS names with getaddrinfo(), I have seen this hang for 5 seconds and then retry and succeed. The issue is that glibc issues both an A and an AAAA query on the same socket, and in some circumstances they can be sent with the same DNS transaction ID as well.
  
- I verified this with a packet capture: the A and AAAA queries for a
- name were sent with the same DNS transaction ID and got responses;
- glibc then did nothing for five seconds before sending the same DNS
- query again. On the glibc side, I confirmed by interrupting it with gdb
- that it was blocked waiting for the DNS response, even though the packet
- capture shows the response had well and truly arrived. I've attached a
- packet capture and a backtrace of the glibc hang.
+ [test case]
+ TBD
+ 
+ [regression potential]
+ TBD.
+ 
+ [original description]
+ I verified this with a packet capture: the A and AAAA queries for a name were sent with the same DNS transaction ID and got responses; glibc then did nothing for five seconds before sending the same DNS query again. On the glibc side, I confirmed by interrupting it with gdb that it was blocked waiting for the DNS response, even though the packet capture shows the response had well and truly arrived. I've attached a packet capture and a backtrace of the glibc hang.
  
  I believe this is the same issue reported in these places:
-     * In RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1904153
-     * Also RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1903880
-     * Upstream: https://sourceware.org/bugzilla/show_bug.cgi?id=26600
+     * In RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1904153
+     * Also RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1903880
+     * Upstream: https://sourceware.org/bugzilla/show_bug.cgi?id=26600
  
  The environment I noticed this bug in was:
-     * Docker for Mac on an arm64 M1 MacBook
-     * Docker for Mac Linux kernel version is 5.10.76-linuxkit
-     * Linux is also arm64, not emulated
-     * Container with the buggy DNS environment is Ubuntu Bionic (also arm64, not emulated)
-     * glibc 2.27-3ubuntu1.4
+     * Docker for Mac on an arm64 M1 MacBook
+     * Docker for Mac Linux kernel version is 5.10.76-linuxkit
+     * Linux is also arm64, not emulated
+     * Container with the buggy DNS environment is Ubuntu Bionic (also arm64, not emulated)
+     * glibc 2.27-3ubuntu1.4
  
  However, one of the Red Hat reporters noticed this issue on M6-series
  EC2 instances in AWS.
  
  A patch has been provided upstream for this issue:
  https://sourceware.org/pipermail/libc-alpha/2020-September/117547.html
  
  I applied the upstream patch to glibc 2.27-3ubuntu1.4 and rebuilt the
  package, and the problem went away. I've attached the exact patch I
  applied, since I had to work through some conflicts.
  
  So, I think that patch just needs to be backported to Bionic and (I
  think) Focal as well. Is that reasonable?
  
  Thanks!
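To make the failure mode concrete: every DNS message starts with a 12-byte header whose first 16 bits are the transaction ID (RFC 1035), so when the A and AAAA queries go out on the same socket with the same ID, the ID alone no longer tells the client which query a reply answers. The Python sketch below is an illustration only, not glibc's code and not necessarily the approach taken by the upstream patch; the helpers build_query, question_of, match_by_id and match_by_id_and_question, the name example.com and the ID 0x1234 are all hypothetical. It shows that matching on the ID alone is ambiguous, while also comparing the question section (which carries the QTYPE: 1 for A, 28 for AAAA) picks out the right query.

```python
import struct

QTYPE_A = 1
QTYPE_AAAA = 28
QCLASS_IN = 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(txid, name, qtype):
    """Hypothetical helper: 12-byte header plus a single question."""
    flags = 0x0100  # RD (recursion desired) bit set
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)


def question_of(message):
    """Return QNAME + QTYPE + QCLASS of a one-question message.

    Assumes the question name is not compressed, which holds for the
    question section starting right after the header.
    """
    end = 12
    while message[end] != 0:
        end += 1 + message[end]  # skip one length-prefixed label
    return message[12:end + 1 + 4]  # zero byte, then QTYPE and QCLASS


def match_by_id(reply, queries):
    """Indices of outstanding queries whose transaction ID equals the reply's."""
    rid = struct.unpack("!H", reply[:2])[0]
    return [i for i, q in enumerate(queries) if struct.unpack("!H", q[:2])[0] == rid]


def match_by_id_and_question(reply, queries):
    """Like match_by_id, but the question section must match as well."""
    return [i for i in match_by_id(reply, queries)
            if question_of(queries[i]) == question_of(reply)]


# Two parallel queries that collided on the same transaction ID.
a_query = build_query(0x1234, "example.com", QTYPE_A)
aaaa_query = build_query(0x1234, "example.com", QTYPE_AAAA)

# Simulated reply to the AAAA query: same bytes with the QR (response) bit
# set and, for brevity, no answer records.
reply = bytearray(aaaa_query)
reply[2] |= 0x80
reply = bytes(reply)

print(match_by_id(reply, [a_query, aaaa_query]))               # [0, 1]: ambiguous
print(match_by_id_and_question(reply, [a_query, aaaa_query]))  # [1]
```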

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1961697

Title:
  Transaction ID collisions cause slow DNS lookups in getaddrinfo

Status in GLibC:
  Fix Released
Status in glibc package in Ubuntu:
  Confirmed
Status in glibc source package in Focal:
  New

