[Bug 2003851] Re: occasional hanging 'apt-get update' from daily cronjob since Jammy 22.04

Julian Andres Klode 2003851 at bugs.launchpad.net
Tue Jun 24 16:15:49 UTC 2025


** Description changed:

+ [Impact]
+ APT can hang during `apt update` if a server that we request multiple repositories from produces a temporary failure such as a 503 for at least one of the InRelease file after successfully fetching another one.
+ 
+ The expected behavior is that APT retries here, and does not hang, and
+ eventually fails if the error is persistent.
+ 
+ [Test plan]
+ The extensive test suite of apt has been extended with a regression test 
+ 
+     test/integration/test-ubuntu-bug-2003851-retry-after
+ 
+ that reproduces the problematic scenario and ensures it is fixed.
+ 
+ Reproducing this with real-life deployments is tricky since it does
+ depend on network timing to some extend: The failing InRelease file must
+ be recorded as failing *before* we start queueing Packages from the good
+ repository.
+ 
+ [Where problems could occur]
+ The SRU addresses the problem by changing the ordering of items in the queue to enqueue them with decreasing order of retry-after times, rather than increasing ones. This ensures that queues that got stuck are detected as such, but it invariably changes the behavior of the retry functionality:
+ 
+ Previously the retry-after was file specific. If file A failed, we still
+ tried file B before trying file A again. Now files with a retry-after
+ come in decreasing order, so the longest retry time effectively is
+ applied to all files not yet requested from the server.
+ 
+ This can cause APT to operate slower, but on the flip side, it avoids
+ bombarding the server with follow-up requests for a good InRelease file
+ such as Packages which then likely also fail - failures in real life
+ aren't usually file specific, given that we serve fixed files; even on a
+ CDN, it's likely the entire backend fails rather than only one file.
+ 
+ There's some risk that APT could get stuck on other scenarios now,
+ albeit, given the nature of the fix, this seems reasonably low.
+ 
+ [Original bug report]
  Hi!
  
  Yesterday I spotted several machines of ours where a period `apt-get
  update` was stalled. The `http` children were hanging in `WaitFd`
  (waiting for parent instructions/queue). The parent was looping in
  `AcquireUpdate` every 500ms.
  
- 
- We have a cronjob that runs every few hours which calls `apt-get update` and does some post-processing. We noticed that several of them had stalled at some point in time. Killing the parent (apt-get) got it unstuck, removing the locks.
+ We have a cronjob that runs every few hours which calls `apt-get update`
+ and does some post-processing. We noticed that several of them had
+ stalled at some point in time. Killing the parent (apt-get) got it
+ unstuck, removing the locks.
  
  Example:
  ```
  # apt-get update
  Reading package lists... Done
  E: Could not get lock /var/lib/apt/lists/lock. It is held by process 154026 (apt-get)
  N: Be aware that removing the lock file is not a solution and may break your system.
  E: Unable to lock directory /var/lib/apt/lists/
  ```
  
  Task listing:
  ```
  root      153929  \_ /usr/sbin/CRON -f -P
  root      153942      \_ /bin/sh -c  [ -x /etc/zabbix/scripts/dpkg.updates ] && /etc/zabbix/scripts/dpkg.updates --cron
  root      153943          \_ /bin/sh /etc/zabbix/scripts/dpkg.updates --cron
  root      154026              \_ apt-get update
  _apt      154029                  \_ /usr/lib/apt/methods/http
  _apt      154030                  \_ /usr/lib/apt/methods/http
  _apt      154031                  \_ /usr/lib/apt/methods/http
  _apt      154033                  \_ /usr/lib/apt/methods/gpgv
  ```
  Open (TCP) sockets. All have 1 item in the Recv-Q (probably a FIN or RST?):
  ```
  # netstat -apn | grep -E '154026|154029|154030|154031|154033'
- tcp  1  0  10.x.x.x:60868  217.x.x.x:80  CLOSE_WAIT  154030/http         
- tcp  1  0  10.x.x.x:40756  178.x.x.x:80  CLOSE_WAIT  154029/http         
- tcp  1  0  10.x.x.x:56818  185.x.x.x:80  CLOSE_WAIT  154031/http         
+ tcp  1  0  10.x.x.x:60868  217.x.x.x:80  CLOSE_WAIT  154030/http
+ tcp  1  0  10.x.x.x:40756  178.x.x.x:80  CLOSE_WAIT  154029/http
+ tcp  1  0  10.x.x.x:56818  185.x.x.x:80  CLOSE_WAIT  154031/http
  ```
  All children (including gpgv) were waiting using pselect6(1, [0], NULL, NULL, NULL, NULL).
  
  The parent (apt-get) was waiting using pselect6(10, [5 6 7 9], [], NULL,
  {tv_sec=0, tv_nsec=500000000}, NULL).
  
  The http sockets in the children were at fd=3.
  
  Parent lsof:
  ```
  # lsof -p 154026 +E
  ...
  apt-get   154026 root    4uW  REG    8,1        0  262281 /var/lib/apt/lists/lock
  apt-get   154026 root    5r  FIFO   0,13      0t0 4015176 pipe 154029,http,1w
  apt-get   154026 root    6r  FIFO   0,13      0t0 4012448 pipe 154030,http,1w
  apt-get   154026 root    7r  FIFO   0,13      0t0 4015192 pipe 154031,http,1w
  apt-get   154026 root    8w  FIFO   0,13      0t0 4015177 pipe 154029,http,0r
  apt-get   154026 root    9r  FIFO   0,13      0t0 4015233 pipe 154033,gpgv,1w
  apt-get   154026 root   10w  FIFO   0,13      0t0 4012449 pipe 154030,http,0r
  apt-get   154026 root   12w  FIFO   0,13      0t0 4015193 pipe 154031,http,0r
  apt-get   154026 root   14w  FIFO   0,13      0t0 4015234 pipe 154033,gpgv,0r
  http      154029 _apt    0r  FIFO   0,13      0t0 4015177 pipe 154026,apt-get,8w
  http      154029 _apt    1w  FIFO   0,13      0t0 4015176 pipe 154026,apt-get,5r
  http      154030 _apt    0r  FIFO   0,13      0t0 4012449 pipe 154026,apt-get,10w
  http      154030 _apt    1w  FIFO   0,13      0t0 4012448 pipe 154026,apt-get,6r
  http      154031 _apt    0r  FIFO   0,13      0t0 4015193 pipe 154026,apt-get,12w
  http      154031 _apt    1w  FIFO   0,13      0t0 4015192 pipe 154026,apt-get,7r
  gpgv      154033 _apt    0r  FIFO   0,13      0t0 4015234 pipe 154026,apt-get,14w
  gpgv      154033 _apt    1w  FIFO   0,13      0t0 4015233 pipe 154026,apt-get,9r
  ```
  So:
  - apt-get is waiting for any data written by any of its four children (at fd 5/6/7/9)
  - http and gpgv are waiting for any data written by their parent (at their respective fd 0)
  
  Parent backtrace:
  ```
  #0  0x00007f420116a74d in select ()
-    from /lib/x86_64-linux-gnu/libc.so.6
+    from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007f420153fb5d in pkgAcquire::Run(int) ()
-    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
+    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #2  0x00007f420161d535 in AcquireUpdate(pkgAcquire&, int, bool, bool) ()
-    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
+    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #3  0x00007f420161d986 in ListUpdate(pkgAcquireStatus&, pkgSourceList&, int) ()
-    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
+    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #4  0x00007f42016d127b in DoUpdate (CmdL=...)
-     at ./apt-private/private-update.cc:73
+     at ./apt-private/private-update.cc:73
  #5  0x00007f420156d73f in CommandLine::DispatchArg(CommandLine::Dispatch const*, bool) ()
-    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
- #6  0x00007f420169fa97 in DispatchCommandLine (CmdL=..., 
-     Cmds=std::vector of length 27, capacity 32 = {...})
-     at ./apt-private/private-cmndline.cc:588
+    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
+ #6  0x00007f420169fa97 in DispatchCommandLine (CmdL=...,
+     Cmds=std::vector of length 27, capacity 32 = {...})
+     at ./apt-private/private-cmndline.cc:588
  #7  0x0000561fc06bafbd in main (argc=2, argv=0x7ffc0e14aa38)
-     at ./cmdline/apt-get.cc:447
+     at ./cmdline/apt-get.cc:447
  ```
  Child backtrace:
  ```
  #0  0x00007f58b1c9b74d in select ()
-    from /lib/x86_64-linux-gnu/libc.so.6
+    from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007f58b237cb43 in WaitFd(int, bool, unsigned long) ()
-    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
+    from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #2  0x00005643e37bc7b3 in BaseHttpMethod::Loop (
-     this=0x7ffdeaed56c0) at ./methods/basehttp.cc:611
+     this=0x7ffdeaed56c0) at ./methods/basehttp.cc:611
  #3  main (argv=<optimized out>) at ./methods/http.cc:1052
  ```
  Where this is:
  ```
  (gdb) print *this
  $1 = {<aptAuthConfMethod> = {<aptMethod> = {<pkgAcqMethod> = {<No data fields>}, <aptConfigWrapperForMethods> = {
-         methodNames = std::vector of length 2, capacity 2 = {
-           "http", "http::XXXXXX.nl"}}, Binary = "http", 
-       SeccompFlags = 6}, 
-     authconfs = std::vector of length 1, capacity 1 = {
-       std::unique_ptr<FileFd> = {get() = 0x5643e5343100}}}, 
-   Server = std::unique_ptr<ServerState> = {
-     get() = 0x5643e534fba0}, NextURI = "", 
-   AllowRedirect = true, Debug = false, PipelineDepth = 10, 
-   static FailFile = {static npos = 18446744073709551615, 
-     _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
-       _M_p = 0x5643e534d650 "/var/lib/apt/lists/partial/apt.osso.nl_ubuntu-security_dists_jammy-security_InRelease"}, 
-     _M_string_length = 85, {
-       _M_local_buf = "\226", '\000' <repeats 14 times>, 
-       _M_allocated_capacity = 150}}, static FailFd = -1, 
-   static FailTime = 1674071760}
+         methodNames = std::vector of length 2, capacity 2 = {
+           "http", "http::XXXXXX.nl"}}, Binary = "http",
+       SeccompFlags = 6},
+     authconfs = std::vector of length 1, capacity 1 = {
+       std::unique_ptr<FileFd> = {get() = 0x5643e5343100}}},
+   Server = std::unique_ptr<ServerState> = {
+     get() = 0x5643e534fba0}, NextURI = "",
+   AllowRedirect = true, Debug = false, PipelineDepth = 10,
+   static FailFile = {static npos = 18446744073709551615,
+     _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
+       _M_p = 0x5643e534d650 "/var/lib/apt/lists/partial/apt.osso.nl_ubuntu-security_dists_jammy-security_InRelease"},
+     _M_string_length = 85, {
+       _M_local_buf = "\226", '\000' <repeats 14 times>,
+       _M_allocated_capacity = 150}}, static FailFd = -1,
+   static FailTime = 1674071760}
  ```
  
  Relevant code in parent:
  ```
  pkgAcquire::RunResult pkgAcquire::Run(int PulseInterval)
  {
  ...
-    // Run till all things have been acquired
-    struct timeval tv = SteadyDurationToTimeVal(std::chrono::microseconds(PulseInterval));
-    while (ToFetch > 0)
+    // Run till all things have been acquired
+    struct timeval tv = SteadyDurationToTimeVal(std::chrono::microseconds(PulseInterval));
+    while (ToFetch > 0)
  ...
-       int Res;
-       do
-       {
-          Res = select(Highest+1,&RFds,&WFds,0,&tv);
-       }
-       while (Res < 0 && errno == EINTR);
+       int Res;
+       do
+       {
+          Res = select(Highest+1,&RFds,&WFds,0,&tv);
+       }
+       while (Res < 0 && errno == EINTR);
  ```
  Relevant code in child:
  ```
  int BaseHttpMethod::Loop()
  {
-    signal(SIGTERM,SigTerm);
-    signal(SIGINT,SigTerm);
-    
-    Server = 0;
-    
-    int FailCounter = 0;
-    while (1)
-    {      
-       // We have no commands, wait for some to arrive
-       if (Queue == 0)
-       {
-          if (WaitFd(STDIN_FILENO) == false)
-             return 0;
+    signal(SIGTERM,SigTerm);
+    signal(SIGINT,SigTerm);
+ 
+    Server = 0;
+ 
+    int FailCounter = 0;
+    while (1)
+    {
+       // We have no commands, wait for some to arrive
+       if (Queue == 0)
+       {
+          if (WaitFd(STDIN_FILENO) == false)
+             return 0;
  ```
  
  Versions:
  ```
  Ubuntu 22.04.1 LTS (Jammy Jellyfish)
  apt 2.4.8
  ```
  
  Preliminiary conclusion/notes:
  
  - I don't know why it stalls, but it does. And this was not a single
  incident. All the stalling apt processes had stopped at a different
  time/day.
  
  - Apparently there is a situation possible where there are no commands
  queued from the parent, and the parent isn't planning on sending any
  either.
  
  - Our apt-get is invoked by a cron job which calls either `update-
  notifier-common` or `apt-get update`: https://github.com/ossobv/zabbix-
  agent-
  osso/blob/c61aee6087c0d03e66d03013c61acd9f65d0eaab/scripts/dpkg.updates#L14-L31
  -- on the machines where apt-get was hanging, there was no `update-
  notifier-common`.
  
  - I don't know if this is 2.4.x related (on Focal we have apt 2.0.x),
  but it sure looks like it. I did check a few other Focal machines that
  lacked `update-notifier-common`, but they were not having a stalled apt-
  get (although _not_ seeing a hung process proves nothing)
  
- 
- Let me know if there's anything I can get you. Should this be filed elsewhere? (Debian Salsa?)
+ Let me know if there's anything I can get you. Should this be filed
+ elsewhere? (Debian Salsa?)
  
  Cheers,
  Walter Doekes
  OSSO B.V.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to apt in Ubuntu.
https://bugs.launchpad.net/bugs/2003851

Title:
  occasional hanging 'apt-get update' from daily cronjob since Jammy
  22.04

Status in apt package in Ubuntu:
  In Progress
Status in apt source package in Jammy:
  New
Status in apt source package in Noble:
  New
Status in apt source package in Oracular:
  New
Status in apt source package in Plucky:
  New
Status in apt source package in Questing:
  In Progress

Bug description:
  [Impact]
  APT can hang during `apt update` if a server that we request multiple repositories from produces a temporary failure such as a 503 for at least one of the InRelease file after successfully fetching another one.

  The expected behavior is that APT retries here, and does not hang, and
  eventually fails if the error is persistent.

  [Test plan]
  The extensive test suite of apt has been extended with a regression test 

      test/integration/test-ubuntu-bug-2003851-retry-after

  that reproduces the problematic scenario and ensures it is fixed.

  Reproducing this with real-life deployments is tricky since it does
  depend on network timing to some extend: The failing InRelease file
  must be recorded as failing *before* we start queueing Packages from
  the good repository.

  [Where problems could occur]
  The SRU addresses the problem by changing the ordering of items in the queue to enqueue them with decreasing order of retry-after times, rather than increasing ones. This ensures that queues that got stuck are detected as such, but it invariably changes the behavior of the retry functionality:

  Previously the retry-after was file specific. If file A failed, we
  still tried file B before trying file A again. Now files with a retry-
  after come in decreasing order, so the longest retry time effectively
  is applied to all files not yet requested from the server.

  This can cause APT to operate slower, but on the flip side, it avoids
  bombarding the server with follow-up requests for a good InRelease
  file such as Packages which then likely also fail - failures in real
  life aren't usually file specific, given that we serve fixed files;
  even on a CDN, it's likely the entire backend fails rather than only
  one file.

  There's some risk that APT could get stuck on other scenarios now,
  albeit, given the nature of the fix, this seems reasonably low.

  [Original bug report]
  Hi!

  Yesterday I spotted several machines of ours where a period `apt-get
  update` was stalled. The `http` children were hanging in `WaitFd`
  (waiting for parent instructions/queue). The parent was looping in
  `AcquireUpdate` every 500ms.

  We have a cronjob that runs every few hours which calls `apt-get
  update` and does some post-processing. We noticed that several of them
  had stalled at some point in time. Killing the parent (apt-get) got it
  unstuck, removing the locks.

  Example:
  ```
  # apt-get update
  Reading package lists... Done
  E: Could not get lock /var/lib/apt/lists/lock. It is held by process 154026 (apt-get)
  N: Be aware that removing the lock file is not a solution and may break your system.
  E: Unable to lock directory /var/lib/apt/lists/
  ```

  Task listing:
  ```
  root      153929  \_ /usr/sbin/CRON -f -P
  root      153942      \_ /bin/sh -c  [ -x /etc/zabbix/scripts/dpkg.updates ] && /etc/zabbix/scripts/dpkg.updates --cron
  root      153943          \_ /bin/sh /etc/zabbix/scripts/dpkg.updates --cron
  root      154026              \_ apt-get update
  _apt      154029                  \_ /usr/lib/apt/methods/http
  _apt      154030                  \_ /usr/lib/apt/methods/http
  _apt      154031                  \_ /usr/lib/apt/methods/http
  _apt      154033                  \_ /usr/lib/apt/methods/gpgv
  ```
  Open (TCP) sockets. All have 1 item in the Recv-Q (probably a FIN or RST?):
  ```
  # netstat -apn | grep -E '154026|154029|154030|154031|154033'
  tcp  1  0  10.x.x.x:60868  217.x.x.x:80  CLOSE_WAIT  154030/http
  tcp  1  0  10.x.x.x:40756  178.x.x.x:80  CLOSE_WAIT  154029/http
  tcp  1  0  10.x.x.x:56818  185.x.x.x:80  CLOSE_WAIT  154031/http
  ```
  All children (including gpgv) were waiting using pselect6(1, [0], NULL, NULL, NULL, NULL).

  The parent (apt-get) was waiting using pselect6(10, [5 6 7 9], [],
  NULL, {tv_sec=0, tv_nsec=500000000}, NULL).

  The http sockets in the children were at fd=3.

  Parent lsof:
  ```
  # lsof -p 154026 +E
  ...
  apt-get   154026 root    4uW  REG    8,1        0  262281 /var/lib/apt/lists/lock
  apt-get   154026 root    5r  FIFO   0,13      0t0 4015176 pipe 154029,http,1w
  apt-get   154026 root    6r  FIFO   0,13      0t0 4012448 pipe 154030,http,1w
  apt-get   154026 root    7r  FIFO   0,13      0t0 4015192 pipe 154031,http,1w
  apt-get   154026 root    8w  FIFO   0,13      0t0 4015177 pipe 154029,http,0r
  apt-get   154026 root    9r  FIFO   0,13      0t0 4015233 pipe 154033,gpgv,1w
  apt-get   154026 root   10w  FIFO   0,13      0t0 4012449 pipe 154030,http,0r
  apt-get   154026 root   12w  FIFO   0,13      0t0 4015193 pipe 154031,http,0r
  apt-get   154026 root   14w  FIFO   0,13      0t0 4015234 pipe 154033,gpgv,0r
  http      154029 _apt    0r  FIFO   0,13      0t0 4015177 pipe 154026,apt-get,8w
  http      154029 _apt    1w  FIFO   0,13      0t0 4015176 pipe 154026,apt-get,5r
  http      154030 _apt    0r  FIFO   0,13      0t0 4012449 pipe 154026,apt-get,10w
  http      154030 _apt    1w  FIFO   0,13      0t0 4012448 pipe 154026,apt-get,6r
  http      154031 _apt    0r  FIFO   0,13      0t0 4015193 pipe 154026,apt-get,12w
  http      154031 _apt    1w  FIFO   0,13      0t0 4015192 pipe 154026,apt-get,7r
  gpgv      154033 _apt    0r  FIFO   0,13      0t0 4015234 pipe 154026,apt-get,14w
  gpgv      154033 _apt    1w  FIFO   0,13      0t0 4015233 pipe 154026,apt-get,9r
  ```
  So:
  - apt-get is waiting for any data written by any of its four children (at fd 5/6/7/9)
  - http and gpgv are waiting for any data written by their parent (at their respective fd 0)

  Parent backtrace:
  ```
  #0  0x00007f420116a74d in select ()
     from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007f420153fb5d in pkgAcquire::Run(int) ()
     from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #2  0x00007f420161d535 in AcquireUpdate(pkgAcquire&, int, bool, bool) ()
     from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #3  0x00007f420161d986 in ListUpdate(pkgAcquireStatus&, pkgSourceList&, int) ()
     from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #4  0x00007f42016d127b in DoUpdate (CmdL=...)
      at ./apt-private/private-update.cc:73
  #5  0x00007f420156d73f in CommandLine::DispatchArg(CommandLine::Dispatch const*, bool) ()
     from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #6  0x00007f420169fa97 in DispatchCommandLine (CmdL=...,
      Cmds=std::vector of length 27, capacity 32 = {...})
      at ./apt-private/private-cmndline.cc:588
  #7  0x0000561fc06bafbd in main (argc=2, argv=0x7ffc0e14aa38)
      at ./cmdline/apt-get.cc:447
  ```
  Child backtrace:
  ```
  #0  0x00007f58b1c9b74d in select ()
     from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x00007f58b237cb43 in WaitFd(int, bool, unsigned long) ()
     from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
  #2  0x00005643e37bc7b3 in BaseHttpMethod::Loop (
      this=0x7ffdeaed56c0) at ./methods/basehttp.cc:611
  #3  main (argv=<optimized out>) at ./methods/http.cc:1052
  ```
  Where this is:
  ```
  (gdb) print *this
  $1 = {<aptAuthConfMethod> = {<aptMethod> = {<pkgAcqMethod> = {<No data fields>}, <aptConfigWrapperForMethods> = {
          methodNames = std::vector of length 2, capacity 2 = {
            "http", "http::XXXXXX.nl"}}, Binary = "http",
        SeccompFlags = 6},
      authconfs = std::vector of length 1, capacity 1 = {
        std::unique_ptr<FileFd> = {get() = 0x5643e5343100}}},
    Server = std::unique_ptr<ServerState> = {
      get() = 0x5643e534fba0}, NextURI = "",
    AllowRedirect = true, Debug = false, PipelineDepth = 10,
    static FailFile = {static npos = 18446744073709551615,
      _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
        _M_p = 0x5643e534d650 "/var/lib/apt/lists/partial/apt.osso.nl_ubuntu-security_dists_jammy-security_InRelease"},
      _M_string_length = 85, {
        _M_local_buf = "\226", '\000' <repeats 14 times>,
        _M_allocated_capacity = 150}}, static FailFd = -1,
    static FailTime = 1674071760}
  ```

  Relevant code in parent:
  ```
  pkgAcquire::RunResult pkgAcquire::Run(int PulseInterval)
  {
  ...
     // Run till all things have been acquired
     struct timeval tv = SteadyDurationToTimeVal(std::chrono::microseconds(PulseInterval));
     while (ToFetch > 0)
  ...
        int Res;
        do
        {
           Res = select(Highest+1,&RFds,&WFds,0,&tv);
        }
        while (Res < 0 && errno == EINTR);
  ```
  Relevant code in child:
  ```
  int BaseHttpMethod::Loop()
  {
     signal(SIGTERM,SigTerm);
     signal(SIGINT,SigTerm);

     Server = 0;

     int FailCounter = 0;
     while (1)
     {
        // We have no commands, wait for some to arrive
        if (Queue == 0)
        {
           if (WaitFd(STDIN_FILENO) == false)
              return 0;
  ```

  Versions:
  ```
  Ubuntu 22.04.1 LTS (Jammy Jellyfish)
  apt 2.4.8
  ```

  Preliminiary conclusion/notes:

  - I don't know why it stalls, but it does. And this was not a single
  incident. All the stalling apt processes had stopped at a different
  time/day.

  - Apparently there is a situation possible where there are no commands
  queued from the parent, and the parent isn't planning on sending any
  either.

  - Our apt-get is invoked by a cron job which calls either `update-
  notifier-common` or `apt-get update`:
  https://github.com/ossobv/zabbix-agent-
  osso/blob/c61aee6087c0d03e66d03013c61acd9f65d0eaab/scripts/dpkg.updates#L14-L31
  -- on the machines where apt-get was hanging, there was no `update-
  notifier-common`.

  - I don't know if this is 2.4.x related (on Focal we have apt 2.0.x),
  but it sure looks like it. I did check a few other Focal machines that
  lacked `update-notifier-common`, but they were not having a stalled
  apt-get (although _not_ seeing a hung process proves nothing)

  Let me know if there's anything I can get you. Should this be filed
  elsewhere? (Debian Salsa?)

  Cheers,
  Walter Doekes
  OSSO B.V.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2003851/+subscriptions




More information about the foundations-bugs mailing list