[Bug 1528921] Re: rsync hangs on select(5, [], [4], [], {60, 0}
Sir Pamelton
spam at codefreak.net
Tue Mar 6 12:45:51 UTC 2018
A few weeks ago one of my five rsync backup processes that runs from a
daily cron 03:00 am has started failing with this exact issue.
I can only manually reproduce it when executing my rsync command with
tripe verbosity:
rsync -avvv --delete /source_dir /target_dir
Manually calling rsync with less than triple verbosity runs the whole
process as expected. However the hanging command used in cron is
$ rsync -aq --delete /source_dir /target_dir > /dev/null
$ strace -p 16857
16857 16:38:37 lstat("Saved/SavedArks/retracted.profilebak", {st_mode=S_IFREG|0600, st_size=22986, ...}) = 0
16857 16:38:37 write(1, "[sender] make_file(Saved/SavedArks/retracted.profilebak,*,2)\n", 69) = 69
16857 16:38:37 lstat("Saved/SavedArks/retracted.arkprofile", {st_mode=S_IFREG|0600, st_size=1810, ...}) = 0
16857 16:38:37 write(1, "[sender] make_file(Saved/SavedArks/retracted.arkprofile,*,2)\n", 69) = 69
16857 16:38:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16858 16:39:36 <... select resumed> ) = 0 (Timeout)
16858 16:39:36 select(2, [], [1], [], {60, 0} <unfinished ...>
16859 16:39:37 <... select resumed> ) = 0 (Timeout)
16859 16:39:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16857 16:39:37 <... select resumed> ) = 0 (Timeout)
16857 16:39:37 select(5, [], [4], [], {60, 0}
$ lsof -p 15678 -ad 4,5 # note: I used multiple processes over multiple days, the PID will not be constant
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsync 15678 pamel 4u unix 0xffff8f88f9fec400 0t0 44294 type=STREAM
rsync 15678 pamel 5u unix 0xffff8f88f9614800 0t0 44295 type=STREAM
$ ls -l /proc/15678/fd/4
lrwx------ 1 pamel pamel 64 Mar 6 13:35 /proc/15678/fd/4 -> socket:[44294]
$ ls -l /proc/15678/fd/5
lrwx------ 1 pamel pamel 64 Mar 6 13:35 /proc/15678/fd/5 -> socket:[44295]
$ ss -nxp | grep -E '44294|44295'
u_str ESTAB 0 59136 * 44296 * 44295 users:(("rsync",pid=15679,fd=1))
u_str ESTAB 0 97024 * 44294 * 44293 users:(("rsync",pid=15678,fd=4))
u_str ESTAB 82063 0 * 44293 * 44294 users:(("rsync",pid=15680,fd=0))
u_str ESTAB 53885 0 * 44295 * 44296 users:(("rsync",pid=15678,fd=5))
I have this issue on Ubuntu 17.04 with rsync 3.1.2, Debian 9.3 with
rsync 3.1.2 and 3.1.3 (all as VM on ESXi 6.5). The only constants are
the files that are being transferred: 6.6GB over 11'437 files. I am also
certain that it does not hang on a specific file as it's sometimes a log
file and sometimes a profile. I would say it happens 0.5s into the
process after transferring about 44 files.
$ rsync -avvv --delete /srv/ark/redacted/ShooterGame/Saved /srv/ark/backup/redacted/
[...]
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
[sender] make_file(Saved/SavedArks/redacted.arktribe,*,2)
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
send_files(29, /srv/ark/redacted/ShooterGame/Saved/Logs)
Saved/Logs/
send_files(30, /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log)
send_files mapped /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log of size 65472
calling match_sums /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
^C
[...]
recv_file_name(Saved/SavedArks/redacted.profilebak)
recv_file_name(Saved/SavedArks/redacted.arkprofile)
recv_file_name(Saved/SavedArks/redacted.arkprofile)
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(504) [generator=3.1.3]
[generator] _exit_cleanup(code=20, file=io.c, line=504): about to call exit(20)
rsync: [generator] write error: Broken pipe (32)
$ md5sum /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
52596de045ba1759c68dfb8c1cd2d970 /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
$ id
uid=1001(pamel) gid=1001(pamel) groups=1001(pamel),60(games)
$ ls -lA /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
-rw------- 1 pamel pamel 65472 Apr 1 2017 /srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
$ ls -lA /srv/ark/backup/
total 4
[...]
drwxr-xr-x 3 pamel pamel 4096 Mar 6 13:25 redacted
I've found other occurences of this issue but mostly people say "this
sounds like a network issue". I'm only rsyncing between two local
directories so I really hope it has nothing to do with network.
Is there anything else I can do to further debug this issue? I can't
really send 6.6GB of confidential data to replicate this issue and it's
a really big issue that now has started to occur daily and requires me
to manually abort the process and restart the services every morning.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to rsync in Ubuntu.
https://bugs.launchpad.net/bugs/1528921
Title:
rsync hangs on select(5, [], [4], [], {60, 0}
Status in rsync package in Ubuntu:
Confirmed
Bug description:
In the last few months my home directory backup stopped completing.
I've been able to reproduce the problem on a single subdirectory
although I had to add the --debug=all flag to reproduce it on that
smaller directory. Specifically, this command never completes:
rsync --debug=all -avz /tmp/html2 /tmp/rsynctest/
The html2 directory is a copy of
gnuradio-3.7.8.1/build/docs/doxygen/html .
When I strace the command, I see this:
write(1, "sender finished /tmp/html2/atsc_"..., 58sender finished /tmp/html2/atsc__interleaver_8h__incl.md5
) = 58
write(1, "send_files(338, /tmp/html2/atsc_"..., 59send_files(338, /tmp/html2/atsc__interleaver_8h__incl.png)
) = 59
open("html2/atsc__interleaver_8h__incl.png", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0664, st_size=264657, ...}) = 0
write(1, "html2/atsc__interleaver_8h__incl"..., 37html2/atsc__interleaver_8h__incl.png
) = 37
read(3, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\n\253\0\0\2\233\10\6\0\0\0h\242\""..., 262144) = 262144
select(6, [5], [4], [5], {60, 0}) = 2 (in [5], out [4], left {59, 999996})
read(5, "\0\0\0\0\0\0\0\1\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\240\0\0\0"..., 95) = 95
write(4, "r\311\0\7\177\377\232\237\264\272e\300\300\240\316\264&\314\301\252*\37\256y\225g\373^\315j\370\350"..., 51574) = 51574
select(5, [], [4], [], {60, 0}) = 1 (out [4], left {59, 999997})
write(4, "\7\320\0\7\177\377\234|\7X\223Y\273\255c\27\25f\306\212\202\214#E\272\212t\1\225A\fU"..., 53259) = 53259
select(5, [], [4], [], {60, 0}
The select command times out over and over. I get the same behavior
when trying to back up my entire home directory but I don't need the
--debug=all flag in that case.
lsb_release -rd
Description: Ubuntu 14.04.3 LTS
Release: 14.04
apt-cache policy rsync
rsync:
Installed: 3.1.0-2ubuntu0.1
Candidate: 3.1.0-2ubuntu0.1
Version table:
*** 3.1.0-2ubuntu0.1 0
500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main i386 Packages
500 http://security.ubuntu.com/ubuntu/ trusty-security/main i386 Packages
100 /var/lib/dpkg/status
3.1.0-2 0
500 http://us.archive.ubuntu.com/ubuntu/ trusty/main i386 Packages
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: rsync 3.1.0-2ubuntu0.1
ProcVersionSignature: Ubuntu 3.13.0-74.118-generic 3.13.11-ckt30
Uname: Linux 3.13.0-74-generic i686
NonfreeKernelModules: nvidia
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: i386
CurrentDesktop: KDE
Date: Wed Dec 23 09:44:17 2015
EcryptfsInUse: Yes
InstallationDate: Installed on 2010-09-18 (1922 days ago)
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Beta i386 (20100901.1)
SourcePackage: rsync
UpgradeStatus: Upgraded to trusty on 2014-12-27 (361 days ago)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsync/+bug/1528921/+subscriptions
More information about the foundations-bugs
mailing list