[Bug 1704843] [NEW] segfault processing very large input list
Chris Drost
1704843 at bugs.launchpad.net
Mon Jul 17 18:15:23 UTC 2017
Public bug reported:
The attached apport file was created from a segfault/core-dump observed
while using wget to try to audit a large number of websites to determine
which ones were online, which were redirects and where they redirected
to, etc.
The exact command line below goes to some length to disguise wget as an
ordinary browser. It does not care about the files that are actually
downloaded, which are occasionally harvested (deleted) to reclaim disk
space. The harvester did not run anywhere near the time of this crash,
though.
wget --tries=3 -i /path/to/getlist.txt \
  -U 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36' \
  --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" \
  --header="Accept-Encoding: gzip, deflate, br" \
  --header="Accept-Language: en-US,en;q=0.8" \
  --header="Cache-Control: max-age=0" \
  --header="Referer: https://www.google.com/" \
  -e robots=off --wait 0.5 --random-wait 2>&1 |
  tee /path/to/logfile.txt
The getlist contained 144,551 URLs to process; the segfault happened at
the 44,417th URL. Wget now downloads the nearby URLs without trouble,
but here are the last several lines of logfile.txt:
- - - - - - - -
--2017-07-15 04:05:13-- http://urlshortener.actorsandcrew.com/
Resolving urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)... 64.13.228.85
Connecting to urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)|64.13.228.85|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1515 (1.5K) [text/html]
Saving to: ‘index.html.4732’
0K .  100%  127M=0s
2017-07-15 04:05:19 (127 MB/s) - ‘index.html.4732’ saved [1515/1515]
--2017-07-15 04:05:19-- http://varganess.soclog.se/p
Resolving varganess.soclog.se (varganess.soclog.se)... 83.140.155.4
Connecting to varganess.soclog.se (varganess.soclog.se)|83.140.155.4|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Location: http://dayviews.com [following]
--2017-07-15 04:05:25-- http://dayviews.com/
Connecting to dayviews.com (dayviews.com)|83.140.155.40|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘p’
0K .......... ........  115K=0.2s
2017-07-15 04:05:26 (115 KB/s) - ‘p’ saved [19057]
- - - - - - - -
The next URL in the list after this "saved" message was emitted was
http://drivingrevenue.com/, which also downloads just fine when I fetch
it on its own.
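Since the individual URLs succeed in isolation, one way to narrow this down
might be to replay a window of the list around the 44,417th entry instead of
the whole file. The following is only a sketch under assumptions: the helper
name url_window, the window size of 100 lines on either side, and the
/tmp/window.txt path are all made up for illustration and were not part of
the original run.

```shell
# Hypothetical helper: print lines START..END (1-based, inclusive) of a URL list.
# $1 = list file, $2 = first line, $3 = last line
url_window() {
    # -n suppresses default output; print the range, then quit at END
    # so the rest of a very large list is never read.
    sed -n "${2},${3}p;${3}q" "$1"
}

# Assumed usage (not executed here): replay about 100 URLs either side
# of the 44,417th entry with a reduced set of the original options.
#   url_window /path/to/getlist.txt 44317 44517 > /tmp/window.txt
#   wget --tries=3 -i /tmp/window.txt -e robots=off --wait 0.5 --random-wait
```

If the crash depends on state accumulated over tens of thousands of requests
rather than on a specific URL, a short window like this would not be expected
to reproduce it, which would itself be useful information.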
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: wget 1.17.1-1ubuntu1.2
ProcVersionSignature: Ubuntu 4.4.0-75.96-generic 4.4.59
Uname: Linux 4.4.0-75-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.9
Architecture: amd64
Date: Mon Jul 17 12:40:33 2017
InstallationDate: Installed on 2014-06-23 (1120 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
ProcEnviron:
 LC_CTYPE=en_US.UTF-8
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: wget
UpgradeStatus: Upgraded to xenial on 2016-05-05 (437 days ago)
** Affects: wget (Ubuntu)
Importance: Undecided
Status: New
** Tags: amd64 apport-bug xenial
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to wget in Ubuntu.
https://bugs.launchpad.net/bugs/1704843