[Bug 1704843] [NEW] segfault processing very large input list

Chris Drost 1704843 at bugs.launchpad.net
Mon Jul 17 18:15:23 UTC 2017


Public bug reported:

The attached apport file was created from a segfault/core-dump observed
while using wget to try to audit a large number of websites to determine
which ones were online, which were redirects and where they redirected
to, etc.

The exact command line goes to considerable lengths to make wget look
like a regular browser, and it does not care at all about the files that
are actually downloaded; those are occasionally harvested to free up
disk space. The harvester did not run at any time near this crash,
though.

wget --tries=3 -i /path/to/getlist.txt \
  -U 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36' \
  --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" \
  --header="Accept-Encoding: gzip, deflate, br" \
  --header="Accept-Language: en-US,en;q=0.8" \
  --header="Cache-Control: max-age=0" \
  --header="Referer: https://www.google.com/" \
  -e robots=off --wait 0.5 --random-wait 2>&1 |
  tee /path/to/logfile.txt

The getlist contained 144,551 URLs to process; the crash happened at the
44,417th URL. Wget now downloads the nearby URLs just fine, but here are
the last several lines of logfile.txt:

- - - - - - - -

--2017-07-15 04:05:13--  http://urlshortener.actorsandcrew.com/
Resolving urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)... 64.13.228.85
Connecting to urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)|64.13.228.85|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1515 (1.5K) [text/html]
Saving to: ‘index.html.4732’

     0K .                                                     100% 127M=0s

2017-07-15 04:05:19 (127 MB/s) - ‘index.html.4732’ saved [1515/1515]

--2017-07-15 04:05:19--  http://varganess.soclog.se/p
Resolving varganess.soclog.se (varganess.soclog.se)... 83.140.155.4
Connecting to varganess.soclog.se (varganess.soclog.se)|83.140.155.4|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Location: http://dayviews.com [following]
--2017-07-15 04:05:25--  http://dayviews.com/
Connecting to dayviews.com (dayviews.com)|83.140.155.40|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘p’

     0K .......... ........                                    115K=0.2s

2017-07-15 04:05:26 (115 KB/s) - ‘p’ saved [19057]

- - - - - - - -
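
For anyone repeating this kind of audit, a log like the excerpt above can
be reduced to one record per request. What follows is a minimal Python
sketch and not part of the original report: the function name
`summarize` and the regular expressions are assumptions derived only
from the log lines quoted above, and other wget versions or locales may
print different text.

```python
import re

# Request header line, e.g. "--2017-07-15 04:05:19--  http://host/path"
REQUEST_RE = re.compile(r"^--(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})--\s+(\S+)$")
# Status line, e.g. "HTTP request sent, awaiting response... 301 Moved Permanently"
STATUS_RE = re.compile(r"^HTTP request sent, awaiting response\.\.\. (\d{3})\b")
# Redirect target, e.g. "Location: http://dayviews.com [following]"
LOCATION_RE = re.compile(r"^Location: (\S+)")


def summarize(log_lines):
    """Return one dict per request block found in the given log lines."""
    entries = []
    for raw in log_lines:
        line = raw.strip()
        match = REQUEST_RE.match(line)
        if match:
            # A new request block starts here.
            entries.append({
                "time": match.group(1),
                "url": match.group(2),
                "status": None,
                "location": None,
            })
            continue
        if not entries:
            # Ignore anything that precedes the first request line.
            continue
        match = STATUS_RE.match(line)
        if match:
            entries[-1]["status"] = int(match.group(1))
            continue
        match = LOCATION_RE.match(line)
        if match:
            entries[-1]["location"] = match.group(1)
    return entries
```

Called on the lines of logfile.txt, the last element of the returned
list describes the final request wget logged before it stopped, and any
entry with a non-empty "location" is a redirect.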

The next URL in the list after this "saved" message was
http://drivingrevenue.com/, which also downloads just fine when I run
it as a one-off.
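
Finding where a run stopped, and which URL was next, can be done by
looking up the last logged URL in the input list. The Python sketch
below is only an illustration: `next_after` is a hypothetical helper,
and it assumes the list holds one URL per line and that the URL of
interest appears only once (the first match is used).

```python
def next_after(urls, last_logged):
    """Return (1-based position of last_logged, the following URL or None)."""
    for index, url in enumerate(urls):
        if url == last_logged:
            # The entry after the match is the one wget would try next.
            following = urls[index + 1] if index + 1 < len(urls) else None
            return index + 1, following
    raise ValueError("URL not found in list: " + last_logged)


def load_urls(path):
    """Read a getlist-style file: one URL per line, blank lines skipped."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

For example, `next_after(load_urls("/path/to/getlist.txt"),
"http://varganess.soclog.se/p")` would give the position of that URL and
the entry that follows it, which can then be retried on its own.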

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: wget 1.17.1-1ubuntu1.2
ProcVersionSignature: Ubuntu 4.4.0-75.96-generic 4.4.59
Uname: Linux 4.4.0-75-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.9
Architecture: amd64
Date: Mon Jul 17 12:40:33 2017
InstallationDate: Installed on 2014-06-23 (1120 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
ProcEnviron:
 LC_CTYPE=en_US.UTF-8
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: wget
UpgradeStatus: Upgraded to xenial on 2016-05-05 (437 days ago)

** Affects: wget (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug xenial

-- 
https://bugs.launchpad.net/bugs/1704843
