segfault processing very large input list

Bug #1704843 reported by Chris Drost
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
wget (Ubuntu) | New | Undecided | Unassigned |

Bug Description

The attached apport file was created from a segfault/core-dump observed while using wget to try to audit a large number of websites to determine which ones were online, which were redirects and where they redirected to, etc.

The exact command line below sends browser-like headers so that the requests resemble ordinary traffic. The downloaded files themselves are not needed; a separate cleanup job occasionally deletes them to free disk space. That cleanup job did not run anywhere near the time of this crash.

wget --tries=3 -i /path/to/getlist.txt -U 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36' --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" --header="Accept-Encoding: gzip, deflate, br" --header="Accept-Language: en-US,en;q=0.8" --header="Cache-Control: max-age=0" --header="Referer: https://www.google.com/" -e robots=off --wait 0.5 --random-wait 2>&1 | tee /path/to/logfile.txt

The getlist contained 144,551 URLs to process; the crash happened at the 44,417th URL. Wget now downloads the nearby URLs without any problem when they are run individually, but here are the last several lines of logfile.txt:

- - - - - - - -

--2017-07-15 04:05:13-- http://urlshortener.actorsandcrew.com/
Resolving urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)... 64.13.228.85
Connecting to urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)|64.13.228.85|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1515 (1.5K) [text/html]
Saving to: ‘index.html.4732’

     0K . 100% 127M=0s

2017-07-15 04:05:19 (127 MB/s) - ‘index.html.4732’ saved [1515/1515]

--2017-07-15 04:05:19-- http://varganess.soclog.se/p
Resolving varganess.soclog.se (varganess.soclog.se)... 83.140.155.4
Connecting to varganess.soclog.se (varganess.soclog.se)|83.140.155.4|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Location: http://dayviews.com [following]
--2017-07-15 04:05:25-- http://dayviews.com/
Connecting to dayviews.com (dayviews.com)|83.140.155.40|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘p’

     0K .......... ........ 115K=0.2s

2017-07-15 04:05:26 (115 KB/s) - ‘p’ saved [19057]

- - - - - - - -

The next site up for audit after this saved event was emitted was http://drivingrevenue.com/ , which also downloads just fine when I run it as a one-off.
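
For anyone trying to reproduce this, a minimal sketch of how a smaller window of the list around the crash point could be extracted and fed back to wget. The function name `slice_urls`, the line range 44400-44430, and the `/tmp/window.txt` path are illustrative assumptions, not something used in the original run. It is also an assumption that a short window is enough: if the fault depends on state built up over the preceding ~44,000 URLs, this will not trigger it.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print lines START..END of a
# URL list so a small window around the crash point can be retried.
slice_urls() {
    list=$1
    start=$2
    end=$3
    # -n suppresses default output; "p" prints the range, "q" stops
    # reading once END has been printed so the rest of the file is skipped.
    sed -n "${start},${end}p;${end}q" "$list"
}

# Example usage (paths are placeholders, as in the report above):
#   slice_urls /path/to/getlist.txt 44400 44430 > /tmp/window.txt
#   wget --tries=3 -i /tmp/window.txt -e robots=off --wait 0.5 --random-wait
```

If the window alone does not crash, widening the range toward the start of the list would help show whether the length of the run matters.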

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: wget 1.17.1-1ubuntu1.2
ProcVersionSignature: Ubuntu 4.4.0-75.96-generic 4.4.59
Uname: Linux 4.4.0-75-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.9
Architecture: amd64
Date: Mon Jul 17 12:40:33 2017
InstallationDate: Installed on 2014-06-23 (1120 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
ProcEnviron:
 LC_CTYPE=en_US.UTF-8
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: wget
UpgradeStatus: Upgraded to xenial on 2016-05-05 (437 days ago)
