[Bug 1618288] [NEW] wget using "if-modified-since" is not idempotent and corrupts downloaded copy of website on second use

Carl Hauser 1618288 at bugs.launchpad.net
Tue Aug 30 02:15:42 UTC 2016


Public bug reported:

I use wget to copy a web site from one server to another, adjusting file
suffixes and paths.

Since upgrading from Ubuntu 14.04 to 16.04 LTS, the command that I had
been using has begun corrupting the destination site on the second and
subsequent invocations.

The options relevant to the problem seem to be -N (use timestamping), -k
(convert links) and -E (adjust extensions). The problem arises with
linked files whose names do not end in .html. On the first invocation
everything is fine: the file foo.txt is downloaded and linked as
foo.txt. On the second invocation, the wget log (option -v) suggests
that wget has examined foo.txt on the server, but it then reports "File
'<copylocation>/foo.txt.html' not modified on server. Omitting
download." and rewrites the link in the referring file to
foo.txt.html.
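
One rough way to look for links affected by the behavior described
above is to search the local copy for href values that carry a second
extension before .html (for example foo.txt.html). This is only a
sketch: the function name find_suspect_links is hypothetical, it
assumes GNU grep and double-quoted href attributes, and it also matches
names that -E renamed legitimately (such as page.php.html), so the
output is a list of candidates to inspect, not a definitive diagnosis.

```shell
#!/bin/sh
# find_suspect_links (hypothetical helper): list href values in *.html
# files under the directory "$1" whose target has a double extension
# ending in .html, e.g. href="foo.txt.html".
# Assumes GNU grep (-r, -o, --include) and double-quoted attributes.
find_suspect_links() {
    grep -rnoE --include='*.html' 'href="[^"]*\.[A-Za-z0-9]+\.html"' "$1"
}
```

Running find_suspect_links on the copy directory after the first and
after the second wget invocation, and comparing the two outputs, should
show which links were rewritten.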

I think this is a bug. Do others have an opinion?

Workaround: add the option "--no-if-modified-since", which seems to
restore the old, correct behavior.

Thanks.

P.S. The full command that misbehaves is:

  wget -nH -r -E -k -N -x -l inf -P <destination for copy> "http://<source web site>"
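
For reference, a minimal sketch of the reported command with the
workaround option added. The wrapper name mirror_site and the DRY_RUN
variable are illustrative assumptions, not part of wget; the wget
options themselves are the ones quoted in this report plus
--no-if-modified-since. By default the function only prints the
command, so nothing is downloaded unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# mirror_site (hypothetical wrapper): $1 = destination directory for the
# copy, $2 = URL of the source web site.
mirror_site() {
    dest=$1
    src=$2
    # Same options as the reported command, with --no-if-modified-since
    # added so that -N does not send an If-Modified-Since header.
    set -- wget -nH -r -E -k -N --no-if-modified-since -x -l inf \
        -P "$dest" "$src"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run (the default here): print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, mirror_site /path/to/copy "http://example.invalid/" prints
the command line with placeholder arguments; with DRY_RUN=0 in the
environment it runs wget instead.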

** Affects: wget (Ubuntu)
     Importance: Undecided
         Status: New

** Package changed: ubuntu => wget (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to wget in Ubuntu.
https://bugs.launchpad.net/bugs/1618288
