wget using "if-modified-since" is not idempotent and corrupts the downloaded copy of a website on second use
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
wget (Ubuntu) | New | Undecided | Unassigned |
Bug Description
I use wget to copy a web site from one server to another, adjusting file suffixes and paths.
Since upgrading from Ubuntu 14.04 to 16.04 LTS, the command I previously used has begun corrupting the destination site on the second and subsequent invocations.
The options relevant to the problem appear to be -N (use timestamping), -k (convert links) and -E (adjust extensions). The problem affects linked files whose names do not end in .html. On the first invocation everything works: file foo.txt is downloaded and linked as foo.txt. On the second invocation, the wget log (option -v) suggests that wget has examined foo.txt on the server, but then it reports "File '<copylocation>
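For context, the GNU Wget manual describes -E (--adjust-extension) roughly as follows: if a downloaded file has type text/html or application/xhtml+xml and its URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, the suffix .html is appended to the local file name; since Wget 1.12, files of type text/css likewise get .css appended. Below is a minimal Python sketch of that documented rule, assuming the Content-Type header is the only input. `adjusted_local_name` is a hypothetical helper, not Wget's own code. It shows that the local name depends on the content type wget associates with a response: a plain-text foo.txt keeps its name, while the same path reported as HTML would be saved as foo.txt.html.

```python
import re

# Suffix pattern quoted in the Wget manual's description of -E.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def adjusted_local_name(url_path: str, content_type: str) -> str:
    """Hypothetical helper: the local name -E would give a download.

    Illustrative sketch of the rule as documented, not Wget's implementation.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in HTML_TYPES and not HTML_SUFFIX.search(url_path):
        return url_path + ".html"
    # Assumption: the .css check is treated as case-insensitive here.
    if mime == "text/css" and not url_path.lower().endswith(".css"):
        return url_path + ".css"
    return url_path


for path, ctype in [("foo.txt", "text/plain"),
                    ("foo.txt", "text/html"),
                    ("page.htm", "text/html; charset=utf-8"),
                    ("style", "text/css")]:
    print(path, "|", ctype, "->", adjusted_local_name(path, ctype))
```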
I think this is a bug. Do others have an opinion?
Workaround: include the option "--no-if-
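The title refers to the HTTP If-Modified-Since request header. In a conditional GET, the client sends the local copy's modification time as an HTTP-date; the server then returns either the full resource (200) or 304 Not Modified, which carries no body, so the client keeps the file it already has. The Python sketch below illustrates that generic exchange using only the standard library, assuming the local file's mtime is the timestamp to send. `conditional_get` and `if_modified_since_header` are hypothetical names, and this is not Wget's own code.

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate


def if_modified_since_header(mtime: float) -> str:
    """Hypothetical helper: format a POSIX timestamp as an HTTP-date in GMT."""
    return formatdate(mtime, usegmt=True)


def conditional_get(url: str, local_path: str):
    """Hypothetical helper: fetch url only if it changed since local_path's mtime.

    Returns the response body, or None if the server answered 304 Not Modified.
    """
    req = urllib.request.Request(url)
    if os.path.exists(local_path):
        # Ask the server to skip the body if nothing changed since our copy.
        req.add_header("If-Modified-Since",
                       if_modified_since_header(os.path.getmtime(local_path)))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; a 304 has no body to save.
        if err.code == 304:
            return None
        raise
```

For example, with a hypothetical URL, `conditional_get("http://example.com/foo.txt", "foo.txt")` returns None when the server answers 304 and the local foo.txt should be left untouched.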
Thanks.
P.S. The full command that misbehaves is: wget -nH -r -E -k -N -x -l inf -P <destination for copy> "http://<source web site>"
affects: ubuntu → wget (Ubuntu)