[Bug 1937874] [NEW] one --accept-regex expression negates another
Bill Yikes
1937874 at bugs.launchpad.net
Fri Jul 23 19:44:58 UTC 2021
Public bug reported:
This command should theoretically fetch all PDFs on a page:
$ wget -v -d -r --level 1 --adjust-extension --no-clobber --no-directories \
    --accept-regex 'administrative-orders/.*/administrative-order-matter-' \
    --accept-regex 'administrative-orders.*.pdf' \
    --accept-regex 'administrative-orders.page[^&]*$' \
    --directory-prefix=/tmp \
    'https://www.ncua.gov/regulation-supervision/enforcement-actions/administrative-orders?page=56'
But it fails to grab any of them, giving the output:
---
Deciding whether to enqueue "https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf".
https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf is excluded/not-included through regex.
Decided NOT to load it.
---
That's bogus. The workaround is to remove this option:
--accept-regex 'administrative-orders.page[^&]*$'
But that should not be necessary. Adding an --accept-* clause should
never cause another --accept-* clause to become invalidated and it
should not shrink the set of fetched files.
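
A possible explanation, stated as an assumption and not a confirmed
diagnosis: wget may keep only one --accept-regex value, so the last
occurrence on the command line would replace the earlier ones instead of
being added to them. The sketch below does not run wget. It only uses
grep -E to check the PDF URL from the debug output against the last
pattern alone and against all three patterns joined with '|'. The helper
name "matches" and the variable names are made up for illustration.

```shell
#!/bin/sh
# URL taken from the debug output in this report.
pdf_url='https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf'

# The three patterns from the report, unchanged.
re1='administrative-orders/.*/administrative-order-matter-'
re2='administrative-orders.*.pdf'
re3='administrative-orders.page[^&]*$'

# matches URL PATTERN: print "accept" if the URL matches the POSIX
# extended regex, "reject" otherwise. Hypothetical helper, not part of wget.
matches() {
    if printf '%s\n' "$1" | grep -Eq -- "$2"; then
        echo accept
    else
        echo reject
    fi
}

# Assumption: only the last --accept-regex is in effect.
last_only=$(matches "$pdf_url" "$re3")

# Alternative: one expression that accepts a URL matching any pattern.
combined=$(matches "$pdf_url" "$re1|$re2|$re3")

echo "last pattern only:    $last_only"
echo "combined alternation: $combined"
```

If that assumption holds, passing a single --accept-regex whose value is
the three patterns joined with '|' might be another workaround. That has
not been tested against wget here.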
** Affects: wget (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to wget in Ubuntu.
https://bugs.launchpad.net/bugs/1937874