blog.david14.com: An alternative way to use wget

Tuesday, 19 November 2024

Normally wget works as expected, but every now and then the page you want comes back as forbidden or similar, so below is an alternative approach:

First, download the HTML page with something like:

wget www.somepage.com/index.html

Then extract the links to the file type you want (in this example, PDF files) with:

grep -o 'http[^"]*\.pdf' index.html > links.txt
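To see what that grep step actually keeps, here it is run against a small made-up page. The file sample.html and the links in it are hypothetical, standing in for the downloaded index.html:

```shell
#!/bin/sh
# Hypothetical sample page standing in for the downloaded index.html.
cat > sample.html <<'EOF'
<a href="https://www.somepage.com/docs/report.pdf">Report</a>
<a href="/docs/relative.pdf">Relative link</a>
<a href="https://www.somepage.com/docs/notes.txt">Notes</a>
EOF

# The same command as above, pointed at the sample page.
grep -o 'http[^"]*\.pdf' sample.html > links.txt
cat links.txt
```

Only the report.pdf link ends up in links.txt. The relative link has no http prefix, so the pattern never starts a match on that line, and notes.txt doesn't end in .pdf.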

Then use wget again, this time reading the URLs from that file:

wget -i links.txt
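If you do this often, the three steps can be wrapped in a small shell function. This is only a rough sketch: the names extract_links and fetch_linked_files are made up for this example, and it assumes the same thing the grep pattern does (absolute http/https links inside double quotes). It also removes duplicate links so the same file isn't fetched twice.

```shell
#!/bin/sh
# Sketch only: extract_links and fetch_linked_files are hypothetical names.
# Assumes absolute http(s) links in double-quoted attributes, and writes
# index.html and links.txt to the current directory, as the manual steps do.

# Print each distinct absolute link ending in ".EXT" (default: pdf) in file $1.
extract_links() {
    ext=${2:-pdf}
    # Same pattern as the manual grep, with the extension as a parameter;
    # LC_ALL=C keeps the sort order stable and -u drops repeated links.
    grep -o "http[^\"]*\.$ext" "$1" | LC_ALL=C sort -u
}

# Download page $1, then every file of type $2 (default: pdf) that it links to.
fetch_linked_files() {
    ext=${2:-pdf}
    # Step 1: save the page under a fixed name so step 2 knows what to read.
    wget -O index.html "$1" || return 1
    # Step 2: build the list of links.
    extract_links index.html "$ext" > links.txt
    # Stop early rather than handing wget an empty list.
    if [ ! -s links.txt ]; then
        echo "no .$ext links found in $1" >&2
        return 1
    fi
    # Step 3: let wget fetch everything in the list.
    wget -i links.txt
}
```

After sourcing the script, something like fetch_linked_files www.somepage.com/index.html pdf would run all three steps in one go.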

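Obviously, this won't work in every case (relative links such as href="/docs/file.pdf", for example, don't start with http and so aren't matched), but it has helped me on occasion.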
