
Tuesday, 19 November 2024

An alternative way to use wget

Normally wget works as expected, but every now and then the page comes back as forbidden or similar, so below is an alternative approach.

First, download the HTML page with something like:

wget www.somepage.com/index.html

Then extract the links to the file type you want (in this example, PDF files) with:

grep -o 'http[^"]*\.pdf' index.html > links.txt
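To try the grep step without hitting a real site, here is a minimal sketch that runs the same pattern over a small local sample page. The helper name extract_pdf_links and the example.com URLs are hypothetical, and the added sort -u (which drops duplicate links) is not part of the original command.

```shell
#!/bin/sh
# extract_pdf_links is a hypothetical helper name used for illustration.
# It prints each absolute link that starts with "http" and ends in ".pdf"
# found in the given HTML file, once each, in sorted order.
extract_pdf_links() {
    grep -o 'http[^"]*\.pdf' "$1" | sort -u
}

# Build a small sample page to test against (no network needed).
cat > sample.html <<'EOF'
<a href="https://example.com/docs/a.pdf">A</a>
<a href="https://example.com/docs/b.pdf">B</a> <a href="https://example.com/docs/a.pdf">A again</a>
<a href="/relative/c.pdf">C</a>
EOF

extract_pdf_links sample.html > links.txt
cat links.txt
```

On this sample, links.txt ends up with the a.pdf and b.pdf URLs once each. The relative /relative/c.pdf link is skipped because the pattern only matches links that begin with http.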

Then use wget again to download every URL listed in links.txt:

wget -i links.txt
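The three steps can also be chained in one small script. This is a sketch under assumptions: fetch_pdfs is a hypothetical function name, the page is always saved as index.html via wget's -O option regardless of what the URL ends in, and duplicate links are removed with sort -u before downloading.

```shell
#!/bin/sh
# fetch_pdfs is a hypothetical helper that chains the three steps.
# Usage: fetch_pdfs URL
fetch_pdfs() {
    # Step 1: save the page under a fixed name, whatever the URL ends in.
    wget -O index.html "$1" || return 1
    # Step 2: collect absolute .pdf links, dropping duplicates.
    grep -o 'http[^"]*\.pdf' index.html | sort -u > links.txt
    # Step 3: download every URL in links.txt, but only if any were found.
    if [ -s links.txt ]; then
        wget -i links.txt
    else
        echo "no absolute .pdf links found" >&2
        return 1
    fi
}

# Only run when a URL is passed on the command line.
if [ $# -gt 0 ]; then
    fetch_pdfs "$1"
fi
```

Saved as, say, fetch_pdfs.sh, it would be run as sh fetch_pdfs.sh www.somepage.com/index.html. Run without arguments it only defines the function and downloads nothing.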

Obviously, this won't work in every case (for instance, relative links are not matched by the grep pattern), but it has helped me on occasion.
