How to Get and Download all File Type Links from a Web Page - Linux
Submitted by ingram on Thu, 02/28/2013 - 10:51pm



This tutorial explains how to take a URL, extract every link to a specific file type (pdf, jpg, mp3, wav, or any other extension) into a list, and download all of those links in Linux. In my example, I have a web page with over 20 links to PDF files. Instead of downloading them one by one, this script downloads all of them at once and gives me a list of the links.
You need to have lynx and wget installed before running this script. To install them, run the command for your distribution:
Ubuntu: sudo apt-get install lynx-cur wget
openSUSE: sudo zypper install lynx wget
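If you are not sure whether both tools are already present, a quick check along these lines may help. This is only a sketch: the function name missing_tools is my own invention for illustration, and it relies on nothing more than the standard `command -v` shell builtin.

```shell
#!/bin/bash
# Print the name of every listed command that cannot be found on PATH.
# (missing_tools is a hypothetical helper name, not part of lynx or wget.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools lynx wget)
if [ -n "$missing" ]; then
    # At least one tool is missing; list it so it can be installed.
    echo "Please install:" $missing >&2
else
    echo "lynx and wget are both installed."
fi
```

If the check reports a missing tool, install it with the command for your distribution above and run the check again.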
Save the following text as link-dl.sh and execute it by running "bash link-dl.sh":
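#!/bin/bash
# Dump the page as text, keep the URLs from lynx's numbered link list, and save the PDF links.
lynx --dump http://www.appassure.com/resources/technical-documentation/ | awk '/http/{print $2}' | grep -i '\.pdf' > /tmp/file.txt
# Download every link in the list.
wget -i /tmp/file.txt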
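To see what the filtering step does without touching the network, here is a small sketch that feeds it a hand-written sample in the shape of lynx's numbered link list. The example.com URLs and the function name extract_links are made up for illustration. The grep here matches a literal dot plus the extension, which is slightly stricter than matching the bare string.

```shell
#!/bin/bash
# Read lynx-style dump text on stdin and print only the URLs that
# contain ".<extension>". (extract_links is a hypothetical helper name.)
extract_links() {
    awk '/http/{print $2}' | grep -i "\.$1"
}

# Hand-written sample resembling the link list at the end of a lynx dump.
extract_links pdf <<'EOF'
Technical documentation [1]Install guide [2]About us

References

   1. http://example.com/docs/install-guide.pdf
   2. http://example.com/about.html
   3. http://example.com/docs/release-notes.pdf
   4. http://example.com/pdf-tools.html
EOF
# Prints:
# http://example.com/docs/install-guide.pdf
# http://example.com/docs/release-notes.pdf
```

awk keeps only lines that mention http and prints the second field, which is the URL that follows the link number. grep then drops everything that does not carry the requested extension, including the pdf-tools.html page.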
1 comment
Thanks for the script; it saved me time downloading files.
I made a small improvement to make the script more flexible:
#!/bin/bash
# $1 is the page URL, $2 is the file type to look for (for example pdf).
lynx --dump "$1" | awk '/http/{print $2}' | grep "$2" > /tmp/file.txt
# Download every link in the list.
wget -i /tmp/file.txt
To run it:
./script_down_links <some_link> <filetype {pdf,doc,ppt,...}>
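One possible next step, sketched here under my own assumptions, is to check the arguments and use a private temporary file instead of the fixed /tmp/file.txt. The function name download_links and the example.com URL are hypothetical; mktemp, `grep -F`, and `wget -i` are standard.

```shell
#!/bin/bash
# Hypothetical variant of the script above: validates its arguments and
# writes the link list to a temporary file created by mktemp.
download_links() {
    if [ "$#" -ne 2 ]; then
        echo "Usage: download_links <url> <filetype>" >&2
        return 1
    fi
    list=$(mktemp) || return 1
    # Same pipeline as above; -F makes grep treat ".<filetype>" literally.
    lynx --dump "$1" | awk '/http/{print $2}' | grep -F ".$2" > "$list"
    # Only call wget when at least one link matched.
    if [ -s "$list" ]; then
        wget -i "$list"
    else
        echo "No .$2 links found at $1" >&2
    fi
    rm -f "$list"
}

# Example call (placeholder URL, not taken from the article):
# download_links http://example.com/docs/ pdf
```

Using mktemp avoids clobbering a list left behind by an earlier run, and the `-s` test keeps wget from being started with an empty list.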