Wget can recursively download data or web pages, a key feature that cURL lacks. While cURL is a library with a command-line front end, Wget is a command-line tool. Recursive download requires several Wget options, for example:
wget --recursive -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off https://site.example/aaa/bbb/ccc/ddd/
This downloads the files to whatever directory you ran the command in.
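To put the files somewhere other than the current directory, Wget's -P option sets the directory prefix. The following is a minimal sketch, assuming a POSIX shell. The helper name mirror_cmd and the destination "downloads" are hypothetical, and the function only prints the command so it can be inspected before it is run.

```shell
#!/bin/sh
# Print (do not run) the recursive Wget command for a URL.
# mirror_cmd is a hypothetical helper name used only in this sketch.
mirror_cmd() {
    url=$1
    dest=${2:-.}   # -P sets the directory prefix; "." means the current directory
    printf 'wget --recursive -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off -P %s %s\n' "$dest" "$url"
}

# Show the command that would save into ./downloads
mirror_cmd https://site.example/aaa/bbb/ccc/ddd/ downloads
```

Printing first is a deliberate choice here: the option list is long, and a dry run makes it easy to check --cut-dirs and the destination before any requests are sent.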
To use Wget to recursively download using FTP, simply change https:// to ftp:// and use the FTP directory.
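The scheme change can be sketched with shell parameter expansion. This is only an illustration, assuming a POSIX shell and assuming the FTP server exposes the same path as the web server, which is often not the case. The helper name to_ftp_url is hypothetical.

```shell
#!/bin/sh
# Swap the URL scheme from https:// to ftp://.
# to_ftp_url is a hypothetical helper name used only in this sketch.
to_ftp_url() {
    case $1 in
        https://*) printf 'ftp://%s\n' "${1#https://}" ;;  # strip the scheme, prepend ftp://
        *)         printf '%s\n' "$1" ;;                   # leave any other URL unchanged
    esac
}

to_ftp_url https://site.example/aaa/bbb/ccc/ddd/
# prints ftp://site.example/aaa/bbb/ccc/ddd/
```

If the FTP directory differs from the web path, edit the path by hand rather than relying on the swap.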
Wget recursive download options:
- --recursive or -r: download recursively (and place in recursive folders on your PC)
- --level=1: recurse, but go no more than one level below the specified directory
- --quota: limit the total overall download, for example to stop downloading after 1 GB has been downloaded altogether
- -np: never get parent directories (sometimes a site will link upwards)
- -nc: no clobber, that is, don't re-download files you already have
- -nd: no directory structure on download (put all files in one directory, set by -P)
- -nH: don't put vestigial site name directories on your PC
- -A: only accept files matching a globbed pattern
- --cut-dirs=4: don't put a vestigial hierarchy of directories above the desired directory on your PC. Set the number equal to the number of directories on the server (here aaa/bbb/ccc/ddd is four)
- -e robots=off: many sites use robots.txt to ask robots not to consume data. This setting tells Wget to ignore robots.txt.
- --random-wait: to avoid excessive download requests (which can get you auto-banned from downloading), we politely wait a random time between file downloads
- --wait 1: make the random wait time average about 1 second before starting to download the next file. This helps avoid anti-leeching measures.
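The --cut-dirs value is the number of directory components in the URL path, so it can be counted rather than typed by hand. The following is a minimal sketch, assuming a POSIX shell and a URL that points at a directory (a trailing file name would be counted as one more component). The helper name count_dirs is hypothetical.

```shell
#!/bin/sh
# Count the directory components in a URL path, for use with --cut-dirs.
# count_dirs is a hypothetical helper name used only in this sketch.
count_dirs() {
    path=${1#*://}                # drop the scheme, e.g. "https://"
    case $path in
        */*) path=${path#*/} ;;   # drop the host name, keep the path
        *)   path= ;;             # host only, no path
    esac
    path=${path%/}                # drop one trailing slash, if present
    if [ -z "$path" ]; then
        echo 0
        return
    fi
    # N components are separated by N-1 slashes
    echo $(( $(printf '%s' "$path" | tr -cd '/' | wc -c) + 1 ))
}

count_dirs https://site.example/aaa/bbb/ccc/ddd/
# prints 4
```

The result can then be passed along, for example as --cut-dirs="$(count_dirs "$url")", where url is a shell variable holding the directory URL.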