I needed to recursively download a set of files from an FTP server. As usual, my first choice was wget for the job.
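The first attempt was a plain recursive fetch, something along these lines (the real URL is replaced by a placeholder, and the exact invocation shown here is an assumption about a typical first try):

wget -r <FTP_DIR_URL>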
When I issued the command, the following interesting message showed up:
Note: I replaced the original FTP URL with <ftp_location>.
Saving to: ‘<ftp_location>/robots.txt’
public.dhe.ibm.com/robots 100%[=====================================>] 131 --.-KB/s in 0s
2017-04-05 09:34:37 (13.1 MB/s) - ‘<ftp_location>/robots.txt’ saved [131/131]
FINISHED --2017-04-05 09:34:37--
Total wall clock time: 6.0s
Downloaded: 2 files, 1.9K in 0s (65.9 MB/s)
Instead of downloading the files I wanted, it downloaded a file called robots.txt, whose content politely says to go away 🙂
# go away
User-agent: *
Robot-version: 2.0
Allow: /support/knowledgecenter
Allow: /softsupt/os2ddpak
Allow: /download
Disallow: /
From this reference: “...web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol...”. In my case, the robots.txt file disallowed access to the directories holding the files I wanted.
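You can inspect a server’s robots.txt yourself before crawling. A quick way to print it to the terminal (assuming the server serves the file at the usual top-level path, as this one did):

wget -qO- <ftp_location>/robots.txt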
To overcome this issue, I ran the following (each flag is explained below):
wget -r --no-parent -l1 -e robots=off --wait 1 <FTP_DIR_URL>
wget: utility for non-interactive download of files from the Web (see man wget)
-r: recursive retrieval
--no-parent: never ascend to the parent directory when retrieving recursively
-l1: limit the recursion depth to one level
-e: execute a command as if it were part of .wgetrc
robots=off: the command passed to -e; tells wget to ignore robots.txt exclusions
--wait 1: wait 1 second between retrievals
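Since -e treats its argument as a line of .wgetrc, the same behavior can be made permanent instead of passed on every invocation. A minimal sketch of the equivalent config (only add this to ~/.wgetrc if you want these defaults for every wget run):

# ~/.wgetrc
robots = off
wait = 1

With that in place, the command shortens to wget -r --no-parent -l1 <FTP_DIR_URL>.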
It works 🙂