Personal Digital Assistant

How to use WGET, a command-line program to download files from web pages



bahattab
02-09-2008, 01:38 PM
WGET for Windows (win32) - current version: 1.10.2







http://www.youtube.com/watch?v=14SSVLQx2-A

http://www.youtube.com/watch?v=j0u_F8-oYQE

bahattab
17-07-2010, 09:54 AM
WGET for Windows (win32) - current version: 1.10.2






Read below to download wget.exe (http://users.ugent.be/%7Ebpuype/wget/#download) and for some help with wget (http://users.ugent.be/%7Ebpuype/wget/#usage).
GNU wget

From the official wget homepage (http://www.gnu.org/software/wget/wget.html):
"GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc."


While you can get Windows binaries from Heiko Herold's page (http://space.tin.it/computer/hherold/), the binaries here are tweaked a bit so they operate somewhat better on Windows.
The following changes, compared to the official distribution, were retained/added since 1.8.2:




Statically linked with (masm optimized) OpenSSL 0.9.7i, which makes wget.exe completely stand-alone.
Compressed with UPX 1.07 for smaller filesize



It seems the rfc1738 problems on Windows (see below) were fixed in wget 1.9, so there is no longer a need to edit the source code.
OpenSSL

Wget now supports Secure Sockets Layer (SSL, https://...) among other things. Most available binaries are dynamically linked against OpenSSL, and require you to have a couple of DLLs in your path. The binary on this site is statically linked with OpenSSL (which makes it larger in size, but stand-alone).



Note the license addendum:
"In addition, as a special exception, the Free Software Foundation gives permission to link the code of its release of Wget with the OpenSSL project's "OpenSSL" library (or with modified versions of it that use the same license as the "OpenSSL" library), and distribute the linked executables. You must obey the GNU General Public License in all respects for all of the code used other than "OpenSSL". If you modify this file, you may extend this exception to your version of the file, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version."
Furthermore, compiling (statically) with OpenSSL is cumbersome in VC++. If you were to try this yourself, necessary steps would include:


Getting the OpenSSL source (http://www.openssl.org/source/) and untarring it somewhere besides wget source.
Configuring for win32 - this involves Perl (e.g. ActivePerl (http://www.activestate.com/Products/ActivePerl/)). (ms\do_masm.bat will help)
Compiling static libraries (ms\nt.mak, not ms\ntdll.mak)...
... however, after making sure you are compiling with the correct multithreaded runtime libraries (/MT, not /MD) to match wget configuration.
configure.bat --msvc in the wget tree.
Adding the inc32\... to the wget include path, and both libraries in out32 to the wget link step. (edit SSLLIBS and DEFS in src\Makefile)

Downloads!

Latest version is 1.10.2, compiled with MS Visual C++ and linked with OpenSSL 0.9.7i. The page will be updated with new releases of wget. Wget tends to see a couple of incremental bugfix releases (i.e. 1.10.x). I am currently using wget 1.10.x on a daily basis.
wget.exe (http://users.ugent.be/%7Ebpuype/cgi-bin/fetch.pl?dl=wget/wget.exe) (332800 bytes): win32 binary with OpenSSL support.
MD5: dbe287eb8d58e6322e9fb67110ed7122
SHA1: 1cd5550de3a857540cbe79fda1c7186dd7721802
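
To check a downloaded file against published MD5 and SHA1 values like the ones above, a small helper along these lines could be used on a Unix-like system. This is only a sketch: it assumes the GNU coreutils tools md5sum and sha1sum are installed, and the function name verify_sum is made up for this example.

```shell
# verify_sum TOOL FILE EXPECTED   (hypothetical helper, not part of wget)
# TOOL is a checksum program such as md5sum or sha1sum (GNU coreutils assumed).
# Returns 0 when the digest of FILE equals EXPECTED, non-zero otherwise.
verify_sum() {
    tool=$1
    file=$2
    expected=$3
    # Feed the file on stdin; the tool prints "<hex digest>  -", keep the digest only.
    actual=$("$tool" < "$file" | cut -d ' ' -f 1)
    # Hex digests are case-insensitive, so normalise the expected value to lower case.
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Example (not run here), using the values published for wget.exe:
#   verify_sum md5sum  wget.exe dbe287eb8d58e6322e9fb67110ed7122 && echo "MD5 ok"
#   verify_sum sha1sum wget.exe 1cd5550de3a857540cbe79fda1c7186dd7721802 && echo "SHA1 ok"
```

A mismatch means the file is either corrupt or not the build described here, so it should not be run.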
Usage

wget is a command line program. You start it from the command prompt, either command.com in Windows 9x/Me or cmd.exe in Windows 2000/XP. The command prompt can be found in the Start Menu (Accessories).
wget.exe must be placed in your path (e.g. c:\windows)
To retrieve a file:



wget http://users.ugent.be/~bpuype/wget/wget.exe



Screenshot (http://www.atyafonline.com/imgcache/892.png): wget in action...




Basic options



Display all help:



wget --help




Completely mirror a site:



wget -mr http://...



-m: mirror



-r: recursive



Mirror without following links to other servers, parent directories:



wget -mrnp http://...



-np: no-parent




Retrieve an HTML file and convert its links for local viewing (links to files that were not downloaded are rewritten as absolute URLs):

wget -k http://users.ugent.be/~bpuype/wget



-k: 'k'onvert links




Resume partially downloaded files (if supported by the server):



wget -c http://...



-c: continue






Read URLs from a file and retrieve them:


wget -i file_with_urls.txt



-i: input-file


Ask for URLs (read from stdin):

wget -i -

Enter URLs on the command line, press Enter after each URL, and terminate with ^Z (press CTRL-Z) on an empty line.
FTP





--glob=off


Don't treat [, *, ? etc. as globbing characters. Use when transferring files with names that contain these characters. (Current wget releases spell this option --no-glob.)


--passive-ftp


Use passive mode for data connection (try this if you're behind a firewall, NAT box...)
Proxy

To make wget use a proxy, you must set up an environment variable before using wget. Type this at the command prompt:


set http_proxy=http://proxy.myprovider.net:8080

...where you use the correct proxy hostname and port for your ISP or network. You can use ftp_proxy to proxy ftp requests.



--proxy=on
--proxy=off

Turn proxy usage on/off once variable is set.
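
On a Unix-like shell the same environment variables apply, but they are set with a different syntax than the Windows set command. The sketch below wraps this in a helper that sets the proxy only for a single command, so the rest of the session is unaffected. The function name run_with_proxy is hypothetical, and the proxy URL is the placeholder from the text above.

```shell
# run_with_proxy PROXY_URL COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND with http_proxy and ftp_proxy set to PROXY_URL for that
# command only; the calling shell's environment is left untouched.
run_with_proxy() {
    proxy_url=$1
    shift
    env http_proxy="$proxy_url" ftp_proxy="$proxy_url" "$@"
}

# Example (not run here):
#   run_with_proxy http://proxy.myprovider.net:8080 \
#       wget http://users.ugent.be/~bpuype/wget/wget.exe
```

To set the proxy for the whole session instead, export http_proxy=http://proxy.myprovider.net:8080 is the shell counterpart of the set line shown above.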



http://users.ugent.be/~bpuype/wget/






Linux wget your ultimate command line downloader (http://www.cyberciti.biz/tips/linux-wget-your-ultimate-command-line-downloader.html)




It is common practice to manage UNIX/Linux/BSD servers remotely over an ssh session. As you manage servers, you need to download software or other files for installation, or even the latest ISO of a Linux distribution (or even MP3s). These days we have lots of GUI downloaders for the X Window System, such as:




d4x: http://www.krasu.ru/soft/chuchelo
kget: KDE download manager
gwget2 - GNOME 2 wget front-end


However, when it comes to the command line (shell prompt), wget, the non-interactive downloader, rules. It supports the HTTP, FTP, and HTTPS protocols along with authentication, and has tons of options. Here are some tips to get the most out of it:
Download a single file using wget






$ wget http://www.cyberciti.biz/here/lsst.tar.gz
$ wget ftp://ftp.freebsd.org/pub/sys.tar.gz

Download multiple files on command line using wget




$ wget http://www.cyberciti.biz/download/lsst.tar.gz ftp://ftp.freebsd.org/pub/sys.tar.gz ftp://ftp.redhat.com/pub/xyz-1rc-i386.rpm

OR




i) Create a variable that holds all URLs and later use a Bash for loop to download all files:

$ URLS="http://www.cyberciti.biz/download/lsst.tar.gz ftp://ftp.freebsd.org/pub/sys.tar.gz ftp://ftp.redhat.com/pub/xyz-1rc-i386.rpm http://xyz.com/abc.iso"

ii) Use a for loop as follows:

$ for u in $URLS; do wget "$u"; done

iii) However, a better way is to put all URLs in a text file and use the -i option to wget to download all files:




(a) Create a text file using vi:

$ vi /tmp/download.txt

Add the list of URLs:

http://www.cyberciti.biz/download/lsst.tar.gz
ftp://ftp.freebsd.org/pub/sys.tar.gz
ftp://ftp.redhat.com/pub/xyz-1rc-i386.rpm
http://xyz.com/abc.iso

(b) Run wget as follows:

$ wget -i /tmp/download.txt

(c) Force wget to resume a download

You can use the -c option to wget. This is useful when you want to finish a download that was started by a previous instance of wget and was cut off because the net connection was lost. In such a case you can add the -c option as follows:


$ wget -c http://www.cyberciti.biz/download/lsst.tar.gz
$ wget -c -i /tmp/download.txt

Please note that not all FTP/HTTP servers support the download resume feature.
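
The list-file approach can also be scripted instead of typing the URLs into vi by hand. The following is a minimal sketch: the helper names make_url_list and fetch_list are invented for this example, and the URLs are the placeholders used in the text above.

```shell
# make_url_list LIST_FILE URL [URL...]   (hypothetical helper)
# Writes each URL on its own line, which is the format wget -i reads.
make_url_list() {
    list_file=$1
    shift
    # Refuse to create a list with no URLs in it.
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" > "$list_file"
}

# fetch_list LIST_FILE   (hypothetical helper)
# Downloads every URL in the list; -c resumes partial files when the
# server supports it.
fetch_list() {
    wget -c -i "$1"
}

# Example (not run here):
#   make_url_list /tmp/download.txt \
#       http://www.cyberciti.biz/download/lsst.tar.gz \
#       ftp://ftp.freebsd.org/pub/sys.tar.gz
#   fetch_list /tmp/download.txt
```

Keeping the list in a file means an interrupted batch can be restarted by running the same fetch_list command again.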




Force wget to download all files in background, and log the activity in a file:




$ wget -cb -o /tmp/download.log -i /tmp/download.txt

OR

$ nohup wget -c -o /tmp/download.log -i /tmp/download.txt &

nohup runs the given COMMAND (in this example wget) with hangup signals ignored, so that the command can continue running in the background after you log out.




Limit the download speed to a given number of bytes or kilobytes per second

This is useful when you download a large file, such as an ISO image. Recently one of our admins started to download the SuSE Linux DVD (http://www.novell.com/products/linuxprofessional/downloads/ftp/int_mirrors.html) on a production server for evaluation purposes. wget soon ate up all the bandwidth, with predictable results.

$ wget -c -o /tmp/susedvd.log --limit-rate=50k ftp://ftp.novell.com/pub/suse/dvd1.iso

The above command limits the retrieval rate to 50 KB/s. Use the m suffix for megabytes (--limit-rate=1m). It is also possible to specify a disk quota for automatic retrievals to avoid a disk DoS attack. The following command will be aborted when the quota (100 MB+) is exceeded:




$ wget -cb -o /tmp/download.log -i /tmp/download.txt --quota=100m

F) Use an HTTP username/password on an HTTP server:

$ wget --http-user=foo --http-password=bar http://cyberciti.biz/vivek/csits.tar.gz

G) Download all mp3 or pdf files from a remote FTP server:

Generally you can use shell special characters, aka wildcards, such as *, ? and [] to specify selection criteria for files. The same can be used with FTP servers while downloading files. Quote the URL so that your local shell passes the wildcard to wget unchanged:

$ wget "ftp://somedom.com/pub/downloads/*.pdf"

OR

$ wget -g on "ftp://somedom.com/pub/downloads/*.pdf"

H) Use aget when you need a multithreaded HTTP download:

aget fetches HTTP URLs in a manner similar to wget, but segments the retrieval into multiple parts to increase download speed. It can be many times as fast as wget in some circumstances (it is like FlashGet under MS Windows, but with a CLI):

$ aget -n=5 http://download.soft.com/soft1.tar.gz

The above command will download soft1.tar.gz in 5 segments.
Please note that the wget command is available on Linux and UNIX/BSD-like OSes.
See the wget(1) man page for more advanced options.





http://www.cyberciti.biz/tips/linux-wget-your-ultimate-command-line-downloader.html





Wget resume broken download (http://www.cyberciti.biz/tips/wget-resume-broken-download.html)



GNU Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Recently I was downloading an Ubuntu Linux ISO (618 MB) for testing purposes on my home computer. My uninterruptible power supply (UPS) was down. I started the download with the wget command:

$ wget http://ftp.ussg.iu.edu/linux/ubuntu-releases/5.10/ubuntu-5.10-install-i386.iso

Due to a power supply problem, my computer rebooted when the download was at 98%. After the reboot I typed the wget command again:

$ wget http://ftp.ussg.iu.edu/linux/ubuntu-releases/5.10/ubuntu-5.10-install-i386.iso

However, wget started downloading the complete ISO image again. I had expected wget to resume the partially downloaded ISO file.
Command to resume file download with wget

After reading the man page, I found the -c option. It continues getting a partially downloaded file. This is useful when you want to finish a download started by a previous instance of wget, or by another program.



$ wget -c http://ftp.ussg.iu.edu/linux/ubuntu-releases/5.10/ubuntu-5.10-install-i386.iso
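
On a flaky connection, the resume command can be wrapped in a small retry loop so that it is re-run until it succeeds. wget already retries on its own within a single run; the sketch below only covers the case where wget gives up and exits with a non-zero status. The function name retry and its argument order are invented for this example.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS is reached,
# sleeping DELAY_SECONDS between attempts.
# Returns 0 on success, 1 if every attempt failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max_attempts" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (not run here): up to 5 attempts, 30 seconds apart.
#   retry 5 30 wget -c http://ftp.ussg.iu.edu/linux/ubuntu-releases/5.10/ubuntu-5.10-install-i386.iso
```

Because each attempt uses -c, a later attempt picks up where the previous one stopped, provided the server supports resuming.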

http://www.cyberciti.biz/tips/wget-resume-broken-download.html








Quick Tip: Run wget in background for an unattended download of files (http://www.cyberciti.biz/tips/howto-run-wget-in-background.html)



Here is a quick tip: if you wish to perform an unattended download of large files, such as a Linux DVD ISO, use wget as follows:
wget -bqc http://path.com/url.iso
Where,
=> -b : Go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
=> -q : Turn off wget's output, which also saves disk space.
=> -c : Resume a broken download, i.e. continue getting a partially downloaded file. This is useful when you want to finish a download started by a previous instance of wget, or by another program.
This tip will save you time while downloading large ISO images from the internet.
You can also use nohup command to execute commands after you exit from a shell prompt (http://www.cyberciti.biz/tips/nohup-execute-commands-after-you-exit-from-a-shell-prompt.html):
$ nohup wget http://domain.com/dvd.iso &
$ exit




http://www.cyberciti.biz/tips/howto-run-wget-in-background.html