Wget Mirror Websites
20200526 A popular use case for wget is to make a complete copy of a website, perhaps for local perusal or archival. For example, we might back up a conference website for historical purposes:
  $ wget --mirror --convert-links --adjust-extension --page-requisites \
    --no-parent https://ausdm18.ausdm.org/
This saves the mirrored site into a directory named
ausdm18.ausdm.org in the
current working directory. Browsing to this directory within a browser
using a URL like file:///home/kayon/ausdm18.ausdm.org will
interact with the local copy of the website.
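As a convenience, the local copy can also be opened directly from the command line. A minimal sketch, assuming the mirror includes a top-level index.html and that xdg-open is available to launch the default browser:

  $ xdg-open "file://$(pwd)/ausdm18.ausdm.org/index.html"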
Another use case is to download all of the available Debian packages that start with r from a particular Ubuntu mirror.
  $ wget --mirror --accept '.deb' --no-directories \
    http://archive.ubuntu.com/ubuntu/pool/main/r/
Useful command line options include -r
(--recursive) which indicates that we want
to recurse through the given URL.  The
--mirror option turns on
--recursive as well as some other options
(see the
manual page
for details). The -l 1
(--level=1) option specifies how many levels
deep to descend into the website, so with -l 1 we recurse only a single
level. The -A .deb
(--accept) option restricts the download to just those
files that have a .deb extension. The extensions can be given as a
comma separated list. The -nd
(--no-directories) option requests
wget to not create any directories locally, with the files
downloaded into the current directory.
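Putting the short forms together, the following sketch is similar to the package download above but recurses only a single level and keeps everything in the current directory (the URL is the same Ubuntu mirror as before):

  $ wget -r -l 1 -A '.deb' -nd \
    http://archive.ubuntu.com/ubuntu/pool/main/r/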
For a website that no longer exists, the Wayback Machine is useful. To copy a website from there, install the Wayback Machine Downloader and then:
  $ wayback_machine_downloader http://ausdm17.azurewebsites.net/
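The downloader is distributed as a Ruby gem, so a typical install, assuming RubyGems is already available on the system, is:

  $ sudo gem install wayback_machine_downloader

By default the downloaded snapshot is saved beneath a websites directory within the current working directory.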