
Mirroring Web sites with wget

Recently, one of my favorite security-related sites was nearly shut down because its operator no longer had time to keep it up to date. The site, milw0rm, provided proof-of-concept and exploit code for security vulnerabilities in a wide range of products across multiple platforms. It is invaluable to security researchers, and losing access to that data would have been a huge loss.

Of course, once I heard the site might be going down, many other security researchers and I rushed to grab a local mirror of its contents. The result was a large number of people hammering the site at once. This minor panic reminded me of the importance of a good site mirroring tool.

The quickest and easiest way to mirror a remote Web site is to use wget. Wget is similar to cURL (and I’ll be the first to admit that I prefer cURL over wget), but wget has some really slick and useful features that aren’t found in cURL, such as a means to download an entire Web site for local viewing:

$ wget -rkp -l6 -np -nH -N http://example.com/

This command does a number of things. The combined -rkp flags tell wget to download recursively (-r), to convert the links in downloaded HTML pages so they point to the local files (-k), and to fetch all the images and other files needed to render each page properly (-p).

The -l6 option limits recursion to a maximum depth of six levels, while -np tells wget never to ascend to the parent directory, so it won’t climb above the starting directory while recursing. The -nH option tells wget not to create a host directory: files are saved under the current directory rather than under a directory named after the hostname of the site being mirrored.

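Finally, -N turns on time-stamping, wget’s way of avoiding downloading the same unchanged file more than once: a file is downloaded again only if the remote copy is newer than the local one or differs in size. Unfortunately, with dynamic sites being the norm, this may not work very well, since dynamically generated pages often don’t send the Last-Modified header that time-stamping relies on. It’s worth adding regardless.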
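If you mirror sites regularly, the command above may be easier to maintain in a script when written with wget’s long option names. The following is only a sketch: mirror_site and has_last_modified are hypothetical helper names of my own, not part of wget. The second helper uses -S (--server-response) together with --spider to look at a URL’s response headers without saving the page, which gives a rough idea of whether time-stamping can work for that site.

```shell
#!/bin/sh
# Hypothetical helper: mirror_site URL [DEPTH]
# Same options as "wget -rkp -l6 -np -nH -N", spelled out in long form.
# DEPTH defaults to 6, matching -l6.
mirror_site() {
    if [ -z "$1" ]; then
        echo "usage: mirror_site URL [DEPTH]" >&2
        return 1
    fi
    wget --recursive \
         --convert-links \
         --page-requisites \
         --level="${2:-6}" \
         --no-parent \
         --no-host-directories \
         --timestamping \
         "$1"
}

# Hypothetical helper: has_last_modified URL
# Succeeds (exit status 0) if the server's response includes a
# Last-Modified header, which -N/--timestamping relies on.
# -S prints the response headers (to stderr, hence 2>&1);
# --spider checks the URL without saving anything.
has_last_modified() {
    wget -S --spider "$1" 2>&1 | grep -qi '^[[:space:]]*Last-Modified:'
}
```

For example, mirror_site http://example.com/ 3 would mirror the site only three levels deep. Treat the header check as a hint rather than a guarantee, since a server may not send the same headers for every page.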

Wget can mirror HTTP, HTTPS, and FTP sites, either anonymously or with authentication, for all of these protocols. The wget man page covers its wide variety of options in detail and is well worth checking out.
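As a rough sketch of the authenticated case, the helper below adds wget’s --user option, which applies to both FTP and HTTP(S), and --ask-password, which makes wget prompt for the password rather than taking it on the command line. The function name mirror_site_auth and its arguments are hypothetical. For anonymous FTP no extra options should be needed, since wget logs in anonymously by default.

```shell
#!/bin/sh
# Hypothetical helper: mirror_site_auth USER URL
# Mirrors URL with the same options as "wget -rkp -l6 -np -nH -N",
# logging in as USER. --user applies to both FTP and HTTP(S);
# --ask-password makes wget prompt for the password instead of
# taking it on the command line.
mirror_site_auth() {
    if [ $# -ne 2 ]; then
        echo "usage: mirror_site_auth USER URL" >&2
        return 1
    fi
    wget -rkp -l6 -np -nH -N --user="$1" --ask-password "$2"
}
```

Something like mirror_site_auth alice https://example.com/private/ would then prompt for alice’s password before mirroring. Keeping the password off the command line also keeps it out of your shell history and the process list.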


techrepublic.com
