Some of the sites I’ve purchased over the years came from people whose sites had disappeared or gone inactive. That leaves me with the problem of how to restore them. It’s actually quite easy. All you have to do is use archive.org, otherwise known as the Wayback Machine. The first step is to type in the domain name you want to restore. If it exists in the archive, that’s a great start.
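If you would rather check from a script than from the browser, here is a minimal sketch. It assumes the public availability endpoint at archive.org/wayback/available and the JSON layout it normally returns; the function names are my own, and check_domain needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(domain):
    """Build the availability query URL for a domain."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": domain})


def closest_snapshot(response_text):
    """Return the closest snapshot URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_domain(domain):
    """Query archive.org (needs network access); return a snapshot URL or None."""
    with urlopen(availability_url(domain), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling check_domain("example.com") should give you either a snapshot link or None if nothing was archived.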
After you type in the domain, you will get a page showing how many samples were taken in each year the site was active.
Start by clicking on the yellowed-out boxes where you see the most samples (black graphs). You will then see a blue dot on each calendar date that the archive.org spider grabbed the site. Click on a blue dot to pull up a copy of your site.
Once you click on that blue dot, the site you want to restore comes up. However, a few things have been added that you need to be aware of.
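If you have a list of snapshot timestamps, you can also find the busiest year without eyeballing the graph. This is only a sketch: it assumes the timestamps are 14-digit strings in YYYYMMDDhhmmss form, the way they appear in Wayback snapshot links, and that you collected them yourself. Both function names are made up for this example.

```python
from collections import Counter


def snapshots_per_year(timestamps):
    """Count snapshots per year from Wayback-style timestamps (YYYYMMDDhhmmss)."""
    return Counter(ts[:4] for ts in timestamps)


def busiest_year(timestamps):
    """Return the year with the most snapshots, or None if the list is empty."""
    counts = snapshots_per_year(timestamps)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```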
How To Strip Out the Wayback Headers
When the Wayback Machine displays a sample of your site, it includes some additional HTML and modifies the links on your site. These are actually pretty easy to identify. But first, use File > Save As in your browser to save the first sampled page.
Open up a text editor and open the page you just saved. The front part of the page should look like this.
If you look closely, you will see that a Wayback link precedes each link of the site you are trying to restore. These prefixes need to be removed. Use your text editor’s find and replace: copy the Wayback links above and replace them with nothing (an empty string). Once that is done, you can remove the Wayback header HTML from the page. This is done by removing the text between markers that look like this:
<!-- BEGIN WAYBACK TOOLBAR INSERT -->BLA BLAH BLA <!-- END WAYBACK TOOLBAR INSERT -->
With your text editor, start at the opening <!-- BEGIN WAYBACK TOOLBAR INSERT --> comment and delete everything through the closing --> of the END WAYBACK TOOLBAR INSERT comment.
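If you have many pages to clean, the same deletion can be scripted. Here is a minimal sketch that assumes the saved page contains the two toolbar comments exactly as shown above; strip_toolbar is a name I made up.

```python
import re

# Matches everything from the BEGIN comment through the END comment,
# including line breaks in between (re.DOTALL lets "." match newlines).
TOOLBAR_RE = re.compile(
    r"<!-- BEGIN WAYBACK TOOLBAR INSERT -->.*?<!-- END WAYBACK TOOLBAR INSERT -->",
    re.DOTALL,
)


def strip_toolbar(html):
    """Remove the Wayback toolbar block from a saved page, if present."""
    return TOOLBAR_RE.sub("", html)
```

Pages without the markers come back unchanged, so it is safe to run on every file.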
That should leave your restored page without any traces of archive.org.
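The link clean-up described earlier can be scripted too. This sketch assumes the Wayback prefixes look like http://web.archive.org/web/ followed by a 14-digit timestamp, an optional two-letter modifier such as im_, and then the original URL. Compare that against your own saved page before relying on it. The names here are hypothetical.

```python
import re

# Assumed prefix shape: [http(s)://web.archive.org]/web/<14 digits>[xx_]/
# The lookahead only strips the prefix when an original http(s) URL follows.
WAYBACK_PREFIX_RE = re.compile(
    r"(?:https?://web\.archive\.org)?/web/\d{14}(?:[a-z]{2}_)?/(?=https?://)"
)


def strip_wayback_prefixes(html):
    """Replace each Wayback link prefix with nothing, leaving the original URL."""
    return WAYBACK_PREFIX_RE.sub("", html)
```

This is the same find and replace you would do by hand, just applied to every timestamp at once.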
How To Name Restored Pages
This can get tricky. The first page that gets restored is usually the front page, which is usually named something like index.html or index.htm. To restore the other pages of the site, however, you will need to name them correctly.
Restoring Remaining Pages of a Site
To restore these pages one at a time, I recommend clicking on the link of the page you want to restore. In this example, we can restore the page from the link titled “Radio Streaming AAC – Flash player..”. Open this page in a separate window and you will notice in the address bar that the link to the page is actually a directory, /radio-streaming-aac/. So you can create a directory with the same name and save the file in it as index.html if you like.
As you can see, restoring a complete site with dozens of pages can be tedious work. But there is a way to do it even faster with a really neat tool. I will cover that in my next blog post. Best of luck!
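The naming rule can be written down as a small helper. This is a sketch under one assumption: that directory-style links should be saved as index.html inside a directory of the same name. Your server may expect index.htm or something else. The function names and the root folder are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_path_for(url):
    """Map an original page URL to a relative file path for the restored site."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        # Directory-style links get an index file inside that directory.
        path += "index.html"
    return path.lstrip("/")


def save_page(root, url, html):
    """Write cleaned HTML under root, creating directories as needed."""
    target = Path(root) / local_path_for(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target
```

For the example above, local_path_for("http://example.com/radio-streaming-aac/") gives radio-streaming-aac/index.html, where example.com stands in for your own domain.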