If you’re trying to restore a site with fewer than 50 pages, you can usually do it by hand using Google’s cache, Bing, or archive.org. However, if it’s larger than 50 pages, you might want to consider a website restoration tool called Warrick. This project was started by two students at …. and they built this tool to automatically restore a damaged or lost site.
A couple of things before you get started. You will need a machine running Linux. I would recommend Ubuntu since it’s one of the best distributions and pretty easy to install and set up. If you have questions on how to install and run Warrick, I would recommend reading the instructions at warrick.cs.odu.edu.
Some of the sites I’ve purchased over the years are from people whose sites have disappeared or become inactive. So I’m presented with the problem of how to restore these sites. It’s actually quite easy. All you have to do is use archive.org, otherwise known as the Wayback Machine. The first step is to type in the domain name you want to restore. If it exists in the archive, that’s a great start.
After you type in the domain, you will get a page that shows the number of samples taken during each year the site was active.
Start by clicking on the yellowed out boxes, where you see the most samples (black graphs). You will then see a blue dot on each calendar date when the archive.org spider grabbed the site. Click on a blue dot to pull up a copy of your site.
Once you click on that blue dot, the site you want to restore comes up. However, the Wayback Machine has added a few things to the page that you will have to be aware of.
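If you would rather jump straight to these pages than click through the search box, the addresses follow a predictable pattern. Here is a minimal sketch in Python. The function names, the example.com domain and the timestamp are made up for illustration. When the timestamp does not match a sample exactly, the Wayback Machine normally redirects to the closest one it has.

```python
WAYBACK_ROOT = "https://web.archive.org/web"


def calendar_url(domain: str) -> str:
    """Address of the calendar page that lists every sample of a domain."""
    return f"{WAYBACK_ROOT}/*/{domain}"


def snapshot_url(original_url: str, timestamp: str) -> str:
    """Address of one sample; timestamp is written as YYYYMMDDhhmmss."""
    return f"{WAYBACK_ROOT}/{timestamp}/{original_url}"


# example.com and the date are placeholders, not a real site being restored.
print(calendar_url("example.com"))
print(snapshot_url("http://example.com/", "20100615000000"))
```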
How to strip out the Wayback Headers
When the Wayback Machine displays a sample of your site, it includes some additional HTML and modifies the links of your site. It’s actually pretty easy to identify these. But first, use File > Save As in your browser to save the first sampled page.
Open up a text editor and open the page you just saved. The front part of the page should look like this.
If you look closely, you will see that an archive.org link has been placed in front of each link of the site you are trying to restore. These prefixes need to be removed. Use your text editor’s find and replace to do this: copy the prefix and replace it with nothing (NULL). After you are done with that, you can remove the Wayback header HTML from the page. This is done by removing everything between the two comments that look like this:
<!-- BEGIN WAYBACK TOOLBAR INSERT -->BLA BLAH BLA <!-- END WAYBACK TOOLBAR INSERT -->
With your text editor, start with the first “<!--” and delete everything up to and including the “-->” that closes the END WAYBACK TOOLBAR INSERT comment.
That should now leave your restored page without any traces from archive.org.
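If you have more than a handful of pages, the same clean-up can be scripted. The following is only a sketch in Python, not the way archive.org says to do it. It assumes the toolbar sits between the two comments shown above, and that each rewritten link starts with /web/, a numeric timestamp and sometimes a short modifier such as im_, with or without web.archive.org in front. The function name strip_wayback is made up for this example.

```python
import re

# Everything between the two toolbar comments, including the comments.
TOOLBAR = re.compile(
    r"<!-- BEGIN WAYBACK TOOLBAR INSERT -->.*?<!-- END WAYBACK TOOLBAR INSERT -->",
    re.DOTALL,
)

# Assumed prefix shape: optional host, /web/, a timestamp, an optional
# two-letter modifier such as "im_", then the original http(s) address.
PREFIX = re.compile(
    r"(?:https?://web\.archive\.org)?/web/\d{4,14}(?:[a-z]{2}_)?/(?=https?:)"
)


def strip_wayback(html: str) -> str:
    """Remove the toolbar block, then the prefix in front of each link."""
    html = TOOLBAR.sub("", html)
    return PREFIX.sub("", html)
```

To use it, read the saved page into a string, pass it through strip_wayback and write the result back out. It is worth opening the result in a browser afterwards, since archive.org may add other markup that this sketch does not know about.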
How To Name Restored Pages
This can get tricky. The first page that gets restored is usually the front page, which is usually named something like index.html or index.htm. However, restoring the other pages of the site will require you to name them correctly.
Restoring Remaining Pages of a Site
To restore these pages one at a time, I would recommend that you click on the link of the page you want to restore. In this example, we can restore the page from the link titled “Radio Streaming AAC – Flash player..” To do this, open this page in a separate window and you will notice in the address bar that the link to the page is actually a directory, /radio-streaming-aac/. So you can create a directory with the same name and save the page inside it as index.html.
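The naming rule above can be written down as a small Python helper. This is just a sketch: the names local_path and save_page are made up, example.com stands in for your domain, and it assumes your web server serves index.html for a directory address.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_path(page_url: str, index_name: str = "index.html") -> str:
    """Turn a page address into the file name it should be saved under."""
    path = urlparse(page_url).path
    # A directory-style address (or the bare domain) gets an index file.
    if not path or path.endswith("/"):
        path += index_name
    return path.lstrip("/")


def save_page(site_root: str, page_url: str, html: str) -> Path:
    """Write the cleaned HTML under site_root, creating directories as needed."""
    target = Path(site_root) / local_path(page_url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target


print(local_path("http://example.com/radio-streaming-aac/"))
# → radio-streaming-aac/index.html
```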
As you can see, restoring a complete site with dozens of pages this way can be tedious work. But there is a way to restore a site even faster with a really neat tool. I will cover that in my next blog post. Best of luck!
As you know, I have dozens of sites, and most were coded in raw HTML back in the day when I created them. Now that WordPress has come along, there is a compelling case for converting these sites over. Here are a few of the reasons:
1) WordPress makes it easy to update and add content as needed without the use of conventional layout packages such as Dreamweaver and NVU.
2) There are many widgets and plug-ins that make it easy to port your site over to WordPress from a fixed URL site.
How to get started
The first step in converting an old HTML site to WordPress is to get a listing of all of the files on the current site. So for example, if you have a site with five pages, then list them out like this:
Next you want to open up each of these files and copy the content (not menu or navigational content).
Create a new page under the Dashboard menu of WordPress.
Name the page so that its permalink matches the old page. For example, the permalink for page number one might be widgets-to-buy.
Save that page and start again until all of the pages are complete. Add the new address of the redirected page beneath the page in the list so it will look like this.
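For a site with many pages, a listing in this shape can be produced with a short Python script run against a local copy of the old site. Treat it as a sketch under a few assumptions: the old pages end in .htm or .html, WordPress is using “post name” style permalinks so that widgets-to-buy.html becomes /widgets-to-buy/, and the function names are made up for this example.

```python
from pathlib import Path, PurePosixPath


def permalink_for(relative_file: str) -> str:
    """Proposed WordPress address for an old file such as widgets-to-buy.html."""
    page = PurePosixPath(relative_file)
    parts = list(page.parent.parts)
    # index files stand for their directory, so they add no slug of their own.
    if page.stem != "index":
        parts.append(page.stem)
    return "/" + "/".join(parts) + ("/" if parts else "")


def list_pages(site_root: str) -> list[tuple[str, str]]:
    """Pair every .htm/.html file under site_root with its proposed address."""
    root = Path(site_root)
    files = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".htm", ".html"}
    )
    return [(name, permalink_for(name)) for name in files]


print(permalink_for("widgets-to-buy.html"))
# → /widgets-to-buy/
```

Looping over list_pages and printing each old file name with its new address on the line beneath gives a checklist you can tick off as you create each page.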
After the pages are all added, the next step is to add 301 redirects so that traffic from the old pages goes to the new WordPress pages. I downloaded the “simple 301” redirect plugin for WordPress.
What you do here is, for each page, add the old page’s address on the left part of the menu and the new WordPress address next to it.
Just repeat this for each page and soon your site will be ported over.
Now that all the pages have been imported, the next step is to add a navigational menu or sidebar. I recommend using a menu, since it makes it easy to add drop-down menus so that people can get to the different areas of your site.
This is done in the “Appearance” section of the dashboard, under Menus. I’ll talk about that a little later.