PSA: Preserve Your Website

To my website-running friends:

I bet you’ve made some amazing content over the years. But wouldn’t you like to save it for posterity? Sadly, you may not be doing so, because of a file called “robots.txt”. This file tells “robots” (automated crawlers such as Googlebot, the software Google uses to build its search results) which files on your site to stay away from (in the Google example, so those pages don’t get crawled for search results). That can be very useful in some cases, but for one particular robot, it can keep your stuff from being archived.

That robot belongs to the “Internet Archive”.

This robot crawls the Internet and saves web pages for posterity, making it an invaluable resource for historians and ordinary readers alike. However, it respects the “robots.txt” file, and will even hide past archives of your site because of it.

So what can you do?

If you have a robots.txt file in your website’s files right now, check whether it contains something like the following:

User-agent: *
Disallow: /

If it does, either remove those lines (if you don’t mind robots accessing your site) or add the following below them:

User-agent: ia_archiver
Disallow:

This will allow the Internet Archive to keep your website for generations to come!
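If you’d like to double-check the combined file before uploading it, here’s a minimal sketch using Python’s built-in urllib.robotparser module. It assumes Python 3, assumes the archive’s crawler identifies itself as ia_archiver, and uses a made-up page URL. Keep in mind that Python’s parser is only an approximation of how any particular crawler reads the file, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The combined robots.txt from this post: shut out every robot, then
# re-admit the Internet Archive's crawler with an empty Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes a list of lines and also marks the rules as loaded,
    # so can_fetch() works without ever touching the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/my-post/"  # hypothetical page on your site
    print("ia_archiver:", is_allowed(ROBOTS_TXT, "ia_archiver", page))  # True
    print("Googlebot:", is_allowed(ROBOTS_TXT, "Googlebot", page))  # False
```

The empty “Disallow:” line is what does the work here: it means “nothing is off-limits” for that one robot, while every other robot still falls back to the blanket “Disallow: /” rule. If you’d rather test the file that’s actually live on your site, RobotFileParser can also load it with set_url() and read(), though that makes a network request.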

Now, if you’re not sure whether your website has such a file, simply go to https://archive.org/web, type your web address into the bar at the top, and click “Browse History”. If you get an error saying “Page cannot be crawled or displayed due to robots.txt.”, then your site is set up to block it. (For those of you who use WordPress: you should be good!)

Thank you all for helping to keep the Internet’s history alive.

Jacob T.

(Also, you should consider donating to them here if you like what they do!)


Published by Jacob Turner