
Robots.txt vs the Internet Archive...

Over at the Waxy.org post about the sudden disappearance of Suck.com, there was a question as to why the Internet Archive doesn't display the Suck.com archives. Per their policy, they do not archive any site where a robots.txt file disallows web crawlers. But in addition, a newly placed robots.txt signals the Archive to block access to the entire history of the site. This is done for the benefit of website operators who do not learn of the archival until after the fact. While this is a laudable gesture on their part, it presumes steady ownership and control over the domain, and the Suck hijacking shows that isn't an accurate assumption to make. Frankly, I believe by now every website operator knows of the existence of the Archive. Therefore, I think if a website operator wishes old, already-archived content to be removed or purged, he or she should write a letter to the Archive stating so. Robots.txt could still be used to prevent archival of current texts. (Of course, whether blocking content from the Archive like this is in the best interests of the public is a separate question entirely.)
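For reference, the Archive's crawler identifies itself as "ia_archiver," so the blocking I describe above is typically triggered by a robots.txt file along these lines (a minimal sketch; an operator could equally use "User-agent: *" to shut out all crawlers):

    User-agent: ia_archiver
    Disallow: /

Once a rule like that appears at the site's root, the Wayback Machine stops crawling the site and, under the current policy, also hides the pages it had already saved.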