Sterling Ambivalence (sterlingnorth) wrote,

Robots.txt vs the Internet Archive...

Over at the post about the sudden disappearance of, there was a question as to why the Internet Archive doesn't display the archives. As per their policies, they do not archive any site where a robots.txt file disallows web crawlers. But in addition, a newly placed robots.txt will signal the Archive to block access to the entire history of the site. This is done for the benefit of website operators who do not learn of the archival until after the fact. While this is a laudable gesture on their part, it presumes steady ownership and control over the domain, and the Suck hijacking shows that this isn't a safe assumption to make. Frankly, I believe that by now every website operator knows of the existence of the Archive. Therefore, I think that if a website operator wishes old, already archived content to be removed or purged, he or she should write a letter to the Archive stating so. Robots.txt could still be used to prevent archival of current texts. (Of course, whether blocking content from the Archive like this is in the best interests of the public is a separate question entirely.)
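
For anyone curious how a robots.txt rule gets read by a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. It only illustrates the general robots.txt check, not the Archive's actual code. The helper name `archive_allowed` and the example.com URL are made up for illustration, and I am assuming `ia_archiver` as the user agent the Archive's crawler answers to.

```python
from urllib.robotparser import RobotFileParser


def archive_allowed(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: `ia_archiver` is assumed to be the Archive's
    crawler name; this is not the Archive's own implementation.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A robots.txt that disallows every crawler, as a new domain owner might post.
BLOCK_EVERYONE = """\
User-agent: *
Disallow: /
"""

# A robots.txt that singles out the (assumed) archive crawler only.
BLOCK_ARCHIVE_ONLY = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical page on a placeholder domain.
OLD_PAGE = "http://example.com/old/page.html"

if __name__ == "__main__":
    print(archive_allowed(BLOCK_EVERYONE, OLD_PAGE))
    print(archive_allowed(BLOCK_ARCHIVE_ONLY, OLD_PAGE))
    print(archive_allowed(BLOCK_ARCHIVE_ONLY, OLD_PAGE, agent="Googlebot"))
    print(archive_allowed("", OLD_PAGE))
```

Note that the check only ever sees the robots.txt that is on the domain today. Nothing in the file says who posted it or when, which is why a new owner's file can end up hiding everything a previous owner published.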
Tags: internet