A Look Behind the Curtain: Web Archiving at Purdue University

In celebration of World Digital Preservation Day, here is a quick look at the Purdue University Archives and Special Collections (ASC) web archiving program.

Why do we want to preserve the internet?

The internet is an intrinsic part of our lives that provides valuable insight into our culture, history, and society. For many people, it is the primary medium through which they communicate and learn about the world.  Since the internet is such an important part of our society, we sometimes take it for granted that information on the internet will remain accessible forever. But the internet is constantly changing, and it is not guaranteed that a website or content on a website will be there in the future.  As a result, many archives and institutions have turned to web archiving to ensure continual access to internet content. Web archiving is the process of harvesting or crawling web content and preserving its original functionality and form.

Purdue ASC Web Archiving Program

In 2009, President Cordova recognized the need to preserve Purdue University history by signing a resolution charging the Archives and Special Collections (ASC) with responsibility for stewarding Purdue’s historical record. In fulfillment of its charge, the Archives and Special Collections implemented a web archiving initiative in June 2012. This initiative allowed for the preservation and ongoing accessibility of websites that align with ASC’s collecting areas: Purdue University History, Women’s History, Flight and Space Exploration, and Psychoactive Substances Research.

To fulfill this initiative, ASC uses a web archiving platform called Archive-It, which was developed by the Internet Archive. Archive-It enables ASC to crawl (harvest) websites and verify that the archived websites have been properly captured and preserved. It also makes the archived websites available online for anyone to view.

What have we accomplished?

Since the program’s inception, ASC has archived over 3.5 terabytes of website content. Some of ASC’s archived websites, such as the personal website of ethnomycology researcher John Allen, are no longer available on the live internet.

John Allen’s Archived Website

MSP 73, John Allen digital images and writings, Purdue University Archives and Special Collections, Purdue University Libraries.

The Purdue Alumnus is a magazine that has been published by the Purdue Alumni Association since 1907. It is a valuable resource for researchers interested in Purdue’s history. Recently, the Purdue Alumni Association decided that the Purdue Alumnus would no longer be printed and would instead be published only online. This is a common pattern: many documents and records at Purdue that used to be available in print are now available only online, through websites or social media. When the transition to online-only publication was announced, ASC was able to continue preserving new issues of the Purdue Alumnus through our web archiving program. As more printed resources migrate to the internet, ASC’s web archiving program will continue to grow.

Purdue Alumnus Archived Website

What’s Happening Now?

In the past couple of months, ASC undergraduate assistant Max Splaine and Archivist for Digital Preservation Ben Parnin completed a reassessment of ASC’s web archives. The reassessment was undertaken to verify that websites were being properly crawled and to identify important websites that were not being crawled. During the project, redundant crawls were eliminated and over 90 new websites were added to the ASC web archives. The reassessment also revealed that ASC’s current web archiving workflow needs to be updated to account for the increased number of websites in ASC’s web archives. We are currently evaluating and restructuring our web archiving workflows to ensure that this web content continues to be preserved and accessible.

The entire ASC web archive collection is accessible online.
