A Preservation Primer

As we discuss at length in our white paper “A Scalable and Sustainable Approach to Open Access Publishing and Archiving for Humanities and Social Sciences”, preservation of the scholarly record should be at the heart of any scholarly communication enterprise — and academic and research libraries have an important role to play in ensuring that content is preserved and migrated forward for future access.

This primer on preservation, graciously provided by Portico for us to share here, summarizes the issues and outlines possible steps an organization might take to begin planning for long-term digital preservation of its content.


Provided courtesy of Portico

In the last several decades there has been tremendous growth in the amount of digitized content created by libraries, publishers, and cultural institutions seeking to make content broadly available. Having content in digital form brings great benefits. Unlike print objects, however, which can last for many decades with only minimal attention when they are printed on acid-free paper and held in reasonable conditions, digital objects can be extremely short-lived unless they receive regular preservation attention. The community is now beginning to understand that the substantial and growing investment publishers and libraries are making in creating digital objects must be matched by a commitment to protect this content for the long term. Producers of digital content may find themselves wondering how to protect it, and may be asking questions such as the following:

  • If the IT department backs up the server, is that sufficient?
  • If the high-resolution files are on an external drive in Joe’s office, are we okay?
  • If we make a tape backup every few months, are we covered?
  • What can we do to make a digital collection “safe enough”?

There is no single answer to these questions. Rather, the answers depend on the type and needs of the collection, the content owners, the users, and the organization. However, these questions provide a useful starting point for considering short- and long-term preservation options, which can be placed along a continuum that begins with near-term protection and ends with full preservation and long-term protection.

A. Near-Term Protection: Backup

Backup, in which content is copied and stored in multiple locations to create readily available replacements in case of equipment failure or other catastrophe, is understood to be a requirement for protecting near-term access. Proper backup of electronic assets is imperative for business continuity and necessary to ensure that access to content will not be interrupted in the near term. A well-managed backup system can quickly resolve problems with content needed this week or next month, but it does not protect content over the long term. Backup is typically implemented with commercial software, and content often can be retrieved only through the software that originally backed it up. If special software or hardware is required to access the content, its long-term accessibility and authenticity, which are key goals of digital preservation, cannot be assured.
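One way to reduce the dependence on the original backup software is to write backups in an open, widely supported container format. The sketch below, a minimal illustration rather than a replacement for a managed backup system, packs a collection folder into a timestamped tar.gz archive that ordinary tools (for example `tar -xzf`) can list and extract. The function names `make_backup` and `list_backup` and the folder layout are hypothetical.

```python
import tarfile
import time
from pathlib import Path


def make_backup(source_dir: str, backup_dir: str) -> Path:
    """Write a gzip-compressed tar archive of source_dir into backup_dir.

    Hypothetical helper: tar + gzip are open formats, so the archive can be
    read without the program that created it.
    """
    src = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Timestamp in the name so successive backups sit side by side
    # (two backups within the same second would share a name).
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the collection root.
        tar.add(str(src), arcname=src.name)
    return archive


def list_backup(archive: str) -> list[str]:
    """Return the member names stored in a backup archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

Because the archive is a plain tar file, a future reader needs only a standard archiving tool, not a particular vendor's restore utility.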

B. Mid-Term Protection: Byte Replication

Byte replication is a process whereby multiple, identical copies of files are created. The copies may be written to other online computers or to offline media. These replicas are typically held in diverse geographic locations, and specialized software is not needed to access the content. This diversity of copies and locations, together with the lack of reliance on special software, ensures that byte replicas will provide content that is authentic and usable for as long as the file formats remain usable. However, simple byte replication makes no provision for keeping the content usable once its file formats are no longer current, nor does it inherently ensure that the content remains discoverable. For example, if a series of book files is byte-replicated without accessible bibliographic information describing the intellectual content of each replica, there is no guarantee that a future end user will be able to find the specific content he or she needs.
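Byte replication as described here can be sketched in a few lines: copy a file unchanged to several destinations, then confirm each copy is identical by comparing cryptographic digests. This is a minimal sketch under stated assumptions: the destination directories stand in for storage at separate geographic sites (in practice they might be mounted network shares), SHA-256 is our choice of digest, and `replicate` and `sha256_of` are hypothetical names. Note that nothing here addresses format obsolescence or discoverability, which is exactly the limitation the paragraph describes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: str, destinations: list[str]) -> dict[str, bool]:
    """Copy one file byte-for-byte into each destination directory.

    Returns a map from replica path to True if the replica's SHA-256 digest
    matches the source (an identical copy), False otherwise.
    """
    src = Path(source)
    expected = sha256_of(src)
    results = {}
    for dest_dir in destinations:
        target_dir = Path(dest_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        replica = target_dir / src.name
        shutil.copyfile(src, replica)  # plain bytes, no special container
        results[str(replica)] = sha256_of(replica) == expected
    return results
```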

C. Long-Term Protection: Managed Digital Preservation

Digital preservation is defined as a series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability, and accessibility of content over the very long term. The key goals of digital preservation include:

Usability: The intellectual content of the item must remain usable via the delivery mechanism of current technology.
Discoverability: The content must have logical bibliographic metadata so that the content can be found by end users through time.
Authenticity: The provenance of the content must be proven and the content must be an authentic replica of the original as deposited.
Accessibility: The content must be available for use by the appropriate community.
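The authenticity goal is commonly supported in practice with fixity checks: a digest of every file is recorded when content is deposited and recomputed later, so any alteration or loss shows up as a mismatch. The sketch below assumes a simple JSON manifest of relative path to SHA-256 digest; established packaging conventions such as BagIt define their own manifest formats, and `write_manifest` and `verify_manifest` are hypothetical names. Fixity covers only the "authentic replica" half of the goal; proving provenance also requires records about where the content came from.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def _digests(root: Path) -> dict[str, str]:
    # Keys are POSIX-style paths relative to the collection root.
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def write_manifest(collection: str, manifest: str) -> dict[str, str]:
    """Record a digest for every file at deposit time.

    Assumption: the manifest is stored outside the collection directory;
    otherwise it would later be reported as an unexpected file.
    """
    digests = _digests(Path(collection))
    Path(manifest).write_text(json.dumps(digests, indent=2, sort_keys=True))
    return digests


def verify_manifest(collection: str, manifest: str) -> dict[str, str]:
    """Compare current files with the deposit-time manifest.

    Returns only problems, each labelled "missing", "altered", or "unexpected";
    an empty dict means every recorded file is present and unchanged.
    """
    recorded = json.loads(Path(manifest).read_text())
    current = _digests(Path(collection))
    problems = {}
    for name, digest in recorded.items():
        if name not in current:
            problems[name] = "missing"
        elif current[name] != digest:
            problems[name] = "altered"
    for name in current.keys() - recorded.keys():
        problems[name] = "unexpected"
    return problems
```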

In order to successfully perform managed digital preservation, as defined here, an organization must have:

  • a mission to carry out preservation that provides an environment conducive to the specialized planning and infrastructure needed to support digital preservation,
  • a sustainable economic model to support the preservation activities over the required time period for the digital content,
  • clear legal rights to preserve the content,
  • a relationship with the content provider and/or copyright owner,
  • relationships with the users of the content to ensure that their needs are met,
  • a preservation strategy and policies consistent with best practices and a technological infrastructure that is able to support the selected preservation strategy, and
  • transparency with regard to its preservation services, strategies, customers, and content.

Note that backup and byte replication are required elements of long-term preservation and are therefore appropriate first steps in protecting content through preservation.

What is the right choice for an organization’s preservation strategy?

For an organization that is just beginning to contemplate and plan for long-term digital preservation, it is possible to take an incremental approach. The most important initial actions include:

  1. Locate all content.
  2. Initiate regular backups.
  3. Test retrieval from backups.
  4. Develop a long-term preservation plan.
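Steps 1 and 3 lend themselves to simple automation. The sketch below is illustrative only: `inventory` summarizes how many files and bytes sit at each known storage location, and `check_restore` restores a backup into a scratch directory and compares every file with the live copy byte for byte. Both names are hypothetical, and `check_restore` assumes the backup is a tar.gz archive whose top-level folder carries the same name as the collection folder; a commercial backup tool would need its own restore procedure.

```python
import filecmp
import tarfile
import tempfile
from pathlib import Path


def inventory(locations: list[str]) -> dict[str, tuple[int, int]]:
    """Step 1: (file count, total bytes) for each known storage location."""
    summary = {}
    for loc in locations:
        files = [p for p in Path(loc).rglob("*") if p.is_file()]
        summary[loc] = (len(files), sum(p.stat().st_size for p in files))
    return summary


def check_restore(archive: str, original: str) -> list[str]:
    """Step 3: restore a tar.gz backup to a scratch directory and compare it
    with the live collection. Returns relative paths that did not restore
    intact (missing from the backup or differing in content)."""
    root = Path(original)
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(scratch, filter="data")
            else:
                tar.extractall(scratch)
        restored_root = Path(scratch) / root.name
        for p in root.rglob("*"):
            if not p.is_file():
                continue
            rel = p.relative_to(root)
            restored = restored_root / rel
            # shallow=False forces a byte-by-byte comparison.
            if not restored.is_file() or not filecmp.cmp(p, restored, shallow=False):
                failures.append(rel.as_posix())
    return sorted(failures)
```

An empty list from `check_restore` is evidence that the backup can actually be retrieved, which is the point of step 3: a backup that has never been restored is an untested assumption.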

If an organization takes these steps, it will be on a path to implementing a long-term preservation plan that will ensure that users have access to the material in the future. Organizations can develop the ability to do long-term preservation themselves, develop this ability collectively, or partner with a third-party preservation service. The important thing is that an organization responsible for the creation or use of content understands the key issues, what is at stake, and the options for moving forward with an effective plan for preservation.