Digital Media Preservation - Why Media Organizations Can’t Archive
Earlier this year I was fortunate enough to lead a study that examined the challenges faced by media organizations in managing digital content, in particular digital archives.
The digital revolution has brought many advantages to media organizations and content producers, but managing file-based workflows, and digital archives in particular, has proved difficult for smaller organizations.
As part of the study, I visited nine media organizations across Nepal, Sri Lanka, Malaysia, and Thailand. The executive summary of the report I produced on behalf of Internews can be read below.
The complete report can be viewed online at the Internews website.
Executive Summary
Over the past ten years, media organizations around the world have been migrating from traditional “tape-based” formats (both analogue and digital) to all-digital and completely “file-based” production systems. Such systems are giving content creators powerful production tools, more efficient workflows, and novel methods for distribution and content discovery. However, “going digital” has also presented a unique set of challenges, at both organizational and operational levels, particularly for the safe storage and preservation of completed work.
This study focuses on digital archive management and reviews the current “state of play” in archive management for content creators. It draws on both academic and industry research, as well as the results of site visits to nine media organizations located in Thailand, Nepal, Sri Lanka, and Malaysia.
The purpose of this study is twofold. The first aim is to present an introduction to digital archive management for media organizations and content creators, helping to build awareness and a greater understanding of the topic as a whole. The second is to provide guidance and practical advice, in particular for small- to medium-sized media organizations, as well as for those operating with limited resources or under challenging environmental and political circumstances.
The findings of this study can be broadly divided into two areas. The first relates to the safe storage of content and the associated challenges faced by organizations attempting to store large amounts of data. The second is the effective cataloguing and management of digital archives, allowing content within an organization to be described, discovered and re-used.
Safe Storage
In terms of safe storage, it is essential to understand that digital media depends on a chain of intermediary hardware and software components in order to be viewed or played. These intermediary components are often subject to license restrictions and, over time, may become obsolete, suffer mechanical failure, or simply lose the data stored on them. What’s more, the risks associated with technological obsolescence, combined with the limited data life expectancy of current digital formats, mean that no digital storage mechanism can be considered “archival” in the traditional sense.
The best any media organization can hope to achieve is the longest possible period between rotations from one media format to another. For smaller organizations, the problem of safe storage has been exacerbated by the rapid increase in the capacity of affordable hard disks (with terabyte-sized drives now available for less than 100 USD). Concentrating large amounts of data on a single device substantially increases the risk that a mechanical failure, or the loss of a device, will result in a correspondingly large loss of material. Large-capacity hard disks, with their ever-increasing volumes of data, have created a need for more effective backup and safe storage strategies, particularly for organizations without dedicated IT resources.
Larger volumes of data have also made it more difficult to create backup strategies that include duplicate sets of data for off-site storage. While duplication and off-site storage of media are important practices for all organizations, they are particularly relevant for organizations with content that has educational, cultural, and social value, and especially for material that may be considered politically sensitive at both individual and organizational levels.
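A duplicate set is only useful if the copy actually matches the original. The report does not prescribe a particular method. As a minimal sketch, assuming the archive and its duplicate are ordinary folders, the Python function below (using only the standard library) compares every file in a source folder against the duplicate by SHA-256 digest. It reports files that are missing from the copy or whose contents differ. The function names and folder layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media files are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(source: Path, duplicate: Path) -> dict:
    """List files missing from, or differing in, a duplicate (e.g. off-site) copy of `source`."""
    report = {"missing": [], "mismatched": []}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)   # same relative path is expected in the copy
        dup_file = duplicate / rel
        if not dup_file.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(src_file) != sha256_of(dup_file):
            report["mismatched"].append(rel.as_posix())
    return report
```

For example, `compare_copies(Path("/media/archive"), Path("/mnt/offsite_copy"))` (hypothetical paths) would be run before a drive leaves the building. Both lists should be empty. Hashing every file is slow for very large collections, but it catches silent corruption that a simple size or date comparison would miss.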
Catalogue and Archive Management
At its most basic, “cataloguing” a digital archive means creating a list that describes the items in a collection. More sophisticated catalogue “management” systems allow items in a collection to be accessed from various contextual or referential perspectives, including selecting or filtering a collection by descriptive fields or subject classification terms. A comprehensive electronic catalogue and archive management system also makes it possible to discover and re-use digital content, and to share, aggregate, or transfer content between systems and across organizations.
Effective catalogue and archive management strategies for organizations that create media for education, health, and development purposes can also support and enhance the aims of such organizations by facilitating the dissemination and re-use of their material. This study describes the cataloguing and archive management process both from a library science point of view and from the practical point of view of a typical media organization. From a library and information science perspective, cataloguing and archive management efforts typically require formal methods and published standards designed to support the creation of scholarly and institutional archives. Such efforts place great importance on the creation of verifiable and citable bibliographic records, as well as on the provenance, accession, and authenticity of material in a collection.
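To make the difference between a simple list and a catalogue that can be filtered more concrete, here is a minimal Python sketch. Each record carries a few descriptive fields plus a set of subject classification terms, and `select` filters the collection by any combination of them. The field names, identifiers and titles are hypothetical. They are not taken from any cataloguing standard or from the organizations visited.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueRecord:
    """One item in a collection, described by a few (hypothetical) descriptive fields."""
    identifier: str
    title: str
    media_format: str
    year: int
    subjects: set = field(default_factory=set)  # subject classification terms


def select(records, subject=None, **fields):
    """Return records matching every given descriptive field and, optionally, one subject term."""
    results = []
    for record in records:
        if subject is not None and subject not in record.subjects:
            continue
        if all(getattr(record, name) == value for name, value in fields.items()):
            results.append(record)
    return results


# Hypothetical sample records, for illustration only.
catalogue = [
    CatalogueRecord("item-001", "Farming programme, episode 1", "video", 2009, {"agriculture"}),
    CatalogueRecord("item-002", "Health phone-in, episode 4", "audio", 2010, {"health"}),
    CatalogueRecord("item-003", "Rural clinics documentary", "video", 2011, {"health", "rural"}),
]

for record in select(catalogue, subject="health", media_format="video"):
    print(record.identifier, record.title)  # prints: item-003 Rural clinics documentary
```

A real catalogue management system would add persistent storage, controlled vocabularies for the subject terms, and export to shared formats. The principle is the same, however: structured descriptions are what make a collection searchable and re-usable.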
From a media organization’s point of view, cataloguing and archive management is typically focused on providing the minimum amount of information required to support the re-use of archival material in new projects. The priority for most media organizations is the efficient production of content. As such, innovation – including vendor support – has tended to focus on providing solutions for efficient digital production and workflow, with cataloguing and archive management solutions often poorly integrated into the rest of the production process.
However, as media organizations have accumulated more content, the need for more effective cataloguing and archive management strategies is becoming apparent, for two reasons. The first is that the longer material is kept in an archive, the greater the likelihood that it will begin to acquire historical value. The second is that media reaching the end of its effective shelf-life will require rotation onto new formats if it is to be preserved, and an effective catalogue and archive management system can help to support that process. It should be noted, however, that organizations with large quantities of shelf- and tape-based material will require substantial resources to capture and transfer media from tape. Even with the support of an archive management system, the cost, time, and effort required to capture and catalogue such material mean that few organizations have the resources to do so.
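One way a catalogue can support rotation is by recording when each item was written to its current storage medium, so that ageing copies can be listed before they fail. The report does not specify how this should be done. The Python sketch below assumes each holding is a hypothetical `(identifier, carrier, written_on)` tuple. The rotation interval is a parameter the organization would set according to its own policy, and the five-year value in the example is purely illustrative.

```python
from datetime import date


def due_for_rotation(holdings, today, max_age_years):
    """Return (identifier, carrier, age) for copies at least max_age_years old on `today`."""
    due = []
    for identifier, carrier, written_on in holdings:
        # Whole years elapsed, subtracting one if this year's anniversary hasn't arrived yet.
        age = today.year - written_on.year - (
            (today.month, today.day) < (written_on.month, written_on.day)
        )
        if age >= max_age_years:
            due.append((identifier, carrier, age))
    return due


# Hypothetical holdings; the carriers mirror the media types discussed in this report.
holdings = [
    ("item-001", "hard disk", date(2006, 3, 1)),
    ("item-002", "optical disc", date(2009, 7, 15)),
    ("item-003", "video tape", date(2011, 1, 10)),
]

print(due_for_rotation(holdings, today=date(2012, 1, 1), max_age_years=5))
# prints: [('item-001', 'hard disk', 5)]
```

A per-carrier interval (for example, a shorter one for hard disks than for tape) would be a natural refinement. The point of the sketch is simply that rotation can only be planned if the catalogue records which copy lives on which medium, and since when.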
While the needs of an institutional or academic archive may be very different from the needs of a production-focused media organization, this study concludes that media organizations can benefit from adopting some of the principles and open standards associated with library science-based archive management, helping to prevent “vendor lock-in” as well as creating additional opportunities for the discovery, re-distribution, and transfer of material within and across organizations.
Summary of Findings
The move from “tape-based” and other “playable” formats to “all-digital” file-based production systems has created significant challenges for small- and medium-sized media organizations, particularly for those without dedicated IT resources.
Noteworthy among the challenges of “going digital” is the fact that audio and video tape formats had previously provided both a convenient format for safe storage (with a shelf-life of ten years or more) and an easy-to-understand, shelf-based catalogue management system. All-digital file-based systems, on the other hand, have introduced additional dependencies on software and hardware, as well as additional file management tasks, and have arguably increased the risk of material being lost through the failure or loss of media-containing devices.
None of the organizations visited as part of this study (with the exception of Thai PBS) had effective backup strategies in place for their digital content. Nor were any of the organizations preparing duplicate or off-site data sets for safe storage. As a result, nearly all of these organizations had experienced data loss due to the mechanical failure of hard disks or the inability to read optical media.
It should be noted, however, that the challenges of implementing regular backup and safe storage systems are not unique to the organizations visited as part of this study. Anecdotal evidence suggests that most media organizations are struggling in this area, with many having also suffered significant data losses as a result. Nor are these challenges unique to the problem of archive management. The procedures and systems required to safely store archival material overlap with the procedures required for good data management practices in general. As such, media organizations are in need of robust systems designed to prevent data loss, as well as assistance in coping with data in quantities that were previously only found in professionally run data centers.
Organizations that need to store material safely beyond the production lifecycle also need effective cataloguing and archive management solutions. These solutions should offer the maximum possible period between media rotations, as well as the ability to search, select, and re-use material. Ideally, such systems should also be based on published and open standards, and therefore be able to support strategic initiatives, including the re-purposing or re-distribution of content and the transfer of content between organizations.
Education and awareness programmes will also form an important part of any digital archive management strategy. So too will guidance and practical suggestions that media organizations of all sizes can benefit from when attempting to implement an archive management solution.
Two solution-focused appendices have been provided at the end of this report, aiming to serve as signposts for the successful implementation of archive and data management systems for smaller organizations. Opportunities also exist for media organizations that belong to a network of organizations to co-operate and share experience and knowledge in the area of archive management, perhaps even through the creation of centralized and shared deposit facilities for off-site storage and data safety.
This study also suggests that agencies that fund media development and content creation have an opportunity (and possibly even an obligation) to provide support in the area of digital archive management, in particular for public media organizations, and especially where material of educational and social value is being produced.
However, while the findings of this study suggest that many organizations are struggling to create and maintain effective digital archive management systems, it is also important to ensure that any assistance offered to help develop such systems is supported by organizations or individuals with appropriate skills and expertise. Projects must be run ethically, making certain that material is handled with respect, and that appropriate measures are taken in order to prevent the accidental loss or damage of content. A digital archive management project should also have clearly defined objectives, and make an important distinction between shorter-, longer- and long-term archive requirements. Any attempt, or claim, to support the long-term preservation of “digital heritage” must ensure that a standards-based approach is followed, and that the systems and infrastructure required for such an effort are available, either from supporting organizations or through partnerships with institutions capable of providing such services.