Reflections from the Pratt Digital Preservation & Archives Fellow at NYPL: Part 2

By NYPL Staff
May 2, 2018

During my second semester as the Pratt Digital Preservation & Archives Fellow, I’ve been working with Digital Archivist Susan Malsbury in the Archives Unit of Special Collections to tackle issues surrounding software preservation.

As a graduate student, when I think about digital archiving—especially in the context of the performing arts—I tend to think about audiovisual recordings, images, and ephemera, and the challenges surrounding their associated file formats (MOV, WAV, TIFF, etc.). These formats have specific needs in terms of preservation and access, but because they are commonly associated with archival collections, the digital preservation community has established a set of processes and best practices for caring for them.  

However, several collections held by the Library for the Performing Arts contain unusual file formats that can only be read with their native software, and whose preservation and access needs are less well defined than those of more common formats. My project during the spring semester has centered on one such group of files: those produced by the proprietary Lightwright software.

Screenshot of Lightwright Version 5 (demo) GUI

Lightwright is software widely used by lighting designers and production staff to manage theatrical lighting for plays, musicals, dance performances, and other live events. It combines a relational database with a graphical user interface that allows users to create lighting designs and related paperwork.
Since the debut of Lightwright in 1988, six versions of the software have been released, resulting in more than 15 associated file formats. Currently, six of these formats are represented in three separate collections at NYPL: the Tharon Musser designs and papers, the Jules Fisher and Peggy Eisenhauer papers and designs, and the Merce Cunningham Foundation records.

The first step in my work was to map out the current preservation environment for the Lightwright software and identify any gaps in the resources developed by the digital preservation community. In particular, I focused on registries that aggregate technical information about software products, their support lifecycles and requirements, and the file formats each product can read and write. My research led me to examine three registries: PRONOM, the Archivematica Format Policy Registry (FPR), and Wikidata.

Aside from the website maintained by the developer of Lightwright, there was very little information available on the software and its file formats. There were no entries in either PRONOM or the Archivematica FPR, and the information recorded in Wikidata and Wikipedia was incomplete.

The next step in my process was to fill these gaps through further research and contribution to the aforementioned registries. Beginning with the website and user manuals, I established a list of the different software versions, along with their release dates, programming languages, system requirements, and associated file formats. I was also fortunate to interview the developer, John McKernon, who provided further insight into the software’s history and system requirements, as well as information on some of the lesser-used file formats.

Developing a file format signature

At this point, I felt ready to begin developing file format signatures for submission to PRONOM. NYPL uses a file profiling tool called DROID to process digital files, and this tool references the PRONOM registry. Adding Lightwright signatures to PRONOM will help archivists at NYPL identify those formats in the future. I decided to focus my efforts on the Show File format, as it represents the majority of files in the NYPL collections.

There are a number of great resources for beginners that detail how to create a file format signature. Taking cues from a blog post by Jenny Mitcham at the University of York on creating a signature, and another by NYPL’s Head of Digital Preservation Nick Krabbenhoeft on using bash to help verify signature hypotheses, I developed six separate signatures, one for the Show File format associated with each of the six Lightwright software versions.
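The kind of check involved in testing a signature hypothesis can be sketched in a few lines. The example below is a minimal illustration in Python, not the bash approach from the post mentioned above, and the header bytes are invented for the example; they are not the actual Lightwright signatures. It finds the run of leading bytes shared by a set of sample files, then tests whether a candidate byte sequence appears at a fixed offset.

```python
def shared_header(samples):
    """Return the longest run of leading bytes common to every sample."""
    samples = list(samples)
    if not samples:
        return b""
    prefix = samples[0]
    for sample in samples[1:]:
        n = 0
        limit = min(len(prefix), len(sample))
        while n < limit and prefix[n] == sample[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def matches_signature(data, signature, offset=0):
    """Check whether `signature` occurs in `data` at a fixed byte offset."""
    return data[offset:offset + len(signature)] == signature


# Hypothetical file contents, invented for illustration. Real Show Files
# would be read from disk with open(path, "rb").read().
samples = [
    b"LWSHOW\x05\x00first show data",
    b"LWSHOW\x05\x00second show data",
    b"LWSHOW\x05\x00third show",
]

candidate = shared_header(samples)
all_match = all(matches_signature(s, candidate) for s in samples)
```

A candidate that holds across many known files, and fails on files of other types, is a reasonable basis for a signature submission.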

Archivematica FPR

Next, I updated NYPL’s local Archivematica Format Policy Registry (FPR). NYPL uses Archivematica to help prepare digital files for preservation and eventual access. The FPR is a database in which users define format policies; Archivematica queries it when performing identification, characterization, and normalization steps on files. Artefactual hosts an FPR server that stores structured information about format policies for Archivematica, but users can also maintain local rules, add new formats, or customize the behaviour of Archivematica in a local FPR.

This proved to be useful for the files associated with the Lightwright software, as there were no entries for these formats in the FPR maintained by Artefactual. I again focused my efforts on the Show File format. Adding the Lightwright formats to NYPL’s local FPR was relatively simple: because my format signatures had not yet been reviewed by PRONOM, the only available command was for Archivematica to identify the file formats by extension.
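Identification by extension amounts to a simple lookup. The sketch below illustrates the idea in Python; it is not how Archivematica implements its FPR commands, and the extensions and format labels in the table are assumptions made for illustration rather than entries copied from NYPL’s local FPR.

```python
from pathlib import PurePath

# Hypothetical extension-to-format table; the extensions and labels are
# assumptions for illustration, not entries from NYPL's local FPR.
EXTENSION_RULES = {
    ".lw5": "Lightwright 5 Show File",
    ".lw6": "Lightwright 6 Show File",
}


def identify_by_extension(filename, rules=EXTENSION_RULES):
    """Return the format label for a filename, or None if no rule matches."""
    suffix = PurePath(filename).suffix.lower()
    return rules.get(suffix)
```

Because this approach trusts the filename alone, a renamed file would be misidentified, which is why identification by signature is the stronger check once signatures are available.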

Linked Data for Software Preservation

In addition to my work on the Lightwright file formats, my research produced significant information pertinent to the long-term preservation of the software itself. I wanted to make this information available to the software preservation community at large, and Wikidata provided one way to do this.

Wikidata is a knowledge base of structured data which, similar to Wikipedia, can be edited by anyone with access to the internet. A project at Yale University Libraries is spearheading an effort to use Wikidata to capture and collate metadata describing file formats, software, operating systems, and hardware, and to inform digital preservation work. Although the creation of records in Wikidata for this purpose is, in large part, carried out by experts, contribution by the broader community is encouraged.

Initially, Wikidata served to align the language versions of a given Wikipedia page; as such, most Wikipedia articles have associated Wikidata entries. At the time I began my project, there was already a Wikipedia article for Lightwright, so the software was represented in Wikidata, but the information recorded in both was incomplete.

Before editing or creating new records in Wikidata, I read through the resources provided by the WikiProject Informatics pages for file formats and software properties. Keeping these properties in mind, I mapped out the Lightwright file format family. This helped me establish which records needed to be created or edited, record key technical information, and define the relationships between software versions and their file formats within the file format family structure. Ultimately, I created Wikidata entries for the Lightwright File Format Family and each of the six Lightwright Versions, as well as their related Show File formats.
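The relationships in such a mapping can be pictured as a small data model. The sketch below is one hypothetical way to lay it out in Python; the class names, field names, and format labels are illustrative and do not reproduce the actual Wikidata records or property identifiers. It also assumes, purely for the example, that each version can read the Show Files of earlier versions.

```python
from dataclasses import dataclass, field


@dataclass
class FileFormat:
    label: str
    part_of: str  # the format family this format belongs to


@dataclass
class SoftwareVersion:
    label: str
    readable: list = field(default_factory=list)  # formats the version can open
    writable: list = field(default_factory=list)  # formats the version can save


FAMILY = "Lightwright File Format Family"

versions = []
earlier_formats = []
for number in range(1, 7):  # six Lightwright versions
    show_file = FileFormat(f"Lightwright {number} Show File", part_of=FAMILY)
    # Assumption for illustration only: each version writes its own Show File
    # format and can also read the formats of earlier versions.
    versions.append(
        SoftwareVersion(
            label=f"Lightwright {number}",
            readable=earlier_formats + [show_file],
            writable=[show_file],
        )
    )
    earlier_formats = earlier_formats + [show_file]
```

Laying the family out this way before editing makes it easier to see which records already exist and which statements each new record needs.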

Looking forward

Upon completing these steps, I had managed to make significant progress toward the long-term preservation of the Lightwright software! Archivematica at NYPL will be able to identify the Lightwright Show File format using the local FPR and, upon review of my format signatures by PRONOM, by signature as well.
Additionally, information crucial to the preservation of the Lightwright software, such as its versioning history, system requirements, and the file formats it can read and write, had previously been spread across a number of resources and is now available to the software preservation community through Wikidata.

For more on Anne's projects, read Part 1 of her series as the Pratt Digital Preservation & Archives Fellow.