Step Two: Batch Conversion of Legacy MARC Records

Concurrently with, or after, migrating human workflows to native Linked Data operation, legacy MARC records must be converted to Linked Data graphs and stored in the new graph database. (As noted before, this database may not be strictly graph based, but the MARC records must be migrated to the new model regardless.) Automated transformation is possible because the needed URIs were added to MARC records during Phase One of this transition plan. This process will primarily involve technical staff, but libraries should expect to devote one cataloger familiar with both MARC and BIBFRAME (or an alternate Linked Data model) to the effort, both to ensure proper data mapping and to validate output.

Several viable tools are currently available for performing conversion of MARC records to Linked Data graphs.

Library of Congress Transformation Service:

Figure 15: Current Library of Congress MARC to BIBFRAME Transformation Service

The current release of the Library of Congress MARC to BIBFRAME Transformation Service is a web-based service suitable for testing conversion from MARC to BIBFRAME 1.0. The Library of Congress is currently working on an open source BIBFRAME 2.0 version of the software that can be installed locally and used to transform MARC to BIBFRAME 2.0, the latest BIBFRAME standard. This software is soon to be released. The MARC to BIBFRAME Transformation Service has undergone extensive testing at the Library of Congress and should provide reliable MARC to BIBFRAME transformation. The software runs efficiently and requires minimal storage. Additionally, the transformation engine is highly flexible, using an XSLT transformation service to traverse a MARC-XML DOM and output data in any text-based format. The Library of Congress provides XSLT for MARC to BIBFRAME conversion only, but with custom-developed XSLT the software could export transformations using any single ontology or framework, or any combination of them, in any Linked Data serialization. As such, it is a good choice both for libraries interested in producing strict BIBFRAME with few alterations and for libraries with in-house XSLT expertise that want to convert to frameworks other than, or in combination with, BIBFRAME.
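
The traversal described above can be illustrated outside of XSLT. The following is a minimal sketch in Python, not the Library of Congress stylesheet: it walks a MARC-XML record with the standard library and emits N-Triples. The sample record, the example.org URIs, and the direct mapping of field 245 to bf:mainTitle and field 100 subfield 0 to bf:agent are simplifying assumptions made for illustration only; a real BIBFRAME mapping is considerably richer.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
BF = "http://id.loc.gov/ontologies/bibframe/"

# Hypothetical sample record; the $0 URI stands in for one added in Phase One.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Melville, Herman,</subfield>
    <subfield code="0">http://example.org/authorities/n0001</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick</subfield>
  </datafield>
</record>"""


def record_to_ntriples(marcxml: str, base_uri: str) -> list:
    """Traverse one MARC-XML record and return simplified N-Triples lines."""
    record = ET.fromstring(marcxml)
    record_id = record.findtext("marc:controlfield[@tag='001']", namespaces=MARC_NS)
    subject = f"<{base_uri}{record_id}>"
    triples = []

    # Title proper (245 $a) becomes a literal; escape backslashes and quotes.
    title = record.findtext(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", namespaces=MARC_NS
    )
    if title:
        literal = title.strip().replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{BF}mainTitle> "{literal}" .')

    # Each URI stored in 100 $0 becomes a link to an external resource.
    for uri in record.iterfind(
        "marc:datafield[@tag='100']/marc:subfield[@code='0']", MARC_NS
    ):
        triples.append(f"{subject} <{BF}agent> <{uri.text.strip()}> .")
    return triples


if __name__ == "__main__":
    for line in record_to_ntriples(SAMPLE_MARCXML, "http://example.org/work/"):
        print(line)
```

The point of the sketch is that the agent link can be generated mechanically only because the URI is already present in subfield 0; without it, the converter would have to reconcile the name string against an authority file.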


MarcEdit:

Figure 16: MarcEdit

Most librarians are already familiar with Terry Reese’s MarcEdit software. An important feature of MarcEdit is its MARCNext component, which provides a collection of tools for manipulating MARC with an eye towards Linked Data transformation. Two tools are of particular use in this regard: 1) a highly configurable transformation service; and 2) the ability to export MARC records as a SQL database.

MarcEdit’s transformation engine is highly flexible, using an XSLT transformation service to traverse a MARC-XML DOM and output data in any text-based format. This could include RDF/XML, Turtle, or any other form of Linked Data representation. Using this system’s libraries, one can easily run multiple transformations on the same collection of MARC records. This allows libraries to produce specific outputs for specific uses. For example, a library could run one transformation to BIBFRAME for interlibrary use and another for search engine optimization. Terry Reese also maintains a public forum where XSLT transformation scripts can be shared. This means that one library could use another library’s BIBFRAME transformation out of the box, or modify it for a particular purpose and share the result with other libraries.
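
The idea of running several transformations over the same records can be sketched as a set of named mapping profiles. The sketch below is in Python rather than XSLT and does not use MarcEdit. The record dictionary, the profile names, and the choice of schema.org as the vocabulary for the search-engine-oriented output are all assumptions introduced for illustration.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
SCHEMA = "http://schema.org/"

# Hypothetical profiles: each maps a record field to a predicate URI.
PROFILES = {
    "bibframe": {"title": BF + "mainTitle", "creator": BF + "agent"},
    "seo": {"title": SCHEMA + "name", "creator": SCHEMA + "author"},
}


def transform(record: dict, profile: str) -> list:
    """Return (subject, predicate, object) triples for one record and profile."""
    mapping = PROFILES[profile]
    return [
        (record["uri"], mapping[field], value)
        for field, value in record.items()
        if field in mapping
    ]


# Hypothetical, already-parsed record set shared by every transformation.
records = [
    {
        "uri": "http://example.org/work/12345",
        "title": "Moby Dick",
        "creator": "http://example.org/authorities/n0001",
    }
]

# One pass per profile over the same collection of records.
outputs = {
    name: [triple for record in records for triple in transform(record, name)]
    for name in PROFILES
}
```

Keeping the mappings in data rather than in code is one way to make a transformation easy to share and modify, which is the role shared XSLT files play in the workflow described above.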

MarcEdit’s ability to export MARC records as a collection of SQL scripts is also potentially quite useful. Exporting records to a SQL database opens the door for complex querying of data. Storing records in an accessible SQL database can simplify the transformation process for those libraries interested in writing their own, stand-alone transformation scripts or applications. All widely used scripting and programming environments have packages that provide easy access to a variety of SQL databases, simplifying the process of querying records as part of a transformation process.
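
As an illustration of the kind of querying this enables, the following Python sketch loads a few rows into an in-memory SQLite database and runs two queries against them. The one-row-per-subfield table layout (`marc_fields`) is a hypothetical schema chosen for this example; it is not the structure of MarcEdit's actual SQL export, which should be inspected before writing queries against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per subfield of each record.
conn.execute(
    "CREATE TABLE marc_fields (record_id TEXT, tag TEXT, subfield TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO marc_fields VALUES (?, ?, ?, ?)",
    [
        ("12345", "245", "a", "Moby Dick"),
        ("12345", "100", "0", "http://example.org/authorities/n0001"),
        ("67890", "245", "a", "Untitled pamphlet"),
    ],
)


def titles_with_agent_uris(conn):
    """Pair each title with the URI stored in 100 $0 of the same record."""
    query = """
        SELECT t.record_id, t.value, u.value
        FROM marc_fields AS t
        JOIN marc_fields AS u ON u.record_id = t.record_id
        WHERE t.tag = '245' AND t.subfield = 'a'
          AND u.tag = '100' AND u.subfield = '0'
        ORDER BY t.record_id
    """
    return conn.execute(query).fetchall()


def records_missing_agent_uri(conn):
    """List records that have no URI in 100 $0 and so need manual review."""
    query = """
        SELECT DISTINCT record_id FROM marc_fields
        WHERE record_id NOT IN (
            SELECT record_id FROM marc_fields
            WHERE tag = '100' AND subfield = '0'
        )
        ORDER BY record_id
    """
    return [row[0] for row in conn.execute(query)]
```

The second query is the sort of check a cataloger validating output might request: it identifies records that cannot be linked automatically before the transformation is run.
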

MarcEdit provides a highly flexible platform for shared development of transformation scripts. As such, it is a good tool for libraries interested in performing multiple transformations and/or sharing in communal development of transformations. A potential drawback is that it is a Microsoft Windows-only tool and can be deployed only on Windows-based servers or desktops. It is therefore a suitable option only for libraries that operate in a Windows environment.

Extensible Catalog:

Figure 17: Extensible Catalog

The XC Software Suite is a set of web applications that perform various transformation and connectivity functions. Like MarcEdit, the XC Metadata Services Toolkit (MST) provides a flexible engine for transforming MARC records into other formats. Whereas MarcEdit uses XSLT to perform transformations, the MST connects with the ILS through the OAI-PMH protocol and then exposes records in a desired format based on customized JavaScript transformations. As with MarcEdit, a community repository of transformation scripts is available and can facilitate co-creation of scripts that allow libraries to expose record data in multiple forms.

The MST is a web application that runs as a Java Servlet under server engines such as Apache Tomcat or Jetty. Administrative users use a web interface to manage transformation “Services” that map identified record sets to the Java transformation scripts. A valuable feature of the MST is that transformations can be run one time only, or the service can poll the ILS for changes and execute the transformation as needed to keep the graph representation synchronized with the MARC data store. Transformed data sets are made available through an API. The MST can be run on any system that supports Java Servlets, including Linux, Mac, Unix, and Windows.
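
Polling for changes over OAI-PMH amounts to issuing a ListRecords request restricted by a `from` datestamp and following resumption tokens until the result set is exhausted. The Python sketch below shows that request and response handling in isolation. It is not the MST's implementation; the base URL, the `marc21` metadata prefix (prefixes are defined by each repository), and the sample response are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, since=None, resumption_token=None):
    """Build a ListRecords request URL for an incremental harvest."""
    if resumption_token:
        # In OAI-PMH the resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if since:
            params["from"] = since  # only records changed on or after this date
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text for element in root.iterfind(".//oai:header/oai:identifier", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token.strip() or None)


# Hypothetical response page, trimmed to the elements used above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:12345</identifier>
        <datestamp>2016-02-01</datestamp>
      </header>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A synchronizing service would record the time of each successful harvest, pass it as `since` on the next run, and re-run the transformation only for the identifiers returned.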

The MST is a good option for libraries with in-house server administration expertise and the computing infrastructure necessary to run a Java Servlet container. An ILS that supports OAI-PMH is also required, or else the ability to install and maintain a service that uses APIs or exported MARC data to provide an OAI-PMH gateway. (The Extensible Catalog suite includes a MARC-XML to OAI-PMH gateway.) A particular disadvantage of Extensible Catalog’s MST is that it requires significant physical storage. In order to provide its synchronized transformation service, it maintains a local SQL copy of the entire catalog as pulled using OAI-PMH. As such, a single pipeline of transformation from the ILS to BIBFRAME results in three complete instantiations of the catalog: the original in the ILS, a copy in the MST SQL database, and the exposed BIBFRAME version.

Custom Application:

For libraries with robust technical services departments that are familiar with the APIs of their ILS, building a custom conversion tool could be an option. Our initial testing indicates that it will typically take from one to three months of full-time programming to code and test a fully functioning, stand-alone, custom conversion tool. Building a custom tool offers few advantages. It can, however, be useful in cases where the records being converted are stored in more than one system, or when attempting to combine records of different formats that reference the same object. For example, it is not uncommon for libraries holding special collections to maintain both a MARC record and an EAD record for the same object. Linked Data offers the opportunity to combine these two records into a single graph. In such cases, a custom application designed to communicate with both the MARC and EAD systems would be more efficient than using existing tools to create separate graphs and then combining the graphs in a post-creation step.
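
The combination of a MARC-derived and an EAD-derived description can be sketched as a union of triples that share a subject URI. The Python example below is a minimal illustration under stated assumptions: the object URI, the sample values, and the use of Dublin Core terms as predicates are all hypothetical, and a production tool would more likely rely on an RDF library than on plain tuples.

```python
from collections import defaultdict

DCT = "http://purl.org/dc/terms/"
OBJ = "http://example.org/object/42"  # hypothetical URI shared by both records

# Triples derived from the MARC record (hypothetical values).
MARC_GRAPH = {
    (OBJ, DCT + "title", "Sample logbook"),
    (OBJ, DCT + "creator", "http://example.org/authorities/n0001"),
}

# Triples derived from the EAD record for the same object.
EAD_GRAPH = {
    (OBJ, DCT + "title", "Sample logbook"),
    (OBJ, DCT + "extent", "1 volume"),
}


def merge_graphs(*graphs):
    """Union the graphs; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def describe(graph, subject):
    """Group the objects of every triple about `subject` by predicate."""
    description = defaultdict(list)
    for s, p, o in sorted(graph):
        if s == subject:
            description[p].append(o)
    return dict(description)


merged = merge_graphs(MARC_GRAPH, EAD_GRAPH)
```

The merge is trivial only because both sources use the same URI for the object. Establishing that shared identifier across the MARC and EAD systems is where most of the effort in a custom tool would go.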

Third Party Service:

Zepheira Inc. will work with your library either to assist with a transformation process or to handle it completely. To date, Zepheira has worked with the Library of Congress, a host of public libraries, and the American Antiquarian Society, to name a few, to convert their existing MARC records. Other vendors can be expected to move into this space as the number of libraries planning to transform records increases. Third party conversion services could focus on conversion for individual libraries or, taking advantage of economies of scale, provide a common, shared point of conversion and distribution. Libraries currently participate in shared cataloging through OCLC. A similar vendor service that performs batch conversion and distributes converted records to libraries would be a logical extension of the services libraries already use, and OCLC is a natural point of service.
