Transitioning to Linked Data is not simply a data transformation activity. Libraries have extensive experience transforming data from one format to another. While crosswalk processes can be cumbersome and time-consuming, they are well understood and we are quite good at them. Transitioning to Linked Data, however, requires more than mapping fields across data models and reformatting data to comply with the specifications of the new model. It requires adding new data to each record, data that is often difficult for a machine to disambiguate. Specifically, a successful transition to a Linked Data ecosystem requires adding numerous shared, publicly recognized unique identifiers (Uniform Resource Identifiers, or URIs) to each record at the time of transformation.
URIs form the backbone of the Linked Data ecosystem. The fundamental concept is to provide a unique, machine-actionable identifier for every entity in a graph. Thus, for example, whereas a human might say:
Figure 2: Human readable triple
A Linked Data representation of the same statement would look like:
Figure 3: Machine readable triple
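As a textual companion to the two figures, the minimal Python sketch below serializes the machine-readable form of the statement as one line of N-Triples, the W3C line-based syntax for RDF. Only the VIAF URI for Shakespeare comes from this report; the predicate URI and the Hamlet URI are hypothetical placeholders under example.org, not identifiers actually published by any gateway.

```python
# Only the VIAF URI comes from this report; the other two URIs are hypothetical placeholders.
SHAKESPEARE = "http://viaf.org/viaf/96994048"
WROTE = "http://example.org/vocab/wrote"     # hypothetical predicate
HAMLET = "http://example.org/work/hamlet"    # hypothetical work URI

# The same statement as a human would phrase it: three plain strings.
human_readable = ("Shakespeare", "wrote", "Hamlet")


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialize a triple whose three parts are all URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


machine_readable = to_ntriples(SHAKESPEARE, WROTE, HAMLET)
print(machine_readable)
# <http://viaf.org/viaf/96994048> <http://example.org/vocab/wrote> <http://example.org/work/hamlet> .
```

The human version depends on the reader knowing which "Shakespeare" is meant; the machine version replaces each word with an identifier that any system can compare character for character.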
When we refer to URIs as “machine actionable” or “machine traversable,” we mean that an identifier is uniformly recognized by independent computing systems. This allows those systems to link statements that different people make about the same entity, and to recognize relationships that can be used to control processing and output. For example, if you have a collection of records that says “Shakespeare wrote Hamlet” and I have a collection of records that says “Shakespeare wrote Romeo and Juliet,” adding the same URI for Shakespeare to both of our records allows a computer to infer that “Shakespeare wrote Hamlet and Romeo and Juliet.” Similarly, if we used URIs to identify Hamlet and Romeo and Juliet, the computer could search across the network for things that others have said about each of these plays.
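The Shakespeare example can be sketched in a few lines of Python, assuming both collections are reduced to sets of (subject, predicate, object) triples. The "inference" here is nothing more than grouping statements by a shared subject URI; the predicate and work URIs are hypothetical placeholders, and only the VIAF URI is taken from this report.

```python
from collections import defaultdict

SHAKESPEARE = "http://viaf.org/viaf/96994048"                   # from this report
WROTE = "http://example.org/vocab/wrote"                        # hypothetical predicate
HAMLET = "http://example.org/work/hamlet"                       # hypothetical
ROMEO_AND_JULIET = "http://example.org/work/romeo-and-juliet"   # hypothetical

# Two independently maintained collections that happen to use the same author URI.
your_records = {(SHAKESPEARE, WROTE, HAMLET)}
my_records = {(SHAKESPEARE, WROTE, ROMEO_AND_JULIET)}


def works_by_author(triples):
    """Group the objects of every 'wrote' statement under their subject URI."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == WROTE:
            index[subject].add(obj)
    return dict(index)


# Merging is a plain set union; the shared URI does the linking.
merged = works_by_author(your_records | my_records)
print(sorted(merged[SHAKESPEARE]))
# ['http://example.org/work/hamlet', 'http://example.org/work/romeo-and-juliet']
```

Neither collection had to know about the other in advance; agreement on a single identifier for Shakespeare is the only coordination required.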
Figure 4: Dynamic Linked Data graph
The above figure shows a partial graph of relationships between Hamlet and Romeo and Juliet that was created dynamically, with no human intervention, by traversing URI-based statements about the two plays that are currently available as Linked Data on the internet.
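The kind of traversal behind the figure can be sketched as a breadth-first crawl that starts from the two plays and follows every object URI it encounters. This is a simplified illustration, not the process used to produce the figure: a dictionary stands in for the web (a real crawler would fetch each URI over HTTP), and every URI except the VIAF one, along with the hypothetical `dereference` and `crawl` helpers, is an assumption.

```python
from collections import deque

# Only the VIAF URI comes from this report; everything else is a hypothetical placeholder.
SHAKESPEARE = "http://viaf.org/viaf/96994048"
HAMLET = "http://example.org/work/hamlet"
ROMEO = "http://example.org/work/romeo-and-juliet"
DENMARK = "http://example.org/place/denmark"
VERONA = "http://example.org/place/verona"
STRATFORD = "http://example.org/place/stratford-upon-avon"
AUTHOR = "http://example.org/vocab/author"
SET_IN = "http://example.org/vocab/setIn"
BORN_IN = "http://example.org/vocab/bornIn"

# Stand-in for the web: the triples each URI would return if dereferenced.
PUBLISHED = {
    HAMLET: [(HAMLET, AUTHOR, SHAKESPEARE), (HAMLET, SET_IN, DENMARK)],
    ROMEO: [(ROMEO, AUTHOR, SHAKESPEARE), (ROMEO, SET_IN, VERONA)],
    SHAKESPEARE: [(SHAKESPEARE, BORN_IN, STRATFORD)],
}


def dereference(uri):
    """Simulated lookup; returns an empty list for URIs with nothing published."""
    return PUBLISHED.get(uri, [])


def crawl(start_uris, max_hops=2):
    """Breadth-first traversal that follows object URIs and collects every triple seen."""
    graph = set()
    seen = set(start_uris)
    queue = deque((uri, 0) for uri in start_uris)
    while queue:
        uri, depth = queue.popleft()
        for triple in dereference(uri):
            graph.add(triple)
            obj = triple[2]
            # Follow only HTTP URIs not yet visited, and stay within the hop limit.
            if obj.startswith("http") and obj not in seen and depth < max_hops:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return graph


graph = crawl([HAMLET, ROMEO])
for s, p, o in sorted(graph):
    print(s, p, o)
```

Because both plays point to the same Shakespeare URI, the crawler visits that node once and the two plays end up connected through it; with distinct author URIs the graph would fall apart into two unrelated pieces.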
For a full discussion of the function and benefits of Linked Data see the “Why Linked Data” section of this report. For the present purposes, what concerns us is the role that URIs serve in the Linked Data universe. A Linked Data graph is only as good as its URIs. If two individuals use two different URIs for the same entity, William Shakespeare for example, then to the computer there are two William Shakespeares. As such, proper URI management is essential to the Linked Data effort.
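The "two William Shakespeares" problem is easy to reproduce. In the sketch below, two catalogs describe the same person with different local URIs (both hypothetical), so a machine counting distinct authors finds two. One possible remedy, shown only as an illustration of URI management, is to rewrite local URIs to a single shared URI (here the VIAF URI cited in this report) before the data are combined; the mapping table itself is an assumption.

```python
WROTE = "http://example.org/vocab/wrote"  # hypothetical predicate

# Two catalogs identify the same person with different (hypothetical) URIs.
CATALOG_A_SHAKESPEARE = "http://example.org/a/person/shakespeare"
CATALOG_B_SHAKESPEARE = "http://example.org/b/person/w-shakespeare"

triples = [
    (CATALOG_A_SHAKESPEARE, WROTE, "http://example.org/work/hamlet"),
    (CATALOG_B_SHAKESPEARE, WROTE, "http://example.org/work/romeo-and-juliet"),
]


def distinct_authors(triples):
    """Count authors the way a machine does: one per distinct subject URI."""
    return {s for s, p, _ in triples if p == WROTE}


print(len(distinct_authors(triples)))  # 2: to the computer, two different people

# Hypothetical remedy: map each local URI to one shared URI before merging.
SHARED = {
    CATALOG_A_SHAKESPEARE: "http://viaf.org/viaf/96994048",
    CATALOG_B_SHAKESPEARE: "http://viaf.org/viaf/96994048",
}
normalized = [(SHARED.get(s, s), p, o) for s, p, o in triples]
print(len(distinct_authors(normalized)))  # 1
```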
Several organizations, such as Getty, the Library of Congress, OCLC, and VIAF, currently provide Linked Data gateways that supply URIs for entities and controlled vocabularies widely used by libraries and cultural heritage organizations. Using these resources, organizations can look up shared URIs for entities (people, organizations, subjects, etc.). Similarly, BIBFRAME defines a set of relationships for which public URIs have also been minted.
From a data perspective, the primary obstacle to transitioning to Linked Data is associating the literal representation of entities in MARC records (Shakespeare, William, 1564-1616) with machine-actionable URIs (http://viaf.org/viaf/96994048). This association must be applied retroactively to all legacy records (a daunting task), and library systems must be updated to create the association when creating new records or editing existing ones (a potentially difficult task, since most libraries rely on vendor software, over which they have little control, to perform this work).
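A minimal sketch of the retroactive association step follows, assuming a local index that maps normalized headings to URIs. The index, its single entry (which reuses the pairing given above), and the normalization rules are all hypothetical; real reconciliation would query a gateway such as VIAF or id.loc.gov and would need far more careful matching. Headings without an exact normalized match are flagged for human review rather than guessed, which is where the difficulty of machine disambiguation shows up.

```python
import unicodedata

# Hypothetical local authority index: normalized heading -> URI.
# The single entry reuses the example pairing given in this report.
AUTHORITY_INDEX = {
    "shakespeare, william, 1564-1616": "http://viaf.org/viaf/96994048",
}


def normalize_heading(heading: str) -> str:
    """Loosely normalize a heading: Unicode form, case, spacing, trailing punctuation."""
    text = unicodedata.normalize("NFC", heading).strip().rstrip(".,;:")
    return " ".join(text.lower().split())


def reconcile(heading: str):
    """Return a URI for an exact normalized match, or None to flag for human review."""
    return AUTHORITY_INDEX.get(normalize_heading(heading))


# MARC headings often end with a period, which normalization strips before matching.
legacy_headings = ["Shakespeare, William, 1564-1616.", "Smith, John"]
for h in legacy_headings:
    print(h, "->", reconcile(h) or "NEEDS REVIEW")
```

Stripping trailing punctuation is a deliberate simplification; it would also strip meaningful periods (for example in "Jr."), which is one reason exact-match shortcuts like this cannot replace review of ambiguous headings.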
In addition to the technical problems presented by data conversion, transitioning to Linked Data also brings with it a host of potential systems and workflow issues. Current library operations rest on workflows designed for and performed by staff with specialized and advanced training and knowledge. Changing the required output of these workflows could have dramatic effects on the workflows that produce it. Section VII of this document discusses these changes in depth.
Finally, transitioning our data and workflows will necessarily affect library systems and information flow. The figure below diagrams the numerous systems at the UC Davis library that communicate, either directly or indirectly, with our library catalog:
Figure 5: Library systems diagram
As the above diagram shows, 40 different systems connect either directly or indirectly with our library catalog. Each of these connections represents a potential point of failure during a Linked Data transition, further complicating any proposed or actual transformation process.
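Counting direct and indirect connections is a graph reachability question. The sketch below uses a tiny, hypothetical connection list (the system names are placeholders and do not reproduce the UC Davis diagram) and a breadth-first search to separate systems that touch the catalog directly from those reached only through another system. It treats each connection as two-way, which is an assumption; a real inventory might record data flow direction.

```python
from collections import defaultdict, deque

# Hypothetical, heavily simplified connection list; the real diagram shows 40 systems.
CONNECTIONS = [
    ("catalog", "discovery layer"),
    ("catalog", "link resolver"),
    ("catalog", "acquisitions"),
    ("discovery layer", "campus portal"),
    ("acquisitions", "campus financial system"),
]


def connected_systems(start, edges):
    """Split everything reachable from `start` into direct and indirect connections."""
    graph = defaultdict(set)
    for a, b in edges:  # treat each connection as two-way
        graph[a].add(b)
        graph[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over the connection graph
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    direct = set(graph[start])
    indirect = seen - direct - {start}
    return direct, indirect


direct, indirect = connected_systems("catalog", CONNECTIONS)
print(len(direct), "direct,", len(indirect), "indirect")  # 3 direct, 2 indirect
```

Every system in either set is one that would need to be checked if the catalog's data model changed.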