II. Why Linked Data

In 1998 the World Wide Web Consortium (W3C) published Tim Berners-Lee’s Semantic Web Road Map. In this essay, Berners-Lee lays out an “architectural plan” that, to this day, provides the foundation of the Linked Data ecosystem. According to Berners-Lee, “The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help.” This remains the fundamental ethos of the drive towards Linked Data—the idea that we can and should structure our data such that machines, without the aid of human readers, can follow threads of communication, building ever-deepening networks of knowledge.

A simple example will serve to clarify this concept. You’re watching the movie version of The Lord of the Rings and you become interested in what influences might have inspired Tolkien in the writing of the book, so you turn to Google and search for “The Lord of the Rings Influences.” Here you find a Wikipedia page on Tolkien that tells you that he was a Catholic and a student of Norse and Germanic mythology. You also see that Tolkien wrote many other works in addition to The Lord of the Rings and that all seem to bear the influence of Tolkien’s study of both contemporary and ancient religions. You see also that both Neil Gaiman and Ursula K. Le Guin (among others) were heavily influenced by Tolkien. You know Neil Gaiman as a contemporary fantasy author, but you aren’t familiar with Ursula K. Le Guin, so you click on the link to her Wikipedia page. Here you find out that she is a Science Fiction and Fantasy author whose works were, like Tolkien’s, heavily influenced by Norse Mythology and Anthropology. This prompts you to think about the relationship between Science Fiction and Fantasy as genres, so you visit several websites devoted to the history of each, and at each you find that Tolkien occupies an important place in the lineage of both traditions.

The above is representative of how human readers traverse complex webs of information on a regular basis. At each stage in the traversal, our reader could have followed multiple paths through the information web. The Wikipedia article on The Fellowship of the Ring alone contains 1,698 links to other sources of information. Our reader’s decisions to traverse particular paths are rooted in formal semantics, the ability to use context to determine which paths are most likely related to the information retrieval task at hand. The choice to investigate the Fantasy genre in the example above was rooted in the knowledge that the tree of Tolkien’s influence included multiple relationships between Science Fiction and Fantasy authors.

Linked Data has one primary purpose: to allow machines to traverse the vast web of networked information with the same facility as human readers. Given a starting record or text, the computer, like our TV viewer above, should be able to identify webs of connectivity and traverse particular paths based on semantic decision making. The Non-Linked Data web does not allow for this kind of machine traversal. Linked Data does.

The image below is a rendering of a portion of an information graph that a computer created by traversing information about The Fellowship of the Ring and its relations across the various Linked Data resources already publicly available on the internet.

Figure 1: The Lord of the Rings seen as Linked Data

Here we see a vast network, or graph, of information surrounding an item of interest that the computer is able to generate using Linked Data. Graphs grow through iterations of traversal: starting from a core node, each iteration reveals new branches, or edges, in the graph. These branches can be contained by limiting the number of traversal iterations, but they are theoretically infinite.
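The iterative traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the software that produced Figure 1: the triples are hand-written stand-ins drawn from the Tolkien example, the predicate names ("author", "influenced", "genre") are hypothetical, and the function name traverse is our own. It shows how capping the number of iterations contains a graph that could otherwise keep growing.

```python
from collections import deque

# Illustrative, hand-written (subject, predicate, object) statements.
# These are assumptions for the sketch, not data from a real Linked Data source.
TRIPLES = [
    ("The Fellowship of the Ring", "author", "J. R. R. Tolkien"),
    ("J. R. R. Tolkien", "influenced", "Neil Gaiman"),
    ("J. R. R. Tolkien", "influenced", "Ursula K. Le Guin"),
    ("Ursula K. Le Guin", "genre", "Science Fiction"),
    ("Ursula K. Le Guin", "genre", "Fantasy"),
    ("Neil Gaiman", "genre", "Fantasy"),
]


def traverse(start, triples, max_depth):
    """Return the edges reachable from `start` within `max_depth` iterations."""
    seen = {start}                 # nodes already queued, so each is expanded once
    edges = []                     # edges discovered so far, in discovery order
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:     # iteration limit reached: do not expand further
            continue
        for s, p, o in triples:
            if s == node:
                edges.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append((o, depth + 1))
    return edges


if __name__ == "__main__":
    for depth in (1, 2, 3):
        found = traverse("The Fellowship of the Ring", TRIPLES, depth)
        print(depth, len(found))
```

With this toy data, each additional iteration reveals more edges from the same core node. Against live Linked Data sources the lookup step would be a network request rather than a list scan, which is why a depth limit matters in practice.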

This is not the case for the Machine Readable Cataloging (MARC) standards upon which current library catalogs are built. Certainly, MARC can be, and for some time has been, used to link various knowledge repositories. When we search for J. R. R. Tolkien in a library catalog and receive a list of works written by and about the author, the computer has enacted a kind of linking around the name J. R. R. Tolkien. MARC’s ability to facilitate this linking is, however, extremely limited for a variety of reasons.

MARC records are based on a complex data standard as currently defined and documented by the Library of Congress at https://www.loc.gov/marc. A key differentiator between MARC and Linked Data cataloging frameworks is that MARC is based on records whereas Linked Data is based on graphs. Unlike knowledge graphs, which are theoretically infinite, records have a fixed number of fields and subfields. MARC Authority records, for example, are composed of 183 fields. An individual cataloger cannot extend this structure vertically or laterally, which is to say that one can neither add new fields to the system nor posit new relationships between fields. The standard’s field/subfield structure ensures that relational knowledge can only extend two iterations from the object defined, and it also limits the things that can be said at each iteration. The only way to extend the framework is through a complex, top-down process of discussion and adoption involving many institutions and governing bodies, followed by the reprogramming of all software systems that deal with the records.

Graph-based knowledge systems are not subject to any of the above limitations. They strengthen the ability to describe objects using reputable controlled vocabularies while also providing an extensibility that allows users to add new knowledge nodes (fields) to their descriptive graphs. One can capture all of the fields currently represented in a MARC record using references to the same controlled vocabularies (when applicable) and add additional information as appropriate.
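The contrast between a fixed record and an extensible graph can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the two-field schema, the field names, and the helper functions make_record and add_statement are hypothetical and do not correspond to actual MARC tags or to any cataloging system’s API.

```python
# Hypothetical fixed schema standing in for a record standard.
# Real MARC defines many more fields; these names are illustrative only.
RECORD_FIELDS = ("title", "author")


def make_record(**fields):
    """Build a record, rejecting any field the fixed schema does not define."""
    unknown = sorted(set(fields) - set(RECORD_FIELDS))
    if unknown:
        raise ValueError(f"fields not in schema: {unknown}")
    return dict(fields)


def add_statement(graph, subject, predicate, obj):
    """Add one (subject, predicate, object) statement; any predicate is allowed."""
    graph.add((subject, predicate, obj))
    return graph


if __name__ == "__main__":
    # The record accepts only what the schema already anticipates.
    record = make_record(title="The Fellowship of the Ring",
                         author="J. R. R. Tolkien")
    try:
        make_record(title="The Fellowship of the Ring",
                    influenced_by="Norse mythology")
    except ValueError as err:
        print("record rejected:", err)

    # The graph holds the same information and accepts a new kind of
    # statement without any change to a schema.
    graph = set()
    add_statement(graph, "The Fellowship of the Ring", "title",
                  "The Fellowship of the Ring")
    add_statement(graph, "The Fellowship of the Ring", "author",
                  "J. R. R. Tolkien")
    add_statement(graph, "J. R. R. Tolkien", "influenced_by",
                  "Norse mythology")
    print(len(graph), "statements in graph")
```

In the record model, saying something new requires changing the schema for everyone. In the graph model, it requires only adding another statement.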
