1. Purpose and Scope

The purpose of this report is to provide a preliminary assessment of the ability of the Kuali Foundation’s Open Library Environment (OLE) Library Management System (LMS) to function in a BIBFRAME ecosystem. The Library of Congress will soon complete its implementation of the new Resource Description and Access (RDA) framework and will subsequently replace MARC as its standard for bibliographic description with the BIBFRAME model and vocabulary. The primary focus of the IMLS-funded BIBFLOW project is to “develop a roadmap that the library community can reference for planning investments and changes over the coming years” as necessitated by the forthcoming changes in both bibliographic description and data transfer mechanisms. This report makes no assessment of OLE’s overall functionality as an LMS. Its limited and specific focus is on OLE’s ability to function in a library that has migrated to a BIBFRAME/RDA ecosystem.

2. Summary of Findings

In its present release (OLE 1.0), OLE is not a linked-data-oriented product and is firmly rooted in an ontological universe that considers data as a collection of “fields” rather than semantic statements (the backbone of the Linked Data ecosystem). However, OLE’s bibliographic database (DocumentStore) is robust, extensible, and capable in its current form of dealing with linked data. As such, the software’s barrier to linked data entry is driven by user interface and data processing logic limitations and not by the data storage architecture. This is advantageous from a development perspective. However, at present the only option available for deploying OLE in a linked data ecosystem is to use external tools to convert inward-flowing data from BIBFRAME (or whatever triplets-based data model defines the ecosystem) to MARC-XML and to likewise convert outward-flowing data from MARC-XML back into a triplet representation. In other words, OLE must live behind a triplestore/MARC-XML conversion gateway.
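
The gateway arrangement can be illustrated with a minimal sketch. It is not part of OLE or of any BIBFLOW tooling. The predicate names (`ex:title`, `ex:subject`) and their mapping to MARC tags 245 and 650 are hypothetical placeholders chosen for illustration; a real BIBFRAME-to-MARC crosswalk is far larger and is not reproduced here. Only the MARC-XML namespace and element names follow the published MARC 21 XML schema.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)

# Hypothetical, deliberately tiny crosswalk: predicate -> (MARC tag, subfield code).
PREDICATE_TO_FIELD = {
    "ex:title": ("245", "a"),
    "ex:subject": ("650", "a"),
}
FIELD_TO_PREDICATE = {field: pred for pred, field in PREDICATE_TO_FIELD.items()}


def _q(name):
    """Qualify an element name with the MARC-XML namespace."""
    return "{%s}%s" % (MARC_NS, name)


def triples_to_marcxml(subject, triples):
    """Inbound direction: triples about one subject -> one MARC-XML record."""
    record = ET.Element(_q("record"))
    control = ET.SubElement(record, _q("controlfield"), {"tag": "001"})
    control.text = subject
    for s, p, o in triples:
        if s != subject or p not in PREDICATE_TO_FIELD:
            continue  # statements the field model cannot express are dropped
        tag, code = PREDICATE_TO_FIELD[p]
        field = ET.SubElement(
            record, _q("datafield"), {"tag": tag, "ind1": " ", "ind2": " "}
        )
        ET.SubElement(field, _q("subfield"), {"code": code}).text = o
    return ET.tostring(record, encoding="unicode")


def marcxml_to_triples(xml_text):
    """Outbound direction: one MARC-XML record -> triples."""
    record = ET.fromstring(xml_text)
    subject = record.find(_q("controlfield") + "[@tag='001']").text
    triples = []
    for field in record.findall(_q("datafield")):
        for sub in field.findall(_q("subfield")):
            key = (field.get("tag"), sub.get("code"))
            if key in FIELD_TO_PREDICATE:
                triples.append((subject, FIELD_TO_PREDICATE[key], sub.text))
    return triples
```

Even in this toy form, any statement without a corresponding field is lost on the inbound conversion and cannot be recovered on the way out, which is the practical cost of placing a field-based system behind such a gateway.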

3. OLE System Architecture

The architecture of a complete, functioning OLE installation has two main components: the OLEFS and the OLE-DocumentStore [1]. The OLEFS handles financial and library management functionality, while the DocumentStore handles the cataloguing, indexing, and searching of bibliographic, holding, and item records. Both of these packages are required for OLE to function properly; however, as the focus of the current study is the flow of bibliographic, holding, and item data and their associated workflows in a linked data environment, the focus of this report is the OLE-DocumentStore [2].

4. DocumentStore Overview

The OLE-DocumentStore is “a content management system with features like check-in, checkout, versioning, locking etc. for library records such as Bibliographic, Instance (Holdings and Items), Patron, License etc” [3]. It is a Java web application designed to run under a Java Servlet container such as Apache Tomcat [4]. It relies on two other primary Java applications, Apache Jackrabbit [5] and Apache Solr [6].
Apache Jackrabbit is a well-supported, open-source content repository with a strong development community, and it is a Top Level Project of the Apache Software Foundation. Jackrabbit provides a “hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more” [7]. Considered on its own, Jackrabbit is an excellent choice for the storage, management, querying, and delivery of Linked Data.
DocumentStore uses Apache Solr to provide indexing services for the data (structured and unstructured) stored in its Jackrabbit instance. Solr is the industry standard for robust, flexible, and scalable document and data indexing. It provides “powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search” [8]. Apache Solr is the go-to indexing platform for the Linked Data community. As such, considered on its own, Solr is an excellent choice for an indexing, search, and browse gateway for Linked Data stores.
Because the data storage and indexing bedrock of DocumentStore is well suited to function in a Linked Data ecosystem, DocumentStore is well situated to handle Linked Data at the data storage, management, indexing, and retrieval level. However, the farther one moves in the application architecture from these core libraries, the more the application both requires and treats data as a static collection of pre-defined fields. In this way, the application is fundamentally MARC-like. The DocumentStore does advance and improve upon MARC’s flat-file (index card) rigidity by situating MARC fields in a relational, associative context. But, in its present form, it is not functionally equipped to handle semantically structured Linked Data.

5. Required OLE Enhancements for Linked Data Optimization

A significant investment in application development will be required to make OLE fully functional in a Linked Data environment. First, new classes would need to be developed to interface with Jackrabbit and Solr using a triples-based query system. Whereas the current relational model clusters all data related to a record around the record’s unique identifier, a triples-based system must be able to follow chains of nth-degree nested association in order to construct a complete record. This type of traversal is not possible given OLE’s current, system-defined hierarchy [9]. Another way to put this is that OLE is currently not equipped to follow semantic connections in data.
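The kind of traversal described here can be sketched in a few lines. This is an illustrative assumption about how a record might be assembled from triples, not a description of any existing OLE class. The identifiers and predicate names in the usage example are hypothetical. The function starts from one resource and follows every object that is itself the subject of further statements, to any depth, while guarding against cycles.

```python
from collections import defaultdict, deque


def build_record(triples, root):
    """Collect every statement reachable from `root` by following links."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    seen = {root}          # guards against cycles in the graph
    queue = deque([root])  # breadth-first, so nearer statements come first
    record = []
    while queue:
        node = queue.popleft()
        for p, o in by_subject.get(node, []):
            record.append((node, p, o))
            # An object that is also a subject has statements of its own;
            # follow it so nested associations become part of the record.
            if o in by_subject and o not in seen:
                seen.add(o)
                queue.append(o)
    return record


# Hypothetical data: an instance linked to a work, which links to a topic.
graph = [
    ("instance:1", "instanceOf", "work:A"),
    ("work:A", "subject", "topic:elizabeth"),
    ("topic:elizabeth", "label", "Elizabeth I"),
    ("work:B", "subject", "topic:elizabeth"),
]
record = build_record(graph, "instance:1")
```

In this sketch the record for `instance:1` includes the topic label two links away but excludes the unrelated statement about `work:B`. A production system would also need some rule for where a record stops, since an unbounded traversal of a well-connected graph can reach a large share of the store.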
As an example of the above problem, consider two Works about Queen Elizabeth, Work A and Work B. Work A is subject tagged “Early Modern” while Work B is subject tagged “Renaissance.” In a relational database model, a user searching for “Early Modern” would find Work A but not Work B. Likewise, a user searching for “Renaissance” would find Work B but not Work A. In a relational data model, such as the one employed by OLE, the only way to connect these two records so that they will both be returned is to: a) mutually add the other subject to each entry; b) normalize all subject entries; or c) add another associating table to the database schema and then populate it such that the two subjects “Early Modern” and “Renaissance” become equivalent through association.
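The relational side of this example can be made concrete with a minimal sketch using an in-memory SQLite database. The table and column names are hypothetical and are not OLE’s actual schema. The sketch shows the plain search returning only one work and then option (c), an associating table that must be explicitly populated and explicitly queried before the two subjects behave as equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE work_subject (work_id TEXT, subject TEXT);
    CREATE TABLE subject_equivalence (subject TEXT, equivalent TEXT);
    INSERT INTO work_subject VALUES ('A', 'Early Modern');
    INSERT INTO work_subject VALUES ('B', 'Renaissance');
    """
)


def search_plain(term):
    """Exact-match subject search: finds only works tagged with `term`."""
    rows = conn.execute(
        "SELECT work_id FROM work_subject WHERE subject = ? ORDER BY work_id",
        (term,),
    )
    return [row[0] for row in rows]


def declare_equivalent(a, b):
    """Option (c): record the association in both directions."""
    conn.executemany(
        "INSERT INTO subject_equivalence VALUES (?, ?)", [(a, b), (b, a)]
    )


def search_with_equivalence(term):
    """Search rewritten by hand to consult the associating table."""
    rows = conn.execute(
        """
        SELECT DISTINCT work_id FROM work_subject
        WHERE subject = ?
           OR subject IN (SELECT equivalent FROM subject_equivalence
                          WHERE subject = ?)
        ORDER BY work_id
        """,
        (term, term),
    )
    return [row[0] for row in rows]
```

Before `declare_equivalent("Early Modern", "Renaissance")` is called, both search functions return only Work A for “Early Modern”; afterwards `search_with_equivalence` returns both works, while `search_plain` is unchanged. The equivalence lives in the schema and in the query text, so every new kind of association requires further schema and query changes.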
In a native linked data system, the type of data normalization described above is not required. A user could simply enter a new semantic triplet stating equivalence: “Early Modern : Is : Renaissance.” A semantically aware triplets engine would then automatically associate all Early Modern and Renaissance tagged items, not through manipulation of data but through semantic processing. Because the OLE data store is currently not semantically structured, the application as a whole contains no semantic analysis processing classes. As such, conversion of the application to a native linked data triplestore would involve not only porting existing user interfaces to a new data structure, but also developing semantic processing routines to stand between the user interface and the data store. Without this, the product runs the risk of adopting a triplestore data structure but forcing it functionally to operate as a relational data store.
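A minimal sketch of such a semantic processing routine follows. It is an illustration only and does not reflect any existing OLE or triplestore API. It assumes that the “is” predicate is treated as symmetric and transitive, which is an interpretive choice made for the example, and the predicate and work names are hypothetical.

```python
def equivalents(triples, term, predicate="is"):
    """All terms linked to `term` by a chain of equivalence statements.

    Assumes `predicate` is symmetric and transitive (an assumption of
    this sketch, not a property stated in the report).
    """
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != predicate:
                continue
            # Check the statement in both directions (symmetry).
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
    return found


def works_tagged(triples, term):
    """Works whose subject is `term` or anything equivalent to it."""
    terms = equivalents(triples, term)
    return sorted(s for s, p, o in triples if p == "subject" and o in terms)


store = [
    ("Work A", "subject", "Early Modern"),
    ("Work B", "subject", "Renaissance"),
]
before = works_tagged(store, "Early Modern")  # only Work A

# One new statement; no existing record is edited.
store.append(("Early Modern", "is", "Renaissance"))
after = works_tagged(store, "Early Modern")   # Work A and Work B
```

The association is produced at query time by the `equivalents` routine rather than by changing the stored records, which is the layer the report identifies as missing between OLE’s user interface and its data store.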

Notes:

[1] See http://shrub.appspot.com/maven.kuali.org/release/org/kuali/ole/olefs-webapp/1.0.0/ and http://shrub.appspot.com/maven.kuali.org/release/org/kuali/ole/ole-docstore-webapp/1.0.0/.

[2] Note that BIBFLOW study participants unanimously agree that both financial and library management data could (and ultimately should) play an important role in a completely linked data ecosystem, providing valuable data to, for example, scholars of book history interested in either the economics of the book trade or in patterns of circulation and readership; however, this facet of the linked data problem is beyond the scope of the present study.

[3] Subrahmana, Peri, and Kathleen Gerdnik. “OLE DocumentStore – Kuali OLE – Kuali Wiki.” OLE DocumentStore – Kuali OLE – Kuali Wiki. The KUALI Foundation, n.d. Web. 22 Aug. 2014. https://wiki.kuali.org/display/OLE/OLE%2BDocumentStore.

[4] http://tomcat.apache.org/.

[5] http://jackrabbit.apache.org/.

[6] http://lucene.apache.org/solr/.

[7] “Welcome to Apache Jackrabbit.” Welcome to Apache Jackrabbit. The Apache Software Foundation, n.d. Web. 22 Aug. 2014. http://jackrabbit.apache.org/.

[8] “Apache Solr.” Apache Lucene -. The Apache Software Foundation, n.d. Web. 27 Aug. 2014. http://lucene.apache.org/solr/.

[9] See above Subrahmana, Peri, and Kathleen Gerdnik.