Building an open archive union catalog for digital archives
Shien-Chiang Yu
Department of Information and Communications, Shin-Hsin University, Taipei, Taiwan, Republic of China, and
Hsueh-hua Chen and Huai-wen Chang
Department of Library and Information Science, National Taiwan University, Taipei, Taiwan, Republic of China
Purpose – In January 2002, the National Science Council of Taiwan launched a National Digital Archives Program (NDAP) and has proceeded with the implementation of a system related to the open archives initiative (OAI) framework. This paper aims to introduce the protocol and the prototype system of the project.
Design/methodology/approach – A general review of the project.
Findings – The OAI interoperability framework has received much attention from scholars of library and information sciences. In Europe and North America, many academic organizations and universities have undertaken theoretical studies, system design, and implementation of the OAI framework. In January 2002, the National Science Council of Taiwan launched the NDAP, and the project of National Taiwan University is one of its institutional projects. The project has now proceeded with the implementation of a system related to the OAI framework.
Originality/value – Provides information of value to information professionals.
Keywords Digital libraries, Taiwan
Paper type General review
Introduction
The lack of interoperability is one of the significant issues that digital libraries (DLs) currently face. The inability to federate, filter and provide value-added services for remote content limits DLs to covering local holdings. One of the reasons is that each DL is aimed at the needs of a particular community (Suleman and Fox, 2001). The open archives initiative (OAI) is one major effort to address technical interoperability among distributed archives (Liu et al., 2001). In essence it supports a system of interconnected components, where each component is a DL. OAI was born at the meeting of the Universal Pre-print Service that took place in October 1999 in Santa Fe with Paul Ginsparg, Rick Luce, and Herbert Van de Sompel. The OAI took the Harvest system as a reference (Bowman et al., 1995). The motivation arose because different databases and systems were not interoperable. Related data, or data from different fields of science, were therefore stored in different locations and were not integrated, which impeded the flow of data. Representatives participating in the meeting regarded it as necessary to develop an interoperable standard for academic electronic pre-prints and related digital archives. Thus, OAI was established (Ginsparg et al., 1999). In January 2001, OAI announced the open archives initiative protocol for metadata harvesting (OAI-PMH) to provide a feasible solution for the interoperability of network resources (Sompel and Lagoze, 2000).
The Electronic Library, Vol. 23 No. 4
ISSN 0264-0473
© Emerald Group Publishing Limited
DOI 10.1108/02640470510611472
In Europe and North America, many academic organizations and universities have undertaken theoretical studies, system design, and implementation of the OAI framework. In January 2002, the National Science Council (NSC) of Taiwan launched a National Digital Archives Program (NDAP) (w/). This is a major policy of the Taiwan government concerning digital content, resulting from the proposal to create a knowledge-based economy. Many universities and research organizations participate in this program, and the project of National Taiwan University is one of its institutional projects. With seven sub-projects, this project urgently seeks to build an interoperable mechanism to share and conserve all valuable collections, retrieve the digital collections of these content holders via a union interface, and allow the general public to access the digital collections. For these reasons, this project will utilize the OAI-PMH to carry out related studies and implement a union catalog system. This paper introduces the concept of OAI and discusses experiences in planning and building the union catalog of the institutional project of National Taiwan University.
The concept of the OAI-PMH
After a period of testing, OAI announced the OAI-PMH (v1.0) in January 2001. The revised version 1.1 was issued in July 2001, and OAI-PMH 2.0, the latest and formal version, was published in June 2002 (Sompel and Lagoze, 2001a, b; Lagoze and Sompel, 2002). OAI-PMH 1.0 introduced the unqualified Dublin Core element set as a baseline for metadata interoperability. It focused on facilitating the discovery of document-like objects. OAI-PMH 1.1 was a revision of the 1.0 specification that took account of changes to the emerging XML Schema specification. Both v1.0 and 1.1 were experimental in nature. OAI-PMH 2.0 is a stable protocol and no longer experimental. Once again the focus of the protocol expanded; it is now said to be concerned with the recurrent exchange of metadata about resources between systems. OAI has already submitted the OAI-PMH to the World Wide Web Consortium (W3C), hoping it will become the international standard for metadata sharing. Through its platform independence and interoperability, it can provide and promote the efficient dissemination of content. The intentions of the OAI-PMH are exposing and harvesting, which are defined in the OAI protocol as:
(1) defining a data provider which can expose its metadata through the HTTP-based protocol; and
(2) defining a mechanism for metadata harvesting from repositories.
According to the different tasks in the OAI organization, there are two groups: data providers and service providers. Participants must register for one of the two roles according to the type of service they offer (Figure 1).
Figure 1. The roles of data provider and service
• Data provider. Maintains one or more repositories (web servers) that support the OAI-PMH as a means of exposing metadata.
• Service provider. Issues OAI-PMH requests to data providers and uses the metadata harvested from data providers as a basis for building value-added services.
The primary purpose of the OAI-PMH is the incremental bulk transfer of metadata (harvesting). There is no remote search facility. Instead, a provider of services acquires data from a data provider, stores and processes it locally, and then supplies services to users based on that data (Suleman and Fox, 2003). Besides the data and service providers, the framework includes server components for manipulating the data.
• Set. A set is an optional construct for grouping items for the purpose of selective harvesting. Repositories may organize items into sets. Set organization may be flat (a simple list) or hierarchical. Multiple hierarchies with distinct, independent top-level nodes are allowed.
• Record. The OAI framework defines a record, which is an XML-encoded byte stream that serves as a packaging mechanism for harvested metadata.
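As an illustration of the record and set constructs, the following minimal sketch unpacks one record and tests its set membership. The sample record, its identifier and the set name "botany:herbarium" are hypothetical; the namespaces and the colon-separated form of hierarchical set names follow OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record, shaped as a data provider might return it.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:botany-0001</identifier>
    <datestamp>2003-05-20</datestamp>
    <setSpec>botany:herbarium</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Herbarium specimen sheet</dc:title>
      <dc:type>Image</dc:type>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Split a record into its header fields and its Dublin Core elements."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    parsed = {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [e.text for e in header.findall("oai:setSpec", NS)],
        "metadata": {},
    }
    dc_prefix = "{" + NS["dc"] + "}"
    for elem in record.find("oai:metadata", NS).iter():
        if elem.tag.startswith(dc_prefix):  # keep only dc:* elements
            name = elem.tag[len(dc_prefix):]
            parsed["metadata"].setdefault(name, []).append(elem.text)
    return parsed


def in_set(parsed, set_spec):
    """True if the record belongs to set_spec or to one of its sub-sets."""
    return any(s == set_spec or s.startswith(set_spec + ":")
               for s in parsed["sets"])


record = parse_record(SAMPLE_RECORD)
print(record["identifier"], record["datestamp"], in_set(record, "botany"))
```

The header carries what a harvester needs for bookkeeping (identifier, datestamp, sets), while the metadata part carries the descriptive elements.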
Because each institution has its own digital archive system with an individual search interface, data structure, communication protocol, management policy and so on, the institutions cannot federate, communicate and share data with each other transparently. For the purpose of interoperability, the establishment of a union catalog will be the key to providing the user with the ability to search all of the collected records from a single search interface. OAI-PMH achieves interoperability with minimum complexity and maximum convenience, thus keeping the balance between functional enhancement and developmental simplification. This was the reason why this project used OAI-PMH. Some of the advantages of adopting the OAI-PMH interoperability framework are listed below.
(1) It provides a new model for scholarly communication. The OAI develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. Furthermore, by adopting the metadata harvesting method it can cover a variety of media formats, data types and contents, etc.
(2) It is easy to implement. The OAI-PMH was designed to be very simple and efficient. Because complexity is avoided, enabling an existing information repository to function as an OAI-compliant data provider is a relatively simple process (Breeding, 2002). The protocol requests and responses defined in the OAI-PMH include only six verbs (OAI 2.0):
• GetRecord: to retrieve an individual metadata record from a repository;
• Identify: to retrieve information about a repository, such as the standards and protocols it implements;
• ListIdentifiers: to retrieve record identifiers, optionally corresponding to a specified set or date range;
• ListMetadataFormats: to retrieve the supported metadata formats available from a repository;
• ListRecords: to harvest records corresponding to a specified metadata format from a repository;
• ListSets: to retrieve the sets and subsets from a repository.
(3) It is open. Anyone can apply the OAI-PMH framework to build various kinds of data providers or service providers.
(4) It adopts web standards. The OAI-PMH uses current internet standards wherever applicable. All data transferred in response to a request is encoded in an XML format defined using XML Schema and transmitted over HTTP. Taking advantage of these standards lets the OAI solve problems such as cross-platform operation.
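Since each request is an ordinary HTTP request carrying a verb and its arguments, the six verbs listed above can be sketched in a few lines. This is a minimal sketch, not the project's implementation: the repository address and the helper name build_request are hypothetical, while the verb and argument names are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = frozenset({
    "GetRecord", "Identify", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
})


def build_request(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request.

    Because `from` is a reserved word in Python, callers pass `from_`;
    the trailing underscore is stripped before the URL is built.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, for illustration only.
BASE_URL = "http://repository.example.org/oai"

print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords",
                    metadataPrefix="oai_dc", from_="2003-01-01", set="botany"))
```

The second call shows selective harvesting: a service provider asks only for Dublin Core records in one set that have changed since a given date.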
Institutional project of National Taiwan University
The Institutional Project of National Taiwan University is one of the NDAP's institutional projects. There are seven institutions participating in the project, including the NTU Library, the Departments of Botany, Entomology, Geosciences, Anthropology and Zoology, and the Computer and Information Networking Center. Its primary research scope contains the following:
• to understand the history and features of collections;
• to study various metadata formats both domestically and internationally;
• to understand relations among the metadata, the database and the system framework; and
• to understand the information demand and retrieval behavior of potential users.
During the first few years, the main task of the project has been to develop a management system capable of handling various types of metadata and to implement it in each institution (Yu et al., 2003), but not to integrate it among these institutions. The major reason for this is that each institution has a specific application domain. The metadata records and digital objects (image, picture, and voice) were edited or made by each institution, and different scientific domains are related to one another to differing degrees, such as the geographic influence on ecological distribution (geosciences and zoology). Therefore, the next stage of the project is to build a union catalog that integrates the metadata records harvested from the various digital collection institutions. As with a digital collection portal, users, especially researchers, need not query each database individually and can directly fetch all of the related data.
The NDAP put forward the idea of creating a union catalog of National Digital Archives. A union catalog can be built on one of two models: a collective union catalog or a distributed virtual union catalog. The former has the advantage of offering better search results but the disadvantage of a high construction cost. A virtual union catalog has a low construction cost but offers poorer search results. To retain the advantages of both models and avoid their drawbacks, the project is designed to adopt the OAI-PMH framework, with the program office playing the role of a service provider. National Taiwan University, Academia Sinica, the National Palace Museum, and the National Museum of Natural Science will participate in the initial phase to form the OAI test-bed team. They will build the union catalog of the national digital archives with the OAI-PMH, automatically harvesting metadata from each repository periodically. When the test-bed team achieves its goal, more repositories will participate in the system.
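The periodic harvesting described above can be sketched as a loop that asks only for records changed since the previous run and follows resumption tokens until the list is complete. This is a minimal sketch under stated assumptions: fake_fetch is a hypothetical stand-in for the HTTP call and its canned responses are invented; the from argument and the resumptionToken mechanism are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch, metadata_prefix="oai_dc", since=None):
    """Yield (identifier, datestamp) for each record changed since `since`.

    `fetch` is any callable that takes a dict of request arguments and
    returns the XML response text; in practice it would issue an HTTP GET.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since:
        params["from"] = since  # selective harvesting by datestamp
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI + "header"):
            yield (header.findtext(OAI + "identifier"),
                   header.findtext(OAI + "datestamp"))
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # an absent or empty token means the list is complete
        # A resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def fake_fetch(params):
    """Hypothetical stand-in for an HTTP call: serves two canned pages."""
    page = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
      <record><header><identifier>{id}</identifier>
      <datestamp>2003-06-01</datestamp></header></record>{token}
    </ListRecords></OAI-PMH>"""
    if "resumptionToken" in params:
        return page.format(id="oai:example.org:2", token="<resumptionToken/>")
    return page.format(id="oai:example.org:1",
                       token="<resumptionToken>page2</resumptionToken>")


harvested = list(harvest(fake_fetch, since="2003-01-01"))
print(harvested)
```

Passing the date of the last successful run as since keeps each periodic harvest incremental, which is what lets a collective catalog stay current without a full reload.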
Implementing the union catalog
The digital collection institutions should share resources with each other and provide users with a transparent access channel. To achieve this, the institutional project of National Taiwan University has made use of the OAI-PMH to create an interoperable system. The protocol facilitates communication between service providers and data providers, and the data of digital collections can keep its original metadata structure or use the Dublin Core format. In addition, users can search and access resources conveniently through the OAI-based system.
Figure 2 shows the initial system structure of the National Taiwan University, according to OAI-PMH. The operations of each component are described below.
Data provider
This converts the metadata records from data providers to XML format in batches, and maps them to the Dublin Core metadata. The OAI-PMH only allows for the retrieval of identifiers and associated records from the remote repository. The data provider could base the response on current machine load or limit the frequency at which requests will be serviced. The records harvested from the data provider by the service provider consist of two parts: the identifier and datestamp, and the XML metadata record in the requested format. Delivered data must be encapsulated in this format.
Service provider
This stores the metadata records returned by data providers and records the related attributes, including the update time, the original system number, and the source of the data. It defines contents by index arguments to establish index files that support searching. It also creates and maintains the interfaces for user authorization and system administration, and supplies web services for searching and browsing via the internet.
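The service provider's bookkeeping can be illustrated with a small in-memory sketch. The class name UnionCatalog, its methods and the sample records are hypothetical; a production system would use a database and a proper search engine rather than Python dictionaries.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone


class UnionCatalog:
    """Stores harvested records and keeps a simple keyword index."""

    def __init__(self):
        self.records = {}               # (source, identifier) -> stored entry
        self.index = defaultdict(set)   # term -> set of record keys

    def add(self, source, identifier, metadata):
        """Insert or update one harvested record and re-index it."""
        key = (source, identifier)
        for keys in self.index.values():
            keys.discard(key)           # drop index entries of an older version
        self.records[key] = {
            "source": source,                       # source of the data
            "identifier": identifier,               # original system number
            "updated": datetime.now(timezone.utc),  # update time
            "metadata": metadata,
        }
        for values in metadata.values():
            for value in values:
                for term in re.findall(r"\w+", value.lower()):
                    self.index[term].add(key)

    def search(self, term):
        """Return the keys of all records containing the term."""
        return sorted(self.index.get(term.lower(), ()))


catalog = UnionCatalog()
catalog.add("botany", "B-001", {"title": ["Taiwan fern specimen"]})
catalog.add("zoology", "Z-042", {"title": ["Taiwan blue magpie"]})
print(catalog.search("taiwan"))
```

Keying each entry on the pair of source and original system number lets a later harvest overwrite an updated record instead of duplicating it.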
A further component handles the recording and administration of the metadata records and low-resolution digital objects harvested from data providers. It also holds the index tables for information retrieval and the parameters for system administration, and provides the metadata mapping arguments and the XML schema definitions. The final objective of this union catalog project is depicted in Figure 3. Because the OAI-PMH, built on data and service providers, has two intentions, the exposing and harvesting of metadata, the system should combine both: it should integrate the metadata records from each repository in the project, and also use the OAI service provider to harvest metadata records from OAI data providers worldwide. By integrating the two resources, the system can play the role of a metadata portal and provide users with a variety of metadata records from different data providers.
Figure 2. The main structure of the
To sum up the analysis above, several actions must be performed before developing the system:
• collecting in-depth data from each data provider, including the structure, syntax and semantics of the metadata and the relationships between the metadata and digital objects, and determining the extent and range of data that the repositories allow to be provided;
• sorting out the metadata mapping tables for converting formats among metadata records, and establishing the XML schema definition of the metadata;
• defining the access points of the metadata scheme in order to allow the system to access similar fields among different metadata; and
• creating the authority control of the metadata, which allows the correlation of one record field with another.
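The mapping tables and common access points mentioned above can be sketched as a simple crosswalk. All local field names and the contents of MAPPINGS are hypothetical and are not taken from the NTU project; only the Dublin Core element names and the two namespaces are standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical mapping tables: local field name -> unqualified Dublin Core element.
MAPPINGS = {
    "botany": {"scientific_name": "title", "collector": "creator",
               "locality": "coverage"},
    "anthropology": {"artifact_name": "title", "ethnic_group": "subject"},
}


def to_dublin_core(source, local_record):
    """Convert one local record to a dict of Dublin Core elements."""
    mapping = MAPPINGS[source]
    dc = {}
    for field, value in local_record.items():
        element = mapping.get(field)
        if element and value:  # unmapped or empty fields are not exposed
            dc.setdefault(element, []).append(value)
    return dc


def to_oai_dc_xml(dc):
    """Serialize the Dublin Core dict as an oai_dc XML fragment."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in dc.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


specimen = {"scientific_name": "Cyathea spinulosa",
            "collector": "Unknown", "accession_no": "123"}
dc_record = to_dublin_core("botany", specimen)
print(to_oai_dc_xml(dc_record))
```

Because the botany and anthropology tables both map their name fields to title, a single title access point can search similar fields across different metadata schemes.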
Issues and solutions
There are several issues when implementing the system.
Figure 3. The final system structure of the NTU
(1) Metadata. Most data providers (institutions in the NTU project) can provide complete records, but a few data providers, because of property rights and access restrictions, can only expose partial metadata