Metadata and Resource Exchange Using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Simeon Warner


This tutorial will provide an overview of best practices for OAI-PMH data providers and how they relate to the expectations and needs of service providers. We use practical experience from harvesting projects and the validation service to highlight pitfalls and ways to avoid them. We will also review best practices for the production of shareable metadata including character encoding, namespace and XML schema issues, and the support of multiple metadata formats.
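As a small illustration of the namespace and character-encoding points above, the following sketch parses a ListRecords response with namespace-aware lookups instead of relying on prefixes. The sample response, the repository address example.org, and the helper name parse_list_records are assumptions made for illustration; they are not taken from the tutorial materials.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, held as UTF-8 bytes so the parser honours the
# encoding named in the XML declaration.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2004-12-31</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>\u00c9tude of a sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".encode("utf-8")


def parse_list_records(xml_bytes):
    """Return ([(identifier, [titles])], resumption_token or None)."""
    root = ET.fromstring(xml_bytes)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in
                  rec.findall("oai:metadata/oai_dc:dc/dc:title", NS)]
        records.append((identifier, titles))
    # An empty or missing resumptionToken means the list is complete.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)
```

Matching on namespace URIs rather than literal prefixes means the same code should work whichever prefixes a data provider happens to choose.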

These ideas will be extended with a discussion of OAI-PMH applications beyond bibliographic metadata exchange, with a focus on resource harvesting. We will describe how representations of resources using complex object formats (such as MPEG-21 DIDL, METS and SCORM) fit into the OAI-PMH data model. We will show how they can provide the basis for a robust resource harvesting framework using the OAI-PMH.
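In protocol terms, harvesting a complex object representation is an ordinary ListRecords request with a different metadataPrefix. The sketch below builds such request URLs; the base URL, the prefix value "didl", and the function name list_records_url are assumptions for illustration, since the prefix a given repository uses for a complex object format is its own choice and is advertised through ListMetadataFormats.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, resumption_token=None,
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: when it is
    given, no other argument besides the verb may be sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if metadata_prefix is None:
            raise ValueError("metadata_prefix is required without a token")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
        if set_spec:
            params["set"] = set_spec
    # urlencode percent-encodes reserved characters in argument values.
    return base_url + "?" + urlencode(params)
```

A harvester would typically call this once with a metadata prefix and a from date, then repeatedly with each resumption token returned until the list is complete.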

Target Audience

This tutorial is intended for those implementing the OAI-PMH. Attendees should already be familiar with OAI protocol basics and underlying technologies such as XML, W3C XML Schema and metadata formats. Experience with either data provider or service provider implementations will be beneficial.


Simeon Warner is a Research Associate in Computing and Information Science at Cornell University. He is one of the developers of the arXiv e-print archive, and his research interests include web information systems, interoperability, and open-access scholarly publishing. He has been actively involved with the Open Archives Initiative (OAI) since its inception and was one of the authors of the OAI Protocol for Metadata Harvesting. He worked at Los Alamos National Laboratory before moving with arXiv to Cornell in 2001. Prior to working on arXiv, he worked in the Physics Department at Syracuse University in computational physics, a discipline in which arXiv has eclipsed conventional journals as the preferred means of scholarly communication.