Afternoon Tutorials

Tutorial 7: Introduction to RelaxNG

David Durand

Abstract

This tutorial presents an introduction to RelaxNG, the “other” XML Schema language. RelaxNG is a powerful way to describe the permissible structures of an XML file, and it extracts maximum descriptive power from a minimum feature count. It is a powerful XML schema language that you can learn in a day. In this half-day tutorial we will examine _every_ feature of RelaxNG in sufficient detail to understand it, and we will look at examples of how it can be applied to the tricky problems of rigorously describing and controlling documents and other XML structures that fall toward the “semi-” end of the semi-structured data continuum.

Target Audience

The level of this course is Introductory to Intermediate. Students should be familiar with XML. Basic knowledge of regular expressions is a plus, but not essential.

Presenter

David Durand has been working with markup systems since the birth of SGML in the 1980s. He served on the W3C XML and XLink working groups, and has contributed to the Text Encoding Initiative’s work on schema languages and hypertext markup. He is CEO of Tizra Inc. and an Adjunct Associate Professor at Brown University, where he teaches a course on Document Engineering. With Steven J. DeRose, he is co-author of Making Hypermedia Work: A User’s Guide to HyTime.

Tutorial 8: The Fedora Service Framework: Advanced Applications

Carl Lagoze, Peter Murray*, Sandy Payette and Andrew Treloar*

Abstract

This tutorial will focus on three application areas in which Fedora is being applied. The first application area is institutional repositories, with a focus on why a service-oriented architecture approach is desirable, as well as issues around workflow and access control. The second area is contextualized digital libraries, with a focus on the use of RDF to create rich information landscapes that push beyond the standard “search and access” paradigm. The third is the emerging area often characterized as “e-research” or “e-scholarship”, with a focus on data modeling, workflows for complex scholarly objects, and features of advanced scholarly publication systems. Presenters will discuss the challenges inherent in each scenario and how Fedora has been used to address them.

Target Audience

Information science specialists, including technically oriented librarians and archivists, information technology specialists, and digital library architects, who wish to understand how Fedora can be applied to solve specific advanced problems. It is recommended that attendees either have a general familiarity with Fedora architectural basics or have attended the Fedora introductory tutorial that precedes this tutorial.

Presenters

Carl Lagoze, Senior Research Associate, Cornell Information Science

Peter Murray*, Assistant Director, Multimedia Systems, OhioLINK

Sandy Payette, Co-Director of Fedora Project, Cornell Information Science

Andrew Treloar*, Project Director/Technical Architect, ARROW, Monash University

* Denotes tentative presenters; may have alternates based on travel availability

Tutorial 9: Thesauri and ontologies in digital libraries: Structure and use in knowledge-based assistance to users

Dagobert Soergel

Abstract

The tutorial provides a bridge by presenting methods of subject access, as treated in an information studies program, for those coming to digital libraries from other fields. It will elucidate through examples the conceptual and vocabulary problems users face when searching digital libraries. It will then show how a well-structured thesaurus/ontology can be used as the knowledge base for an interface that can assist users with search topic clarification (for example, through browsing well-structured hierarchies and guided facet analysis) and with finding good search terms (through query term mapping and query term expansion using synonyms and hierarchic inclusion). It will touch on cross-database and cross-language searching as natural extensions of these functions. It will also mention the use of more richly structured ontologies, including Semantic Web applications. The tutorial will cover the thesaurus structure needed to support these functions: concept-term relationships for vocabulary control and synonym expansion, and conceptual structure (semantic analysis, facets, and hierarchy) for topic clarification and hierarchic query term expansion. It will introduce a few sample thesauri and ontologies and some thesaurus-supported digital libraries and Web sites to illustrate these principles.

Presenter

Dagobert Soergel, College of Information Studies, Univ. of Maryland, College Park, MD 20742
Office: (301) 405-2037, Fax: (301) 314-9145, Cell: 703-585-2840
dsoergel@umd.edu, www.dsoergel.com

Tutorial 10: Semantic Digital Libraries

Sebastian Ryszard Kruk, Bernhard Haslhofer, Erich J. Neuhold, Predrag Knežević and Kerstin Zimmermann

Abstract

We will start by defining problems in the domain of semantic digital libraries and present solutions that provide building blocks for semantic digital libraries, such as WordNet, DMoz, and SKOS. We will discuss in detail the problems and solutions for bibliographic metadata management and interoperability. We will discuss the future of federations of digital libraries in the context of the Semantic Web and the Web 2.0 Internet. We will present three initiatives that adhere to the idea of a semantic digital library: SIMILE, which leverages and extends DSpace by enhancing its support for arbitrary schemata and metadata; Corrib.org, which delivers semantic-aware digital library components like JeromeDL, MarcOnt, FOAFRealm, and HyperCuP; and BRICKS, the largest cultural heritage project in the EU’s 6th framework program. The tutorial will be followed by a hands-on session where participants will be able to try out SIMILE-PiggyBank, JeromeDL and BRICKS.

Target Audience

Researchers and computer scientists from the (digital) library, semantic web, distributed systems and knowledge management communities, with an introductory or intermediate level of experience in the presented topics.

Tutorial 11: Exploiting open source tools to create, maintain, and disseminate XML content

Eric Lease Morgan

Abstract

XML is quickly becoming the means of marking up data for the purposes of transmitting information from one computer to another. While XML can be created by hand, the process is tedious and not necessarily scalable. Software systems can address this problem, and this tutorial enumerates, describes, and demonstrates ways open source software can be used to create, maintain, and disseminate XML. The goal of this tutorial is to increase participants’ knowledge of these tools and to demonstrate how to take advantage of them in everyday digital library work and software development.

Target Audience

Software engineers and librarians. The level of this tutorial is intermediate.

Presenter

Eric Lease Morgan is the Head of the Digital Access and Information Architecture Department at the University Libraries of Notre Dame. He considers himself to be a librarian first and a computer user second. His professional goal is to discover new ways to use computers to provide better library service. Some of his better-known investigations and implementations include MyLibrary and the Alex Catalogue of Electronic Texts. An advocate for open source software and open access publishing, Morgan had been freely distributing his software and publications for years before the terms “open source” and “open access” were coined. Morgan also hosts his own Internet domain, infomotions.com.

Tutorial 12: Metadata and Resource Exchange Using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Simeon Warner

Abstract

This tutorial will provide an overview of best practices for OAI-PMH data providers and how they relate to the expectations and needs of service providers. We use practical experience from harvesting projects and the validation service to highlight pitfalls and ways to avoid them. We will also review best practices for the production of shareable metadata including character encoding, namespace and XML schema issues, and the support of multiple metadata formats.

These ideas will be extended with a discussion of OAI-PMH applications beyond bibliographic metadata exchange, with a focus on resource harvesting. We will describe how representations of resources using complex object formats (such as MPEG-21 DIDL, METS and SCORM) fit into the OAI-PMH data model. We will show how they can provide the basis for a robust resource harvesting framework using the OAI-PMH.

Target Audience

Those implementing the OAI-PMH. Attendees should already be familiar with OAI protocol basics and underlying technologies such as XML, W3C XML Schema and metadata formats. Experience with either data provider or service provider implementations will be of benefit.

Presenter

Simeon Warner is a Research Associate in Computing and Information Science at Cornell University. He is one of the developers of the arXiv e-print archive and his research interests include web information systems, interoperability, and open-access scholarly publishing. He has been actively involved with the Open Archives Initiative (OAI) since its inception and was one of the authors of the OAI Protocol for Metadata Harvesting. He worked at Los Alamos National Laboratory before moving with arXiv to Cornell in 2001. Prior to working on arXiv, he worked in the Physics Department at Syracuse University in computational physics, a discipline in which arXiv has eclipsed conventional journals as the preferred means of scholarly communication.