ICADL International Digital Libraries Conference

ICADL 2010 Proceedings available online now.

Papers

Session 1b - Digital Libraries of Heritage Materials

A Visual Dictionary for an Extinct Language (Short Paper)

Kyle Williams, Sanvir Manilal, Lebogang Molwantoa, Hussein Suleman

Abstract. Cultural heritage artefacts are often digitised so that researchers and scholars can access them easily. In the case of the Bleek and Lloyd dictionary of the |xam Bushman language, 14000 pages were digitised. These pages could not be transcribed, however, because the language and script are both extinct. A custom digital library system was therefore created to manage and provide access to this collection as a purely "visual dictionary". Results from user testing showed that users found the system to be interesting, simple, efficient and informative.

A Scalable Method for Preserving Oral Literature from Small Languages (Full Paper)

Steven Bird

Abstract. Can the speakers of small languages, which may be remote, unwritten, and endangered, be trained to create an archival record of their oral literature, with only limited external support? This paper describes the model of "Basic Oral Language Documentation", as adapted for use in remote village locations, far from digital archives but close to endangered languages and cultures. Speakers of a small Papuan language were trained and observed during a six-week period. Linguistic performances were collected using digital voice recorders. Careful speech versions of selected items, together with spontaneous oral translations into a language of wider communication, were also recorded and curated. A smaller selection was transcribed. This paper describes the method, and shows how it is able to address linguistic, technological and sociological obstacles, and how it can be used to collect a sizeable corpus. We conclude that Basic Oral Language Documentation is a promising technique for expediting the task of preserving endangered linguistic heritage.

Digital Folklore Contents on Education of Childhood Folklore and Corporate Identification System Design (Full Paper)

Ya-Chin Liao, Kuo-An Wang, Po-Chou Chan, Yi-Ting Lin, Jung-I Chin, Yung-Fu Chen

Abstract. Digital artifacts preserved in digital repositories of museums are mostly static images. However, the artifacts may be lost, degraded, or damaged no matter how well the preservation and exhibition environments have been controlled, which makes the artifacts difficult to recover. Furthermore, if not properly inherited, information regarding the making, function, and usage of an artifact might be lost after several generations. Hence, in addition to digitizing folklore artifacts, we have also recorded on video the crafts used to make them and the skills and rituals involved in using them. With abundant digitized collections, the repository website is becoming more and more popular for teachers and students, especially in kindergartens and elementary schools, to extract and create useful teaching materials for folklore education. Recently, folklore contents have been encouraged to be applied in the education of English as a second language (ESL), social work, and mathematics. In this study, we applied the digital folklore contents for developing story books to be used in childhood folklore education and for instructing students to design a corporate identification system (CIS) as a class exercise. The technology acceptance model (TAM) was used to evaluate perceived usefulness (PU), perceived ease of use (PEU), and behavior intention (BI) in using these digital contents to accomplish their tasks. The results show that the scores of PU, PEU, and BI are all greater than 3 (5-point Likert scale), indicating usefulness and ease of use of the contents and website, as well as a positive attitude toward continuous use of the contents in various educational areas.

Ancient-to-modern Information Retrieval for Digital Collections of Traditional Mongolian Script (Short Paper)

Biligsaikhan Batjargal, Garmaabazar Khaltarkhuu, Fuminori Kimura, Akira Maeda

Abstract. This paper discusses our recent improvements to the traditional Mongolian script digital library (TMSDL), which can be used to access ancient historical documents written in traditional Mongolian using a query in modern Mongolian. The results of the experiment show that the percentage of successfully retrieved queries was improved.

Session 2b - Annotation and Collaboration

A Collaborative Scholarly Annotation System for Dynamic Web Documents - a Literary Case Study (Full Paper)

Anna Gerber, Andrew Hyland, Jane Hunter

Abstract. This paper describes ongoing work within the Aus-e-Lit project at the University of Queensland to provide collaborative annotation tools for Australian Literary Scholars. It describes our implementation of an annotation framework to facilitate collaboration and sharing of annotations within research sub-communities. Using the annotation system, scholars can collaboratively select web resources and attach different types of annotations (comments, notes, queries, tags and metadata), which can be harvested to enrich the AustLit collection. We describe how rich semantic descriptions can be added to the constantly changing AustLit collection through a set of interoperable annotation tools based on the Open Annotations Collaboration (OAC) model. RDFa enables scholars to semantically annotate dynamic web pages and contribute typed metadata about the IFLA FRBR entities represented within the AustLit collection. We also describe how the OAC model can be used in combination with OAI-ORE to produce scholarly digital editions, and compare this approach with existing scholarly annotation approaches.

The Relation between Comments inserted onto Digital Textbooks by Students and Grades earned in the Course (Full Paper)

Akihiro Motoki, Tomoko Harada, Takashi Nagatsuka

Abstract. When students read textbooks in the classroom, they usually apply active reading. The practice of marking in university textbooks is a familiar one. They scribble comments in the margin, highlight elements, underline words and phrases, and correlate distinct parts to foster critical thinking. While the use of annotations during active reading supports the students themselves, these can also be useful for other readers. Investigations were carried out to evaluate the comments inserted by students onto their digital textbooks and how these relate to the grade eventually earned at the end of the course. The results of our study highlight two main factors influencing students' eventual grade: quantity and quality of annotation. Students who wrote a lot of comments and focused upon the more important keywords in the text tend to receive a higher grade. Accordingly, our analysis was based on number and quality of text word selection.

Visualizing and Exploring Evolving Information Networks in Wikipedia (Full Paper)

Ee-Peng Lim, Agus Trisnajaya Kwee, Nelman Lubis Ibrahim, Aixin Sun, Anwitaman Datta, Kuiyu Chang, Maureen

Abstract. Information networks in Wikipedia evolve as users collaboratively edit articles that embed the networks. These information networks represent both the structure and content of the community's knowledge, and the networks evolve as the knowledge gets updated. By observing the networks evolve and finding their evolving patterns, one can gain higher order knowledge about the networks and conduct longitudinal network analysis to detect events and summarize trends. In this paper, we present SSNetViz+, a visual analytic tool to support visualization and exploration of Wikipedia's information networks. SSNetViz+ supports time-based network browsing, content browsing and search. Using a terrorism information network as an example, we show that different timestamped versions of the network can be interactively explored. As information networks in Wikipedia are created and maintained by collaborative editing efforts, the edit activity data are also shown to help detect interesting events that may have happened to the network. SSNetViz+ also supports temporal queries that allow other relevant nodes to be added so as to expand the network being analyzed.

Session 3b - Mobility and Migration

Do Games Motivate Mobile Content Sharing? (Full Paper)

Dion Hoe-Lian Goh, Chei Sian Lee, Alton Yeow-Kuan Chua

Abstract. Indagator (Latin for explorer) is a game which incorporates multiplayer, pervasive gaming elements into mobile content sharing. Indagator allows users to annotate real world locations with multimedia content, and concurrently provides opportunities for play through creating and engaging interactive game elements, earning currency, and socializing. A user study of Indagator was conducted to examine the impact of the usability of Indagator's content sharing and gaming features, as well as demographic profiles, on participants' motivation to use the application. Participants felt that the features in Indagator were able to support the objectives of content sharing and gaming, and that the idea of gaming could be a motivator for content sharing. In terms of motivation to use, usability of Indagator's gaming features, gender and participants' familiarity with mobile gaming emerged as significant predictors. Implications and future research directions are discussed.

A Multifaceted Approach to Exploring Mobile Annotations (Full Paper)

Guanghao Low, Dion Hoe-Lian Goh, Chei Sian Lee

Abstract. Mobile phones with capabilities such as media capture and location detection have become popular among consumers, and this has made possible the development of location-based mobile annotation sharing applications. The present research investigates the creation of mobile annotations from three perspectives: the recipients of the annotations, the type of content created, and the goals behind creating these annotations. Participants maintained a two-week-long diary, documenting their annotation activities. Results suggest a range of motivational factors, including those for relationship maintenance and entertainment. Participants were also more inclined to create leisure-related annotations, while the types of recipients were varied. Implications of our work are also discussed.

Model Migration Approach for Database Preservation (Full Paper)

Arif Ur Rahman, Gabriel David, Cristina Ribeiro

Abstract. Strategies developed for database preservation in the past include technology preservation, migration, emulation and the use of a universal virtual computer. In this paper we present a new concept of "Model Migration for Database Preservation". Our proposed approach involves two major activities: first, migrating the database model from the conventional relational model to a dimensional model, and second, calculating the information embedded in code and preserving it instead of preserving the code required to calculate it. This will affect the originality of the database but improve two other characteristics: the information considered relevant is kept in a simple and easier to understand format, and the systematic process to preserve the dimensional model is independent of the DBMS details and application logic.

Session 3c - Natural Language Processing

Automated Processing of Digitized Historical Newspapers beyond the Article Level: Finding Sections and Regular Features (Full Paper)

Robert B. Allen, Catherine Hall

Abstract. Millions of pages of historical newspapers have been digitized, but in most cases access to these is supported by only basic search services. We are exploring interactive services for these collections which would be useful for supporting access, including automatic categorization of articles. Such categorization is difficult because of the uneven quality of the OCR text, but there are many clues which can be useful for improving the accuracy of the categorization. Here, we describe observations of several historical newspapers to determine the characteristics of sections. We then explore how to automatically identify those sections and how to detect serialized feature articles which are repeated across days and weeks. The goal is not the introduction of new algorithms but the development of practical and robust techniques. For both analyses we find substantial success for some categories and articles, but others prove very difficult.

Keyphrases Extraction from Scientific Documents: Improving Machine Learning Approaches with Natural Language Processing (Full Paper)

Mikalai Krapivin, Aliaksandr Autayeu, Maurizio Marchese, Enrico Blanzieri, Nicola Segata

Abstract. In this paper we use Natural Language Processing techniques to improve different machine learning approaches (Support Vector Machines (SVM), Local SVM, Random Forests) to the problem of automatic keyphrases extraction from scientific papers. For the evaluation we propose a large and high-quality dataset: 2000 ACM papers from the Computer Science domain. We evaluate by comparison with expert-assigned keyphrases. Evaluation shows promising results that outperform the state-of-the-art Bayesian learning system KEA, improving the average F-Measure from 22% (KEA) to 30% (Random Forest) on the same dataset without the use of controlled vocabularies. Finally, we report a detailed analysis of the effect of the individual NLP features and data set size on the overall quality of extracted keyphrases.

Measuring Peculiarity of Text using Relation between Words on the Web (Short Paper)

Takeru Nakabayashi, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi, Kazutoshi Sumiya

Abstract. We define the peculiarity of text as a metric of information credibility. Higher peculiarity means lower credibility. We extract the theme word and the characteristic words from text and check whether there is a subject-description relation between them. The peculiarity is defined using the ratio of the subject-description relation between a theme word and characteristic words. We evaluate the extent to which peculiarity can be used to judge credibility by classifying text from Wikipedia and Uncyclopedia in terms of the peculiarity.

Imitating Human Literature Review Writing: An Approach to Multi-Document Summarization (Short Paper)

Kokil Jaidka, Christopher Khoo, Jin-Cheon Na

Abstract. This paper gives an overview of a project to generate literature reviews from a set of research papers, based on techniques drawn from human summarization behavior. For this study, we identify the key features of natural literature reviews through a macro-level and clause-level discourse analysis; we also identify human information selection strategies by mapping referenced information to source documents. Our preliminary results of discourse analysis have helped us characterize literature review writing styles based on their document structure and rhetorical structure. These findings will be exploited to design templates for automatic content generation.

Session 4b - Metadata

A Study of Users' Requirements in the Development of Palm Leaf Manuscripts Metadata Schema (Full Paper)

Nisachol Chamnongsri, Lampang Manmart, Vilas Wuwongse

Abstract. This paper presents the users' behavior, their needs and expectations with respect to palm leaf manuscripts (PLMs), which are ancient Thai documents. We focus on access tools, access points and how users select PLMs. The data were collected by in-depth interviews of 20 users including researchers, local scholars and graduate students who are working on research in the field and using PLMs for information and knowledge resources. The research results present two important characteristics of user behaviors: previous knowledge of items, and exploratory searches. Users adopt a 4-step pattern in searching for the PLMs. Finally, we discuss the important information in searching for the PLMs and we compare this with the frequently consulted bibliographic elements and Dublin Core elements.

Landscaping Taiwan's Cultural Heritages - The Implementation of TELDAP Collection-Level Description (Full Paper)

Hsueh-Hua Chen, Chiung-Min Tsai, Ya-Chen Ho

Abstract. This paper depicts the implementation process of the collection-level description of TELDAP. Our study looks into collection-level description in order to eliminate problems users might encounter when accessing and retrieving resources caused by having only large amounts of item-level metadata. The implementation process is divided into five stages. In order to facilitate the application of collection-level description, we have put forth revised schema for the usage of currently available description standards. In the future, we intend to fortify relationships between item-level and collection-level metadata, and provide versions in different languages, expanding the accessibility of valuable resources to more users.

GLAM Metadata Interoperability (Short Paper)

Shirley Lim, Chern Li Liew

Abstract. Both digitised and born-digital images are a valuable part of cultural heritage collections in galleries, libraries, archives and museums (GLAM). Efforts have been put into aggregating these distributed resources. High-quality and consistent metadata practices across these institutions are necessary to ensure interoperability and the optimum retrieval of digital images. This paper reports on a study that involves interviews with staff members from ten institutions from the GLAM sector in New Zealand, who are responsible for creating metadata for digital images. The objective is to understand how GLAM institutions have gone about creating metadata for their image collections to facilitate access and interoperability (if any) and the rationale for their practice, as well as the factors affecting the current practice.

Metadata Creation: Application for Thai Lanna Historical and Traditional Archives (Short Paper)

Churee Techawut

Abstract. This paper describes the process of metadata creation of the Thai Lanna historical and traditional archives (shortened to the Lanna Archives) by applying the Singapore Framework for Dublin Core Application Profiles. Its metadata model for scholarly works based on the Functional Requirements for Bibliographic Records (FRBR) is adapted to create a data model and metadata scheme for the Lanna Archives. The proposed metadata scheme provides the level of detail required to describe the digital Lanna Archives and also supports information consistency and information sharing.

Session 4c - Usability and Navigation

A User-Centric Evaluation of the Europeana Digital Library (Full Paper)

Milena Dobreva, Sudatta Chowdhury

Abstract. Usability of digital libraries is an essential factor in attracting users. Europeana, a digital library built around the idea of providing a single access point to European cultural heritage, pays special attention to user needs and behaviour. This paper presents user-related outcomes addressing the dynamics of user perception from a study which involved focus groups and media labs in four European countries. While Europeana was positively perceived by all groups at the beginning of the study, some groups were more critical after performing a task which involved eight types of searches. The study gathered opinions on the difficulties encountered, which help to better understand users' expectations within the content and functionality domains of digital libraries and would be of possible interest to all stakeholders in digital library projects.

Digital Map Application for Historical Photos (Full Paper)

Weiqin Chen, Thomas Nottveit

Abstract. Although many map applications are available for presenting, browsing and sharing photos over the Internet, historical photos are not given enough attention. In addition, limited research efforts have been made on the usability and functionalities of such map applications for photo galleries. This paper aims to address these issues by studying the role of digital maps in presenting, browsing and searching historical photos. We have developed a map application and conducted formative evaluation with users focusing on usability and user involvement. The evaluation has shown positive responses from users. The search and navigation functions in the map application were found especially useful. The map was found to be important in involving users to share local knowledge about historical photos.

Supporting Early Document Navigation with Semantic Zooming (Full Paper)

Tom Owen, George Buchanan, Parisa Eslambolchilar, Fernando Loizides

Abstract. Traditional digital document navigation found in Acrobat and HTML document readers performs poorly when compared to paper documents for this task. We investigate and compare two methods for improving navigation when a reader first views a digital document. One technique modifies the traditional scrolling method, combining it with Speed-Dependent Automatic Zooming (SDAZ). We also examine the effect of adding "semantic" rendering, where the document display is altered depending on scroll speed. We demonstrate that the combination of these methods reduces user effort without impacting on user behaviour. This confirms both the utility of our navigation and the minimal use information seekers make of much of the content of digital documents.

Session 5b - Knowledge Structures

PODD: An Ontology-driven Data Repository for Collaborative Phenomics Research (Full Paper)

Yuan-Fang Li, Gavin Kennedy, Faith Davies, Jane Hunter

Abstract. Phenomics, the systematic study of phenotypes, is an emerging field of research in biology. It complements genomics, the study of genotypes, and is becoming an increasingly critical tool to understand phenomena such as plant morphology and human diseases. Phenomics studies make use of both high- and low-throughput imaging and measurement devices to capture data, which are subsequently used for analysis. As a result, high volumes of data are generated on a regular basis, making storage, management, annotation and distribution a challenging task. Sufficient contextual information, the metadata, must also be maintained to facilitate the dissemination of these data. The challenge is further complicated by the need to support emerging technologies and processes in phenomics research. This paper describes our effort in designing and developing an ontology-driven, open, extensible data repository to support collaborative phenomics research in Australia.

A Configurable RDF Editor for Australian Curriculum (Full Paper)

Diny Golder, Les Kneebone, Jon Phipps, Steve Sunter, Stuart A. Sutton

Abstract. Representing Australian Curriculum for education in a form amenable to the Semantic Web and conforming to the Achievement Standards Network (ASN) schema required a new RDF instance data editor for describing bounded graphs—what the Dublin Core Metadata Initiative calls a 'description set'. Developed using a 'describe and relate' metaphor, the editor reported here eliminates all need for authors of graphs to understand RDF or other Semantic Web formalisms. The Description Set Editor (ASN DSE) is configurable by means of a Description Set Profile (DSP) constraining properties and property values and a set of User Interface Profiles (UIP) that relate the constraints of the DSP to characteristics of the user interface. When fully deployed, the editor architecture will include a Sesame store for RDF persistence and a metadata server for deployment of all RESTful web services. Documents necessary for configuration of the editor including DSP, UIP, XSLT, HTML, CSS, and JavaScript files are stored as web resources.

Thesaurus Extension using Web Search Engines (Full Paper)

Robert Meusel, Mathias Niepert, Kai Eckert, Heiner Stuckenschmidt

Abstract. Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method building on and extending existing methods from the areas of thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) generate a small ranked list of suggestions for the position of these concepts in an existing thesaurus. Based on a modification of the standard tf-idf term weighting we extract relevant concept candidates from a document corpus. We then apply a pattern-based machine learning approach on content extracted from web search engine snippets to determine the type of relation between the candidate terms and existing thesaurus concepts. The approach is evaluated with a large-scale experiment using the MeSH and WordNet thesauri as testbed.

Session 6b - Images and Retrieval

Preservation of Cultural Heritage: From Print Book to Digital Library - A Greenstone Experience (Short Paper)

Henny M. Sutedjo, Gladys Sau-Mei Theng, Yin-Leng Theng

Abstract. We argue that current development in digital libraries presents an opportunity to explore the use of DL as a tool for building and facilitating access to digital cultural resources. Using Greenstone, an open source DL, we describe a 10-step approach in converting an out-of-print book, 'Costumes through Times', and constructing a DL creation of costumes.

Improving Social Tag-Based Image Retrieval with CBIR Technique (Short Paper)

Choochart Haruechaiyasak, Chaianun Damrongrat

Abstract. With the popularity of social image-sharing websites, the amount of images uploaded and shared among the users has increased explosively. To allow keyword search, the system constructs an index from image tags assigned by the users. The tag-based image retrieval approach, although very scalable, has some serious drawbacks due to the problems of tag spamming and subjectivity in tagging. In this paper, we propose an approach for improving the tag-based image retrieval by exploiting some techniques in content-based image retrieval (CBIR). Given an image collection, we construct an index based on 130-scale Munsell-based colors. Users are allowed to perform query by keywords with color and/or tone selection. The color index is also used for improving ranking of search results via the user relevance feedback.

Identifying Persons in News Article Images Based on Textual Analysis (Full Paper)

Choochart Haruechaiyasak, Chaianun Damrongrat

Abstract. A large portion of news articles contains images of persons whose names appear in the news stories. To provide image search of persons, most search engines construct an index from textual descriptions (such as headline and caption) of images. The index search approach, although very simple and scalable, has one serious drawback. A query of a person name could match some news articles which do not contain images of the target person. Therefore, some irrelevant images could be returned as search results. Our main goal is to improve the performance of the index search approach based on the syntactic analysis of person name entities in the news articles. Given sentences containing person names, we construct a set of syntactic rules for identifying persons in news images. The set of syntactic rules is used to filter out images of non-target persons from the results returned by the index search. From the experimental results, our approach improved the performance over the basic index search by 10% based on the F1-measure.

Kairos: Proactive Harvesting of Research Paper Metadata from Scientific Conference Web Sites (Full Paper)

Markus Hänse, Min-Yen Kan, Achim P. Karduck

Abstract. We investigate the automatic harvesting of research paper metadata from recent scholarly events. Our system, Kairos, combines a focused crawler and an information extraction engine to convert a list of conference websites into an index filled with fields of metadata that correspond to individual papers. Using event date metadata extracted from the conference website, Kairos proactively harvests metadata about the individual papers soon after they are made public. We use a Maximum Entropy classifier to classify uniform resource locators (URLs) as scientific conference websites and use Conditional Random Fields (CRF) to extract individual paper metadata from such websites. Experiments show an acceptable measure of classification accuracy of over 95% for each of the two components.

Session 9b

Oranges Are Not the Only Fruit: An Institutional Case Study Demonstrating Why Data Digital Libraries Are Not the Whole Answer to E-research (Full Paper)

Dana McKay

Abstract. Data sharing and e-research have long been touted as the future of research, and a general public good. A number of studies have suggested data digital libraries in some form or another as an answer to a perceived data deluge, and the focus in Australia is very much on digital libraries. Moreover, the Australian National Data Service positions the institution as the core unit for setting data policy and doing initial data management. In this paper we present the results of an institution-wide survey that shows that data digital libraries cannot be the only answer to the question of research data, at least at an institutional level, and that the current focus on digital libraries may actively alienate some researchers.

Open Access Publishing: an Initial Discussion of Income Sources, Scholarly Journals and Publishers (Short Paper)

Panayiota Polydoratou, Margit Palzenberger, Ralf Schimmer, Salvatore Mele

Abstract. The Study for Open Access Publishing (SOAP) project is one of the initiatives undertaken to explore the risks and opportunities of the transition to open access publishing. Some of the early analyses of open access journals listed in the Directory of Open Access Journals (DOAJ) show that more than half of the open access publishing initiatives were undertaken by smaller publishers, learned societies and a few publishing houses that own a large number of journal titles. Regarding income sources as means for sustaining a journal's functions, "article processing charges", "membership fee" and "advertisement" are the predominant options for the publishing houses; "subscription to the print version of the journal", "sponsorship" and, somewhat less, "article processing charges" have the highest incidences for all other publishers.

The Role of Digital Libraries in a Time of Global Change