(Nominated for Best Student Paper)
Today's digital libraries (DLs) archive vast amounts of information in the form of text, videos, images, data measurements, etc. User access to DL content can rely on similarity between metadata elements, or similarity between the data itself (content-based similarity). We consider the problem of exploratory search in large DLs of time-oriented data. We propose a novel approach for overview-first exploration of data collections based on user-selected metadata properties. Entities of the selected property are arranged in a 2D layout according to their similarity with respect to the underlying data content. The display is enhanced by compact summarizations of underlying data elements, and forms the basis for exploratory navigation of users in the data space. The approach is proposed as an interface for visual exploration, leading the user to discover interesting relationships between data items relying on content-based similarity between data items and their respective metadata labels. We apply the method to real data sets from the earth observation community, showing its applicability and usefulness.
Distributed collections are made of metadata entries that contain references to artifacts not controlled by the collection curators. These collections often have limited forms of change; for digital distributed collections, these are primarily the creation and deletion of additional resources. However, there exists a class of digital collection that undergoes additional kinds of change. These collections consist of resources that are distributed across the Internet and brought together via hyperlinking. Resources in these collections can be expected to change as time goes on. Part of the difficulty in maintaining these collections is determining if a changed page is still a valid member of the collection. Others have tried to address this by defining a maximum allowed threshold of change; however, these methods treat change as a potential problem and treat web content as static despite its intrinsic dynamism. Instead, we acknowledge change on the web as a normal part of a web document and determine the difference between what a maintainer expects a page to do and what it actually does. In this work we evaluate options for extractors and analyzers from a suite of techniques against a human-generated ground-truth set of blog changes. The results of this work show a statistically significant improvement over traditional threshold techniques for our collection.