Panel 1

Panel Name: Big Data Is Already Here, and It’s Not Always What We Think

Panel Date: Monday, June 11, 2012 / 1:30 pm - 3:00 pm


Libraries have over 20 years of experience managing large numbers of digital files and indexing catalog records and full-text documents. In the last several years, research libraries in general have faced a growing expectation that large digital library collections and record sets can be mined and analyzed for research purposes. Many libraries now hold several collections of unstructured or semi-structured content measured in hundreds of terabytes. This panel will present use cases from the Library of Congress that address two themes: the use of library collections as research data, and the introduction, adoption, and maturity of the tools, services, and technical environment configurations that support processing, managing, mining, indexing, and analyzing large volumes of unstructured digital content.


Leslie Johnston, Chief of Repository Development, Library of Congress (moderator)

Jane Mandelbaum, Special Projects Manager, Information Technology Services, Library of Congress

Trevor Owens, Digital Archivist, NDIIPP, Library of Congress

James Snyder, Senior Systems Administrator, National Audiovisual Conservation Center, Library of Congress


Panel 2

Panel Name: The Digging into Data Challenge: A Roundtable Discussion

Panel Date: Tuesday, June 12, 2012 / 10:30 am - 12:00 pm


The Digging into Data Challenge is an international grant competition funded by eight leading research agencies from the US, the UK, Canada, and the Netherlands. The idea behind the Digging into Data Challenge is to address how "big data" changes the research landscape for the humanities and social sciences. Now that scholars in the humanities and social sciences have access to massive databases of materials, ranging from digitized books, newspapers, and music to transactional data such as web searches, sensor data, or cell phone records, what new computationally based research methods might we apply? As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these everyday materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st-century scholarship.

Digging into Data has made two rounds of awards (2009 and 2011) to teams of scholars, librarians, and computer and information scientists. Please join us for a roundtable that brings together representatives from four of the funding agencies (NEH, IMLS, NSF, and JISC), three of the PIs, and the President of CLIR, who will tell us about a brand-new report his organization is about to release: One Culture: Computationally Intensive Research in the Humanities and Social Sciences, A Report on the Experiences of First Respondents to the Digging Into Data Challenge.

The panel will focus on moderated discussion and audience participation rather than presentations. We hope to discuss some of the issues that surfaced during the competition, such as: What is "big data" in a humanities or social science context, and how does it change research methods? What are the practical challenges of gaining access to large-scale digital data? What are the practical challenges of working in large-scale, international teams of scholars, librarians, and scientists? What is the role of the librarian or data provider in big data research? What is the state of big data in the humanities and social sciences; that is, what kinds of data cleaning and preparatory work had to be done before research could begin? What are the major barriers to this kind of research (e.g., intellectual property, copyright, dirty OCR)? What is the vision for the future of big data research in the humanities and social sciences?

Roundtable Participants:

Brett Bobley, Director, Office of Digital Humanities, NEH (moderator)

Stuart Dempster, Director, The Strategic Content Alliance, JISC

E. Thomas Ewing, Professor of History, Virginia Tech, NEH PI for An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic

Chuck Henry, President, CLIR

Ray Larson, Professor, School of Information, UC Berkeley, IMLS PI for Integrating Data Mining and Data Management Technologies for Scholarly

Jennifer Serventi, Senior Program Officer, Office of Digital Humanities, NEH

Cassidy R. Sugimoto, Assistant Professor, School of Library and Information Science, Indiana University, NSF PI for Cascades, Islands, or Streams? Time, Topic, and Scholarly Activities in Humanities and Social Science Research

Chuck Thomas, Senior Library Program Officer, IMLS

Elizabeth Tran, Associate Program Officer, NSF