Excited to be taking part in the upcoming MLA convention in Boston! Our special session, “Scaling and Sharing: Data Management in the Humanities,” will be part of a series of conversations about all things lang, lit, and (open) access–in keeping with the convention theme. Here’s the submitted abstract, which provides a good overview of what will be on offer during the session:
In 2006 the American Council of Learned Societies released a report titled Our Cultural Commonwealth summarizing the promises and challenges of “big data” within the humanities and social sciences. The radical growth of computing, networking, and digital storage promised (or at least presaged) a new era of “cumulative, collaborative, and synergistic” scholarship. And as we’ve seen in the half-dozen years since the report was issued, much of this promise has been realized. Examples include inter-institutional projects like those sponsored by the Digging into Data program, administered by the NEH’s Office of Digital Humanities; the Mellon-funded Project Bamboo, designed to become a content management and collaboration hub for IT and humanities researchers; and massive data collection undertakings like the Shoah Foundation’s Visual History Archive, a collection of nearly 52,000 testimonies from Holocaust and other genocide survivors.
Of course, most humanities research datasets don’t begin to approach this kind of scale. Single researchers and research teams working with local materials, databases, and storage are still very much the norm. The questions that this roundtable focuses on, then, are: How do we define and support good humanities data practices at the individual and local level? What options exist for managing and distributing data? And how can we ensure that individual datasets are open, interoperable, and accessible to as many researchers as possible? Presenters will address these questions from a number of angles, considering everything from the theoretical redefinitions of what counts as data in humanities research to the tools and expertise necessary for capturing and distributing useful data.
As a term of art, data tends to conjure up hard stats and bean counting inimical to the “softer side” of the hermeneutic process. However, rather than simply replicating this divisive approach to datasets, or urging humanistic practice to shed its reliance on close reading and take up with cold, hard positivism, this panel seeks to query precisely what data can be and do in a humanities context. One particular type of standardized information set familiar to humanities scholars is the research bibliography. Panelists will discuss how shared bibliographies can become not simply the intertextual foundations of good research, but data points for the creation of new knowledge. In particular, Spencer Keralis, Digital Scholarship Research Associate and CLIR Postdoctoral Fellow at the University of North Texas, will discuss his collaboration on the Susanna Rowson Digital Compendium. The Compendium is designed to pioneer the use of bibliographic data as a core information set for research. It is precisely because bibliographies adhere to specified formats that the data they express can be used for GIS mapping, temporal animations, network visualization, and other digital (and analog) heuristics. Keralis will demonstrate the potential value of reconceiving the bibliography as an information set to drive new work in literary history and history of the book.
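To make that concrete: once a bibliography is expressed in a standard format, even a few lines of code can regroup it for a map, a timeline, or a network. Here’s a minimal sketch in Python, with made-up field names and illustrative records standing in for the Compendium’s actual (and much richer) data:

```python
from collections import defaultdict

# Illustrative bibliographic records; the field names and entries are
# hypothetical, not the Compendium's actual schema.
records = [
    {"title": "Charlotte Temple", "year": 1791, "place": "London"},
    {"title": "Charlotte Temple", "year": 1794, "place": "Philadelphia"},
    {"title": "Reuben and Rachel", "year": 1798, "place": "Boston"},
]

# Group editions by place of publication -- the raw material for a
# GIS map of a print geography.
by_place = defaultdict(list)
for rec in records:
    by_place[rec["place"]].append(rec)

# Order editions chronologically -- the raw material for a temporal
# animation of a work's publication history.
timeline = sorted(records, key=lambda rec: rec["year"])

for place, editions in sorted(by_place.items()):
    print(place, [e["year"] for e in editions])
```

The point isn’t the code itself but the premise it demonstrates: a standardized bibliography is already a dataset, waiting to be queried.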
Data management has long been a priority for many grant programs offered by the National Endowment for the Humanities (NEH), but 2011 was a banner year for formal data management plans. Following the lead of the National Science Foundation (NSF), the NEH Office of Digital Humanities began requiring separate data management plans for all grant submissions to programs like the popular Digital Humanities Start-Up Grant program and the new Digital Humanities Implementation Grant program. These plans must describe how the project team will manage and disseminate data generated or collected by the project. Jason Rhody, Senior Program Officer for the Office of Digital Humanities at NEH, will discuss elements now considered fundamental to the data management plan: descriptions of data types, how data will be managed and made available to others, the formal mechanisms and software for data sharing, legal and ethical restrictions relevant to the dataset, and metadata standards to be employed. Rhody will also address the NEH’s decision to require formal data management plans in its humanities grants and the results of the first full year of their inclusion in the submission process.
Presenters are also deeply invested in helping new forms of humanistic data find accessible formats. The use of standard lexical and structural markup, tagging, and other kinds of descriptive identifiers has the potential to move data from the siloed and idiosyncratic space of the individual research project into the realm of inter-institutional and interdisciplinary collaboration. Michael Ullyot, Assistant Professor of English at the University of Calgary, will describe his endeavors to apply structural metadata to the 865,185 words of Shakespeare’s complete works. In doing so, Ullyot is working to create a training set for Natural Language Processing (NLP) algorithms to automate the markup of all early modern English texts. In a similar vein, Lisa Rhody, Ph.D. Candidate in English at the University of Maryland, will reflect on the process of building a data collection and management plan from the ground up. Her project, “Review, Revise, Requery: New Methods for Studying Ekphrasis”–supported by a Maryland Institute for Technology in the Humanities (MITH) Winnemore Dissertation Fellowship–required collaborating with the MITH team to engineer an approach to tagging, markup, and version control with the specific aim of building a digital collection of modern poetry suitable for text analysis, including topic modeling and classification.
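For readers who haven’t worked with structural markup before, here’s a toy sketch of what TEI-flavored tagging looks like in practice. This is my own illustration in Python, not Ullyot’s actual encoding or tooling; the tag names simply echo TEI conventions:

```python
import xml.etree.ElementTree as ET

# Two famous lines to mark up; the goal is to record structure
# (speaker, line divisions) alongside the text itself.
lines = [
    "To be, or not to be, that is the question:",
    "Whether 'tis nobler in the mind to suffer",
]

# Build a TEI-style speech element: <sp> with a <speaker> child and
# numbered <l> (line) children.
sp = ET.Element("sp", attrib={"who": "#Hamlet"})
ET.SubElement(sp, "speaker").text = "Hamlet"
for n, text in enumerate(lines, start=1):
    line = ET.SubElement(sp, "l", attrib={"n": str(n)})
    line.text = text

# Serialized, the structure now travels with the text.
print(ET.tostring(sp, encoding="unicode"))
```

Once thousands of passages carry labels like these, they can serve as the kind of training examples an NLP tagger needs to learn the markup on its own.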
Finally, panelists will examine how XML, metadata standards, and version control systems like Git can be deployed to make data accessible to diverse end users. Matt Burton, Ph.D. Candidate in the School of Information at the University of Michigan, and Korey Jackson, Digital Publishing Coordinator and CLIR Postdoctoral Fellow also at Michigan, will examine Git and the popular software development community platform GitHub. Their talk explores Git and GitHub from the perspective of the humanities scholar, asking how Git might best be applied to digital humanities projects and what kinds of training are necessary to make version control a commonplace practice in humanities research.
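If Git is new territory, the core workflow is small enough to sketch. The following Python snippet scripts a hypothetical repository for a poetry corpus; it assumes the git command-line tool is installed, and the repository and file names are mine, not any panelist’s actual setup:

```python
import subprocess
from pathlib import Path

# Create a working directory and a first transcription file.
# Names here are hypothetical, for illustration only.
repo = Path("poetry-corpus")
repo.mkdir(exist_ok=True)
(repo / "poem-001.txt").write_text("Ode on a Grecian Urn\n")

def git(*args):
    # Run a git subcommand inside the repository directory.
    subprocess.run(["git", *args], cwd=repo, check=True)

git("init")
git("add", "poem-001.txt")
# The -c flags supply an identity so the commit succeeds even on a
# machine where git has not yet been configured.
git("-c", "user.name=Example", "-c", "user.email=ex@example.org",
    "commit", "-m", "Add first poem transcription")
```

Every subsequent change to the corpus then gets a recorded, recoverable history, which is the basic promise version control holds out to humanities projects.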
Overall, in outlining the migration from individual project to scalable dataset, this roundtable explores “big data” not simply as a matter of size or number, but more importantly as a process of granting researchers and educators access to shared information resources.
Hope to see many DH (and good ol’ H) friends there!