Abstracts

The Textual Communities Transcription workspace: a poster and demonstration

July 17, 2013, 15:30 | Centennial Room, Nebraska Union

This poster will present the beta version of the Textual Communities transcription tool, describing its underlying principles, its innovative structure, and its functionality.

Study of literary works that exist in many different forms is one of the most important and difficult tasks in the humanities. The number of forms a work may take (eighty-four fifteenth-century manuscripts and printed texts of Chaucer’s Canterbury Tales, more than eight hundred manuscripts of Dante’s Commedia, and five thousand manuscripts of the Greek New Testament) is both testimony to the significance of these works and a challenge to scholars. In order to understand these texts and how they relate, we have to discover as much as we can of how they came to be written and disseminated. Only then can we seek to establish how they might best be read and prepare texts (in the form of scholarly editions) for scholars to use.

Building archives of primary materials for this kind of study is daunting, and prohibitive under the old lone-scholar method. While scholars working on large editing projects have opted for a team approach, several projects internationally have demonstrated the tantalizing possibilities of crowd-sourcing for processing large amounts of textual data. The challenge is to coordinate this work in a way that produces high-quality, useful results.

The Textual Communities workspace is, in the first instance, a tool for defining, transcribing, and linking textual materials for a digital archive or edition, and for marshaling and managing a community of participants with an array of community-building tools.

Many transcription tools are under development, and a few are already functional. Three defining features set this tool apart from the rest: its integrated participant management system, its integrated document management system, and its mapping of the fundamental document-entity structure. These features correspond to two underlying principles: that the work of amassing large corpora of textual materials is best accomplished by a well-managed community of interested participants from within, and potentially from outside, the academy; and that for the resulting materials to be useful, their relationships must be clearly articulated.

As its name suggests, the Textual Communities tool is designed for gathering and organizing multiple participants around a common editorial project. It supports a wide range of relational structures, from a carefully crafted team to an ad hoc community built on crowd-sourcing. Crucially, it enables the definition of project roles with varying degrees of access to project materials, authority to do the work of processing these materials, and oversight over other participants. It is also built on a data structure that uses RDF files built on the FRBR ontology to identify and relate the produced transcriptions (“texts”), the exemplars they derive from (“documents,” usually in the form of a digital image of a particular witness), and the intellectual construct they instantiate (the “work,” or our preferred term, “entity”). Thus anyone interested in John Donne’s poem “The Good Morrow” will find various “texts” (transcriptions) of this work as found in the extant “documents” (the poem as it is found in each of the manuscripts and printed books that contain it).
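The three-way relationship described above can be illustrated with a minimal Python sketch. This is not the tool's actual RDF schema: the class names, field names, and witness labels below are hypothetical, chosen only to show how a "text" ties one "entity" to one "document".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """The intellectual construct (the "work"), e.g. a poem."""
    title: str


@dataclass(frozen=True)
class Document:
    """A particular witness, usually represented by a digital image."""
    label: str


@dataclass(frozen=True)
class Text:
    """A transcription of one entity as it appears in one document."""
    entity: Entity
    document: Document
    transcription: str


def texts_of(entity, texts):
    """Return every transcription that instantiates the given entity."""
    return [t for t in texts if t.entity == entity]


# Hypothetical witnesses and placeholder transcriptions (not real data).
good_morrow = Entity("The Good Morrow")
other_poem = Entity("Another poem")
manuscript_a = Document("Hypothetical manuscript A")
printed_book_b = Document("Hypothetical printed book B")

all_texts = [
    Text(good_morrow, manuscript_a, "(transcription from manuscript A)"),
    Text(good_morrow, printed_book_b, "(transcription from printed book B)"),
    Text(other_poem, manuscript_a, "(transcription of a different poem)"),
]

witnesses = [t.document.label for t in texts_of(good_morrow, all_texts)]
```

Here `witnesses` lists the documents in which the poem has been transcribed, which is the kind of lookup the entity-document-text mapping is meant to make possible.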

The tool itself enables uploading of digital images of primary documents in JPEG, TIFF, or PDF format, and linkage of these images with a transcription space. The user supplies information for each document, which produces an RDF file that defines the text that is to be transcribed and its relationship to the source document and entity. The user also defines the structure of the document, which is rendered behind the visible transcription in TEI-conformant XML. The transcription area, which is automatically linked to the source image, can also support any XML markup that is desired or required for intelligent transcription of the source document.
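As a rough illustration of how a user-defined document structure might be rendered as an XML skeleton, the following Python sketch builds an empty transcription frame with the standard library. The elements used (`body`, `div`, `pb`, `l`) and the `facs` attribute are real TEI names, but the function, its parameters, and the file names are assumptions made for this example and do not describe the tool's actual output.

```python
import xml.etree.ElementTree as ET


def build_skeleton(entity_title, pages):
    """Build an empty TEI-style frame for transcription.

    pages is a list of (page_label, image_file, line_count) tuples
    supplied by the user when defining the document structure.
    """
    body = ET.Element("body")
    div = ET.SubElement(body, "div", {"type": "poem", "n": entity_title})
    line_number = 0
    for page_label, image_file, line_count in pages:
        # A page break that points at the uploaded image of the page.
        ET.SubElement(div, "pb", {"n": page_label, "facs": image_file})
        for _ in range(line_count):
            line_number += 1
            # An empty verse line, to be filled in by the transcriber.
            ET.SubElement(div, "l", {"n": str(line_number)})
    return body


# Hypothetical page labels and image file names.
skeleton = build_skeleton(
    "The Good Morrow",
    [("1r", "ms_a_1r.jpg", 3), ("1v", "ms_a_1v.jpg", 2)],
)
xml_text = ET.tostring(skeleton, encoding="unicode")
```

Each `pb` element carries a reference to its page image, which mirrors the link between the transcription area and the source image described above.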

This open-source tool will be available free of charge for use and adaptation by anyone, anywhere. Development of this tool is funded by a generous grant from the Canada Foundation for Innovation, with the support of the Digital Research Centre at the University of Saskatchewan.

This poster will be accompanied by a live demonstration of the transcription workspace.