Five desiderata for scholarly editions in digital form

July 19, 2013, 08:30 | Long Paper, Embassy Regents C

Scholarly editions have received considerable attention from the digital humanities over recent decades. There are several reasons for this. Scholarly editions occupy a fundamental place in the academy, as many humanities disciplines base their work on texts that must first be established. Scholarly editions are also highly amenable to computer methods, both because they derive from primary sources that can be represented digitally and because their highly structured nature lends itself naturally to complex computer encoding. Accordingly, we have seen many digital editions made, in many different forms, and we are also beginning to see the first attempts towards a theory of digital editions as a phenomenon distinct from print editions (which are, indeed, the subject of many theoretical debates): see, for example, the articles by Kiernan, Gabler, Pierazzo, Siemens et al., and Robinson.

This paper draws on the knowledge gained from our experience, and on the first competing theoretical discussions of digital editions, to state a set of desiderata. These are expressed as five propositions which together declare the principles upon which scholarly editing in the digital medium should proceed. At the least, they will serve as starting points for useful discussion.

Proposition 1: A digital edition should encode both the text of the document, line by line and page by page, and the text of the work which the document text instances, chapter by chapter and paragraph by paragraph (or poem by poem, line by line). One should be able to examine the text of the document a page at a time; one should be able to read the text of the work a chapter at a time, or a poem at a time. This might seem obvious. Yet several recent editions (for example, Sutherland’s online edition of the Jane Austen manuscripts), the thirty or so online transcription tools listed by Ben Brumfield, and the genetic transcription system proposed by the Text Encoding Initiative (Burnard et al.) all presume that one need encode only the disposition of the text on the document. Thus, not one of the separate works contained in the Austen manuscripts is accessible as a work in the Sutherland edition. It is a mistake not to encode both document and work. It is certainly more difficult to encode both, because of the all-too-familiar problem of overlapping hierarchies: a paragraph, for instance, may begin on one page and end on the next, so the two structures cannot simply be nested one inside the other. But it can be done, and it should be done.
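The overlapping hierarchy problem can be made concrete with a small sketch. One common workaround, among several, is standoff annotation: the text is held once as a plain character sequence, and each hierarchy is recorded separately as a set of ranges over it. This is offered only as an illustration, not as the approach of any edition or standard named here. The Python below assumes a hypothetical two-page fragment in which the first paragraph runs across the page break; the names `TEXT`, `Span`, `locate`, `view` and `crosses` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical fragment: paragraph 1 begins on page 1 and ends on page 2.
TEXT = "It was a dark night. The rain fell. She waited at the door."


@dataclass(frozen=True)
class Span:
    hierarchy: str  # "document" (pages, lines) or "work" (chapters, paragraphs)
    label: str
    start: int      # inclusive character offset into TEXT
    end: int        # exclusive character offset into TEXT


def locate(hierarchy: str, label: str, fragment: str) -> Span:
    """Build a span from the first occurrence of the fragment in TEXT."""
    start = TEXT.index(fragment)
    return Span(hierarchy, label, start, start + len(fragment))


SPANS = [
    locate("document", "page 1", "It was a dark night. The rain"),
    locate("document", "page 2", "fell. She waited at the door."),
    locate("work", "paragraph 1", "It was a dark night. The rain fell."),
    locate("work", "paragraph 2", "She waited at the door."),
]


def view(hierarchy: str) -> list[str]:
    """Read the same text through one hierarchy: by page or by paragraph."""
    return [f"{s.label}: {TEXT[s.start:s.end]}" for s in SPANS if s.hierarchy == hierarchy]


def crosses(a: Span, b: Span) -> bool:
    """True if a and b overlap but neither contains the other, so they
    could not both be written as nested XML elements."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


for line in view("document"):
    print(line)
for line in view("work"):
    print(line)

# Pairs of (work span, document span) that cross each other.
conflicts = [
    (w.label, d.label)
    for w in SPANS if w.hierarchy == "work"
    for d in SPANS if d.hierarchy == "document" and crosses(w, d)
]
print(conflicts)  # [('paragraph 1', 'page 2')]
```

Because neither structure is privileged, the fragment can be read page by page or paragraph by paragraph. The cost of this design is that every offset must be kept consistent whenever the underlying text is corrected.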

Proposition 2: Every act of editing in a digital edition should be attributed explicitly to the person who did it. Any act of editing, in any medium, requires knowledge and effort, and an edition is made from thousands, even millions, of such acts. Every such act should be recognized and explicitly linked to the person who performed it. Our confidence in editions comes from knowing who was responsible for each act. In the digital medium, as in print, attribution is everything. But in the digital medium we can go further: we can label who did what. And we should.
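Proposition 2 implies, at minimum, a data model in which no editorial act exists without a named agent. The sketch below assumes an append-only log of acts, each carrying its editor; the class and function names, the action labels, the locators and the placeholder editors are all hypothetical. A TEI-encoded edition would more likely express comparable information through the `resp` attribute. The point of the sketch is only that, once every act is recorded this way, "who did what" becomes a simple query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditorialAct:
    editor: str  # the person responsible for this act
    action: str  # hypothetical labels, e.g. "transcribe" or "correct"
    target: str  # hypothetical locator for what was edited
    value: str   # the reading the editor supplied


LOG: list[EditorialAct] = []  # append-only: earlier acts are never overwritten


def record(editor: str, action: str, target: str, value: str) -> EditorialAct:
    """Add one attributed act to the edition's log."""
    act = EditorialAct(editor, action, target, value)
    LOG.append(act)
    return act


def responsible_for(target: str) -> list[tuple[str, str]]:
    """Everyone who acted on a target, in the order they acted."""
    return [(a.editor, a.action) for a in LOG if a.target == target]


def current_reading(target: str) -> Optional[tuple[str, str]]:
    """The most recent reading for a target, with the editor who supplied it."""
    for act in reversed(LOG):
        if act.target == target:
            return act.value, act.editor
    return None


# Placeholder editors: a misreading by one is corrected by another.
record("Editor A", "transcribe", "page 2, line 1", "fell. She waited at the floor.")
record("Editor B", "correct", "page 2, line 1", "fell. She waited at the door.")
record("Editor A", "transcribe", "page 1, line 1", "It was a dark night. The rain")

print(responsible_for("page 2, line 1"))  # [('Editor A', 'transcribe'), ('Editor B', 'correct')]
print(current_reading("page 2, line 1"))  # ('fell. She waited at the door.', 'Editor B')
```

Keeping the log append-only, rather than overwriting readings in place, is what preserves credit for the first transcriber after a correction; the trade-off is that the current text has to be derived from the history of acts.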

Proposition 3: Everything in digital editions should, by default, be made available under a Creative Commons Attribution Share-alike licence. Editing in the digital medium is profoundly collaborative. Even if one scholar does all the work of transcription, encoding, interface design and publication on his or her own, others will want to take elements of that edition and reuse them in ways the original scholar could never anticipate. Further, as the movement towards ‘social editing’ gathers pace, we may expect more and more informed readers to become editors: contributing transcriptions, identifying documents, and enriching encoded texts by labelling persons, places and events. It is not quite true to say that digital editions belong to us all, but it is nearly true enough for us to make as much as possible as freely available as possible to all: hence the Attribution Share-alike licence. We should not impose the ‘non-commercial’ restriction, which is too often a back-door way of maintaining the worst old habits of academic culture: rewarding our friends and punishing our enemies. Indeed, we should welcome commercial interests and encourage them to provide the best interfaces they can to our editorial materials; the ‘share-alike’ provision will foreclose any commercial attempt to monopolize the text. Nor should we impose the ‘no-derivative works’ restriction: we should welcome the scholar who wants to take what we did and use it as the starting point for his or her own work, so long as that scholar acknowledges our work.

Proposition 4: All the materials in a digital edition should be available independently of any one interface. It should be possible, for example, for a scholar interested in the Greek New Testament to take the transcription of Codex Sinaiticus given on the British Library website, combine it with texts taken from other places, and present it in a distinct interface offering tools and facilities available nowhere else. To make this possible, it is not enough for editors to provide text: they must also provide the means (whether through metadata, an ontology or an API) for that text to be taken up and delivered through an interface completely independent of the original digital publication. Of course, this cannot happen unless the materials are free of any restrictive licence, as argued in Proposition 3.

Proposition 5: All the materials in a digital edition should be held in a long-term sustainable data store. Large-scale data storage facilities, maintained in perpetuity as part of an institution’s core mission (“in perpetuity”, that is, for as long as the institution lasts), have become commonplace in the last decade, thanks to the success of the institutional repository movement. Yet very few editions ground their data in an institutional repository or similar facility. Institutional repositories are mature and well able to serve scholarly edition materials of every kind, and they are increasingly recognized by universities and other memory institutions as the digital equivalent of a print library. We can and should use them.

A survey of existing digital editions against these propositions would yield interesting reflections. It appears that not one of the many digital editions made so far satisfies all five propositions, and a good many fail to satisfy even one. It is possible, of course, that these five propositions are flatly wrong. It is also possible that much of what we have been doing, under the dizzying spell of the technologies that press upon us, is wrong, and needs to change.


Brumfield, W. B. (2012). “Crowdsourced Transcription Tool List.” Blog entry, April 11, 2012. http://manuscripttranscription.blogspot.co.uk
Burnard, L., F. Jannidis, E. Pierazzo, and M. Rehbein (n.d.). “An Encoding Model for Genetic Editions.” http://www.tei-c.org/Activities/Council/Working/tcw19.html
Codex Sinaiticus Online. http://www.codex-sinaiticus.net/en/.
Gabler, H. W. (2007). “The Primacy of the Document in Editing.” Ecdotica, 4, 197-207.
Gabler, H. W. (2010). “Theorizing the Digital Scholarly Edition.” Literature Compass, 7, 43-56.
Kiernan, K. (2006). “Digital Facsimiles in Editing.” In L. Burnard, K. O’Brien O’Keeffe and J. Unsworth (eds.), Electronic Textual Editing. New York: Modern Language Association of America, 262-268.
Mingana Collection online. http://vmr.bham.ac.uk/Collections/Mingana/
Pierazzo, E. (2011). “A Rationale of Digital Documentary Editions.” Literary and Linguistic Computing, 26, 463-477.
Robinson, P. M. W. (2013). “Towards a Theory of Digital Editions.” Variants, 10, 105-132.
Siemens, R., M. Timney, C. Leitch, et al. (forthcoming). “Toward Modeling the Social Edition: An Approach to Understanding the Electronic Scholarly Edition in the Context of New and Emerging Social Media.” Literary and Linguistic Computing.
Sutherland, K. (2010). Jane Austen’s Fictional Manuscripts Digital Edition. http://www.janeausten.ac.uk/index.html