Counting Words with Henry James: Towards a Quantitative Hermeneutics

July 18, 2013, 13:30 | Long Paper, Embassy Regents C

Her function was to sit there with two young men—the other telegraphist and the counter-clerk; to mind the ‘sounder,’ which was always going, to dole out stamps and postal-orders, weigh letters, answer stupid questions, give difficult change and, more than anything else, count words as numberless as the sands of the sea […] [1]

The anonymous telegrapher in Henry James’s 1898 London-based novella In the Cage has drawn significant attention in recent decades from literary critics and media historians. These scholars have read her story—of projecting herself into the domestic drama of the aristocrats who use her office—for its lessons about the anxieties of surveillance, the instability of public and private domains, the regulation of gendered information workers, and the impact of telegraphic mediation on discourse including James’s own style. [2] But while In the Cage is rich with possible readings for the work of media and cultural history, our own critical reflex to do ‘readings’—to suspect the text for what aesthetic, cultural, and historical lessons it encodes or conceals—overlooks the story’s own emphasis on a very different mode of textual encounter: counting words. In light of recent interest in quantitative literary analysis, we can see the textual transactions of In the Cage from other angles: as a historical signal for when ‘distant reading’ (as we now call it) may have become necessary, and as a provocation to leave our conventional hermeneutics for the reflective reading that counting words might actually facilitate.

‘Counting Words with Henry James’ undertakes to demonstrate how the digital humanities participates in the recent turn in literary theory from the ‘hermeneutics of suspicion’ to what Rita Felski calls post-critical or ‘reflective reading.’ [3] I argue that critiques of the digital humanities as being anti- or even non-theoretical fundamentally misrecognize its alliance with such recent theoretical initiatives. Instead, the self-critical methodologies of digital humanities also manifest ‘the intricate play of perception, interpretation, and affective orientation’ that characterizes critical reading after suspicion. [4] Thus, I ultimately hope to move beyond the unfortunate associations of quantitative literary analysis with ‘not reading’ or data crunching by offering a different theoretical vocabulary for its continuing work in partnership with literary theory. At the same time, I propose that the emerging theoretical program of ‘reflective reading’ can address methodological insufficiencies in how quantitative analysts move from data to interpretation or from signal to concept. [5]

My paper will share the results of ongoing text analysis and topic modeling trials of the works of Henry James at multiple levels of address. Using off-the-shelf tools like the text analysis suite Voyeur and the topic modeling toolkit MALLET, I undertake a series of experiments at an increasing scale, beginning with the text of In the Cage and scaling up through James’s entire collected works and prefaces. Quantitative approaches to James are not new, nor are these tools necessarily cutting edge, but my argument is instead about the appropriateness and timeliness of their application for ongoing critical discussions: about In the Cage as well as possible affiliations of surface and distant reading. [6] In the Cage uniquely warrants such treatment, in two senses: first, because of its own conspicuous thematics of textual abundance, word counting, and interpretation; and second, because of James’s own editorial efforts to control interpretations of In the Cage as the story soon joins another massive textual corpus: his collected works for the New York edition.
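As a rough illustration of what counting words at increasing scale can look like in practice, the following Python sketch tallies word frequencies first for a single text and then for a small collection. It is not the Voyeur or MALLET workflow used in the paper. The function names, the tokenizing rule, and the second sample string are hypothetical stand-ins chosen only for this example; the first sample string is taken from the epigraph above.

```python
import re
from collections import Counter


def count_words(text):
    """Count lowercase word tokens in one text (simple regex tokenizer, an assumption)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


def count_corpus(texts):
    """Sum word counts over a dict mapping a title to its text."""
    total = Counter()
    for text in texts.values():
        total.update(count_words(text))
    return total


# Hypothetical miniature "corpus": the first entry quotes the epigraph,
# the second is an invented placeholder, not a passage from James.
sample = {
    "in_the_cage_excerpt": "count words as numberless as the sands of the sea",
    "placeholder_second_text": "the words of the telegraphist",
}

single = count_words(sample["in_the_cage_excerpt"])  # one text
corpus = count_corpus(sample)                        # the whole collection

print(single.most_common(3))
print(corpus.most_common(3))
```

The same two functions apply unchanged whether the dictionary holds one novella or a full collected edition, which is the sense of scaling up intended here; a topic model would take such counts as its starting point rather than its result.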

Throughout his novella, James describes late-nineteenth-century telegraphy with an arithmetical lexicon; the telegrapher is constantly counting, figuring, adding, calculating, working out meaning in the margins, and building a hypothesis about the relations of Captain Everard and Lady Bradeen from her process. Because, in the story, the telegrapher ultimately misreads these relations, James seems to deprecate her quantitative methods compared with his own elaborate narrative procedures. In his later preface to In the Cage, James describes its interest in the ‘wonderment’ of the telegraph office’s cacophonous information field, its investigation into ‘the question of what it might ‘mean’.’ In other words, the story is about challenges to reading practices and, more abstractly, about the methodologies of information processing that might lead to ‘meaning.’ And as the telegraphist tries to deal with the problem of texts at scale, she occupies a position of alienated curiosity similar to that of digital humanists confronting large data sets. [7]

How would the telegrapher-as-quantitative-analyst read her own story? Or all of James’s works? The telegraphist is faced with proliferating telegraphic fragments and ‘words as numberless as the sands of the sea.’ These messages are more than merely fragments to reconstruct: they are also units of scalable information. Their textual compression and transcoding make possible the ‘massive addressability’ that characterizes large collections of digitized text. [8] Text analysis and topic modeling let us further identify different levels by which to approach meaning in the text—e.g. the word, the genre, the topic cluster, the historical trend—as well as in larger corpora in which the text signifies. Those methods have reinvigorated questions familiar to the telegraphist of how to read, count, and interpret.

In addition to thematizing problems of literary interpretation across different textual scales, In the Cage invites them through its own revision history. Shortly after completing the novella, James began collecting his texts into a massive New York edition, revising his works and writing new prefaces which aim to direct our readings of their individual and collective significance. While James writes in his prefaces about what scale of textual address we should use to find meaning, this novella—from its oral composition and transcription, to its narrative about fragments and questions of meaning, to its tenuous status relative to James’s oeuvre—also invites us to reconsider the interpretive possibilities of words and clusters at scale. In a sense, In the Cage embodies the problem of textual addressability.

My reading of the novella aims to link the telegraphist’s wonderment, the ‘hypothesis-testing mode’ characteristic of recent work in quantitative literary analysis, and emergent forms of post-critical reading in literary theory. [9] In effect, post-critical reading offers a recuperative vocabulary for counting words, rescuing the telegrapher from Jamesian suspicion and perhaps bolstering the claims of quantitative literary analysis. Its recursive processes, I suggest, likewise draw upon what Felski calls ‘the intricate play of perception, interpretation, and affective orientation that constitutes aesthetic response.’ [10] Ultimately, this paper argues that the digital humanities are not post-theoretical, but they may be productively post-critical in generating a reflexive, quantitative hermeneutics.


Anon. (2013). Surface Reading / Machine Reading: New Approaches to Texts and Data. Available from: http://raley.english.ucsb.edu/wp-content/surface-reading_flyer.jpg (Accessed 5 March 2013).
Clayton, J. (1997). The Voice in the Machine: Hazlitt, Hardy, James, in Language Machines: Technologies of Literary and Cultural Production. New York: Routledge. 209–232.
Felski, R. (2009). After Suspicion. Profession. 28–35.
Flanders, J. (2009). The Productive Unease of 21st-century Digital Scholarship. Digital Humanities Quarterly. 3(3). Available from: http://digitalhumanities.org/dhq/vol/3/3/000055.html (Accessed 10 December 2009).
Heuser, R. & L. Le-Khac (2011). Learning to Read Data: Bringing out the Humanistic in the Digital Humanities. Victorian Studies. 54(1). 79–86.
Hoover, D. (2007). Corpus Stylistics, Stylometry, and the Styles of Henry James. Style. 41(2). 174–203.
James, H. (2002). ‘In the Cage’, in Wegelin, C., & H. B. Wonham (eds.) Tales of Henry James. 2nd edn. New York: W. W. Norton & Company.
Keep, C. (2011). Touching at a Distance: Telegraphy, Gender, and Henry James’s In the Cage, in Media, Technology, and Literature in the Nineteenth Century: Image, Sound, Touch. Surrey, England: Ashgate. 239–255.
Liu, A. (2009). Digital Humanities and Academic Change. English Language Notes. 47(1). 17–35.
Marvin, C. (1988). When Old Technologies Were New: Thinking About Electric Communication in the Late Nineteenth Century. New York: Oxford University Press.
Menke, R. (2000). Telegraphic Realism: Henry James’s In the Cage. PMLA: Publications of the Modern Language Association of America. 115(5). 975–990.
Stauffer, A. (2011). Introduction: Searching Engines, Reading Machines. Victorian Studies. 54(1). 63–68.
Witmore, M. (2010). Text: A Massively Addressable Object. Wine Dark Sea [online]. Available from: http://winedarksea.org/?p=926 (Accessed 8 August 2011).


1. James, 2002, p.229.

2. Notable examples include Clayton, 1997; Keep, 2011; Menke, 2000; Marvin, 1988.

3. Felski, 2009.

4. Felski, 2009, p.31.

5. For an accessible review of this problem, see Stauffer, 2011.

6. For an example of a quantitative approach, see Hoover, 2007. A recent conference at New York University was devoted to such issues: ‘Surface Reading / Machine Reading: New Approaches to Texts and Data’, 2013.

7. For reports of this phenomenon, see Liu, 2009; Flanders, 2009; Heuser and Le-Khac, 2011.

8. Witmore, 2010.

9. Heuser and Le-Khac, 2011, p.85.

10. Felski, 2009, p.31.