Approaching Algorithmic Media Analysis in the Humanities: An Experimental Testbed
July 18, 2013, 08:30 | Short Paper, Embassy Regents F
With all of the growth and advancement in text analysis and visualization over the past several years,[1] more attention is being paid both to analyzing Humanities data and to exploring the relationship between the interpretive mind of the Humanist and the quantitative evidence gathered by machines. Recently, Stephen Ramsay has called for more attentive development of an “algorithmic criticism” for literary analysis, in which a reader uses computers to reveal patterns not easily evident or retrievable without the computational power a machine can offer (34). The key for Ramsay, however, is that the computer itself struggles with interpretation, and it behooves the literary critic to read the patterns as one might read a text to find meaning, rather than to rely on the data as a sort of factual proof of any Humanistic argument. And while, as Ramsay has pointed out, this sort of computational reading is not without conceptual or methodological obstacles (not to mention the technical ones that new tools, projects, and ideas constantly encounter), there is enormous value in “allow[ing] computer-assisted criticism to be situated within the broader context of literary study” (13). In fields ranging from literary studies (consider Franco Moretti’s concept of distant reading) to computer science (such as Ben Shneiderman’s work on information visualization) to predictive cultural analysis (exemplified in today’s popular culture by Nate Silver and his arguments for more work in computer modeling of complex systems), the act of harnessing the power of algorithms to help read “text” demonstrates how vital it is for the digital humanities to avoid the pitfalls that come from crunching words without the ability also to understand them.
It is our contention, however, that something is often missing from this conversation: algorithmic media analysis. If Digital Humanists want to more fully make sense of the relationship between meaning and computation, the quantitative reading of media remains a far too underdeveloped endeavour. The immediate reaction is often one of skepticism; after all, even when focusing solely on textual data, computer-assisted analysis is making strides but is still in its infancy in terms of offering patterns to read that move beyond linguistic models or sentiment analysis. Yet outside the Digital Humanities, an ever-growing number of projects and computational tools from big names in “big data” are exploring the application of traditional text analysis approaches to a wider variety of media, ultimately allowing researchers to discover how multimedia might aid in the evolution of tools and paradigms for making more of quantitative analysis,[2] and we see an algorithmic approach from the Humanistic perspective as a significant complement to what might be happening in commercial R&D labs.
Of course, we are not in any way claiming that no work on machine-based analysis of multimedia data has ever taken place in the Humanities; quite the opposite, actually. A small but steady stream of attention has been paid to how digital humanists might utilize new models for computationally approaching complex textual narratives, semantic relationships, image corpora, audio, and other such media;[3] we might look to the ongoing work at the University of California, San Diego’s Software Studies Initiative (such as their “FilmHistory.viz” project generated with the CineMetrics software tools); Lev Manovich’s recent writings on visualization of visual media; the ShotLogger project; and Jason Mittell’s media studies theories of what he calls complex television. But nothing has ever taken off in the way that text analysis recently has, so a question arises: how might the digital humanist more fully embrace multimedia modes of expression as a valid, if not even more informative, subject of algorithmic criticism? What would, for example, a text analysis model for video look like, and might a quantitative approach to something like television or film yield a better understanding of core Humanistic concepts such as story, character, or emotion? What sorts of quantitative analysis models might we apply to a medium such as television as a gateway to deeper humanistic inquiry? And how useful might those models be, given some of the fundamental differences between text or language and the newer communicative media of the past century?
In this short presentation, I will lay out this conceptual conversation and survey some of the historical work that exemplifies attempts to develop analytical tools for quantitative media reading. I will then present the initial trials we are undertaking to explore more robust approaches to an algorithmic criticism of multimedia within the digital humanities, and how we might design further experiments to expand the realm that quantitative analysis of video can explore. Our preliminary trials are designed to discover what sorts of tools and visualizations might be truly useful for something as complex as a television episode or feature film. For example, whereas a digitized text has only words to offer, a digital video object has a text (the transcript), an audio stream (which may overlap with the text but which also includes points of analysis such as intensity, tone, pitch, speed, background music, ambient noise, etc.), and a linear sequence of images (which may be analyzed separately or in relation to other frames). Our various proof-of-concept models place all of these modes of information in conversation with each other, and we will begin to theorize the types of things that humanists might glean from further development of our initial experiments. For example, we look at what sorts of patterns might arise from a visualization of a script juxtaposed with a spectrogram of the audio track, and at how spatial and temporal differences in pixel color from one frame to the next might relate to motion analysis; a minimal sketch of the latter appears below.
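As one illustration of the frame-to-frame comparison described above, the following is a minimal sketch of pixel-difference motion analysis in Python, assuming OpenCV and NumPy are available; the file name and function are hypothetical stand-ins, not our actual pipeline.

```python
# Hypothetical sketch: mean absolute pixel difference between
# consecutive frames as a crude proxy for on-screen motion.
import cv2
import numpy as np

def motion_curve(path):
    """Return one mean absolute pixel difference per frame pair."""
    cap = cv2.VideoCapture(path)
    diffs = []
    ok, prev = cap.read()
    while ok:
        ok, frame = cap.read()
        if not ok:
            break
        # Grayscale reduces the comparison to luminance change.
        a = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        b = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diffs.append(np.mean(cv2.absdiff(a, b)))
        prev = frame
    cap.release()
    return diffs

curve = motion_curve("pilot_episode.mp4")  # hypothetical file name
```

Peaks in such a curve would mark cuts or rapid motion, precisely the sort of low-level signal we want to place in conversation with the transcript and the audio stream.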
As I report on our successes and failures with these quantitative models, treating them truly as preliminary experiments that we hope will lead to tools, I will close with a case study that we hope demonstrates the real humanistic value of an algorithmic approach to multimedia. Given the underlying principle that algorithmic criticism is only useful inasmuch as a reader can discover patterns and apply semantics to those patterns, we theorize that an effective way to evaluate a quantitative analytical tool is to compare the patterns it reveals to patterns discovered manually by critics external to our experiments. We have taken a collection of pilot episodes of 30 American television series from the past 20 years and given them to several different content experts at our university: media studies scholars, literary scholars specializing in narrative principles, contemporary American Studies scholars, etc. We have asked them to generate simple narrative maps of the episodes (i.e., periods of exposition, rising action, location of the climax, scenes depicting various emotions, etc.), as well as some discussion of what narrative features the various episodes have in common. Our purpose is to apply quantitative analysis approaches to the digital videos as well, with the ultimate goal being a theory of which data-driven calculations, based on observation of technical details of digital video, correlate best with the manual patterns that our scholars discover through traditional techniques. Can we use changes in audio intensity to help us better understand moments of conflict in a multimedia narrative? Can we find relationships between motion analysis and changing plot action? A sketch of the audio-intensity pass follows.
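To make the audio-intensity question concrete, here is a minimal sketch, assuming the librosa library and a WAV export of an episode’s audio track; the file name and the top-decile threshold are hypothetical placeholders rather than validated choices.

```python
# Hypothetical sketch: frame-wise RMS intensity over an episode's audio,
# flagging unusually loud stretches as candidate "conflict" moments.
import numpy as np
import librosa

y, sr = librosa.load("pilot_audio.wav", sr=None)   # keep native sample rate
hop = 512
rms = librosa.feature.rms(y=y, hop_length=hop)[0]  # one intensity per frame
times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop)

# The top-decile cutoff is an arbitrary starting point, to be tuned
# against the scholars' manually produced narrative maps.
threshold = np.quantile(rms, 0.90)
candidates = times[rms > threshold]
print(f"{len(candidates)} high-intensity frames flagged")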
Computers are still quite far from being able to perform tasks such as “understanding” nuance, tone, humor, irony, and the other components of full semantic comprehension of literature. Nor are we in any way claiming that our analytical approaches are a silver bullet for machine learning of humanities texts. But we are arguing that they are the next step: that the digital humanities can advance the utility of algorithmic analysis by driving full-force toward making multimedia the object of our study.
References
Ramsay, S. (2011). Reading Machines: Toward an Algorithmic Criticism. Urbana: University of Illinois Press.
Notes
1. I most certainly recognize that text analysis is far from a new field, and has been an activity at the intersection of technology and the Humanities since the days of Father Busa. What I’m referring to, however, is the primacy that text analysis and similar methodologies have recently acquired in the Digital Humanities discourse.
2. For example, Microsoft (“Multimedia Search and Mining”), IBM (“Semantic Learning and Analysis of Multimedia”), and Google (“Google X Laboratory”) are all heavily invested in automated processing of multimedia for the purpose of pattern recognition.
3. In addition to more comprehensive work, a cursory scan reveals that last year’s Digital Humanities conference offered presentations that dealt with computational analysis (or related strategies such as data mining, visualization, etc.) of acoustic ecology, literary genres, spatial readings, user generated content, aural and prosodic patterns in text, and computational narratology.