A Humanist Perspective on Building Ontologies in Theory and Practice
July 18, 2013, 08:30 | Long Paper, Burnett 115
150 Word Summary: Bowker and Star (1999) warn of the shaping force of classification systems, the power of their apparent naturalization as part of infrastructures, and the consequences thereof. The rise of the semantic web and its grounding in the use of ontologies threatens to explode the scale of this issue. This paper explores the practical implications and theoretical limits of using ontologies, offering a survey of best practices and ongoing areas of active research. The survey shows that the process of ontology construction and integration in the semantic web does have the potential to restrict human thought. It is then argued that certain features of the semantic web make it possible to avoid this detrimental outcome but only if a critical mass of people—the community of humanists might be enough—actively use these features to keep the web a place of flexible ontologies and humane classification.
Introduction
Creating an ontology amounts to creating a classification system and asserting a system of logic with it. There are inherent consequences to doing this that are both “real world” and potentially serious. Bowker and Star assess these, arguing that “it is politically and ethically crucial to recognize the vital role of infrastructure in the ‘built moral environment.’ Seemingly technical issues like how to name things and how to store data in fact constitute much of human interaction and much of what we come to know as natural” (Bowker and Star 1999, 326). The semantic web promises to amplify the force of this observation as the interplay of ontology proliferation and convergence begins to create the largest classification system in human history. Insofar as using an ontology tends to establish certain actions, things, and dispositions as natural, this same tendency leads to problems of commensurability that intersect with seminal work by prominent philosophers such as Quine, Wittgenstein, Foucault, Kuhn, and the poststructural movement as a whole (cf. Cope, Kalantzis, and Magee 2011). Bowker and Star advocate the use of flexible or “living” classifications as the approach to solving these deep issues (Star 2010; Bowker and Star 2011), but this is only a prescription; how the medicine should be delivered is left open.
In the semantic web, ontologies provide the vocabularies and the schemas that tell the machines crawling the web what to do with the data they find. This view is reinforced by the many introductions to the semantic web that treat it in purely mechanical terms. From such perspectives ontologies bring the promise of inferential reasoning on content and an open environment where information is more easily exchanged (W3C). Along with these benefits come potential problems, including the rise of near-silent normalizations around how we think about the world that we live in and the very real possibility of incommensurability among ontologies. Clearly there is a large divide between benefits and concerns, one that is extended through the contrasts between practice and theory, stability and flexibility, mechanical and human(e).
This paper explores possibilities for traversing this divide and asks how far it can be narrowed by examining both the practical implications and theoretical limits of using ontologies, offering a survey of best practices and ongoing areas of active research. From this basis, it argues that the process of ontology construction and integration currently taking place around the semantic web amounts to a massive act of world construction with the potential to severely impact human life through rigidly framing human thought. Certain features of the semantic web make it possible to avoid this, but only if a critical mass of people (the community of humanists might be enough) actively use these features to keep the web a place of flexible ontologies. [1]
This argument and the accompanying assessments of the state of the semantic web today emerge from work that we have done around building a web-based editor to allow scholars with limited technical expertise in either XML or RDF to contribute or enhance online scholarly materials in these data formats. [2] Enabling scholars to use these tools to connect with the open web through a system of references and annotations has meant that the various research groups we are involved with (CWRC, INKE, and the Text Mining & Visualization for Literary History Project) are grappling with ontology construction and integration. Continuing from preliminary work presented at the NeDiMAH workshop at DH2012 (Brown et al. 2012), this paper will cover the following:
1. Review of practical approaches to ontology integration
2. Current ontology use: summary and analysis
3. Philosophical issues at the core of ontology creation, selection, integration, and use
Review of practical approaches to ontology creation and integration
The challenge of creating ontologies goes back at least to Aristotle’s Categoriae. Problems surrounding such classification systems were given life in the digital world with the creation of the first databases in the 1960s, and issues of structuring and integrating data have been of interest to the computer science and business communities since then. The emphasis from these communities has mostly been on the pragmatics of making the related systems work. Particular attention has been given to the challenge of combining ontologies (for extensive examples of this see Noy et al. 2005 and Klein 2001), a particularly complicated problem that has roots reaching into problems of normalization and commensurability. In the interest of expanding the repertoire of both conceptual and practical tools available to humanists working with ontologies, we share the results of a survey of these approaches to ontology integration, highlighting those from which humanists stand to benefit most. The survey covers:
- ontology conformity, as seen in projects like Pelagios and NINES;
- mediating ontologies/schemas;
- super ontologies such as SUMO (Suggested Upper Merged Ontology);
- simple ontology joins such as those allowed by owl:imports;
- ε-Connections (Cuenca Grau et al. 2011);
- ontology design patterns (Presutti and Gangemi 2008).
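To make the simplest of these approaches concrete, the following is a minimal sketch of what a join by owl:imports amounts to: the importing ontology takes on every statement of the ontologies it imports, transitively, with no reconciliation of meaning. The triples, the `ex:` names, and the function `import_closure` are hypothetical illustrations in plain Python, not part of any project surveyed here and not a substitute for an OWL toolkit.

```python
OWL_IMPORTS = "owl:imports"


def import_closure(root, ontologies):
    """Union the triples of `root` with those of everything it imports.

    `ontologies` maps an ontology name to a set of (subject, predicate,
    object) triples. Imports are followed transitively; the `seen` set
    keeps circular imports from looping forever.
    """
    seen = set()
    merged = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        triples = ontologies.get(name, set())  # unknown imports add nothing
        merged |= triples
        for subject, predicate, obj in triples:
            if subject == name and predicate == OWL_IMPORTS:
                stack.append(obj)
    return merged


# Hypothetical ontologies that import each other.
ontologies = {
    "ex:people": {
        ("ex:people", OWL_IMPORTS, "ex:places"),
        ("ex:Author", "rdfs:subClassOf", "ex:Person"),
    },
    "ex:places": {
        ("ex:places", OWL_IMPORTS, "ex:people"),
        ("ex:City", "rdfs:subClassOf", "ex:Place"),
    },
}

merged = import_closure("ex:people", ontologies)
```

The join is purely additive: nothing in the merged set records which community asserted which statement, which is part of why the more elaborate approaches in the list exist.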
Summary and analysis of current ontology use
Some general statistics are available for ontology use (semanticweb.org 2012; UMBC). These reveal that the Dublin Core (DC) namespace is referenced by 82% of semantic web sites, with Friend of a Friend (FOAF) a distant second at 38%. While such figures suggest convergence around particular ontologies, they are also clearly influenced by the nature of existing semantic web sites, in which archival institutions such as libraries have exposed large quantities of material. Fully understanding the implications of such figures requires further inquiry, such as what elements of the DC namespace are in use, what elements of the schemas these sites are leveraging from each namespace, and what semantic web elements sites have created themselves. This information is not readily obtained, so we have built a web crawler to collect it. By the time of DH2013 we aim to have a representative sample from which to report on the state of the semantic web in terms of ontology use at a more granular level.
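The difference between namespace-level and element-level figures can be illustrated with a short sketch. It assumes that the predicate IRIs used by each site have already been harvested; the site names, the sample data, and the functions `split_iri` and `namespace_usage` are hypothetical and do not describe the crawler mentioned above.

```python
from collections import Counter, defaultdict

DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"


def split_iri(iri):
    """Split an IRI into (namespace, local name) at the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1], iri[cut + 1 :]


def namespace_usage(site_predicates):
    """Return the share of sites using each namespace and per-element counts."""
    sites_per_namespace = Counter()
    elements = defaultdict(Counter)
    for predicates in site_predicates.values():
        namespaces_on_site = set()
        for iri in predicates:
            namespace, local_name = split_iri(iri)
            namespaces_on_site.add(namespace)
            elements[namespace][local_name] += 1
        # Count each namespace once per site, however often it is used there.
        sites_per_namespace.update(namespaces_on_site)
    total_sites = len(site_predicates)
    share = {ns: n / total_sites for ns, n in sites_per_namespace.items()}
    return share, elements


# Hypothetical harvest from two sites.
sample = {
    "site-a.example": [DC + "title", DC + "creator"],
    "site-b.example": [DC + "title", FOAF + "name"],
}

share, elements = namespace_usage(sample)
```

In this toy sample every site references DC, yet only two of its elements are ever used; a headline percentage of sites referencing a namespace hides exactly this kind of detail.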
We will combine the results of this web-crawling endeavour with the results of an analysis and assessment of the leading namespace options available. We will highlight probable points of failure when using ontologies in combination, including type propagation and the misuse of hierarchical predicates such as subPropertyOf, which is transitive but not symmetric. Our focus, in terms of capturing relations, properties, and spatial-temporal locations, will be on the following ontologies:
Philosophical issues raised by ontology creation, selection, integration, and use
Building an ontology might start out as a simple matter of enabling certain features or capacities, but doing so raises philosophical issues that humanists are experienced at recognizing and considering. The most significant of these is the social construction of reality, with the predicted increase in the prevalence of ADHD via the introduction of new criteria for diagnosis in the DSM-5 (Ghanizadeh 2012; Whitely 2010) standing as a tangible example. Closer to home for digital humanists, the DSM-5 also recommends that a new condition known as “Internet Use Disorder” be the subject of further study so that it may be determined whether to include it fully in the future (Walton 2012).
The semantic web was in its fetal stages when Bowker and Star worried that, “In the past 100 years, people in all lines of work have jointly constructed an incredible, interlocking set of categories, standards, and means for interoperating infrastructural technologies. We hardly know what we have built. No one is in control of infrastructure; no one has the power centrally to change it” (Bowker and Star 1999, 319). Now that the semantic web is more established, it is possible to see that there are opportunities to leverage its features to address such concerns and to hope that a spillover effect might occur. The particular features that we have in mind are the tolerance of the semantic web for ontologies that are incomplete, contradictory, or otherwise ‘imperfect’, the various levels of granularity that are possible using tools like RDF and OWL, and the transparency that linked online ontologies necessarily provide. We consider these features in light of the insights of the previous sections and conclude that while there is still reason for concern, mindfulness, and caution in constructing ontologies for the semantic web, it still holds promise for cultivating humane practices of classification.
References
Notes
1. While this critical mass might be drawn from any discipline, humanists most regularly make a practice of attempting to understand the relationships between humans, cultures, representations and artifacts, all of which are deeply intertwined with the semantic web. This understanding is born from a practice of critical reflection that is importantly stretched between the practical and the theoretical, two poles which must be reconciled for the web to avoid the potential problems raised here.
2. The tool we are developing is the CWRC-Writer. More information about it may be found at http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/cwrc-writer-an-in-browser-xml-editor/ .