Encoding historical dates correctly: is it practical, and is it worth it?

July 19, 2013, 10:30 | Short Paper, Embassy Regents E

From the middle ages to the early 20th century, a bewildering variety of calendars and dating methods was in use across Europe. This presentation will address issues involved in encoding historical dates during the early modern period, and look at strategies for enhancing computability and interoperability in date-encoding.

Although other calendars are in use across the world, nearly all societies have adopted the Gregorian calendar to some degree; wherever you go in the modern world, the date and time are generally uncontroversial. In past centuries, this was far from true. Jardine (2009) cites the case of William of Orange, whose invasion fleet left the Netherlands on November 11, 1688, but landed in England on November 5, having negotiated not only the English Channel but also the ten-day discrepancy between the Gregorian calendar in use in the Netherlands and the Julian still used in England. The complicated history of the adoption of the Gregorian calendar across Europe, between its introduction by Pope Gregory XIII in 1582 and its final consolidation in the early part of the 20th century, is familiar to scholars (see for instance Cheney 2000 and Duncan 1998).

Our project, the Map of Early Modern London (MoEML), falls squarely within the period of maximum calendar confusion for England. The two principal sources of difficulty are the discrepancy between the Gregorian and Julian calendars (ten days during most of the period) and the date of New Year's Day, which in England from the 12th century until 1752 was March 25 rather than January 1. The resulting ambiguity is compounded by the ways in which writers, both in the period and after it, have chosen to deal with the difficulty. The terms "Old Style" and "New Style" (O.S. and N.S.) have been variously used to indicate the start-of-year convention, the adjustment for the accumulated leap-day discrepancy between the Julian and Gregorian calendars, or both; so if we see, for instance, "February 11 1650 New Style", it is by no means clear whether contemporaries would have recorded the date as February 1, 1650 (calendar adjustment), February 11, 1649 (start-of-year adjustment), or February 1, 1649 (both).
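To make the three readings concrete, here is a minimal Python sketch that derives them mechanically. It assumes a fixed ten-day offset between the calendars, which only holds between 1583 and 1699, and the English convention of starting the year on March 25; the function names are illustrative and not part of any MoEML tooling.

```python
from datetime import date, timedelta

# Assumption: from 1583 to 1699 the Julian calendar ran exactly ten days
# behind the Gregorian, so a fixed offset is enough for this sketch.
TEN_DAYS = timedelta(days=10)


def gregorian_to_julian(d: date) -> tuple[int, int, int]:
    """Julian (year, month, day) for a Gregorian date; valid 1583-1699 only."""
    if not 1583 <= d.year <= 1699:
        raise ValueError("the fixed ten-day offset only holds for 1583-1699")
    # Subtracting on Python's proleptic Gregorian dates yields the correct Julian
    # month and day here, because both calendars agree on leap years in this range.
    j = d - TEN_DAYS
    return (j.year, j.month, j.day)


def legal_year(year: int, month: int, day: int) -> int:
    """Year number under the English convention of starting the year on March 25."""
    return year - 1 if (month, day) < (3, 25) else year


def readings(d: date) -> dict[str, tuple[int, int, int]]:
    """The three ways a date labelled "New Style" may have to be read back."""
    j_year, j_month, j_day = gregorian_to_julian(d)
    return {
        "calendar adjustment": (j_year, j_month, j_day),
        "start-of-year adjustment": (legal_year(d.year, d.month, d.day), d.month, d.day),
        "both": (legal_year(j_year, j_month, j_day), j_month, j_day),
    }


for label, ymd in readings(date(1650, 2, 11)).items():
    print(f"{label}: {ymd}")
```

Note that the start-of-year rule in the "both" case is applied to the Julian date, since the English legal year was reckoned in the Julian calendar; near March 25 the order of the two adjustments matters.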

These issues of dating have been a major challenge for MoEML's encoders. Our collection of born-digital TEI documents and historical texts includes dates from many calendars. When we encode dates in historical texts, we must first determine which dating method the clerk or author used. Dates are given in terms of many possible systems: regnal years, papal years, mayoral years, legal terms, years calculated from a particular feast day, and even Anno Mundi figures (years since the creation of the world); we follow Cheney (2000) and Fryde (1986) in parsing these references and converting them to Julian dates. We give a few examples.

Julian date from a primary source: "Alfred king of the weſt Saxons, in the yere 886..." (Stow 1598, 8)

Regnal date: "a par[l]iament being holden at Carlile in the [...] 35. of Edwarde the firſt" (Stow 1598, 11)

Anno Mundi date, combined with a proleptic Julian date: "[...] Eneas, the ſonne of Venus, daughter of Iupiter, aboute the yeare of the world 2855. the yeare before Chriſtes natiuitie, 1108. builded a Citie [...]" (Stow 1598, 1)
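The regnal and Anno Mundi conversions reduce to a table lookup plus date arithmetic. The sketch below assumes a hypothetical table holding only Edward I, whose regnal years were counted from November 20, 1272 and whose reign ended with his death on July 7, 1307 (both Julian); a real table would be built from Cheney (2000). The Anno Mundi function merely reuses the offset implied by Stow's own gloss of 2855 as 1108 BC; it is not a general rule, since chronologers disagreed about the date of creation.

```python
# Hypothetical lookup table: reign -> (first day of regnal year 1, last day of reign),
# both as Julian (year, month, day) tuples. Only Edward I is filled in here.
REIGNS = {
    "Edward I": ((1272, 11, 20), (1307, 7, 7)),
}


def julian_month_length(year: int, month: int) -> int:
    """Month length in the Julian calendar, where every fourth year is a leap year."""
    if month == 2:
        return 29 if year % 4 == 0 else 28
    return 30 if month in (4, 6, 9, 11) else 31


def day_before(year: int, month: int, day: int) -> tuple[int, int, int]:
    """The preceding day in the Julian calendar."""
    if day > 1:
        return (year, month, day - 1)
    if month == 1:
        return (year - 1, 12, 31)
    return (year, month - 1, julian_month_length(year, month - 1))


def regnal_year_range(reign: str, n: int):
    """Julian first and last days of regnal year n, cut short at the end of the reign."""
    (y, m, d), reign_end = REIGNS[reign]
    start = (y + n - 1, m, d)
    if start > reign_end:
        raise ValueError(f"{reign} has no regnal year {n}")
    end = day_before(y + n, m, d)
    return start, min(end, reign_end)


def anno_mundi_to_bc(am: int, ref_am: int = 2855, ref_bc: int = 1108) -> int:
    """Year BC for an Anno Mundi year, using the offset implied by Stow's gloss."""
    return ref_am + ref_bc - am


print(regnal_year_range("Edward I", 35))  # ((1306, 11, 20), (1307, 7, 7))
print(anno_mundi_to_bc(2855))             # 1108
```

The truncation step matters for Stow's example: 35 Edward I was the king's last regnal year, so it ends at his death rather than on the following November 19.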

We also use a wide variety of modern sources such as the Oxford Dictionary of National Biography (ODNB). The ODNB expresses uncertainty in dates and date ranges precisely, but does not specify how that uncertainty arises, so we can determine the precision of an ODNB date but not its accuracy.

We have attempted to discover whether other projects are concerned with calendar issues, and if so, whether they have adopted similar encoding methods. At the time of writing, the TEI Projects page (http://www.tei-c.org/Activities/Projects/) listed 152 projects. Of those, 68 could be expected to contain materials in the historical range that concerns us. We were able to retrieve XML from 19, 16 of which contained encoded dates that would be subject to calendar issues. Only three of those projects appear to have taken account of calendars in their encoding. The prevalent view is well expressed by Godfried Croenen (personal communication, 2012-10-05):

"All medievalists use Julian dates to refer to any date before the introduction of the Gregorian calendar, and so all the dates before the 16th century I have ever encoded into XML TEI documents are in Julian dates. I never felt it would be useful to convert these dates to Gregorian dates, as nobody would know what I was referring to."

Croenen also expresses doubt as to whether it is practical or useful to attempt date conversion between calendars. Others have pointed out that, where date encoding involves only the year, there is no reason to worry about the calendar, and one might as well encode a Julian year using @when (whose datatype is explicitly Gregorian). However, in England between the 12th century and 1752, when the year began on March 25, every date between January 1 and March 24 carries a different year number depending on the convention used, so treating the year as if it began on January 1 risks misdating nearly one date in four. Another objection to regularizing all date encoding to Gregorian is that it is unconventional to use the Gregorian calendar proleptically. However, it is long-standing practice to use the Julian calendar proleptically for dates in antiquity; indeed, Stow does this in one of the examples above, glossing the Anno Mundi date 2855 as 1108 BC.
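The "nearly one in four" figure is simple arithmetic: every date from January 1 up to March 24, the day before Lady Day (March 25), falls in a legal year numbered one lower than the modern year. A minimal check, assuming Julian month lengths:

```python
def share_before_lady_day(leap: bool) -> float:
    """Fraction of a year falling between January 1 and March 24 inclusive."""
    days = 31 + (29 if leap else 28) + 24  # January + February + March 1-24
    return days / (366 if leap else 365)


print(f"{share_before_lady_day(False):.1%}")  # 22.7%
print(f"{share_before_lady_day(True):.1%}")   # 23.0%
```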

In our encoding of dates for the MoEML project, we have two major concerns: that dates be as accurate as possible so that we know when an event occurred (or at least that the source and scope of inaccuracy be clearly expressed), and that they be computable. We are constructing an eventography, and we want to be able to plot event sequences on timelines. We would also like to be able to integrate our data with that of other early modern projects, many of which will have data from countries whose calendar usage differs substantially from English practice. As a result of these concerns, we are early adopters of some recently added TEI features that are intended to formalize accurate encoding of dates from differing calendars.

The original P5 attributes for encoding dates included two distinct classes: att.datable.w3c and att.datable.iso. These two classes allow slightly different forms of date encoding (derived from XML Schema datatypes and ISO 8601, respectively), but both are explicitly based on the Gregorian calendar. In other words, it is clearly wrong to encode a Julian date using one of these attributes:

*<birth when="1566" calendar="#julianEngland">1566</birth>
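Part of what makes this wrong is that nothing in a Gregorian datatype can catch it: a Julian value is syntactically valid and simply names a different day. Python's `datetime.date`, which like the XML Schema types uses the (proleptic) Gregorian calendar, shows the same effect. The Gregorian equivalent of Julian March 25, 1566 (April 4, 1566) is hard-coded here for illustration.

```python
from datetime import date

# A Julian date (Lady Day 1566) written into a Gregorian-typed value parses without complaint...
julian_value = "1566-03-25"
read_as_gregorian = date.fromisoformat(julian_value)

# ...but the day it actually denotes is April 4, 1566 in the Gregorian calendar.
actual_day = date(1566, 4, 4)

print((actual_day - read_as_gregorian).days)  # 10
```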

The recently added att.datable.custom class remedies this deficiency by providing a full suite of dating attributes designed for non-Gregorian calendars, along with the @datingMethod attribute, through which the calendar used can be specified (@calendar refers to the calendar used in the text content of a dating element, not in its attributes). We can now encode a Julian date as follows:

<birth when-custom="1566" datingMethod="#julianEngland" calendar="#julianEngland">1566</birth>

These attributes show that both the attribute value and the text date use the Julian calendar. Given that our purpose is computability, though, the question arises: why not simply convert all our dates to (proleptic) Gregorian before encoding them? If we take the example above, this conversion would be the result:

<birth notBefore="1566-04-04" notAfter="1567-04-03" calendar="#julianEngland">1566</birth>

The conversion, in accounting for the New Year issue and the leap day discrepancy, becomes a rather unwieldy range. Moreover, this conversion is itself computable, so it is unnecessary to impose this burden on our encoders. Instead, we encode Julian dates using @when-custom. On the website, we generate tooltips for all such dates showing the equivalent date or date-range in Gregorian. For the purposes of interoperability, the same conversion could be used to insert Gregorian dating attributes.
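Because the conversion is mechanical, it can be run over the encoded data rather than performed by encoders. The following sketch converts a Julian year under the English March 25 convention to a Gregorian range via the Julian Day Number, then adds @notBefore and @notAfter to an element. It assumes un-namespaced elements as in the examples here (real TEI files use the TEI namespace) and treats `#julianEngland` as meaning exactly that convention; the helper names are hypothetical, not MoEML's actual code.

```python
import xml.etree.ElementTree as ET
from datetime import date


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a date in the (proleptic) Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    # Python's date ordinals are proleptic Gregorian with 0001-01-01 as day 1,
    # which is JDN 1721426, so subtracting 1721425 maps a JDN onto an ordinal.
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)


def english_year_to_gregorian(year: int) -> tuple[str, str]:
    """Gregorian range for an English year running March 25 to March 24 (Julian)."""
    start = julian_to_gregorian(year, 3, 25)
    end = julian_to_gregorian(year + 1, 3, 24)
    return start.isoformat(), end.isoformat()


def add_gregorian_attributes(elem: ET.Element) -> ET.Element:
    # Hypothetical rule: only year-only values dated with #julianEngland are handled.
    if elem.get("datingMethod") == "#julianEngland":
        value = elem.get("when-custom", "")
        if len(value) == 4 and value.isdigit():
            not_before, not_after = english_year_to_gregorian(int(value))
            elem.set("notBefore", not_before)
            elem.set("notAfter", not_after)
    return elem


birth = ET.fromstring(
    '<birth when-custom="1566" datingMethod="#julianEngland" '
    'calendar="#julianEngland">1566</birth>'
)
add_gregorian_attributes(birth)
print(birth.get("notBefore"), birth.get("notAfter"))  # 1566-04-04 1567-04-03
```

The same pair of dates could feed a website tooltip; going through the Julian Day Number, rather than adding a fixed ten days, keeps the function correct after the discrepancy grew to eleven days in 1700.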

Until now, the encoding of historical dates in TEI projects appears to have been haphazard, for a variety of reasons, including the lack of adequate encoding mechanisms, academic convention, and historical practice. However, we now have a set of attributes that enable us to be more precise, and we can easily create conversion functions between (for instance) Julian and Gregorian dating systems. Moreover, as we begin to integrate data from different projects, and create timelines and event sequences that require accurate dating, there is more reason than ever for developing and propagating good practice in date encoding; we do not want to end up creating inter-project timelines in which (for example) the invasion force of William of Orange arrives in England several days before it sets off from the Netherlands. In presenting our date-encoding practices and the issues we have encountered, we hope to stimulate a discussion on accurate date encoding that will encourage those working on projects involving non-Gregorian calendars to be aware of the issues, and to collaborate in creating methods for encoding and interchange that will obviate these problems.
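A timeline that merges data from projects using different calendars needs a calendar-neutral sort key. One option, sketched here under the assumption that each event record states the calendar of its date, is the Julian Day Number; with it, William of Orange's fleet leaves before it arrives. The record layout is hypothetical.

```python
def jdn(year: int, month: int, day: int, calendar: str) -> int:
    """Julian Day Number for a date in the Julian or Gregorian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    base = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if calendar == "gregorian":
        return base - y // 100 + y // 400 - 32045
    if calendar == "julian":
        return base - 32083
    raise ValueError(f"unknown calendar: {calendar}")


# Hypothetical event records: (label, year, month, day, calendar of the source date).
events = [
    ("fleet lands in England", 1688, 11, 5, "julian"),
    ("fleet leaves the Netherlands", 1688, 11, 11, "gregorian"),
]

naive = sorted(events, key=lambda e: e[1:4])         # compares calendar labels only
ordered = sorted(events, key=lambda e: jdn(*e[1:]))  # compares actual days

print([e[0] for e in naive])    # ['fleet lands in England', 'fleet leaves the Netherlands']
print([e[0] for e in ordered])  # ['fleet leaves the Netherlands', 'fleet lands in England']
```

The landing falls four days after the departure, matching November 11 to November 15 in Gregorian terms.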


Cheney, C. R. (2000). A Handbook of Dates for Students of British History. Revised by Michael Jones. Cambridge: Cambridge University Press.
Duncan, D. E. (1998). Calendar: Humanity's Epic Struggle to Determine a True and Accurate Year. New York: Avon Books.
Duncan, D. E. (1999). Calendar. Smithsonian 29(11): 48-58.
Fryde, E. B., D. E. Greenway, S. Porter, and I. Roy (1986). Handbook of British Chronology. 3rd edn. London: Offices of the Royal Historical Society.
Jardine, L. (2009). Another Point of View. London: Preface.
Stow, J. (1598). A Survay of London. London: John Windet for John Wolfe.