Quantitative Methods (History)
The use of quantitative methods is an intrinsic part of historical research, but it has not always been received favourably across all areas of the discipline. In a recent essay, William Thomas (Thomas, W. (2004), ‘Computing and the Historical Imagination’, in Schreibman, S., Siemens, R. and Unsworth, J. (eds), A Companion to Digital Humanities, pp. 56-68) cites the controversy over the publication in 1974 of Robert Fogel and Stanley Engerman’s Time on the Cross: The Economics of American Negro Slavery, which (amongst a vast amount of other detail) advanced the claims that only two per cent of the value of income produced by slaves was expropriated by their masters and that the typical enslaved person received less than one whipping per year. These arguments were based on economic models and mathematical methods of analysis (cliometrics) and unsurprisingly met with fierce criticism from many quarters. Despite such opposition, however, the use of statistical techniques is deeply embedded in economic and social history research, as well as being a central component of data mining. In journals such as Social Science History, Social History and Economic History Review, accounts of the methodological processes employed in the various analyses are subordinate to the factual historical substance of the articles, which may give a misleading impression of how far these techniques are accepted and used in various sections of the historical research community.
The most widely used statistical tools are borrowed from the social sciences, where there seems to be a consensus that SPSS, first developed in 1968, continues to offer the right balance of usability and functionality for most forms of analysis. Minitab is also widely referenced in course descriptions for academic departments offering applied statistics modules, as is SAS, even though numerous freeware and open-source alternatives exist, many of which claim to focus on specific areas of functionality. The ways in which statistics packages can be used form a complex and highly specialised field of study in its own right, but Boonstra et al. propose the following methods as holding ‘great promise for future historical research’:
- Logistic Regression – a model for predicting a binary (two-category) variable (e.g. married/unmarried, live/die, north/south)
- Multilevel Regression – an analysis that incorporates hierarchical data (e.g. data about patients, the doctors treating them, the hospitals in which those doctors work, and the regions in which the hospitals are located)
- Event History Analysis – the study of the independent variables that may contribute to the likelihood of an event occurring
- Ecological Inference – reconstructing individual behaviour from aggregate data where there is a paucity of information relating to the individual
- Time Series Analysis – a range of techniques for studying change over periods of time
Boonstra et al. contend that the adoption of these techniques across all areas of historical study would enable interesting and productive research.
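Of the techniques listed, logistic regression is perhaps the easiest to sketch in code. The example below is a minimal illustration, not drawn from Boonstra et al.: it fits a logistic model by gradient ascent on synthetic data in which the probability of a hypothetical binary outcome (married/unmarried) rises with age. All data, rates and parameter values are invented.

```python
import math
import random

def sigmoid(z):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.01, epochs=5000):
    """Fit intercept b0 and slope b1 for P(y = 1) = sigmoid(b0 + b1 * x)
    by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # residual = gradient contribution
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

random.seed(0)
# Synthetic data: odds of being married rise with age (illustrative only).
ages = [random.uniform(15, 45) for _ in range(200)]
married = [1 if random.random() < sigmoid(0.3 * (a - 25)) else 0 for a in ages]

# Centre the predictor at age 25 so the intercept is interpretable.
b0, b1 = fit_logistic([a - 25 for a in ages], married)
```

The fitted slope `b1` should be positive, reflecting the rising probability built into the synthetic data; in a real study the predictors would of course be drawn from historical sources rather than generated.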
A related area in which quantitative tools are widely used is the construction of simulations, another technique that has long existed but has not been taken up consistently across the community. The purpose of a simulation is to analyse behaviour and events in the context of historically given variables in order to gain a better understanding of cause and effect within a defined system. Early examples include the semi-computerised attempt by Hermann and Hermann (1967) to simulate the outbreak of the First World War, and SOCSIM (1970 onwards), a demographic micro-simulation program that enabled probabilistic analysis of marriage and fertility patterns within a closed society. Despite this lengthy history of use in historical research, current simulation techniques appear to be applied largely to risk assessment for businesses, environmental planning and scenario reconstruction for military situations.
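To convey the general idea of a demographic micro-simulation in the spirit of SOCSIM (the program itself is far richer), the sketch below steps a closed population of women through annual marriage and birth events. The two annual rates are invented for illustration and carry no historical authority.

```python
import random

# Hypothetical annual probabilities (invented for illustration).
MARRIAGE_RATE = 0.08  # chance an unmarried woman marries in a given year
BIRTH_RATE = 0.25     # chance a married woman bears a child in a given year

def simulate(n_women=1000, years=30, seed=1):
    """Run a toy closed-population micro-simulation and return
    (number married at the end, total births over the period)."""
    rng = random.Random(seed)
    married = [False] * n_women
    births = 0
    for _ in range(years):
        for i in range(n_women):
            if not married[i]:
                if rng.random() < MARRIAGE_RATE:
                    married[i] = True  # marriage event this year
            elif rng.random() < BIRTH_RATE:
                births += 1  # birth event for an already-married woman
    return sum(married), births

wed, born = simulate()
```

Because each event is probabilistic, repeated runs with different seeds yield a distribution of outcomes, which is precisely what makes such models useful for analysing marriage and fertility patterns statistically.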
Markov chain sequences, which are widely used in simulation models and concern the probability of an event occurring given the occurrence of a previous event, have also been very influential in linguistics, where much research has focused on probability models for automatically tagging large corpora. The potential for corpus linguistics techniques to inform historical research is therefore significant, and problems such as the disambiguation of pre-modern spelling variants (the focus of the VARD project at Lancaster University and a component of the Historical Thesaurus of English) should not require duplicate research effort outside the linguistics community.
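As a minimal illustration of the Markov chain idea underlying such tagging models (not the actual method of any project named above), the sketch below estimates first-order transition probabilities between part-of-speech tags from a tiny invented tagged sequence, then scores a short tag sequence.

```python
from collections import defaultdict

# A tiny hand-tagged sequence of part-of-speech tags (invented for illustration).
tagged = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]

# Count how often each tag follows each other tag.
counts = defaultdict(lambda: defaultdict(int))
for prev, cur in zip(tagged, tagged[1:]):
    counts[prev][cur] += 1

def transition_prob(prev, cur):
    """Maximum-likelihood estimate of P(cur | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

def sequence_prob(tags):
    """Probability of a tag sequence under the first-order Markov model."""
    p = 1.0
    for prev, cur in zip(tags, tags[1:]):
        p *= transition_prob(prev, cur)
    return p

p_det_noun = transition_prob("DET", "NOUN")
```

In this toy corpus "DET" is followed by "NOUN" two times out of three, so `p_det_noun` is 2/3; real taggers estimate such probabilities from corpora of millions of words and combine them with word-emission probabilities.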