Linguistics

tool: Concordance

Purpose: 

A software tool for performing concordance (the analysis of a set of words within their immediate context) on a body of text. The tool performs full concordance, reading and analysing every word in a text. It was initially written for the analysis of English texts, but has since been extended to cater for other Western languages. Limited support is also provided for text in East Asian scripts, such as Chinese and Korean.
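To illustrate what a concordance of words in their immediate context looks like, here is a minimal Python sketch. It is not the Concordance tool's own code or interface: the function names `tokenize` and `concordance`, the `width` parameter, and the sample sentence are all assumptions made for this example, and the tokenizer is a simple rule that suits English-like text only.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a simple rule, assumed for this sketch)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text, not taken from any corpus named in this listing.
text = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(text)

# Keyword-in-context lines for "sat", with two words of context on each side.
hits = concordance(tokens, "sat", width=2)
for left, keyword, right in hits:
    print(f"{left:>20} [{keyword}] {right}")

# A word frequency count over the same tokens.
freq = Counter(tokens)
print(freq.most_common(3))
```

Reading every token once, as this sketch does, corresponds to the "full concordance" idea above; a production tool would also need language-aware tokenization, which this example does not attempt.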

Features: 
  • Index and word list creation
  • Word frequency count
  • Word usage comparison
  • Keyword analysis
  • Phrase and idiom discovery
A&H use case 1 description: 
The Historical Corpus of the Welsh Language 1500-1850 project used Concordance to analyse samples of Welsh text of different stylistic levels and varying geographic provenance that were created between 1500 and 1850.
Creator: 
R.J.C. Watt
Publisher: 
R.J.C. Watt

tool: CLAWS Tagger

Purpose: 

A software tool for performing Parts-of-Speech (POS) tagging (the classification of words into one or more categories based upon their definition, relationship with other words, or other context) on a body of text. CLAWS (Constituent Likelihood Automatic Word-tagging System) uses several methods to identify parts of speech, most notably Hidden Markov models (HMMs), which involve counting cases and building a table of the probabilities of certain sequences.
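The "counting cases and making a table of probabilities" step can be sketched in a few lines of Python. This is only an illustration of the general HMM idea, not CLAWS's actual implementation: it assumes the sequences counted are sequences of tags, and the tiny hand-tagged sample, the simplified tag names (`DET`, `ADJ`, `NOUN`, `VERB`), the `<s>` start marker, and the function name `transition_table` are all hypothetical.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged sample (hypothetical data and simplified tags, not the CLAWS tagset).
tagged_sentences = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("barks", "VERB")],
]


def transition_table(sentences):
    """Count adjacent tag pairs and normalise each row into P(next tag | previous tag)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" marks the start of a sentence (an assumption of this sketch).
        tags = ["<s>"] + [tag for _, tag in sentence]
        for prev, nxt in zip(tags, tags[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for prev, row in counts.items()
    }


table = transition_table(tagged_sentences)
for prev, row in table.items():
    print(prev, row)
```

A complete HMM tagger would combine a table like this with word-given-tag probabilities and a search over possible tag sequences; the sketch stops at the counting step that the description mentions.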

Features: 
  • Parts-of-Speech tagging with an accuracy rate of approximately 96-97 percent for text analysed
  • Template tagging
A&H use case 1 description: 
The NECTE project amalgamated two separate corpora of recorded speech collected from local people on Tyneside in the UK. It has used the CLAWS tagging service to create part-of-speech tags within the corpus.
A&H use case 2 description: 
The 'Grammatical change in recent English (1961-1991)' project employed the CLAWS tagger to investigate recent changes in English grammar during the period 1961-1991.
Publisher: 
University Centre for Computer Corpus Research on Language (UCREL), University of Lancaster
Creator: 
University Centre for Computer Corpus Research on Language (UCREL), University of Lancaster

project: High Throughput Humanities e-Research (HiTHeR) and FReSH (Forging Restful Services for e-Humanities)

High Throughput Humanities e-Research (HiTHeR) aimed to create a prototype system for analysing the Nineteenth Century Serials Edition (NCSE) corpus. The NCSE contains around 430,000 articles that originally appeared in roughly 3,500 issues of six 19th Century periodicals. The project investigated the use of grid technologies and high throughput computing to provide more intuitive ways of searching the NCSE’s large corpus. Specifically, the project set up a prototype campus grid and used it for carrying out text processing on this corpus.

This is a reminder that our survey on the use of GIS and other spatial technologies in the humanities will close on November 30th. We would be very grateful for your responses; it should take only 10-15 minutes. If you have not already done so, please complete it at:

http://www.surveymonkey.com/s.aspx?sm=9dDdjhT28poQUCah5efugw....

Taking a cue from the social media trend, "Arcade" is a new website designed to serve as a social and scholarly community for those with an interest in humanities research. The interactive and multimedia site is sponsored by Stanford's Division of Literatures, Cultures, and Languages (DLCL), but is open for use by visitors from around the globe.

The site abounds with interactive elements including blogs by 25 different contributors, virtual seminars and online forums, making Arcade the first widely accessible and interactive platform for intellectual networking in the humanities.

project: DARIAH: Digital Research Infrastructure for the Arts and Humanities

Supporting and enhancing digitally enabled research. The Digital Research Infrastructure for the Arts and Humanities (DARIAH) aims to develop and maintain an infrastructure in support of ICT-based research practices across the arts and humanities, acting as a trusted intermediary between disciplines and domains.

project: Nineteenth Century Serials Edition

A three-year project funded by the Arts and Humanities Research Council (AHRC), ncse seeks to achieve two key objectives. First, the ncse project responds to the pressing need to republish these fragile printed items in ways that maintain their integrity. As physical collections are often incomplete, and deteriorating quality hampers access, electronic editions offer new opportunities to re-present such material in a way that is, for the first time online, comprehensive and freely available, meaning that the material can be used in entirely novel ways.
