CLAWS Tagger
Purpose:
A software tool for part-of-speech (POS) tagging: the classification of each word in a body of text into one or more categories based on its definition, its relationship with other words, or other context. CLAWS (Constituent Likelihood Automatic Word-tagging System) uses several methods to identify parts of speech, most notably Hidden Markov models (HMMs), which involve counting cases in already-tagged text and building a table of the probabilities of certain sequences of tags. For example, once a verb has been seen, the next word is more likely to be a preposition, article, or noun than another verb.
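The counting step described above can be illustrated with a minimal sketch. This is not CLAWS code and does not use a CLAWS tagset: the toy corpus, the tag names (ART, NOUN, VERB, ADJ) and the function name transition_table are all assumptions made for illustration. It only shows how a table of tag-to-tag probabilities might be built from text that has already been tagged by hand.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus (illustrative tags, not a CLAWS tagset).
TAGGED_SENTENCES = [
    [("the", "ART"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "ART"), ("cat", "NOUN"), ("sees", "VERB"), ("the", "ART"), ("dog", "NOUN")],
    [("the", "ART"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def transition_table(sentences):
    """Count adjacent tag pairs, then normalise each row into probabilities."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tags = [tag for _, tag in sentence]
        # Pair every tag with the tag that immediately follows it.
        for prev_tag, next_tag in zip(tags, tags[1:]):
            counts[prev_tag][next_tag] += 1
    return {
        prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for prev, row in counts.items()
    }


table = transition_table(TAGGED_SENTENCES)
for prev_tag, row in table.items():
    print(prev_tag, row)
```

In this toy corpus an article is followed by a noun three times out of four and by an adjective once, so table["ART"]["NOUN"] is 0.75, and a verb is never followed by another verb. A full HMM tagger would combine such transition probabilities with the probability of each word given its tag.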
Features:
- Part-of-speech tagging with an accuracy of approximately 96-97 percent on the text analysed
- Template tagging
A&H use case 1 description:
The NECTE project amalgamated two separate corpora of recorded speech collected from local people on Tyneside in the UK. The project used the CLAWS tagging service to add part-of-speech tags to the corpus.
A&H use case 2 description:
The 'Grammatical change in recent English (1961-1991)' project employed the CLAWS tagger to investigate changes in English grammar over that period.
Publisher:
University Centre for Computer Corpus Research on Language (UCREL), University of Lancaster
Creator:
University Centre for Computer Corpus Research on Language (UCREL), University of Lancaster
lifecycleStage:
Specifications:
Licence:
Discipline:
Platform:
Software/programming languages used: