Part of speech tagging is a fairly well-defined process. Festival includes a part of speech tagger following the HMM-type taggers as found in the Xerox tagger and others (e.g. DeRose88). Part of speech tags are assigned based on the probability distribution of tags given a word and on ngrams of tags. These models are externally specified, and a Viterbi decoder is used to assign part of speech tags at run time.
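The decoding step described above can be sketched in Python as follows. This is a minimal illustration and not Festival's implementation: the function name `viterbi_tag`, the tables `TAG_GIVEN_WORD` and `TAG_BIGRAM`, the `FLOOR` value and all the probabilities are invented for the example. It also assumes a bigram tag model and that every word is in the table.

```python
import math

# Toy, invented numbers for illustration only (not Festival's models).
# P(tag | word): distribution of tags for each known word.
TAG_GIVEN_WORD = {
    "the": {"dt": 1.0},
    "can": {"md": 0.7, "nn": 0.2, "vb": 0.1},
    "rusts": {"vbz": 0.8, "nns": 0.2},
}

# P(tag | previous tag): a bigram model over tags; "<s>" marks sentence start.
TAG_BIGRAM = {
    ("<s>", "dt"): 0.6, ("<s>", "md"): 0.1, ("<s>", "nn"): 0.2,
    ("dt", "nn"): 0.8, ("dt", "md"): 0.01, ("dt", "vb"): 0.01,
    ("nn", "vbz"): 0.5, ("nn", "nns"): 0.1,
    ("md", "vbz"): 0.01, ("md", "nns"): 0.05,
    ("vb", "vbz"): 0.01, ("vb", "nns"): 0.3,
}

FLOOR = 1e-6  # probability assumed for tag bigrams missing from the table


def viterbi_tag(words):
    """Return the most probable tag sequence for words under the toy models."""
    # best maps a tag to (log score of the best path ending in it, that path).
    best = {"<s>": (0.0, [])}
    for word in words:
        new_best = {}
        for tag, p_tag in TAG_GIVEN_WORD[word].items():
            # Choose the predecessor tag that gives the highest path score.
            score, path = max(
                (
                    (prev_score + math.log(TAG_BIGRAM.get((prev, tag), FLOOR)), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[tag] = (score + math.log(p_tag), path + [tag])
        best = new_best
    # The best complete path is the highest-scoring one over all final tags.
    return max(best.values(), key=lambda item: item[0])[1]


print(viterbi_tag(["the", "can", "rusts"]))
```

With these toy numbers, "can" on its own is most likely `md`, but after "the" the tag ngram pulls the decoder toward `nn`. This is the kind of context effect the tag ngram model contributes.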
So far this tagger has only been used for English, but there is nothing language-specific about it. The module assigns the tags. It accesses the following variables:
If the value is NIL, no part of speech tagging takes place.
pos_map should be a list of pairs, each consisting of a list of tags to be mapped and the new tag they are to be mapped to.
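The shape of such a map, and how it might be applied, can be illustrated with a short Python sketch. The names `POS_MAP` and `map_tag` and the tag groupings are hypothetical, chosen only to show the data structure. Leaving unlisted tags unchanged is an assumption of this sketch rather than a documented behaviour.

```python
# Hypothetical map in the shape described above: each entry pairs a list of
# tagger tags with the single tag they are mapped to. The tag names are
# illustrative and not taken from any particular lexicon.
POS_MAP = [
    (["vb", "vbd", "vbg", "vbn", "vbp", "vbz"], "v"),
    (["nn", "nns", "nnp", "nnps"], "n"),
    (["jj", "jjr", "jjs"], "j"),
]


def map_tag(tag, pos_map=POS_MAP):
    """Return the mapped tag, or the tag unchanged if no entry lists it."""
    for old_tags, new_tag in pos_map:
        if tag in old_tags:
            return new_tag
    # Assumption of this sketch: tags without an entry pass through as-is.
    return tag


print([map_tag(t) for t in ["dt", "nn", "vbz"]])
```

Collapsing a fine-grained tag set in this way is one means of making the tagger's output line up with a lexicon that uses coarser part of speech labels.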
Note that it is important for the tags produced by the part of speech tagger to match the tags used in later parts of the system, particularly the lexicon. Only two of our lexicons used so far have (mappable) part of speech labels.
An example of the part of speech tagger for English can be found in `lib/pos.scm'.