A number of basic data manipulation tools are supported by Festival. These often make building new modules very easy and are already used in many of the existing modules. They typically offer a Scheme method for entering data, and Scheme and C++ functions for evaluating it.
Regular expressions are a formal method for describing a certain class of mathematical languages. They may be viewed as patterns which match some set of strings. They are very common in many software tools such as scripting languages like the UNIX shell, PERL, awk, Emacs etc. Unfortunately the exact form of regular expressions often differs slightly between applications, which can make their use a little tricky.
Festival supports regular expressions based mainly on the form used in the GNU libg++ Regex class, though we have our own implementation of it. Our implementation (EST_Regex) is actually based on Henry Spencer's `regex.c' as distributed with BSD 4.4.
Regular expressions are represented as character strings which are interpreted as regular expressions by certain Scheme and C++ functions. Most characters in a regular expression are treated as literals and match only that character, but a number of others have special meaning. Some characters may be escaped with preceding backslashes to change them from operators to literals (or sometimes literals to operators).
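[ ... ]
    Matches any one character in the given range. The syntax is a-z for all lower case characters. If the first character of the range is ^ then it matches any character except those specified in the range. If you wish - to be in the range you must put it first.
\\( ... \\)
    Treats the contents of the parentheses as a single object, allowing the operators *, +, ? etc to operate on more than single characters.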
Note that actually only one backslash is needed before a character to escape it, but because these expressions are most often contained within Scheme or C++ strings, the escape mechanism for those strings requires that the backslash itself be escaped. Hence you will most often be required to type two backslashes.
Some examples may help in understanding the use of regular expressions.
aand ending with a
The Scheme function string-matches takes a string and a regular expression and returns t if the regular expression matches the string and nil otherwise.
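A few calls at the Festival prompt illustrate the kinds of pattern discussed above. The strings and patterns here are made up for illustration. The last call shows the doubled backslashes needed when a pattern is written as a Scheme string; it is a sketch that assumes grouping behaves as described above.

```scheme
;; Literal characters with . and *: starts with "h" and ends with "o".
(string-matches "hello" "h.*o")         ; t
;; The string does not end in "x", so the pattern cannot match.
(string-matches "hello" "h.*x")         ; nil
;; A range with +: one or more digits.
(string-matches "1984" "[0-9]+")        ; t
;; A negated range: every character is something other than a vowel.
(string-matches "rhythm" "[^aeiou]+")   ; t
;; Grouping, with each backslash doubled inside the Scheme string,
;; so that + applies to the whole group "ab".
(string-matches "abab" "\\(ab\\)+")
```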
One of the basic tools available with Festival is a system for building and using Classification and Regression Trees (breiman84). This standard statistical method can be used to predict both categorical and continuous data from a set of feature vectors.
The tree itself contains yes/no questions about features and ultimately provides either a probability distribution, when predicting categorical values (classification tree), or a mean and standard deviation, when predicting continuous values (regression tree). Well defined techniques can be used to construct an optimal tree from a set of training data. The program `wagon', developed in conjunction with Festival and distributed with the speech tools, provides a basic but increasingly powerful method for constructing trees.
A tree need not be automatically constructed. CART trees have the advantage over some other automatic training methods, such as neural networks and linear regression, in that their output is more readable and often understandable by humans. Importantly, this makes it possible to modify them. CART trees may also be fully hand constructed. This is used, for example, in generating some duration models for languages for which we do not yet have full databases to train from.
A CART tree has the following syntax
CART ::= QUESTION-NODE || ANSWER-NODE
QUESTION-NODE ::= ( QUESTION YES-NODE NO-NODE )
YES-NODE ::= CART
NO-NODE ::= CART
QUESTION ::= ( FEATURE in LIST )
QUESTION ::= ( FEATURE is STRVALUE )
QUESTION ::= ( FEATURE = NUMVALUE )
QUESTION ::= ( FEATURE > NUMVALUE )
QUESTION ::= ( FEATURE < NUMVALUE )
QUESTION ::= ( FEATURE matches REGEX )
ANSWER-NODE ::= CLASS-ANSWER || REGRESS-ANSWER
CLASS-ANSWER ::= ( (VALUE0 PROB) (VALUE1 PROB) ... MOST-PROB-VALUE )
REGRESS-ANSWER ::= ( ( STANDARD-DEVIATION MEAN ) )
Note that answer nodes are distinguished by their car not being atomic.
The interpretation of a tree is with respect to a Stream_Item. The FEATURE in a tree is a standard feature (see section 14.6 Features).
The following example tree is used in one of the Spanish voices to predict variations from average durations.
(set! spanish_dur_tree
 '
  ((R:SylStructure.parent.R:Syllable.p.syl_break > 1 ) ;; clause initial
    ((R:SylStructure.parent.stress is 1)
     ((1.5))
     ((1.2)))
    ((R:SylStructure.parent.syl_break > 1)   ;; clause final
     ((R:SylStructure.parent.stress is 1)
      ((2.0))
      ((1.5)))
     ((R:SylStructure.parent.stress is 1)
      ((1.2))
      ((1.0))))))
It is applied to the segment stream to give a factor to multiply the average by.
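As a rough sketch of how such a tree might be applied by hand, the following walks the segments of an utterance and looks up the tree's answer for each one. It assumes the Scheme function wagon_predict from Festival's library, a loaded voice whose synthesis builds the Segment, Syllable and SylStructure relations, and that spanish_dur_tree has been set as in the example tree. The utterance text and the variable utt1 are arbitrary choices for illustration.

```scheme
;; Synthesize an utterance so that the relations referred to by the
;; tree's features (Segment, Syllable, SylStructure) exist.
(set! utt1 (utt.synth (Utterance Text "hola")))

;; Walk the Segment relation and pair each segment name with the
;; answer the tree gives for that segment.
(mapcar
 (lambda (seg)
   (list (item.name seg)
         (wagon_predict seg spanish_dur_tree)))
 (utt.relation.items utt1 'Segment))
```

Each question in the tree is evaluated as a feature of the segment passed in, so the same tree can be reused on any item for which those feature paths are meaningful.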
`wagon' is constantly improving and, with version 1.2 of the speech tools, may now be considered fairly stable for its basic operations. Experimental features are described in the help it gives. See the Speech Tools manual for a more comprehensive discussion of using `wagon'.
However the above format of trees is similar to those produced by many other systems and hence it is reasonable to translate their formats into one which Festival can use.
Bigrams, trigrams, and general ngrams are used in the part of speech tagger and the phrase break predictor. An Ngram C++ class is defined in the speech tools library and some simple facilities are added within Festival itself.
Ngrams may be built from files of tokens using the program
ngram_build which is part of the speech tools. See
the speech tools documentation for details.
Within Festival ngrams may be named and loaded from files and used when required. The LISP function ngram.load takes a name and a filename as arguments and loads the Ngram from that file. For an example of its use once loaded, see the part of speech tagger, which is discussed with the Viterbi decoder below.
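A minimal sketch of loading a named ngram follows. The file name is hypothetical and stands for whatever file was built with ngram_build; the name "pos-tri-gram" is the one used in the Viterbi example later in this chapter.

```scheme
;; Arguments are the name to register the ngram under, then the file
;; to load it from. The path below is a placeholder, not a real file.
(ngram.load "pos-tri-gram" "/path/to/pos-tri-gram.ngram")
```

Once loaded, the ngram is referred to elsewhere by the name given as the first argument.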
Another common tool is a Viterbi decoder. This C++ class is defined in the speech tools library in `speech_tools/include/EST_viterbi.h' and `speech_tools/stats/EST_viterbi.cc'. A Viterbi decoder requires two functions at declaration time. The first constructs candidates at each stage, while the second combines paths. A number of options are available (which may change).
The prototypical example of use is in the part of speech tagger, which uses standard Ngram models to predict probabilities of tags. See `src/modules/base/pos.cc' for an example.
The Viterbi decoder can also be used through the Scheme function
Gen_Viterbi. This function respects the parameters defined
in the variable
get_vit_params. Like other modules this
parameter list is an assoc list of feature name and value. The
parameters supported are:
ngramname
    The name of an ngram (loaded by ngram.load) to be used as a "language model".
wfstname
    The name of a WFST (loaded by wfst.load) to be used as a "language model". This is ignored if an ngramname is also specified.
Here is a short example to help make the use of this facility clearer.
There are two parts required for the Viterbi decoder: a set of candidate observations and some "language model". For the math to work properly the candidate observations must be reverse probabilities (for each candidate, the probability of the observation given the candidate, rather than the probability of the candidate given the observation). These can be calculated from the probability of the candidate given the observation divided by the probability of the candidate in isolation.
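The relationship between the two kinds of probability follows from Bayes' rule. Writing c for a candidate and o for the observation (notation introduced here only for illustration):

```latex
P(o \mid c) \;=\; \frac{P(c \mid o)\, P(o)}{P(c)}
\;\propto\; \frac{P(c \mid o)}{P(c)}
\qquad\text{and, in the log domain,}\qquad
\log P(c \mid o) - \log P(c)
```

The term P(o) is the same for every candidate of a given observation, so dropping it does not change which path scores best. That is why dividing by the probability of the candidate in isolation is sufficient.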
For the sake of simplicity let us assume we have a lexicon that maps words to distributions of part of speech tags with reverse probabilities, and an ngram called pos-tri-gram over sequences of part of speech tags. First we must define the candidate function
(define (pos_cand_function w)
  ;; select the appropriate lexicon
  (lex.select 'pos_lex)
  ;; return the list of cands with rprobs
  (cadr
   (lex.lookup (item.name w) nil)))
The returned candidate list would look something like
( (jj -9.872) (vbd -6.284) (vbn -5.565) )
Our part of speech tagger function would look something like this
(define (pos_tagger utt)
  (set! get_vit_params
        (list
         (list 'Relation "Word")
         (list 'return_feat 'pos_tag)
         (list 'p_word "punc")
         (list 'pp_word "nn")
         (list 'ngramname "pos-tri-gram")
         (list 'cand_function 'pos_cand_function)))
  (Gen_Viterbi utt)
  utt)
This will assign the optimal part of speech tags to each word in utt.
The linear regression model takes models built from some external
package and finds coefficients based on the features and weights. A
model consists of a list of features. The first should be the atom
Intercept plus a value. The following in the list should consist
of a feature (see section 14.6 Features) followed by a weight. An optional third
element may be a list of atomic values. If the result of the feature is
a member of this list the feature's value is treated as 1 else it is 0.
This third argument allows an efficient way to map categorical values
into numeric values. For example, in the F0 prediction model in
`lib/f2bf0lr.scm', the first few parameters are
(set! f2b_f0_lr_start
'(
   ( Intercept 160.584956 )
   ( Word.Token.EMPH 36.0 )
   ( pp.tobi_accent 10.081770 (H*) )
   ( pp.tobi_accent 3.358613 (!H*) )
   ( pp.tobi_accent 4.144342 (*? X*? H*!H* * L+H* L+!H*) )
   ( pp.tobi_accent -1.111794 (L*) )
   ...
)
Note the feature
pp.tobi_accent returns an atom, and is hence
tested with the map groups specified as third arguments.
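Written out, the prediction described above is the intercept plus a weighted sum of feature values. The symbols below are introduced here for illustration only: w_0 is the Intercept value, w_i the weight of the i-th entry and x_i the value that entry contributes.

```latex
\hat{y} \;=\; w_0 + \sum_{i=1}^{n} w_i\, x_i ,
\qquad
x_i =
\begin{cases}
1 & \text{a value list is given and the feature's value is in it} \\
0 & \text{a value list is given and the feature's value is not in it} \\
\text{the feature's numeric value} & \text{no value list is given}
\end{cases}
```

As a worked example using only the six entries shown (the elided entries would add further terms), take an item whose Word.Token.EMPH is 0 and whose pp.tobi_accent is H*. Only the first pp.tobi_accent entry lists H*, so the other three contribute nothing:

```latex
160.584956 + 36.0 \cdot 0 + 10.081770 \cdot 1 = 170.666726
```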
Models may be built from feature data (in the same format as for `wagon') using the `ols' program distributed with the speech tools library.