24.6. Regular Expressions

Regular expressions are fundamentally useful in any text processing language, and Festival's Scheme is no exception. The function string-matches and a number of other places (notably CART trees) allow the use of regular expressions to match strings.

We will not go into the formal aspects of regular expressions, but give just enough discussion to help you use them here. See [regexbook] for probably more information than you'll ever need.

Implementations of regular expressions differ slightly, so here we lay out the full syntax and semantics of our regex patterns. This is not an arbitrary selection: when Festival was first developed we used the GNU libg++ Regex class, but for portability to non-GNU systems we had to replace that with our own implementation based on Henry Spencer's regex code (which is at the core of many regex libraries).

In general all characters match themselves, except for the following, which (can) have special interpretations:

. * + ? [ ] - ( ) | ^ $ \

If one of these is preceded by a backslash, it loses its special interpretation.
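As a small sketch of backslash escaping, the following matches a literal . and a literal * rather than treating them as operators. The backslash is doubled because it is written inside a Scheme string; the results shown assume that string-matches must match the whole string.

```scheme
;; An escaped "." matches only a literal dot
(string-matches "a.c" "a\\.c")   ; => t
(string-matches "abc" "a\\.c")   ; => nil

;; An escaped "*" matches only a literal star
(string-matches "a*c" "a\\*c")   ; => t
(string-matches "aac" "a\\*c")   ; => nil
```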

.

Matches any character.

(string-matches "abc" "a.c") => t
(string-matches "acc" "a.c") => t

*

Matches zero or more occurrences of the preceding item in the regex

(string-matches "aaaac" "a*c") => t
(string-matches "c" "a*c") => t
(string-matches "anythingc" ".*c") => t
(string-matches "canythingatallc" "c.*c") => t

+

Matches one or more occurrences of the preceding item in the regex

(string-matches "aaaac" "a+c") => t
(string-matches "c" "a+c") => nil
(string-matches "anythingc" ".+c") => t
(string-matches "c" ".+c") => nil
(string-matches "canythingatallc" "c.+c") => t
(string-matches "cc" "c.+c") => nil

?

Matches zero or one occurrence of the preceding item. That is, it makes the preceding item optional.

(string-matches "abc" "ab?c") => t
(string-matches "ac" "ab?c") => t

[ ]

Defines a set of characters, and can also be used to define a range. For example, [aeiou] is any lower case vowel, and [a-z] is any lower case letter from a through z. [a-zA-Z] is any letter, upper or lower case.

If ^ is specified first it negates the class, thus [^a-z] matches anything but a lower case letter.
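A few illustrative cases for character sets, ranges and negation, written in the same spirit as the examples for the other operators. The results assume, as elsewhere in this section, that the pattern has to match the whole string.

```scheme
;; A set matches exactly one character drawn from it
(string-matches "a" "[aeiou]")        ; => t
(string-matches "b" "[aeiou]")        ; => nil

;; Ranges combine with the repetition operators
(string-matches "hello" "[a-z]+")     ; => t
(string-matches "Hello" "[a-z]+")     ; => nil
(string-matches "Hello" "[a-zA-Z]+")  ; => t

;; A leading ^ negates the class
(string-matches "A" "[^a-z]")         ; => t
(string-matches "a" "[^a-z]")         ; => nil
```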

\( \)

Allow sections to be grouped so that other operators can affect them. For example, the * applies to the previous item, so to match zero or more occurrences of something longer than a single character, group it:

(string-matches "helloworld" "hello\\(there\\)*world") => t
(string-matches "hellothereworld" "hello\\(there\\)*world") => t
(string-matches "hellotherethereworld" "hello\\(there\\)*world") => t

Note that you need two backslashes: the first escapes the second, so that a single backslash reaches the regex.
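The other repetition operators apply to a group in the same way. A brief sketch using + and ? on the same group:

```scheme
;; + requires at least one occurrence of the whole group
(string-matches "helloworld" "hello\\(there\\)+world")            ; => nil
(string-matches "hellothereworld" "hello\\(there\\)+world")       ; => t

;; ? makes the whole group optional, but allows it at most once
(string-matches "helloworld" "hello\\(there\\)?world")            ; => t
(string-matches "hellotherethereworld" "hello\\(there\\)?world")  ; => nil
```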

\|

Or operator. Allows a choice between two alternatives.

(string-matches "hellofishworld" "hello\\(fish\\|chips\\)world") => t
(string-matches "hellochipsworld" "hello\\(fish\\|chips\\)world") => t

Note that you need two backslashes: the first escapes the second, so that a single backslash reaches the regex.
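To show how these pieces might combine in practice, here is a minimal sketch of a function that classifies a string using string-matches inside a cond. The function name token_type and the symbols it returns are hypothetical, chosen only for illustration, and the results assume the pattern must match the whole string.

```scheme
;; token_type is a hypothetical helper, not part of Festival
(define (token_type name)
  (cond
   ;; one or more digits
   ((string-matches name "[0-9]+") 'digits)
   ;; an upper case letter followed by any number of lower case letters
   ((string-matches name "[A-Z][a-z]*") 'capitalised)
   ;; either of two fixed words, using grouping and the or operator
   ((string-matches name "\\(fish\\|chips\\)") 'food)
   (t 'other)))

(token_type "1984")      ; => digits
(token_type "Festival")  ; => capitalised
(token_type "chips")     ; => food
(token_type "hello")     ; => other
```

The clauses are tried in order, so more specific patterns should come before more general ones.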