5.7. Making it better

The walkthrough above is intended to give you a basic idea of the stages involved in building a limited domain synthesizer. The quality of a limited domain synthesizer will most likely be excellent in parts and very bad in others, which is typical of techniques like this. Each stage is, of course, more complex than described here, and there are a number of things that can be done to improve the result.

For limited domain synthesis it should be possible to correct the errors so that the output is always excellent. Doing so, though, requires being able to diagnose where the problems are. The most likely problems are listed here:

  • Mis-labeling: Due to lipsmacks and other reasons, the labeling may not be correct. The result may be wrong, extra or missing segments in the synthesized utterance. Using emulabel you can check and hand-correct the labels.

  • Mis-spoken data: The speaker may have made a mistake in the content. This can often happen even when the speaker is careful. Mistakes can be in the actual content (it is easy to read a list of numbers wrongly), but hesitations and false starts can also make a recording bad. Note too that inconsistent prosodic variation can affect the synthesis quality. Bad examples can be re-recorded, or you can delete them from the etc/LDOM.data list, assuming there is enough variation in the remaining examples to ensure proper coverage of the domain.

  • Bad pitchmarking: Automatic pitchmarking is not really automatic. It is well worth checking whether the pitchmarks are correct, and re-running the pitchmarking with adjusted parameters until they improve. (We need better documentation here on how to know what "correct" is.)

  • Looking at the data: There is never a substitute for actually looking at the data. Use emulabel to look at the recorded utterances and see what the labeling is. Ensure that these match and that the files haven't got out of order. Look at a random selection, not just the first example.

  • Improving the unit clustering: The clustering techniques and the features used here are pretty generic and by no means optimal. Even for the simple example given here the clustering is not very good. See Chapter 12 on unit selection for more discussion of this. Adding new features for use in clustering may help a lot.
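
Some of these checks can be partly scripted before you open anything in emulabel. The following is a minimal sketch in Python, not part of the voice building tools, and it rests on several assumptions: that each line of etc/LDOM.data has the form ( utt_id "prompt text" ), that each utterance has a waveform in wav/utt_id.wav and a label file in lab/utt_id.lab, and that the script is run from the top of the voice directory. The function names are hypothetical. It reports utterances whose files are missing and picks a random handful to inspect by hand, rather than just the first example.

```python
import random
import re
from pathlib import Path

# Assumed prompt-line format: ( utt_id "prompt text" )
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompts(text):
    """Return (utt_id, prompt) pairs; lines not matching the format are skipped."""
    prompts = []
    for line in text.splitlines():
        match = PROMPT_RE.match(line.strip())
        if match:
            prompts.append((match.group(1), match.group(2)))
    return prompts


def missing_files(utt_ids, voice_dir, subdirs=(("wav", ".wav"), ("lab", ".lab"))):
    """List expected per-utterance files that do not exist under voice_dir.

    The wav/ and lab/ layout is an assumption; adjust subdirs to your setup.
    """
    root = Path(voice_dir)
    missing = []
    for utt_id in utt_ids:
        for subdir, ext in subdirs:
            path = root / subdir / (utt_id + ext)
            if not path.exists():
                missing.append(str(path))
    return missing


def random_sample(utt_ids, count, seed=None):
    """Pick up to `count` utterance ids at random for hand inspection."""
    ids = list(utt_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(count, len(ids)))


if __name__ == "__main__":
    data_file = Path("etc/LDOM.data")  # LDOM stands for your domain name
    if data_file.exists():
        ids = [utt_id for utt_id, _ in parse_prompts(data_file.read_text())]
        for path in missing_files(ids, "."):
            print("missing:", path)
        print("inspect:", " ".join(random_sample(ids, 5)))
```

A script like this only catches missing or misnamed files. It cannot tell you whether the labels line up with the speech or whether the speaker read the prompt correctly, so the sampled utterances still need to be looked at and listened to.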

The line between limited domain synthesis and unit selection is fuzzy. The more complex and varied the phrases you synthesize are, the more difficult it is to produce reliable synthesis.