This section contains a quick checklist of the processes required to construct a working diphone database. Each part is discussed in detail above.
Choose phoneset: Find an appropriate phoneset for the language, if possible using an existing standard. If you already have a good lexicon in the desired language, we recommend that you use its phoneset.
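When adopting a lexicon's phone set, a quick sanity check is to confirm that every phone the lexicon uses is actually defined in the phoneset. The sketch below assumes a hypothetical in-memory form: the lexicon as (word, phone list) pairs and the phoneset as a plain set of names. Real lexicon and phoneset definitions would need to be parsed into this form first.

```python
def undefined_phones(lexicon, phoneset):
    """Count uses of phones that appear in the lexicon but not in the phoneset."""
    missing = {}
    for word, phones in lexicon:
        for p in phones:
            if p not in phoneset:
                missing[p] = missing.get(p, 0) + 1
    return missing

# Hypothetical toy data for illustration only.
phoneset = {"pau", "t", "aa", "k", "ii", "s"}
lexicon = [("taka", ["t", "aa", "k", "aa"]), ("kiss", ["k", "ih", "s"])]
print(undefined_phones(lexicon, phoneset))  # {'ih': 1}
```

An empty result means the lexicon and phoneset agree; anything else must be mapped or added before building the diphone list.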
Construct diphone list: Construct the diphone list with appropriate carrier words, either by using an existing list or by generating one from the examples. Consider which allophones, consonant clusters, etc., you also wish to record.
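The core of generating a list is enumerating every ordered phone pair and embedding each one in a nonsense carrier word. The following is a minimal sketch under stated assumptions: the toy phoneset and the "t aa _ _ aa t aa" carrier frame are hypothetical choices for illustration, not a prescribed template.

```python
from itertools import product

PHONES = ["pau", "t", "k", "aa", "ii"]  # hypothetical toy phoneset

def carrier(p1, p2):
    """Embed diphone p1-p2 in a nonsense carrier (hypothetical 't aa _ _ aa t aa' frame)."""
    if p1 == "pau":                       # silence-to-phone: target at word onset
        return ["pau", p2, "aa", "t", "aa", "pau"]
    if p2 == "pau":                       # phone-to-silence: target at word end
        return ["pau", "t", "aa", p1, "pau"]
    return ["pau", "t", "aa", p1, p2, "aa", "t", "aa", "pau"]

def diphone_list(phones):
    """One (name, prompt phones) entry per ordered pair, skipping pau-pau."""
    entries = []
    for p1, p2 in product(phones, repeat=2):
        if p1 == p2 == "pau":
            continue
        entries.append((f"{p1}-{p2}", carrier(p1, p2)))
    return entries

for name, prompt in diphone_list(PHONES)[:3]:
    print(name, " ".join(prompt))
```

With a single frame, vowel-vowel and silence-vowel pairs end up adjacent to the carrier vowel; a real list would pick carriers per phone class and add the allophones and clusters mentioned above.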
Synthesize prompts: Synthesize prompts from an existing voice, if possible. Even when a few phones are missing from that voice, it is still useful to have the speaker listen to the prompts, as this keeps them focused on minimal prosody and normalized vocal effort and also reminds them what they need to say.
Record words: Record the words in the best conditions you can; bad recordings can never be corrected later. Ideally, record in an anechoic chamber, capturing both a close-talking microphone channel and a laryngograph channel.
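Since problems cannot be fixed afterwards, it can help to check levels right after each session so that clipped or very quiet takes are re-recorded while the speaker is still available. A minimal sketch using only the Python standard library, assuming 16-bit PCM WAV files; the 0.999 clipping threshold is an arbitrary illustrative choice.

```python
import array
import sys
import wave

def level_stats(samples, full_scale=32768, clip_fraction=0.999):
    """Peak level (fraction of full scale) and count of samples at or near clipping."""
    limit = clip_fraction * (full_scale - 1)
    peak = max((abs(s) for s in samples), default=0)
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return peak / full_scale, clipped

def check_wav(path):
    """Read a 16-bit PCM WAV file (channels interleaved) and return level_stats for it."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return level_stats(samples)
```

For a multichannel file (for example speech plus laryngograph), this reports the combined peak across channels; splitting the interleaved samples per channel would give separate figures.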
Hand label/align phones: If you used synthesized prompts, you can probably use the provided aligner to get a reasonable first pass at the phone labels. Otherwise, find a different aligner or label the phones by hand.
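To illustrate the idea of aligning a recording against its synthesized prompt (this is not the provided aligner's code), the sketch below uses dynamic time warping (DTW) over per-frame feature vectors and carries the prompt's phone boundaries over to the recording. It assumes features such as cepstral vectors at a fixed frame shift have already been computed for both signals; the `(phone, last_frame_index)` label format is hypothetical.

```python
import math

def dtw_path(a, b):
    """DTW between two sequences of feature vectors; returns the (i, j) warping path."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the final cell to the start along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path

def transfer_labels(prompt_feats, rec_feats, prompt_labels, frame_shift):
    """Map prompt phone ends (phone, last_frame_index) to recording end times in seconds."""
    last_match = {}
    for i, j in dtw_path(prompt_feats, rec_feats):
        last_match[i] = j  # path is monotonic, so this keeps the latest recording frame
    return [(ph, (last_match[end] + 1) * frame_shift) for ph, end in prompt_labels]
```

The result is only a first pass; boundaries still need checking by hand, especially where the prompt voice and the speaker differ strongly.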
Extract pitchmarks: Extract the pitchmarks from the recorded signal, either from the EGG (laryngograph) signal or, using a more complicated approach, from the speech signal itself.
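Extracting pitchmarks from an EGG signal is simpler than from speech because glottal closures show up as sharp rises in the waveform. A minimal sketch, assuming the EGG is a list of floats whose value increases with vocal-fold contact (invert it if your device has the opposite polarity): it differentiates the signal and picks strong peaks, enforcing a minimum spacing derived from a hypothetical maximum F0. Real EGG data usually needs filtering first, which is omitted here.

```python
def egg_pitchmarks(egg, sample_rate, max_f0=500.0, threshold=0.3):
    """Return pitchmark times (s) at strong peaks of the differentiated EGG signal."""
    d = [egg[i + 1] - egg[i] for i in range(len(egg) - 1)]  # first difference
    top = max(d, default=0.0)
    if top <= 0.0:
        return []
    min_gap = int(sample_rate / max_f0)  # shortest plausible period, in samples
    marks = []
    for i in range(1, len(d) - 1):
        is_peak = d[i] >= d[i - 1] and d[i] > d[i + 1]
        if is_peak and d[i] >= threshold * top:
            if marks and i - marks[-1] < min_gap:
                if d[i] > d[marks[-1]]:
                    marks[-1] = i  # keep the stronger of two close candidates
            else:
                marks.append(i)
    # d[i] spans samples i and i + 1; report the later sample's time.
    return [(i + 1) / sample_rate for i in marks]
```

The threshold relative to the global maximum is a simplification; a varying signal level across a file would favour a locally adaptive threshold.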
Build parameter files: If you don't have PSOLA, extract the LPC parameters and residuals from the speech signal, with power normalization if you feel it's necessary.
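The analysis behind this step can be sketched as autocorrelation LPC via the Levinson-Durbin recursion, followed by inverse filtering to obtain the residual. This minimal sketch works on one frame given as a list of floats; windowing, pitch-synchronous framing around the extracted pitchmarks, and file formats are left out. The RMS-based `normalize_power` is a hypothetical, deliberately crude scheme, not necessarily what the toolkit does.

```python
import math

def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a (windowed) analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] x[n-k], plus final error."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: stop early
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err

def lpc_residual(x, coeffs):
    """Inverse filter: e[n] = x[n] - sum_k a[k] x[n-k], assuming zeros before the frame."""
    return [x[n] - sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
            for n in range(len(x))]

def normalize_power(x, target_rms):
    """Scale x so its RMS equals target_rms (hypothetical whole-segment normalization)."""
    rms = math.sqrt(sum(s * s for s in x) / len(x)) if x else 0.0
    return list(x) if rms == 0.0 else [s * target_rms / rms for s in x]
```

Resynthesis runs the residual back through the all-pole filter 1/A(z), which is why the residual is stored alongside the coefficients.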
Build database itself: Build the diphone index, correcting any obvious labeling errors, then test the database itself, running extensive tests to find and correct any further labeling errors.
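Each index entry records, roughly, which file a diphone comes from and where it starts, where the phone boundary lies, and where it ends; a common choice is to cut at the middle of each phone. The minimal sketch below assumes a hypothetical label format, a list of `(phone, end_time)` pairs in which each phone starts where the previous one ends, and assumes the caller knows the target's position from the diphone list (carrier words can contain the same pair elsewhere). Mismatches raise errors, which is one way to surface obvious labeling mistakes.

```python
def diphone_index_entry(diphone, filename, labels, position):
    """Return (diphone, file, start, mid, end) for diphone 'p1-p2' at labels[position].

    labels: list of (phone, end_time) in seconds, in order (hypothetical format).
    Phone names are assumed not to contain '-'."""
    p1, p2 = diphone.split("-")
    if position < 0 or position + 1 >= len(labels):
        raise ValueError(f"{filename}: position {position} out of range for {diphone}")
    (l1, e1), (l2, e2) = labels[position], labels[position + 1]
    s1 = labels[position - 1][1] if position > 0 else 0.0
    if (l1, l2) != (p1, p2):
        raise ValueError(f"{filename}: expected {diphone} at {position}, found {l1}-{l2}")
    if not s1 < e1 < e2:
        raise ValueError(f"{filename}: non-increasing label times around {diphone}")
    # Cut from the middle of the first phone to the middle of the second.
    return (diphone, filename, (s1 + e1) / 2, e1, (e1 + e2) / 2)
```

Usage: `diphone_index_entry("t-aa", "hypothetical_0001", labels, 1)` for labels `[("pau", 0.2), ("t", 0.3), ("aa", 0.5), ("t", 0.6)]` gives start 0.25, mid 0.3, end 0.4 (up to floating-point rounding).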
Test and check database: Systematically check the database by synthesizing the prompts again and synthesizing general text.
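Before listening, a cheap systematic check is to confirm that every diphone needed by a set of utterances actually has an index entry. A minimal sketch, assuming the phone sequences come from the front end (for example lexicon lookups for general text) and the index is available as a set of diphone names; both inputs here are hypothetical. This only checks coverage, so listening tests remain essential for catching bad joins and mislabeled units.

```python
def missing_diphones(phone_sequences, index_names):
    """Diphones required by the phone sequences but absent from the index, first-seen order."""
    missing, seen = [], set()
    for phones in phone_sequences:
        for p1, p2 in zip(phones, phones[1:]):
            name = f"{p1}-{p2}"
            if name not in index_names and name not in seen:
                seen.add(name)
                missing.append(name)
    return missing

# Hypothetical toy index and utterances.
index_names = {"pau-t", "t-aa", "aa-pau"}
utterances = [["pau", "t", "aa", "pau"], ["pau", "k", "aa", "pau"]]
print(missing_diphones(utterances, index_names))  # ['pau-k', 'k-aa']
```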