Functions

Table of Contents
Functions for Generating Frames
Linear Prediction functions
sig2lpc()
lpc2cep()
sig2lpc()
sig2ref()
Area Functions
ref2truearea()
ref2area()
ref2logarea()
lpc2ref()
ref2lpc()
lpc2lsf()
lsf2lpc()
Energy and power frame functions
sig2pow()
sig2rms()
Fast Fourier Transform functions
slowFFT()
FFT()
slowIFFT()
IFFT()
power_spectrum()
power_spectrum_slow()
fastFFT()
Frame based filter bank and cepstral analysis
sig2fbank()
sig2fft()
fft2fbank()
fbank2melcep()
make_mel_triangular_filter()
Frequency conversion functions
Hz2Mel()
Mel2Hz()
Functions for Generating Tracks
Functions for use with frame based processing
sig2coef()
sigpr_base()
power()
energy()
fbank()
melcep()
Delta and Acceleration coefficients
delta()
sigpr_delta()
sigpr_acc()
Pitch/F0 Detection Algorithm functions
pda()
icda()
default_pda_options()
srpd()
smooth_phrase()
smooth_portion()
Pitchmarking Functions
pitchmark()
pitchmark()
neg_zero_cross_pick()
pm_fill()
pm_min_check()
Spectrogram generation
raw_spectrogram()
scale_spectrogram()
Functions for Windowing Frames of Waveforms
class EST_Window{}
Func;
Functions for making windows.
make_window()
make_window()
creator()
Performing windowing on a section of speech.
window_signal()
window_signal()
window_signal()
window_signal()
Utility window functions.
description()
options_supported()
options_short()
Filter functions
FIR filters
FIRfilter()
FIRfilter()
FIR_double_filter()
FIRlowpass_filter()
FIRlowpass_filter()
FIRhighpass_filter()
FIRhighpass_filter()
FIRhighpass_double_filter()
FIRhighpass_double_filter()
FIRlowpass_double_filter()
FIRlowpass_double_filter()
Linear Prediction filters
lpc_filter()
inv_lpc_filter()
lpc_filter_1()
lpc_filter_fast()
inv_lpc_filter_ola()
Pre/Post Emphasis filters.
pre_emphasis()
pre_emphasis()
post_emphasis()
post_emphasis()
Miscellaneous filters.
Filter Design
design_FIR_filter()
design_lowpass_FIR_filter()
design_highpass_FIR_filter()

Functions for Generating Frames

The following set of functions perform either a signal processing operation on a single frame of speech to produce a set of coefficients, or a transformation on an existing set of coefficients to produce a new set. In most cases, the first argument to the function is the input, and the second is the output. It is assumed that any input speech frame has already been windowed with an appropriate windowing function (e.g. Hamming); see \Ref{Windowing mechanisms} on how to produce such a frame.

It is also assumed that the output vector is of the correct size. No resizing is done in these functions, as the incoming vectors may be subvectors of whole tracks etc. In many cases (e.g. LPC analysis), an {\bf order} parameter is required. This is usually derived from the size of the input or output vectors, and hence is not passed explicitly.

Linear Prediction functions

These functions cover the generation of linear prediction coefficients from the signal, and conversions between LPC coefficients, reflection coefficients, line spectral frequencies and areas.

sig2lpc()

void sig2lpc(const EST_FVector &sig, EST_FVector &acf, EST_FVector &ref, EST_FVector &lpc)

Produce the full set of linear prediction coefficients from a frame of speech waveform.

Parameters
sig

the frame of input waveform

acf

the autocorrelation coefficients

ref

the reflection coefficients

lpc

the LPC coefficients. The order of the LPC analysis is given as the size of the {\tt lpc} vector minus 1. The coefficients are placed in locations 1 up to the order, and the energy is placed in location 0.
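As a rough illustration of what a function of this kind computes, the sketch below runs the textbook autocorrelation method followed by the Levinson-Durbin recursion on plain `std::vector<float>` data. It is not the library's implementation: the names `autocorrelate` and `levinson_durbin` are hypothetical, and the sign convention (the sample is predicted as the sum of `a[k] * x[t-k]`) is an assumption. Only the output layout, with the energy in location 0 and the coefficients from location 1, follows the description above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Autocorrelation of an already windowed frame for lags 0..order.
std::vector<float> autocorrelate(const std::vector<float> &sig, std::size_t order)
{
    std::vector<float> acf(order + 1, 0.0f);
    for (std::size_t lag = 0; lag <= order; ++lag)
        for (std::size_t t = lag; t < sig.size(); ++t)
            acf[lag] += sig[t] * sig[t - lag];
    return acf;
}

// Levinson-Durbin recursion (hypothetical helper, assumed sign convention).
// On return: lpc[0] is the residual energy, lpc[1..order] are the predictor
// coefficients and ref[0..order-1] are the reflection coefficients.
void levinson_durbin(const std::vector<float> &acf,
                     std::vector<float> &ref, std::vector<float> &lpc)
{
    assert(!acf.empty());
    const std::size_t order = acf.size() - 1;
    ref.assign(order, 0.0f);
    std::vector<float> prev(order + 1, 0.0f), a(order + 1, 0.0f);
    float err = acf[0];
    for (std::size_t m = 1; m <= order && err > 0.0f; ++m) {
        float acc = acf[m];
        for (std::size_t k = 1; k < m; ++k)
            acc -= prev[k] * acf[m - k];
        const float km = acc / err;          // m-th reflection coefficient
        ref[m - 1] = km;
        a = prev;
        a[m] = km;
        for (std::size_t k = 1; k < m; ++k)
            a[k] = prev[k] - km * prev[m - k];
        err *= (1.0f - km * km);             // prediction error shrinks each step
        prev = a;
    }
    lpc = prev;
    lpc[0] = err;
}
```

The order is taken from the size of the autocorrelation vector, mirroring the way the functions in this section derive it from vector sizes rather than from an explicit argument.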

lpc2cep()

void lpc2cep(const EST_FVector &lpc, EST_FVector &cep)

Calculate cepstral coefficients from lpc coefficients.

It is possible to calculate a set of cepstral coefficients from lpc coefficients using the relationship:

The order of the cepstral analysis can be different from the lpc order. If the cepstral order is greater, interpolation is used (FINISH add equation). Both orders are taken from the lengths of the respective vectors. Note that these cepstral coefficients take on the assumptions (and errors) of the lpc model and hence will not be the same as cepstral coefficients calculated using DFT functions.

Parameters
lpc

the LPC coefficients (input)

cep

the cepstral coefficients (output)
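A minimal sketch of a widely used LPC-to-cepstrum recursion is given below. It assumes an all-pole model 1 / (1 - sum of `a[k]` z^-k) and the layout used elsewhere in this section (energy in location 0, coefficients from location 1); the function name `lpc_to_cepstrum` is hypothetical, and the library may use a different sign convention or treat cepstral orders above the LPC order differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook recursion (assumed convention, not taken from the library):
//   c_n = a_n + (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},  with a_n = 0 for n > p.
// lpc[0] is ignored (energy); lpc[1..p] are the predictor coefficients.
// The returned vector holds c_1 .. c_cep_order.
std::vector<float> lpc_to_cepstrum(const std::vector<float> &lpc, std::size_t cep_order)
{
    assert(!lpc.empty());
    const std::size_t p = lpc.size() - 1;
    std::vector<float> c(cep_order + 1, 0.0f);   // c[0] is unused
    for (std::size_t n = 1; n <= cep_order; ++n) {
        float sum = 0.0f;
        const std::size_t k_min = (n > p) ? n - p : 1;   // keep n - k within 1..p
        for (std::size_t k = k_min; k < n; ++k)
            sum += static_cast<float>(k) * c[k] * lpc[n - k];
        c[n] = sum / static_cast<float>(n) + (n <= p ? lpc[n] : 0.0f);
    }
    return std::vector<float>(c.begin() + 1, c.end());
}
```

For a first order model with coefficient a, this gives c_n = a^n / n, which is the series expansion of -log(1 - a z^-1).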

sig2lpc()

void sig2lpc(const EST_FVector &sig, EST_FVector &lpc)

Produce a set of linear prediction coefficients from a frame of speech waveform. {\tt sig} is the frame of input waveform, and {\tt lpc} are the LPC coefficients. The {\bf order} of the LPC analysis is given as the size of the {\tt lpc} vector minus 1. The coefficients are placed in locations 1 up to the order, and the energy is placed in location 0.

sig2ref()

void sig2ref(const EST_FVector &sig, EST_FVector &ref)

Produce a set of reflection coefficients from a frame of speech waveform. {\tt sig} is the frame of input waveform, and {\tt ref} are the reflection coefficients. The {\bf order} of the analysis is derived from the size of the {\tt ref} vector.

Area Functions

Using the analogy of the lossless tube, the cross-sectional areas of the sections of this tube are related to the reflection coefficients and can be calculated from the following relationship:

ref2truearea()
void ref2truearea(const EST_FVector &ref, EST_FVector &area)

The area according to the formula

ref2area()
void ref2area(const EST_FVector &ref, EST_FVector &area)

An approximation of the area is calculated by skipping the denominator in the formula

ref2logarea()
void ref2logarea(const EST_FVector &ref, EST_FVector &logarea)

The logs of the areas

lpc2ref()

void lpc2ref(const EST_FVector &lpc, EST_FVector &ref)

Calculate the reflection coefficients from the lpc coefficients. Note that in the standard linear prediction analysis, the reflection coefficients are generated as a by-product. See \Ref{sig2lpc}.

ref2lpc()

void ref2lpc(const EST_FVector &ref, EST_FVector &lpc)

Calculate the linear prediction coefficients from the reflection coefficients. Use the equation:

lpc2lsf()

void lpc2lsf(const EST_FVector &lpc, EST_FVector &lsf)

Calculate line spectral frequencies from linear prediction coefficients. Use the equation:

lsf2lpc()

void lsf2lpc(const EST_FVector &lsf, EST_FVector &lpc)

Calculate linear prediction coefficients from line spectral frequencies. Use the equation:

Energy and power frame functions

sig2pow()

void sig2pow(EST_FVector &frame, float &power)

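Calculate the power for a frame of speech. This is defined as the mean of the squared sample values:

\[ \mathrm{power} = \frac{1}{N} \sum_{i=0}^{N-1} a_i^2 \]

where $a_i$ is the $i$th sample of the frame and $N$ is the number of samples in the frame.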

sig2rms()

void sig2rms(EST_FVector &frame, float &rms_energy)

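Calculate the root mean square energy for a frame of speech. This is defined as the square root of the mean of the squared sample values:

\[ \mathrm{rms} = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} a_i^2} \]

where $a_i$ is the $i$th sample of the frame and $N$ is the number of samples in the frame.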
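The two frame measures in this section can be sketched on a plain `std::vector<float>` as follows. The function names `frame_power` and `frame_rms` are hypothetical stand-ins, and the definitions assumed are the usual ones: power as the mean of the squared samples, and rms energy as its square root.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean of the squared samples of one frame (assumed definition of power).
float frame_power(const std::vector<float> &frame)
{
    assert(!frame.empty());
    double sum = 0.0;
    for (float s : frame)
        sum += static_cast<double>(s) * s;
    return static_cast<float>(sum / static_cast<double>(frame.size()));
}

// Root mean square energy: the square root of the frame power.
float frame_rms(const std::vector<float> &frame)
{
    return std::sqrt(frame_power(frame));
}
```

Like the library functions, these take a single frame; windowing and framing of the waveform are assumed to have been done by the caller.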

Fast Fourier Transform functions

These are the low level functions where the actual FFT is performed. Both slow and fast implementations are available for historical reasons. They have identical functionality. At this time, vectors of complex numbers are handled as pairs of vectors of real and imaginary numbers.

What is a Fourier Transform ?

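The Fourier transform of a signal gives us a frequency-domain representation of a time-domain signal. In discrete time, the Fourier Transform is called a Discrete Fourier Transform (DFT) and is given by:

\[ y_k = \sum_{t=0}^{n-1} x_t \, \omega_n^{kt}, \qquad k = 0, \ldots, n-1 \]

where $y = (y_0, y_1, \ldots, y_{n-1})$ is the DFT (of order $n$) of the signal $x = (x_0, x_1, \ldots, x_{n-1})$, and where $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$ are the $n$ complex $n$th roots of 1.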

The Fast Fourier Transform (FFT) is a very efficient implementation of a Discrete Fourier Transform. See, for example, "Introduction to Algorithms" by Thomas H. Cormen, Charles E. Leiserson and Ronald L. Rivest (pub. MIT Press), or any signal processing textbook.
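For reference, a direct evaluation of the DFT sum can be sketched as below. This is the O(n^2) definition rather than an FFT, the name `naive_dft` is hypothetical, and the choice of root of unity exp(-2*pi*i/n) (the usual signal processing convention) is an assumption. The in-place interface with separate real and imaginary vectors mirrors the way complex data is described in this section.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Direct O(n^2) DFT, overwriting the inputs with the transform.
// Assumed convention: y_k = sum_t x_t * exp(-2*pi*i*k*t/n).
void naive_dft(std::vector<float> &real, std::vector<float> &imag)
{
    assert(real.size() == imag.size());
    const std::size_t n = real.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> y(n);   // zero-initialised accumulators
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t) {
            // Reduce k*t modulo n so the angle stays small and accurate.
            const double angle =
                -two_pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            y[k] += std::complex<double>(real[t], imag[t]) * std::polar(1.0, angle);
        }
    for (std::size_t k = 0; k < n; ++k) {
        real[k] = static_cast<float>(y[k].real());
        imag[k] = static_cast<float>(y[k].imag());
    }
}
```

Unlike the FFT routines, this sketch works for any length, but its cost grows quadratically, which is the reason an FFT is used in practice.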

slowFFT()

int slowFFT(EST_FVector &real, EST_FVector &imag)

Basic in-place FFT.

There's no point actually using this - use \Ref{fastFFT} instead. However, the code in this function closely matches the classic FORTRAN version given in many text books, so is at least easy to follow for new users.

The length of real and imag must be the same, and must be a power of 2 (e.g. 128).

FFT()

inline int FFT(EST_FVector &real, EST_FVector &imag)

Alternate name for slowFFT

slowIFFT()

int slowIFFT(EST_FVector &real, EST_FVector &imag)

Basic in-place inverse FFT.

IFFT()

inline int IFFT(EST_FVector &real, EST_FVector &imag)

Alternate name for slowIFFT

power_spectrum()

int power_spectrum(EST_FVector &real, EST_FVector &imag)

Power spectrum using the fastFFT function. The power spectrum is simply the squared magnitude of the FFT. On return, the real and imaginary parts are both set equal to the power spectrum (you only need one of them!).

power_spectrum_slow()

int power_spectrum_slow(EST_FVector &real, EST_FVector &imag)

Power spectrum using the slowFFT function

fastFFT()

int fastFFT(EST_FVector &invec)

Fast FFT. An optimised implementation by Tony Robinson, to be used in preference to slowFFT.

Frame based filter bank and cepstral analysis

These functions are \Ref{Frame based signal processing functions}.

sig2fbank()

void sig2fbank(const EST_FVector &sig, EST_FVector &fbank_frame, const float sample_rate, const bool use_power_rather_than_energy, const bool take_log)

Calculate the (log) energy (or power) in each channel of a Mel scale filter bank for a frame of speech. The filters are triangular, are evenly spaced and are all of equal width, on a Mel scale. The upper and lower cutoffs of each filter are at the centre frequencies of the adjacent filters. The Mel scale is described under {\tt Hz2Mel}.

sig2fft()

void sig2fft(const EST_FVector &sig, EST_FVector &fft_vec, const bool use_power_rather_than_energy)

Calculate the energy (or power) spectrum of a frame of speech. The FFT order is determined by the number of samples in the frame of speech, and is a power of 2. Note that the FFT vector returned corresponds to frequencies from 0 to half the sample rate. Energy is the magnitude of the FFT; power is the squared magnitude.

fft2fbank()

void fft2fbank(const EST_FVector &fft_frame, EST_FVector &fbank_vec, const float Hz_per_fft_coeff, const EST_FVector &mel_fbank_frequencies)

Given a Mel filter bank description, bin the FFT coefficients to compute the output of the filters. The first and last elements of {\tt mel_fbank_frequencies} define the lower and upper bound of the first and last filters respectively and the intervening elements give the filter centre frequencies. That is, {\tt mel_fbank_frequencies} has two more elements than {\tt fbank_vec}.

fbank2melcep()

void fbank2melcep(const EST_FVector &fbank_vec, EST_FVector &mfcc, const float liftering_parameter, const bool include_c0 = false)

Compute the discrete cosine transform of log Mel-scale filter bank output to get the Mel cepstral coefficients for a frame of speech. Optional liftering (filtering in the cepstral domain) can be applied to normalise the magnitudes of the coefficients. This is useful because, typically, the higher order cepstral coefficients are significantly smaller than the lower ones and it is often desirable to normalise the means and variances across coefficients.

The lifter (cepstral filter) used is:

A typical value of L used in speech recognition is 22. A value of L=0 is taken to mean no liftering. This is equivalent to L=1.
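The sketch below shows one common sinusoidal lifter, w_i = 1 + (L/2) sin(pi i / L). That this is the lifter meant here is an assumption: it is the form widely used in speech recognition front ends, and it is consistent with the remark above, since L = 1 gives w_i = 1 for every integer i. The names `lifter_weight` and `apply_lifter` are hypothetical, as is the convention that element j of the vector holds coefficient c_{j+1}.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weight applied to the i-th cepstral coefficient (assumed sinusoidal lifter).
// L <= 0 is treated as "no liftering", i.e. a weight of 1.
float lifter_weight(std::size_t i, float L)
{
    if (L <= 0.0f)
        return 1.0f;
    const double pi = std::acos(-1.0);
    return static_cast<float>(1.0 + 0.5 * L * std::sin(pi * static_cast<double>(i) / L));
}

// Apply the lifter in place; cep[j] is assumed to hold coefficient c_{j+1}.
void apply_lifter(std::vector<float> &cep, float L)
{
    for (std::size_t j = 0; j < cep.size(); ++j)
        cep[j] *= lifter_weight(j + 1, L);
}
```

With L = 22 the weights rise from about 2.6 at i = 1 to 12 at i = 11, which boosts the small higher order coefficients relative to the low order ones.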

make_mel_triangular_filter()

void make_mel_triangular_filter(const float this_mel_centre, const float this_mel_low, const float this_mel_high, const float Hz_per_fft_coeff, const int half_fft_order, int &fft_index_start, EST_FVector &filter)

Make a triangular Mel scale filter. The filter is centred at {\tt this_mel_centre} and extends from {\tt this_mel_low} to {\tt this_mel_high}. {\tt half_fft_order} is the length of a power/energy spectrum covering 0Hz to half the sampling frequency with a resolution of {\tt Hz_per_fft_coeff}.

The routine returns a vector of weights to be applied to the energy/power spectrum starting at element {\tt fft_index_start}. The number of points (FFT coefficients) covered by the filter is given by the length of the returned vector {\tt filter}.

Frequency conversion functions

These are functions used in \Ref{Filter bank and cepstral analysis}.

Hz2Mel()
float Hz2Mel(float frequency_in_Hertz)

Convert Hertz to Mel. The Mel scale is defined by

Mel2Hz()
float Mel2Hz(float frequency_in_Mel)

Convert Mel to Hertz.
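A minimal sketch of the two conversions is given below, assuming the widely used definition Mel = 2595 log10(1 + f / 700); the exact constants used by the library are not stated in this section, so treat them as an assumption. The helper `mel_spaced_frequencies` is hypothetical and illustrates the layout described for {\tt mel_fbank_frequencies}: two more points than filters, evenly spaced on the Mel scale.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed Mel scale definition: 2595 * log10(1 + f / 700).
float hz_to_mel(float hz)
{
    return 2595.0f * std::log10(1.0f + hz / 700.0f);
}

// Inverse of hz_to_mel.
float mel_to_hz(float mel)
{
    return 700.0f * (std::pow(10.0f, mel / 2595.0f) - 1.0f);
}

// num_filters + 2 frequencies in Hz, evenly spaced on the Mel scale:
// the lower bound, the num_filters centre frequencies, and the upper bound.
std::vector<float> mel_spaced_frequencies(float low_hz, float high_hz,
                                          std::size_t num_filters)
{
    assert(high_hz > low_hz);
    const float low_mel = hz_to_mel(low_hz);
    const float step = (hz_to_mel(high_hz) - low_mel) / static_cast<float>(num_filters + 1);
    std::vector<float> freqs(num_filters + 2);
    for (std::size_t i = 0; i < freqs.size(); ++i)
        freqs[i] = mel_to_hz(low_mel + step * static_cast<float>(i));
    return freqs;
}
```

Because each filter's cutoffs sit at the centres of its neighbours, a single list of boundary and centre frequencies is enough to describe the whole filter bank.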

Functions for Generating Tracks

Functions which operate on a whole waveform and generate coefficients for a track.

Functions for use with frame based processing

In the following functions, the input is a \Ref{EST_Wave} waveform, and the output is a (usually multi-channel) \Ref{EST_Track}. The track must be set up appropriately beforehand. This means the track must be resized accordingly with the correct number of frames and channels.

The positions of the frames are found by examination of the {\bf time} array in the EST_Track, which must be filled prior to the function call. The usual requirement is for fixed frame analysis, where each analysis frame is, say, 10ms after the previous one.

A common alternative is to perform pitch-synchronous analysis where the time shift is related to the local pitch period.

sig2coef()

void sig2coef(EST_Wave &sig, EST_Track &a, EST_String type, float factor = 2.0, EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME))

Produce a single set of coefficients from a waveform. The type of coefficient required is given in the argument type. Possible types are:

lpc: linear predictive coding
cep: cepstrum coding from lpc coefficients
melcep: Mel scale cepstrum coding via fbank
fbank: Mel scale log filterbank analysis
lsf: line spectral frequencies
ref: linear prediction reflection coefficients
power
f0: srpd algorithm
energy: root mean square energy

The order of the analysis is calculated from the number of channels in a. The positions of the analysis windows must be given by filling in the track's time array.

This function windows the waveform at the intervals given by the track time array. The length of each window is factor times the local time shift. The windowing function is given by wf.

Parameters
sig

input waveform

a

output coefficients. These must have been pre-allocated; the number of channels in a indicates the order of the analysis.

type

the types of coefficients to be produced. "lpc", "cep" etc

factor

the frame length factor, i.e. the analysis frame length will be this times the local pitch period.

wf

function for windowing. See \Ref{Windowing mechanisms}

sigpr_base()

void sigpr_base(EST_Wave &sig, EST_Track &fv, EST_Features &op, const EST_StrList &slist)

Produce multiple coefficients from a waveform by repeated calls to sig2coef.

Parameters
sig

input waveform

fv

output coefficients. These must have been pre-allocated; the number of channels in fv indicates the order of the analysis.

op

Features structure containing options for analysis order, frame shift etc.

slist

list of types of coefficients required, from the set of possible types that sig2coef can take.

power()

void power(EST_Wave &sig, EST_Track &a, float factor)

Calculate the power for each frame of the waveform.

Parameters
sig

input waveform

a

output power track

factor

the frame length factor, i.e. the analysis frame length will be this times the local pitch period.

energy()

void energy(EST_Wave &sig, EST_Track &a, float factor)

Calculate the rms energy for each frame of the waveform.

This function calls \Ref{sig2rms}

Parameters
sig

input waveform

a

output coefficients

factor

optional: the frame length factor, i.e. the analysis frame length will be this times the local pitch period.

fbank()

void fbank(EST_Wave &sig, EST_Track &fbank, const float factor, EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME), const bool up = false, const bool take_log = true)

Mel scale filter bank analysis. The Mel scale triangular filters are computed via an FFT (see \Ref{fastFFT}). This routine is required for Mel cepstral analysis (see \Ref{melcep}). The analysis of each frame is done by \Ref{sig2fbank}.

A typical filter bank analysis for speech recognition might use log energy outputs from 20 filters.

Parameters
sig

input waveform

fbank

the output. The number of filters is determined from the number of channels in this track.

factor

the frame length factor, i.e. the analysis frame length will be this times the local pitch period

wf

function for windowing. See \Ref{Windowing mechanisms}

up

whether the filterbank analysis should use power rather than energy.

take_log

whether to take logs of the filter outputs

melcep()

void melcep(EST_Wave &sig, EST_Track &mfcc_track, float factor, int fbank_order, float liftering_parameter, EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME), const bool include_c0 = false, const bool up = false)

Mel scale cepstral analysis via filter bank analysis. Cepstral parameters are computed for each frame of speech. The analysis requires \Ref{fbank}. The cepstral analysis of the filterbank outputs is performed by \Ref{fbank2melcep}.

A typical Mel cepstral coefficient (MFCC) analysis for speech recognition might use 12 cepstral coefficients computed from a 20 channel filterbank.

Parameters
sig

input: waveform

mfcc_track

the output

factor

the frame length factor, i.e. the analysis frame length will be this times the local pitch period

fbank_order

the number of Mel scale filters used for the analysis

liftering_parameter

for filtering in the cepstral domain. See \Ref{fbank2melcep}

wf

function for windowing. See \Ref{Windowing mechanisms}

include_c0

whether the zero'th cepstral coefficient is to be included

up

whether the filterbank analysis should use power rather than energy.

Delta and Acceleration coefficients

Produce delta and acceleration coefficients from a set of coefficients or the waveform.

delta()

void delta(EST_Track &tr, EST_Track &d, int regression_length = 3)

Produce a set of delta coefficients for a track.

The delta function is used to produce a set of coefficients which estimate the rate of change of a set of parameters. The output track d must be set up beforehand, i.e. it must have the same number of frames and channels as tr.

Parameters
tr

input track of base coefficients

d

output track of delta coefficients.

regression_length

number of previous frames over which the delta estimation is calculated.
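One way to estimate a rate of change over a short history of frames is a least-squares line fit, sketched below for a single channel. This is an illustration only: the name `delta_regression` is hypothetical, and whether the library fits a regression line in exactly this way, whether the window includes the current frame, and how the first few frames are handled are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For each frame i, fit a straight line to the last `regression_length`
// values ending at frame i and return its slope (change per frame).
// Frames with fewer than two points available get a delta of 0.
std::vector<float> delta_regression(const std::vector<float> &x,
                                    std::size_t regression_length)
{
    assert(regression_length >= 2);
    std::vector<float> d(x.size(), 0.0f);
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t n = std::min(i + 1, regression_length);
        if (n < 2)
            continue;
        double sj = 0.0, sy = 0.0, sjj = 0.0, sjy = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            const double y = x[i + 1 - n + j];   // j-th point of the window
            const double t = static_cast<double>(j);
            sj += t;
            sy += y;
            sjj += t * t;
            sjy += t * y;
        }
        const double nn = static_cast<double>(n);
        d[i] = static_cast<float>((nn * sjy - sj * sy) / (nn * sjj - sj * sj));
    }
    return d;
}
```

The slope is expressed per frame; dividing by the frame shift in seconds would give a rate per second. Applying the same operation to the delta track gives acceleration coefficients.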

sigpr_delta()

void sigpr_delta(EST_Wave &sig, EST_Track &fv, EST_Features &op, const EST_StrList &slist)

Produce multiple sets of delta coefficients from a waveform.

Calculate specified types of delta coefficients. This function is used when the base types of coefficients haven't been calculated. This function calls sig2coef to calculate the base types from which the deltas are calculated, and hence the requirements governing the setup of fv for sig2coef also hold here.

Parameters
sig

input waveform

fv

output coefficients. These must have been pre-allocated; the number of channels in fv indicates the order of the analysis.

op

Features structure containing options for analysis order, frame shift etc.

slist

list of types of delta coefficients required.

sigpr_acc()

void sigpr_acc(EST_Wave &sig, EST_Track &fv, EST_Features &op, const EST_StrList &slist)

Produce multiple sets of acceleration coefficients from a waveform.

Calculate specified types of acceleration coefficients. This function is used when the base types of coefficient haven't been calculated. This function calls sig2coef to calculate the base types from which the deltas are calculated, and hence the requirements governing the setup of fv for sig2coef also hold here.

Parameters
sig

input waveform

fv

output coefficients. These must have been pre-allocated; the number of channels in fv indicates the order of the analysis.

op

Features structure containing options for analysis order, frame shift etc.

slist

list of types of acceleration coefficients required.

Pitch/F0 Detection Algorithm functions

These functions are used to produce a track of fundamental frequency (F0) against time of a waveform.

pda()

void pda(EST_Wave &sig, EST_Track &fz, EST_Features &op, EST_String method="")

Top level pitch (F0) detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point.

At present, only the \Ref{srpd} pitch tracker is implemented, so this is always called regardless of what method is set to.

Parameters
sig

input waveform

fz

output f0 contour

op

parameters for pitch tracker

method

pda method to be used.

icda()

void icda(EST_Wave &sig, EST_Track &fz, EST_Track &speech, EST_Option &op, EST_String method = "")

Top level intonation contour detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point. {\tt icda} differs from \Ref{pda} in that the contour is smoothed, and unvoiced portions have interpolated F0 values.

Parameters
sig

input waveform

fz

output f0 contour

speech

Interpolation is controlled by the {\tt speech} track. When a point has a positive value in the speech track, it is a candidate for interpolation.

op

parameters for pitch tracker

method

pda method to be used.

default_pda_options()

void default_pda_options(EST_Features &al)

Create a set of sensible defaults for use in pda and icda.

srpd()

void srpd(EST_Wave &sig, EST_Track &fz, EST_Features &options)

Super resolution pitch tracker.

srpd is a pitch detection algorithm that produces a fundamental frequency contour from a speech waveform. At present only the super resolution pitch determination algorithm is implemented. See (Medan, Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed description of the algorithm.

Frames of data are read in from sig in chronological order such that each frame is shifted in time from its predecessor by pda_frame_shift. Each frame is analysed in turn.

The maximum and minimum signal amplitudes are initially found over the duration of two segments, each of length N_min samples. If the sum of their absolute values is below two times noise_floor, the frame is classified as representing silence and no coefficients are calculated. Otherwise, a cross correlation coefficient is calculated for all n from a period in samples corresponding to min_pitch to a period in samples corresponding to max_pitch, in steps of decimation_factor. In calculating the coefficient only one in decimation_factor samples of the two segments are used. Such down-sampling permits rapid estimates of the coefficients to be calculated over the range N_min <= n <= N_max. This results in a cross-correlation track for the frame being analysed.

Local maxima of the track with a coefficient value above a specified threshold form candidates for the fundamental period. The threshold is adaptive and dependent upon the values v2uv_coef_thresh, min_v2uv_coef_thresh, and v2uv_coef_thresh_ratio. If the previously analysed frame was classified as unvoiced or silent (which is the initial state) then the threshold is set to v2uv_coef_thresh. Otherwise, the previous frame was classified as being voiced, and the threshold is set equal to v2uv_coef_thresh_ratio times the cross-correlation coefficient value at the point of the previous fundamental period in the former coefficients track. This product is not permitted to drop below v2uv_coef_thresh.

If no candidates for the fundamental period are found, the frame is classified as being unvoiced. Otherwise, the candidates are further processed to identify the most likely true pitch period. During this additional processing, a threshold given by anti_doubling_thresh is used.

If the peak_tracking flag is set to true, biasing is applied to the cross-correlation track as described in (Bagshaw et al., 1993).
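Two steps of the description above, the silence test and the decimated cross-correlation coefficient, can be sketched as follows. The function names are hypothetical, the coefficient is assumed to be the normalised cross-correlation between two adjacent segments of n samples, and the sketch leaves out the thresholding, candidate selection and peak tracking stages.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Silence test: the sum of the absolute values of the maximum and minimum
// amplitudes must reach two times noise_floor for the frame to be analysed.
bool is_silent(const std::vector<float> &samples, float noise_floor)
{
    assert(!samples.empty());
    const auto mm = std::minmax_element(samples.begin(), samples.end());
    return std::fabs(*mm.first) + std::fabs(*mm.second) < 2.0f * noise_floor;
}

// Normalised cross-correlation (assumed form) between x[0..n) and x[n..2n),
// using only one sample in every `decimation` samples.
float cross_correlation(const std::vector<float> &x, std::size_t n,
                        std::size_t decimation)
{
    assert(decimation >= 1 && n >= 1 && x.size() >= 2 * n);
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t j = 0; j < n; j += decimation) {
        const double a = x[j], b = x[j + n];
        xy += a * b;
        xx += a * a;
        yy += b * b;
    }
    if (xx <= 0.0 || yy <= 0.0)
        return 0.0f;   // degenerate segment: report no correlation
    return static_cast<float>(xy / std::sqrt(xx * yy));
}
```

Evaluating `cross_correlation` for every candidate period between the shortest and longest permitted pitch periods yields the cross-correlation track whose local maxima are then compared against the adaptive threshold.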

Parameters
sig

input waveform

op

options regarding pitch tracking parameters

op.min_pitch

minimum permitted F0 value

op.max_pitch

maximum permitted F0 value

op.pda_frame_shift

analysis frame shift

op.pda_frame_length

analysis frame length

op.lpf_cutoff

cut off frequency for low pass filtering

op.lpf_order

order of low pass filtering (must be odd)

op.decimation

op.noise_floor

op.min_v2uv_coef_thresh

op.v2uv_coef_thresh_ratio

op.v2uv_coef_thresh

op.anti_doubling_thresh

op.peak_tracking

smooth_phrase()

void smooth_phrase(EST_Track &c, EST_Track &speech, EST_Features &options, EST_Track &sm)

Smooth selected parts of an f0 contour. Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation.

smooth_portion()

void smooth_portion(EST_Track &c, EST_Option &op)

Smooth all the points in an F0 contour

Pitchmarking Functions

Pitchmarking involves finding some pre-defined pitch related instant for every pitch period in the speech. At present, only functions for analysing laryngograph waveforms are available - the much harder problem of doing this on actual speech has not been attempted.

pitchmark()

EST_Track pitchmark(EST_Wave &lx, EST_Features &op)

Find pitchmarks in Laryngograph (lx) signal.

This high level function places a pitchmark on each positive peak in the voiced portions of the lx signal. Pitchmarks are stored in the time component of an EST_Track object and returned. The function works by high and low pass filtering the signal, using forward and backward filtering to remove phase shift. The negative-going zero crossings in the smoothed, differentiated signal, which correspond to peaks in the original, are then chosen.

Parameters
lx

laryngograph waveform

op

options, mainly for filter control:

\begin{itemize}

\item {\bf lx_low_frequency} low pass cut off for lx filtering: typical value {\tt 400}

\item {\bf lx_low_order} order of low pass lx filter: typical value 19

\item {\bf lx_high_frequency} high pass cut off for lx filtering: typical value 40

\item {\bf lx_high_order} order of high pass lx filter: typical value 19

\item {\bf median_order} order of median smoother used to smooth the differentiated lx: typical value 19

\end{itemize}

pitchmark()

EST_Track pitchmark(EST_Wave &lx, int lx_lf, int lx_lo, int lx_hf, int lx_ho, int mo, int debug = 0)

Find pitchmarks in Laryngograph (lx) signal. The function is the same as \Ref{pitchmark} but with more explicit control over the parameters.

Parameters
lx

laryngograph waveform

lx_lf

low pass cut off for lx filtering : typical value 400

lx_lo

order of low pass lx filter : typical value 19

lx_hf

high pass cut off for lx filtering : typical value 40

lx_ho

order of high pass lx filter : typical value 19

mo

order of median smoother used to smooth the differentiated lx : typical value 19

neg_zero_cross_pick()

void neg_zero_cross_pick(EST_Wave &lx, EST_Track &pm)

Find times where the waveform crosses the zero axis in the negative direction.

Parameters
lx

waveform

pm

pitchmark track which stores time positions of negative crossings
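Picking negative-going zero crossings can be sketched on a plain sample vector as below. The name `negative_zero_crossings` is hypothetical, and reporting the time of the first negative sample (rather than interpolating the exact crossing instant) is an assumption made for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Return the times (in seconds) at which the signal goes from a
// non-negative sample to a negative one.
std::vector<float> negative_zero_crossings(const std::vector<float> &x,
                                           float sample_rate)
{
    assert(sample_rate > 0.0f);
    std::vector<float> times;
    for (std::size_t i = 1; i < x.size(); ++i)
        if (x[i - 1] >= 0.0f && x[i] < 0.0f)
            times.push_back(static_cast<float>(i) / sample_rate);
    return times;
}
```

Applied to a smoothed, differentiated signal, these crossings mark the points where the slope changes from positive to negative, i.e. the peaks of the signal before differentiation.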

pm_fill()

void pm_fill(EST_Track &pm, float new_end, float max, float min, float def)

Produce a set of sensible pitchmarks.

Given a set of raw pitchmarks, this function makes sure no pitch period is shorter than {\tt min} seconds and no longer than {\tt max} seconds. Periods that are too short are eliminated. If a period is too long, extra pitchmarks are inserted whose period is {\it approximately} {\tt def} seconds in duration. The approximation is to ensure that the pitch period in the interval, D, is constant, and so the actual pitch period is given by

pm_min_check()

void pm_min_check(EST_Track &pm, float min)

Remove pitchmarks which are too close together.

This doesn't work in a particularly sophisticated way, in that it removes a sequence of too close pitchmarks left to right, and doesn't attempt to find which ones in the sequence are actually spurious.
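A simple left-to-right pass of this kind might look like the sketch below. It is one plausible reading of the description rather than the library's code: the name `remove_close_marks` is hypothetical, and measuring each gap from the last pitchmark that was kept is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Keep the first pitchmark, then keep each later one only if it lies at
// least `min` seconds after the most recently kept pitchmark.
std::vector<float> remove_close_marks(const std::vector<float> &pm, float min)
{
    std::vector<float> kept;
    for (float t : pm)
        if (kept.empty() || t - kept.back() >= min)
            kept.push_back(t);
    return kept;
}
```

As the text notes, a pass like this always favours the earlier pitchmark of a close pair, whether or not it is the spurious one.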

Spectrogram generation

raw_spectrogram()

void raw_spectrogram(EST_Track &sp, EST_Wave &sig, float length, float shift, int order, bool slow=0)

Compute the power spectrogram.

scale_spectrogram()

void scale_spectrogram(EST_Track &s, float range, float b, float w)

Manipulate the spectrogram to

Functions for Windowing Frames of Waveforms

The EST_Window class provides functions for the creation and use of signal processing windows.

Signal processing algorithms often work on small sections of the speech waveform known as {\em frames}. A full signal must first be divided into these frames before these algorithms can work. While it would be simple to just "cut out" the required frames from the waveforms, this is usually undesirable as large discontinuities can occur at the frame edges. Instead it is customary to cut out the frame by means of a {\em window} function, which tapers the signal in the frame so that it has high values in the middle and low or zero values near the frame edges. The \Ref{EST_Window} class provides a wrapper for such windowing operations.

There are several types of window function, including:

\begin{itemize}

\item {\bf Rectangular}, which is used to give a simple copy of the values between the window limits.

\item {\bf Hanning}. The rectangular window can cause sharp discontinuities at window edges. The Hanning window solves this by ensuring that the window edges taper to 0.

\item {\bf Hamming}. The Hanning window causes considerable energy loss, which the Hamming window attempts to rectify.

\end{itemize}

The particular choice of window depends on the application. For instance, in most speech synthesis applications Hanning windows are the most suitable as they don't have time domain discontinuities. For analysis applications, Hamming windows are normally used.
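The difference between the window types can be seen from the textbook raised-cosine formulas, sketched below on plain vectors. The function names are hypothetical and the exact sample positions used by the library's windows may differ; the coefficients 0.5 (Hanning) and 0.54 (Hamming) are the standard ones.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Raised-cosine window: w[i] = alpha - (1 - alpha) * cos(2*pi*i / (size - 1)).
std::vector<float> raised_cosine_window(std::size_t size, double alpha)
{
    if (size < 2)
        return std::vector<float>(size, 1.0f);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<float> w(size);
    for (std::size_t i = 0; i < size; ++i)
        w[i] = static_cast<float>(
            alpha - (1.0 - alpha) * std::cos(two_pi * static_cast<double>(i) /
                                             static_cast<double>(size - 1)));
    return w;
}

// Hanning tapers fully to 0 at the edges; Hamming stops at 0.08.
std::vector<float> hanning_window(std::size_t size) { return raised_cosine_window(size, 0.5); }
std::vector<float> hamming_window(std::size_t size) { return raised_cosine_window(size, 0.54); }

// Multiply a frame by a window of the same length, in place.
void apply_window(std::vector<float> &frame, const std::vector<float> &window)
{
    assert(frame.size() == window.size());
    for (std::size_t i = 0; i < frame.size(); ++i)
        frame[i] *= window[i];
}
```

The small non-zero edge value of the Hamming window is what lets it retain slightly more of the frame's energy than the Hanning window, at the cost of a small discontinuity at the frame boundary.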

For example code, see \Ref{Windowing}

Func;

typedef EST_WindowFunc Func

A function which creates a window

Functions for making windows.

make_window()

static void make_window(EST_TBuffer<float> &window_vals, int size, const char *name)

Make a buffer containing a window function of the specified type.

make_window()

static void make_window(EST_FVector &window_vals, int size, const char *name)

Make an EST_FVector containing a window function of the specified type.

creator()

static Func* creator(const char *name, bool report_error = false)

Return the creation function for the given window type.

Performing windowing on a section of speech.

window_signal()

static void window_signal(const EST_Wave &sig, EST_WindowFunc *make_window, int start, int size, EST_TBuffer<float> &frame)

Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The windowing function required is given as a function pointer {\tt *make_window} which has already been created by a function such as \Ref{creator}. The output windowed frame is placed in the buffer {\tt frame} which will have been resized accordingly within the function.

window_signal()

static void window_signal(const EST_Wave &sig, EST_WindowFunc *make_window, int start, int size, EST_FVector &frame, int resize=0)

Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The windowing function required is given as a function pointer {\tt *make_window} which has already been created by a function such as \Ref{creator}. The output windowed frame is placed in the EST_FVector {\tt frame}. By default, it is assumed that this is already the correct size (i.e. {\tt size} samples long), but if resizing is required the last argument should be set to 1.

window_signal()

static void window_signal(const EST_Wave &sig, const EST_String &window_name, int start, int size, EST_FVector &frame, int resize=0)

Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The windowing function required is given as a string: this function will make a temporary window of this type. The output windowed frame is placed in the EST_FVector {\tt frame}. By default, it is assumed that this is already the correct size (i.e. {\tt size} samples long), but if resizing is required the last argument should be set to 1.

window_signal()

static void window_signal(const EST_Wave &sig, EST_TBuffer<float> &window_vals, int start, int size, EST_FVector &frame, int resize=0)

Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The window shape required is given as an array of floats. The output windowed frame is placed in the EST_FVector {\tt frame}. By default, it is assumed that this is already the correct size (i.e. {\tt size} samples long), but if resizing is required the last argument should be set to 1.
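The operation shared by the window_signal() variants above can be sketched without the library. The names below (make_hamming, window_section) and the use of std::vector in place of EST_Wave and EST_FVector are assumptions made for illustration, as is the choice to treat samples outside the waveform as zero; this is not the library's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a window-making function: returns a
// Hamming window of `size` points (empty if size is not positive).
std::vector<float> make_hamming(int size)
{
    const double pi = 3.14159265358979323846;
    std::vector<float> window;
    if (size <= 0)
        return window;
    window.resize(size, 1.0f);
    for (int i = 0; size > 1 && i < size; ++i)
        window[i] = static_cast<float>(
            0.54 - 0.46 * std::cos(2.0 * pi * i / (size - 1)));
    return window;
}

// Multiply window.size() samples of `sig`, beginning at `start`, by the
// window. Positions outside the signal contribute zero (an assumption).
std::vector<float> window_section(const std::vector<float> &sig,
                                  const std::vector<float> &window,
                                  int start)
{
    std::vector<float> frame(window.size(), 0.0f);
    for (std::size_t i = 0; i < window.size(); ++i) {
        const long pos = static_cast<long>(start) + static_cast<long>(i);
        if (pos >= 0 && pos < static_cast<long>(sig.size()))
            frame[i] = sig[pos] * window[i];
    }
    return frame;
}
```

Building the window once and reusing it for every frame mirrors the variant that takes precomputed window values.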

Utility window functions.

description()

static EST_String description(const char *name)

Return the description for a given window type.

options_supported()

static EST_String options_supported(void)

Return a paragraph describing the available windows.

options_short()

static EST_String options_short(void)

Return a comma separated list of the available window types.

Filter functions

A filter modifies a waveform by changing its frequency characteristics. The following types of filter are currently supported:

FIR filters

FIR filters are general purpose finite impulse response filters which are useful for band-pass, low-pass and high-pass filtering.

Linear Prediction filters

are used to produce LP residuals from waveforms and vice versa

Pre Emphasis filters

are simple filters for changing the spectral tilt of a signal

Non linear filters

Miscellaneous filters

FIR filters

Finite impulse response (FIR) filters are useful for band-pass, low-pass and high-pass filtering.

FIR filters perform the following operation:

\[ y_t = \sum_{i=0}^{O-1} c_i \, x_{t-i} \]

where $O$ is the filter order, $c_i$ are the filter coefficients, $x_t$ is the input at time $t$ and $y_t$ is the output at time $t$. Functions are provided for designing the filter (i.e. finding the coefficients).
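As a minimal sketch of this operation, assuming std::vector stands in for the library's wave and vector classes and that input samples before time 0 are zero, each output sample is a weighted sum of the current and preceding inputs. The function name fir_apply is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR: y[t] = c[0]*x[t] + c[1]*x[t-1] + ... + c[O-1]*x[t-O+1].
// Input samples before t = 0 are taken to be zero (an assumption).
std::vector<float> fir_apply(const std::vector<float> &x,
                             const std::vector<float> &c)
{
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t t = 0; t < x.size(); ++t) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < c.size() && i <= t; ++i)
            acc += c[i] * x[t - i];
        y[t] = acc;
    }
    return y;
}
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a convenient check on any implementation.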

FIRfilter()

void FIRfilter(EST_Wave &in_sig, const EST_FVector &numerator, int delay_correction=0)

General purpose FIR filter. This function filters the waveform {\tt in_sig} in place with a previously designed filter, given as {\tt numerator}. The filter coefficients can be designed using one of the design functions, e.g. \Ref{design_FIR_filter}.

FIRfilter()

void FIRfilter(const EST_Wave &in_sig, EST_Wave &out_sig, const EST_FVector &numerator, int delay_correction=0)

General purpose FIR filter. This function filters the waveform {\tt in_sig} with a previously designed filter, given as {\tt numerator}, and places the result in {\tt out_sig}. The filter coefficients can be designed using one of the design functions, e.g. \Ref{design_FIR_filter}.

FIR_double_filter()

void FIR_double_filter(EST_Wave &in_sig, EST_Wave &out_sig, const EST_FVector &numerator)

General purpose FIR double (zero-phase) filter. This function double filters the waveform {\tt in_sig} with a previously designed filter, given as {\tt numerator}. The filter coefficients can be designed using one of the design functions, e.g. \Ref{design_FIR_filter}. Double filtering is performed by filtering the signal normally, reversing the waveform, filtering again and reversing the waveform again. Normal filtering imposes a lag on the signal that depends on the order of the filter. By filtering the signal forwards and backwards, the lags cancel each other out and the output signal is in phase with the input signal.
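The filter, reverse, filter, reverse sequence can be sketched as follows. This is an illustration on std::vector with hypothetical function names (fir_pass, fir_forward_backward), not the library's code, and it assumes samples before the start of the signal are zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One causal FIR pass: y[t] = sum over i of c[i] * x[t - i].
std::vector<float> fir_pass(const std::vector<float> &x,
                            const std::vector<float> &c)
{
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t t = 0; t < x.size(); ++t)
        for (std::size_t i = 0; i < c.size() && i <= t; ++i)
            y[t] += c[i] * x[t - i];
    return y;
}

// Forward pass, reverse, second pass, reverse again. The lag added by
// the first pass is removed by the second, time-reversed pass.
std::vector<float> fir_forward_backward(const std::vector<float> &x,
                                        const std::vector<float> &c)
{
    std::vector<float> y = fir_pass(x, c);
    std::reverse(y.begin(), y.end());
    y = fir_pass(y, c);
    std::reverse(y.begin(), y.end());
    return y;
}
```

With an impulse at sample 5 and coefficients {0.25, 0.5, 0.25}, a single pass moves the peak to sample 6, while the double pass leaves it at sample 5 with a symmetric response around it.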

FIRlowpass_filter()

void FIRlowpass_filter(EST_Wave &sigin, int freq, int order=DEFAULT_FILTER_ORDER)

Quick function for one-off low pass filtering. If repeated lowpass filtering is needed, first design the required filter using \Ref{design_lowpass_FIR_filter}, and then use \Ref{FIRfilter} to do the actual filtering.

Parameters
in_sig

input waveform, which will be overwritten

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

FIRlowpass_filter()

void FIRlowpass_filter(const EST_Wave &in_sig, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)

Quick function for one-off low pass filtering. If repeated lowpass filtering is needed, first design the required filter using \Ref{design_lowpass_FIR_filter}, and then use \Ref{FIRfilter} to do the actual filtering.

Parameters
in_sig

input waveform

out_sig

output waveform

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

FIRhighpass_filter()

void FIRhighpass_filter(EST_Wave &in_sig, int freq, int order)

Quick function for one-off high pass filtering. If repeated highpass filtering is needed, first design the required filter using design_highpass_FIR_filter, and then use FIRfilter to do the actual filtering.

Parameters
in_sig

input waveform, which will be overwritten

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

FIRhighpass_filter()

void FIRhighpass_filter(const EST_Wave &sigin, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)

Quick function for one-off high pass filtering. If repeated highpass filtering is needed, first design the required filter using design_highpass_FIR_filter, and then use FIRfilter to do the actual filtering.

Parameters
in_sig

input waveform

out_sig

output waveform

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

FIRhighpass_double_filter()

void FIRhighpass_double_filter(EST_Wave &sigin, int freq, int order=DEFAULT_FILTER_ORDER)

Quick function for one-off zero phase high pass filtering.

Normal high pass filtering (\Ref{FIRhighpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRhighpass_filter} to achieve the same effect.

Parameters
in_sig

input waveform, which will be overwritten

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

FIRhighpass_double_filter()

void FIRhighpass_double_filter(const EST_Wave &int_sig, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)

Quick function for one-off zero phase high pass filtering.

Normal high pass filtering (\Ref{FIRhighpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRhighpass_filter} to achieve the same effect.

Parameters
in_sig

input waveform

out_sig

output waveform

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

FIRlowpass_double_filter()

void FIRlowpass_double_filter(EST_Wave &sigin, int freq, int order=DEFAULT_FILTER_ORDER)

Quick function for one-off zero phase low pass filtering.

Normal low pass filtering (\Ref{FIRlowpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRlowpass_filter} to achieve the same effect.

Parameters
in_sig

input waveform, which will be overwritten

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

FIRlowpass_double_filter()

void FIRlowpass_double_filter(const EST_Wave &in_sig, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)

Quick function for one-off zero phase low pass filtering.

Normal low pass filtering (\Ref{FIRlowpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRlowpass_filter} to achieve the same effect.

Parameters
in_sig

input waveform

out_sig

output waveform

freq

cutoff frequency in Hertz

order

number of filter coefficients, eg. 99

Linear Prediction filters

The linear prediction filters are used for the analysis and synthesis of waveforms according to the linear prediction all-pole model.

The linear prediction model states that the value of a signal at a given point is equal to a weighted sum of the previous $P$ values, plus a correction value for that point:

\[ s_n = \sum_{i=1}^{P} a_i \, s_{n-i} + e_n \]

Given a set of coefficients $a_i$ and the original signal $s$, we can use this equation to work out $e$, the {\it residual}. Conversely, given the coefficients and the residual signal, an estimate of the original signal can be calculated.

If a single set of coefficients were used for the entire waveform, the filtering process would be simple. It is usual however to have a different set of coefficients for every frame, and there are many possible ways to switch from one coefficient set to another so as not to cause discontinuities at the frame boundaries.

lpc_filter()

void lpc_filter(EST_Wave &sig, EST_FVector &a, EST_Wave &res)

Synthesize a signal from a single set of linear prediction coefficients and the residual values.

Parameters
sig

the waveform to be synthesized

a

a single set of LP coefficients

res

the input residual waveform

inv_lpc_filter()

void inv_lpc_filter(EST_Wave &sig, EST_FVector &a, EST_Wave &res)

Filter the waveform using a single set of coefficients so as to produce a residual signal.

Parameters
sig

the speech waveform to be filtered

a

a single set of LP coefficients

res

the output residual waveform

lpc_filter_1()

void lpc_filter_1(EST_Track &lpc, EST_Wave & res, EST_Wave &sig)

Synthesize a signal from a track of linear prediction coefficients. This function takes a set of LP frames and a residual and produces a synthesized signal.

For each frame, the function picks an end point, which is half-way between the current frame's time position and the next frame's. A start point is defined as being the previous frame's end. Using these two values, a portion of residual is extracted and passed to \Ref{lpc_filter} along with the LP coefficients for that frame. This function writes directly into the signal for the values between start and end.

Parameters
sig

the waveform to be synthesized

lpc

a track of time positioned LP coefficients

res

the input residual waveform
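The way start and end points are chosen for each frame can be sketched as follows. The struct and function names are hypothetical, frame times are assumed to be in seconds, and the handling of the final frame (extending it to the end of the signal) is an assumption, since the text does not say what happens when there is no next frame.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sample range [start, end) that one frame's coefficients are applied to.
struct FrameSpan {
    int start;
    int end;
};

// End of frame i is the midpoint between its time and the next frame's;
// start of frame i is the end of frame i-1 (0 for the first frame).
std::vector<FrameSpan> frame_spans(const std::vector<double> &times,
                                   int sample_rate, int num_samples)
{
    std::vector<FrameSpan> spans;
    int start = 0;
    for (std::size_t i = 0; i < times.size(); ++i) {
        int end = num_samples;  // assumed behaviour for the last frame
        if (i + 1 < times.size()) {
            const double mid = (times[i] + times[i + 1]) / 2.0;
            end = static_cast<int>(std::lround(mid * sample_rate));
        }
        if (end > num_samples)
            end = num_samples;
        spans.push_back({start, end});
        start = end;
    }
    return spans;
}
```

Because each span begins where the previous one ends, every output sample is written exactly once.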

lpc_filter_fast()

void lpc_filter_fast(EST_Track &lpc, EST_Wave & res, EST_Wave &sig)

Synthesize a signal from a track of linear prediction coefficients. This function takes a set of LP frames and a residual and produces a synthesized signal.

This is functionally equivalent to \Ref{lpc_filter_1} except that it reduces the residual by 0.5 before filtering. Importantly, it is about three times faster than \Ref{lpc_filter_1}, but to achieve this it uses direct C buffers rather than the neat C++ access functions. This function should be regarded as temporary and will be deleted after we restructure the low level classes to give better access.

Parameters
sig

the waveform to be synthesized

lpc

a track of time positioned LP coefficients

res

the input residual waveform

inv_lpc_filter_ola()

void inv_lpc_filter_ola(EST_Wave &sig, EST_Track &lpc, EST_Wave &res)

Produce a residual from a track of linear prediction coefficients and a signal using an overlap add technique.

For each frame, the function estimates the local pitch period and picks a start point one period before the current time position and an end point one period after it.

A portion of residual corresponding to these times is then produced using \Ref{inv_lpc_filter}. The resultant section of residual is then overlap-added into the main residual wave object.

Parameters
sig

the speech waveform to be filtered

lpc

a track of time positioned LP coefficients

res

the output residual waveform

Pre/Post Emphasis filters.

These functions adjust the spectral tilt of the input waveform.

pre_emphasis()

void pre_emphasis(EST_Wave &sig, float a=DEFAULT_PRE_EMPH_FACTOR)

Pre-emphasis filtering. This performs simple high pass filtering with a one tap filter of value {\tt a}. Typical values of {\tt a} range between 0.95 and 0.99.

pre_emphasis()

void pre_emphasis(EST_Wave &sig, EST_Wave &out, float a=DEFAULT_PRE_EMPH_FACTOR)

Pre-emphasis filtering. This performs simple high pass filtering with a one tap filter of value {\tt a}. Typical values of {\tt a} range between 0.95 and 0.99.

post_emphasis()

void post_emphasis(EST_Wave &sig, float a=DEFAULT_PRE_EMPH_FACTOR)

Post-emphasis filtering. This performs simple low pass filtering with a one tap filter of value {\tt a}. Typical values of {\tt a} range between 0.95 and 0.99. The same value of {\tt a} should be used when pre- and post-emphasizing the same signal.

post_emphasis()

void post_emphasis(EST_Wave &sig, EST_Wave &out, float a=DEFAULT_PRE_EMPH_FACTOR)

Post-emphasis filtering. This performs simple low pass filtering with a one tap filter of value {\tt a}. Typical values of {\tt a} range between 0.95 and 0.99. The same value of {\tt a} should be used when pre- and post-emphasizing the same signal.

Miscellaneous filters.

Some of these filters are non-linear and therefore don't fit the normal paradigm.

simple_mean_smooth()

void simple_mean_smooth(EST_Wave &c, int n)

Filters the waveform by means of median smoothing.

This is a sort of low pass filter which aims to remove extreme values. Median smoothing works by examining each sample in the wave, taking all the values in a window of size {\tt n} around that sample, sorting them and replacing that sample with the middle-ranking value of the sorted samples.

Parameters
c

waveform to be filtered

n

size of smoothing window
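The median smoothing procedure described above can be sketched as follows. The function name is hypothetical, and two details are assumptions: the window is truncated at the ends of the signal, and for an even number of values the upper of the two middle values is taken.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Replace each sample by the median of the samples in a window of
// about n points centred on it. Returns the input unchanged if n < 1.
std::vector<float> median_smooth_sketch(const std::vector<float> &x, int n)
{
    if (n < 1)
        return x;
    std::vector<float> out(x.size());
    const long len = static_cast<long>(x.size());
    const long half = n / 2;
    for (long i = 0; i < len; ++i) {
        const long lo = std::max(0L, i - half);
        const long hi = std::min(len - 1, i + half);
        // Copy the window so that sorting does not disturb the input.
        std::vector<float> win(x.begin() + lo, x.begin() + hi + 1);
        std::sort(win.begin(), win.end());
        out[i] = win[win.size() / 2];
    }
    return out;
}
```

A single outlier in an otherwise flat signal is removed entirely by a window of three samples, whereas an averaging filter would only spread it out.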

Filter Design

FIR filtering is a two-stage process, first involving design and then the filtering itself. As the design is somewhat costly, it is usually desirable to design a filter outside the main loop.

For one-off filtering operations, functions are provided which design the filter and apply it to the waveform in a single call.

It is impossible to design an ideal filter, i.e. one which exactly obeys the desired frequency response. The "quality" of a filter is given by the order parameter, with higher values giving closer approximations to the desired response. Higher orders are slower. The default is 199, which gives a pretty good filter, but a value as low as 19 is still usable if speed is important.

design_FIR_filter()

EST_FVector design_FIR_filter(const EST_FVector &freq_response, int filter_order)

Create an arbitrary filter of order {\tt filter_order} that attempts to give the frequency response given by {\tt freq_response}. The vector {\tt freq_response} should be of size 2**N and contain a plot of the desired frequency response with values ranging between 0.0 and 1.0. The actual filtering is done by \Ref{FIRfilter}.

design_lowpass_FIR_filter()

EST_FVector design_lowpass_FIR_filter(int sample_rate, int freq, int order)

Design a FIR lowpass filter of order {\tt order} and cut-off frequency {\tt freq}. The filter coefficients are returned in the FVector and should be used in conjunction with \Ref{FIRfilter}.
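To give a feel for what a lowpass design step produces, the sketch below uses the windowed-sinc method with a Hamming window. This is a different, textbook technique chosen for brevity and is not claimed to be how the library designs its filters. The function names are hypothetical, and the cutoff is assumed to lie strictly between 0 and half the sample rate.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Windowed-sinc lowpass design: ideal lowpass impulse response,
// tapered by a Hamming window and scaled to unity gain at 0 Hz.
std::vector<double> windowed_sinc_lowpass(int sample_rate, int freq, int order)
{
    const double pi = 3.14159265358979323846;
    const double fc = static_cast<double>(freq) / sample_rate;  // cycles/sample
    const double mid = (order - 1) / 2.0;
    std::vector<double> h(order > 0 ? order : 0);
    double sum = 0.0;
    for (int i = 0; i < order; ++i) {
        const double t = i - mid;
        const double ideal = (std::fabs(t) < 1e-12)
            ? 2.0 * fc
            : std::sin(2.0 * pi * fc * t) / (pi * t);
        const double w = (order > 1)
            ? 0.54 - 0.46 * std::cos(2.0 * pi * i / (order - 1))
            : 1.0;
        h[i] = ideal * w;
        sum += h[i];
    }
    for (std::size_t i = 0; i < h.size(); ++i)
        h[i] /= sum;
    return h;
}

// Magnitude of the filter's frequency response at a normalized
// frequency f (cycles/sample, so 0.5 is half the sample rate).
double gain_at(const std::vector<double> &h, double f)
{
    const double pi = 3.14159265358979323846;
    double re = 0.0, im = 0.0;
    for (std::size_t i = 0; i < h.size(); ++i) {
        re += h[i] * std::cos(2.0 * pi * f * i);
        im -= h[i] * std::sin(2.0 * pi * f * i);
    }
    return std::sqrt(re * re + im * im);
}
```

Evaluating gain_at() for a few frequencies on either side of the cutoff is a quick way to see how raising or lowering the order sharpens or blurs the transition.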

design_highpass_FIR_filter()

EST_FVector design_highpass_FIR_filter(int sample_rate, int freq, int order)

Design a FIR highpass filter of order {\tt order} and cut-off frequency {\tt freq}. The filter coefficients are returned in the FVector and should be used in conjunction with \Ref{FIRfilter}.