The following set of functions performs either a signal processing operation on a single frame of speech to produce a set of coefficients, or a transformation on an existing set of coefficients to produce a new set. In most cases, the first argument to the function is the input, and the second is the output. It is assumed that any input speech frame has already been windowed with an appropriate windowing function (e.g. Hamming); see \Ref{Windowing mechanisms} on how to produce such a frame.

It is also assumed that the output vector is of the correct size. No resizing is done in these functions, as the incoming vectors may be subvectors of whole tracks etc. In many cases (e.g. LPC analysis), an {\bf order} parameter is required. This is usually derived from the size of the input or output vectors, and hence is not passed explicitly.

Linear Prediction functions

Table of Contents
sig2lpc()
lpc2cep()
sig2lpc()
sig2ref()
Area Functions
ref2truearea()
ref2area()
ref2logarea()
lpc2ref()
ref2lpc()
lpc2lsf()
lsf2lpc()
These functions cover the generation of linear prediction coefficients from the signal, and conversions between LPC coefficients, reflection coefficients, line spectral frequencies and areas.
sig2lpc()
void sig2lpc ( const EST_FVector &sig, EST_FVector &acf, EST_FVector &ref, EST_FVector &lpc)
Produce the full set of linear prediction coefficients from a frame of speech waveform.
Parameters
sig
the frame of input waveform
acf
the autocorrelation coefficients
ref
the reflection coefficients
lpc
the LPC coefficients. The order of the LPC analysis is given as the size of the {\tt lpc} vector - 1. The coefficients are placed in locations 1 to order, and the energy is placed in location 0.
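The three outputs of this overload correspond to the stages of the classic autocorrelation method of linear prediction. The following is a minimal sketch of that method, not the library's implementation: it uses std::vector<float> in place of EST_FVector, the helper names {\tt autocorrelate} and {\tt levinson_durbin} are hypothetical, and the sign convention (predicted sample = sum of lpc[i] times the i-th previous sample) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: autocorrelation of a windowed frame for lags 0..order.
std::vector<float> autocorrelate(const std::vector<float> &sig, std::size_t order)
{
    assert(sig.size() > order);
    std::vector<float> acf(order + 1, 0.0f);
    for (std::size_t lag = 0; lag <= order; ++lag)
        for (std::size_t n = lag; n < sig.size(); ++n)
            acf[lag] += sig[n] * sig[n - lag];
    return acf;
}

// Hypothetical helper: Levinson-Durbin recursion on acf[0..order].
// On return ref[1..order] holds the reflection coefficients,
// lpc[1..order] the predictor coefficients and lpc[0] the residual energy,
// mirroring the layout described above.
// Assumed convention: x^[n] = sum_i lpc[i] * x[n - i].
void levinson_durbin(const std::vector<float> &acf,
                     std::vector<float> &ref, std::vector<float> &lpc)
{
    assert(acf.size() >= 2 && acf[0] > 0.0f);
    const std::size_t order = acf.size() - 1;
    ref.assign(order + 1, 0.0f);
    lpc.assign(order + 1, 0.0f);
    std::vector<float> prev(order + 1, 0.0f);
    float energy = acf[0];

    for (std::size_t i = 1; i <= order; ++i) {
        // Reflection coefficient for stage i.
        float k = acf[i];
        for (std::size_t j = 1; j < i; ++j)
            k -= prev[j] * acf[i - j];
        k /= energy;
        ref[i] = k;

        // Update the predictor from the previous stage.
        lpc[i] = k;
        for (std::size_t j = 1; j < i; ++j)
            lpc[j] = prev[j] - k * prev[i - j];

        energy *= (1.0f - k * k);
        prev = lpc;
    }
    lpc[0] = energy;
}
```

The sketch does not guard against the residual energy reaching zero part-way through the recursion, which a production routine would need to handle.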
lpc2cep()
void lpc2cep ( const EST_FVector &lpc, EST_FVector &cep)
Calculate cepstral coefficients from LPC coefficients.
It is possible to calculate a set of cepstral coefficients from lpc coefficients using the relationship:
The order of the cepstral analysis can be different from the LPC order. If the cepstral order is greater, interpolation is used. Both orders are taken from the lengths of the respective vectors. Note that these cepstral coefficients take on the assumptions (and errors) of the LPC model and hence will not be the same as cepstral coefficients calculated using DFT functions.
Parameters
lpc
the LPC coefficients (input)
cep
the cepstral coefficients (output)
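A widely used form of the LPC-to-cepstrum relationship is the recursion c_n = a_n + (1/n) * sum over k = 1..n-1 of k * c_k * a_(n-k). The sketch below implements that recursion under stated assumptions: the all-pole model is H(z) = G / (1 - sum_k a_k z^-k), index 0 of both vectors is unused, and when the cepstral order exceeds the LPC order the missing LPC terms are simply treated as zero, which may differ from what the library does. The name {\tt lpc_to_cep} is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: LPC-to-cepstrum recursion.
// lpc[1..p] are predictor coefficients (lpc[0] is ignored);
// cep[1..q] receive the cepstral coefficients (cep[0] is left untouched).
// Both orders are taken from the vector lengths.
void lpc_to_cep(const std::vector<float> &lpc, std::vector<float> &cep)
{
    assert(!lpc.empty() && !cep.empty());
    const std::size_t p = lpc.size() - 1;
    const std::size_t q = cep.size() - 1;
    for (std::size_t n = 1; n <= q; ++n) {
        float sum = 0.0f;
        for (std::size_t k = 1; k < n; ++k)
            if (n - k <= p)  // LPC terms beyond order p are taken as zero
                sum += static_cast<float>(k) * cep[k] * lpc[n - k];
        cep[n] = sum / static_cast<float>(n) + (n <= p ? lpc[n] : 0.0f);
    }
}
```

For a first-order predictor with coefficient a, this yields c_n = a^n / n, which is a convenient sanity check.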
sig2lpc()
void sig2lpc ( const EST_FVector &sig, EST_FVector &lpc)
Produce a set of linear prediction coefficients from a frame of speech waveform. {\tt sig} is the frame of input waveform, and {\tt lpc} are the LPC coefficients. The {\bf order} of the LPC analysis is given as the size of the {\tt lpc} vector - 1. The coefficients are placed in locations 1 to order, and the energy is placed in location 0.
sig2ref()
void sig2ref ( const EST_FVector &sig, EST_FVector &ref)
Produce a set of reflection coefficients from a frame of speech waveform. {\tt sig} is the frame of input waveform, and {\tt ref} are the reflection coefficients. The {\bf order} of the LPC analysis is given as the size of the {\tt ref} vector - 1. The coefficients are placed in locations 1 to order, and the energy is placed in location 0.
Area Functions
Using the analogy of the lossless tube, the cross-sectional areas of the sections of this tube are related to the reflection coefficients and can be calculated from the following relationship:
ref2truearea()
void ref2truearea ( const EST_FVector &ref, EST_FVector &area)
The area according to the formula
ref2area()
void ref2area ( const EST_FVector &ref, EST_FVector &area)
An approximation of the area is calculated by skipping the denominator in the formula
ref2logarea()
void ref2logarea ( const EST_FVector &ref, EST_FVector &logarea)
The logs of the areas
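In the lossless tube model, a common statement of the relationship between adjacent section areas and the reflection coefficients is A(i+1)/A(i) = (1 - k_i)/(1 + k_i). The sketch below accumulates areas under that assumption, starting from a unit area. The sign convention differs between textbooks, the name {\tt ref_to_true_area} is hypothetical, and index 0 is left unused to match the coefficient layout of the other functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: cumulative tube areas from reflection coefficients.
// Assumes A(i) = A(i-1) * (1 - k_i) / (1 + k_i) with a unit starting area;
// element 0 of both vectors is not used.
void ref_to_true_area(const std::vector<float> &ref, std::vector<float> &area)
{
    assert(ref.size() == area.size());
    float current = 1.0f;
    for (std::size_t i = 1; i < ref.size(); ++i) {
        // Reflection coefficients of a stable filter lie strictly in (-1, 1).
        assert(ref[i] > -1.0f && ref[i] < 1.0f);
        current *= (1.0f - ref[i]) / (1.0f + ref[i]);
        area[i] = current;
    }
}
```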
lpc2ref()
void lpc2ref ( const EST_FVector &lpc, EST_FVector &ref)
Calculate the reflection coefficients from the LPC coefficients. Note that in the standard linear prediction analysis, the reflection coefficients are generated as a by-product. See \Ref{sig2lpc}.
ref2lpc()
void ref2lpc ( const EST_FVector &ref, EST_FVector &lpc)
Calculate the linear prediction coefficients from the reflection coefficients. Use the equation:
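One standard way to perform this conversion is the step-up recursion, in which the predictor of order i is built from the predictor of order i-1 and the i-th reflection coefficient. The sketch below is illustrative only: {\tt ref_to_lpc} is a hypothetical name, std::vector<float> stands in for EST_FVector, and the sign convention a_j(i) = a_j(i-1) - k_i * a_(i-j)(i-1) is an assumption that may not match the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: step-up recursion.
// ref[1..order] are reflection coefficients; lpc[1..order] receive the
// predictor coefficients. Element 0 of both vectors is left untouched.
void ref_to_lpc(const std::vector<float> &ref, std::vector<float> &lpc)
{
    assert(!ref.empty() && ref.size() == lpc.size());
    const std::size_t order = ref.size() - 1;
    std::vector<float> prev(order + 1, 0.0f);
    for (std::size_t i = 1; i <= order; ++i) {
        lpc[i] = ref[i];
        for (std::size_t j = 1; j < i; ++j)
            lpc[j] = prev[j] - ref[i] * prev[i - j];
        // Keep a copy of this stage for the next iteration.
        for (std::size_t j = 1; j <= i; ++j)
            prev[j] = lpc[j];
    }
}
```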
lpc2lsf()
void lpc2lsf ( const EST_FVector &lpc, EST_FVector &lsf)
Calculate line spectral frequencies from linear prediction coefficients. Use the equation:
lsf2lpc()
void lsf2lpc ( const EST_FVector &lsf, EST_FVector &lpc)
Calculate linear prediction coefficients from line spectral frequencies. Use the equation:

Energy and power frame functions

Table of Contents
sig2pow()
sig2rms()
sig2pow()
void sig2pow ( EST_FVector &frame, float &power)
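Calculate the power for a frame of speech. This is defined as the mean of the squared sample values:
\[ p = \frac{1}{N} \sum_{i=0}^{N-1} a_i^2 \]
where $a_0, \ldots, a_{N-1}$ are the $N$ samples of the frame.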
sig2rms()
void sig2rms ( EST_FVector &frame, float &rms_energy)
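Calculate the root mean square energy for a frame of speech. This is defined as the square root of the mean of the squared sample values:
\[ E = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} a_i^2} \]
where $a_0, \ldots, a_{N-1}$ are the $N$ samples of the frame.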
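Both measures can be computed in a single pass over the frame. A minimal sketch follows, assuming that power is the mean squared sample value and that rms energy is its square root; {\tt frame_power} and {\tt frame_rms} are hypothetical stand-ins that return their result rather than writing to a reference argument.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: mean of the squared samples of a frame.
float frame_power(const std::vector<float> &frame)
{
    assert(!frame.empty());
    double sum = 0.0;
    for (float s : frame)
        sum += static_cast<double>(s) * s;
    return static_cast<float>(sum / static_cast<double>(frame.size()));
}

// Hypothetical helper: root mean square energy of a frame.
float frame_rms(const std::vector<float> &frame)
{
    return std::sqrt(frame_power(frame));
}
```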

Fast Fourier Transform functions

Table of Contents
slowFFT()
FFT()
slowIFFT()
IFFT()
power_spectrum()
power_spectrum_slow()
fastFFT()
These are the low level functions where the actual FFT is performed. Both slow and fast implementations are available for historical reasons. They have identical functionality. At this time, vectors of complex numbers are handled as pairs of vectors of real and imaginary numbers.
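What is a Fourier Transform?
The Fourier transform of a signal gives us a frequency-domain representation of a time-domain signal. In discrete time, the Fourier Transform is called a Discrete Fourier Transform (DFT) and is given by:
\[ y_k = \sum_{t=0}^{n-1} x_t \, \omega_n^{kt}, \quad k = 0, \ldots, n-1 \]
where $y = (y_0, y_1, \ldots, y_{n-1})$ is the DFT (of order $n$) of the signal $x = (x_0, x_1, \ldots, x_{n-1})$, and where $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$ are the n complex nth roots of 1.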
The Fast Fourier Transform (FFT) is a very efficient implementation of a Discrete Fourier Transform. See, for example, "Introduction to Algorithms" by Thomas H. Cormen, Charles E. Leiserson and Ronald L. Rivest (pub. MIT Press), or any signal processing textbook.
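For reference, the DFT definition can be evaluated directly in O(n^2) time, which is useful for checking an FFT on small inputs. The sketch below follows the convention used in this section of holding complex data as a pair of real and imaginary vectors and transforming in place. It is not one of the library routines: {\tt naive_dft} is a hypothetical name, and the choice of exp(-2*pi*i/n) as the principal root of unity is an assumption (some texts use the positive exponent).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: direct O(n^2) DFT on separate real/imaginary vectors.
// The inputs are overwritten with the transform.
// Assumed root of unity: omega_n = exp(-2*pi*i/n).
void naive_dft(std::vector<float> &real, std::vector<float> &imag)
{
    assert(!real.empty() && real.size() == imag.size());
    const std::size_t n = real.size();
    const double two_pi = 6.283185307179586;
    std::vector<float> out_re(n), out_im(n);

    for (std::size_t k = 0; k < n; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            // omega_n^(k*t); the exponent is reduced modulo n for accuracy.
            const double angle =
                -two_pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            const double c = std::cos(angle);
            const double s = std::sin(angle);
            re += real[t] * c - imag[t] * s;
            im += real[t] * s + imag[t] * c;
        }
        out_re[k] = static_cast<float>(re);
        out_im[k] = static_cast<float>(im);
    }
    real = out_re;
    imag = out_im;
}
```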
slowFFT()
int slowFFT ( EST_FVector &real, EST_FVector &imag)
Basic in-place FFT.
There's no point actually using this; use \Ref{fastFFT} instead. However, the code in this function closely matches the classic FORTRAN version given in many textbooks, so it is at least easy to follow for new users.
The length of real and imag must be the same, and must be a power of 2 (e.g. 128).
FFT()
inline int FFT ( EST_FVector &real, EST_FVector &imag)
Alternate name for slowFFT
slowIFFT()
int slowIFFT ( EST_FVector &real, EST_FVector &imag)
Basic in-place inverse FFT. See \Ref{slowFFT}.
IFFT()
inline int IFFT ( EST_FVector &real, EST_FVector &imag)
Alternate name for slowIFFT
power_spectrum()
int power_spectrum ( EST_FVector &real, EST_FVector &imag)
Power spectrum using the fastFFT function. The power spectrum is simply the squared magnitude of the FFT. On return, the real and imaginary vectors are both set equal to the power spectrum (you only need one of them).
power_spectrum_slow()
int power_spectrum_slow ( EST_FVector &real, EST_FVector &imag)
Power spectrum using the slowFFT function
fastFFT()
int fastFFT ( EST_FVector &invec)
Fast FFT. An optimised implementation by Tony Robinson, to be used in preference to slowFFT.

Frame based filter bank and cepstral analysis

Table of Contents
sig2fbank()
sig2fft()
fft2fbank()
fbank2melcep()
make_mel_triangular_filter()
Frequency conversion functions
Hz2Mel()
Mel2Hz()
These functions are \Ref{Frame based signal processing functions}.
sig2fbank()
void sig2fbank ( const EST_FVector &sig, EST_FVector &fbank_frame, const float sample_rate, const bool use_power_rather_than_energy, const bool take_log)
Calculate the (log) energy (or power) in each channel of a Mel scale filter bank for a frame of speech. The filters are triangular, are evenly spaced and are all of equal width, on a Mel scale. The upper and lower cutoffs of each filter are at the centre frequencies of the adjacent filters. The Mel scale is described under {\tt Hz2Mel}.
sig2fft()
void sig2fft ( const EST_FVector &sig, EST_FVector &fft_vec, const bool use_power_rather_than_energy)
Calculate the energy (or power) spectrum of a frame of speech. The FFT order is determined by the number of samples in the frame of speech, and is a power of 2. Note that the FFT vector returned corresponds to frequencies from 0 to half the sample rate. Energy is the magnitude of the FFT; power is the squared magnitude.
fft2fbank()
void fft2fbank ( const EST_FVector &fft_frame, EST_FVector &fbank_vec, const float Hz_per_fft_coeff, const EST_FVector &mel_fbank_frequencies)
Given a Mel filter bank description, bin the FFT coefficients to compute the output of the filters. The first and last elements of {\tt mel_fbank_frequencies} define the lower and upper bound of the first and last filters respectively and the intervening elements give the filter centre frequencies. That is, {\tt mel_fbank_frequencies} has two more elements than {\tt fbank_vec}.
fbank2melcep()
void fbank2melcep ( const EST_FVector &fbank_vec, EST_FVector &mfcc, const float liftering_parameter, const bool include_c0 = false)
Compute the discrete cosine transform of log Mel-scale filter bank output to get the Mel cepstral coefficients for a frame of speech. Optional liftering (filtering in the cepstral domain) can be applied to normalise the magnitudes of the coefficients. This is useful because, typically, the higher order cepstral coefficients are significantly smaller than the lower ones and it is often desirable to normalise the means and variances across coefficients.
The lifter (cepstral filter) used is:

A typical value of L used in speech recognition is 22. A value of L=0 is taken to mean no liftering. This is equivalent to L=1.
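A lifter that is consistent with the remarks above (L = 1 leaves the coefficients unchanged, and L = 22 is typical) is the sinusoidal lifter used in many speech recognition front ends, c'_i = (1 + (L/2) sin(pi * i / L)) c_i. The sketch below applies it under the assumption that this is the intended form; {\tt apply_lifter} is a hypothetical name, and element i of the vector is assumed to hold the i-th cepstral coefficient.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sinusoidal liftering of cepstral coefficients.
// Assumed form: c'[i] = (1 + (L / 2) * sin(pi * i / L)) * c[i].
// L == 0 is taken to mean "no liftering".
void apply_lifter(std::vector<float> &cep, float L)
{
    assert(L >= 0.0f);
    if (L == 0.0f)
        return;
    const float pi = 3.14159265f;
    for (std::size_t i = 0; i < cep.size(); ++i)
        cep[i] *= 1.0f + 0.5f * L * std::sin(pi * static_cast<float>(i) / L);
}
```

With L = 1 the sine term vanishes at every integer index, so the weights are all 1 (up to rounding), matching the stated equivalence with L = 0.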
make_mel_triangular_filter()
void make_mel_triangular_filter ( const float this_mel_centre, const float this_mel_low, const float this_mel_high, const float Hz_per_fft_coeff, const int half_fft_order, int &fft_index_start, EST_FVector &filter)
Make a triangular Mel scale filter. The filter is centred at {\tt this_mel_centre} and extends from {\tt this_mel_low} to {\tt this_mel_high}. {\tt half_fft_order} is the length of a power/energy spectrum covering 0Hz to half the sampling frequency with a resolution of {\tt Hz_per_fft_coeff}.
The routine returns a vector of weights to be applied to the energy/power spectrum starting at element {\tt fft_index_start}. The number of points (FFT coefficients) covered by the filter is given by the length of the returned vector {\tt filter}.
Frequency conversion functions
These are functions used in \Ref{Filter bank and cepstral analysis}.
Hz2Mel()
float Hz2Mel ( float frequency_in_Hertz)
Convert Hertz to Mel. The Mel scale is defined by

Mel2Hz()
float Mel2Hz ( float frequency_in_Mel)
Convert Mel to Hertz.
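A commonly quoted definition of the Mel scale is m = 1127 ln(1 + f / 700), equivalently 2595 log10(1 + f / 700), which maps 1000 Hz to approximately 1000 Mel. The sketch below implements that definition and its inverse. Whether the library uses exactly these constants is an assumption, and {\tt hz_to_mel} and {\tt mel_to_hz} are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: Hertz to Mel, assuming m = 1127 * ln(1 + f / 700).
float hz_to_mel(float frequency_in_hertz)
{
    assert(frequency_in_hertz >= 0.0f);
    return 1127.0f * std::log(1.0f + frequency_in_hertz / 700.0f);
}

// Hypothetical helper: inverse of hz_to_mel.
float mel_to_hz(float frequency_in_mel)
{
    assert(frequency_in_mel >= 0.0f);
    return 700.0f * (std::exp(frequency_in_mel / 1127.0f) - 1.0f);
}
```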

Functions for Generating Tracks

Functions which operate on a whole waveform and generate coefficients for a track.

Functions for use with frame based processing

Table of Contents
sig2coef()
sigpr_base()
power()
energy()
fbank()
melcep()
In the following functions, the input is a \Ref{EST_Wave} waveform, and the output is a (usually multi-channel) \Ref{EST_Track}. The track must be set up appropriately beforehand. This means the track must be resized accordingly with the correct number of frames and channels.
The positions of the frames are found by examination of the {\bf time} array in the EST_Track, which must be filled prior to the function call. The usual requirement is for fixed frame analysis, where each analysis frame is, say, 10ms after the previous one.
A common alternative is to perform pitch-synchronous analysis where the time shift is related to the local pitch period.
sig2coef()
void sig2coef ( EST_Wave &sig, EST_Track &a, EST_String type, float factor = 2.0, EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME))
Produce a single set of coefficients from a waveform. The type of coefficient required is given in the argument {\tt type}. Possible types are:
lpc linear predictive coding
cep cepstrum coding from lpc coefficients
melcep Mel scale cepstrum coding via fbank
fbank Mel scale log filterbank analysis
lsf line spectral frequencies
ref Linear prediction reflection coefficients
power
f0 srpd algorithm
energy root mean square energy
The order of the analysis is calculated from the number of channels in {\tt a}. The positions of the analysis windows must be given by filling in the track's time array.
This function windows the waveform at the intervals given by the track time array. The length of each window is {\tt factor} times the local time shift. The windowing function is given by {\tt wf}.
Parameters
sig
input waveform
a
output coefficients. These must have been pre-allocated, and the number of channels in {\tt a} indicates the order of the analysis.
type
the types of coefficients to be produced. "lpc", "cep" etc
factor
the frame length factor, i.e. the analysis frame length will be this times the local pitch period.
wf
function for windowing. See \Ref{Windowing mechanisms}
sigpr_base()
void sigpr_base ( EST_Wave &sig, EST_Track &fv, EST_Features &op, const EST_StrList &slist)
Produce multiple coefficients from a waveform by repeated calls to sig2coef.
Parameters
sig
input waveform
fv
output coefficients. These must have been pre-allocated, and the number of channels in {\tt fv} indicates the order of the analysis.
op
Features structure containing options for analysis order, frame shift etc.
slist
list of types of coefficients required, from the set of possible types that sig2coef can take.
power()
void power ( EST_Wave &sig, EST_Track &a, float factor)
Calculate the power for each frame of the waveform.
Parameters
sig
input waveform
a
output power track
factor
the frame length factor, i.e. the analysis frame length will be this times the local pitch period.
energy()
void energy ( EST_Wave &sig, EST_Track &a, float factor)
Calculate the rms energy for each frame of the waveform.
This function calls \Ref{sig2energy}
Parameters
sig
input waveform
a
output coefficients
factor
optional: the frame length factor, i.e. the analysis frame length will be this times the local pitch period.
fbank()
void fbank ( EST_Wave &sig, EST_Track &fbank, const float factor, EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME), const bool up = false, const bool take_log = true)
Mel scale filter bank analysis. The Mel scale triangular filters are computed via an FFT (see \Ref{fastFFT}). This routine is required for Mel cepstral analysis (see \Ref{melcep}). The analysis of each frame is done by \Ref{sig2fbank}.
A typical filter bank analysis for speech recognition might use log energy outputs from 20 filters.
Parameters
sig
input waveform
fbank
the output. The number of filters is determined from the number of channels in this track.
factor
the frame length factor, i.e. the analysis frame length will be this times the local pitch period
wf
function for windowing. See \Ref{Windowing mechanisms}
up
whether the filterbank analysis should use power rather than energy.
take_log
whether to take logs of the filter outputs
melcep()
void melcep ( EST_Wave &sig, EST_Track &mfcc_track, float factor, int fbank_order, float liftering_parameter, EST_WindowFunc *wf = EST_Window::creator(DEFAULT_WINDOW_NAME), const bool include_c0 = false, const bool up = false)
Mel scale cepstral analysis via filter bank analysis. Cepstral parameters are computed for each frame of speech. The analysis requires \Ref{fbank}. The cepstral analysis of the filterbank outputs is performed by \Ref{fbank2melcep}.
A typical Mel cepstral coefficient (MFCC) analysis for speech recognition might use 12 cepstral coefficients computed from a 20 channel filterbank.
Parameters
sig
input: waveform
mfcc_track
the output
factor
the frame length factor, i.e. the analysis frame length will be this times the local pitch period
fbank_order
the number of Mel scale filters used for the analysis
liftering_parameter
for filtering in the cepstral domain. See \Ref{fbank2melcep}
wf
function for windowing. See \Ref{Windowing mechanisms}
include_c0
whether the zero'th cepstral coefficient is to be included
up
whether the filterbank analysis should use power rather than energy.


Delta and Acceleration coefficients

Table of Contents
delta()
sigpr_delta()
sigpr_acc()
Produce delta and acceleration coefficients from a set of coefficients or the waveform.
delta()
void delta ( EST_Track &tr, EST_Track &d, int regression_length = 3)
Produce a set of delta coefficients for a track.
The delta function is used to produce a set of coefficients which estimate the rate of change of a set of parameters. The output track {\tt d} must be set up beforehand, i.e. it must have the same number of frames and channels as {\tt tr}.
Parameters
tr
input track of base coefficients
d
output track of delta coefficients.
regression_length
number of previous frames on which the delta estimation is calculated.
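One common way to estimate a rate of change from the most recent frames is to fit a least-squares straight line through them and take its slope. The sketch below does this for a single channel. It is an illustration of the idea rather than the library's algorithm: {\tt delta_channel} is a hypothetical name, the window is assumed to end at the current frame, fewer points are used near the start of the track, and the result is expressed per frame rather than per second.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-frame slope of a least-squares line fitted to the
// current value and up to (regression_length - 1) preceding values.
std::vector<float> delta_channel(const std::vector<float> &x, int regression_length)
{
    assert(regression_length >= 2);
    const std::size_t max_points = static_cast<std::size_t>(regression_length);
    std::vector<float> d(x.size(), 0.0f);

    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t n = std::min(max_points, i + 1);
        if (n < 2)
            continue;  // a single point has no slope; leave delta at 0
        // Fit a line to the points (t, x[i - n + 1 + t]) for t = 0..n-1.
        double st = 0.0, sx = 0.0, stt = 0.0, stx = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double tt = static_cast<double>(t);
            const double v = x[i - n + 1 + t];
            st += tt;
            sx += v;
            stt += tt * tt;
            stx += tt * v;
        }
        const double nn = static_cast<double>(n);
        d[i] = static_cast<float>((nn * stx - st * sx) / (nn * stt - st * st));
    }
    return d;
}
```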
sigpr_delta()
void sigpr_delta ( EST_Wave &sig, EST_Track &fv, EST_Features &op, const EST_StrList &slist)
Produce multiple sets of delta coefficients from a waveform.
Calculate specified types of delta coefficients. This function is used when the base types of coefficients haven't been calculated. This function calls sig2coef to calculate the base types from which the deltas are calculated, and hence the requirements governing the setup of fv for sig2coef also hold here.
Parameters
sig
input waveform
fv
output coefficients. These must have been pre-allocated, and the number of channels in {\tt fv} indicates the order of the analysis.
op
Features structure containing options for analysis order, frame shift etc.
slist
list of types of delta coefficients required.
sigpr_acc()
void sigpr_acc ( EST_Wave &sig, EST_Track &fv, EST_Features &op, const EST_StrList &slist)
Produce multiple sets of acceleration coefficients from a waveform.
Calculate specified types of acceleration coefficients. This function is used when the base types of coefficient haven't been calculated. This function calls sig2coef to calculate the base types from which the deltas are calculated, and hence the requirements governing the setup of fv for sig2coef also hold here.
Parameters
sig
input waveform
fv
output coefficients. These must have been pre-allocated, and the number of channels in {\tt fv} indicates the order of the analysis.
op
Features structure containing options for analysis order, frame shift etc.
slist
list of types of acceleration coefficients required.

Pitch/F0 Detection Algorithm functions

Table of Contents
pda()
icda()
default_pda_options()
srpd()
smooth_phrase()
smooth_portion()
These functions are used to produce a track of fundamental frequency (F0) against time of a waveform.
pda()
void pda ( EST_Wave &sig, EST_Track &fz, EST_Features &op, EST_String method="")
Top level pitch (F0) detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point.
At present, only the \Ref{srpd} pitch tracker is implemented, so this is always called regardless of what {\tt method} is set to.
Parameters
sig
input waveform
fz
output f0 contour
op
parameters for pitch tracker
method
pda method to be used.
icda()
void icda ( EST_Wave &sig, EST_Track &fz, EST_Track &speech, EST_Option &op, EST_String method = "")
Top level intonation contour detection algorithm. Returns a track containing evenly spaced frames of speech, each containing an F0 value for that point. {\tt icda} differs from \Ref{pda} in that the contour is smoothed, and unvoiced portions have interpolated F0 values.
Parameters
sig
input waveform
fz
output f0 contour
speech
Interpolation is controlled by the {\tt speech} track. When a point has a positive value in the speech track, it is a candidate for interpolation.
op
parameters for pitch tracker
method
pda method to be used.
default_pda_options()
void default_pda_options ( EST_Features &al)
Create a set of sensible defaults for use in pda and icda.
srpd()
void srpd ( EST_Wave &sig, EST_Track &fz, EST_Features &options)
Super resolution pitch tracker.
srpd is a pitch detection algorithm that produces a fundamental frequency contour from a speech waveform. At present only the super resolution pitch determination algorithm is implemented. See (Medan, Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed description of the algorithm.
Frames of data are read in from sig in chronological order such that each frame is shifted in time from its predecessor by pda_frame_shift. Each frame is analysed in turn.

The maximum and minimum signal amplitudes are initially found over the duration of two segments, each of length N_min samples. If the sum of their absolute values is below two times noise_floor, the frame is classified as representing silence and no coefficients are calculated. Otherwise, a cross correlation coefficient is calculated for all n from a period in samples corresponding to min_pitch to a period in samples corresponding to max_pitch, in steps of decimation_factor. In calculating the coefficient only one in decimation_factor samples of the two segments are used. Such down-sampling permits rapid estimates of the coefficients to be calculated over the range N_min <= n <= N_max. This results in a cross-correlation track for the frame being analysed.
Local maxima of the track with a coefficient value above a specified threshold form candidates for the fundamental period. The threshold is adaptive and dependent upon the values v2uv_coef_thresh, min_v2uv_coef_thresh, and v2uv_coef_thresh_ratio. If the previously analysed frame was classified as unvoiced or silent (which is the initial state) then the threshold is set to v2uv_coef_thresh. Otherwise, the previous frame was classified as being voiced, and the threshold is set equal to v2uv_coef_thresh_ratio times the cross-correlation coefficient value at the point of the previous fundamental period in the former coefficients track. This product is not permitted to drop below v2uv_coef_thresh.
If no candidates for the fundamental period are found, the frame is classified as being unvoiced. Otherwise, the candidates are further processed to identify the most likely true pitch period. During this additional processing, a threshold given by anti_doubling_thresh is used.
If the peak_tracking flag is set to true, biasing is applied to the cross-correlation track as described in (Bagshaw et al., 1993).

Parameters
sig
input waveform
op
options regarding pitch tracking parameters
op.min_pitch
minimum permitted F0 value
op.max_pitch
maximum permitted F0 value
op.pda_frame_shift
analysis frame shift
op.pda_frame_length
analysis frame length
op.lpf_cutoff
cut off frequency for low pass filtering
op.lpf_order
order of low pass filtering (must be odd)
op.decimation
op.noise_floor
op.min_v2uv_coef_thresh
op.v2uv_coef_thresh_ratio
op.v2uv_coef_thresh
op.anti_doubling_thresh
op.peak_tracking
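The cross-correlation coefficient at the heart of the srpd description compares two adjacent segments of n samples each, using only one sample in every decimation step to keep the first pass cheap. The sketch below shows one plausible normalised form of that coefficient. It is a simplification for illustration: {\tt cross_correlation} is a hypothetical name, and the exact segment placement and normalisation used by the library are assumptions here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: normalised cross-correlation between the segments
// x[start .. start+n) and x[start+n .. start+2n), using every `step`-th
// sample. Returns a value in [-1, 1]; values near 1 suggest that n is
// close to a multiple of the fundamental period.
float cross_correlation(const std::vector<float> &x,
                        std::size_t start, std::size_t n, std::size_t step)
{
    assert(step >= 1 && n >= 1 && start + 2 * n <= x.size());
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t j = 0; j < n; j += step) {
        const double a = x[start + j];
        const double b = x[start + n + j];
        xy += a * b;
        xx += a * a;
        yy += b * b;
    }
    if (xx == 0.0 || yy == 0.0)
        return 0.0f;  // an all-zero segment carries no periodicity information
    return static_cast<float>(xy / std::sqrt(xx * yy));
}
```

Evaluating this for every candidate n between the periods corresponding to max_pitch and min_pitch gives the cross-correlation track whose local maxima are then thresholded.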
smooth_phrase()
void smooth_phrase ( EST_Track &c, EST_Track &speech, EST_Features &options, EST_Track &sm)
Smooth selected parts of an f0 contour. Interpolation is controlled by the speech track. When a point has a positive value in the speech track, it is a candidate for interpolation.
smooth_portion()
void smooth_portion ( EST_Track &c, EST_Option &op)
Smooth all the points in an F0 contour

Pitchmarking Functions

Table of Contents
pitchmark()
pitchmark()
neg_zero_cross_pick()
pm_fill()
pm_min_check()
Pitchmarking involves finding some pre-defined pitch related instant for every pitch period in the speech. At present, only functions for analysing laryngograph waveforms are available - the much harder problem of doing this on actual speech has not been attempted.
pitchmark()
EST_Track pitchmark ( EST_Wave &lx, EST_Features &op)
Find pitchmarks in Laryngograph (lx) signal.
This high level function places a pitchmark on each positive peak in the voiced portions of the lx signal. Pitchmarks are stored in the time component of an EST_Track object and returned. The function works by high and low pass filtering the signal using forward and backward filtering to remove phase shift. The negative going points in the smoothed differentiated signal, corresponding to peaks in the original, are then chosen.
Parameters
lx
laryngograph waveform
op
options, mainly for filter control:
\begin{itemize}
\item {\bf lx_low_frequency} low pass cut off for lx filtering: typical value {\tt 400}
\item {\bf lx_low_order} order of low pass lx filter: typical value 19
\item {\bf lx_high_frequency} high pass cut off for lx filtering: typical value 40
\item {\bf lx_high_order} order of high pass lx filter: typical value 19
\item {\bf median_order} order of median smoother used to smooth the differentiated lx: typical value 19
\end{itemize}
pitchmark()
EST_Track pitchmark ( EST_Wave &lx, int lx_lf, int lx_lo, int lx_hf, int lx_ho, int mo, int debug = 0)
Find pitchmarks in Laryngograph (lx) signal. The function is the same as \Ref{pitchmark} but with more explicit control over the parameters.
Parameters
lx
laryngograph waveform
lx_lf
low pass cut off for lx filtering : typical value 400
lx_lo
order of low pass lx filter : typical value 19
lx_hf
high pass cut off for lx filtering : typical value 40
lx_ho
order of high pass lx filter : typical value 19
mo
order of median smoother used to smooth the differentiated lx : typical value 19
neg_zero_cross_pick()
void neg_zero_cross_pick ( EST_Wave &lx, EST_Track &pm)
Find times where the waveform crosses the zero axis in the negative direction.
Parameters
lx
waveform
pm
pitchmark track which stores time positions of negative crossings
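The idea of picking negative-going zero crossings can be sketched on a plain sample buffer. The version below is an illustration only: {\tt negative_zero_crossings} is a hypothetical name, a crossing is assumed to be reported at the first sample that falls below zero, and no interpolation between samples is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: times (in seconds) at which the signal goes from
// non-negative to negative. The time of sample i is taken as i / sample_rate.
std::vector<float> negative_zero_crossings(const std::vector<float> &samples,
                                           float sample_rate)
{
    assert(sample_rate > 0.0f);
    std::vector<float> times;
    for (std::size_t i = 1; i < samples.size(); ++i)
        if (samples[i - 1] >= 0.0f && samples[i] < 0.0f)
            times.push_back(static_cast<float>(i) / sample_rate);
    return times;
}
```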
pm_fill()
void pm_fill ( EST_Track &pm, float new_end, float max, float min, float def)
Produce a set of sensible pitchmarks.
Given a set of raw pitchmarks, this function makes sure no pitch period is shorter than {\tt min} seconds and no longer than {\tt max} seconds. Periods that are too short are eliminated. If a period is too long, extra pitchmarks are inserted whose period is {\it approximately} {\tt def} seconds in duration. The approximation is to ensure that the pitch period in the interval, D, is constant, and so the actual pitch period is given by

pm_min_check()
void pm_min_check ( EST_Track &pm, float min)
Remove pitchmarks which are too close together.
This doesn't work in a particularly sophisticated way, in that it removes a sequence of too close pitchmarks left to right, and doesn't attempt to find which ones in the sequence are actually spurious.
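The left-to-right behaviour described above can be sketched as a single pass that keeps a pitchmark only if it is at least {\tt min} seconds after the last one kept. This is an illustration on a plain vector of times, with the hypothetical name {\tt drop_close_pitchmarks}; the library operates on the time array of an EST_Track and may differ in detail.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: remove pitchmarks closer than `min` seconds to the
// previously kept pitchmark, scanning left to right. The input times are
// assumed to be in ascending order.
std::vector<float> drop_close_pitchmarks(const std::vector<float> &pm, float min)
{
    assert(min >= 0.0f);
    std::vector<float> kept;
    for (float t : pm)
        if (kept.empty() || t - kept.back() >= min)
            kept.push_back(t);
    return kept;
}
```

Because the first pitchmark of any too-close run is always the one retained, a spurious mark at the start of a run survives while genuine ones after it are dropped, which is the limitation noted above.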

Spectrogram generation

Table of Contents
raw_spectrogram()
scale_spectrogram()
raw_spectrogram()
void raw_spectrogram ( EST_Track &sp, EST_Wave &sig, float length, float shift, int order, bool slow=0)
Compute the power-spectrogram
scale_spectrogram()
void scale_spectrogram ( EST_Track &s, float range, float b, float w)
Manipulate the spectrogram to

Functions for Windowing Frames of Waveforms

class EST_Window

The EST_Window class provides functions for the creation and use of signal processing windows.

Signal processing algorithms often work on small sections of the speech waveform known as {\em frames}. A full signal must first be divided into these frames before these algorithms can work. While it would be simple to just "cut out" the required frames from the waveforms, this is usually undesirable as large discontinuities can occur at the frame edges. Instead it is customary to cut out the frame by means of a {\em window} function, which tapers the signal in the frame so that it has high values in the middle and low or zero values near the frame edges. The \Ref{EST_Window} class provides a wrapper for such windowing operations.

There are several types of window function, including:

\begin{itemize}

\item {\bf Rectangular}, which is used to give a simple copy of the values between the window limits.

\item {\bf Hanning}. The rectangular window can cause sharp discontinuities at window edges. The Hanning window solves this by ensuring that the window edges taper to 0.

\item {\bf Hamming}. The Hanning window causes considerable energy loss, which the Hamming window attempts to rectify.

\end{itemize}

The particular choice of window depends on the application. For instance, in most speech synthesis applications Hanning windows are the most suitable as they don't have time domain discontinuities. For analysis applications Hamming windows are normally used.
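The difference between the Hanning and Hamming windows is easiest to see from their textbook raised-cosine definitions, w(i) = a - (1 - a) cos(2 pi i / (N - 1)), with a = 0.5 for Hanning and a = 0.54 for Hamming. The sketch below generates both. It is an illustration that does not use the EST_Window interface: the function names are hypothetical, and the library's own windows may use a slightly different formulation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: symmetric raised-cosine window of the given size,
// w[i] = a - (1 - a) * cos(2 * pi * i / (size - 1)).
std::vector<float> raised_cosine_window(std::size_t size, double a)
{
    assert(size >= 2);
    const double two_pi = 6.283185307179586;
    std::vector<float> w(size);
    for (std::size_t i = 0; i < size; ++i) {
        const double phase = two_pi * static_cast<double>(i) / static_cast<double>(size - 1);
        w[i] = static_cast<float>(a - (1.0 - a) * std::cos(phase));
    }
    return w;
}

// a = 0.5: the edges taper all the way to 0.
std::vector<float> hanning_window(std::size_t size)
{
    return raised_cosine_window(size, 0.5);
}

// a = 0.54: the edges stop at 0.08, retaining more of the frame's energy.
std::vector<float> hamming_window(std::size_t size)
{
    return raised_cosine_window(size, 0.54);
}
```

Multiplying a frame element by element with one of these vectors gives the windowed frame expected by the frame based functions.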

For example code, see \Ref{Windowing}

typedef EST_WindowFunc Func

A function which creates a window

Functions for making windows.

make_window()
static void make_window ( EST_TBuffer<float> &window_vals, int size, const char *name)
Make a buffer containing a window function of the specified type.
make_window()
static void make_window ( EST_FVector &window_vals, int size, const char *name)
Make an EST_FVector containing a window function of the specified type.
creator()
static Func* creator ( const char *name, bool report_error = false)
Return the creation function for the given window type.

Performing windowing on a section of speech.

window_signal()
static void window_signal ( const EST_Wave &sig, EST_WindowFunc *make_window, int start, int size, EST_TBuffer<float> &frame)
Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The windowing function required is given as a function pointer {\tt *make_window} which has already been created by a function such as \Ref{creator}. The output windowed frame is placed in the buffer {\tt frame} which will have been resized accordingly within the function.
window_signal()
static void window_signal ( const EST_Wave &sig, EST_WindowFunc *make_window, int start, int size, EST_FVector &frame, int resize=0)
Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The windowing function required is given as a function pointer {\tt *make_window} which has already been created by a function such as \Ref{creator}. The output windowed frame is placed in the EST_FVector {\tt frame}. By default, it is assumed that this is already the correct size (i.e. {\tt size} samples long), but if resizing is required the last argument should be set to 1.
window_signal()
static void window_signal ( const EST_Wave &sig, const EST_String &window_name, int start, int size, EST_FVector &frame, int resize=0)
Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The windowing function required is given as a string: this function will make a temporary window of this type. The output windowed frame is placed in the EST_FVector {\tt frame}. By default, it is assumed that this is already the correct size (i.e. {\tt size} samples long), but if resizing is required the last argument should be set to 1.
window_signal()
static void window_signal ( const EST_Wave &sig, EST_TBuffer<float> &window_vals, int start, int size, EST_FVector &frame, int resize=0)
Window the waveform {\tt sig} starting at point {\tt start} for a duration of {\tt size} samples. The window shape required is given as an array of floats. The output windowed frame is placed in the EST_FVector {\tt frame}. By default, it is assumed that this is already the correct size (i.e. {\tt size} samples long), but if resizing is required the last argument should be set to 1.
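The operation shared by the window_signal() variants, cutting {\tt size} samples out of a waveform from {\tt start} and scaling each by the matching window value, might look like the following. It is a sketch on plain vectors rather than EST_Wave and EST_FVector; the name {\tt sketch_window_signal} is hypothetical, and treating samples outside the waveform as zero is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for windowing a section of a waveform: copies `size`
// samples starting at `start`, multiplying each by the matching window value.
// Assumption: positions outside the signal contribute zero.
std::vector<float> sketch_window_signal(const std::vector<float> &sig,
                                        const std::vector<float> &window_vals,
                                        long start, std::size_t size)
{
    assert(window_vals.size() == size);
    std::vector<float> frame(size, 0.0f);
    for (std::size_t i = 0; i < size; ++i) {
        const long pos = start + static_cast<long>(i);
        if (pos >= 0 && pos < static_cast<long>(sig.size()))
            frame[i] = sig[static_cast<std::size_t>(pos)] * window_vals[i];
    }
    return frame;
}
```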

Utility window functions.

description()
static EST_String description ( const char *name)
Return the description for a given window type.
options_supported()
static EST_String options_supported ( void)
Return a paragraph describing the available windows.
options_short()
static EST_String options_short ( void)
Return a comma separated list of the available window types.

Filter functions

A filter modifies a waveform by changing its frequency characteristics. The following types of filter are currently supported:

FIR filters
FIR filters are general purpose finite impulse response filters which are useful for band-pass, low-pass and high-pass filtering.
Linear Prediction filters
are used to produce LP residuals from waveforms and vice versa
Pre Emphasis filters
are simple filters for changing the spectral tilt of a signal
Non linear filters
Miscellaneous filters

FIR filters

Table of Contents
FIRfilter()
FIRfilter()
FIR_double_filter()
FIRlowpass_filter()
FIRlowpass_filter()
FIRhighpass_filter()
FIRhighpass_filter()
FIRhighpass_double_filter()
FIRhighpass_double_filter()
FIRlowpass_double_filter()
FIRlowpass_double_filter()
Finite impulse response (FIR) filters which are useful for band-pass, low-pass and high-pass filtering.
FIR filters perform the following operation:
\[y_t = \sum_{i=0}^{O-1} c_i \, x_{t-i}\]
where \(O\) is the filter order, \(c_i\) are the filter coefficients, \(x_t\) is the input at time \(t\) and \(y_t\) is the output at time \(t\). Functions are provided for designing the filter (i.e. finding the coefficients).
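The FIR operation, in which each output sample is a sum of the filter coefficients multiplied by the current and earlier input samples, can be sketched directly. The function name {\tt sketch_fir_filter} is hypothetical, and taking the input before the start of the signal as zero is an assumption of this sketch, not a statement about FIRfilter().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-form FIR filter: y[t] = sum over i of c[i] * x[t - i].
// Assumption: input samples before the start of the signal are zero.
std::vector<float> sketch_fir_filter(const std::vector<float> &x,
                                     const std::vector<float> &c)
{
    assert(!c.empty());
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t t = 0; t < x.size(); ++t) {
        float acc = 0.0f;
        // Stop at i == t so that x[t - i] never indexes before the start.
        for (std::size_t i = 0; i < c.size() && i <= t; ++i)
            acc += c[i] * x[t - i];
        y[t] = acc;
    }
    return y;
}
```

Feeding in a single unit impulse returns the coefficients themselves, which is a convenient check on any FIR implementation.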
FIRfilter()
void FIRfilter ( EST_Wave &in_sig, const EST_FVector &numerator, int delay_correction=0)
General purpose FIR filter. This function will filter the waveform {\tt in_sig} with a previously designed filter, given as {\tt numerator}. The filter coefficients can be designed using one of the design functions, e.g. \Ref{design_FIR_filter}.
FIRfilter()
void FIRfilter ( const EST_Wave &in_sig, EST_Wave &out_sig, const EST_FVector &numerator, int delay_correction=0)
General purpose FIR filter. This function will filter the waveform {\tt in_sig} with a previously designed filter, given as {\tt numerator}, and place the result in {\tt out_sig}. The filter coefficients can be designed using one of the design functions, e.g. \Ref{design_FIR_filter}.
FIR_double_filter()
void FIR_double_filter ( EST_Wave &in_sig, EST_Wave &out_sig, const EST_FVector &numerator)
General purpose FIR double (zero-phase) filter. This function will double filter the waveform {\tt in_sig} with a previously designed filter, given as {\tt numerator}. The filter coefficients can be designed using one of the design functions, e.g. \Ref{design_FIR_filter}. Double filtering is performed by filtering the signal normally, reversing the waveform, filtering again and reversing the waveform again. Normal filtering will impose a lag on the signal depending on the order of the filter. By filtering the signal forwards and backwards, the lags cancel each other out and the output signal is in phase with the input signal.
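The filter, reverse, filter, reverse sequence can be sketched on plain vectors as follows. The names {\tt fir_pass} and {\tt sketch_fir_double_filter} are hypothetical, and the zero initial state of each pass is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One ordinary causal FIR pass; input before the start is taken as zero.
static std::vector<float> fir_pass(const std::vector<float> &x,
                                   const std::vector<float> &c)
{
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t t = 0; t < x.size(); ++t)
        for (std::size_t i = 0; i < c.size() && i <= t; ++i)
            y[t] += c[i] * x[t - i];
    return y;
}

// Forward pass, reverse, second pass, reverse again. The lag added by the
// first pass is cancelled by the second pass over the time-reversed signal.
std::vector<float> sketch_fir_double_filter(const std::vector<float> &x,
                                            const std::vector<float> &c)
{
    assert(!c.empty());
    std::vector<float> y = fir_pass(x, c);
    std::reverse(y.begin(), y.end());
    y = fir_pass(y, c);
    std::reverse(y.begin(), y.end());
    return y;
}
```

With coefficients {\tt \{0.5, 0.5\}} and a single impulse in the middle of the input, the output is a symmetric bump centred on the original impulse position, which illustrates the absence of any net lag.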
FIRlowpass_filter()
void FIRlowpass_filter ( EST_Wave &sigin, int freq, int order=DEFAULT_FILTER_ORDER)
Quick function for one-off low pass filtering. If repeated lowpass filtering is needed, first design the required filter using \Ref{design_lowpass_FIR_filter}, and then use \Ref{FIRfilter} to do the actual filtering.
Parameters
in_sig
input waveform, which will be overwritten
freq
cutoff frequency in Hertz
order
number of filter coefficients, eg. 99
FIRlowpass_filter()
void FIRlowpass_filter ( const EST_Wave &in_sig, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)
Quick function for one-off low pass filtering. If repeated lowpass filtering is needed, first design the required filter using \Ref{design_lowpass_FIR_filter}, and then use \Ref{FIRfilter} to do the actual filtering.
Parameters
in_sig
input waveform
out_sig
output waveform
freq
cutoff frequency in Hertz
order
number of filter coefficients, e.g. 99
FIRhighpass_filter()
void FIRhighpass_filter ( EST_Wave &in_sig, int freq, int order)
Quick function for one-off high pass filtering. If repeated highpass filtering is needed, first design the required filter using design_highpass_FIR_filter, and then use FIRfilter to do the actual filtering.
Parameters
in_sig
input waveform, which will be overwritten
freq
cutoff frequency in Hertz
order
number of filter coefficients, eg. 99
FIRhighpass_filter()
void FIRhighpass_filter ( const EST_Wave &sigin, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)
Quick function for one-off high pass filtering. If repeated highpass filtering is needed, first design the required filter using design_highpass_FIR_filter, and then use FIRfilter to do the actual filtering.
Parameters
in_sig
input waveform
out_sig
output waveform
freq
cutoff frequency in Hertz
order
number of filter coefficients, eg. 99
FIRhighpass_double_filter()
void FIRhighpass_double_filter ( EST_Wave &sigin, int freq, int order=DEFAULT_FILTER_ORDER)
Quick function for one-off zero phase high pass filtering.
Normal high pass filtering (\Ref{FIRhighpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRhighpass_filter} to achieve the same effect.
Parameters
in_sig
input waveform, which will be overwritten
freq
cutoff frequency in Hertz
order
number of filter coefficients, eg. 99
FIRhighpass_double_filter()
void FIRhighpass_double_filter ( const EST_Wave &int_sig, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)
Quick function for one-off zero phase high pass filtering.
Normal high pass filtering (\Ref{FIRhighpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRhighpass_filter} to achieve the same effect.
Parameters
in_sig
input waveform
out_sig
output waveform
freq
cutoff frequency in Hertz
order
number of filter coefficients, eg. 99
FIRlowpass_double_filter()
void FIRlowpass_double_filter ( EST_Wave &sigin, int freq, int order=DEFAULT_FILTER_ORDER)
Quick function for one-off zero phase low pass filtering.
Normal low pass filtering (\Ref{FIRlowpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRlowpass_filter} to achieve the same effect.
Parameters
in_sig
input waveform, which will be overwritten
freq
cutoff frequency in Hertz
order
number of filter coefficients, eg. 99
FIRlowpass_double_filter()
void FIRlowpass_double_filter ( const EST_Wave &in_sig, EST_Wave &out_sig, int freq, int order=DEFAULT_FILTER_ORDER)
Quick function for one-off zero phase low pass filtering.
Normal low pass filtering (\Ref{FIRlowpass_filter}) introduces a time delay. This function filters the signal twice, first forwards and then backwards, which ensures a zero phase lag. Hence the order parameter need only be half what it is for \Ref{FIRlowpass_filter} to achieve the same effect.
Parameters
in_sig
input waveform
out_sig
output waveform
freq
cutoff frequency in Hertz
order
number of filter coefficients, eg. 99

Linear Prediction filters

Table of Contents
lpc_filter()
inv_lpc_filter()
lpc_filter_1()
lpc_filter_fast()
inv_lpc_filter_ola()
The linear prediction filters are used for the analysis and synthesis of waveforms according to the linear prediction all-pole model.
The linear prediction model states that the value of a signal at a given point is equal to a weighted sum of the previous P values, plus a correction value for that point:
\[s_n = \sum_{i=1}^{P} a_i \, s_{n-i} + e_n\]
Given a set of coefficients \(a_i\) and the original signal \(s\), we can use this equation to work out \(e\), the {\it residual}. Conversely, given the coefficients and the residual signal, an estimate of the original signal can be calculated.
If a single set of coefficients were used for the entire waveform, the filtering process would be simple. It is usual however to have a different set of coefficients for every frame, and there are many possible ways to switch from one coefficient set to another so as not to cause discontinuities at the frame boundaries.
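For the simple case of one coefficient set applied to the whole signal, both directions of the linear prediction relationship can be sketched as below. The function names are hypothetical. The coefficient layout (the P weights stored at indices 0 to P-1, with no energy term) and the zero history before the first sample are assumptions of this sketch and need not match the EST vector layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Analysis: residual[n] = s[n] - (weighted sum of the previous P samples).
// a[0..P-1] hold the weights for lags 1..P (layout assumed for this sketch).
std::vector<double> sketch_inv_lpc_filter(const std::vector<double> &s,
                                          const std::vector<double> &a)
{
    assert(!a.empty());
    std::vector<double> e(s.size(), 0.0);
    for (std::size_t n = 0; n < s.size(); ++n) {
        double predicted = 0.0;
        for (std::size_t i = 1; i <= a.size() && i <= n; ++i)
            predicted += a[i - 1] * s[n - i];
        e[n] = s[n] - predicted;
    }
    return e;
}

// Synthesis: s[n] = (weighted sum of the previous P output samples) + e[n].
std::vector<double> sketch_lpc_filter(const std::vector<double> &e,
                                      const std::vector<double> &a)
{
    assert(!a.empty());
    std::vector<double> s(e.size(), 0.0);
    for (std::size_t n = 0; n < e.size(); ++n) {
        double predicted = 0.0;
        for (std::size_t i = 1; i <= a.size() && i <= n; ++i)
            predicted += a[i - 1] * s[n - i];
        s[n] = predicted + e[n];
    }
    return s;
}
```

Running the synthesis on a residual and then the analysis on the result, with the same coefficients, gives back the residual, which is the round trip described in the text.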
lpc_filter()
void lpc_filter ( EST_Wave &sig, EST_FVector &a, EST_Wave &res)
Synthesize a signal from a single set of linear prediction coefficients and the residual values.
Parameters
sig
the waveform to be synthesized
a
a single set of LP coefficients
res
the input residual waveform
inv_lpc_filter()
void inv_lpc_filter ( EST_Wave &sig, EST_FVector &a, EST_Wave &res)
Filter the waveform using a single set of coefficients so as to produce a residual signal.
Parameters
sig
the speech waveform to be filtered
a
a single set of LP coefficients
res
the output residual waveform
lpc_filter_1()
void lpc_filter_1 ( EST_Track &lpc, EST_Wave & res, EST_Wave &sig)
Synthesize a signal from a track of linear prediction coefficients. This function takes a set of LP frames and a residual and produces a synthesized signal.
For each frame, the function picks an end point, which is half-way between the current frame's time position and the next frame's. A start point is defined as being the previous frame's end. Using these two values, a portion of residual is extracted and passed to \Ref{lpc_filter} along with the LP coefficients for that frame. This function writes directly into the signal for the values between start and end.
Parameters
sig
the waveform to be synthesized
lpc
a track of time positioned LP coefficients
res
the input residual waveform
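The way lpc_filter_1() is described as choosing its start and end points can be sketched as a small helper that turns frame times into sample ranges. The name {\tt sketch_frame_spans} is hypothetical. Starting the first span at sample 0, running the last span to the end of the signal, and rounding to the nearest sample are assumptions, as the text does not say how these cases are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Returns one [start, end) sample range per frame. Each end is the midpoint
// between this frame's time and the next frame's time; each start is the
// previous frame's end.
std::vector<std::pair<long, long> > sketch_frame_spans(
    const std::vector<double> &frame_times, int sample_rate, long num_samples)
{
    assert(sample_rate > 0 && num_samples >= 0);
    std::vector<std::pair<long, long> > spans;
    long start = 0;                        // assumption: first span starts at 0
    for (std::size_t i = 0; i < frame_times.size(); ++i) {
        long end = num_samples;            // assumption: last span runs to the end
        if (i + 1 < frame_times.size()) {
            const double mid = 0.5 * (frame_times[i] + frame_times[i + 1]);
            end = std::lround(mid * sample_rate);
            if (end > num_samples)
                end = num_samples;
        }
        spans.push_back(std::make_pair(start, end));
        start = end;
    }
    return spans;
}
```

Each range would then select the portion of residual that is filtered with that frame's coefficients.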
lpc_filter_fast()
void lpc_filter_fast ( EST_Track &lpc, EST_Wave & res, EST_Wave &sig)
Synthesize a signal from a track of linear prediction coefficients. This function takes a set of LP frames and a residual and produces a synthesized signal.
This is functionally equivalent to \Ref{lpc_filter_1} except it reduces the residual by 0.5 before filtering. Importantly, it is about three times faster than \Ref{lpc_filter_1}, but in doing so uses direct C buffers rather than the neat C++ access functions. This function should be regarded as temporary and will be deleted after we restructure the low level classes to give better access.
Parameters
sig
the waveform to be synthesized
lpc
a track of time positioned LP coefficients
res
the input residual waveform
inv_lpc_filter_ola()
void inv_lpc_filter_ola ( EST_Wave &sig, EST_Track &lpc, EST_Wave &res)
Produce a residual from a track of linear prediction coefficients and a signal using an overlap add technique.
For each frame, the function estimates the local pitch period and picks a start point one period before the current time position and an end point one period after it.
A portion of residual corresponding to these times is then produced using \Ref{inv_lpc_filter}. The resultant section of residual is then overlap-added into the main residual wave object.
Parameters
sig
the speech waveform to be filtered
lpc
a track of time positioned LP coefficients
res
the output residual waveform

Pre/Post Emphasis filters.

Table of Contents
pre_emphasis()
pre_emphasis()
post_emphasis()
post_emphasis()
These functions adjust the spectral tilt of the input waveform.
pre_emphasis()
void pre_emphasis ( EST_Wave &sig, float a=DEFAULT_PRE_EMPH_FACTOR)
Pre-emphasis filtering. This performs simple high pass filtering with a one tap filter of value {\tt a}. Normal values of {\tt a} range between 0.95 and 0.99.
pre_emphasis()
void pre_emphasis ( EST_Wave &sig, EST_Wave &out, float a=DEFAULT_PRE_EMPH_FACTOR)
Pre-emphasis filtering. This performs simple high pass filtering with a one tap filter of value {\tt a}. Normal values of {\tt a} range between 0.95 and 0.99.
post_emphasis()
void post_emphasis ( EST_Wave &sig, float a=DEFAULT_PRE_EMPH_FACTOR)
Post-emphasis filtering. This performs simple low pass filtering with a one tap filter of value {\tt a}. Normal values of {\tt a} range between 0.95 and 0.99. The same value of {\tt a} should be used when pre- and post-emphasizing the same signal.
post_emphasis()
void post_emphasis ( EST_Wave &sig, EST_Wave &out, float a=DEFAULT_PRE_EMPH_FACTOR)
Post-emphasis filtering. This performs simple low pass filtering with a one tap filter of value {\tt a}. Normal values of {\tt a} range between 0.95 and 0.99. The same value of {\tt a} should be used when pre- and post-emphasizing the same signal.
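A one tap pre-emphasis filter and its matching post-emphasis filter are commonly written as below. This sketch uses the usual textbook forms on plain vectors; the function names are hypothetical, and the zero initial state is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pre-emphasis (high pass): y[n] = x[n] - a * x[n-1].
std::vector<float> sketch_pre_emphasis(const std::vector<float> &x, float a)
{
    assert(a >= 0.0f && a < 1.0f);
    std::vector<float> y(x.size(), 0.0f);
    float prev_in = 0.0f;                  // assumption: zero before the start
    for (std::size_t n = 0; n < x.size(); ++n) {
        y[n] = x[n] - a * prev_in;
        prev_in = x[n];
    }
    return y;
}

// Post-emphasis (low pass): y[n] = x[n] + a * y[n-1].
// With the same `a`, this undoes the pre-emphasis above.
std::vector<float> sketch_post_emphasis(const std::vector<float> &x, float a)
{
    assert(a >= 0.0f && a < 1.0f);
    std::vector<float> y(x.size(), 0.0f);
    float prev_out = 0.0f;                 // assumption: zero before the start
    for (std::size_t n = 0; n < x.size(); ++n) {
        y[n] = x[n] + a * prev_out;
        prev_out = y[n];
    }
    return y;
}
```

Because the second filter inverts the first only when both use the same coefficient, a mismatch in {\tt a} leaves a residual spectral tilt on the signal.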

Miscellaneous filters.

Some of these filters are non-linear and therefore don't fit the normal paradigm.
simple_mean_smooth()
void simple_mean_smooth ( EST_Wave &c, int n)
Filters the waveform by means of median smoothing.
This is a sort of low pass filter which aims to remove extreme values. Median smoothing works by examining each sample in the wave, taking all the values in a window of size {\tt n} around that sample, sorting them and replacing that sample with the middle ranking sample in the sorted samples.
Parameters
sig
waveform to be filtered
n
size of smoothing window
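The median smoothing procedure described above can be sketched on a plain vector as follows. The name {\tt sketch_median_smooth} is hypothetical. Requiring an odd window size and shortening the window at the two ends of the signal are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Replaces each sample with the middle-ranking value of the `n` samples
// centred on it. Assumption: near the edges the window is truncated to the
// samples that exist.
std::vector<float> sketch_median_smooth(const std::vector<float> &x,
                                        std::size_t n)
{
    assert(n % 2 == 1);
    const std::size_t half = n / 2;
    std::vector<float> out(x.size(), 0.0f);
    for (std::size_t t = 0; t < x.size(); ++t) {
        const std::size_t lo = (t >= half) ? t - half : 0;
        const std::size_t hi = std::min(x.size(), t + half + 1);
        std::vector<float> win(x.begin() + lo, x.begin() + hi);
        std::sort(win.begin(), win.end());
        out[t] = win[win.size() / 2];      // middle-ranking sample
    }
    return out;
}
```

A single extreme value surrounded by ordinary samples is removed entirely, which is the behaviour that distinguishes this from a linear low pass filter.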

Filter Design

Table of Contents
design_FIR_filter()
design_lowpass_FIR_filter()
design_highpass_FIR_filter()
FIR Filtering is a 2 stage process, first involving design and then the filtering itself. As the design is somewhat costly, it is usually desirable to design a filter outside the main loop.
For one off filtering operations, functions are provided which design and filter the waveform in a single go.
It is impossible to design an ideal filter, i.e. one which exactly obeys the desired frequency response. The "quality" of a filter is given by the order parameter, with high values indicating good approximations to desired responses. High orders are slower. The default is 199, which gives a pretty good filter, but a value as low as 19 is still usable if speed is important.
design_FIR_filter()
EST_FVector design_FIR_filter ( const EST_FVector &freq_response, int filter_order)
Create an arbitrary filter of order {\tt filter_order} that attempts to give the frequency response given by {\tt freq_response}. The vector {\tt freq_response} should have a size that is a power of two (2**N) and contain a plot of the desired frequency response with values ranging between 0.0 and 1.0. The actual filtering is done by \Ref{FIRfilter}.
design_lowpass_FIR_filter()
EST_FVector design_lowpass_FIR_filter ( int sample_rate, int freq, int order)
Design a FIR lowpass filter of order {\tt order} and cut-off frequency {\tt freq}. The filter coefficients are returned in the FVector and should be used in conjunction with \Ref{FIRfilter}.
design_highpass_FIR_filter()
EST_FVector design_highpass_FIR_filter ( int sample_rate, int freq, int order)
Design a FIR highpass filter of order {\tt order} and cut-off frequency {\tt freq}. The filter coefficients are returned in the FVector and should be used in conjunction with \Ref{FIRfilter}.
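To give a feel for what designing a lowpass filter involves, the sketch below uses the windowed-sinc method: an ideal lowpass impulse response is truncated to {\tt order} coefficients and tapered with a Hamming window. This is a substitute technique chosen for illustration and is not claimed to be the method used by design_lowpass_FIR_filter(). The name {\tt sketch_design_lowpass}, the requirement for an odd order, and the normalisation to unit gain at zero frequency are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Windowed-sinc lowpass design (illustrative only, not the EST algorithm).
// Returns `order` coefficients for a cutoff of `freq` Hz at `sample_rate` Hz.
std::vector<double> sketch_design_lowpass(int sample_rate, int freq, int order)
{
    assert(order > 1 && order % 2 == 1);          // odd length: symmetric about a centre tap
    assert(freq > 0 && 2 * freq < sample_rate);   // cutoff must lie below Nyquist
    const double pi = 3.14159265358979323846;
    const double fc = static_cast<double>(freq) / sample_rate;  // cycles per sample
    const int mid = order / 2;
    std::vector<double> c(static_cast<std::size_t>(order), 0.0);
    double sum = 0.0;
    for (int n = 0; n < order; ++n) {
        const int k = n - mid;
        // Ideal lowpass impulse response, with its limit value at k == 0.
        const double ideal = (k == 0) ? 2.0 * fc
                                      : std::sin(2.0 * pi * fc * k) / (pi * k);
        const double hamming = 0.54 - 0.46 * std::cos(2.0 * pi * n / (order - 1));
        c[static_cast<std::size_t>(n)] = ideal * hamming;
        sum += ideal * hamming;
    }
    for (std::size_t i = 0; i < c.size(); ++i)
        c[i] /= sum;                               // unit gain at 0 Hz
    return c;
}
```

A larger {\tt order} gives a sharper transition between pass band and stop band at the cost of more multiplications per output sample, which matches the trade-off between quality and speed noted for the order parameter.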

