Edinburgh Speech Tools Library

sig2fv Generate signal processing coefficients from waveforms

Table of Contents
Synopsis
Options
Examples

Synopsis

sig2fv [input file] -o [output file] [-h] [-itype string] [-n int] [-f int] [-ibo string] [-iswap] [-istype string] [-c string] [-start float] [-end float] [-from int] [-to int] [-otype string {ascii}] [-S float] [-o ofile] [-shift float] [-factor float] [-pm ifile] [-coefs string] [-delta string] [-acc string] [-window_type string] [-lpc_order int] [-ref_order int] [-cep_order int] [-melcep_order int] [-fbank_order int] [-preemph float] [-lifter float] [-usepower] [-include_c0] [-order string]

sig2fv performs signal processing analysis on speech waveforms and produces feature vectors. The following types of analysis are provided:

Linear prediction (LPC)
Cepstrum coding from lpc coefficients
Mel scale cepstrum coding via fbank
Mel scale log filterbank analysis
Line spectral frequencies
Linear prediction reflection coefficients
Root mean square energy
Power
Fundamental frequency (pitch)
Delta and acceleration coefficients of any of the above

The -coefs option takes a list of the names of the basic analyses required. The -delta and -acc options take the same kind of list for delta and acceleration coefficients respectively.

Options

-h
Options help
-itype
string Input file type (optional). Setting this to raw indicates that the input file has no header. Other types can be given, but this is rarely needed because the type of every supported headered format is determined automatically from the file's header. Samples in an unheadered file are assumed to be shorts (16-bit). Supported types are: nist, est, esps, snd, riff, aiff, audlab, raw, ascii
-n
int Number of channels in an unheadered input file
-f
int Sample rate in Hertz for an unheadered input file
-ibo
string Input byte order in an unheadered input file. Possibilities are: MSB, LSB, native or nonnative. Suns, HP, SGI Mips and M68000 are MSB (big endian); Intel, Alpha, DEC Mips and Vax are LSB (little endian)
-iswap
Swap bytes. (For use on an unheadered input file)
-istype
string Sample type in an unheadered input file: short, mulaw, byte, ascii
-c
string Select a single channel (starts from 0). Waveforms can have multiple channels. This option extracts a single channel for processing and discards the rest.
-start
float Extract sub-wave starting at this time, specified in seconds
-end
float Extract sub-wave ending at this time, specified in seconds
-from
int Extract sub-wave starting at this sample point
-to
int Extract sub-wave ending at this sample point
-otype
string Output file type (default: ascii). Types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_user, htk_discrete, xmg, xgraph, ema, ema_swapped, ascii, label
-S
float Frame spacing of output in seconds. If this is different from the internal spacing, the contour is resampled at this spacing
-o
ofile Output filename, defaults to stdout
-shift
float Frame spacing in seconds for fixed frame analysis. This doesn't have to be the same as the output file spacing: the -S option can be used to resample the track before saving. Default: 0.010
-factor
float Frame lengths will be FACTOR times the local pitch period. Default: 2.000
-pm
ifile Pitch mark file name. This is used to specify the positions of the analysis frames for pitch synchronous analysis. Pitchmark files are just standard track files, but the channel information is ignored and only the time positions are used
-coefs
string List of basic types of processing required. Permissible types are:
  lpc: linear predictive coding
  cep: cepstrum coding from lpc coefficients
  melcep: Mel scale cepstrum coding via fbank
  fbank: Mel scale log filterbank analysis
  lsf: line spectral frequencies
  ref: linear prediction reflection coefficients
  power
  f0
  energy: root mean square energy
-delta
string List of delta types of processing required. Basic processing does not need to be specified for this option to work. Permissible types are:
  lpc: linear predictive coding
  cep: cepstrum coding from lpc coefficients
  melcep: Mel scale cepstrum coding via fbank
  fbank: Mel scale log filterbank analysis
  lsf: line spectral frequencies
  ref: linear prediction reflection coefficients
  power
  f0
  energy: root mean square energy
-acc
string List of acceleration (delta delta) types of processing required. Basic processing does not need to be specified for this option to work. Permissible types are:
  lpc: linear predictive coding
  cep: cepstrum coding from lpc coefficients
  melcep: Mel scale cepstrum coding via fbank
  fbank: Mel scale log filterbank analysis
  lsf: line spectral frequencies
  ref: linear prediction reflection coefficients
  power
  f0
  energy: root mean square energy
-window_type
string Type of window used on waveform. Permissible types are:
  none: unknown window type
  rectangle: Rectangular window
  triangle: Triangular window
  hanning: Hanning window
  hamming: Hamming window
Default: hamming
-lpc_order
int Order of lpc analysis.
-ref_order
int Order of lpc reflection coefficient analysis.
-cep_order
int Order of lpc cepstral analysis.
-melcep_order
int Order of Mel cepstral analysis.
-fbank_order
int Order of filter bank analysis.
-preemph
float Perform pre-emphasis with this factor.
-lifter
float lifter coefficient.
-usepower
use power rather than energy in filter bank analysis
-include_c0
include cepstral coefficient 0
-order
string order of analyses
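
To illustrate what -ibo and -istype describe for an unheadered file, the following Python sketch decodes the same four bytes as 16-bit shorts under both byte orders. It is independent of the Speech Tools code, and the function name decode_shorts is hypothetical.

```python
import struct

def decode_shorts(raw, byte_order):
    """Decode unheadered 16-bit signed samples.

    byte_order uses the -ibo names: "MSB" (big endian) or "LSB" (little endian).
    """
    if byte_order not in ("MSB", "LSB"):
        raise ValueError("byte_order must be 'MSB' or 'LSB'")
    if len(raw) % 2 != 0:
        raise ValueError("raw data must contain whole 16-bit samples")
    prefix = ">" if byte_order == "MSB" else "<"
    count = len(raw) // 2
    return list(struct.unpack(f"{prefix}{count}h", raw))

raw = bytes([0x01, 0x00, 0xFF, 0x7F])
print(decode_shorts(raw, "LSB"))  # [1, 32767]
print(decode_shorts(raw, "MSB"))  # [256, -129]
```

Reading a file with the wrong byte order gives samples that are valid numbers but unrelated to the signal, which is why the byte order has to be stated for raw input.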


Examples

Fixed frame basic linear prediction: To produce a set of linear prediction coefficients every 10 ms, using pre-emphasis and saving in EST format:

$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5
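
This page does not state the filter behind -preemph. A common first-order form is y[n] = x[n] - a * x[n-1], where a is the pre-emphasis factor. The Python sketch below assumes that form. The function name pre_emphasise is hypothetical, and the actual sig2fv implementation may differ in detail, for example in how the first sample is handled.

```python
def pre_emphasise(samples, factor):
    """Apply y[n] = x[n] - factor * x[n-1], keeping the first sample unchanged."""
    if not samples:
        return []
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - factor * prev)
    return out

print(pre_emphasise([1.0, 1.0, 1.0, 2.0], 0.5))  # [1.0, 0.5, 0.5, 1.5]
```

Constant stretches of the signal are attenuated while the jump at the end is preserved, which is the high-frequency boost pre-emphasis is meant to provide.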


Pitch synchronous linear prediction: The following uses the set of pitchmarks in kdt_010.pm as the centres of the analysis windows.

$ sig2fv kdt_010.wav -pm kdt_010.pm -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5
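
In pitch synchronous analysis each frame is centred on a pitchmark, and -factor sets the frame length as a multiple of the local pitch period. How the local period is estimated is not described here. The Python sketch below assumes it is the gap to the previous pitchmark (or to the next one for the first mark). The function name pitch_sync_frames is hypothetical.

```python
def pitch_sync_frames(pitchmarks, factor=2.0):
    """Return (start, end) times in seconds for frames centred on each pitchmark."""
    if len(pitchmarks) < 2:
        raise ValueError("need at least two pitchmarks to estimate a period")
    frames = []
    for i, centre in enumerate(pitchmarks):
        # Assumed local period: gap to the previous mark,
        # or to the next mark for the very first one.
        if i == 0:
            period = pitchmarks[1] - pitchmarks[0]
        else:
            period = centre - pitchmarks[i - 1]
        half = factor * period / 2.0
        frames.append((centre - half, centre + half))
    return frames

for start, end in pitch_sync_frames([0.010, 0.018, 0.028]):
    print(f"{start:.3f} {end:.3f}")
```

With the default factor of 2, each frame spans roughly two pitch periods, so neighbouring frames overlap.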

F0, Linear prediction and cepstral coefficients:

$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "f0 lpc cep" -otype est -shift 0.01

Note that pitch tracking can also be done with the pda program. Both use the same underlying technique, but the pda program offers much finer control over the processing parameters specific to pitch tracking.

Energy, linear prediction and cepstral coefficients, with a 10 ms frame shift during analysis but a 5 ms frame shift in the output file:

$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "energy lpc cep" -otype est -S 0.005 -shift 0.01
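
The -S option resamples the analysed contour at a new spacing. The interpolation method is not documented on this page. The Python sketch below assumes linear interpolation and frames placed at multiples of the spacing starting from time 0, both of which are assumptions. The function name resample_contour is hypothetical.

```python
def resample_contour(values, shift, out_spacing):
    """Linearly interpolate values spaced `shift` seconds apart onto `out_spacing`."""
    if not values:
        return []
    n_out = int(round((len(values) - 1) * shift / out_spacing)) + 1
    out = []
    for k in range(n_out):
        pos = k * out_spacing / shift          # position in units of input frames
        i = min(int(pos), len(values) - 1)     # input frame at or before pos
        j = min(i + 1, len(values) - 1)        # next input frame, clamped at the end
        frac = pos - i
        out.append(values[i] + frac * (values[j] - values[i]))
    return out

# Three frames at 10 ms become five frames at 5 ms,
# approximately [1.0, 2.0, 3.0, 2.5, 2.0].
print(resample_contour([1.0, 3.0, 2.0], shift=0.01, out_spacing=0.005))
```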

Delta and acc coefficients can be calculated even if their base form is not required. This produces normal energy coefficients and cepstral delta coefficients:

$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "energy" -delta "cep" -otype est

Mel-scaled cepstra, Delta and acc coefficients, as is common in speech recognition:

$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "melcep" -delta "melcep" -acc "melcep" -otype est -preemph 0.96
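
Delta coefficients describe how a base coefficient changes from frame to frame, and acceleration coefficients are the delta of the delta. The exact formula sig2fv uses is not given on this page. The Python sketch below assumes a simple centred difference with the edge frames replicated, and the function name delta is hypothetical.

```python
def delta(track):
    """Centred difference of a per-frame coefficient track, edges replicated."""
    n = len(track)
    out = []
    for t in range(n):
        prev = track[max(t - 1, 0)]        # clamp at the first frame
        nxt = track[min(t + 1, n - 1)]     # clamp at the last frame
        out.append((nxt - prev) / 2.0)
    return out

c = [0.0, 1.0, 4.0, 9.0]   # one coefficient over four frames
d = delta(c)               # delta coefficients
a = delta(d)               # acceleration (delta of delta)
print(d)  # [0.5, 2.0, 4.0, 2.5]
print(a)  # [0.75, 1.75, 0.25, -0.75]
```

Because the delta is computed from the base track internally, the base coefficients need not appear in the output, which matches the behaviour of -delta and -acc described above.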
