sig2fv Generate signal processing coefficients from waveforms

Table of Contents
Synopsis
Options
Examples

Synopsis

sig2fv [input file] -o [output file] [-h ] [-itype string] [-n int] [-f int] [-ibo string] [-iswap ] [-istype string] [-c string] [-start float] [-end float] [-from int] [-to int] [-otype string " {ascii}"] [-S float] [-o ofile] [-shift float] [-factor float] [-pm ifile] [-coefs string] [-delta string] [-acc string] [-window_type string] [-lpc_order int] [-ref_order int] [-cep_order int] [-melcep_order int] [-fbank_order int] [-preemph float] [-lifter float] [-usepower ] [-include_c0 ] [-order string]

sig2fv performs signal processing feature vector analysis on speech waveforms. The following types of analysis are provided:

  • Linear prediction (LPC)

  • Cepstrum coding from lpc coefficients

  • Mel scale cepstrum coding via fbank

  • Mel scale log filterbank analysis

  • Line spectral frequencies

  • Linear prediction reflection coefficients

  • Root mean square energy

  • Power

  • Fundamental frequency (pitch)

  • Calculation of delta and acceleration coefficients of all of the above

The -coefs option specifies a list of the names of the basic analysis types required. The -delta and -acc options specify the analysis types for which delta and acceleration coefficients are required, respectively.
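To illustrate what delta and acceleration coefficients represent, the following Python sketch computes them as central differences over neighbouring frames. This is an assumption made for illustration only: the exact formula and window width that sig2fv uses are not stated here, and the function name `delta` is hypothetical.

```python
def delta(track):
    """Central-difference delta of a track (a list of frames, each a list of floats).

    Illustrative only: sig2fv's actual delta formula may differ.
    At the edges, the frame itself stands in for the missing neighbour.
    """
    n = len(track)
    out = []
    for t in range(n):
        prev = track[max(t - 1, 0)]
        nxt = track[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


# One-channel toy track with a quadratic contour.
energy = [[0.0], [1.0], [4.0], [9.0], [16.0]]
d = delta(energy)   # rate of change per frame
a = delta(d)        # "delta delta" (acceleration)
```

Because the delta is computed from the basic coefficients, the basic analysis has to be run internally even when only the delta form is written to the output.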

Options

-h

Options help

-itype

string Input file type (optional). If set to raw, this indicates that the input file does not have a header. Although this option can specify file types other than raw, it is rarely needed for that purpose, because the type of every supported headered format can be determined automatically from the file's header. Samples in an unheadered input file are assumed to be shorts (16 bit). Supported types are nist, est, esps, snd, riff, aiff, audlab, raw, ascii.

-n

int Number of channels in an unheadered input file

-f

int Sample rate in Hertz for an unheadered input file

-ibo

string Input byte order in an unheadered input file. Possibilities are: MSB, LSB, native or nonnative. Suns, HP, SGI Mips and M68000 are MSB (big endian); Intel, Alpha, DEC Mips and Vax are LSB (little endian).
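The effect of byte order on headerless 16 bit data can be seen in a short Python sketch. The function name `decode_shorts` is hypothetical and is not part of sig2fv; it only mirrors the four -ibo values to show how the same bytes yield different sample values.

```python
import struct
import sys


def decode_shorts(data, byte_order="native"):
    """Decode headerless 16 bit signed samples from a bytes object.

    byte_order mirrors the -ibo values: "MSB", "LSB", "native", "nonnative".
    Any trailing odd byte is ignored.
    """
    native = ">" if sys.byteorder == "big" else "<"
    other = "<" if native == ">" else ">"
    prefix = {"MSB": ">", "LSB": "<", "native": native, "nonnative": other}[byte_order]
    count = len(data) // 2
    return list(struct.unpack(prefix + str(count) + "h", data[: count * 2]))


raw = bytes([0x01, 0x00, 0xFF, 0x7F])
print(decode_shorts(raw, "LSB"))  # little endian reading of the four bytes
print(decode_shorts(raw, "MSB"))  # big endian reading of the same bytes
```

If a raw file sounds like noise or produces implausible coefficients, reading it with the wrong byte order is a likely cause.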

-iswap

Swap bytes. (For use on an unheadered input file)

-istype

string Sample type in an unheadered input file: short, mulaw, byte, ascii

-c

string Select a single channel (starts from 0). Waveforms can have multiple channels. This option extracts a single channel for processing and discards the rest.

-start

float Extract sub-wave starting at this time, specified in seconds

-end

float Extract sub-wave ending at this time, specified in seconds

-from

int Extract sub-wave starting at this sample point

-to

int Extract sub-wave ending at this sample point

-otype

string " {ascii}" Output file type. If unspecified, ascii is assumed. Types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_user, htk_discrete, xmg, xgraph, ema, ema_swapped, ascii, label

-S

float Frame spacing of output in seconds. If this is different from the internal spacing, the contour is resampled at this spacing.

-o

ofile Output filename, defaults to stdout

-shift

float Frame spacing in seconds for fixed frame analysis. This doesn't have to be the same as the output file spacing: the -S option can be used to resample the track before saving. default: 0.010

-factor

float Frame lengths will be FACTOR times the local pitch period. default: 2.000

-pm

ifile Pitch mark file name. This is used to specify the positions of the analysis frames for pitch synchronous analysis. Pitchmark files are just standard track files, but the channel information is ignored and only the time positions are used.
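The interaction between -pm and -factor can be sketched in Python as follows. The sketch rests on two assumptions that are not confirmed here: that each frame is centred on its pitchmark, and that the local pitch period is the gap to the previous pitchmark (or to the next one for the first mark). The function name `frame_bounds` is hypothetical, and sig2fv may estimate the period differently.

```python
def frame_bounds(pitchmarks, factor=2.0):
    """Return (start, end) times in seconds for a frame centred on each pitchmark.

    Assumption: the local pitch period at a mark is the gap to the previous
    mark, or to the next mark for the very first one.
    """
    if len(pitchmarks) < 2:
        raise ValueError("need at least two pitchmarks")
    bounds = []
    for i, t in enumerate(pitchmarks):
        period = t - pitchmarks[i - 1] if i > 0 else pitchmarks[1] - t
        half = factor * period / 2.0  # frame length is factor * period
        bounds.append((t - half, t + half))
    return bounds


# Toy pitchmark times in seconds (made-up values, not from a real file).
marks = [0.100, 0.108, 0.118]
for start, end in frame_bounds(marks):
    print(round(start, 3), round(end, 3))
```

With the default factor of 2.000, each frame under these assumptions spans roughly one pitch period on either side of its pitchmark, so adjacent frames overlap.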

-coefs

string List of basic types of processing required. Permissible types are:

  • lpc: linear predictive coding

  • cep: cepstrum coding from lpc coefficients

  • melcep: Mel scale cepstrum coding via fbank

  • fbank: Mel scale log filterbank analysis

  • lsf: line spectral frequencies

  • ref: linear prediction reflection coefficients

  • power

  • f0

  • energy: root mean square energy

-delta

string List of delta types of processing required. Basic processing does not need to be specified for this option to work. Permissible types are:

  • lpc: linear predictive coding

  • cep: cepstrum coding from lpc coefficients

  • melcep: Mel scale cepstrum coding via fbank

  • fbank: Mel scale log filterbank analysis

  • lsf: line spectral frequencies

  • ref: linear prediction reflection coefficients

  • power

  • f0

  • energy: root mean square energy

-acc

string List of acceleration (delta delta) processing required. Basic processing does not need to be specified for this option to work. Permissible types are:

  • lpc: linear predictive coding

  • cep: cepstrum coding from lpc coefficients

  • melcep: Mel scale cepstrum coding via fbank

  • fbank: Mel scale log filterbank analysis

  • lsf: line spectral frequencies

  • ref: linear prediction reflection coefficients

  • power

  • f0

  • energy: root mean square energy

-window_type

string Type of window used on waveform. Permissible types are:

  • none: unknown window type

  • rectangle: Rectangular window

  • triangle: Triangular window

  • hanning: Hanning window

  • hamming: Hamming window

default: hamming
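For readers unfamiliar with these window shapes, the following Python sketch generates them using common textbook definitions. These formulas are an assumption: the exact definitions inside sig2fv (for example, whether the denominator is the frame length or the frame length minus one) may differ slightly. The function name `window` is hypothetical.

```python
import math


def window(name, length):
    """Return window coefficients for a frame of `length` samples.

    Textbook definitions, symmetric about the frame centre.
    """
    if name not in ("rectangle", "triangle", "hanning", "hamming"):
        raise ValueError("unknown window type: " + name)
    if name == "rectangle" or length < 2:
        return [1.0] * length
    if name == "triangle":
        mid = (length - 1) / 2.0
        return [1.0 - abs(n - mid) / mid for n in range(length)]
    # hanning and hamming are both raised cosines with different offsets
    a0 = 0.5 if name == "hanning" else 0.54
    return [a0 - (1.0 - a0) * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


for name in ("rectangle", "triangle", "hanning", "hamming"):
    print(name, [round(w, 3) for w in window(name, 5)])
```

All of the tapered windows reduce the weight of samples near the frame edges, which limits the spectral artefacts caused by cutting the waveform abruptly.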

-lpc_order

int Order of lpc analysis.

-ref_order

int Order of lpc reflection coefficient analysis.

-cep_order

int Order of lpc cepstral analysis.

-melcep_order

int Order of Mel cepstral analysis.

-fbank_order

int Order of filter bank analysis.

-preemph

float Perform pre-emphasis with this factor.
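As a rough illustration of what a pre-emphasis factor does, the Python sketch below applies the conventional first-order filter y[n] = x[n] - factor * x[n-1]. That sig2fv uses exactly this filter, and how it treats the first sample, are assumptions; the function name `preemphasis` is hypothetical.

```python
def preemphasis(samples, factor):
    """Apply y[n] = x[n] - factor * x[n-1], treating the sample before the first as 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - factor * prev)
        prev = x
    return out


# A constant (zero frequency) signal is attenuated after the first sample...
print(preemphasis([1.0, 1.0, 1.0, 1.0], 0.5))
# ...while a signal alternating at the highest frequency is boosted.
print(preemphasis([1.0, -1.0, 1.0, -1.0], 0.5))
```

Under this filter, factors closer to 1 suppress low frequencies more strongly, which flattens the spectral tilt of speech before analysis.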

-lifter

float Lifter coefficient.

-usepower

Use power rather than energy in filter bank analysis.

-include_c0

Include cepstral coefficient 0.

-order

string order of analyses

Examples

Fixed frame basic linear prediction: To produce a set of linear prediction coefficients every 10 ms, using pre-emphasis and saving in EST format:

$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5

Pitch synchronous linear prediction: The following uses the set of pitchmarks in kdt_010.pm as the centres of the analysis windows.

$ sig2fv kdt_010.wav -pm kdt_010.pm -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5

F0, Linear prediction and cepstral coefficients:

$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "f0 lpc cep" -otype est -shift 0.01
Note that pitch tracking can also be done with the pda program. Both use the same underlying technique, but the pda program offers much finer control over the processing parameters specific to pitch tracking.

Energy, Linear Prediction and Cepstral coefficients, with a 10 ms frame shift during analysis but a 5 ms frame shift in the output file:

$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "f0 lpc cep" -otype est -S 0.005 -shift 0.01

Delta and acc coefficients can be calculated even if their base form is not required. This produces normal energy coefficients and cepstral delta coefficients:

$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "energy" -delta "cep" -otype est

Mel-scaled cepstra with delta and acc coefficients, as is common in speech recognition:

$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "melcep" -delta "melcep" -acc "melcep" -otype est -preemph 0.96
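When many files need the same analysis, commands like the ones above can be assembled programmatically. The Python sketch below builds an argument list using only options documented in this page; the helper name `sig2fv_args` and its keyword names are hypothetical, and it covers only a small subset of the options.

```python
import shlex


def sig2fv_args(wav, out, coefs=(), delta=(), acc=(), otype="est",
                shift=None, preemph=None):
    """Build an argument list for sig2fv from a subset of its documented options."""
    args = ["sig2fv", wav, "-o", out]
    # Each list option takes a single space-separated string of type names.
    for flag, names in (("-coefs", coefs), ("-delta", delta), ("-acc", acc)):
        if names:
            args += [flag, " ".join(names)]
    args += ["-otype", otype]
    if shift is not None:
        args += ["-shift", str(shift)]
    if preemph is not None:
        args += ["-preemph", str(preemph)]
    return args


args = sig2fv_args("kdt_010.wav", "kdt_010.lpc",
                   coefs=["f0", "lpc", "cep"], shift=0.01)
print(shlex.join(args))  # shell-quoted form, suitable for logging or a dry run
```

To run the command rather than print it, the list can be passed to `subprocess.run(args, check=True)`, provided sig2fv is installed and on the PATH. Passing a list avoids shell quoting problems with the multi-word -coefs value.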