Classes

class EST_SCFG{}

class EST_SCFG

A class representing a stochastic context free grammar (SCFG).

This class includes the representation of the grammar itself and methods for training and testing it against some corpus.

At present, only grammars in Chomsky Normal Form are supported. That is, rules may be binary or unary. In a binary rule the mother and both daughters are nonterminals; in a unary rule the mother must be a nonterminal and the daughter a terminal symbol.

The terminal and nonterminal symbol sets are derived automatically from the LISP representation of the rules at initialization time and are represented as \Ref{EST_Discrete}s. The distinguished symbol is assumed to be the mother of the first rule in the given grammar.
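To illustrate how the two symbol sets and the distinguished symbol fall out of the rules, the sketch below works on a hypothetical string-level rule list instead of the LISP form. StrRule, SymbolTable and find_symbols are invented names; SymbolTable only stands in for the index/name mapping that EST_Discrete provides, and the real find_terms_nonterms() fills EST_StrLists from LISP instead.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical string-level rule (not the EST LISP format):
// one daughter means a unary rule, two daughters a binary rule.
struct StrRule {
    double prob;
    std::string mother;
    std::vector<std::string> daughters;
};

// Stand-in for the index <-> name mapping that EST_Discrete provides.
struct SymbolTable {
    std::vector<std::string> names;
    std::map<std::string, int> index;

    // Return the index of s, assigning the next free index on first sight.
    int add(const std::string &s) {
        auto it = index.find(s);
        if (it != index.end()) return it->second;
        int i = static_cast<int>(names.size());
        names.push_back(s);
        index[s] = i;
        return i;
    }
};

// Mothers and binary daughters are nonterminals; a unary daughter is a
// terminal. The first rule's mother is added first, so it gets nonterminal
// index 0, i.e. it becomes the distinguished symbol.
void find_symbols(const std::vector<StrRule> &rules,
                  SymbolTable &nonterms, SymbolTable &terms) {
    for (const StrRule &r : rules) {
        nonterms.add(r.mother);
        if (r.daughters.size() == 1) {
            terms.add(r.daughters[0]);
        } else {
            for (const std::string &d : r.daughters) nonterms.add(d);
        }
    }
}
```

For the rules S -> NP VP, NP -> dogs, NP -> cats, VP -> bark this yields the nonterminals S, NP, VP (with S at index 0) and the terminals dogs, cats, bark.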

Constructor and initialisation functions

EST_SCFG()

EST_SCFG(LISP rules)

Initialize from a set of rules

utility functions

set_rules()

void set_rules(LISP rules)

Set (or reset) rules from external source after construction

get_rules()

LISP get_rules()

Return rules as LISP list.

rules

SCFGRuleList rules

The rules themselves

find_terms_nonterms()

void find_terms_nonterms(EST_StrList &nt, EST_StrList &t, LISP rules)

Find the terminals and nonterminals in the given grammar, adding them to the appropriate given string lists

nonterminal()

EST_String nonterminal(int p) const

Convert nonterminal index to string form

terminal()

EST_String terminal(int m) const

Convert terminal index to string form

nonterminal()

int nonterminal(const EST_String &p) const

Convert nonterminal string to index

terminal()

int terminal(const EST_String &m) const

Convert terminal string to index

num_nonterminals()

int num_nonterminals() const

Number of nonterminals

num_terminals()

int num_terminals() const

Number of terminals

prob_B()

double prob_B(int p, int q, int r) const

The probability of the binary rule p -> q r

prob_U()

double prob_U(int p, int m) const

The probability of the unary rule p -> m

set_rule_prob_cache()

void set_rule_prob_cache()

(re-)set rule probability caches
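Because prob_B() and prob_U() are queried for every candidate rule during parsing and training, a cache that turns each query into an array lookup is a natural design. The sketch below shows one plausible dense layout; whether set_rule_prob_cache() uses exactly this layout is an assumption, and RuleProbCache, B() and U() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense probability tables: every possible binary rule p -> q r
// and unary rule p -> m gets a slot, so lookups are plain array indexing.
struct RuleProbCache {
    int n_nt = 0;                 // number of nonterminals
    int n_t = 0;                  // number of terminals
    std::vector<double> binary;   // n_nt * n_nt * n_nt slots
    std::vector<double> unary;    // n_nt * n_t slots

    // (Re-)size the tables; rules absent from the grammar keep probability 0.
    void reset(int nonterminals, int terminals) {
        n_nt = nonterminals;
        n_t = terminals;
        binary.assign(static_cast<std::size_t>(n_nt) * n_nt * n_nt, 0.0);
        unary.assign(static_cast<std::size_t>(n_nt) * n_t, 0.0);
    }
    // Slot for p -> q r.
    double &B(int p, int q, int r) {
        return binary[(static_cast<std::size_t>(p) * n_nt + q) * n_nt + r];
    }
    // Slot for p -> m.
    double &U(int p, int m) {
        return unary[static_cast<std::size_t>(p) * n_t + m];
    }
};
```

The cost is memory cubic in the number of nonterminals, which is usually acceptable for the small grammars typically trained with inside-outside.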

file i/o functions

load()

EST_read_status load(const EST_String &filename)

Load grammar from named file

save()

EST_write_status save(const EST_String &filename)

Save current grammar to named file

class EST_SCFG_Rule{}

class EST_SCFG_Rule

A stochastic context free grammar rule.

At present only two types of rule are supported: {\tt est\_scfg\_binary\_rule} and {\tt est\_scfg\_unary\_rule}. This is sufficient for the representation of grammars in Chomsky Normal Form. Each rule also has a probability associated with it. Terminals and nonterminals are represented as ints, using the \Ref{EST_Discrete}s in \Ref{EST_SCFG} to reference the actual alphabets.

Although this class includes a ``probability'', nothing in the rule itself enforces it to be a true probability. It is the responsibility of the classes that use this rule to enforce that condition if desired.
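A common way for a using class to enforce that condition is to require that the rules sharing a mother sum to 1. The minimal sketch below does this by rescaling; IntRule and normalise_by_mother are hypothetical and not part of the Speech Tools API.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Hypothetical integer-coded rule mirroring the EST_SCFG_Rule constructors:
// binary p -> d1 d2, or unary p -> d1 with d2 == -1.
struct IntRule {
    double prob;
    int mother, d1, d2;
};

// Rescale so that, for every mother, the probabilities of its rules sum to 1.
// Mothers whose total is 0 are left unchanged.
void normalise_by_mother(std::vector<IntRule> &rules) {
    std::map<int, double> total;
    for (const IntRule &r : rules) total[r.mother] += r.prob;
    for (IntRule &r : rules) {
        double t = total[r.mother];
        if (t > 0.0) r.prob /= t;
    }
}
```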

EST_SCFG_Rule()

EST_SCFG_Rule(double prob, int p, int m)

Create a unary rule.

EST_SCFG_Rule()

EST_SCFG_Rule(double prob, int p, int q, int r)

Create a binary rule.

prob()

double prob() const

The rule's probability

set_prob()

void set_prob(double p)

set the probability

type()

est_scfg_rtype type() const

rule type

mother()

int mother() const

daughter1()

int daughter1() const

In a unary rule this is a terminal; in a binary rule it is a nonterminal

daughter2()

int daughter2() const

set_rule()

void set_rule(double prob, int p, int m)

set_rule()

void set_rule(double prob, int p, int q, int r)

class EST_SCFG_traintest{}

class EST_SCFG_traintest : public EST_SCFG

A class used to train (and test) SCFGs. It is an extension of \Ref{EST_SCFG}.

This offers an implementation of Pereira and Schabes, ``Inside-Outside reestimation from partially bracketed corpora.'' ACL 1992.

An SCFG may be trained from a corpus (optionally) containing brackets, over a series of passes, re-estimating the grammar probabilities after each pass. This class extends \Ref{EST_SCFG} by adding support for a bracketed corpus and various indexes for efficient use of the grammar.
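Inside-outside training rests on inside probabilities: the probability that a nonterminal derives a given span of words. In Pereira and Schabes' bracketed variant, spans that cross a corpus bracket are given probability zero. The sketch below computes only the inside pass (no outside pass and no re-estimation) for a CNF grammar; BinRule, UnRule, compatible and inside_prob are hypothetical names, and this is not the code EST_SCFG_traintest uses.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical CNF rule tables over integer symbol indices.
struct BinRule { int p, q, r; double prob; };  // p -> q r
struct UnRule  { int p, m; double prob; };     // p -> m, m a terminal

using Span = std::pair<int, int>;              // [start, end) in words

// A span is compatible with a bracketing when it crosses none of its
// brackets; two spans cross if they overlap without either containing the other.
bool compatible(int i, int j, const std::vector<Span> &brackets) {
    for (const Span &b : brackets) {
        bool overlap = i < b.second && b.first < j;
        bool nested = (b.first <= i && j <= b.second) ||
                      (i <= b.first && b.second <= j);
        if (overlap && !nested) return false;
    }
    return true;
}

// Inside pass: in[i][j][p] = P(p =>* words[i..j)), with spans that cross a
// bracket forced to 0. Returns P(start =>* whole sentence).
double inside_prob(const std::vector<int> &words, int n_nt,
                   const std::vector<BinRule> &brules,
                   const std::vector<UnRule> &urules,
                   const std::vector<Span> &brackets, int start = 0) {
    const int n = static_cast<int>(words.size());
    if (n == 0) return 0.0;
    std::vector<std::vector<std::vector<double>>> in(
        n, std::vector<std::vector<double>>(n + 1, std::vector<double>(n_nt, 0.0)));
    // Length-1 spans never cross a bracket, so unary rules always apply.
    for (int i = 0; i < n; ++i)
        for (const UnRule &u : urules)
            if (u.m == words[i]) in[i][i + 1][u.p] += u.prob;
    for (int len = 2; len <= n; ++len)
        for (int i = 0; i + len <= n; ++i) {
            const int j = i + len;
            if (!compatible(i, j, brackets)) continue;  // crossing span stays 0
            for (int k = i + 1; k < j; ++k)
                for (const BinRule &b : brules)
                    in[i][j][b.p] += b.prob * in[i][k][b.q] * in[k][j][b.r];
        }
    return in[0][n][start];
}
```

With the toy grammar X -> X X (0.5), X -> a (0.5), the sentence "a a a" has two parses and an inside probability of 0.0625; adding the bracket ((a a) a) rules out the right-branching parse and halves it to 0.03125.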

test_corpus()

void test_corpus()

Test the current grammar against the current corpus and print a summary.

Only the cross-entropy measure is given.
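A common per-word definition of cross entropy over a test corpus is H = -(1/N) * sum over sentences s of log2 P(s), where N is the total number of words. Whether test_corpus() normalises per word, per sentence or uses a different log base is an assumption here; cross_entropy is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-word cross entropy in bits: -(sum of log2 P(sentence)) / (total words).
// sentence_probs[i] is the grammar's probability for sentence i, and
// sentence_lengths[i] its length in words.
double cross_entropy(const std::vector<double> &sentence_probs,
                     const std::vector<int> &sentence_lengths) {
    double log_sum = 0.0;
    long words = 0;
    for (std::size_t i = 0; i < sentence_probs.size(); ++i) {
        log_sum += std::log2(sentence_probs[i]);
        words += sentence_lengths[i];
    }
    return words > 0 ? -log_sum / words : 0.0;
}
```

Lower values mean the grammar assigns higher probability to the test sentences.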

test_crossbrackets()

void test_crossbrackets()

Test the current grammar against the current corpus.

The summary includes the percentage of cross-bracketing accuracy and the percentage of fully correct parses.
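Bracketing accuracy in the sense of Pereira and Schabes is, roughly, the proportion of brackets in the best parse that do not cross any bracket in the corpus. The sketch below computes that percentage for one sentence; the exact formula used by test_crossbrackets() is an assumption, and crosses and bracket_accuracy are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Span = std::pair<int, int>;  // [start, end) word positions

// Two spans cross when they overlap but neither contains the other.
bool crosses(const Span &a, const Span &b) {
    bool overlap = a.first < b.second && b.first < a.second;
    bool nested = (a.first <= b.first && b.second <= a.second) ||
                  (b.first <= a.first && a.second <= b.second);
    return overlap && !nested;
}

// Percentage of parser brackets that cross no gold (corpus) bracket.
double bracket_accuracy(const std::vector<Span> &parsed,
                        const std::vector<Span> &gold) {
    if (parsed.empty()) return 100.0;
    int ok = 0;
    for (const Span &p : parsed) {
        bool bad = false;
        for (const Span &g : gold)
            if (crosses(p, g)) { bad = true; break; }
        if (!bad) ++ok;
    }
    return 100.0 * ok / parsed.size();
}
```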

load_corpus()

void load_corpus(const EST_String &filename)

Load a corpus from the given file.

Each sentence in the corpus should be contained in parentheses. Additional parentheses may be used to denote phrasing within a sentence. The corpus is read using the LISP reader, so LISP conventions apply; notably, single quotes should appear within double quotes.
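To make the corpus format concrete, a sentence such as ((the dog) barks) contains three words and two bracketed spans. The sketch below extracts those words and word spans; it ignores LISP quoting entirely and is not the LISP reader that load_corpus() actually uses. BracketedSentence and parse_bracketed are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

struct BracketedSentence {
    std::vector<std::string> words;
    std::vector<std::pair<int, int>> spans;  // [start, end) for each (...) group
};

// Minimal reader for one bracketed sentence, e.g. "((the dog) barks)".
BracketedSentence parse_bracketed(const std::string &text) {
    BracketedSentence out;
    std::vector<int> open;  // word index at each unmatched '('
    std::string word;
    auto flush = [&]() {
        if (!word.empty()) { out.words.push_back(word); word.clear(); }
    };
    for (char c : text) {
        if (c == '(') {
            flush();
            open.push_back(static_cast<int>(out.words.size()));
        } else if (c == ')') {
            flush();
            if (!open.empty()) {  // a span ends at the current word count
                out.spans.push_back({open.back(), static_cast<int>(out.words.size())});
                open.pop_back();
            }
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
        } else {
            word += c;
        }
    }
    flush();
    return out;
}
```

For ((the dog) barks) the spans are [0,2) for "the dog" and [0,3) for the whole sentence, in the order their closing parentheses appear.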

train_inout()

void train_inout(int passes, int startpass, int checkpoint, int spread, const EST_String &outfile)

Train a grammar using the loaded corpus.

Parameters
passes

the number of training passes desired.

startpass

the pass from which to start

checkpoint

save the grammar every n passes

spread

Percentage of the corpus to use on each pass; successive passes cycle through the corpus.
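The exact way spread selects sentences is not specified above. The sketch below shows one possible reading, stated as an assumption: each pass takes a contiguous block of spread percent of the corpus, starting where the previous pass stopped and wrapping around at the end. sentences_for_pass is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// ASSUMPTION: one possible reading of `spread`, not EST's documented
// behaviour. Returns the sentence indices used on the given pass.
std::vector<int> sentences_for_pass(int pass, int spread, int corpus_size) {
    std::vector<int> idx;
    if (corpus_size <= 0 || spread <= 0) return idx;
    // At least one sentence, at most the whole corpus, per pass.
    int per_pass = std::min(corpus_size, std::max(1, corpus_size * spread / 100));
    int first = static_cast<int>((static_cast<long long>(pass) * per_pass) % corpus_size);
    for (int k = 0; k < per_pass; ++k) idx.push_back((first + k) % corpus_size);
    return idx;
}
```

With 10 sentences and spread 30, pass 0 uses sentences 0 to 2 and pass 3 wraps around to use 9, 0 and 1.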


class EST_WFST{}

class EST_WFST

A class representing a weighted finite-state transducer.

Constructor and initialisation functions

Resetting functions

init()

void init(int init_num_states=10)

Clear, given an estimate of the number of states required

init()

void init(LISP in, LISP out)

Clear and initialise with the given input and output alphabets

copy()

void copy(const EST_WFST &wfst)

Copy from existing wfst

clear()

void clear()

Clear, removing existing states if any

General utility functions

in_symbol()

int in_symbol(const EST_String &s) const

Map input symbol to input alphabet index

in_symbol()

const EST_String& in_symbol(int i) const

Map input alphabet index to input symbol

out_symbol()

int out_symbol(const EST_String &s) const

Map output symbol to output alphabet index

out_symbol()

const EST_String& out_symbol(int i) const

Map output alphabet index to output symbol

epsilon_label()

LISP epsilon_label() const

LISP form of the epsilon symbol

in_epsilon()

int in_epsilon() const

Internal index for input epsilon

out_epsilon()

int out_epsilon() const

Internal index for output epsilon

state()

const EST_WFST_State* state(int i) const

Return internal state information

final()

int final(int i) const

True if state {\tt i} is final

file i/o

transduction functions

transition()

int transition(int state, int in, int out) const

Find (first) new state given in and out symbols

transition()

int transition(int state, const EST_String &in, const EST_String &out) const

Find (first) new state given in and out strings

transition()

int transition(int state, const EST_String &inout) const

Find (first) new state given in/out string

transduce()

int transduce(int state, int in, int &out) const

Transduce in to out from state

transduce()

int transduce(int state, const EST_String &in, EST_String &out) const

Transduce in to out (strings) from state

transduce()

void transduce(int state, int in, wfst_translist &out) const

Transduce in to list of transitions

transition_all()

void transition_all(int state, int in, int out, EST_WFST_MultiState *ms) const

Find all possible transitions for given state/input/output
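The transition and transduce functions can be pictured as scanning the arcs of a state for the first match. The sketch below illustrates this on a hypothetical MiniWFST; EST_WFST's real state representation (EST_WFST_State) is not shown in this section, and returning -1 when no arc matches is an assumption made only for the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal transducer: each state owns a list of arcs labelled
// with an input index, an output index and a weight.
struct Arc { int in, out, to; float weight; };
struct State { bool is_final; std::vector<Arc> arcs; };

struct MiniWFST {
    std::vector<State> states;

    int add_state(bool is_final) {
        State s;
        s.is_final = is_final;
        states.push_back(s);
        return static_cast<int>(states.size()) - 1;
    }
    void add_arc(int from, int in, int out, int to, float w = 1.0f) {
        states[from].arcs.push_back({in, out, to, w});
    }
    // First state reachable from `state` on the pair in:out, or -1 if none.
    int transition(int state, int in, int out) const {
        for (const Arc &a : states[state].arcs)
            if (a.in == in && a.out == out) return a.to;
        return -1;
    }
    // First arc accepting input `in`: writes its output to `out` and
    // returns the new state, or returns -1 (leaving `out` untouched).
    int transduce(int state, int in, int &out) const {
        for (const Arc &a : states[state].arcs)
            if (a.in == in) { out = a.out; return a.to; }
        return -1;
    }
};
```

Taking the first match is what makes these functions "(first) new state" lookups; transition_all() would instead collect every matching target.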

Cumulation functions for adding collective probabilities for transitions from data

cumulate()

int cumulate() const

Cumulation condition

start_cumulate()

void start_cumulate()

Clear and start cumulation

stop_cumulate()

void stop_cumulate()

Stop cumulation and calculate probabilities on transitions
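Cumulation can be read as count-then-normalise: while cumulating, each traversed transition is counted, and stopping turns the counts into probabilities relative to all transitions leaving the same state. The sketch below assumes that interpretation; ArcCumulator and its methods are hypothetical and only mirror the roles of start_cumulate() and stop_cumulate().

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>

// Hypothetical accumulator mirroring start_cumulate()/stop_cumulate():
// arcs are identified by (state, position of the arc within that state).
class ArcCumulator {
public:
    using ArcId = std::pair<int, int>;

    void start() { counts_.clear(); }  // clear old counts
    void observe(int state, int arc) { counts_[ArcId(state, arc)] += 1.0; }

    // P(arc | state) = count(arc) / total count of arcs leaving that state.
    // Arcs never observed simply get no entry here.
    std::map<ArcId, double> stop() const {
        std::map<int, double> per_state;
        for (const auto &kv : counts_) per_state[kv.first.first] += kv.second;
        std::map<ArcId, double> probs;
        for (const auto &kv : counts_)
            probs[kv.first] = kv.second / per_state[kv.first.first];
        return probs;
    }

private:
    std::map<ArcId, double> counts_;
};
```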

WFST construction functions from external representations

add_state()

int add_state(enum wfst_state_type state_type)

Add a new state of the given type and return its name

ms_type()

enum wfst_state_type ms_type(EST_WFST_MultiState *ms) const

Given a multi-state return type (final, ok, error)

build_wfst()

void build_wfst(int start, int end, LISP regex)

Basic regex constructor

build_and_transition()

void build_and_transition(int start, int end, LISP conjunctions)

Basic conjunction constructor

build_or_transition()

void build_or_transition(int start, int end, LISP disjunctions)

Basic disjunction constructor

Basic WFST operators

determinize()

void determinize(const EST_WFST &a)

Build determinized form of a
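Determinization is usually done with the subset construction, where each state of the result stands for a set of states of the input. The sketch below applies it to a hypothetical epsilon-free acceptor in which each arc label could stand for an in/out pair; EST_WFST must also handle epsilons and weights, which this sketch omits. NFA, DFA and determinize here are illustrative names, not the EST API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical epsilon-free acceptor with one label per arc.
struct NFA {
    int start = 0;
    std::set<int> finals;
    std::vector<std::multimap<int, int>> arcs;  // arcs[s]: label -> target
};

struct DFA {
    std::set<int> finals;
    std::vector<std::map<int, int>> arcs;  // state 0 is the start state
};

// Subset construction: each DFA state stands for a set of NFA states.
DFA determinize(const NFA &a) {
    DFA d;
    std::map<std::set<int>, int> id;
    std::vector<std::set<int>> todo;
    // Return the DFA state for subset s, creating it on first sight.
    auto intern = [&](const std::set<int> &s) {
        auto it = id.find(s);
        if (it != id.end()) return it->second;
        int n = static_cast<int>(d.arcs.size());
        id[s] = n;
        d.arcs.emplace_back();
        for (int q : s)
            if (a.finals.count(q)) { d.finals.insert(n); break; }
        todo.push_back(s);
        return n;
    };
    intern({a.start});
    while (!todo.empty()) {
        std::set<int> cur = todo.back();
        todo.pop_back();
        int from = id[cur];
        std::map<int, std::set<int>> next;  // label -> all NFA targets
        for (int q : cur)
            for (const auto &arc : a.arcs[q]) next[arc.first].insert(arc.second);
        for (const auto &kv : next) {
            int to = intern(kv.second);
            d.arcs[from][kv.first] = to;
        }
    }
    return d;
}
```

Only reachable subsets are created, which in practice keeps the result far smaller than the worst-case exponential bound.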

minimize()

void minimize(const EST_WFST &a)

Build minimized form of a

complement()

void complement(const EST_WFST &a)

Build complement of a

intersection()

void intersection(EST_TList<EST_WFST> &wl)

Build the intersection of all WFSTs in the given list. The new WFST recognizes only the strings that are recognized by all WFSTs in the list.

intersection()

void intersection(const EST_WFST &a, const EST_WFST &b)

Build the intersection of WFSTs a and b. The new WFST recognizes only the strings that are recognized by both a and b.
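Intersection is classically built with the product construction: the new machine's states are pairs of states, it moves on a label only when both machines can, and a pair is final only when both components are final. The sketch below shows this for hypothetical deterministic acceptors; Dfa, intersect and accepts are illustrative names, and weights and epsilons are not handled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical deterministic acceptor; state 0 is the start state.
struct Dfa {
    std::set<int> finals;
    std::vector<std::map<int, int>> arcs;  // arcs[s]: label -> target
};

// Product construction over reachable state pairs (qa, qb).
Dfa intersect(const Dfa &a, const Dfa &b) {
    Dfa r;
    std::map<std::pair<int, int>, int> id;
    std::vector<std::pair<int, int>> todo;
    auto intern = [&](std::pair<int, int> p) {
        auto it = id.find(p);
        if (it != id.end()) return it->second;
        int n = static_cast<int>(r.arcs.size());
        id[p] = n;
        r.arcs.emplace_back();
        if (a.finals.count(p.first) && b.finals.count(p.second)) r.finals.insert(n);
        todo.push_back(p);
        return n;
    };
    intern({0, 0});
    while (!todo.empty()) {
        std::pair<int, int> p = todo.back();
        todo.pop_back();
        int from = id[p];
        for (const auto &arc : a.arcs[p.first]) {
            auto jt = b.arcs[p.second].find(arc.first);
            if (jt == b.arcs[p.second].end()) continue;  // b cannot move on this label
            int to = intern({arc.second, jt->second});
            r.arcs[from][arc.first] = to;
        }
    }
    return r;
}

// Run a label sequence from state 0 and report whether it ends in a final state.
bool accepts(const Dfa &d, const std::vector<int> &w) {
    int s = 0;
    for (int x : w) {
        auto it = d.arcs[s].find(x);
        if (it == d.arcs[s].end()) return false;
        s = it->second;
    }
    return d.finals.count(s) > 0;
}
```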

uunion()

void uunion(EST_TList<EST_WFST> &wl)

Build the union of all WFSTs in the given list. The new WFST recognizes only the strings that are recognized by at least one WFST in the list.

uunion()

void uunion(const EST_WFST &a, const EST_WFST &b)

Build the union of WFSTs a and b. The new WFST recognizes only the strings that are recognized by either a or b.

compose()

void compose(const EST_WFST &a, const EST_WFST &b)

Build a new WFST by composing a and b. The outputs of a are fed to b, giving a new WFST with a's input language and b's output set. a's output alphabet and b's input alphabet must be the same.
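Composition pairs states of a and b and matches an arc x:y in a with an arc y:z in b, producing an arc x:z; this is why a's output alphabet and b's input alphabet must agree. The sketch below handles only epsilon-free, unweighted transducers; EST's compose() must also deal with epsilons. Fst, TArc, compose and run_first_path are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical epsilon-free transducer; state 0 is the start state.
struct TArc { int in, out, to; };
struct Fst {
    std::set<int> finals;
    std::vector<std::vector<TArc>> arcs;
};

// Composition over reachable state pairs: x:y in a followed by y:z in b
// gives x:z; a pair is final when both components are final.
Fst compose(const Fst &a, const Fst &b) {
    Fst r;
    std::map<std::pair<int, int>, int> id;
    std::vector<std::pair<int, int>> todo;
    auto intern = [&](std::pair<int, int> p) {
        auto it = id.find(p);
        if (it != id.end()) return it->second;
        int n = static_cast<int>(r.arcs.size());
        id[p] = n;
        r.arcs.emplace_back();
        if (a.finals.count(p.first) && b.finals.count(p.second)) r.finals.insert(n);
        todo.push_back(p);
        return n;
    };
    intern({0, 0});
    while (!todo.empty()) {
        std::pair<int, int> p = todo.back();
        todo.pop_back();
        int from = id[p];
        for (const TArc &x : a.arcs[p.first])
            for (const TArc &y : b.arcs[p.second])
                if (x.out == y.in) {  // a's output feeds b's input
                    int to = intern({x.to, y.to});
                    r.arcs[from].push_back({x.in, y.out, to});
                }
    }
    return r;
}

// Follow the first arc matching each input symbol; true if the run ends final.
bool run_first_path(const Fst &f, const std::vector<int> &in, std::vector<int> &out) {
    int s = 0;
    out.clear();
    for (int x : in) {
        bool moved = false;
        for (const TArc &t : f.arcs[s])
            if (t.in == x) { out.push_back(t.out); s = t.to; moved = true; break; }
        if (!moved) return false;
    }
    return f.finals.count(s) > 0;
}
```

For example, composing a machine that rewrites 1 as 2 with one that rewrites 2 as 3 gives a machine that rewrites 1 as 3.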

difference()

void difference(const EST_WFST &a, const EST_WFST &b)

Build a WFST that accepts only the strings accepted by a that are not also accepted by b

concat()

void concat(const EST_WFST &a, const EST_WFST &b)

Build a WFST that accepts the language consisting of any string in a followed by any string in b

construction support functions

deterministic()

int deterministic() const

True if WFST is deterministic

apply_multistate()

EST_WFST_MultiState* apply_multistate(const EST_WFST &wfst, EST_WFST_MultiState *ms, int in, int out) const

Transduce a multi-state given in and out

add_epsilon_reachable()

void add_epsilon_reachable(EST_WFST_MultiState *ms) const

Extend multi-state with epsilon reachable states
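Extending a multi-state with its epsilon-reachable states is the standard epsilon closure. The sketch below assumes, for illustration only, that an epsilon arc is one whose input and output are both the epsilon index 0; in EST_WFST the indices come from in_epsilon() and out_epsilon(). EArc, kEpsilon and this add_epsilon_reachable over std::set<int> are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

constexpr int kEpsilon = 0;  // ASSUMPTION: index 0 is epsilon on both sides

struct EArc { int in, out, to; };

// Add to `ms` every state reachable through epsilon:epsilon arcs, using an
// explicit stack so long epsilon chains do not recurse.
void add_epsilon_reachable(const std::vector<std::vector<EArc>> &arcs,
                           std::set<int> &ms) {
    std::vector<int> stack(ms.begin(), ms.end());
    while (!stack.empty()) {
        int s = stack.back();
        stack.pop_back();
        for (const EArc &a : arcs[s])
            if (a.in == kEpsilon && a.out == kEpsilon && ms.insert(a.to).second)
                stack.push_back(a.to);  // newly added state: explore it too
    }
}
```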

remove_error_states()

void remove_error_states(const EST_WFST &a)

Remove error states from the WFST.

operator = ()

EST_WFST& operator = (const EST_WFST &a)