Introduction Outline
Introduction to the Use of WFSTs in Speech and Language Processing
Paul Dixon and Sadaoki Furui
Department of Computer Science
Tokyo Institute of Technology
APSIPA Conference, 2009
Outline of Tutorial
Introduction
Motivation
Description of some popular toolkits
Introduction to WFSTs and algorithms
Covers WFSTs and the algorithms frequently used in speech recognition
Sample applications using WFSTs
Edit distance
Implementing a counting WFST using a custom semiring
Recognizing speech with WFSTs
Detailed walkthrough describing the construction of an integrated speech recognition WFST
Part-of-speech tagging using the WFST framework
Acknowledgements
Tasuku Oonishi, also at Titech, working on the T3 decoder
WFST introduction section heavily influenced by the following tutorials and papers [15, 13, 4]
Introduction
Outline of Section
Motivation
Brief survey of popular WFST toolkits
Why use WFSTs?
Economics: reduced time for development, testing and so on
Understanding: you can go inside, look at and understand the structures and operations
Tools for rendering and investigating the internals of WFSTs - useful when performing analysis and postmortems
Reduce dependency on domain-specific toolkits
Write and control all your own tools based on WFSTs
Motivation for WFST in Speech
Unified representation of knowledge sources
Pre/post-processing stages and even support tools can make use of the WFST framework
WFST Speech Decoders can be easier to design and implement
WFST Speech Decoders can be faster [10]
WFST Speech Decoders can be more flexible
Getting Started
Get a WFST toolkit
Many toolkits are available; the following are widely used in the speech community
ATT FSM Library http://www.research.att.com/~fsmtools/fsm/
OpenFst http://www.openfst.org/
MIT Fst tools http://people.csail.mit.edu/ilh/fst/
FSA toolkit http://www-i6.informatik.rwth-aachen.de/~kanthak/fsa.html
Start experimenting with tools
More Toolkit Details
OpenFst - Open source toolkit - heavy C++ template code, usable as a library or shell commands
MIT FST Toolkit - Open Source
OpenFst and ATT use the same text representation; conversion to and from the MIT format is trivial. Windows and Linux
ATT FSM toolkit
Closed source
Binaries for Windows, *nix x86 and amd64, and some big-endian platforms
Shell commands free for non-commercial use
Companion toolkits for language modeling (grm) and decoding tools (dcd) (both with more limited platform support)
Perhaps the easiest way to get a WFST speech system up and running, by using the grm and dcd toolkits
Hasn't been updated for a long time, and the Windows version is lagging a version behind
OpenFst
Same authors as ATT toolkit
Very similar feature set to the ATT tools
Open source license (Apache 2)
Shell commands or C++ interfaces
Heavy template code
Build times can be slow
Allows for the creation of user types and semirings
For example: add a custom arc with acoustic and language model scores
Add the real or expectation semirings
Plug in a custom FST type
No shorthand command line options
Well documented
MIT Fst
Open source license (MIT)
Windows and *nix platforms
Has some extra interesting tools in addition to the standard WFST operations
C++ or shell commands
FSA
Source code is available (see usage restrictions)
Part of the RWTH recognition system
Weighted Finite State Transducers
Introduction to WFSTs and Algorithms
WFST Operations Covered
Rational Operations
Sum (Union)
Product (Concatenation)
Closure (Kleene Closure)
Unary Operations
Reversal
Inversion
Projection
Binary Operations
Composition
Optimization Algorithms
Determinization
Minimization
Epsilon Removal
Weight Pushing
Other Commands and Utilities
Compiling, Printing, Drawing and Map
ArcSort, ShortestPath, ShortestDistance (all by example)
WFST Operations not Covered
Intersection
Equivalence
Label pushing
Encode/Decode
Difference
Topological Sort
Epsilon Normalization
Synchronization
Prune
A few others ....
Formal Definitions
Formally, a weighted transducer T over a semiring S is defined as the 8-tuple [13, 15]:
T = (Σ,∆,Q, I ,F ,E , λ, ρ) (1)
Where:
Σ is a finite input alphabet
∆ is a finite output alphabet
Q is a finite set of states
I ⊆ Q is the set of initial states
F ⊆ Q is the set of final states
E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × S × Q is a finite set of transitions
λ is an initial weight function
ρ is the final weight function
Semirings
A semiring is a system with two operators ⊕ and ⊗; the table below shows some example semirings that are often used in WFST toolkits [13, 15]

Semiring     Set              ⊕      ⊗   0    1
Boolean      {0, 1}           ∨      ∧   0    1
Probability  R+               +      ×   0    1
Log          R ∪ {−∞, +∞}     ⊕log   +   +∞   0
Tropical     R ∪ {−∞, +∞}     min    +   +∞   0

where ∨ is logical or, ∧ is logical and, and ⊕log is x ⊕log y = −log(e^−x + e^−y)
Only two operations are needed:
Compute the weight of a path using the ⊗ operator
Compute the cost of a sequence by summing over all the paths using the ⊕ operator
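These ⊕/⊗ pairs are simple enough to state as code. A minimal sketch of the table's operations (the function names are ours, not from any toolkit):

```cpp
#include <algorithm>
#include <cmath>

// Tropical semiring: plus = min, times = +, zero = +inf, one = 0.
inline double tropical_plus(double x, double y) { return std::min(x, y); }
inline double tropical_times(double x, double y) { return x + y; }

// Log semiring: plus = -log(e^-x + e^-y), times = +, zero = +inf, one = 0.
inline double log_plus(double x, double y) {
  return -std::log(std::exp(-x) + std::exp(-y));
}
inline double log_times(double x, double y) { return x + y; }

// Probability (real) semiring: plus = +, times = *, zero = 0, one = 1.
inline double real_plus(double x, double y) { return x + y; }
inline double real_times(double x, double y) { return x * y; }
```

For example, in the tropical semiring two consecutive arcs of cost 2 and 3 combine as tropical_times(2, 3) = 5, and two parallel paths combine with tropical_plus, i.e. the cheaper one wins.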
Semirings
Boolean semiring
Un-weighted machines are equivalent to weighted machines over the Boolean semiring
Real (Probability) semiring
The weights represent real numbers (or probabilities)
Log semiring
Weights are negative logs of the probabilities
Tropical semiring
Same as the log semiring with ⊕ replaced by the Viterbi approximation (min).
Log and tropical semirings use the conventions e^−∞ = 0 and −log(0) = ∞ [2]
Glossary of WFST-related terms
Σ∗ (sigma star) – the Kleene closure of an alphabet Σ; represents the set of all finite strings over the alphabet Σ
Simple path – is a path with no cycle
Idempotent – A semiring is idempotent if x ⊕ x = x for all x ∈ S (e.g. the Boolean and tropical semirings) [13]
Commutative – "A commutative semiring has a commutative multiplicative operator ⊗" [13]
Mirror Image – "The mirror image of a string x = x1 x2 . . . xn is the string x^R = xn xn−1 . . . x1" [13]
Glossary Continued
Deterministic – Has one initial state and, at any state, no more than one outgoing arc with any given input label
Sequential – Deterministic on the input side
Sub-sequential – Have an additional output string at each final state
p-Sub-sequential – Have p additional output strings at each final state
Stochastic – The sum of arc weights leaving a state is one
Trim – Trim Transducers have no useless states
Glossary Continued
Commutative – a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c) = (b ⊕ c) ⊗ a [18]
Regulated – A sum does not depend on the order of the terms
Unambiguous – For any string x ∈ Σ∗ there is at most one accepting/successful path
Cycle – A path whose initial and final states are the same
ε-cycle – A cycle whose input and output labels are all epsilons
Successful path – A path from an initial state to a final state
non-accessible – Not reachable from an initial state
non-co-accessible – Not reachable from a final state
Transitions
A transition e from the set of transitions E is described using the following notation [2]:

(figure: a single arc from state 0 to final state 1/0, labelled ip:op/0.5)
p[e] denotes the origin state (0 in this example)
n[e] denotes the next state (1 in this example)
i [e] denotes the input label (ip in this example)
o[e] denotes the output label (op in this example)
w [e] denotes the weight (0.5 in this example)
Path
A path π is constructed from a sequence of transitions through the machine: π = e1, . . . , ek
A path π is a member of E∗ (The Kleene closure of E) [13]
p[π] denotes the origin state [13]
n[π] denotes the next state [13]
A cycle is a path where p[π] = n[π] [13]
A path weight in the log and tropical semirings is identical because the multiplication operation is identical [2]
w[π] is the weight of the path, obtained by applying ⊗ to the consecutive transition weights [13]
P(Q1,Q2) denotes all the paths from state Q1 to state Q2 [13]
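The path-weight definition can be sketched directly in code: w[π] is a fold of ⊗ over the transition weights, which is a sum of costs in the tropical semiring and an ordinary product in the real semiring. A small illustration (helper names are ours):

```cpp
#include <numeric>
#include <vector>

// w[pi] in the tropical semiring: times is ordinary addition of costs.
inline double tropical_path_weight(const std::vector<double>& arc_weights) {
  return std::accumulate(arc_weights.begin(), arc_weights.end(), 0.0);
}

// w[pi] in the real (probability) semiring: times is multiplication.
inline double real_path_weight(const std::vector<double>& arc_weights) {
  double w = 1.0;
  for (double a : arc_weights) w *= a;
  return w;
}
```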
Weighted Acceptors
In an Acceptor A each transition has an input label
In the weighted case the transitions can be augmented with an optional weight
P(Q1, x, Q2) denotes all the paths from state Q1 to state Q2 that accept the string x [13]
⟦A⟧(x) denotes the sum of the weights of all successful paths with input sequence x [13]
⟦A⟧(x) = ⊕_{π ∈ P(I, x, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]) [13]
Used to recognize strings
Acceptor Example
(figure: a weighted acceptor over {a, b} with states 0-3 and final state 3/1; two successful paths accept the string ab, with arc weights 1, 1 and 3, 3)

In the real semiring:

⟦A⟧(ab) = (1 × 1 × 1) + (3 × 3 × 1) = 10
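The sum-over-paths computation can be checked with a few lines of code. The sketch below enumerates paths in a reduced reading of the figure that keeps one b arc out of each middle state (so the arc list and state numbering are partly hypothetical), reproducing the two successful ab paths:

```cpp
#include <string>
#include <vector>

// One arc of a weighted acceptor: source state, label, weight, target state.
struct Arc { int src; char label; double weight; int dst; };

// Real-semiring sum of the weights of all successful paths accepting
// `input` from `state`, i.e. [[A]](input) when called from the start state.
double acceptor_weight(const std::vector<Arc>& arcs, int state,
                       const std::string& input, size_t pos,
                       int final_state, double final_weight) {
  if (pos == input.size())
    return state == final_state ? final_weight : 0.0;
  double total = 0.0;  // real-semiring plus is ordinary addition
  for (const Arc& a : arcs)
    if (a.src == state && a.label == input[pos])
      total += a.weight * acceptor_weight(arcs, a.dst, input, pos + 1,
                                          final_state, final_weight);
  return total;
}

// Reduced version of the slide's acceptor (final state 3 with weight 1).
double example_weight(const std::string& input) {
  std::vector<Arc> arcs = {
      {0, 'a', 1, 2}, {0, 'b', 4, 2}, {0, 'a', 3, 1},
      {0, 'b', 1, 1}, {2, 'b', 1, 3}, {1, 'b', 3, 3}};
  return acceptor_weight(arcs, 0, input, 0, /*final=*/3, /*rho=*/1.0);
}
```

Here example_weight("ab") sums the two paths 1·1·1 and 3·3·1, giving 10 as on the slide.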
Weighted Transducers
In weighted transducers the arcs have input and output labels with an optional weight [13]
P(Q1, x, y, Q2) denotes all the paths from state Q1 to state Q2 that transduce the string x to the string y [13]
⟦T⟧(x, y) denotes the sum of the weights of all successful paths with input x and output y [13]
⟦T⟧(x, y) = ⊕_{π ∈ P(I, x, y, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]) [4, 13]
Used to perform a mapping between strings with a weight [13]
Transducer Example
(figure: a weighted transducer with states 0-3 and final states 1/1 and 3/1, with arcs 0 -a:Z/1-> 2, 0 -b:X/4-> 2, 0 -a:X/3-> 1, 0 -b:Z/1-> 1, 2 -b:X/1-> 3, 2 -b:Z/3-> 3, 1 -b:Z/3-> 3, 1 -b:X/5-> 3)

In the real semiring:

⟦T⟧(bb, XZ) = 4 × 3 × 1 = 12
Formats, compiling and drawing
When using the toolkits there are several levels of representation:
Text level - text files describing the WFST topology and symbols
Binary level - compiled binary format produced by the compilation tool
Graph level - a graphical representation in one of the many image formats generated from Graphviz
Text-level WFSTs need to be compiled before they can be used
Print - to recover the text level
Draw - to visualize the WFST
Compiling FSTs
For OpenFst and the ATT FSM toolkit, the text format has 3, 4 or 5 columns
Acceptors have 3 columns, with an optional 4th column for the weights
Transducers have 4 columns, with an optional 5th column representing the weights
ATT fsmcompile defaults to acceptor format – use -t tospecify transducer
OpenFst fstcompile defaults to transducer format – use-acceptor=true to specify acceptor
Can use a transducer to emulate an acceptor by setting the input and output labels the same for each arc
Final states are a single entry on their own line, with an optional final cost
Simple Acceptor Example
Un-weighted acceptor definition

0 1 Tokyo
0 1 Kyoto
0 1 Yokohama
0 1 Nagoya
0 1 Osaka
1
Corresponding symbols
Tokyo 1
Kyoto 2
Yokohama 3
Nagoya 4
Osaka 5
Simple Transducer Example
Weighted transducer definition

0 2 a Z 1
0 2 b X 4
0 1 a X 3
0 1 b Z 1
2 3 b X 1
2 3 b Z 3
1 3 b Z 3
1 3 b X 5
3
Corresponding input symbols
- 0
a 1
b 2
Corresponding output symbols
- 0
X 1
Z 2
Printing and Drawing WFSTs
Printing WFSTs
Can recover the text format from the binary format
Drawing WFSTs
Very useful for understanding
Debugging and performing postmortems
Uses Graphviz
Viewing large graphs is a problem
Graphical Representation
(figure: the example weighted transducer rendered with Graphviz)
States are circles
Arcs are lines
Initial state thick line
Final state two lines on state
Each arc label is written as in_label:out_label/weight; when no weight is shown or given it defaults to the semiring one
A Few Additional Points
A final state can have outgoing arcs
An initial state can also be a final state (drawn as a double bold circle)
Non-final states can have a cost
Machines can have more than one initial state
Neither of these is possible with ATT/OpenFst, but both are allowable in general and are easy to simulate
Drawing WFSTs with Japanese Labels
Graphviz (dot) had some font hard-wiring for PostScript
The latest version works with the PDF backend

(figure: a linear acceptor whose labels spell 重み付き有限状態トランスデューサ, "weighted finite-state transducer")

fstdraw -acceptor=true -isymbols=jpn.syms -portrait=true jpn.ofsa | dot -Tpdf > jpn.pdf
Null Transitions
Null or epsilon ( ε) transitions
Consume no input symbols when traversing
Generate no output symbols when traversing
Make certain algorithms more complicated
Why they occur:
Inserted by certain algorithms
Needed in transducers to perform one-to-many mappings
Can always be removed from a WFA
Impossible to remove from certain WFSTs (one-to-many mappings, cycles, ...)
Introduce non-determinism
Rational Operations
For the presented algorithms:
Useful for combining many simpler machines to create more complex ones
Algorithms are efficient and straightforward to implement
Admit on-the-fly construction
Introduce epsilons
Sum (Union)
⟦T1 ⊕ T2⟧(x, y) = ⟦T1⟧(x, y) ⊕ ⟦T2⟧(x, y)
(figure: three linear transducers that read the character sequences s-a-p-p-o-r-o, s-e-N-d-a-i and k-u-m-a-m-o-t-o and output Sapporo, Sendai and Kumamoto on their first arcs)
fsmunion sapporo.fsm sendai.fsm kumamoto.fsm > cities.fsm
fstunion sapporo.fsm sendai.fsm | fstunion - kumamoto.fsm > cities.fsm
(figure: the unioned machine - a new initial state with ε arcs leading into each of the three city chains)
Product (Concatenation)
⟦T1 ⊗ T2⟧(x, y) = ⊕_{x = x1 x2, y = y1 y2} ⟦T1⟧(x1, y1) ⊗ ⟦T2⟧(x2, y2)
(figure: a linear acceptor for "How much to" and an acceptor with parallel arcs for Tokyo, Kyoto, Yokohama, Nagoya and Osaka)
fsmconcat grammar.fsm stations.fsm > gs.fsm
fstconcat grammar.fsm stations.fsm > gs.fsm
(figure: the concatenated machine - the "How much to" chain joined to the station-name arcs by an ε transition)
Closure
⟦T∗⟧(x, y) = ⊕_{n=0}^{∞} ⟦T^n⟧(x, y) [4]
fsmclosure cities.fsm > citiesclosed.fsm
fstclosure cities.fsm > citiesclosed.fsm
(figure: the closure of the cities machine - ε arcs from the end of each city chain back to a new initial state that is also final)
Reversal
Input is a single weighted transducer T
Output is the reversed weighted transducer T^R
⟦T^R⟧(x, y) = ⟦T⟧(x^R, y^R) [4]
(figure: (a) the "How much to <city>" transducer T and (b) its reversal T^R, with every arc reversed)
Inversion
⟦T⁻¹⟧(x, y) = ⟦T⟧(y, x) [13]
Input is a single weighted transducer T
Output is a single weighted transducer T−1 with the inputand output labels reversed
Example based on [13]
(figure: (a) the example transducer T and (b) its inversion T⁻¹, with the input and output labels swapped on every arc)
Projection
Input is a single weighted transducer T
Output is a single acceptor with either:
Π1(T) – the output labels removed
Π2(T) – the input labels removed
(figure: (a) the example transducer T and (b) its input projection Π1(T), which keeps only the input labels and the weights)
Composition
Very powerful operation
Allows multiple levels of information to be combined
Formal definition
⟦T1 ◦ T2⟧(x, y) = ⊕_z ⟦T1⟧(x, z) ⊗ ⟦T2⟧(z, y)
Epsilon Free Composition
Given a state q1 in T1 and a state q2 in T2
Cycle through the arc combinations
If an output symbol of T1 matches an input symbol of T2
Create a new arc
(figure: (a) T1 with arcs c:a and c:b, (b) T2 with arcs a:Z, c:Y and b:X, and (c) T1 ◦ T2 with arcs c:Z and c:X)
Composition with Epsilons Canonical Example
The presence of epsilons makes composition more difficult
Canonical example from Mohri [14]
(figure: the canonical example from Mohri [14] - (a) T1, a chain with arcs a:a, b:ε, c:ε, d:d, and (b) T2, a chain with arcs a:d, ε:e, d:a)
Canonical Example
Create T̃1 by replacing ε outputs with ε2 and adding ε:ε1 self-loops to every state
Create T̃2 by replacing ε inputs with ε1 and adding ε2:ε self-loops to every state
Compose using the standard procedure, allowing ε2 to match ε1
(figure: (a) T̃1, with b:ε2 and c:ε2 arcs and ε:ε1 self-loops on every state, and (b) T̃2, with an ε1:e arc and ε2:ε self-loops on every state)
ε-Path Multiplicity Problem
A problem: the composed machine has multiple ε-paths where only one is needed
Therefore the costs assigned to paths are wrong in non-idempotent semirings (e.g. probability, log)
(figure: the grid of composed state pairs (0,0) through (4,3), showing multiple redundant ε-paths between the same pairs of states)
Canonical Example
We need to keep just one of the paths
Filter out unwanted paths
(figure: the same composed grid, with the redundant ε-paths to be filtered out)
Composition Filter
The solution is to use a composition filter
This restricts the paths that can be made in the composed machine
The filter can itself be represented as a WFST
Where x represents any symbol
(figure: the three-state filter WFST, built from x:x arcs and ε1:ε1 and ε2:ε2 self-loops)
Composition using the Filter
It is now possible to perform composition of all three together using the standard algorithm:
T̃1 ◦ F ◦ T̃2
(figure: the filtered composition - the grid now contains a single ε-path between each pair of states)
Points to Consider for Efficient Composition
The following factors can be used to make composition more efficient:
Sort or index the output arcs of T1 and the input arcs of T2
ATT – index with fsmconvert
OpenFst – sort with fstarcsort
Favour placing output labels early to avoid matching delays and the generation of dead-end states
The ATT FSM toolkit has indexed types that provide faster composition; for OpenFst use the arc sorting features
Determinization
An automaton is deterministic if it meets the following conditions:
A single initial state
No two arcs leaving a state share the same input label
OpenFst and the ATT FSM library treat ε the same as a normal symbol
Determinization takes a WFST and attempts to construct a deterministic equivalent
Along with composition and the shortest path algorithms, extremely useful
Huge impact on search network efficiency: matching is much more efficient
Caveats:
Doesn't terminate in some cases, or takes an unreasonable amount of time in others
Not all automata can be determinized
The toolkits are extended to deal with transducers
Determinization Example
Example taken verbatim from Mohri [13]
(figure: (a) a non-deterministic weighted acceptor A over {a, b} and (b) its determinization A′, in which no state has two outgoing arcs with the same label)
Determinization Example
Generalized version of a prefix tree
Consider the toy lexicon
(figure: a toy lexicon WFST - one linear chain of phone arcs per word, emitting the Japanese words 共存, あの, 額絵, サードネス, 均一, 履歴, 見回し, 郎, 危険 and ゆうこ on the first arc of each chain)
Determinization Example
After applying determinization
(figure: the determinized lexicon - common prefixes are shared, so each state has at most one outgoing arc per input symbol)
Minimization
Minimization takes a deterministic WFA and returns an equivalent WFA with the minimal number of states
First pushes the weights in the machine
Encodes the label/weight pairs to give an unweighted acceptor
Performs the minimization
The OpenFst minimize command can accept transducers, but under the hood will convert to an acceptor
The ATT tools require that WFSTs are encoded as acceptors
Seems to have little improvement on speech recognition speed
Makes a small size reduction on speech recognition transducers
Choice of semiring can negatively affect performance [15]
Minimization Example
Example taken verbatim from Mohri [13]
(figure: (a) a weighted automaton A and (b) its minimized equivalent A′)
Minimization Example
Minimization will share the suffixes in the toy lexicon
(figure: the minimized lexicon - common suffixes are now shared as well as prefixes)
Weight Pushing
Weights can be pushed towards either the initial or the final states
Tropical semiring example taken verbatim from [13]
(figure: (a) the original machine and (b) the machine with the weights pushed towards the initial state)
Weight Pushing Also Performs Normalization
Consider the previous example repeated in the real semiring
In the pushed machine:
The weights along the paths are normalized
The machine is stochastic – the weights leaving each state sum to one
The path scores are the same as in the un-pushed machine
(figure: (a) the original real-semiring machine and (b) the pushed machine, in which the arc weights leaving each state sum to one)
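Pushing in the real semiring can be sketched for an acyclic machine: compute, for each state q, the total weight d[q] of all paths from q to the final state, then rescale each arc by d[dst]/d[src]. A toy illustration (names are ours; a single final state and topologically numbered states are assumed):

```cpp
#include <vector>

// Arc in a small acyclic weighted acceptor over the real semiring.
struct WArc { int src; double weight; int dst; };

// Computes d[q] (total path weight from q to the final state), then
// rescales every arc to weight * d[dst] / d[src]. Afterwards the outgoing
// weights at each state sum to one, i.e. the machine is stochastic;
// d[0] is the total weight, which moves onto the initial state.
std::vector<double> push_weights(std::vector<WArc>& arcs, int num_states,
                                 int final_state) {
  std::vector<double> d(num_states, 0.0);
  d[final_state] = 1.0;
  // Arcs are assumed to go from lower to higher state ids (topological).
  for (int q = num_states - 2; q >= 0; --q)
    for (const WArc& a : arcs)
      if (a.src == q) d[q] += a.weight * d[a.dst];
  for (WArc& a : arcs) a.weight = a.weight * d[a.dst] / d[a.src];
  return d;
}
```

For example, two parallel arcs of weight 1 and 3 into a state with onward weight 2 get pushed to 0.25 and 0.75, which sum to one, while the total weight 8 is reported in d[0].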
Epsilon Removal
Epsilon transitions can harm WFST efficiency
The epsilon removal algorithm attempts to remove these null transitions
Use with care as it can lead to explosion in network size
Not all epsilons can be removed
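For the unweighted case the idea behind the algorithm can be sketched in a few lines: compute each state's ε-closure and re-attach the non-ε arcs found there (final-state bookkeeping and weights are omitted for brevity; names are ours):

```cpp
#include <set>
#include <vector>

// An arc; a label of 0 stands for epsilon.
struct EArc { int src; char label; int dst; };

// For each state q: follow epsilon arcs to build its closure, then copy
// every non-epsilon arc leaving a closure state back onto q. The epsilon
// arcs themselves are dropped from the output.
std::vector<EArc> remove_epsilons(const std::vector<EArc>& arcs,
                                  int num_states) {
  std::vector<EArc> out;
  for (int q = 0; q < num_states; ++q) {
    std::set<int> closure = {q};
    std::vector<int> stack = {q};
    while (!stack.empty()) {
      int s = stack.back();
      stack.pop_back();
      for (const EArc& a : arcs)
        if (a.src == s && a.label == 0 && !closure.count(a.dst)) {
          closure.insert(a.dst);
          stack.push_back(a.dst);
        }
    }
    for (int s : closure)
      for (const EArc& a : arcs)
        if (a.src == s && a.label != 0) out.push_back({q, a.label, a.dst});
  }
  return out;
}
```

The duplication of arcs across closure states is exactly why the result can blow up in size, as the slide warns.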
Convert
OpenFst tool to convert between FST types
Currently no support for converting between semirings
The types are not well documented, so here is a list dumped from the FstRegister; all should be available in both the log and tropical semirings:

vector const const8 const16 const64
compact_acceptor compact_string compact_unweighted compact_unweighted_acceptor compact_weighted_string
compact8_acceptor compact8_string compact8_unweighted compact8_unweighted_acceptor compact8_weighted_string
compact16_acceptor compact16_string compact16_unweighted compact16_unweighted_acceptor compact16_weighted_string
compact64_acceptor compact64_string compact64_unweighted compact64_unweighted_acceptor compact64_weighted_string
Map
OpenFst fstmap applies transformations to the arc and final weights, as illustrated
(figure: (a) a small language-model WFST and (b) the result of the identity map, which leaves all weights unchanged)
Map continued
(figure: (c) the invert map, which replaces each weight with its reciprocal, and (d) the plus(2) map, which adds 2 to every arc and final weight)
Map continued
(figure: (e) the original machine and (f) the rmweight map, which removes all weights)
Map continued
(figure: (g) the times(2) map, which multiplies every arc and final weight by 2, and (h) the superfinal map, which adds a single new superfinal state reached by ε arcs)
Section Summary
Introduced WFSTs
Covered the algorithms frequently used in speech recognition network construction
End of Section
Questions?
Examples
Simple applications using WFSTs
Section Overview
This section will introduce two simple examples of using WFSTs:
Basic edit distance computation using WFSTs
A more complicated counting example with a custom semiring
Edit Distance
In this example we use the shell-level OpenFst tools to calculate the edit distance between two strings. The example demonstrates the composition and shortest path operations [11, 5].
(figure: (a) a linear acceptor for t-o-k-y-o and (b) a linear acceptor for k-y-o-t-o)
Edit Distance - Filter
(figure: the edit-distance filter - a single state 0 with self-loops a:ε/1, ε:a/1, a:a, a:b/2, b:ε/1, ε:b/1, b:a/2 and b:b)
Edit Distance - Composition
fstcompose tokyo.ofst filter.ofst | fstcompose - kyoto.ofst > tfk.ofst
(figure: the composed machine - a lattice of every alignment of tokyo with kyoto, in which insertions and deletions cost 1 and substitutions cost 2)
Edit Distance - Best Path
fstshortestpath tfk.ofst > best.ofst
(figure: the best path - delete t and o, match k, y and o, then insert t and o, for a total cost of 4)
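The WFST pipeline computes the classical edit distance with substitution cost 2 (a substitution in the filter is a deletion plus an insertion). The equivalent dynamic program is a handy cross-check:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Edit distance with insertion/deletion cost 1 and substitution cost 2,
// matching the arc weights of the edit-distance filter WFST.
int edit_distance(const std::string& a, const std::string& b) {
  std::vector<std::vector<int>> d(a.size() + 1,
                                  std::vector<int>(b.size() + 1));
  for (size_t i = 0; i <= a.size(); ++i) d[i][0] = (int)i;  // deletions
  for (size_t j = 0; j <= b.size(); ++j) d[0][j] = (int)j;  // insertions
  for (size_t i = 1; i <= a.size(); ++i)
    for (size_t j = 1; j <= b.size(); ++j)
      d[i][j] = std::min({d[i - 1][j] + 1,      // delete a[i-1]
                          d[i][j - 1] + 1,      // insert b[j-1]
                          d[i - 1][j - 1] +
                              (a[i - 1] == b[j - 1] ? 0 : 2)});  // sub
  return d[a.size()][b.size()];
}
```

edit_distance("tokyo", "kyoto") returns 4, agreeing with the cost of the best path found by fstshortestpath.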
Counting Example
In this example:
Use C++ and OpenFst to build a grmcount [2] clone
Add a custom semiring to OpenFst
Create a user-defined arc type
Build a shared object that is loaded by the OpenFst tools
Count n-grams in a text corpus
Makes use of the following operations:
Composition between the corpus and the counting machine
Projection, epsilon removal, determinization and minimization to generate a compact set of counts
Composition and ShortestDistance to extract the count for a specific n-gram
Counting Example
From [2], calculate the expected count using:

c(x) = Σ_{u ∈ Σ∗} |u|_x ⟦A⟧(u)

where |u|_x is the number of occurrences of x in u and ⟦A⟧(u) is the probability of u in A
The WFST below implements the counting, where X:X denotes an arbitrary transducer [12]
(figure: the counting WFST - a:ε/1 and b:ε/1 self-loops on state 0, an X:X/1 arc to final state 1/1, and the same self-loops on state 1)
The loops at state 0 consume unwanted input, mapping it to epsilons
The machine X will match the desired input pattern(s)
The end loops then map the unwanted trailing input to epsilons
Counting Example
Projecting the outputs and removing epsilons gives a path for each match
Optionally determinize and minimize for a compact representation
Can be applied to lattices whose paths cannot be enumerated [2]
Extended to class models and used for training WFSTs by Saraçlar and Roark [19]
To compute c(x) for a single n-gram, compose with a linear chain automaton representing the input x and use the shortest distance algorithm to obtain the sum over all paths
Or compose with a linear chain transducer with epsilon outputs and remove epsilons to give a single state and weight
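For a plain text corpus (a single unweighted path) the expected count reduces to ordinary occurrence counting, which makes the WFST construction easy to sanity-check. A sketch (names are ours):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// What composing a plain text corpus with the counting WFST computes:
// the number of occurrences of each n-gram. (On weighted lattices the
// same construction yields expected counts instead.)
std::map<std::vector<std::string>, int> count_ngrams(const std::string& text,
                                                     size_t order) {
  std::vector<std::string> words;
  std::istringstream in(text);
  for (std::string w; in >> w;) words.push_back(w);
  std::map<std::vector<std::string>, int> counts;
  for (size_t i = 0; i + order <= words.size(); ++i)
    ++counts[std::vector<std::string>(words.begin() + i,
                                      words.begin() + i + order)];
  return counts;
}
```

On the modified sentence used below, the bigram "between the" occurs twice and the unigram "the" three times, which is what the compact count machine should report.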
First Add the Real Semiring
In practice one could just use the log semiring:
For numerical stability
Already registered with all the tools (printing, drawing, ...)
But the real semiring serves as an example to illustrate OpenFst's extensibility
The algorithms and tools should therefore work with the custom semiring
The real semiring does have practical applications; for example Shu and Hetherington [20] used the real semiring for EM training of FSTs
Implement Real Weight
Real (probability) semiring: (+, ×, 0, 1)
The OpenFst weight requirements are on the site, here [21] and here [3]
The FloatWeight type can be used to do most of the boilerplate work
It is essential to implement the following static methods: Plus, Times, Zero, One and Type
Implemented in real-weight.h
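A standalone sketch of what such a weight class looks like (simplified and hypothetical: the real real-weight.h would build on OpenFst's FloatWeight machinery and meet the full list of requirements on the OpenFst site, e.g. Member, Quantize and Reverse):

```cpp
#include <string>

// Minimal probability-semiring weight with the static methods the slide
// names: Plus, Times, Zero, One and Type.
class RealWeight {
 public:
  explicit RealWeight(double v = 0.0) : value_(v) {}
  double Value() const { return value_; }
  static RealWeight Zero() { return RealWeight(0.0); }  // additive identity
  static RealWeight One() { return RealWeight(1.0); }   // multiplicative identity
  static const std::string& Type() {
    static const std::string type = "real";
    return type;
  }

 private:
  double value_;
};

// Semiring plus and times: ordinary + and x on probabilities.
inline RealWeight Plus(const RealWeight& a, const RealWeight& b) {
  return RealWeight(a.Value() + b.Value());
}
inline RealWeight Times(const RealWeight& a, const RealWeight& b) {
  return RealWeight(a.Value() * b.Value());
}
```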
Register Custom Arc Type
Now add support for the real semiring in the shell tools
#include <fst/arc.h>
#include <fst/vector-fst.h>
#include <fst/const-fst.h>
#include <fst/mains.h>
#include "real-weight.h"

using namespace fst;
extern "C" {
void real_arc_init() {
  typedef ArcTpl<RealWeight> RealArc;
  REGISTER_FST(VectorFst, RealArc);
  REGISTER_FST(ConstFst, RealArc);
  REGISTER_FST_MAINS(RealArc);
}
}
g++ -O2 -fPIC real-arc.cpp -o real-arc.o -c -I/path/to/openfst-1.1/src/bin/
g++ -shared -o real-arc.so real-arc.o
Final step: add the shared object path to LD_LIBRARY_PATH
The Counting Utility
acount
Input: a text file
Output: the compact count machine, all the intermediate steps, and the unigram counts
Usage: acount input.txt output.dir order.int
Counting Training Text
Sample text from the APSIPA site:
at the centre of sapporo the city hall and the sapporo station are located at the end of an
alluvial fan at 20m above sea level while the salmon museum in makomanai is at an altitude of 80m
the distance between the two places is about 8 kilometers between the the total superficies
of the alluvial fan is approximately 20 square kilometers the central part of sapporo was
formed 6000 years before by the deposit of earth carried by the toyohira river from jozankei and
was frequently flooded in the 19th century when the river banks were not built yet there is
abundant ground water away from the riverbed due to the river underflows since there is no need
to dig deep wells to draw an high quality water life is easy on this fertile land which was used
for agriculture including the culture of fruit trees
Illustrate the process by counting bigrams and unigrams on a single (modified) sentence:
the distance between the two places is about 8 kilometers between the
./acount sample.sentance sentance.output 2
Real Semiring Arc Registration
Run fstinfo corpus.ofst, and if something went wrong:
ERROR: FstMainRegister::GetMain: real-arc.so: cannot open shared object file: No such file or directory
ERROR: fstprint: Bad or unknown arc type "real" for this operation (PrintMain)
Otherwise:
fst type vector
arc type real
input symbol table none
output symbol table none
# of states 13
# of arcs 12
initial state 0
# of final states 1
# of input/output epsilons 0
# of input epsilons 0
# of output epsilons 0
# of accessible states 13
# of coaccessible states 13
# of connected states 13
# of strongly conn components 13
expanded y
mutable y
acceptor y
input deterministic y
output deterministic y
input/output epsilons n
input epsilons n
output epsilons n
input label sorted y
output label sorted y
weighted n
cyclic n
cyclic at initial state n
top sorted y
accessible y
coaccessible y
string y
Input and counting WFSTs
fstdraw -portrait=true -isymbols=words.syms -osymbols=words.syms corpus.ofst | dot -Tpdf > corpus.pdf
(figure: the linear corpus acceptor for the sample sentence, and the bigram counting transducer - word:ε self-loops before and after a two-arc word:word section)
Composed Machine
[Figure: the composed machine — at each sentence position the counting transducer either deletes the word (w:-) or copies it (w:w).]
Projected Machine
[Figure: the output projection of the composed machine; transducer labels are replaced by their output words or epsilon.]
Epsilon Removed Machine
[Figure: the machine after epsilon removal; only the emitted word sequences remain.]
Optimized Machines
Determinized
[Figure: the determinized counting machine; the accumulated counts appear as arc weights (e.g. the/4, is/2) and final weights of 0.5 in the real semiring.]
Minimized
[Figure: the minimized counting machine; common suffixes are now shared.]
Compare to grmcount
the distance between the two places is about 8 kilometers between the
acount
[Figure: the counting WFST produced by acount; counts appear as real-semiring weights on the word arcs.]
grmcount
[Figure: the equivalent counting WFST produced by grmcount; the same counts appear as arc weights (e.g. the/2, distance/1).]
Extracting Counts
Method 1 - ShortestDistance
0 1 between between
1 2 the the
2

fstcompile --arc_type=real --isymbols=words.syms --osymbols=words.syms betweenthe.wfst |
fstcompose min.ofst - | fstshortestdistance --reverse=true

0 2
1 0.5
2 1
Method 2 - Projection and Epsilon Removal
0 1 between -
1 2 the -
2

fstcompile --arc_type=real --isymbols=words.syms --osymbols=words.syms the.wfst |
fstcompose min.ofst - | fstproject --project_output=true | fstrmepsilon | fstprint

0 2
Don’t need to perform the previous optimizations to obtain counts with this method
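The shortest-distance view of counting can be sketched in plain Python (a toy re-implementation of the idea, not the toolkit code): the forward distance in the real semiring (⊕ = +, ⊗ = ×) sums the weights of all accepting paths, so a counting transducer in which each occurrence of the pattern contributes weight 1 yields the occurrence count.

```python
def count_occurrences(sentence, pattern):
    """Forward 'shortest distance' in the real semiring (+, *).

    alpha[q] holds the total weight of all paths reaching state q
    of the counting machine: state 0 loops over any word (w:-),
    states 1..n match the pattern (w:w), state n loops over any word.
    """
    n = len(pattern)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0  # weight of the empty path at the start state
    for w in sentence:
        new = [0.0] * (n + 1)
        new[0] = alpha[0]       # w:- self-loop before the match
        new[n] += alpha[n]      # w:- self-loop after the match
        for q in range(n):
            if pattern[q] == w:  # w:w arc advancing the match
                new[q + 1] += alpha[q]
        alpha = new
    return alpha[n]  # total weight of all accepting paths

words = ("the distance between the two places is about 8 "
         "kilometers between the").split()
print(count_occurrences(words, ["between", "the"]))  # 2.0
print(count_occurrences(words, ["the"]))             # 3.0
```

The bigram "between the" occurs twice in the modified sentence, matching the final weight 2 produced by the toolkit pipeline above.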
Compare to a counting utility
Finally, compare acount to the counting tool in the MITLM [9] toolkit; both produce the same counts:

and 2, alluvial 2, fan 2, in 2, kilometers 2, by 2, there 2, from 2, water 2, an 3, sapporo 3, was 3, river 3, to 3, at 4, is 6, of 7, the 17
alluvial 2, and 2, by 2, fan 2, from 2, in 2, kilometers 2, there 2, water 2, an 3, river 3, sapporo 3, to 3, was 3, at 4, is 6, of 7, the 17
Section Summary
Basic edit distance computation using WFSTs
More complicated counting example with a custom semiring
End of Section
Questions?
Speech Recognition
Speech Recognition with WFSTs
Overview of Section
Detailed walkthrough on the construction of a real-world large vocabulary integrated WFST speech recognition system
Much more practical than other tutorials
Based on the procedure from Mohri [15]
Rules of thumb and things to avoid
Illustrate what happens when these guidelines are and are not used
The final recipe is very small
It should provide a process that builds a good integrated recognition WFST
Overview of Task
Use the Corpus of Spontaneous Japanese (CSJ) to illustrate construction methods and toolkits
Standard large vocabulary Japanese task
233 hours of acoustic training data
116 minutes of acoustic testing data
Acoustic models have 3000 states and 128 Gaussians per state
Vocabulary of 65k words, where a word is a base form – POS tag – reading
Evaluate word accuracy after stripping POS tags and readings
Use the T³ decoder with GPU acoustic scoring [6–8]
Decode over a set of beam widths and see how the construction influences the RTF/WER relationship
WFSTs In Speech Recognition
Represent each knowledge source as a WFST, typically
Language Model – G
Lexicon – L
Context Dependency – C
Acoustic Models – H
Combine using composition – H ◦ C ◦ L ◦ G
Optimize final and intermediate WFSTs
Use in decoder
The Process in this Walkthrough
Use the process described in the paper [15]
The cascade will be π det(C ◦ det(L ◦ G))
Mostly use the ATT tools for the walkthrough
Will also include a comparison to OpenFst
Should be straightforward to implement using a scripting language
Optimizations Don’t Always Improve Performance
WFSTs allow for highly efficient networks to be constructed
Use as many optimizations as possible
The decoder should go really fast?
The search network should get really small?
The construction process can be very fragile
Certain transducers need to be constructed in specific ways
Applying too many optimizations
Often doesn’t improve performance or network size
Can make performance worse
Can make it impossible to construct the final network
Language Model
The language model is converted to a WFST
A finite state grammar is trivial to convert
It is also possible to efficiently represent an n–gram model as a WFST [1]
Use epsilon transitions to back-off states to reduce the network size [1, 15]
Trigram Example
P() is the n–gram probability, B() is the back-off weight
The state labels encode the history
Example from [5]
[Figure: trigram WFST fragment over the alphabet {a, b}; state labels encode the histories (-, a, b, ab, ba), word arcs carry n–gram probabilities P(·|·) and epsilon arcs to shorter-history states carry back-off weights B(·).]
Full Bigram Example
Random strings generated (by an FSA) with an alphabet of two words
b a a a a a a b a a ab bb ba b a b baaa a b a ab ba a
Full Bigram Example
ARPA format LM trained with MITLM using estimate-ngram -o 2 -t ab.train -wl ab.lm
\data\
ngram 1=4
ngram 2=8
\1-grams:
-0.477121 </s>
-99 <s> -0.176091
-0.477121 b -0.124939
-0.477121 a -0.324511
\2-grams:
-0.477121 <s> b
-0.352183 <s> a
-0.477121 b </s>
-0.477121 b b
-0.477121 b a
-0.579784 a </s>
-0.676694 a b
-0.278754 a a
\end\
Full Bigram Example
Converted to a WFST
[Figure: the bigram WFST; states correspond to the histories <s>, a, b and the back-off state, word arcs carry negated natural-log probabilities (e.g. b:1/1.0986 from the log10 value -0.477121), and -:0 epsilon arcs carry the back-off weights.]
Other Points for ARPA n–gram Models
Use the -99 value to identify the start state
Use concatenation if you want a sentence begin tag
For end states:
Can’t assume that no back-off implies a final state – when using a restricted vocabulary list or SRILM, some back-off values might be missing
Instead, detect sentence end tags </s> and set the corresponding state as final
Also remember the higher n–grams a b </s> will have a final state labelled b,</s>
Direct them to a unique final state
Remember to convert the log base
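The log-base conversion in the last point can be sketched as follows (a minimal illustration; ARPA files store log10 probabilities, while WFST arcs conventionally carry negated natural-log weights):

```python
import math

def arpa_to_weight(log10_prob):
    """Convert an ARPA log10 probability to a negated
    natural-log arc weight: -ln(p) = -log10(p) * ln(10)."""
    return -log10_prob * math.log(10.0)

# -0.477121 is log10(1/3); the resulting arc weight is
# -ln(1/3) = 1.0986, matching the b:1/1.0986 arc in the example
print(round(arpa_to_weight(-0.477121), 4))  # 1.0986
```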
Walkthrough Language Model
Trigram built from CSJ using ATT GRM Tools
Training set of approx 400,000 sentences
Katz smoothing and shrinking factor of 1
WFST States Arcs
G 299634 1422541
Lexicon
Built from the pronunciation dictionary
Maps sequences of phonemes to words
Create a chain transducer for each word in the lexicon
Take the union
Perform closure
So that we can loop through the machine
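The chain–union–closure steps above can be sketched as an arc list built in one pass (an illustration only; a real build would emit AT&T text format for fsmcompile, and hypothetical label "<eps>" stands for epsilon):

```python
def lexicon_fst(entries):
    """Build L in a single pass: for each (word, phones) emit a
    chain whose first arc outputs the word and whose remaining
    arcs output epsilon; all chains share start state 0, and an
    epsilon arc back to state 0 closes the machine so word
    sequences can be accepted."""
    arcs, next_state = [], 1
    for word, phones in entries:
        src = 0
        for i, p in enumerate(phones):
            out = word if i == 0 else "<eps>"  # output label early
            arcs.append((src, next_state, p, out))
            src, next_state = next_state, next_state + 1
        arcs.append((src, 0, "<eps>", "<eps>"))  # closure loop
    return arcs

arcs = lexicon_fst([("共存", ["ky", "o:", "s", "o", "N"]),
                    ("あの", ["a", "n", "o"])])
print(arcs[0])  # (0, 1, 'ky', '共存')
```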
Lexicon Example
Illustrate the procedure on a random subset of the CSJ dictionary
共存 ky o: s o N
あの a n o
額絵 g a k u e
サードネス s a: d o n e s u
均一 k i N i ts u
履歴 r e r i k i
見回し m i m a: sh i
郎 r o:
危険 k i k e N
ゆうこ y u u k o
Chain Transducers
For each of the entries construct a linear chain transducer
[Figure: the linear chain transducer for 共存 — the first arc ky:共存 emits the word, the remaining arcs (o:, s, o, N) output epsilon.]
Lexicon Example
Take the union of all words to build lexicon WFST
[Figure: the union of the ten chain transducers; every chain leaves the common start state 0 with an arc emitting its word.]
Lexicon Example
Perform Closure
[Figure: the lexicon after closure; epsilon arcs from each word-final state back to the start state allow sequences of words to be accepted.]
Lexicon - Points
Easier just to perform the previous steps as a single script:
Simple to do in a single pass
Probably not a good idea to create 100k small files
Easier to add the auxiliary symbols
Place output labels early in the path:
The epsilons will delay matching and generate many dead-end states
Homophones prevent determinization – add auxiliary symbols
Avoid optimizing the lexicon, because this will push the output labels further back
Sorting (or indexing) arcs is crucial in large systems
Good and bad label placement
Create two lexicons:
A lexicon with output labels at the start of the chains, Ls
A lexicon with output labels at the end of the chains, Le
Both Ls and Le have the same number of states and arcs
Compose Ls or Le with G
The labels are indexed on the input of G and the output of L
Ls ◦ G took 4.5 seconds
Le ◦ G took...
Good and bad label placement
This occurs regardless of the composition filter, the back-off epsilon treatment, or the toolkit
Repeat again on a 3000 word lexicon
Compose again without connection: fsmcompose -d ls.fsm g.fsm
                      Ls ◦ G    trim Ls ◦ G    Le ◦ G      trim Le ◦ G
States                46258     30710          66520834    66848
Arcs                  58507     42932          68004350    819747
Accessible states     46258     30710          67992311    66848
Co-accessible states  30710     30710          66848       66848
Sort and Index Labels
Sorting or Indexing of arcs is also important
Informal comparison of composition with indexed and un-indexed machines
Un-indexed composition took 4.5 minutes
Indexed composition took approx. 5 seconds
OpenFst will/should complain if arcs aren’t sorted
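The benefit of sorted arcs comes from matching by binary search during composition; a sketch with Python's bisect (purely illustrative, not the toolkit's implementation):

```python
import bisect

def matching_arcs(sorted_arcs, label):
    """sorted_arcs: list of (ilabel, dest) pairs sorted by ilabel.
    Binary search replaces the linear scan that an un-indexed
    composition would need at every state pair."""
    lo = bisect.bisect_left(sorted_arcs, (label,))
    hi = bisect.bisect_right(sorted_arcs, (label, float("inf")))
    return sorted_arcs[lo:hi]

arcs = sorted([(3, 7), (1, 2), (3, 9), (5, 4)])
print(matching_arcs(arcs, 3))  # [(3, 7), (3, 9)]
```

With N arcs per state, matching drops from O(N) to O(log N) per lookup, which is the difference behind the 4.5-minute versus 5-second comparison above.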
Lexicon Determinization and Auxiliary Symbols
The lexicon may not be determinizable because of homophones
Verified on the walkthrough lexicon anyway
Determinization just runs out of memory
The solution is to add auxiliary symbols to disambiguate the homophones
Auxiliary Symbols
Consider the toy lexicon with an additional homophone:
共存 ky o: s o N
あの a n o
額絵 g a k u e
サードネス s a: d o n e s u
均一 k i N i ts u
履歴 r e r i k i
見回し m i m a: sh i
郎 r o:
危険 k i k e N
ゆうこ y u u k o
棄権 k i k e N
Add an auxiliary phone #n to each pronunciation and alter n to disambiguate the homophones
It is now possible to determinize the lexicon
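The auxiliary-symbol insertion can be sketched as follows (an illustration of the idea only; numbering conventions differ between toolkits):

```python
from collections import defaultdict

def add_aux_symbols(lexicon):
    """Append #k to each pronunciation, where k is the entry's
    index among words sharing the same phone string; homophones
    become distinct, so the lexicon is determinizable."""
    seen = defaultdict(int)
    out = []
    for word, phones in lexicon:
        key = tuple(phones)
        out.append((word, phones + ["#%d" % seen[key]]))
        seen[key] += 1
    return out

lex = add_aux_symbols([("危険", ["k", "i", "k", "e", "N"]),
                       ("棄権", ["k", "i", "k", "e", "N"])])
print(lex[0][1][-1], lex[1][1][-1])  # #0 #1
```

As in the toy lexicon, 危険 ends in #0 while its homophone 棄権 ends in #1.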
Lexicon Augmented With Auxiliary Symbols
[Figure: the lexicon augmented with auxiliary symbols; every pronunciation is terminated by a #0 arc, except the added homophone 棄権 (k i k e N), which is terminated by #1 to disambiguate it from 危険.]
Side note: ATT automatic symbol insertion
The ATT tools include a special determinization algorithm to automatically add disambiguation symbols
fsmdeterminize -l 1000 in.fsm > out.fsm
Auxiliary symbol labels begin from 1000
Print out the WFST to get the symbols for later removal
Even using this algorithm – avoid determinizing the lexicon before composition
Lexicon Example
Determinize to generate a prefix tree
[Figure: the determinized lexicon forms a prefix tree; shared phone prefixes such as k and r are merged, which pushes some output labels further from the root.]
Lexicon Example
Minimize to share common suffixes
[Figure: the minimized lexicon; common suffixes are now shared as well as common prefixes.]
Compose L and G
Optimizing L makes it smaller
But determinization pushes the labels back
We have seen this can severely damage the efficiency of composition
Verified the composition of L or Lopt with G
The composition of any Lopt permutation fails
Confirms it is best not to optimize L at this stage
Determinize L and G
The labels placements are optimized
The auxiliary symbols are in place
However, using the epsilon-matching filter together with the epsilons in the language model will cause problems
One solution is to replace the epsilon transitions in G:
Replace epsilon back-offs with an alternative symbol (usually φ)
Add a φ self-loop to the lexicon
An easier solution is to use a sequence filter:
ATT fsmcompose with the -s option
OpenFst defaults to the sequence filter
Walkthrough LG
WFST        States     Arcs
G           299634     1422541
L           401704     466703
LG          1182138    2383541
det(LG)     1410016    2613069
Context-Dependency
Maps from context-dependent sequences to context-independent sequences
Often the acoustic models contain only a subset of the tri–phones
First construct a network that can handle all possible tri–phones
Then replace with the actual tri–phones
Use a delayed (and inverted) construction to ensure determinizability [15]
Add arcs according to the following rules:
Name            Type    Source     Destination   Input   Output
Tri–phone       l-c+r   l,c        c,r           r       l-c+r
Right Bi–phone  c+r     –,c        c,r           r       c+r
Left Bi–phone   l-c     l,c        c,– (final)   –       l-c
Mono–phone      c       – (start)  –,c           c       –
                        –,c        c,–           –       c
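The tri–phone row of the table can be sketched as an arc generator (illustrative only; a full construction also needs the bi-phone and mono-phone rows and the auxiliary-symbol self-loops):

```python
def triphone_arcs(phones):
    """Inverted, delayed construction: state (l, c) remembers the
    last two phones; reading the next context-independent phone r
    emits the context-dependent phone l-c+r and moves to (c, r)."""
    arcs = []
    for l in phones:
        for c in phones:
            for r in phones:
                arcs.append(((l, c), (c, r), r, "%s-%s+%s" % (l, c, r)))
    return arcs

arcs = triphone_arcs(["a", "b"])
# reading r='a' in state ('a','b') emits a-b+a, as in the
# sanity-check composition for the string ababb
print((("a", "b"), ("b", "a"), "a", "a-b+a") in arcs)  # True
```

Because the emitted label depends only on the source state and the input phone, the machine is deterministic on its input side.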
Deterministic Context Dependency Construction
[Figure: the deterministic (inverted) context-dependency transducer for the phones {a, b}; states are labelled with the (left, centre) phone pair, and arcs map the next context-independent phone to a context-dependent phone such as a-b+a.]
Sanity Check Construction
Construct a chain automaton from the string ababb
Compose with CD transducer
The resulting WFST maps from CI to CD phones
0 1a:- 2b:a+b 3a:a-b+a 4b:b-a+b 5b:a-b+b 6-:b-b
Additionally check a monophone a
0 1a:- 2-:a
Deterministic Context Dependency Construction withAuxiliary Symbols
Need to deal with the additional auxiliary phones
At each state in the CD network add self-loops that pass the auxiliary symbols through to the next level
Minimal example on the next slide
After performing the final set of optimizations, replace the auxiliary symbols with epsilons
Deterministic Context Dependency Construction withAuxiliary Symbols
[Figure: the same context-dependency transducer with #1:#1 and #2:#2 self-loops added at every state, so the auxiliary symbols pass through to the next level.]
Walkthrough CLG
WFST             States     Arcs
G                299634     1422541
L                401704     466703
LG               1182138    2383541
det(LG)          1410016    2613069
C det(LG)        1466022    2738532
det(C det(LG))   1466022    2738532
Final Steps
Final processing step before recognition
Remove auxiliary symbols
Substitute logical Tri–phones with physical Tri–phones
Walkthrough πCLG
WFST               States     Arcs       Input Eps   Output Eps
G                  299634     1422541    199631      199631
L                  401704     466703     65001       401703
LG                 1182138    2383541    404840      1182135
det(LG)            1410016    2613069    434721      1418229
C det(LG)          1466022    2738532    444095      1511787
det(C det(LG))     1466022    2738532    444095      1511787
π det(C det(LG))   1466022    2738532    692946      1511787
CLG Illustrations
Toy Language model G
[Figure: toy language model G accepting one of 危険 or 棄権 followed by one of 見回し or 履歴.]
CLG Illustrations
L ◦ G
[Figure: L ◦ G — each word of the toy grammar is expanded into its phone sequence, terminated by its auxiliary symbol (#10000 for 危険, #10002 for 棄権).]
CLG Illustrations
det(L ◦ G )
[Figure: det(L ◦ G) — the shared phone prefix k i k e N of 危険 and 棄権 is merged, and the word outputs are delayed until the auxiliary symbols #10000/#10002 resolve the homophones.]
CLG Illustrations
C ◦ det(L ◦ G )
[Figure: C ◦ det(L ◦ G) — the context-independent phones are replaced by tri-phones such as k-i+k, and the auxiliary symbols are relabelled (#77659, #77661).]
Recognition Performance
[Figure: word accuracy (roughly 68–82%) versus real-time factor (0–0.9) for the det(C det(LG)) network.]
Comparison of a few Construction Techniques
WFST                        States     Arcs
π det(C det(LG))            1466022    2738532
Tropical π det(C det(LG))   1466022    2738532
π min(det(C det(LG)))       1397962    2643926
CLG                         2036680    30169707
dmake                       1205482    2614254
Performance with no Optimizations
[Figure: accuracy versus RTF for det(C det(LG)) and the unoptimized CLG; the unoptimized network needs a far larger RTF (up to around 10) to approach the same accuracy.]
Construction Semiring
Mohri [15] doesn’t specify the construction semiring
Allauzen [1] used the log semiring for composition and determinization
The example presented here used the log semiring throughout for composition and determinization
The tropical semiring can be slightly faster, but accuracy drops for narrow beams
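The two semirings differ only in the ⊕ operation over negated-log weights; a minimal sketch:

```python
import math

def plus_tropical(a, b):
    """Tropical plus: keep only the best (minimum) weight."""
    return min(a, b)

def plus_log(a, b):
    """Log plus: combine both paths' probabilities exactly,
    -ln(exp(-a) + exp(-b))."""
    return -math.log(math.exp(-a) + math.exp(-b))

# merging two equally likely paths of weight -ln(0.5):
w = -math.log(0.5)
print(plus_log(w, w))       # approximately 0 (probabilities sum to 1)
print(plus_tropical(w, w))  # approximately 0.693 (still probability 0.5)
```

During determinization this changes how merged paths are weighted, which is why the choice of construction semiring affects accuracy at narrow beams even though the network topology is unchanged.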
Construction in the Log or Tropical Semirings
[Figure: accuracy versus RTF for networks built in the log and tropical semirings; the log-semiring construction is more accurate at narrow beams.]
Comparison with OpenFst
Using the tropical semiring
The network size is identical on either toolkit
[Figure: accuracy versus RTF for the tropical-semiring det(C det(LG)) network built with OpenFst and with the ATT tools; the curves are nearly identical.]
Construction Summary
Presented a construction walkthrough
Simple to implement
Obtains good performance on a real LVCSR setup
Many other permutations and combinations are possible
Extensions
On-the-fly
The component WFSTs are composed dynamically as needed by the decoder during search
More sophisticated techniques exist to make this faster [16, 17]
Compose in the acoustic models
In this case add in the H machines
Apply factorization to further reduce WFST size
Take the composition a step further: compose with the acoustic observations O to get O ◦ H ◦ C ◦ L ◦ G and use shortest-path
This would not be practical for large systems
However, the combination of composition and shortest-path can be used to build many useful applications for speech and language processing
POS Tagger Using Component Networks and On-the-fly Composition
Just like in the speech recognition walkthrough:
Train an n–gram model
Build a lexicon that converts words to tag–word pairs
Compose and determinize
Construct an input transducer from the input
Compose and run shortest path
Possible to do the composition of the component networks on-the-fly
The runtime system will use three WFSTs:
The input sequence
Optimized lexicon – closed and determinized
Language model
Extend to on-the-fly composition and only expand the parts of the composed machine needed by the shortest-path algorithm
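On-the-fly composition can be sketched as on-demand state expansion (a toy, epsilon-free sketch with a hypothetical helper name; real implementations add composition filters and caching [16, 17]):

```python
from collections import deque

def lazy_compose_states(start, arcs1, arcs2):
    """Expand composed state pairs only as the search reaches
    them, instead of materialising the full product machine.
    arcs1/arcs2: dict state -> list of (ilabel, olabel, dest)."""
    expanded, queue = set(), deque([start])
    while queue:
        s1, s2 = queue.popleft()
        if (s1, s2) in expanded:
            continue
        expanded.add((s1, s2))
        for _, o1, d1 in arcs1.get(s1, ()):
            for i2, _, d2 in arcs2.get(s2, ()):
                if o1 == i2:  # output of M1 matches input of M2
                    queue.append((d1, d2))
    return expanded

arcs1 = {0: [("x", "a", 1)], 1: [("y", "b", 0)]}
arcs2 = {0: [("a", "A", 1)], 1: [("b", "B", 0)]}
print(sorted(lazy_compose_states((0, 0), arcs1, arcs2)))
# only the reachable half of the 2x2 product: [(0, 0), (1, 1)]
```

Driven by shortest path instead of breadth-first search, only the state pairs the best hypotheses touch are ever built.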
Construct Chain Automata
This is our input sequence
0 1<s> 2日 3本 4語 5が 6難 7し 8い 9</s>
Compose with Lexicon
Compose with looped lexicon
This gives a lattice of possible constructions of the input sequence
[Figure: the lattice of possible segmentations and taggings of 日本語が難しい after composition with the looped lexicon; each arc outputs a word+POS+reading candidate such as 日本+名詞+にっぽん.]
Compose with Language Model
Score all the paths in the lattice by composing with the language model
It is easy to see that many different types of language model can be used
Multiple paths arise due to the epsilon back-offs
[Figure: the lattice after composition with the language model; every path now carries a score, with multiple epsilon paths arising from the back-off transitions.]
Further Processing
The optimizations have removed the synchronization between inputs and outputs
Just want the output sequences
Project output and remove epsilons
Determinize and use shortestpath to obtain n–best lists
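The final shortest-path step can be sketched as dynamic programming over the acyclic lattice in the tropical semiring (illustrative only — the toolkits provide fstshortestpath; the weights below loosely follow the lattice figures):

```python
def shortest_path(arcs, start, final, num_states):
    """Single-source shortest path over an acyclic lattice.
    arcs: list of (src, dst, label, weight); states are assumed
    topologically ordered 0..num_states-1, which holds for a
    lattice built left-to-right from the input."""
    INF = float("inf")
    best = [INF] * num_states
    back = [None] * num_states
    best[start] = 0.0
    by_src = {}
    for a in arcs:
        by_src.setdefault(a[0], []).append(a)
    for s in range(num_states):  # relax in topological order
        for src, dst, label, w in by_src.get(s, ()):
            if best[s] + w < best[dst]:
                best[dst] = best[s] + w
                back[dst] = (src, label)
    path, s = [], final
    while back[s] is not None:   # trace back the best path
        s, label = back[s]
        path.append(label)
    return best[final], path[::-1]

arcs = [(0, 1, "し+動詞+し", 4.75), (0, 1, "し+助詞+し", 7.56),
        (1, 2, "</s>", 1.52)]
cost, path = shortest_path(arcs, 0, 2, 3)
print(path)  # the verb reading of し wins over the particle reading
```

Running this n times with the best path removed (or using the toolkit's n-shortest-paths) yields the n-best list.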
Project Output
To convert to an automaton representing word sequences
[Figure: the output projection of the scored lattice; only the tagged-word labels and weights remain, interleaved with many epsilon arcs.]
Remove Epsilons
To give a compact automaton
[Figure: the compact automaton after epsilon removal; every arc is a tagged-word candidate with its combined weight.]
Determinize
To share common prefixes and ensure n–best lists are unique
[Figure: the determinized automaton; common prefixes are shared, so each tagged-word sequence appears on exactly one path and the n-best lists are unique.]
NBest Lists
To get n–best paths run shortestpath
There are n paths in the final machine
[Figure: the result of shortestpath — a machine containing exactly the n best tagged-word sequences as its n paths.]
Advantages
We have demonstrated how useful applications can be rapidly developed using WFSTs
Much easier to extend – for example, changing the language model in ChaSen would be difficult, but it is trivial in the previous example
One framework and toolkit for nearly everything – not a mix of domain-specific tools
Summary of Section
Detailed description of the construction of an integrated speech recognition WFST
Conversion of the knowledge sources to WFSTs
Rules of thumb
Optimization
Evaluation on a large vocabulary task
Applying a similar WFST framework to POS tagging
Thank You For Attending
Questions?
References
References I
[1] C. Allauzen, M. Mohri, M. Riley, and B. Roark. A generalized construction of integrated speech recognition transducers. In Proc. ICASSP, 2004.
[2] C. Allauzen, M. Mohri, and B. Roark. Generalized algorithms for constructing statistical language models. In Proc. of the 41st Annual Meeting of the Association for Computational Linguistics, pages 40–47, 2003.
[3] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri. OpenFst: A general and efficient weighted finite-state transducer library. In Proc. of the Ninth International Conference on Implementation and Application of Automata (CIAA 2007), pages 11–23, 2007.
References II
[4] C. Allauzen, M. Jansche, and M. Riley. OpenFst: An open-source, weighted finite-state transducer library and its applications to speech and language. Tutorial, 2009.
[5] Diamantino Caseiro. Finite-State Methods in Automatic Speech Recognition. PhD thesis, Universidade Tecnica de Lisboa, 2003.
[6] P. R. Dixon, D. A. Caseiro, T. Oonishi, and S. Furui. The Titech large vocabulary WFST speech recognition system. In Proc. ASRU, pages 1301–1304, 2007.
[7] P. R. Dixon, T. Oonishi, and S. Furui. Fast acoustic computations using graphics processors. In Proc. ICASSP, pages 4321–4324, 2009.
References III
[8] Paul R. Dixon, Tasuku Oonishi, and Sadaoki Furui. Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition. Comput. Speech Lang., 23(4):510–526, 2009.
[9] Bo-June Hsu and James Glass. Iterative language model estimation: efficient data structure & algorithms. In Proc. INTERSPEECH, pages 841–844, 2008.
[10] S. Kanthak, H. Ney, M. Riley, and M. Mohri. A comparison of two LVR search optimization techniques. In Proc. ICSLP, pages 1309–1312, 2002.
[11] M. Mohri, F. Pereira, and M. Riley. The design principles of a weighted finite-state transducer library. Theoretical Computer Science, 231:17–32, 2000.
[12] M. Mohri. Speech recognition lecture notes. http://www.cs.nyu.edu/~mohri/asr07/, 2007.
References IV
[13] M. Mohri. Weighted automata algorithms. Handbook of Weighted Automata, 2009.
[14] M. Mohri, F. Pereira, and M. Riley. Weighted finite-state transducers in speech recognition. Computer Speech and Language, 16(1):69–88, 2002.
[15] M. Mohri, F. C. N. Pereira, and M. Riley. Speech recognition with weighted finite-state transducers. Handbook on Speech Processing and Speech Communication, 2008.
[16] T. Oonishi, P. R. Dixon, K. Iwano, and S. Furui. Implementation and evaluation of fast on-the-fly WFST composition algorithms. In Proc. INTERSPEECH, pages 2110–2113, 2008.
[17] T. Oonishi, P. R. Dixon, K. Iwano, and S. Furui. Generalization of specialized on-the-fly composition. In Proc. ICASSP, pages 4317–4320, 2009.
References V
[18] M. Riley and C. Allauzen. OpenFst source code. http://openfst.org/, 2007.
[19] Murat Saraclar and Brian Roark. Utterance classification with discriminative language modeling. Speech Communication, 48(3-4):276–287, 2006. Special issue on Spoken Language Understanding in Conversational Systems.
[20] Han Shu and Lee Hetherington. Training of finite-state transducers and its application to pronunciation modeling. 2002.
[21] OpenFst. http://openfst.org/.