Introduction Outline
Introduction to the Use of WFSTs in Speech and Language Processing
Paul Dixon and Sadaoki Furui
Department of Computer Science
Tokyo Institute of Technology
APSIPA Conference, 2009
Outline of Tutorial
Introduction
Motivation
Description of some popular toolkits
Introduction to WFSTs and algorithms
Covers WFSTs and the algorithms frequently used in speech recognition
Sample applications using WFSTs
Edit distance
Implementing a counting WFST using a custom semiring
Recognizing speech with WFSTs
Detailed walkthrough describing the construction of an integrated speech recognition WFST
Part-of-speech tagging using the WFST framework
Acknowledgements
Tasuku Oonishi, also at Titech, working on the T3 decoder
WFST introduction section heavily influenced by the following tutorials and papers [15, 13, 4]
Introduction
Outline of Section
Motivation
Brief survey of popular WFST toolkits
Why use WFSTs?
Economics: reduced time for development, testing and so on
Understanding: you can go inside, look at and understand the structures and operations
Tools for rendering and investigating the internals of WFSTs - useful when performing analysis and postmortems
Reduce dependency on domain-specific toolkits
Write and control all your own tools based on WFSTs
Motivation for WFST in Speech
Unified representation of knowledge sources
Pre/post-processing stages and even support tools can make use of the WFST framework
WFST Speech Decoders can be easier to design and implement
WFST Speech Decoders can be faster [10]
WFST Speech Decoders can be more flexible
Getting Started
Get a WFST toolkit
Many toolkits are available; the following are widely used in the speech community
ATT FSM Library http://www.research.att.com/~fsmtools/fsm/
OpenFst http://www.openfst.org/
MIT Fst tools http://people.csail.mit.edu/ilh/fst/
FSA toolkit http://www-i6.informatik.rwth-aachen.de/~kanthak/fsa.html
Start experimenting with tools
More Toolkit Details
OpenFst - Open source toolkit - heavy C++ template code, usable as a library or shell commands
MIT FST Toolkit - Open Source
OpenFst and ATT use the same text representation; conversion to and from the MIT format is trivial. Windows and Linux
ATT FSM toolkit
Closed source
Binaries for Windows, *nix x86 and amd64, and some big-endian platforms
Shell commands free for non-commercial use
Companion toolkits for language modeling (grm) and decoding tools (dcd) (both with more limited platform support)
Perhaps the easiest way to get a WFST speech system up and running, by using the grm and dcd toolkits
Hasn't been updated for a long time, and the Windows version is lagging a version behind
OpenFst
Same authors as ATT toolkit
Very similar feature set to the ATT tools
Open source license (Apache 2)
Shell commands or C++ interfaces
Heavy template code
Build times can be slow
Allows for the creation of user types and semirings
For example: add a custom arc with acoustic and language model scores
Add the real or expectation semirings
Plug in a custom FST type
No shorthand command line options
Well documented
MIT Fst
Open source license (MIT)
Windows and *nix platforms
Has some extra interesting tools in addition to the standard WFST operations
C++ or shell commands
FSA
Source code is available (see usage restrictions)
Part of the RWTH recognition system
Weighted Finite State Transducers
Introduction to WFSTs and Algorithms
WFST Operations Covered
Rational Operations
Sum (Union)
Product (Concatenation)
Closure (Kleene Closure)
Unary Operations
Reversal
Inversion
Projection
Binary Operations
Composition
Optimization Algorithms
Determinization
Minimization
Epsilon Removal
Weight Pushing
Other Commands and Utilities
Compiling, Printing, Drawing and Map
ArcSort, ShortestPath, ShortestDistance (all by example)
WFST Operations not Covered
Intersection
Equivalence
Label pushing
Encode/Decode
Difference
Topological Sort
Epsilon Normalization
Synchronization
Prune
A few others ....
Formal Definitions
Formally, a weighted transducer T over a semiring S is defined as the 8-tuple [13, 15]:
T = (Σ,∆,Q, I ,F ,E , λ, ρ) (1)
Where:
Σ is a finite input alphabet
∆ is a finite output alphabet
Q is a finite set of states
I ⊆ Q is the set of initial states
F ⊆ Q is the set of final states
E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × S × Q is a finite set of transitions
λ is an initial weight function
ρ is the final weight function
Semirings
A semiring is a system with two operators ⊕ and ⊗; the table below shows some example semirings that are often used in WFST toolkits [13, 15]

Semiring     Set              ⊕      ⊗   0    1
Boolean      {0, 1}           ∨      ∧   0    1
Probability  R+               +      ×   0    1
Log          R ∪ {−∞, +∞}     ⊕log   +   +∞   0
Tropical     R ∪ {−∞, +∞}     min    +   +∞   0

where ∨ is logical or, ∧ is logical and, and ⊕log is x ⊕log y = −log(e^−x + e^−y)
Only two operations are needed:
Compute the weight of a path using the ⊗ operator
Compute the cost of a sequence by summing over all the paths using the ⊕ operator
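These ⊕/⊗ pairs are simple enough to state as code. A minimal sketch of the table's operations (the function names are ours, not from any toolkit):

```cpp
#include <algorithm>
#include <cmath>

// Tropical semiring: plus = min, times = +, zero = +inf, one = 0.
inline double tropical_plus(double x, double y) { return std::min(x, y); }
inline double tropical_times(double x, double y) { return x + y; }

// Log semiring: plus = -log(e^-x + e^-y), times = +, zero = +inf, one = 0.
inline double log_plus(double x, double y) {
  return -std::log(std::exp(-x) + std::exp(-y));
}
inline double log_times(double x, double y) { return x + y; }

// Probability (real) semiring: plus = +, times = *, zero = 0, one = 1.
inline double real_plus(double x, double y) { return x + y; }
inline double real_times(double x, double y) { return x * y; }
```

For example, in the tropical semiring two consecutive arcs of cost 2 and 3 combine as tropical_times(2, 3) = 5, and two parallel paths combine with tropical_plus, i.e. the cheaper one wins.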
Semirings
Boolean semiring
Un-weighted machines are equivalent to weighted machines over the Boolean semiring
Real (Probability) semiring
The weights represent real numbers (or probabilities)
Log semiring
Weights are negative logs of the probabilities
Tropical semiring
Same as the log semiring with ⊕ replaced by the Viterbi approximation (min).
Log and tropical semirings use the conventions e^−∞ = 0 and −log(0) = ∞ [2]
Glossary of WFST-related terms
Σ∗ (sigma star) – the Kleene closure of an alphabet Σ; represents the set of all finite strings over the alphabet Σ
Simple path – is a path with no cycle
Idempotent – A semiring is idempotent if x ⊕ x = x for all x ∈ S (e.g. the Boolean and tropical semirings) [13]
Commutative – "A commutative semiring has a commutative multiplicative operator ⊗" [13]
Mirror Image – "The mirror image of a string x = x1 x2 . . . xn is the string x^R = xn xn−1 . . . x1" [13]
Glossary Continued
Deterministic – Has one initial state and, at any state, no more than one outgoing arc with any given input label
Sequential – Deterministic on the input side
Sub-sequential – Have an additional output string at each final state
p-Sub-sequential – Have p additional output strings at each final state
Stochastic – The sum of arc weights leaving a state is one
Trim – Trim Transducers have no useless states
Glossary Continued
Commutative – a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c) = (b ⊕ c) ⊗ a [18]
Regulated – A sum does not depend on the order of the terms
Unambiguous – For any string x ∈ Σ∗ there is at most one accepting/successful path
Cycle – A path whose initial and final states are the same
ε-cycle – A cycle whose input and output labels are all epsilons
Successful path – A path from an initial state to a final state
non-accessible – Not reachable from an initial state
non-co-accessible – Not reachable from a final state
Transitions
A transition e from the set of transitions E is described using the following notation [2]:

(figure: a single arc from state 0 to final state 1/0, labelled ip:op/0.5)
p[e] denotes the origin state (0 in this example)
n[e] denotes the next state (1 in this example)
i [e] denotes the input label (ip in this example)
o[e] denotes the output label (op in this example)
w [e] denotes the weight (0.5 in this example)
Path
A path π is constructed from a sequence of transitions through the machine: π = e1, . . . , ek
A path π is a member of E∗ (The Kleene closure of E) [13]
p[π] denotes the origin state [13]
n[π] denotes the next state [13]
A cycle is a path where p[π] = n[π] [13]
A path weight in the log and tropical semirings is identical because the multiplication operation is identical [2]
w[π] is the weight of the path, obtained by applying ⊗ to the consecutive transition weights [13]
P(Q1,Q2) denotes all the paths from state Q1 to state Q2 [13]
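The path-weight definition can be sketched directly in code: w[π] is a fold of ⊗ over the transition weights, which is a sum of costs in the tropical semiring and an ordinary product in the real semiring. A small illustration (helper names are ours):

```cpp
#include <numeric>
#include <vector>

// w[pi] in the tropical semiring: times is ordinary addition of costs.
inline double tropical_path_weight(const std::vector<double>& arc_weights) {
  return std::accumulate(arc_weights.begin(), arc_weights.end(), 0.0);
}

// w[pi] in the real (probability) semiring: times is multiplication.
inline double real_path_weight(const std::vector<double>& arc_weights) {
  double w = 1.0;
  for (double a : arc_weights) w *= a;
  return w;
}
```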
Weighted Acceptors
In an Acceptor A each transition has an input label
In the weighted case the transitions can be augmented with an optional weight
P(Q1, x, Q2) denotes all the paths from state Q1 to state Q2 that accept the string x [13]
⟦A⟧(x) denotes the sum of the weights of all successful paths with input sequence x [13]
⟦A⟧(x) = ⊕_{π ∈ P(I, x, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]) [13]
Used to recognize strings
Acceptor Example
(figure: a weighted acceptor over {a, b} with states 0-3 and final state 3/1; two successful paths accept the string ab, with arc weights 1, 1 and 3, 3)

In the real semiring:

⟦A⟧(ab) = (1 × 1 × 1) + (3 × 3 × 1) = 10
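The sum-over-paths computation can be checked with a few lines of code. The sketch below enumerates paths in a reduced reading of the figure that keeps one b arc out of each middle state (so the arc list and state numbering are partly hypothetical), reproducing the two successful ab paths:

```cpp
#include <string>
#include <vector>

// One arc of a weighted acceptor: source state, label, weight, target state.
struct Arc { int src; char label; double weight; int dst; };

// Real-semiring sum of the weights of all successful paths accepting
// `input` from `state`, i.e. [[A]](input) when called from the start state.
double acceptor_weight(const std::vector<Arc>& arcs, int state,
                       const std::string& input, size_t pos,
                       int final_state, double final_weight) {
  if (pos == input.size())
    return state == final_state ? final_weight : 0.0;
  double total = 0.0;  // real-semiring plus is ordinary addition
  for (const Arc& a : arcs)
    if (a.src == state && a.label == input[pos])
      total += a.weight * acceptor_weight(arcs, a.dst, input, pos + 1,
                                          final_state, final_weight);
  return total;
}

// Reduced version of the slide's acceptor (final state 3 with weight 1).
double example_weight(const std::string& input) {
  std::vector<Arc> arcs = {
      {0, 'a', 1, 2}, {0, 'b', 4, 2}, {0, 'a', 3, 1},
      {0, 'b', 1, 1}, {2, 'b', 1, 3}, {1, 'b', 3, 3}};
  return acceptor_weight(arcs, 0, input, 0, /*final=*/3, /*rho=*/1.0);
}
```

Here example_weight("ab") sums the two paths 1·1·1 and 3·3·1, giving 10 as on the slide.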
Weighted Transducers
In weighted transducers the arcs have input and output labels with an optional weight [13]
P(Q1, x, y, Q2) denotes all the paths from state Q1 to state Q2 that transduce the string x to the string y [13]
⟦T⟧(x, y) denotes the sum of the weights of all successful paths with input x and output y [13]
⟦T⟧(x, y) = ⊕_{π ∈ P(I, x, y, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]) [4, 13]
Used to perform a mapping between strings with a weight [13]
Transducer Example
(figure: a weighted transducer with states 0-3 and final states 1/1 and 3/1, with arcs 0 -a:Z/1-> 2, 0 -b:X/4-> 2, 0 -a:X/3-> 1, 0 -b:Z/1-> 1, 2 -b:X/1-> 3, 2 -b:Z/3-> 3, 1 -b:Z/3-> 3, 1 -b:X/5-> 3)

In the real semiring:

⟦T⟧(bb, XZ) = 4 × 3 × 1 = 12
Formats, compiling and drawing
When using the toolkits there are several levels of representation:
Text level - text files describing the WFST topology and symbols
Binary level - compiled binary format produced by the compilation tool
Graph level - a graphical representation in one of the many image formats generated from Graphviz
Text-level WFSTs need to be compiled before they can be used
Print - to recover the text level
Draw - to visualize the WFST
Compiling FSTs
For OpenFst and the ATT FSM toolkit, the text format has 3, 4 or 5 columns
Acceptors have 3 columns, with an optional 4th column for the weights
Transducers have 4 columns, with an optional 5th column representing the weights
ATT fsmcompile defaults to acceptor format – use -t tospecify transducer
OpenFst fstcompile defaults to transducer format – use-acceptor=true to specify acceptor
Can use a transducer to emulate an acceptor by setting the input and output labels the same for each arc
Final states are a single entry on their own line, with an optional final cost
Simple Acceptor Example
Un-weighted acceptor definition

0 1 Tokyo
0 1 Kyoto
0 1 Yokohama
0 1 Nagoya
0 1 Osaka
1
Corresponding symbols
Tokyo 1
Kyoto 2
Yokohama 3
Nagoya 4
Osaka 5
Simple Transducer Example
Weighted transducer definition

0 2 a Z 1
0 2 b X 4
0 1 a X 3
0 1 b Z 1
2 3 b X 1
2 3 b Z 3
1 3 b Z 3
1 3 b X 5
3
Corresponding input symbols
- 0
a 1
b 2
Corresponding output symbols
- 0
X 1
Z 2
Printing and Drawing WFSTs
Printing WFSTs
Can recover the text format from the binary format
Drawing WFSTs
Very useful for understanding
Debugging and performing postmortems
Uses Graphviz
Viewing large graphs is a problem
Graphical Representation
(figure: the example weighted transducer rendered with Graphviz)
States are circles
Arcs are lines
Initial state thick line
Final state two lines on state
Each arc label is written as in_label:out_label/weight; when no weight is shown or given it defaults to the semiring one
A Few Additional Points
A final state can have outgoing arcs
An initial state can also be a final state (drawn as a double bold circle)
Non-final states can have a cost
Machines can have more than one initial state
Neither of these is possible with ATT/OpenFst, but both are allowable in general and are easy to simulate
Drawing WFSTs with Japanese Labels
Graphviz (dot) had some font hard-wiring for PostScript
The latest version works with the PDF backend

(figure: a linear acceptor whose labels spell 重み付き有限状態トランスデューサ, "weighted finite-state transducer")

fstdraw -acceptor=true -isymbols=jpn.syms -portrait=true jpn.ofsa | dot -Tpdf > jpn.pdf
Null Transitions
Null or epsilon ( ε) transitions
Consume no input symbols when traversing
Generate no output symbols when traversing
Make certain algorithms more complicated
Why they occur:
Inserted by certain algorithms
Needed in transducers to perform one-to-many mappings
Can always be removed from a WFA
Impossible to remove from certain WFSTs (one-to-many mappings, cycles, ...)
Introduce non-determinism
Rational Operations
For the presented algorithms:
Useful for combining many simpler machines to create more complex ones
Algorithms are efficient and straightforward to implement
Admit on-the-fly construction
Introduce epsilons
Sum (Union)
⟦T1 ⊕ T2⟧(x, y) = ⟦T1⟧(x, y) ⊕ ⟦T2⟧(x, y)
(figure: three linear transducers that read the character sequences s-a-p-p-o-r-o, s-e-N-d-a-i and k-u-m-a-m-o-t-o and output Sapporo, Sendai and Kumamoto on their first arcs)
fsmunion sapporo.fsm sendai.fsm kumamoto.fsm > cities.fsm
fstunion sapporo.fsm sendai.fsm | fstunion - kumamoto.fsm > cities.fsm
(figure: the unioned machine - a new initial state with ε arcs leading into each of the three city chains)
Product (Concatenation)
⟦T1 ⊗ T2⟧(x, y) = ⊕_{x = x1 x2, y = y1 y2} ⟦T1⟧(x1, y1) ⊗ ⟦T2⟧(x2, y2)
(figure: a linear acceptor for "How much to" and an acceptor with parallel arcs for Tokyo, Kyoto, Yokohama, Nagoya and Osaka)
fsmconcat grammar.fsm stations.fsm > gs.fsm
fstconcat grammar.fsm stations.fsm > gs.fsm
(figure: the concatenated machine - the "How much to" chain joined to the station-name arcs by an ε transition)
Closure
⟦T∗⟧(x, y) = ⊕_{n=0}^{∞} ⟦T^n⟧(x, y) [4]
fsmclosure cities.fsm > citiesclosed.fsm
fstclosure cities.fsm > citiesclosed.fsm
(figure: the closure of the cities machine - ε arcs from the end of each city chain back to a new initial state that is also final)
Reversal
Input is a single weighted transducer T
Output is the reversed weighted transducer T^R
⟦T^R⟧(x, y) = ⟦T⟧(x^R, y^R) [4]
(figure: (a) the "How much to <city>" transducer T and (b) its reversal T^R, with every arc reversed)
Inversion
⟦T⁻¹⟧(x, y) = ⟦T⟧(y, x) [13]
Input is a single weighted transducer T
Output is a single weighted transducer T−1 with the inputand output labels reversed
Example based on [13]
(figure: (a) the example transducer T and (b) its inversion T⁻¹, with the input and output labels swapped on every arc)
Projection
Input is a single weighted transducer T
Output is a single acceptor with either:
Π1(T) – the output labels removed
Π2(T) – the input labels removed
(figure: (a) the example transducer T and (b) its input projection Π1(T), which keeps only the input labels and the weights)
Composition
Very powerful operation
Allows multiple levels of information to be combined
Formal definition
⟦T1 ◦ T2⟧(x, y) = ⊕_z ⟦T1⟧(x, z) ⊗ ⟦T2⟧(z, y)
Epsilon Free Composition
Given a state q1 in T1 and a state q2 in T2
Cycle through the arc combinations
If an output symbol of T1 matches an input symbol of T2
Create a new arc
(figure: (a) T1 with arcs c:a and c:b, (b) T2 with arcs a:Z, c:Y and b:X, and (c) T1 ◦ T2 with arcs c:Z and c:X)
Composition with Epsilons Canonical Example
The presence of epsilons makes composition more difficult
Canonical example from Mohri [14]
(figure: the canonical example from Mohri [14] - (a) T1, a chain with arcs a:a, b:ε, c:ε, d:d, and (b) T2, a chain with arcs a:d, ε:e, d:a)
Canonical Example
Create T̃1 by replacing ε outputs with ε2 and adding ε:ε1 self-loops to every state
Create T̃2 by replacing ε inputs with ε1 and adding ε2:ε self-loops to every state
Compose using the standard procedure, allowing ε2 to match ε1
(figure: (a) T̃1, with b:ε2 and c:ε2 arcs and ε:ε1 self-loops on every state, and (b) T̃2, with an ε1:e arc and ε2:ε self-loops on every state)
ε-Path Multiplicity Problem
A problem: the composed machine has multiple ε-paths where only one is needed
Therefore the costs assigned to paths are wrong in non-idempotent semirings (e.g. probability, log)
(figure: the grid of composed state pairs (0,0) through (4,3), showing multiple redundant ε-paths between the same pairs of states)
Canonical Example
We need to keep just one of the paths
Filter out unwanted paths
(figure: the same composed grid, with the redundant ε-paths to be filtered out)
Composition Filter
The solution is to use a composition filter
This restricts the paths that can be made in the composed machine
The filter can itself be represented as a WFST
Where x represents any symbol
(figure: the three-state filter WFST, built from x:x arcs and ε1:ε1 and ε2:ε2 self-loops)
Composition using the Filter
It is now possible to perform composition of all three together using the standard algorithm:
T̃1 ◦ F ◦ T̃2
(figure: the filtered composition - the grid now contains a single ε-path between each pair of states)
Points to Consider for Efficient Composition
The following factors can be used to make composition more efficient:
Sort or index the output arcs of T1 and the input arcs of T2
ATT – index with fsmconvert
OpenFst – sort with fstarcsort
Favour placing output labels early to avoid matching delays and the generation of dead-end states
The ATT FSM toolkit has indexed types that provide faster composition; for OpenFst use the arc sorting features
Determinization
An automaton is deterministic if it meets the following conditions:
A single initial state
No two arcs leaving a state share the same input label
OpenFst and the ATT FSM library treat ε the same as a normal symbol
Determinization takes a WFST and attempts to construct a deterministic equivalent
Along with composition and the shortest path algorithms, extremely useful
Huge impact on search network efficiency: matching is much more efficient
Caveats:
Doesn't terminate in some cases, or takes an unreasonable amount of time in others
Not all automata can be determinized
The toolkits are extended to deal with transducers
Determinization Example
Example taken verbatim from Mohri [13]
(figure: (a) a non-deterministic weighted acceptor A over {a, b} and (b) its determinization A′, in which no state has two outgoing arcs with the same label)
Determinization Example
Generalized version of a prefix tree
Consider the toy lexicon
(figure: a toy lexicon WFST - one linear chain of phone arcs per word, emitting the Japanese words 共存, あの, 額絵, サードネス, 均一, 履歴, 見回し, 郎, 危険 and ゆうこ on the first arc of each chain)
Determinization Example
After applying determinization
(figure: the determinized lexicon - common prefixes are shared, so each state has at most one outgoing arc per input symbol)
Minimization
Minimization takes a deterministic WFA and returns an equivalent WFA with the minimal number of states
First pushes the weights in the machine
Encodes the label/weight pairs to give an unweighted acceptor
Performs the minimization
The OpenFst minimize command can accept transducers, but under the hood will convert to an acceptor
The ATT tools require that WFSTs are encoded as acceptors
Seems to have little improvement on speech recognition speed
Makes a small size reduction on speech recognition transducers
Choice of semiring can negatively affect performance [15]
Minimization Example
Example taken verbatim from Mohri [13]
(figure: (a) a weighted automaton A and (b) its minimized equivalent A′)
Minimization Example
Minimization will share the suffixes in the toy lexicon
(figure: the minimized lexicon - common suffixes are now shared as well as prefixes)
Weight Pushing
Weights can be pushed towards either the initial or the final states
Tropical semiring example taken verbatim from [13]
(figure: (a) the original machine and (b) the machine with the weights pushed towards the initial state)
Weight Pushing Also Performs Normalization
Consider the previous example repeated in the real semiring
In the pushed machine:
The weights along the paths are normalized
The machine is stochastic – the weights leaving each state sum to one
The path scores are the same as in the un-pushed machine
(figure: (a) the original real-semiring machine and (b) the pushed machine, in which the arc weights leaving each state sum to one)
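Pushing in the real semiring can be sketched for an acyclic machine: compute, for each state q, the total weight d[q] of all paths from q to the final state, then rescale each arc by d[dst]/d[src]. A toy illustration (names are ours; a single final state and topologically numbered states are assumed):

```cpp
#include <vector>

// Arc in a small acyclic weighted acceptor over the real semiring.
struct WArc { int src; double weight; int dst; };

// Computes d[q] (total path weight from q to the final state), then
// rescales every arc to weight * d[dst] / d[src]. Afterwards the outgoing
// weights at each state sum to one, i.e. the machine is stochastic;
// d[0] is the total weight, which moves onto the initial state.
std::vector<double> push_weights(std::vector<WArc>& arcs, int num_states,
                                 int final_state) {
  std::vector<double> d(num_states, 0.0);
  d[final_state] = 1.0;
  // Arcs are assumed to go from lower to higher state ids (topological).
  for (int q = num_states - 2; q >= 0; --q)
    for (const WArc& a : arcs)
      if (a.src == q) d[q] += a.weight * d[a.dst];
  for (WArc& a : arcs) a.weight = a.weight * d[a.dst] / d[a.src];
  return d;
}
```

For example, two parallel arcs of weight 1 and 3 into a state with onward weight 2 get pushed to 0.25 and 0.75, which sum to one, while the total weight 8 is reported in d[0].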
Epsilon Removal
Epsilon transitions can harm WFST efficiency
The epsilon removal algorithm attempts to remove these null transitions
Use with care as it can lead to explosion in network size
Not all epsilons can be removed
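For the unweighted case the idea behind the algorithm can be sketched in a few lines: compute each state's ε-closure and re-attach the non-ε arcs found there (final-state bookkeeping and weights are omitted for brevity; names are ours):

```cpp
#include <set>
#include <vector>

// An arc; a label of 0 stands for epsilon.
struct EArc { int src; char label; int dst; };

// For each state q: follow epsilon arcs to build its closure, then copy
// every non-epsilon arc leaving a closure state back onto q. The epsilon
// arcs themselves are dropped from the output.
std::vector<EArc> remove_epsilons(const std::vector<EArc>& arcs,
                                  int num_states) {
  std::vector<EArc> out;
  for (int q = 0; q < num_states; ++q) {
    std::set<int> closure = {q};
    std::vector<int> stack = {q};
    while (!stack.empty()) {
      int s = stack.back();
      stack.pop_back();
      for (const EArc& a : arcs)
        if (a.src == s && a.label == 0 && !closure.count(a.dst)) {
          closure.insert(a.dst);
          stack.push_back(a.dst);
        }
    }
    for (int s : closure)
      for (const EArc& a : arcs)
        if (a.src == s && a.label != 0) out.push_back({q, a.label, a.dst});
  }
  return out;
}
```

The duplication of arcs across closure states is exactly why the result can blow up in size, as the slide warns.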
Convert
OpenFst tool to convert between FST types
Currently no support for converting between semirings
The types are not well documented, so here is a list dumped from the FstRegister; all should be available in both the log and tropical semirings:

vector const const8 const16 const64
compact_acceptor compact_string compact_unweighted compact_unweighted_acceptor compact_weighted_string
compact8_acceptor compact8_string compact8_unweighted compact8_unweighted_acceptor compact8_weighted_string
compact16_acceptor compact16_string compact16_unweighted compact16_unweighted_acceptor compact16_weighted_string
compact64_acceptor compact64_string compact64_unweighted compact64_unweighted_acceptor compact64_weighted_string
Map
OpenFst fstmap applies transformations to the arc and final weights, as illustrated
(figure: (a) a small language-model WFST and (b) the result of the identity map, which leaves all weights unchanged)
Map continued
(figure: (c) the invert map, which replaces each weight with its reciprocal, and (d) the plus(2) map, which adds 2 to every arc and final weight)
Map continued
(figure: (e) the original machine and (f) the rmweight map, which removes all weights)
Map continued
(figure: (g) the times(2) map, which multiplies every arc and final weight by 2, and (h) the superfinal map, which adds a single new superfinal state reached by ε arcs)
Section Summary
Introduced WFSTs
Covered the algorithms frequently used in speech recognition network construction
End of Section
Questions?
Examples
Simple applications using WFSTs
Section Overview
This section will introduce two simple examples of using WFSTs:
Basic edit distance computation using WFSTs
A more complicated counting example with a custom semiring
Edit Distance
In this example we use the shell-level OpenFst tools to calculate the edit distance between two strings. The example demonstrates the composition and shortest path operations [11, 5].
(figure: (a) a linear acceptor for t-o-k-y-o and (b) a linear acceptor for k-y-o-t-o)
Edit Distance - Filter
(figure: the edit-distance filter - a single state 0 with self-loops a:ε/1, ε:a/1, a:a, a:b/2, b:ε/1, ε:b/1, b:a/2 and b:b)
Edit Distance - Composition
fstcompose tokyo.ofst filter.ofst | fstcompose - kyoto.ofst > tfk.ofst
(figure: the composed machine - a lattice of every alignment of tokyo with kyoto, in which insertions and deletions cost 1 and substitutions cost 2)
Edit Distance - Best Path
fstshortestpath tfk.ofst > best.ofst
(figure: the best path - delete t and o, match k, y and o, then insert t and o, for a total cost of 4)
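The WFST pipeline computes the classical edit distance with substitution cost 2 (a substitution in the filter is a deletion plus an insertion). The equivalent dynamic program is a handy cross-check:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Edit distance with insertion/deletion cost 1 and substitution cost 2,
// matching the arc weights of the edit-distance filter WFST.
int edit_distance(const std::string& a, const std::string& b) {
  std::vector<std::vector<int>> d(a.size() + 1,
                                  std::vector<int>(b.size() + 1));
  for (size_t i = 0; i <= a.size(); ++i) d[i][0] = (int)i;  // deletions
  for (size_t j = 0; j <= b.size(); ++j) d[0][j] = (int)j;  // insertions
  for (size_t i = 1; i <= a.size(); ++i)
    for (size_t j = 1; j <= b.size(); ++j)
      d[i][j] = std::min({d[i - 1][j] + 1,      // delete a[i-1]
                          d[i][j - 1] + 1,      // insert b[j-1]
                          d[i - 1][j - 1] +
                              (a[i - 1] == b[j - 1] ? 0 : 2)});  // sub
  return d[a.size()][b.size()];
}
```

edit_distance("tokyo", "kyoto") returns 4, agreeing with the cost of the best path found by fstshortestpath.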
Counting Example
In this example:
Use C++ and OpenFst to build a grmcount [2] clone
Add a custom semiring to OpenFst
Create a user-defined arc type
Build a shared object that is loaded by the OpenFst tools
Count n-grams in a text corpus
Makes use of the following operations:
Composition between the corpus and the counting machine
Projection, epsilon removal, determinization and minimization to generate a compact set of counts
Composition and ShortestDistance to extract the count for a specific n-gram
Counting Example
From [2], calculate the expected count using:

c(x) = Σ_{u ∈ Σ∗} |u|_x ⟦A⟧(u)

where |u|_x is the number of occurrences of x in u and ⟦A⟧(u) is the probability of u in A
The WFST below implements the counting, where X:X denotes an arbitrary transducer [12]
(figure: the counting WFST - a:ε/1 and b:ε/1 self-loops on state 0, an X:X/1 arc to final state 1/1, and the same self-loops on state 1)
The loops at state 0 consume unwanted input, mapping it to epsilons
The machine X will match the desired input pattern(s)
The end loops then map the unwanted trailing input to epsilons
Counting Example
Projecting the outputs and removing epsilons gives a path for each match
Optionally determinize and minimize for a compact representation
Can be applied to lattices whose paths cannot be enumerated [2]
Extended to class models and used for training WFSTs by Saraçlar and Roark [19]
To compute c(x) for a single n-gram, compose with a linear chain automaton representing the input x and use the shortest distance algorithm to obtain the sum over all paths
Or compose with a linear chain transducer with epsilon outputs and remove epsilons to give a single state and weight
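For a plain text corpus (a single unweighted path) the expected count reduces to ordinary occurrence counting, which makes the WFST construction easy to sanity-check. A sketch (names are ours):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// What composing a plain text corpus with the counting WFST computes:
// the number of occurrences of each n-gram. (On weighted lattices the
// same construction yields expected counts instead.)
std::map<std::vector<std::string>, int> count_ngrams(const std::string& text,
                                                     size_t order) {
  std::vector<std::string> words;
  std::istringstream in(text);
  for (std::string w; in >> w;) words.push_back(w);
  std::map<std::vector<std::string>, int> counts;
  for (size_t i = 0; i + order <= words.size(); ++i)
    ++counts[std::vector<std::string>(words.begin() + i,
                                      words.begin() + i + order)];
  return counts;
}
```

On the modified sentence used below, the bigram "between the" occurs twice and the unigram "the" three times, which is what the compact count machine should report.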
First Add the Real Semiring
In practice one could just use the log semiring:
For numerical stability
Already registered with all the tools (printing, drawing, ...)
But the real semiring serves as an example to illustrate OpenFst's extensibility
The algorithms and tools should therefore work with the custom semiring
The real semiring does have practical applications; for example Shu and Hetherington [20] used the real semiring for EM training of FSTs
Implement Real Weight
Real (probability) semiring: (+, ×, 0, 1)
The OpenFst weight requirements are on the site, here [21] and here [3]
The FloatWeight type can be used to do most of the boilerplate work
It is essential to implement the following static methods: Plus, Times, Zero, One and Type
Implemented in real-weight.h
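A standalone sketch of what such a weight class looks like (simplified and hypothetical: the real real-weight.h would build on OpenFst's FloatWeight machinery and meet the full list of requirements on the OpenFst site, e.g. Member, Quantize and Reverse):

```cpp
#include <string>

// Minimal probability-semiring weight with the static methods the slide
// names: Plus, Times, Zero, One and Type.
class RealWeight {
 public:
  explicit RealWeight(double v = 0.0) : value_(v) {}
  double Value() const { return value_; }
  static RealWeight Zero() { return RealWeight(0.0); }  // additive identity
  static RealWeight One() { return RealWeight(1.0); }   // multiplicative identity
  static const std::string& Type() {
    static const std::string type = "real";
    return type;
  }

 private:
  double value_;
};

// Semiring plus and times: ordinary + and x on probabilities.
inline RealWeight Plus(const RealWeight& a, const RealWeight& b) {
  return RealWeight(a.Value() + b.Value());
}
inline RealWeight Times(const RealWeight& a, const RealWeight& b) {
  return RealWeight(a.Value() * b.Value());
}
```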
Register Custom Arc Type
Now add support for the real semiring in the shell tools
#include <fst/arc.h>
#include <fst/vector-fst.h>
#include <fst/const-fst.h>
#include <fst/mains.h>
#include "real-weight.h"

using namespace fst;
extern "C" {
void real_arc_init() {
  typedef ArcTpl<RealWeight> RealArc;
  REGISTER_FST(VectorFst, RealArc);
  REGISTER_FST(ConstFst, RealArc);
  REGISTER_FST_MAINS(RealArc);
}
}
g++ -O2 -fPIC real-arc.cpp -o real-arc.o -c -I/path/to/openfst-1.1/src/bin/
g++ -shared -o real-arc.so real-arc.o
Final step: add the shared object path to LD_LIBRARY_PATH
The Counting Utility
acount
Input: a text file
Output: the compact count machine, all the intermediate steps, and the unigram counts
Usage: acount input.txt output.dir order.int
Counting Training Text
Sample text from the APSIPA site:
at the centre of sapporo the city hall and the sapporo station are located at the end of an
alluvial fan at 20m above sea level while the salmon museum in makomanai is at an altitude of 80m
the distance between the two places is about 8 kilometers between the the total superficies
of the alluvial fan is approximately 20 square kilometers the central part of sapporo was
formed 6000 years before by the deposit of earth carried by the toyohira river from jozankei and
was frequently flooded in the 19th century when the river banks were not built yet there is
abundant ground water away from the riverbed due to the river underflows since there is no need
to dig deep wells to draw an high quality water life is easy on this fertile land which was used
for agriculture including the culture of fruit trees
Illustrate the process by counting bigrams and unigrams on a single (modified) sentence:
the distance between the two places is about 8 kilometers between the
./acount sample.sentance sentance.output 2
Real Semiring Arc Registration
Run fstinfo corpus.ofst, and if something went wrong:
ERROR: FstMainRegister::GetMain: real-arc.so: cannot open shared object file: No such file or directory
ERROR: fstprint: Bad or unknown arc type "real" for this operation (PrintMain)
Otherwise:
fst type vector
arc type real
input symbol table none
output symbol table none
# of states 13
# of arcs 12
initial state 0
# of final states 1
# of input/output epsilons 0
# of input epsilons 0
# of output epsilons 0
# of accessible states 13
# of coaccessible states 13
# of connected states 13
# of strongly conn components 13
expanded y
mutable y
acceptor y
input deterministic y
output deterministic y
input/output epsilons n
input epsilons n
output epsilons n
input label sorted y
output label sorted y
weighted n
cyclic n
cyclic at initial state n
top sorted y
accessible y
coaccessible y
string y
Input and counting WFSTs
fstdraw -portrait=true -isymbols=words.syms -osymbols=words.syms corpus.ofst | dot -Tpdf > corpus.pdf
(figure: the linear corpus acceptor for the sample sentence, and the bigram counting transducer - word:ε self-loops before and after a two-arc word:word section)
Composed Machine
[Figure: the composed machine — at each sentence position the counting transducer either deletes the word (w:-) or copies it (w:w).]
Projected Machine
[Figure: the output projection of the composed machine; transducer labels are replaced by their output words or epsilon.]
Epsilon Removed Machine
[Figure: the machine after epsilon removal; only the emitted word sequences remain.]
Optimized Machines
Determinized
[Figure: the determinized counting machine; the accumulated counts appear as arc weights (e.g. the/4, is/2) and final weights of 0.5 in the real semiring.]
Minimized
[Figure: the minimized counting machine; common suffixes are now shared.]
Compare to grmcount
the distance between the two places is about 8 kilometers between the
acount
[Figure: the counting WFST produced by acount; counts appear as real-semiring weights on the word arcs.]
grmcount
[Figure: the equivalent counting WFST produced by grmcount; the same counts appear as arc weights (e.g. the/2, distance/1).]
Extracting Counts
Method 1 - ShortestDistance
0 1 between between
1 2 the the
2

fstcompile --arc_type=real --isymbols=words.syms --osymbols=words.syms betweenthe.wfst |
fstcompose min.ofst - | fstshortestdistance --reverse=true

0 2
1 0.5
2 1
Method 2 - Projection and Epsilon Removal
0 1 between -
1 2 the -
2

fstcompile --arc_type=real --isymbols=words.syms --osymbols=words.syms the.wfst |
fstcompose min.ofst - | fstproject --project_output=true | fstrmepsilon | fstprint

0 2
Don’t need to perform the previous optimizations to obtain counts with this method
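The shortest-distance view of counting can be sketched in plain Python (a toy re-implementation of the idea, not the toolkit code): the forward distance in the real semiring (⊕ = +, ⊗ = ×) sums the weights of all accepting paths, so a counting transducer in which each occurrence of the pattern contributes weight 1 yields the occurrence count.

```python
def count_occurrences(sentence, pattern):
    """Forward 'shortest distance' in the real semiring (+, *).

    alpha[q] holds the total weight of all paths reaching state q
    of the counting machine: state 0 loops over any word (w:-),
    states 1..n match the pattern (w:w), state n loops over any word.
    """
    n = len(pattern)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0  # weight of the empty path at the start state
    for w in sentence:
        new = [0.0] * (n + 1)
        new[0] = alpha[0]       # w:- self-loop before the match
        new[n] += alpha[n]      # w:- self-loop after the match
        for q in range(n):
            if pattern[q] == w:  # w:w arc advancing the match
                new[q + 1] += alpha[q]
        alpha = new
    return alpha[n]  # total weight of all accepting paths

words = ("the distance between the two places is about 8 "
         "kilometers between the").split()
print(count_occurrences(words, ["between", "the"]))  # 2.0
print(count_occurrences(words, ["the"]))             # 3.0
```

The bigram "between the" occurs twice in the modified sentence, matching the final weight 2 produced by the toolkit pipeline above.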
Compare to a counting utility
Finally, compare acount to the counting tool in the MITLM [9] toolkit; both produce the same counts:

and 2, alluvial 2, fan 2, in 2, kilometers 2, by 2, there 2, from 2, water 2, an 3, sapporo 3, was 3, river 3, to 3, at 4, is 6, of 7, the 17
alluvial 2, and 2, by 2, fan 2, from 2, in 2, kilometers 2, there 2, water 2, an 3, river 3, sapporo 3, to 3, was 3, at 4, is 6, of 7, the 17
Section Summary
Basic edit distance computation using WFSTs
More complicated counting example with a custom semiring
End of Section
Questions?
Speech Recognition
Speech Recognition with WFSTs
Overview of Section
Detailed walkthrough on the construction of a real-world large vocabulary integrated WFST speech recognition system
Much more practical than other tutorials
Based on the procedure from Mohri [15]
Rules of thumb and things to avoid
Illustrate what happens when these guidelines are and are not used
The final recipe is very small
It should provide a process that builds a good integrated recognition WFST
Overview of Task
Use the Corpus of Spontaneous Japanese (CSJ) to illustrate construction methods and toolkits
Standard large vocabulary Japanese task
233 hours of acoustic training data
116 minutes of acoustic testing data
Acoustic models have 3000 states and 128 Gaussians per state
Vocabulary of 65k words, where a word is a base form – POS tag – reading
Evaluate word accuracy after stripping POS tags and readings
Use the T³ decoder with GPU acoustic scoring [6–8]
Decode over a set of beam widths and see how the construction influences the RTF/WER relationship
WFSTs In Speech Recognition
Represent each knowledge source as a WFST, typically
Language Model – G
Lexicon – L
Context Dependency – C
Acoustic Models – H
Combine using composition – H ◦ C ◦ L ◦ G
Optimize final and intermediate WFSTs
Use in decoder
The Process in this Walkthrough
Use the process described in the paper [15]
The cascade will be π det(C ◦ det(L ◦ G))
Mostly use the ATT tools for the walkthrough
Will also include a comparison to OpenFst
Should be straightforward to implement using a scripting language
Optimizations Don’t Always Improve Performance
WFSTs allow for highly efficient networks to be constructed
Use as many optimizations as possible
The decoder should go really fast?
The search network should get really small?
The construction process can be very fragile
Certain transducers need to be constructed in specific ways
Applying too many optimizations
Often doesn’t improve performance or network size
Can make performance worse
Can make it impossible to construct the final network
Language Model
The language model is converted to a WFST
A finite state grammar is trivial to convert
It is also possible to efficiently represent an n–gram model as a WFST [1]
Use epsilon transitions to back-off states to reduce the network size [1, 15]
Trigram Example
P() is the n–gram probability, B() is the back-off weight
The state labels encode the history
Example from [5]
[Figure: trigram WFST fragment over the alphabet {a, b}; state labels encode the histories (-, a, b, ab, ba), word arcs carry n–gram probabilities P(·|·) and epsilon arcs to shorter-history states carry back-off weights B(·).]
Full Bigram Example
Random strings generated (by an FSA) with an alphabet of two words
b a a a a a a b a a ab bb ba b a b baaa a b a ab ba a
Full Bigram Example
ARPA format LM trained with MITLM using estimate-ngram -o 2 -t ab.train -wl ab.lm
\data\
ngram 1=4
ngram 2=8
\1-grams:
-0.477121 </s>
-99 <s> -0.176091
-0.477121 b -0.124939
-0.477121 a -0.324511
\2-grams:
-0.477121 <s> b
-0.352183 <s> a
-0.477121 b </s>
-0.477121 b b
-0.477121 b a
-0.579784 a </s>
-0.676694 a b
-0.278754 a a
\end\
Full Bigram Example
Converted to a WFST
[Figure: the bigram WFST; states correspond to the histories <s>, a, b and the back-off state, word arcs carry negated natural-log probabilities (e.g. b:1/1.0986 from the log10 value -0.477121), and -:0 epsilon arcs carry the back-off weights.]
Other Points for ARPA n–gram Models
Use the -99 value to identify the start state
Use concatenation if you want a sentence begin tag
For end states:
Can’t assume that no back-off implies a final state – when using a restricted vocabulary list or SRILM, some back-off values might be missing
Instead, detect sentence end tags </s> and set the corresponding state as final
Also remember the higher n–grams a b </s> will have a final state labelled b,</s>
Direct them to a unique final state
Remember to convert the log base
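The log-base conversion in the last point can be sketched as follows (a minimal illustration; ARPA files store log10 probabilities, while WFST arcs conventionally carry negated natural-log weights):

```python
import math

def arpa_to_weight(log10_prob):
    """Convert an ARPA log10 probability to a negated
    natural-log arc weight: -ln(p) = -log10(p) * ln(10)."""
    return -log10_prob * math.log(10.0)

# -0.477121 is log10(1/3); the resulting arc weight is
# -ln(1/3) = 1.0986, matching the b:1/1.0986 arc in the example
print(round(arpa_to_weight(-0.477121), 4))  # 1.0986
```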
Walkthrough Language Model
Trigram built from CSJ using ATT GRM Tools
Training set of approx 400,000 sentences
Katz smoothing and shrinking factor of 1
WFST States Arcs
G 299634 1422541
Lexicon
Built from the pronunciation dictionary
Maps sequences of phonemes to words
Create a chain transducer for each word in the lexicon
Take the union
Perform closure
So that we can loop through the machine
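The chain–union–closure steps above can be sketched as an arc list built in one pass (an illustration only; a real build would emit AT&T text format for fsmcompile, and hypothetical label "<eps>" stands for epsilon):

```python
def lexicon_fst(entries):
    """Build L in a single pass: for each (word, phones) emit a
    chain whose first arc outputs the word and whose remaining
    arcs output epsilon; all chains share start state 0, and an
    epsilon arc back to state 0 closes the machine so word
    sequences can be accepted."""
    arcs, next_state = [], 1
    for word, phones in entries:
        src = 0
        for i, p in enumerate(phones):
            out = word if i == 0 else "<eps>"  # output label early
            arcs.append((src, next_state, p, out))
            src, next_state = next_state, next_state + 1
        arcs.append((src, 0, "<eps>", "<eps>"))  # closure loop
    return arcs

arcs = lexicon_fst([("共存", ["ky", "o:", "s", "o", "N"]),
                    ("あの", ["a", "n", "o"])])
print(arcs[0])  # (0, 1, 'ky', '共存')
```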
Lexicon Example
Illustrate the procedure on a random subset of the CSJ dictionary
共存 ky o: s o N
あの a n o
額絵 g a k u e
サードネス s a: d o n e s u
均一 k i N i ts u
履歴 r e r i k i
見回し m i m a: sh i
郎 r o:
危険 k i k e N
ゆうこ y u u k o
Chain Transducers
For each of the entries construct a linear chain transducer
[Figure: the linear chain transducer for 共存 — the first arc ky:共存 emits the word, the remaining arcs (o:, s, o, N) output epsilon.]
Lexicon Example
Take the union of all words to build lexicon WFST
[Figure: the union of the ten chain transducers; every chain leaves the common start state 0 with an arc emitting its word.]
Lexicon Example
Perform Closure
[Figure: the lexicon after closure; epsilon arcs from each word-final state back to the start state allow sequences of words to be accepted.]
Lexicon - Points
Easier just to perform the previous steps as a single script:
Simple to do in a single pass
Probably not a good idea to create 100k small files
Easier to add the auxiliary symbols
Place output labels early in the path:
The epsilons will delay matching and generate many dead-end states
Homophones prevent determinization – add auxiliary symbols
Avoid optimizing the lexicon, because this will push the output labels further back
Sorting (or indexing) arcs is crucial in large systems
Good and bad label placement
Create two lexicons:
A lexicon with output labels at the start of the chains, Ls
A lexicon with output labels at the end of the chains, Le
Both Ls and Le have the same number of states and arcs
Compose Ls or Le with G
The labels are indexed on the input of G and the output of L
Ls ◦ G took 4.5 seconds
Le ◦ G took...
Good and bad label placement
This occurs regardless of the composition filter, the back-off epsilon treatment, or the toolkit
Repeat again on a 3000 word lexicon
Compose again without connection: fsmcompose -d ls.fsm g.fsm
                      Ls ◦ G    trim Ls ◦ G    Le ◦ G      trim Le ◦ G
States                46258     30710          66520834    66848
Arcs                  58507     42932          68004350    819747
Accessible states     46258     30710          67992311    66848
Co-accessible states  30710     30710          66848       66848
Sort and Index Labels
Sorting or Indexing of arcs is also important
Informal comparison of composition with indexed and un-indexed machines
Un-indexed composition took 4.5 minutes
Indexed composition took approx. 5 seconds
OpenFst will/should complain if arcs aren’t sorted
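The benefit of sorted arcs comes from matching by binary search during composition; a sketch with Python's bisect (purely illustrative, not the toolkit's implementation):

```python
import bisect

def matching_arcs(sorted_arcs, label):
    """sorted_arcs: list of (ilabel, dest) pairs sorted by ilabel.
    Binary search replaces the linear scan that an un-indexed
    composition would need at every state pair."""
    lo = bisect.bisect_left(sorted_arcs, (label,))
    hi = bisect.bisect_right(sorted_arcs, (label, float("inf")))
    return sorted_arcs[lo:hi]

arcs = sorted([(3, 7), (1, 2), (3, 9), (5, 4)])
print(matching_arcs(arcs, 3))  # [(3, 7), (3, 9)]
```

With N arcs per state, matching drops from O(N) to O(log N) per lookup, which is the difference behind the 4.5-minute versus 5-second comparison above.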
Lexicon Determinization and Auxiliary Symbols
The lexicon may not be determinizable because of homophones
Verified on the walkthrough lexicon anyway
Determinization just runs out of memory
The solution is to add auxiliary symbols to disambiguate the homophones
Auxiliary Symbols
Consider the toy lexicon with an additional homophone:
共存 ky o: s o N
あの a n o
額絵 g a k u e
サードネス s a: d o n e s u
均一 k i N i ts u
履歴 r e r i k i
見回し m i m a: sh i
郎 r o:
危険 k i k e N
ゆうこ y u u k o
棄権 k i k e N
Add an auxiliary phone #n to each pronunciation and alter n to disambiguate the homophones
It is now possible to determinize the lexicon
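The auxiliary-symbol insertion can be sketched as follows (an illustration of the idea only; numbering conventions differ between toolkits):

```python
from collections import defaultdict

def add_aux_symbols(lexicon):
    """Append #k to each pronunciation, where k is the entry's
    index among words sharing the same phone string; homophones
    become distinct, so the lexicon is determinizable."""
    seen = defaultdict(int)
    out = []
    for word, phones in lexicon:
        key = tuple(phones)
        out.append((word, phones + ["#%d" % seen[key]]))
        seen[key] += 1
    return out

lex = add_aux_symbols([("危険", ["k", "i", "k", "e", "N"]),
                       ("棄権", ["k", "i", "k", "e", "N"])])
print(lex[0][1][-1], lex[1][1][-1])  # #0 #1
```

As in the toy lexicon, 危険 ends in #0 while its homophone 棄権 ends in #1.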
Lexicon Augmented With Auxiliary Symbols
[Figure: the lexicon augmented with auxiliary symbols; every pronunciation is terminated by a #0 arc, except the added homophone 棄権 (k i k e N), which is terminated by #1 to disambiguate it from 危険.]
Side note: ATT automatic symbol insertion
The ATT tools include a special determinization algorithm to automatically add disambiguation symbols
fsmdeterminize -l 1000 in.fsm > out.fsm
Auxiliary symbol labels begin from 1000
Print out the WFST to get the symbols for later removal
Even using this algorithm – avoid determinizing the lexicon before composition
Lexicon Example
Determinize to generate a prefix tree
[Figure: the determinized lexicon forms a prefix tree; shared phone prefixes such as k and r are merged, which pushes some output labels further from the root.]
Lexicon Example
Minimize to share common suffixes
[Figure: the minimized lexicon; common suffixes are now shared as well as common prefixes.]
Compose L and G
Optimizing L makes it smaller
But determinization pushes the labels back
We have seen this can severely damage the efficiency of composition
Verified the composition of L or Lopt with G
The composition of any Lopt permutation fails
Confirms it is best not to optimize L at this stage
Determinize L and G
The labels placements are optimized
The auxiliary symbols are in place
However, using the epsilon-matching filter together with the epsilons in the language model will cause problems
One solution is to replace the epsilon transitions in G:
Replace epsilon back-offs with an alternative symbol (usually φ)
Add a φ self-loop to the lexicon
An easier solution is to use a sequence filter:
ATT fsmcompose with the -s option
OpenFst defaults to the sequence filter
Walkthrough LG
WFST        States     Arcs
G           299634     1422541
L           401704     466703
LG          1182138    2383541
det(LG)     1410016    2613069
Context-Dependency
Maps from context-dependent sequences to context-independent sequences
Often the acoustic models contain only a subset of the tri–phones
First construct a network that can handle all possible tri–phones
Then replace with the actual tri–phones
Use a delayed (and inverted) construction to ensure determinizability [15]
Add arcs according to the following rules:
Name            Type    Source     Destination   Input   Output
Tri–phone       l-c+r   l,c        c,r           r       l-c+r
Right Bi–phone  c+r     –,c        c,r           r       c+r
Left Bi–phone   l-c     l,c        c,– (final)   –       l-c
Mono–phone      c       – (start)  –,c           c       –
                        –,c        c,–           –       c
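The tri–phone row of the table can be sketched as an arc generator (illustrative only; a full construction also needs the bi-phone and mono-phone rows and the auxiliary-symbol self-loops):

```python
def triphone_arcs(phones):
    """Inverted, delayed construction: state (l, c) remembers the
    last two phones; reading the next context-independent phone r
    emits the context-dependent phone l-c+r and moves to (c, r)."""
    arcs = []
    for l in phones:
        for c in phones:
            for r in phones:
                arcs.append(((l, c), (c, r), r, "%s-%s+%s" % (l, c, r)))
    return arcs

arcs = triphone_arcs(["a", "b"])
# reading r='a' in state ('a','b') emits a-b+a, as in the
# sanity-check composition for the string ababb
print((("a", "b"), ("b", "a"), "a", "a-b+a") in arcs)  # True
```

Because the emitted label depends only on the source state and the input phone, the machine is deterministic on its input side.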
Deterministic Context Dependency Construction
[Figure: the deterministic (inverted) context-dependency transducer for the phones {a, b}; states are labelled with the (left, centre) phone pair, and arcs map the next context-independent phone to a context-dependent phone such as a-b+a.]
Sanity Check Construction
Construct a chain automaton from the string ababb
Compose with CD transducer
The resulting WFST maps from CI to CD phones
0 1a:- 2b:a+b 3a:a-b+a 4b:b-a+b 5b:a-b+b 6-:b-b
Additionally check a monophone a
0 1a:- 2-:a
Deterministic Context Dependency Construction withAuxiliary Symbols
Need to deal with the additional auxiliary phones
At each state in the CD network add self-loops that pass the auxiliary symbols through to the next level
Minimal example on the next slide
After performing the final set of optimizations, replace the auxiliary symbols with epsilons
Deterministic Context Dependency Construction withAuxiliary Symbols
[Figure: the same context-dependency transducer with #1:#1 and #2:#2 self-loops added at every state, so the auxiliary symbols pass through to the next level.]
Walkthrough CLG
WFST             States     Arcs
G                299634     1422541
L                401704     466703
LG               1182138    2383541
det(LG)          1410016    2613069
C det(LG)        1466022    2738532
det(C det(LG))   1466022    2738532
Final Steps
Final processing step before recognition
Remove auxiliary symbols
Substitute logical Tri–phones with physical Tri–phones
Walkthrough πCLG
WFST               States     Arcs       Input Eps   Output Eps
G                  299634     1422541    199631      199631
L                  401704     466703     65001       401703
LG                 1182138    2383541    404840      1182135
det(LG)            1410016    2613069    434721      1418229
C det(LG)          1466022    2738532    444095      1511787
det(C det(LG))     1466022    2738532    444095      1511787
π det(C det(LG))   1466022    2738532    692946      1511787
CLG Illustrations
Toy Language model G
[Figure: toy language model G accepting one of 危険 or 棄権 followed by one of 見回し or 履歴.]
CLG Illustrations
L ◦ G
[Figure: L ◦ G — each word of the toy grammar is expanded into its phone sequence, terminated by its auxiliary symbol (#10000 for 危険, #10002 for 棄権).]
CLG Illustrations
det(L ◦ G )
[Figure: det(L ◦ G) — the shared phone prefix k i k e N of 危険 and 棄権 is merged, and the word outputs are delayed until the auxiliary symbols #10000/#10002 resolve the homophones.]
CLG Illustrations
C ◦ det(L ◦ G )
[Figure: C ◦ det(L ◦ G) — the context-independent phones are replaced by tri-phones such as k-i+k, and the auxiliary symbols are relabelled (#77659, #77661).]
Recognition Performance
[Figure: word accuracy (roughly 68–82%) versus real-time factor (0–0.9) for the det(C det(LG)) network.]
Comparison of a few Construction Techniques
WFST                        States     Arcs
π det(C det(LG))            1466022    2738532
Tropical π det(C det(LG))   1466022    2738532
π min(det(C det(LG)))       1397962    2643926
CLG                         2036680    30169707
dmake                       1205482    2614254
Performance with no Optimizations
[Figure: accuracy versus RTF for det(C det(LG)) and the unoptimized CLG; the unoptimized network needs a far larger RTF (up to around 10) to approach the same accuracy.]
Construction Semiring
Mohri [15] doesn’t specify the construction semiring
Allauzen [1] used the log semiring for composition and determinization
The example presented here used the log semiring throughout for composition and determinization
The tropical semiring can be slightly faster, but accuracy drops for narrow beams
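The two semirings differ only in the ⊕ operation over negated-log weights; a minimal sketch:

```python
import math

def plus_tropical(a, b):
    """Tropical plus: keep only the best (minimum) weight."""
    return min(a, b)

def plus_log(a, b):
    """Log plus: combine both paths' probabilities exactly,
    -ln(exp(-a) + exp(-b))."""
    return -math.log(math.exp(-a) + math.exp(-b))

# merging two equally likely paths of weight -ln(0.5):
w = -math.log(0.5)
print(plus_log(w, w))       # approximately 0 (probabilities sum to 1)
print(plus_tropical(w, w))  # approximately 0.693 (still probability 0.5)
```

During determinization this changes how merged paths are weighted, which is why the choice of construction semiring affects accuracy at narrow beams even though the network topology is unchanged.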
Construction in the Log or Tropical Semirings
[Figure: accuracy versus RTF for networks built in the log and tropical semirings; the log-semiring construction is more accurate at narrow beams.]
Comparison with OpenFst
Using the tropical semiring
The network size is identical on either toolkit
[Figure: accuracy versus RTF for the tropical-semiring det(C det(LG)) network built with OpenFst and with the ATT tools; the curves are nearly identical.]
Construction Summary
Presented a construction walkthrough
Simple to implement
Obtains good performance on a real LVCSR setup
Many other permutations and combinations are possible
Extensions
On-the-fly
The component WFSTs are composed dynamically as needed by the decoder during search
More sophisticated techniques exist to make this faster [16, 17]
Compose in the acoustic models
In this case add in the H machines
Apply factorization to further reduce WFST size
Take the composition a step further: compose with the acoustic observations O to get O ◦ H ◦ C ◦ L ◦ G and use shortest-path
This would not be practical for large systems
However, the combination of composition and shortest-path can be used to build many useful applications for speech and language processing
POS Tagger Using Component Networks and On-the-fly Composition
Just like in the speech recognition walkthrough:
Train an n–gram model
Build a lexicon that converts words to tag–word pairs
Compose and determinize
Construct an input transducer from the input
Compose and run shortest path
Possible to do the composition of the component networks on-the-fly
The runtime system will use three WFSTs:
The input sequence
Optimized lexicon – closed and determinized
Language model
Extend to on-the-fly composition and only expand the parts of the composed machine needed by the shortest-path algorithm
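On-the-fly composition can be sketched as on-demand state expansion (a toy, epsilon-free sketch with a hypothetical helper name; real implementations add composition filters and caching [16, 17]):

```python
from collections import deque

def lazy_compose_states(start, arcs1, arcs2):
    """Expand composed state pairs only as the search reaches
    them, instead of materialising the full product machine.
    arcs1/arcs2: dict state -> list of (ilabel, olabel, dest)."""
    expanded, queue = set(), deque([start])
    while queue:
        s1, s2 = queue.popleft()
        if (s1, s2) in expanded:
            continue
        expanded.add((s1, s2))
        for _, o1, d1 in arcs1.get(s1, ()):
            for i2, _, d2 in arcs2.get(s2, ()):
                if o1 == i2:  # output of M1 matches input of M2
                    queue.append((d1, d2))
    return expanded

arcs1 = {0: [("x", "a", 1)], 1: [("y", "b", 0)]}
arcs2 = {0: [("a", "A", 1)], 1: [("b", "B", 0)]}
print(sorted(lazy_compose_states((0, 0), arcs1, arcs2)))
# only the reachable half of the 2x2 product: [(0, 0), (1, 1)]
```

Driven by shortest path instead of breadth-first search, only the state pairs the best hypotheses touch are ever built.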
Construct Chain Automata
This is our input sequence
0 1<s> 2日 3本 4語 5が 6難 7し 8い 9</s>
Compose with Lexicon
Compose with looped lexicon
This gives a lattice of possible constructions of the input sequence
[Figure: the lattice of possible segmentations and taggings of 日本語が難しい after composition with the looped lexicon; each arc outputs a word+POS+reading candidate such as 日本+名詞+にっぽん.]
Compose with Language Model
Score all the paths in the lattice by composing with the language model
It is easy to see that many different types of language model can be used
Multiple paths arise due to the epsilon back-offs
[Figure: the lattice after composition with the language model; every path now carries a score, with multiple epsilon paths arising from the back-off transitions.]
Further Processing
The optimizations have removed the synchronization between inputs and outputs
Just want the output sequences
Project output and remove epsilons
Determinize and use shortestpath to obtain n–best lists
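The final shortest-path step can be sketched as dynamic programming over the acyclic lattice in the tropical semiring (illustrative only — the toolkits provide fstshortestpath; the weights below loosely follow the lattice figures):

```python
def shortest_path(arcs, start, final, num_states):
    """Single-source shortest path over an acyclic lattice.
    arcs: list of (src, dst, label, weight); states are assumed
    topologically ordered 0..num_states-1, which holds for a
    lattice built left-to-right from the input."""
    INF = float("inf")
    best = [INF] * num_states
    back = [None] * num_states
    best[start] = 0.0
    by_src = {}
    for a in arcs:
        by_src.setdefault(a[0], []).append(a)
    for s in range(num_states):  # relax in topological order
        for src, dst, label, w in by_src.get(s, ()):
            if best[s] + w < best[dst]:
                best[dst] = best[s] + w
                back[dst] = (src, label)
    path, s = [], final
    while back[s] is not None:   # trace back the best path
        s, label = back[s]
        path.append(label)
    return best[final], path[::-1]

arcs = [(0, 1, "し+動詞+し", 4.75), (0, 1, "し+助詞+し", 7.56),
        (1, 2, "</s>", 1.52)]
cost, path = shortest_path(arcs, 0, 2, 3)
print(path)  # the verb reading of し wins over the particle reading
```

Running this n times with the best path removed (or using the toolkit's n-shortest-paths) yields the n-best list.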
Project Output
To convert to an automaton representing word sequences
[Figure: the output projection of the scored lattice; only the tagged-word labels and weights remain, interleaved with many epsilon arcs.]
Remove Epsilons
To give a compact automaton
[Figure: the compact automaton after epsilon removal; every arc is a tagged-word candidate with its combined weight.]
Determinize
To share common prefixes and ensure n–best lists are unique
[Figure: the determinized automaton; common prefixes are shared, so each tagged-word sequence appears on exactly one path and the n-best lists are unique.]
NBest Lists
To get n–best paths run shortestpath
There are n paths in the final machine
[Figure: the result of shortestpath — a machine containing exactly the n best tagged-word sequences as its n paths.]
Advantages
We have demonstrated how useful applications can be rapidly developed using WFSTs
Much easier to extend – for example, changing the language model in ChaSen would be difficult, but it is trivial in the previous example
One framework and toolkit for nearly everything – not a mix of domain-specific tools
Summary of Section
Detailed description of the construction of an integrated speech recognition WFST
Conversion of the knowledge sources to WFSTs
Rules of thumb
Optimization
Evaluation on a large vocabulary task
Applying a similar WFST framework to POS tagging
Thank You For Attending
Questions?
References
References I
[1] C. Allauzen, M. Mohri, M. Riley, and B. Roark. A generalized construction of integrated speech recognition transducers. In Proc. ICASSP, 2004.
[2] C. Allauzen, M. Mohri, and B. Roark. Generalized algorithms for constructing statistical language models. In Proc. of the 41st Annual Meeting of the Association for Computational Linguistics, pages 40–47, 2003.
[3] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri. OpenFst: A general and efficient weighted finite-state transducer library. In Proc. of the Ninth International Conference on Implementation and Application of Automata (CIAA 2007), pages 11–23, 2007.
References II
[4] C. Allauzen, M. Jansche, and M. Riley. OpenFst: An open-source, weighted finite-state transducer library and its applications to speech and language. Tutorial, 2009.
[5] Diamantino Caseiro. Finite-State Methods in Automatic Speech Recognition. PhD thesis, Universidade Tecnica de Lisboa, 2003.
[6] P. R. Dixon, D. A. Caseiro, T. Oonishi, and S. Furui. The Titech large vocabulary WFST speech recognition system. In Proc. ASRU, pages 1301–1304, 2007.
[7] P. R. Dixon, T. Oonishi, and S. Furui. Fast acoustic computations using graphics processors. In Proc. ICASSP, pages 4321–4324, 2009.
References III
[8] Paul R. Dixon, Tasuku Oonishi, and Sadaoki Furui. Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition. Comput. Speech Lang., 23(4):510–526, 2009.
[9] Bo-June Hsu and James Glass. Iterative language model estimation: efficient data structure & algorithms. In Proc. INTERSPEECH, pages 841–844, 2008.
[10] S. Kanthak, H. Ney, M. Riley, and M. Mohri. A comparison of two LVR search optimization techniques. In Proc. ICSLP, pages 1309–1312, 2002.
[11] M. Mohri, F. Pereira, and M. Riley. The design principles of a weighted finite-state transducer library. Theoretical Computer Science, 231:17–32, 2000.
[12] M. Mohri. Speech recognition lecture notes. http://www.cs.nyu.edu/~mohri/asr07/, 2007.
References IV
[13] M. Mohri. Weighted automata algorithms. Handbook of Weighted Automata, 2009.
[14] M. Mohri, F. Pereira, and M. Riley. Weighted finite-state transducers in speech recognition. Computer Speech and Language, 16(1):69–88, 2002.
[15] M. Mohri, F. C. N. Pereira, and M. Riley. Speech recognition with weighted finite-state transducers. Handbook on Speech Processing and Speech Communication, 2008.
[16] T. Oonishi, P. R. Dixon, K. Iwano, and S. Furui. Implementation and evaluation of fast on-the-fly WFST composition algorithms. In Proc. INTERSPEECH, pages 2110–2113, 2008.
[17] T. Oonishi, P. R. Dixon, K. Iwano, and S. Furui. Generalization of specialized on-the-fly composition. In Proc. ICASSP, pages 4317–4320, 2009.
References V
[18] M. Riley and C. Allauzen. OpenFst source code. http://openfst.org/, 2007.
[19] Murat Saraclar and Brian Roark. Utterance classification with discriminative language modeling. Speech Communication, 48(3-4):276–287, 2006. Special issue on Spoken Language Understanding in Conversational Systems.
[20] Han Shu and Lee Hetherington. Training of finite-state transducers and its application to pronunciation modeling. 2002.
[21] OpenFst. http://openfst.org/.