Finite-state Machines: Theory and Applications
Weighted Finite-state Automata
Thomas Hanneforth
Institut für Linguistik, Universität Potsdam
December 10, 2008
Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 1 / 132
Overview
1 Semirings
2 Weighted finite-state automata
3 Semiring properties
4 Closure properties and algebra of weighted finite-state automata
5 Shortest-distance algorithms
6 Equivalence transformations
Outline
1 Semirings
Semirings: Abstract weights
We add weights to finite-state machines to enhance their capabilities.
The notion of a weight is an abstract one: anything that fulfills the conditions of an abstract algebraic structure called a semiring can be used as a weight in an FSM.
So weights can be:
- Real numbers
- Probabilities
- Distances
- Strings
- Feature structures
- Sets
- Matrices
- ...
Semirings: Monoid
Definition (Monoid)
An algebraic structure 〈W, ⊙, 1〉 is a monoid if:
1. ⊙ is a binary operation on W: ⊙ : W × W → W
2. ⊙ is associative: ∀x, y, z ∈ W : (x ⊙ y) ⊙ z = x ⊙ (y ⊙ z)
3. 1 is the identity element for ⊙: ∀x ∈ W : x ⊙ 1 = 1 ⊙ x = x
Example (The following structures are monoids:)
〈R, +, 0〉 (addition of real numbers)
〈R, ·, 1〉 (multiplication of real numbers)
〈R ∪ {∞}, min, ∞〉 (minimum of real numbers)
〈2^M, ∪, ∅〉 and 〈2^M, ∩, M〉 for each set M
〈Σ∗, ·, ε〉 (concatenation monoid)
〈2^{Σ∗}, ·, {ε}〉 (formal language monoid)
〈F, ⊔, ⊥〉 (unification of feature structures, ⊥ = empty feature structure)
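The monoid laws can be spot-checked on a finite sample of elements. The following is a minimal sketch, not part of the original slides; the helper name `is_monoid` and the sample values are assumptions made for illustration, and a passing check on samples is of course no proof.

```python
from itertools import product

def is_monoid(samples, op, identity):
    """Check associativity and the identity law on a finite sample of elements."""
    assoc = all(op(op(x, y), z) == op(x, op(y, z))
                for x, y, z in product(samples, repeat=3))
    ident = all(op(x, identity) == x == op(identity, x) for x in samples)
    return assoc and ident

INF = float("inf")
# (R ∪ {∞}, min, ∞): minimum with ∞ as the identity element
min_ok = is_monoid([0, 1, 2.5, INF], min, INF)
# (Σ*, ·, ε): string concatenation with the empty string as the identity element
cat_ok = is_monoid(["", "a", "ab", "ba"], lambda x, y: x + y, "")
# Subtraction is not associative, so (R, -, 0) is not a monoid
sub_ok = is_monoid([0, 1, 2], lambda x, y: x - y, 0)
```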
Semirings: Commutative monoid
Definition (Commutative monoid)
A monoid 〈W, ⊙, 1〉 is called commutative if ∀x, y ∈ W : x ⊙ y = y ⊙ x
Example
The following monoids are commutative:
〈R, +, 0〉
〈R, ·, 1〉
〈R ∪ {∞}, min, ∞〉
〈F, ⊔, ⊥〉
〈2^M, ∪, ∅〉 and 〈2^M, ∩, M〉 for each set M
The following monoids are not commutative:
〈Σ∗, ·, ε〉
〈2^{Σ∗}, ·, {ε}〉
Semirings: Definition
Definition (Semiring)
An algebraic structure K = 〈W, ⊕, ⊗, 0, 1〉 is a semiring if it fulfills the following conditions:
1. 〈W, ⊕, 0〉 is a commutative monoid with 0 as the identity element for ⊕.
2. 〈W, ⊗, 1〉 is a monoid with 1 as the identity element for ⊗.
3. ⊗ distributes over ⊕:
   ∀x, y, z ∈ W : x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z) (left distributivity)
   ∀x, y, z ∈ W : (y ⊕ z) ⊗ x = (y ⊗ x) ⊕ (z ⊗ x) (right distributivity)
4. 0 is an annihilator for ⊗: ∀w ∈ W : w ⊗ 0 = 0 ⊗ w = 0.
Notational convention
W is a nonempty set, called the carrier set of K.
In the following, we will identify a semiring K with its carrier set.
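The four conditions translate directly into a small data structure plus a sample-based check. This is a sketch under assumptions: the class `Semiring`, the function `check_axioms` and the constants below are illustrative names, not part of the slides, and the check only tests the given samples.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # abstract addition ⊕
    times: Callable[[Any, Any], Any]  # abstract multiplication ⊗
    zero: Any                         # identity of ⊕, annihilator of ⊗
    one: Any                          # identity of ⊗

def check_axioms(k, samples):
    """Return True if all semiring laws hold on the given sample elements."""
    p, t = k.plus, k.times
    for x, y, z in product(samples, repeat=3):
        laws = (
            p(x, y) == p(y, x),                      # ⊕ is commutative
            p(p(x, y), z) == p(x, p(y, z)),          # ⊕ is associative
            t(t(x, y), z) == t(x, t(y, z)),          # ⊗ is associative
            t(x, p(y, z)) == p(t(x, y), t(x, z)),    # left distributivity
            t(p(y, z), x) == p(t(y, x), t(z, x)),    # right distributivity
        )
        if not all(laws):
            return False
    return all(
        p(x, k.zero) == x                            # 0 is the identity of ⊕
        and t(x, k.one) == x == t(k.one, x)          # 1 is the identity of ⊗
        and t(x, k.zero) == k.zero == t(k.zero, x)   # 0 annihilates
        for x in samples
    )

INF = float("inf")
TROPICAL = Semiring(min, lambda x, y: x + y, INF, 0)
BOOLEAN = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)
# Counterexample: (+, max) does not distribute, e.g. max(1, 0 + 0) != max(1, 0) + max(1, 0)
NOT_A_SEMIRING = Semiring(lambda x, y: x + y, max, 0, 0)
```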
Semirings: Common semirings
Example (Common semirings)
〈W, ⊕, ⊗, 0, 1〉 | Name | Remark
〈{t, f}, ∨, ∧, f, t〉 | boolean semiring |
〈R, +, ·, 0, 1〉 | real semiring |
〈[0, 1], +, ·, 0, 1〉 | probabilistic semiring |
〈R+ ∪ {∞}, min, +, ∞, 0〉 | tropical semiring |
〈[0, 1], max, ·, 0, 1〉 | Viterbi semiring |
〈R ∪ {∞}, ⊕log, +, ∞, 0〉 | log semiring | x ⊕log y = −ln(e^{−x} + e^{−y})
〈R ∪ {−∞}, max, +, −∞, 0〉 | arctic semiring |
〈[0, 1], max, min, 0, 1〉 | fuzzy semiring |
〈2^{Σ∗}, ∪, ·, ∅, {ε}〉 | concatenation semiring | 2^{Σ∗} = formal languages
〈Σ∗ ∪ {s∞}, ∧, ·, s∞, ε〉 | string semiring | ∧ = longest common prefix
〈2^M, ∪, ∩, ∅, M〉 | set semiring | M is an arbitrary set
〈2^F, ∪, ⊔, ∅, {⊥}〉 | unification semiring | F = set of feature structures
Semirings: String semiring
The string semiring is defined as 〈Σ∗ ∪ {s∞}, ∧, ·, s∞, ε〉. That means:
- The longest common prefix operation ∧ plays the role of abstract addition:
  x ∧ y = z if x = zx′ and y = zy′ (z is of maximum length)
- We augment Σ∗ with a special string s∞, the infinite string, which is the identity element of ∧:
  ∀x ∈ Σ∗ ∪ {s∞} : x ∧ s∞ = s∞ ∧ x = x
Example
cat ∧ car = ca
cat ∧ dog = ε
dog ∧ dogs = dog
cat ∧ s∞ = cat
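The two operations of the string semiring can be sketched in a few lines. The sentinel `S_INF` (here simply `None`) and the function names are assumptions chosen for this illustration; the slides do not prescribe a representation for s∞.

```python
import os

S_INF = None  # hypothetical sentinel standing in for the infinite string s∞

def lcp(x, y):
    """Longest common prefix x ∧ y, with S_INF as the identity element."""
    if x is S_INF:
        return y
    if y is S_INF:
        return x
    return os.path.commonprefix([x, y])

def concat(x, y):
    """String concatenation x · y; S_INF acts as the annihilator."""
    if x is S_INF or y is S_INF:
        return S_INF
    return x + y
```

With these definitions, `lcp("cat", "car")` gives `"ca"` and `lcp("cat", S_INF)` gives `"cat"`, matching the examples above.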
Semirings: Isomorphic semirings
Definition (Isomorphic semirings)
A semiring isomorphism is a function f such that f and f⁻¹ are semiring homomorphisms.
Let K₁ = 〈W₁, ⊕₁, ⊗₁, 0₁, 1₁〉 and K₂ = 〈W₂, ⊕₂, ⊗₂, 0₂, 1₂〉 be semirings.
A function f : K₁ → K₂ is a semiring homomorphism if:
f(1₁) = 1₂
f(a ⊕₁ b) = f(a) ⊕₂ f(b) for all a and b in K₁
f(a ⊗₁ b) = f(a) ⊗₂ f(b) for all a and b in K₁
Example (Isomorphic semirings)
The Viterbi and tropical semirings are isomorphic under the function −ln().
The real semiring restricted to the non-negative reals and the log semiring are isomorphic under −ln() as well (with −ln(0) = ∞), given the definition x ⊕log y = −ln(e^{−x} + e^{−y}).
Practical importance of semiring isomorphisms
Normally, we −log-transform probabilities to avoid numerical underflow.
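The homomorphism conditions for −ln can be checked numerically on a pair of sample probabilities. This is a sketch only; `neg_log`, `log_plus` and the sample values are assumptions introduced for illustration, and floating-point comparison uses `math.isclose`.

```python
import math

def neg_log(p):
    """Map a probability to a cost; by convention -ln(0) = ∞."""
    return math.inf if p == 0 else -math.log(p)

def log_plus(x, y):
    """x ⊕log y = -ln(e^-x + e^-y)."""
    return -math.log(math.exp(-x) + math.exp(-y))

a, b = 0.2, 0.05
# ⊗: products of probabilities become sums of costs
times_ok = math.isclose(neg_log(a * b), neg_log(a) + neg_log(b))
# Viterbi ⊕ (max) becomes tropical ⊕ (min)
viterbi_ok = math.isclose(neg_log(max(a, b)), min(neg_log(a), neg_log(b)))
# real ⊕ (+) becomes ⊕log
log_ok = math.isclose(neg_log(a + b), log_plus(neg_log(a), neg_log(b)))
# identities: 1 is mapped to 0 and 0 is mapped to ∞
ident_ok = neg_log(1.0) == 0.0 and neg_log(0.0) == math.inf

# Underflow: the product of two tiny probabilities collapses to 0.0,
# while the sum of their costs stays an ordinary finite float.
underflows = (1e-200 * 1e-200) == 0.0
cost_sum = neg_log(1e-200) + neg_log(1e-200)
```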
Semirings: Other interesting semirings (matrix semiring)
Semirings: Other interesting semirings (entropy semiring)
Outline
2 Weighted finite-state automata
Weighted Finite-state Automata: Weighted finite-state acceptors
Definition (Weighted finite-state acceptor)
A weighted finite-state acceptor (WFSA) A = 〈Σ, Q, q₀, F, E, λ, ρ〉 over a semiring K is a 7-tuple with:
Σ, the finite input alphabet
Q, the finite set of states
q₀ ∈ Q, the start state
F ⊆ Q, the set of final states
E ⊆ Q × (Σ ∪ {ε}) × K × Q, the set of transitions
λ ∈ K, the initial weight, and
ρ : F → K, the final weight function
Notation
If t ∈ E is a transition, we denote by w[t] the weight of t and by l[t] the label ∈ Σ ∪ {ε} of t.
Weighted Finite-state Automata: Weighted finite-state acceptors
Example (WFSA accepting/generating politicians' speeches)
[Figure: WFSA A_blabla; the diagram is not recoverable from the extracted text]
Under the assumption that we add weights along a path (and at the end add the weight of the reached final state), this WFSA defines the following weighted language:
L(A_blabla) = {bla ↦ 1, blabla ↦ 2, blablabla ↦ 3, ..., blaⁿ ↦ n}.
This means: A_blabla can measure the length of these speeches.
This in turn shows: weighted finite-state automata can count!
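Since the diagram is lost, the following sketch uses a hypothetical deterministic automaton that realizes the same weighted language; the states, the placement of the weight 1 on the b-transitions and the name `speech_length` are assumptions, not a reconstruction of the original figure.

```python
# Hypothetical transition table: (state, symbol) -> (weight, next state).
# Only the transitions reading "b" carry weight 1, so each "bla" adds 1.
TRANSITIONS = {
    (0, "b"): (1, 1),
    (1, "l"): (0, 2),
    (2, "a"): (0, 3),
    (3, "b"): (1, 1),
}
FINAL_WEIGHTS = {3: 0}  # ρ: state 3 is final with weight 0

def speech_length(text, start=0):
    """Add weights along the unique path for text; None means 'not accepted'."""
    state, total = start, 0
    for symbol in text:
        if (state, symbol) not in TRANSITIONS:
            return None
        weight, state = TRANSITIONS[(state, symbol)]
        total += weight
    if state not in FINAL_WEIGHTS:
        return None
    return total + FINAL_WEIGHTS[state]
```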
Weighted Finite-state Automata: Weighted finite-state acceptors
Example (WFSA mapping words to word indices)
[Figure: WFSA mapping words to word indices; the diagram is not recoverable from the extracted text]
Again, if we add weights along a path, we obtain a unique number for each accepted word, for example: fox ↦ 2 + 4 + 0 + 0 = 6.
This technique is also called perfect hashing.
Weighted Finite-state Automata: Weighted finite-state transducers
Definition (Weighted finite-state transducer)
A weighted finite-state transducer (WFST) A = 〈Σ, ∆, Q, q₀, F, E, λ, ρ〉 over a semiring K is an 8-tuple with
Σ, the finite input alphabet
∆, the finite output alphabet
Q, the finite set of states
q₀ ∈ Q, the start state
F ⊆ Q, the set of final states
E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × K × Q, the set of transitions
λ ∈ K, the initial weight, and
ρ : F → K, the final weight function mapping final states to elements in K
Weighted Finite-state Automata: Weighted finite-state transducers
Example
[Figure: example WFST; the diagram is not recoverable from the extracted text]
Weighted Finite-state Automata: Weight of a string - the Master formula
Definition (Master formula)
Let A = 〈Σ, Q, q₀, F, E, λ, ρ〉 be a WFSA and K = 〈W, ⊕, ⊗, 0, 1〉 the semiring associated with A.
Let π = t₁ t₂ ... t_k be a path in A, that is, a sequence of adjacent transitions.
Let ω[π] be the abstract multiplication of the weights along π: ω[π] = w[t₁] ⊗ w[t₂] ⊗ ... ⊗ w[t_k].
Let x ∈ Σ∗ be a string such that x = l[t₁] · l[t₂] · ... · l[t_k].
Let Π(P, x, P′) be the set of paths starting at states in P, ending in states in P′ and thereby deriving the string x.
The weight ⟦A⟧(x) assigned to a given string x is defined as follows:
\[ [\![A]\!](x) = \bigoplus_{f \in F,\ \pi \in \Pi(\{q_0\}, x, \{f\})} \lambda \otimes \omega[\pi] \otimes \rho(f) \]
⟦A⟧(x) = 0, if Π({q₀}, x, F) = ∅.
Weighted Finite-state Automata: What does the Master formula mean?
For each path labeled with x that starts at the start state and reaches a final state, we combine a constant weight λ, the weights of the individual transitions and the weight assigned to that final state by abstract multiplication.
There may be more than one path for x if A is non-deterministic. In that case, we abstractly add the weights of all paths.
This in turn means: if A is deterministic, there is at most one path for x in A. The contribution of ⊕ can then be neglected.
If there is no path for x, the weight assigned to x is 0. Thus 0, the identity element of ⊕, signals that a certain word x is not accepted by A.
Note that ⟦A⟧ is a function mapping strings x ∈ Σ∗ to elements in K.
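The master formula can be evaluated naively by enumerating all paths for x. The sketch below assumes an ε-free automaton given as a list of (source, label, weight, target) tuples; the function name `string_weight` and the small nondeterministic automaton `EDGES` are hypothetical and only serve to show ⊗ along paths and ⊕ across paths.

```python
import operator
from functools import reduce

def string_weight(x, transitions, start, initial, final, plus, times, zero):
    """Evaluate the master formula for an ε-free WFSA by path enumeration.

    transitions: list of (source, label, weight, target) tuples
    final: dict mapping final states to their weights ρ(f)
    """
    # Each entry is (state, λ ⊗ w[t1] ⊗ ... ⊗ w[ti]) for one path prefix.
    paths = [(start, initial)]
    for symbol in x:
        paths = [(target, times(acc, w))
                 for state, acc in paths
                 for source, label, w, target in transitions
                 if source == state and label == symbol]
    # Multiply in ρ(f) for paths ending in a final state, then ⊕ everything.
    totals = [times(acc, final[state]) for state, acc in paths if state in final]
    return reduce(plus, totals, zero)

# Hypothetical automaton with two accepting paths for "ab".
EDGES = [(0, "a", 0.5, 1), (0, "a", 0.25, 2), (1, "b", 0.5, 3), (2, "b", 1.0, 3)]
prob = string_weight("ab", EDGES, 0, 1.0, {3: 1.0}, operator.add, operator.mul, 0.0)
trop = string_weight("ab", EDGES, 0, 0.0, {3: 0.0}, min, operator.add, float("inf"))
rejected = string_weight("c", EDGES, 0, 1.0, {3: 1.0}, operator.add, operator.mul, 0.0)
```

Path enumeration is exponential in the worst case; it is used here only because it mirrors the formula term by term.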
Weighted Finite-state Automata: Master formula
Example (Weight interpretation)
[Figure: WFSA A with numerical weights (λ = 1 in all cases); the diagram is not recoverable from the extracted text]
⟦A⟧(ba) =
Weights interpreted in the tropical semiring: min({0.5 + 0.7 + 0.4, 0.6 + 0.6 + 0.7}) = 1.6
Weights interpreted in the probabilistic semiring: 0.5 · 0.7 · 0.4 + 0.6 · 0.6 · 0.7 = 0.392
Weights interpreted in the Viterbi semiring: max({0.5 · 0.7 · 0.4, 0.6 · 0.6 · 0.7}) = 0.252
Weights interpreted in the (max, min) semiring: max({min({0.5, 0.7, 0.4}), min({0.6, 0.6, 0.7})}) = 0.6
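The four results can be recomputed from the two weight sequences that appear in the sums above. The list `PATHS` and the helper `ba_weight` are illustrative names; the automaton itself is not reconstructed, only the arithmetic.

```python
import math
import operator
from functools import reduce

# The weights of the two paths for "ba", as they appear in the sums above.
PATHS = [[0.5, 0.7, 0.4], [0.6, 0.6, 0.7]]

def ba_weight(plus, times):
    """Combine each path with ⊗, then combine the path weights with ⊕."""
    return reduce(plus, (reduce(times, path) for path in PATHS))

tropical = ba_weight(min, operator.add)
probabilistic = ba_weight(operator.add, operator.mul)
viterbi = ba_weight(max, operator.mul)
fuzzy = ba_weight(max, min)
```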
Weighted Finite-state Automata: Master formula
Example (A_lieben representing some forms of the German verb lieben)
[Figure: WFSA A_lieben; the diagram is not recoverable from the extracted text]
⟦A_lieben⟧(liebst) = ε · ε · ε · Verb · ε · ε · pres 2 sg
Weighted Finite-state Automata: Why are semirings a suitable weight structure for finite-state automata?
The fact that a semiring is composed of two monoids gives us:
- two identity elements 0 and 1: 0 signals that some string is not part of the language accepted by the WFSM, and 1 gives us a notion of a "trivial" weight
- a high degree of freedom in the order in which weights are computed (by associativity)
Commutativity of ⊕ gives us the freedom to consider the paths in the master formula in any order.
Distributivity of ⊗ over ⊕ enables us to factor out common "weight prefixes" of different paths (this property is important, for example, for weighted determinization).
The annihilation relationship between ⊗ and 0 causes any path that contains a transition with weight 0 to be invalid.
Weighted Finite-state Automata: Significance of semirings
Semirings play an important role in processing tasks based on finite-state automata, but their use is not limited to that.
The tropical semiring is the classical semiring for all kinds of shortest-distance / best-analysis problems.
The probabilistic semiring plays an important role in statistical language processing for computing string probabilities.
The same is true for the Viterbi semiring, since it is related to processing tasks where we are interested in the analysis with the highest probability.
The string semiring is used in string processing tasks and in computational morphology.
Weighted Finite-state Automata: Weighted rational languages
Weighted rational languages as monoid homomorphisms
Weighted Finite-state Automata: Weighted regular relations
Definition (Weighted regular relation)
Outline
3 Semiring properties
Semiring properties: Closure
An important operation on semiring elements is the closure operation ∗.
Definition (Closure)
Let K = 〈W, ⊕, ⊗, 0, 1〉 be a semiring. The closure a∗ of a ∈ W is defined as:
\[ a^* = \bigoplus_{n=0}^{\infty} a^n = 1 \oplus a \oplus (a \otimes a) \oplus (a \otimes a \otimes a) \oplus \dots = 1 \oplus a \oplus a^2 \oplus a^3 \oplus \dots \]
Example (Closures in different semirings)
Semiring | a | a∗
tropical | 0.5 | 0.5∗ = 0 min 0.5 min (0.5 + 0.5) min ... = 0
Viterbi | 0.5 | 0.5∗ = 1 max 0.5 max (0.5 · 0.5) max ... = 1
real | 1/2 | (1/2)∗ = 1 + 1/2 + 1/4 + 1/8 + ... = 2
string | cat | cat∗ = ε ∧ cat ∧ cat · cat ∧ ... = ε
concatenation | {a, b} | {a, b}∗ = {ε} ∪ {a, b} ∪ ({a, b} · {a, b}) ∪ ... = {ε, a, b, aa, ab, ba, bb, ...}
unification | {[f:a]} | {[f:a]}∗ = {⊥} ∪ {[f:a]} ∪ ... = {⊥, [f:a]}
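A finite prefix of the closure sum can be computed directly, which makes the numeric rows of the table easy to check. This is a sketch; `partial_closure` and the cut-off of 50 terms are assumptions, and for the real semiring the result only approximates the limit 2.

```python
def partial_closure(a, plus, times, one, terms):
    """Return 1 ⊕ a ⊕ a² ⊕ ... ⊕ a^terms, a finite prefix of a*."""
    total, power = one, one
    for _ in range(terms):
        power = times(power, a)
        total = plus(total, power)
    return total

tropical = partial_closure(0.5, min, lambda x, y: x + y, 0.0, 50)
viterbi = partial_closure(0.5, max, lambda x, y: x * y, 1.0, 50)
real = partial_closure(0.5, lambda x, y: x + y, lambda x, y: x * y, 1.0, 50)
```

In the tropical and Viterbi cases the sum is already stable after the first term, which is the 0-closedness discussed later in this section.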
Semiring properties
Semirings differ with respect to a number of formal properties.
Definition (Semiring properties)
Let K = 〈W, ⊕, ⊗, 0, 1〉 be a semiring.
Finiteness: K is said to be finite if W is finite.
Commutativity: K is said to be commutative if ⊗ is commutative: ∀x, y ∈ W : x ⊗ y = y ⊗ x. Note that ⊕ is commutative by definition.
Idempotency: K is said to be idempotent if ⊕ is idempotent: ∀x ∈ W : x ⊕ x = x
Boundedness: K is said to be bounded if 1 is an annihilator for ⊕: ∀x ∈ W : 1 ⊕ x = x ⊕ 1 = 1
k-Closedness: K is said to be k-closed (for a fixed integer k) if
\[ \forall x \in W : \bigoplus_{n=0}^{k} x^n = \bigoplus_{n=0}^{k+1} x^n \]
Semiring properties
Properties of individual semirings
The tropical, Viterbi and string semirings are idempotent, bounded and 0-closed:
Semiring | idempotent | bounded | 0-closed
tropical | a min a = a | a min 0 = 0 | a⁰ ⊕ a¹ = 0 min a = 0 = a⁰
Viterbi | a max a = a | a max 1 = 1 | a⁰ ⊕ a¹ = 1 max a = 1 = a⁰
string | a ∧ a = a | a ∧ ε = ε | a⁰ ⊕ a¹ = ε ∧ a = ε = a⁰
The real semiring is not idempotent, not bounded and not k-closed:
Semiring | not idempotent | not bounded | not k-closed
real | a + a = 2a ≠ a (for a ≠ 0) | a + 1 ≠ 1 (for a ≠ 0) | ∑_{n=0}^{∞} aⁿ = ∞ if a ≥ 1, and 1/(1 − a) if 0 ≤ a < 1
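The three properties can be tested on sample elements, which reproduces the rows of the tables above for the tropical and real semirings. The function names and sample values are assumptions made for this sketch; a check on samples can refute a property but not prove it.

```python
import operator

def is_idempotent(samples, plus):
    """x ⊕ x = x for all sampled x."""
    return all(plus(x, x) == x for x in samples)

def is_bounded(samples, plus, one):
    """1 ⊕ x = x ⊕ 1 = 1 for all sampled x."""
    return all(plus(one, x) == one == plus(x, one) for x in samples)

def is_k_closed(samples, plus, times, one, k):
    """The sums of powers up to k and up to k + 1 agree for all sampled x."""
    def power_sum(x, upto):
        total, power = one, one
        for _ in range(upto):
            power = times(power, x)
            total = plus(total, power)
        return total
    return all(power_sum(x, k) == power_sum(x, k + 1) for x in samples)

TROPICAL_SAMPLES = [0, 1, 2.5, float("inf")]
REAL_SAMPLES = [0, 1, 2]

tropical_props = (
    is_idempotent(TROPICAL_SAMPLES, min),
    is_bounded(TROPICAL_SAMPLES, min, 0),
    is_k_closed(TROPICAL_SAMPLES, min, operator.add, 0, 0),
)
real_props = (
    is_idempotent(REAL_SAMPLES, operator.add),
    is_bounded(REAL_SAMPLES, operator.add, 1),
    is_k_closed(REAL_SAMPLES, operator.add, operator.mul, 1, 3),
)
```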
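Semiring properties
Theorem (Bounded semirings)
1. In a bounded semiring, a∗ = 1.
2. Every 0-closed semiring is bounded.
Proof.
1. Provided the infinite sum is defined:
\[ a^* = \bigoplus_{n=0}^{\infty} a^n = 1 \oplus (a \oplus a^2 \oplus a^3 \oplus \dots) = 1 \]
   since 1 ⊕ x = 1 for every x by boundedness.
2. 0-closedness means that the sum of powers up to 0 equals the sum of powers up to 1, that is, 1 = 1 ⊕ x for all x ∈ W. Together with the commutativity of ⊕ this is boundedness.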
Semiring properties: Closed semiring
Definition (Closed semiring)
Let K = 〈W, ⊕, ⊗, 0, 1〉 be a semiring. K is called closed if
Every finite or infinite sum of elements a_i ∈ W is in W too:
\[ \forall \{a_1, \dots, a_n\} \subseteq W : \bigoplus_{i=1}^{n} a_i \in W \]
Associativity and commutativity hold for countable sums.
Distributivity holds for countable sums:
\[ \forall \{a_1, \dots, a_n\}, \{b_1, \dots, b_m\} \subseteq W : \bigoplus_{i=1}^{n} a_i \otimes \bigoplus_{j=1}^{m} b_j = \bigoplus_{i=1}^{n} \Big( \bigoplus_{j=1}^{m} (a_i \otimes b_j) \Big) \]
Note
Closed semirings play an important role in shortest-distance algorithms.
Semiring properties
Definition (Semiring properties)
Let K = 〈W, ⊕, ⊗, 0, 1〉 be a semiring.
Left divisibility: K is said to be left divisible if ∀x ∈ W, x ≠ 0 : ∃y ∈ W such that y ⊗ x = 1 (that is, every non-zero element has a left inverse)
Weak left divisibility: K is said to be weakly left divisible if ∀x, y ∈ W, x ⊕ y ≠ 0 : ∃z ∈ W : x = (x ⊕ y) ⊗ z. If z is unique, we write z = (x ⊕ y)⁻¹ ⊗ x.
0-divisor-freeness: K is said to be 0-divisor-free if ¬∃x, y ∈ W, x, y ≠ 0, such that x ⊗ y = 0
0-sum-freeness: K is said to be 0-sum-free if ¬∃x, y ∈ W, x, y ≠ 0, such that x ⊕ y = 0
Semiring properties
Example (Weak left divisibility)
∀x, y ∈ W, x ⊕ y ≠ 0 : ∃z ∈ W : x = (x ⊕ y) ⊗ z.
In the tropical semiring: x = 0.5, y = 0.2, z = 0.3
(0.5 min 0.2) + 0.3 = 0.5
−(0.5 min 0.2) + 0.5 = −0.2 + 0.5 = 0.3
a⁻¹ = −a
In the real semiring: x = 3/10, y = 2/10, z = 3/5
(3/10 + 2/10) · 3/5 = 3/10
1/(3/10 + 2/10) · 3/10 = 2 · 3/10 = 3/5
a⁻¹ = 1/a
In the string semiring: x = cats, y = cars, z = ts
(cats ∧ cars) · ts = ca · ts = cats
(cats ∧ cars)⁻¹ · cats = ca⁻¹ · cats = ts
a⁻¹ = inverse string
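The element z = (x ⊕ y)⁻¹ ⊗ x can be computed in each of the three semirings as follows. The function names are assumptions for this sketch, and `Fraction` is used only to keep the arithmetic of the examples exact.

```python
import os
from fractions import Fraction

def tropical_residual(x, y):
    """z with x = min(x, y) + z."""
    return x - min(x, y)

def real_residual(x, y):
    """z with x = (x + y) * z, assuming x + y != 0."""
    return x / (x + y)

def string_residual(x, y):
    """z with x = (x ∧ y) · z: strip the longest common prefix from x."""
    prefix = os.path.commonprefix([x, y])
    return x[len(prefix):]

z_tropical = tropical_residual(Fraction(5, 10), Fraction(2, 10))
z_real = real_residual(Fraction(3, 10), Fraction(2, 10))
z_string = string_residual("cats", "cars")
```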
Semiring properties: Weak left divisibility and determinization of WFSAs
Weak left divisibility is an important property for determinizing weighted automata.
Example (Determinization in the tropical semiring)
[Figure: nondeterministic WFSA realizing the weighted language {ab ↦ 0.9, ac ↦ 0.6}; the diagram is not recoverable from the extracted text]
[Figure: equivalent deterministic WFSA; the diagram is not recoverable from the extracted text]
The value of the c-transition in the deterministic WFSA is computed in the following way: −(0.5 min 0.3) + 0.5 + 0.1 = 0.3
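The computation of the c-transition generalizes to both outgoing transitions. The branch weights below are an assumed reading of the lost diagram (an a-transition of 0.3 followed by b with 0.6, and an a-transition of 0.5 followed by c with 0.1); they are consistent with the weighted language and the formula above, but the figure itself cannot be verified.

```python
from fractions import Fraction as F

# Assumed branches: (weight of the a-transition, next label, weight of that transition)
branches = [(F(3, 10), "b", F(6, 10)), (F(5, 10), "c", F(1, 10))]

# The single deterministic a-transition gets the ⊕-sum (min) of the a-weights.
a_weight = min(w_a for w_a, _, _ in branches)

# Each outgoing transition carries the leftover weight: -(a_weight) + w_a + w_next.
out = {label: -a_weight + w_a + w_next for w_a, label, w_next in branches}

# Total weight of each string in the deterministic automaton.
totals = {label: a_weight + w for label, w in out.items()}
```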
Semiring properties: Left and right semirings
Definition (Left and right semirings)
A semiring K is called a left semiring if it is left distributive.
A semiring K is called a right semiring if it is right distributive.
Example (String semiring)
The string semiring is only a left semiring:
x · (cat ∧ car) = xcat ∧ xcar = xca
(cat ∧ car) · s = ca · s = cas ≠ cats ∧ cars = ca
Semiring properties: Natural order
Definition (Natural order)
Let K be an idempotent semiring. We may define a partial order ≤K, called the natural order, on the elements in K in the following way:
a ≤K b ≡ a ⊕ b = a
Theorem (≤K is a partial order)
≤K is reflexive, antisymmetric and transitive.
Proof.
1. Reflexivity: a ⊕ a = a by idempotency, hence a ≤K a.
2. Antisymmetry: if a ≤K b and b ≤K a, then a = a ⊕ b = b ⊕ a = b by commutativity of ⊕.
3. Transitivity: if a ≤K b and b ≤K c, then a ⊕ c = (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) = a ⊕ b = a, hence a ≤K c.
Semiring properties: Natural order
Based on the notion of a natural order we can define more semiring properties:
Definition
Let K = 〈W, ⊕, ⊗, 0, 1〉 be a semiring. Let ≤K be a partial order on the elements in K.
Negativity: K is said to be negative if 1 ≤K 0. K is said to be positive if 0 ≤K 1.
Monotonicity: K is said to be monotonic if for all x, y, z ∈ W:
(x ≤K y) → (x ⊕ z ≤K y ⊕ z)
(x ≤K y) → (x ⊗ z ≤K y ⊗ z)
(x ≤K y) → (z ⊗ x ≤K z ⊗ y)
Superiority: K is said to be superior if for all x, y ∈ W: x ≤K x ⊗ y and y ≤K x ⊗ y
Semiring properties: Natural order (significance)
Superiority and monotonicity are important properties for shortest-distance algorithms on WFSA:
Superiority, that is x ≤K x ⊗ y, means that the weight x of a string will not get better if you multiply it with the weight y of another transition.
Monotonicity ensures an optimal substructure of shortest-distance problems: a problem where we have to decide whether x ⊗ z ≤K y ⊗ z holds can be reduced to the question whether x ≤K y holds.
Semiring properties
Theorem
1. Every superior semiring is negative.
2. Every bounded semiring is idempotent.
Proof.
1. ∀a ∈ K : 1 ≤K 1 ⊗ a = a (by superiority)
   ∀a ∈ K : a ≤K 0 ⊗ a = 0 (by superiority and the annihilator property)
   ∀a ∈ K : 1 ≤K a ≤K 0
2. 1 ⊕ x = 1 for all x (boundedness)
   1 ⊕ 1 = 1 (x = 1)
   a ⊕ a = a (multiplication with a, using distributivity)
Semiring properties: Summary of semirings and their properties
〈W, ⊕, ⊗, 0, 1〉 | Properties
〈{t, f}, ∨, ∧, f, t〉 | boolean semiring: finite, commutative, negative, idempotent, bounded, 0-closed, closed
〈R, +, ·, 0, 1〉 | real semiring: infinite, commutative, non-idempotent, monotonic, non-bounded, non-k-closed, non-closed
〈R ∪ {∞}, ⊕log, +, ∞, 0〉 | log semiring: infinite, commutative, non-idempotent, monotonic, non-bounded, non-k-closed, non-closed
〈R+ ∪ {∞}, min, +, ∞, 0〉 | tropical semiring: infinite, commutative, negative, idempotent, monotonic, 0-closed, bounded
〈[0, 1], max, ·, 0, 1〉 | Viterbi semiring: infinite, commutative, negative, idempotent, monotonic, 0-closed, bounded
〈R ∪ {−∞}, max, +, −∞, 0〉 | arctic semiring
〈[0, 1], max, min, 0, 1〉 | fuzzy semiring
Semiring properties: Summary of semirings and their properties
〈W, ⊕, ⊗, 0, 1〉 | Properties
〈2^{Σ∗}, ∪, ·, ∅, {ε}〉 | concatenation semiring: infinite, non-commutative, negative, idempotent, non-k-closed, closed, natural order: ⊆
〈Σ∗ ∪ {s∞}, ∧, ·, s∞, ε〉 | string semiring: infinite, left semiring, non-commutative, negative, idempotent, monotonic, bounded, 0-closed, natural order: is prefix of
〈2^F, ∪, ⊔, ∅, {⊥}〉 | unification semiring: infinite, left semiring, commutative, negative, idempotent, monotonic, bounded, 1-closed, natural order: ⊑ (subsumption)
Semiring properties: Semiring properties and FSM algebra
Some operations of the FSM algebra require certain properties of the underlying semiring:
Intersection and composition are only defined for FSMs based on commutative semirings.
Composition of WFSTs with ε : x or x : ε transitions must be parameterized for non-idempotent semirings.
Shortest-distance algorithms usually require idempotent and negative semirings.
Weighted determinization is only defined for weakly left-divisible semirings.
Making an FSM connected (removing all paths with path weight 0) usually requires 0-divisor-freeness.
Outline
4 Closure properties and algebra of weighted finite-state automata
Weighted closure
Weighted intersection
Weighted composition
Algebra of WRLs and WRTs
Closure properties and algebra of WFSA
Weighted finite-state acceptors are closed under the following operations:
Union
Concatenation
Closure
Intersection
Difference with unweighted finite-state acceptors
Reversal
Substitution / homomorphism
Cross product
Weighted finite-state acceptors are not closed under:
Complementation
Closure properties and algebra of WFSM
The set of weighted finite state transducers is closed under
Union
Concatenation
Closure
Reversal
Projection (note that this leads to FSAs)
Composition
Inversion
Weighted finite state transducers are not closed under
Complementation
Intersection (but acyclic and ε-free transducers are)
Difference
Closure properties and algebra of WFSA: Weighted closure
Definition (Weighted closure)
Let A = 〈Σ, Q, q₀, F, E, λ, ρ〉 be a weighted finite-state acceptor with underlying semiring K = 〈W, ⊕, ⊗, 0, 1〉.
The closure of A, denoted by A∗, is defined as:
A∗ = 〈Σ, Q ∪ {q′₀}, q′₀, F ∪ {q′₀}, E ∪ Eε, λ, ρ′〉
with
Eε = {〈q, ε, ρ(q), q₀〉 | q ∈ F} ∪ {〈q′₀, ε, 1, q₀〉}
and
\[ \rho'(q) = \begin{cases} \rho(q) & \text{if } q \in F \\ 1 & \text{if } q = q'_0 \end{cases} \]
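The construction can be written down almost literally. In this sketch an acceptor is reduced to its transition list, start state and final weight dictionary; the function name `weighted_closure`, the use of `None` as the ε label and the name of the new start state are assumptions made for illustration.

```python
def weighted_closure(transitions, start, final, one, new_start="q0'"):
    """Build the components of A* from those of A.

    transitions: list of (source, label, weight, target); label None stands for ε
    final: dict mapping final states to ρ(q)
    one: the identity element of ⊗ in the underlying semiring
    """
    # ε-transitions from every final state back to q0, weighted with ρ(q)
    eps = [(q, None, rho_q, start) for q, rho_q in final.items()]
    # ε-transition from the new start state q0' to q0 with weight 1
    eps.append((new_start, None, one, start))
    # ρ' extends ρ with ρ'(q0') = 1, so that the empty string is accepted
    new_final = dict(final)
    new_final[new_start] = one
    return transitions + eps, new_start, new_final

edges, closure_start, closure_final = weighted_closure(
    [(0, "a", 0.5, 1)], 0, {1: 0.25}, 1.0)
```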
Closure properties and algebra of WFSA: Weighted closure
Example (Weighted closure in the tropical semiring)
[Figure: WFSA A and its closure A∗; the diagram is not recoverable from the extracted text]
Example (Weighted closure in the probabilistic semiring)
[Figure: WFSA A and its closure A∗; the diagram is not recoverable from the extracted text]
Closure properties and algebra of WFSA: Weighted intersection
Definition (Intersection of two weighted regular languages)
Let A₁ and A₂ be two weighted finite-state acceptors over alphabets Σ₁ and Σ₂, respectively. The intersection A₁ ∩ A₂ is defined language-theoretically in the following way:
∀x ∈ Σ₁∗ ∩ Σ₂∗ : ⟦A₁ ∩ A₂⟧(x) = ⟦A₁⟧(x) ⊗ ⟦A₂⟧(x)
Notes
If one of the automata does not accept x, the corresponding weight is 0. By the annihilation property of the semiring, ⟦A₁ ∩ A₂⟧(x) will then also be 0.
The weight-combining operation must indeed be ⊗.
Closure properties and algebra of WFSA: Weighted intersection
Definition (Weighted intersection of two finite-state acceptors)
Let A₁ = 〈Σ₁, Q₁, q₀₁, F₁, E₁, λ₁, ρ₁〉 and A₂ = 〈Σ₂, Q₂, q₀₂, F₂, E₂, λ₂, ρ₂〉 be two WFSA. A₁ ∩ A₂, the intersection of A₁ and A₂, is a weighted acceptor:
A = 〈Σ₁ ∩ Σ₂, Q₁ × Q₂, 〈q₀₁, q₀₂〉, F₁ × F₂, E, λ₁ ⊗ λ₂, ρ〉 where
〈〈p, q〉, a, w₁ ⊗ w₂, 〈p′, q′〉〉 ∈ E if 〈p, a, w₁, p′〉 ∈ E₁ and 〈q, a, w₂, q′〉 ∈ E₂, for all a ∈ Σ₁ ∩ Σ₂.
ρ(〈p, q〉) = ρ₁(p) ⊗ ρ₂(q), ∀p ∈ F₁, ∀q ∈ F₂
Comments
The individual weights of the two operands are abstractly multiplied during intersection.
Note that weighted intersection, unlike unweighted intersection, is in general no longer idempotent: A ∩ A ≠ A.
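The product construction can be sketched as follows for ε-free acceptors. The tuple layout (transitions, start, initial weight, final weights) and the name `intersect` are assumptions; unreachable product states are not pruned. Intersecting a one-transition automaton with itself in the probabilistic semiring shows the loss of idempotency mentioned above.

```python
import operator

def intersect(a1, a2, times):
    """Product construction for two ε-free WFSAs over a commutative semiring.

    Each automaton is a tuple (transitions, start, initial weight, final weights),
    with transitions given as (source, label, weight, target).
    """
    (e1, s1, l1, f1), (e2, s2, l2, f2) = a1, a2
    # Pair up transitions with equal labels and multiply their weights.
    edges = [((p, q), a, times(w1, w2), (p2, q2))
             for p, a, w1, p2 in e1
             for q, b, w2, q2 in e2 if a == b]
    # ρ(<p, q>) = ρ1(p) ⊗ ρ2(q) for pairs of final states
    final = {(p, q): times(r1, r2) for p, r1 in f1.items() for q, r2 in f2.items()}
    return edges, (s1, s2), times(l1, l2), final

A = ([(0, "a", 0.5, 1)], 0, 1.0, {1: 1.0})
AA = intersect(A, A, operator.mul)  # "a" now has weight 0.25 instead of 0.5
```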
Closure properties and algebra of WFSA: Weighted intersection
Example (Weighted intersection in the probabilistic semiring)
[Figure: two WFSAs and their intersection; the diagram is not recoverable from the extracted text]
Note
Intersecting (or composing) a trivially (1-)weighted, acyclic WFSA representing an input string (or a finite set of input strings) with a cyclic weighted finite-state machine is the typical scenario for HMM-style processing tasks like tagging.
Closure properties and algebra of WFSA: Weighted intersection (why is it only possible in commutative semirings?)
1. Let A₁ and A₂ be two ε-free WFSA.
2. Suppose that some input x is in A₁ ∩ A₂ with ⟦A₁ ∩ A₂⟧(x) ≠ 0.
3. Let t¹₁ ... t¹ₙ and t²₁ ... t²ₙ be accepting paths in A₁ and A₂, respectively.
4. The algebraic definition of intersection is: ⟦A₁ ∩ A₂⟧(x) = ⟦A₁⟧(x) ⊗ ⟦A₂⟧(x). In our example that means:
   ⟦A₁ ∩ A₂⟧(x) = (w[t¹₁] ⊗ ... ⊗ w[t¹ₙ]) ⊗ (w[t²₁] ⊗ ... ⊗ w[t²ₙ])
5. The weight of the corresponding path in A₁ ∩ A₂ is
   (w[t¹₁] ⊗ w[t²₁]) ⊗ (w[t¹₂] ⊗ w[t²₂]) ⊗ ... ⊗ (w[t¹ₙ] ⊗ w[t²ₙ])
6. But 4. and 5. are guaranteed to be the same only if the underlying semiring of A₁ and A₂ is commutative.
Closure properties and algebra of WFSA: Weighted intersection in non-commutative semirings?
This result limits the usefulness of intersection and composition in non-commutative semirings like the string semiring.
On the other hand, if one of the operands in A₁ ∩ A₂ is trivially weighted, then no problem arises.
Closure properties and algebra of WFSA: Weighted composition
Definition (Composition of two weighted regular relations)
Let T₁ and T₂ be two weighted finite-state transducers over alphabets Σ₁, ∆₁ and Σ₂, ∆₂, respectively. The composition T₁ ◦ T₂ is defined in the following way:
\[ \forall x \in \Sigma_1^*, y \in \Delta_2^* : [\![T_1 \circ T_2]\!](x, y) = \bigoplus_{z \in \Delta_1^* \cap \Sigma_2^*} [\![T_1]\!](x, z) \otimes [\![T_2]\!](z, y) \]
Example
Consider the two weighted relations in the real semiring R₁ = {〈abcd, ad〉 ↦ 1} and R₂ = {〈ad, dea〉 ↦ 1}.
According to the definition above, R₁ ◦ R₂ = {〈abcd, dea〉 ↦ 1} with z = ad.
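For finite weighted relations, the definition can be applied directly to dictionaries that map string pairs to weights. This is a relation-level sketch, not the transducer construction; the name `compose` and the second example relation pair are assumptions for illustration.

```python
import operator
from collections import defaultdict

def compose(r1, r2, plus, times, zero):
    """(R1 ∘ R2)(x, y) = ⊕ over z of R1(x, z) ⊗ R2(z, y), for finite relations."""
    result = defaultdict(lambda: zero)
    for (x, z1), w1 in r1.items():
        for (z2, y), w2 in r2.items():
            if z1 == z2:  # the intermediate strings must match
                result[(x, y)] = plus(result[(x, y)], times(w1, w2))
    return dict(result)

# The example from the slide, in the real semiring
R1 = {("abcd", "ad"): 1}
R2 = {("ad", "dea"): 1}
R12 = compose(R1, R2, operator.add, operator.mul, 0)

# Hypothetical relations with two intermediate strings, whose weights are added
S1 = {("x", "u"): 0.5, ("x", "v"): 0.25}
S2 = {("u", "y"): 0.5, ("v", "y"): 1.0}
S12 = compose(S1, S2, operator.add, operator.mul, 0.0)
```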
Closure properties and algebra of WFSA: Weighted composition
Definition (Composition of two weighted finite-state transducers)
Let T₁ = 〈Σ₁, ∆₁, Q₁, q₀₁, F₁, E₁, λ₁, ρ₁〉 and T₂ = 〈Σ₂, ∆₂, Q₂, q₀₂, F₂, E₂, λ₂, ρ₂〉 be two WFSTs.
T₁ ◦ T₂, the composition of T₁ and T₂, is a transducer:
T = 〈Σ₁, ∆₂, Q₁ × Q₂, 〈q₀₁, q₀₂〉, F₁ × F₂, E ∪ Eε ∪ E_{i,ε} ∪ E_{o,ε}, λ₁ ⊗ λ₂, ρ〉
where
1. E = {〈〈p, q〉, a, b, w₁ ⊗ w₂, 〈p′, q′〉〉 | ∃c ∈ ∆₁ ∩ Σ₂ : 〈p, a, c, w₁, p′〉 ∈ E₁ ∧ 〈q, c, b, w₂, q′〉 ∈ E₂}
2. Eε = {〈〈p, q〉, a, b, w₁ ⊗ w₂, 〈p′, q′〉〉 | 〈p, a, ε, w₁, p′〉 ∈ E₁ ∧ 〈q, ε, b, w₂, q′〉 ∈ E₂}
3. E_{i,ε} = {〈〈p, q〉, ε, a, w, 〈p, q′〉〉 | 〈q, ε, a, w, q′〉 ∈ E₂ ∧ p ∈ Q₁}
4. E_{o,ε} = {〈〈p, q〉, a, ε, w, 〈p′, q〉〉 | 〈p, a, ε, w, p′〉 ∈ E₁ ∧ q ∈ Q₂}
5. ρ(〈p, q〉) = ρ₁(p) ⊗ ρ₂(q), ∀p ∈ F₁, ∀q ∈ F₂
Closure properties and algebra of WFSA: Weighted composition
Example (Weighted composition in the probabilistic semiring (non-idempotent))
[Figure: transducers T₁, T₂ and their composition T₁ ◦ T₂; the diagram is not recoverable from the extracted text]
T₁ ◦ T₂ corresponds to the weighted relation {〈abcd, dea〉 ↦ 4}
Closure properties and algebra of WFSA: Weighted composition (ε-filter FSTs)
The problem is that there are 4 paths deriving the string pair bc : e, each with weight 1.0.
By the master formula, path weights are added.
In idempotent semirings this does no harm, since a ⊕ a = a. But in non-idempotent semirings we are faced with a problem.
Solution: introduce a filter such that all paths except one are deleted.
Closure properties and algebra of WFSA: Weighted composition (ε-filter FSTs)
Example (ε-filter)
[Figure: ε-filter Fε; the diagram is not recoverable from the extracted text]
ε-transitions in T₁ are renamed to ε₁, while ε-transitions in T₂ are renamed to ε₂.
The composition T₁ ◦ T₂ is replaced by T′₁ ◦ Fε ◦ T′₂, where T′₁ and T′₂ are the ε-renamed versions of the original operands.
Note that ε-renaming and filtering may be incorporated into the composition algorithm.
Closure properties and algebra of WFSA: Weighted composition (ε-filter FSTs)
Example (Composition with ε-filter)
[Figure: T′₁, T′₂ and the composition T′₁ ◦ Fε ◦ T′₂; the diagram is not recoverable from the extracted text]
The final connect step (which will remove the gray states and transitions) has been omitted.
Closure properties and algebra of WFSAAlgebra of WRLs and WRTs
Definition (Algebra of WRLs)Let L and M denote two WRLs mapping Σ∗ → K.
singleton {u}(x) = 1 iff u = x
union (sum) (L ∪M)(x) = L(x)⊕M(x)
concatenation (L ·M)(x) =⊕uv=x
L(u)⊗M(v)
scaling kL(u) = k ⊗ L(u)
power L0(ε) = 1
L0(x 6= ε) = 0
Ln+1(x) = (L · Ln)(x)
closure L∗(x) =⊕k≥0
Lk(x)
crossproduct (L×M)(x, y) = L(x)⊗M(y)
intersection (L ∩M)(x) = L(x)⊗M(x)
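The WRL operations can be tried out on languages with finite support. The sketch below assumes the real semiring (⊕ = +, ⊗ = ·) and represents a WRL as a Python dict from strings to weights, with missing strings having weight 0; the representation and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def union(L, M):
    # (L ∪ M)(x) = L(x) ⊕ M(x)
    out = defaultdict(float)
    for lang in (L, M):
        for x, w in lang.items():
            out[x] += w
    return dict(out)

def concat(L, M):
    # (L · M)(x) = ⊕ over all splits uv = x of L(u) ⊗ M(v)
    out = defaultdict(float)
    for u, wu in L.items():
        for v, wv in M.items():
            out[u + v] += wu * wv
    return dict(out)

def scale(k, L):
    # (kL)(u) = k ⊗ L(u)
    return {u: k * w for u, w in L.items()}

def intersect(L, M):
    # (L ∩ M)(x) = L(x) ⊗ M(x); strings missing from either side have weight 0
    return {x: L[x] * M[x] for x in L if x in M}

L = {"a": 0.5, "ab": 0.5}
M = {"b": 1.0, "": 2.0}
LM = concat(L, M)
```

Note that "ab" is reached by two different splits (a·b and ab·ε), so its weight in `LM` is the ⊕-sum of both contributions.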
Closure properties and algebra of WFSA
Algebra of WRLs and WRTs
Definition (Algebra of WRTs)
Let S and Q denote two WRTs mapping Σ∗ × ∆∗ → K and ∆∗ × Γ∗ → K, respectively.
singleton: $\{(u, v)\}(x, y) = 1$ if $u = x$ and $v = y$, and $0$ otherwise
union (sum): $(S \cup Q)(x, y) = S(x, y) \oplus Q(x, y)$
concatenation: $(S \cdot Q)(w, z) = \bigoplus_{uv = w,\ xy = z} S(u, x) \otimes Q(v, y)$
scaling: $(kQ)(u, v) = k \otimes Q(u, v)$
power: $Q^0(x, y) = 1$ if $x = \varepsilon \wedge y = \varepsilon$, and $0$ otherwise; $Q^{n+1}(x, y) = (Q \cdot Q^n)(x, y)$
closure: $Q^*(x, y) = \bigoplus_{k \geq 0} Q^k(x, y)$
composition: $(S \circ Q)(x, y) = \bigoplus_{z \in \Delta^*} S(x, z) \otimes Q(z, y)$
1st projection: $\pi_1(S)(x) = \bigoplus_{y \in \Delta^*} S(x, y)$
2nd projection: $\pi_2(S)(y) = \bigoplus_{x \in \Sigma^*} S(x, y)$
Closure properties and algebra of WFSA
Algebra of WRLs and WRTs
Example (Application)
$Q[L](y) = \pi_2(\mathrm{ID}(L) \circ Q)(y) = \bigoplus_{x \in \Delta^*} L(x) \otimes Q(x, y)$
Here ID(L) denotes the identity transduction on L, that is, ID(L)(x, x) = L(x).
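Composition, projection and application can be sketched for finite relations in the same style. The code assumes the real semiring and represents a WRT as a dict from string pairs to weights; all names (`compose`, `project2`, `identity`, `apply_to`) are hypothetical helpers, not an existing API.

```python
from collections import defaultdict

def compose(S, Q):
    # (S ◦ Q)(x, y) = ⊕ over z of S(x, z) ⊗ Q(z, y)
    out = defaultdict(float)
    for (x, z), ws in S.items():
        for (z2, y), wq in Q.items():
            if z == z2:
                out[(x, y)] += ws * wq
    return dict(out)

def project2(S):
    # π2(S)(y) = ⊕ over x of S(x, y)
    out = defaultdict(float)
    for (_, y), w in S.items():
        out[y] += w
    return dict(out)

def identity(L):
    # ID(L)(x, x) = L(x)
    return {(x, x): w for x, w in L.items()}

def apply_to(Q, L):
    # Q[L] = π2(ID(L) ◦ Q)
    return project2(compose(identity(L), Q))

L = {"a": 0.5, "b": 0.25}
Q = {("a", "x"): 2.0, ("b", "x"): 4.0, ("b", "y"): 1.0}
image = apply_to(Q, L)
```

For this input, the output string "x" collects weight from both "a" and "b", while "y" is reachable only from "b".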
Outline
5 Distance algorithms
  All-pairs-distances algorithm
  Single-source-distances algorithms
    DAG-Distance
  Computing best paths
  ε-Removal
  Weight Pushing
  Weighted minimization
Distance algorithms
(Shortest-)distance algorithms are very important in NLP for disambiguating results.
There are generally two classes of algorithms:
- All-pairs-distance algorithms compute the distance between every pair of states.
- Single-source-distance algorithms compute the distance from a specific state (most often the start state) to every other state.
Distance algorithms
Definition (Semiring distance)
Let A = 〈Q, Σ, q0, F, E, λ, ρ〉 be a WFSA with underlying semiring K = 〈W, ⊕, ⊗, 0, 1〉.
Let Π(p, q) denote the (possibly infinite) set of paths originating in p and ending in q. If π = t1 . . . tk is a path in Π(p, q), let w[π] denote the weight of this path: w[π] = w[t1] ⊗ w[t2] ⊗ . . . ⊗ w[tk].
The distance ∆(p, q) between states p, q ∈ Q is defined as follows:
$\Delta(p, q) = \bigoplus_{\pi \in \Pi(p, q)} w[\pi]$
In idempotent semirings, the distance between p and q is also called the shortest distance.
Distance algorithms
Closed semiring: recapitulation of the definition
Definition (Closed semiring)
Let K = 〈W, ⊕, ⊗, 0, 1〉 be a semiring. K is called closed if:
Every finite or infinite sum of elements ai ∈ W is in W too:
$\forall \{a_1, \ldots, a_n\} \subseteq W : \bigoplus_{i=1}^{n} a_i \in W$
Associativity and commutativity hold for countable sums.
Distributivity holds for countable sums:
$\forall \{a_1, \ldots, a_n\}, \{b_1, \ldots, b_m\} \subseteq W : \bigoplus_{i=1}^{n} a_i \otimes \bigoplus_{j=1}^{m} b_j = \bigoplus_{i=1}^{n} \Big( \bigoplus_{j=1}^{m} (a_i \otimes b_j) \Big)$
Distance algorithms
All-pairs-distances algorithm
All-pairs-distances(A)
Require: A WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉 with closed semiring K = 〈W, ⊕, ⊗, 0, 1〉
Ensure: A matrix d^(|Q|) of size |Q| × |Q| with the distances between states qi and qj.
1: for i ← 1 to |Q| do
2:   for j ← 1 to |Q| do
3:     d^(0)[i, j] ← w[i, j]
4: for k ← 1 to |Q| do
5:   for i ← 1 to |Q| do
6:     for j ← 1 to |Q| do
7:       d^(k)[i, j] ← d^(k−1)[i, j] ⊕ (d^(k−1)[i, k] ⊗ (d^(k−1)[k, k])∗ ⊗ d^(k−1)[k, j])
Definition (Initial distance)
Let A = 〈Q, Σ, q0, F, E, λ, ρ〉 be a WFSA with closed semiring K = 〈W, ⊕, ⊗, 0, 1〉.
The initial distance w[i, j] between states qi and qj ∈ Q is defined as:
$w[i, j] = \begin{cases} \bigoplus_{\langle q_i, a, w, q_j \rangle \in E} w & \text{if } i \neq j \\ 1 \oplus \bigoplus_{\langle q_i, a, w, q_j \rangle \in E} w & \text{if } i = j \end{cases}$
Distance algorithms
All-pairs-distance algorithm
Notes
Lines 1 to 3 initialize the matrix. There are four cases to consider:
- By definition, every state is reachable from itself by ε. Since the weight associated with ε is 1, we initialize d[i, i] with 1.
- If in addition there is some transition 〈qi, a, w, qi〉 ∈ E, we abstractly add w to d[i, i].
- For the state pairs qi ≠ qj there are two possibilities: if there is a transition from qi to qj with weight w in E, d[i, j] is w. Otherwise, d[i, j] is 0.
Lines 4 to 7 compute a sequence of matrices (see next slide).
The time complexity is in O(|Q|^3), since the algorithm uses three nested loops.
The space complexity is in O(|Q|^2), since the algorithm uses a |Q| × |Q| matrix.
Distance algorithms
All-pairs-distance algorithm: idea
Let π be a (possibly infinite) path between states qi and qj such that the index l of every intermediate state ql is less than or equal to some constant k (1 ≤ k ≤ |Q|). Now consider state qk:
If qk is not an intermediate state on path π, then the indices of all intermediate states on π are less than or equal to k − 1.
If qk is an intermediate state on path π, then the indices of the intermediate states of the subpaths π1 = qi ⇝ qk, π2 = qk ⇝ qk and π3 = qk ⇝ qj are all less than or equal to k − 1.
[Figure omitted]
Recursive equation:
$d^{(k)}[i, j] = \begin{cases} w[i, j] & \text{if } k = 0 \\ d^{(k-1)}[i, j] \oplus \big( d^{(k-1)}[i, k] \otimes (d^{(k-1)}[k, k])^* \otimes d^{(k-1)}[k, j] \big) & \text{if } k > 0 \end{cases}$
Distance algorithms
All-pairs-distance algorithm: modifications
Since a∗ equals 1 in bounded semirings, we may simplify line 7 of the algorithm as follows:
$d^{(k)}[i, j] \leftarrow d^{(k-1)}[i, j] \oplus \big( d^{(k-1)}[i, k] \otimes d^{(k-1)}[k, j] \big)$
Example (Path weight computation in bounded semirings)
Tropical semiring:
$d^{(k)}[i, j] \leftarrow \min\big( d^{(k-1)}[i, j],\ d^{(k-1)}[i, k] + d^{(k-1)}[k, j] \big)$
Viterbi semiring:
$d^{(k)}[i, j] \leftarrow \max\big( d^{(k-1)}[i, j],\ d^{(k-1)}[i, k] \cdot d^{(k-1)}[k, j] \big)$
Boolean semiring:
$d^{(k)}[i, j] \leftarrow d^{(k-1)}[i, j] \vee \big( d^{(k-1)}[i, k] \wedge d^{(k-1)}[k, j] \big)$
Distance algorithms
All-pairs-distance algorithm: negative cost cycles
Suppose we admit negative weights in the tropical semiring 〈R∞, min, +, ∞, 0〉. Then the semiring is no longer closed.
Nevertheless, negative weights may be allowed in a WFSA under certain circumstances.
Definition (Closed WFSA)
Let A = 〈Q, Σ, q0, F, E, λ, ρ〉 be a WFSA with tropical semiring 〈R∞, min, +, ∞, 0〉.
A is called closed if the weight of each cycle in A is ≥ 0.
Example (Negative cost cycle)
[Figure omitted]
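Distance algorithms
All-pairs-distance algorithm
Example (All-pairs-distance algorithm in the tropical SR)
[Figure omitted: WFSA with five states]
$d^{(0)} = \begin{pmatrix} 0 & 3 & 8 & \infty & -4 \\ \infty & 0 & \infty & 1 & 7 \\ \infty & 4 & 0 & \infty & \infty \\ 2 & \infty & -5 & 0 & \infty \\ \infty & \infty & \infty & 6 & 0 \end{pmatrix}$
$d^{(1)} = \begin{pmatrix} 0 & 3 & 8 & \infty & -4 \\ \infty & 0 & \infty & 1 & 7 \\ \infty & 4 & 0 & \infty & \infty \\ 2 & 5 & -5 & 0 & -2 \\ \infty & \infty & \infty & 6 & 0 \end{pmatrix}$
$d^{(2)} = \begin{pmatrix} 0 & 3 & 8 & 4 & -4 \\ \infty & 0 & \infty & 1 & 7 \\ \infty & 4 & 0 & 5 & 11 \\ 2 & 5 & -5 & 0 & -2 \\ \infty & \infty & \infty & 6 & 0 \end{pmatrix}$
$d^{(3)} = \begin{pmatrix} 0 & 3 & 8 & 4 & -4 \\ \infty & 0 & \infty & 1 & 7 \\ \infty & 4 & 0 & 5 & 11 \\ 2 & -1 & -5 & 0 & -2 \\ \infty & \infty & \infty & 6 & 0 \end{pmatrix}$
$d^{(4)} = \begin{pmatrix} 0 & 3 & -1 & 4 & -4 \\ 3 & 0 & -4 & 1 & -1 \\ 7 & 4 & 0 & 5 & 3 \\ 2 & -1 & -5 & 0 & -2 \\ 8 & 5 & 1 & 6 & 0 \end{pmatrix}$
$d^{(5)} = \begin{pmatrix} 0 & 1 & -3 & 2 & -4 \\ 3 & 0 & -4 & 1 & -1 \\ 7 & 4 & 0 & 5 & 3 \\ 2 & -1 & -5 & 0 & -2 \\ 8 & 5 & 1 & 6 & 0 \end{pmatrix}$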
Note that we neglect the weights of the final states and the transition symbols.
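The example can be replayed with a small generic implementation of the recursive equation. This is a sketch under stated assumptions: the semiring is passed as three plain functions (`plus`, `times`, `star`, all hypothetical parameter names), and for the tropical case `star` simply returns the semiring 1 (the number 0), which is only valid when the WFSA is closed, that is, has no negative cost cycle.

```python
import math

def all_pairs_distances(w, plus, times, star):
    """Generic all-pairs distances; w is the |Q| x |Q| initial-distance matrix."""
    n = len(w)
    d = [row[:] for row in w]
    for k in range(n):
        prev = d                      # plays the role of d^(k-1)
        loop = star(prev[k][k])       # (d^(k-1)[k, k])*
        d = [[plus(prev[i][j], times(times(prev[i][k], loop), prev[k][j]))
              for j in range(n)]
             for i in range(n)]
    return d

INF = math.inf
# Initial matrix d^(0) of the example (tropical semiring).
D0 = [
    [0,   3,   8,   INF, -4],
    [INF, 0,   INF, 1,   7],
    [INF, 4,   0,   INF, INF],
    [2,   INF, -5,  0,   INF],
    [INF, INF, INF, 6,   0],
]
D5 = all_pairs_distances(
    D0,
    plus=min,                    # tropical ⊕
    times=lambda a, b: a + b,    # tropical ⊗
    star=lambda a: 0,            # a* = 1 (the number 0), assuming no negative cycles
)
```

Swapping in `max` and multiplication (Viterbi) or `or` and `and` (boolean) reuses the same loop, which is the point of formulating the algorithm over a semiring.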
Distance algorithms
All-pairs-distance algorithm and matrix multiplication
Distance algorithms
All-pairs-distance algorithm: applications
The all-pairs-distance algorithm has a number of interpretations, depending on the semiring:
In idempotent semirings based on real numbers (tropical, Viterbi), it computes the shortest distances / highest probabilities between each pair of states.
In the boolean semiring, it computes the reflexive and transitive closure of a binary relation.
In the concatenation semiring, it corresponds to Kleene’s algorithm to compute a regular expression for a given finite-state automaton.
Note
The concatenation semiring is not bounded. Instead of computing the closure r∗ for some regular expression r, which may result in an infinite set, we simply let r∗ denote itself. If we denote ∪ by |, we get the regular expression semiring.
Distance algorithms
All-pairs-distance algorithm: Kleene’s algorithm
Example (All-pairs-distance algorithm in the RE semiring)
[Figure omitted: FSA with three states]
d0 =
  ε   a   ∅
  b   ε   c
  ∅   ∅   (d|ε)
d1 =
  ε   a   ∅
  b   (ε|ba)   c
  ∅   ∅   (d|ε)
d2 =
  (ε|a(ba∗)b)   (a|a(ba∗)(ε|ba))   a(ba∗)c
  (b|(ε|ba)(ba∗)b)   ((ε|ba)|(ε|ba)(ba∗)(ε|ba))   (c|(ε|ba)(ba∗)c)
  ∅   ∅   (d|ε)
d3 =
  (ε|a(ba∗)b)   (a|a(ba∗)(ε|ba))   (a(ba∗)c|a(ba∗)c(d∗)(d|ε))
  (b|(ε|ba)(ba∗)b)   ((ε|ba)|(ε|ba)(ba∗)(ε|ba))   ((c|(ε|ba)(ba∗)c)|(c|(ε|ba)(ba∗)c)(d∗)(d|ε))
  ∅   ∅   ((d|ε)|(d|ε)(d∗)(d|ε))
Distance algorithms
All-pairs-distance algorithm: Kleene’s algorithm
Kleene’s algorithm
Require: A FSA A = 〈Q, Σ, q0, F, E〉
Ensure: A regular expression equivalent to A
1: All-pairs-distances(A)
2: return ⋃_{q∈F} d^(|Q|)[q0, q]
Example
[Figure omitted]
Since states 2 and 3 are final states, we consider cells d3[1, 2] and d3[1, 3].
The corresponding regular expression is (a|a(ba∗)(ε|ba)) | (a(ba∗)c|a(ba∗)c(d∗)(d|ε)).
Corollary
The structure 〈FSA, ∪, ·, ∅, {ε}〉 is a semiring.
Distance algorithms
Single-source-distance algorithms
Single-source-distance algorithms compute the distances from a given state q (in most cases the start state q0) to every other state in the WFSA.
They have many applications:
- Computing the best analysis (tropical semiring)
- Computing the path with the highest probability (Viterbi semiring)
- Statistical language processing: forward algorithm (probabilistic semiring)
There are a number of different algorithms, depending on the acyclicity of the input WFSA and on the semiring:
1 Lawler’s algorithm
2 Dijkstra’s algorithm
3 Bellman-Ford algorithm
4 Mohri’s algorithm
Mohri’s algorithm generalizes algorithms 1 to 3.
Distance algorithms
Single-source-shortest-distance: initialization
Initialize-Single-Source(A, q0)
Require: WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉
Ensure:
1: d[q0] ← 1
2: for each q ∈ Q − {q0} do
3:   d[q] ← 0
4:   Π[q] ← nil
Single-source-distance algorithms maintain a vector d such that upon termination d[q] holds the distance from the start state q0 to q for all q ∈ Q.
Initially, d is set to 0 for all states except the start state, since the distances are not yet established.
Single-source-distance algorithms also maintain a second vector Π, which holds the predecessor of each state. We use it to actually construct the best path. Initially, there are no predecessors.
Distance algorithms
Single-source-shortest-distance: Relaxation
All single-source-shortest-distance algorithms are based on a principle called relaxation:
- Suppose we consider a transition t from state p to state q with weight w.
- If the distance d[q] from the start state q0 to q becomes better when we include t in our best path, we update d[q] accordingly.
Note that this procedure requires some ordering of the semiring elements. Recall that the natural order is based on the idempotence of the semiring.
Relax(〈p, a, w, q〉)
1: if d[p] ⊗ w ≤K d[q] then
2:   d[q] ← d[p] ⊗ w
3:   Π[q] ← p
Note
The first two lines of Relax() can be formulated more concisely:
d[q] ← d[q] ⊕ (d[p] ⊗ w)
Distance algorithms
Single-source-shortest-distance: path properties
Theorem (Path properties in bounded semirings)
Let A be a WFSM and K a bounded semiring.
1 Triangle inequality
For every transition $p \xrightarrow{w} q \in E$: $\Delta(s, q) \leq_K \Delta(s, p) \otimes w$
2 Upper-bound property
For all p ∈ Q, ∆(s, p) ≤K d[p].
3 No-path property
If there is no path from s to p, then d[p] = ∆(s, p) = 0.
4 Convergence property
If s ⇝ p → q is a shortest path in A and if d[p] = ∆(s, p) before relaxation of p → q, then d[q] = ∆(s, q) at any time afterwards.
5 Path-relaxation property
If π = 〈p0, p1, . . . , pk〉 is a shortest path from s = p0 to pk, and the transitions of π are relaxed in the order p0 → p1, p1 → p2, . . ., pk−1 → pk, then d[pk] = ∆(s, pk).
Distance algorithms
DAG-Distance: topological sort
Definition (Topological order)
Let A be a finite automaton with state set Q and transition set E.
A topological ordering of Q is a sequence q1, q2, . . . , q|Q| such that i < j whenever E contains a transition qi → qj.
Note
Only the state set of acyclic FSMs can be topologically ordered.
Distance algorithms
DAG-Distance: topological sort
Definition (Topological sort)
Let A be an acyclic WFSM with state set Q and transition set E. A is said to be topologically sorted if q > p whenever E contains a transition p → q.
Example (Topologically sorted WFSA)
[Figure omitted]
Notes
A cyclic WFSM cannot be topologically sorted.
An acyclic WFSM can be topologically sorted in O(|Q| + |E|).
Distance algorithms
DAG-Distance(A, d)
DAG-Distance(A, d)
Require: WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉 in adjacency list representation.
Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(q0, q)
1: Topological-Sort(A)
2: Initialize-Single-Source(A, q0)
3: for each q ∈ Q in topological order do
4:   for each t ∈ Adj[q] do
5:     Relax(t)
Properties
Assumes an adjacency list representation: Adj[q] contains the outgoing transitions of q.
Works only for acyclic WFSA, but then for any semiring (whether bounded or not).
Complexity: O(|Q| + |E|).
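A minimal Python sketch of DAG-Distance follows. It uses the concise relaxation d[q] ← d[q] ⊕ (d[p] ⊗ w) and defaults to the tropical semiring; the transition encoding `(label, weight, target)`, the in-degree based topological sort, and the small example automaton are assumptions made for illustration and are not taken from the slides.

```python
import math
from collections import deque

def topological_order(states, adj):
    # Repeatedly emit states whose in-degree has dropped to 0.
    indeg = {q: 0 for q in states}
    for q in states:
        for _, _, r in adj.get(q, []):
            indeg[r] += 1
    ready = deque(q for q in states if indeg[q] == 0)
    order = []
    while ready:
        q = ready.popleft()
        order.append(q)
        for _, _, r in adj.get(q, []):
            indeg[r] -= 1
            if indeg[r] == 0:
                ready.append(r)
    if len(order) != len(states):
        raise ValueError("automaton contains a cycle")
    return order

def dag_distance(states, adj, start, plus=min, times=lambda a, b: a + b,
                 zero=math.inf, one=0.0):
    d = {q: zero for q in states}
    d[start] = one
    for q in topological_order(states, adj):
        for _, w, r in adj.get(q, []):
            d[r] = plus(d[r], times(d[q], w))   # relaxation step
    return d

# Hypothetical acyclic WFSA over the tropical semiring.
states = ["r", "s", "t", "x", "y", "z"]
adj = {
    "r": [("a", 5, "s"), ("a", 3, "t")],
    "s": [("a", 2, "t"), ("a", 6, "x")],
    "t": [("a", 7, "x"), ("a", 4, "y"), ("a", 2, "z")],
    "x": [("a", -1, "y"), ("a", 1, "z")],
    "y": [("a", -2, "z")],
}
dist = dag_distance(states, adj, "r")
```

Because every state is processed after all of its predecessors, each transition is relaxed exactly once, which gives the O(|Q| + |E|) bound.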
Distance algorithms
DAG-Distance: example
Example (DAG-Distance in the tropical semiring)
The topological order is: r s t x y z
[Figure sequence omitted; one snapshot of the WFSA per step:]
1 After Initialize-Single-Source
2 Exploring state r: d[s] and d[t] become better
3 Exploring state s
4 Exploring state t
5 Exploring state x: d[y] becomes better
6 Exploring state y: d[z] becomes better
7 Exploring state z
8 Distances after termination
Distance algorithms
Dijkstra’s algorithm
Dijkstra(A, d)
Require: WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉 in adjacency list representation.
Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(q0, q)
1: Initialize-Single-Source(A, q0)
2: Enqueue(PQ, q0)
3: processed ← ∅
4: while PQ ≠ ∅ do
5:   p ← Extract-Min(PQ)
6:   if p ∉ processed then
7:     processed ← processed ∪ {p}
8:     for each 〈p, a, w, q〉 ∈ Adj[p] do
9:       if q ∉ processed then
10:        Relax(〈p, a, w, q〉)
11:        if q ∈ PQ then
12:          Update(q, PQ)
13:        else
14:          Insert(q, PQ)
Properties
A bounded semiring is assumed.
Complexity: O(|Q| log |Q| + |E|) (with a priority queue based on a Fibonacci heap).
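The following sketch implements the algorithm for the tropical semiring with non-negative weights. It deliberately swaps the Fibonacci heap for Python's binary heap (`heapq`) with lazy deletion: instead of Update(q, PQ), an improved entry is pushed again and stale entries are skipped when extracted, so the complexity bound quoted above does not apply to this variant. The transition encoding and the example automaton are illustrative assumptions.

```python
import heapq
import math

def dijkstra(states, adj, start):
    d = {q: math.inf for q in states}     # tropical 0 is +infinity
    pred = {q: None for q in states}      # predecessor vector Π
    d[start] = 0.0                        # tropical 1 is the number 0
    heap = [(0.0, start)]
    processed = set()
    while heap:
        dist_p, p = heapq.heappop(heap)   # Extract-Min
        if p in processed:
            continue                      # stale entry (lazy deletion)
        processed.add(p)
        for _, w, q in adj.get(p, []):
            if q not in processed and dist_p + w < d[q]:
                d[q] = dist_p + w         # Relax
                pred[q] = p
                heapq.heappush(heap, (d[q], q))
    return d, pred

# Hypothetical WFSA with integer state names and non-negative weights.
states = [0, 1, 2, 3]
adj = {
    0: [("a", 4.0, 1), ("b", 1.0, 2)],
    1: [("c", 1.0, 3)],
    2: [("d", 2.0, 1), ("e", 5.0, 3)],
}
dist, pred = dijkstra(states, adj, 0)
```

The same loop works for the Viterbi semiring if the heap is ordered by decreasing probability and + is replaced by multiplication.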
Distance algorithms
Dijkstra’s algorithm: example
Example (Dijkstra’s algorithm in the Viterbi semiring)
[Figure sequence omitted; one snapshot of the WFSA per step:]
1 After Initialize-Single-Source
2 After extracting state 0
3 After extracting state 1
4 After extracting state 2
5 After extracting state 3
6 After extracting state 5
7 After extracting state 4
8 After extracting state 6
9 Distances after termination
Distance algorithms
Bellman-Ford algorithm
Bellman-Ford(A, d)
Require: WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉
Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(q0, q). Returns false if a negative cost cycle exists, otherwise true.
1: Initialize-Single-Source(A, q0)
2: for i ← 1 to |Q| − 1 do
3:   for each t ∈ E do
4:     Relax(t)
5: for each 〈p, a, w, q〉 ∈ E do
6:   if ¬(d[q] ≤ d[p] ⊗ w) then
7:     return false
8: return true
Properties
Works for non-bounded semirings, but the weight of each cycle in A must be bounded.
Detects non-bounded cycles (for example negative cost cycles in the tropical semiring).
Complexity: O(|Q| · |E|).
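A compact sketch of the algorithm for the tropical semiring is given below. Transitions are encoded as `(source, weight, target)` triples without labels; this encoding and both example automata are assumptions made for illustration.

```python
import math

def bellman_ford(states, edges, start):
    d = {q: math.inf for q in states}
    d[start] = 0.0
    # |Q| - 1 passes over all transitions.
    for _ in range(len(states) - 1):
        for p, w, q in edges:
            if d[p] + w < d[q]:
                d[q] = d[p] + w
    # One more pass: any further improvement reveals a negative cost cycle.
    for p, w, q in edges:
        if d[p] + w < d[q]:
            return False, d
    return True, d

states = [0, 1, 2, 3]
edges = [(0, 4.0, 1), (0, 5.0, 2), (1, -3.0, 2), (2, 2.0, 3)]
ok, dist = bellman_ford(states, edges, 0)

# Adding the transition 3 -> 1 with weight -5 closes a cycle of weight -6.
ok_cyclic, _ = bellman_ford(states, edges + [(3, -5.0, 1)], 0)
```

The second call shows the detection step: after |Q| − 1 passes a transition on the negative cycle can still be relaxed, so the function reports failure.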
Distance algorithms
Bellman-Ford algorithm
Example (Bellman-Ford algorithm in the tropical semiring)
[Figure sequence omitted; one snapshot of the WFSA per step:]
1 After Initialize-Single-Source
2 After pass 1
3 After pass 2
4 After pass 3
5 After pass 4
6 After termination
Distance algorithms
Bellman-Ford algorithm: how does the algorithm work?
Since we assume a bounded (that is, 0-closed) semiring, every shortest path in A is acyclic.
If A has |Q| states, an acyclic path cannot have more than |Q| − 1 transitions.
By the path-relaxation property, |Q| − 1 passes over all transitions therefore suffice.
Distance algorithms
Bellman-Ford algorithm: why does the negative cost cycle check work?
Distance algorithms
Mohri’s algorithm for arbitrary semirings
The key idea for generalizing the single-source-distance algorithms seen so far is that they are all based on a certain queue discipline, that is, the states are processed in a certain order:
DAG-distance uses a queue which contains the states in topological order. Every state is processed once.
The queue in Dijkstra’s algorithm contains the states in best-first order. Every state is processed once.
The queue in the Bellman-Ford algorithm contains the states in any order, but every state is enqueued |Q| − 1 times.
Distance algorithms
Mohri’s algorithm for arbitrary semirings
Single-Source-SR-Distances(A, s, d)
Require: WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉 in adjacency list representation and a source state s.
Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(s, q).
1: for each q ∈ Q do
2:   d[q] ← r[q] ← 0
3: d[s] ← r[s] ← 1
4: Enqueue(S, s)
5: while S ≠ ∅ do
6:   q ← Dequeue(S)
7:   r′ ← r[q]
8:   r[q] ← 0
9:   for each 〈q, a, w, q′〉 ∈ Adj[q] do
10:    rw ← r′ ⊗ w
11:    if d[q′] ≠ d[q′] ⊕ rw then
12:      d[q′] ← d[q′] ⊕ rw
13:      r[q′] ← r[q′] ⊕ rw
14:      if q′ ∉ S then
15:        Enqueue(S, q′)
16: d[s] ← 1
Distance algorithms
Mohri’s algorithm for arbitrary semirings
Notes
Lines 1 to 3 initialize d[q] and r[q] for all q ∈ Q \ {s} to 0; d[s] and r[s] are initialized with 1.
Lines 5 to 15 define a while loop in which a state queue is processed. The states in the queue may be processed in any order, but the queue may also make use of semiring properties (boundedness) or acyclicity of the WFSA.
Lines 10 to 13 define the actual relaxation step. Since we do not require bounded semirings, the test with ≤ is replaced by ≠.
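The pseudocode translates almost line by line into Python. The sketch below assumes a FIFO queue discipline and passes the semiring as plain functions and constants; the parameter names, the transition encoding `(label, weight, target)` and both example automata are illustrative assumptions. In the real semiring the self-loop example would not terminate with exact arithmetic; with floating-point numbers it stops once the remaining contribution no longer changes d.

```python
import math
from collections import deque

def single_source_distances(states, adj, s, plus, times, zero, one):
    d = {q: zero for q in states}
    r = {q: zero for q in states}   # weight added to d[q] since q was last dequeued
    d[s] = r[s] = one
    queue = deque([s])
    while queue:
        q = queue.popleft()
        r_q = r[q]
        r[q] = zero
        for _, w, q2 in adj.get(q, []):
            rw = times(r_q, w)
            new_d = plus(d[q2], rw)
            if d[q2] != new_d:      # relaxation test with != instead of <=
                d[q2] = new_d
                r[q2] = plus(r[q2], rw)
                if q2 not in queue:
                    queue.append(q2)
    d[s] = one
    return d

# Tropical semiring on a hypothetical four-state WFSA.
adj_trop = {
    0: [("a", 4.0, 1), ("b", 1.0, 2)],
    1: [("c", 1.0, 3)],
    2: [("d", 2.0, 1), ("e", 5.0, 3)],
}
d_trop = single_source_distances([0, 1, 2, 3], adj_trop, 0,
                                 plus=min, times=lambda a, b: a + b,
                                 zero=math.inf, one=0.0)

# Real semiring: 0 -> 1 with weight 1 and a self-loop on 1 with weight 0.5.
adj_real = {0: [("a", 1.0, 1)], 1: [("a", 0.5, 1)]}
d_real = single_source_distances([0, 1], adj_real, 0,
                                 plus=lambda a, b: a + b,
                                 times=lambda a, b: a * b,
                                 zero=0.0, one=1.0)
```

In the second example d[1] approximates the geometric series 1 + 0.5 + 0.25 + . . ., whose limit is 2.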
Distance algorithms
Mohri’s algorithm for arbitrary semirings: Queue disciplines
Possible queue disciplines are:
1 Topological order: the states in the queue are ordered by the in-degree of a state, that is, the number of incoming transitions. This discipline may be chosen in the case of acyclic WFSA.
2 Topological sort order: the queue is ordered by state number. This discipline may be chosen in the case of acyclic WFSA which are topologically sorted.
3 Best-first: the states in the queue are ordered by the natural order of the weights in d.
4 Bellman-Ford: every state is enqueued |Q| − 1 times.
5 Strongly-connected components: the queue contains subqueues which hold the states of the strongly-connected components. The main queue is ordered by the component graph.
6 LIFO or FIFO
7 Arbitrary
Distance algorithms
Mohri’s algorithm for arbitrary semirings
Example (Tropical semiring)
[Figure omitted: WFSA with states 0 to 3]
State q   0     1     2     3
d[q]      0     ∞     ∞     ∞
r[q]      0     ∞     ∞     ∞
Queue [ 0 ]

State q   0     1     2     3
d[q]      0     0.2   ∞     0.8
r[q]      ∞     0.2   ∞     0.8
Queue [ 3 1 ]

State q   0     1     2     3
d[q]      0     0.2   ∞     0.8
r[q]      ∞     0.2   ∞     ∞
Queue [ 1 ]

State q   0     1     2     3
d[q]      0     0.2   0.8   0.5
r[q]      ∞     ∞     0.8   0.5
Queue [ 3 2 ]

State q   0     1     2     3
d[q]      0     0.2   0.8   0.5
r[q]      ∞     ∞     0.8   ∞
Queue [ 2 ]
Distance algorithms
Mohri’s algorithm for arbitrary semirings: Approximation in non-k-closed semirings
Example
[Figure omitted: WFSA with states 0 and 1]
Step   d[0]   d[1]        r[0]   r[1]
1      1      0           1      0
2      1      1           0      1
3      1      1.5         0      0.5
4      1      1.75        0      0.25
5      1      1.875       0      0.125
6      1      1.9375      0      0.0625
7      1      1.96875     0      0.03125
8      1      1.984375    0      0.015625
9      1      1.9921875   0      0.0078125
10     1      1.9960938   0      0.00390625
11     1      1.9980469   0      0.001953125
Distance algorithms
Computing best paths
After using one of the shortest-distance algorithms, we calculate the weight of the best path in the following way:
Definition (Best path)
Let A = 〈Q, Σ, q0, F, E, λ, ρ〉 be a WFSA with K = 〈W, ⊕, ⊗, 0, 1〉 as the underlying semiring.
The weight of the best path in A is defined as follows:
$\text{best-path}(A) = \bigoplus_{q \in F} \big( \lambda \otimes \Delta(q_0, q) \otimes \rho(q) \big)$
The actual best path can be reconstructed using the predecessor vector Π.
Note
This corresponds to the master formula, except that the labels along the different paths are neglected.
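The definition can be sketched in a few lines, assuming the distances ∆(q0, q) and the predecessor vector Π have already been computed by one of the single-source algorithms. The function names and the Viterbi example values below are hypothetical.

```python
def best_path_weight(dist, finals, lam, plus, times, zero):
    # ⊕ over all final states q of λ ⊗ ∆(q0, q) ⊗ ρ(q)
    total = zero
    for q, rho in finals.items():
        total = plus(total, times(times(lam, dist[q]), rho))
    return total

def reconstruct_path(pred, target):
    # Follow the predecessor vector Π back to the start state.
    path = [target]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path[::-1]

# Viterbi semiring: ⊕ = max, ⊗ = multiplication, 0 = 0.0.
dist = {0: 1.0, 1: 0.5, 2: 0.4}       # assumed output of a distance algorithm
finals = {1: 0.5, 2: 1.0}             # final weights ρ
pred = {0: None, 1: 0, 2: 1}          # assumed predecessor vector

best = best_path_weight(dist, finals, lam=1.0, plus=max,
                        times=lambda a, b: a * b, zero=0.0)
states_on_path = reconstruct_path(pred, 2)
```

Here state 2 wins because 1.0 · 0.4 · 1.0 exceeds 1.0 · 0.5 · 0.5, so the path is reconstructed from state 2.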
Distance algorithms
Computing best paths
Example (Best path in the Viterbi semiring)
[Figures omitted: the WFSA A and the result best-path(A)]
Distance algorithms
Computing best paths: application in morphological disambiguation
Example (Morphological disambiguation in the tropical semiring)
[Figure omitted: tropical WFSA encoding the two analyses below]
The tropical WFSA above represents two morphological analyses of the German word Kindergarten:
Kindergarten [NN SemClass=k_g_artef_geb Gender=masc Number=sg Case=nom_acc_dat] <0>
Kind/N\er#Garten [NN SemClass=k_raumort Gender=masc Number=sg Case=nom_acc_dat] <12>
Distance algorithms
Computing best paths
Notes
The notion of a best path is only defined in idempotent semirings.
But idempotence is a necessary condition, not a sufficient one: the string semiring is idempotent, but a best path need not exist.
Example (Best path operation in the string semiring)
[Figure omitted]
The abstract sum over all paths in this WFSA (that is, the longest common prefix of all path weights) is xyz ∧ xzz = x. But there is no successful path for x.
Shortest-distance algorithms
Computing n-best paths
Equivalence transformations
A number of equivalence transformations are defined for WFSAs/WFSTs:
ε-Removal
Determinization
Pushing
Minimization
ε-Normalization (only for WFSTs)
Synchronization (only for WFSTs)
Equivalence transformations
ε-Removal
For weighted automata, the strategy of computing ε-closures no longer works.
Definition (ε-distance)
Let A be a WFSM with state set Q, transition set E and alphabet Σ.
Let Π(p, x, q) be the set of paths from p ∈ Q to q ∈ Q labeled with x ∈ Σ∗.
The ε-distance ∆ε(p, q) is defined as:
$\Delta_\varepsilon(p, q) = \bigoplus_{\pi \in \Pi(p, \varepsilon, q)} w[\pi]$
Define ∆ε(p) as {〈q, ∆ε(p, q)〉 | q ∈ Q, ∆ε(p, q) ≠ 0}.
Equivalence transformations
ε-Removal
Example (Abigrams)
[Figure omitted: Abigrams, representing bigram counts in the real semiring]
Equivalence transformations
ε-Removal
Example (Abigrams after ε-removal)
[Figure omitted: rmε(Abigrams)]
Equivalence transformations
ε-removal algorithm
rmε(A)
Require: WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉 in adjacency list representation with underlying semiring 〈W, ⊕, ⊗, 0, 1〉.
Ensure: An equivalent WFSA A′ without ε-transitions
1: for each p ∈ Q do
2:   for each 〈q, w〉 ∈ ∆ε(p) do
3:     for each 〈q, a, w′, r〉 ∈ Adj[q], a ≠ ε do
4:       E ← E ∪ {〈p, a, w ⊗ w′, r〉}
5:     if q ∈ F then
6:       if p ∈ F then
7:         ρ(p) ← ρ(p) ⊕ (w ⊗ ρ(q))
8:       else
9:         F ← F ∪ {p}
10:        ρ(p) ← w ⊗ ρ(q)
11:   E ← E \ {〈p, ε, w, q〉 ∈ E}
12: Connect(A)
13: return A
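The following sketch illustrates the idea for the tropical semiring only. It is a simplified variant rather than a transcription of the pseudocode: ε-distances are computed with a Floyd-Warshall style loop over the ε-transitions (assuming no negative ε-cycles), the new final weights are computed from scratch as the ⊕-sum of w ⊗ ρ(q) over all ε-reachable final states q, and the Connect step is omitted. The empty string as ε label, the tuple encoding and the example automaton are assumptions.

```python
import math

EPS = ""  # label used for ε-transitions in this sketch

def eps_distances(states, trans):
    # Tropical all-pairs distances over ε-transitions only; d[p][p] = 0 (empty path).
    d = {p: {q: (0.0 if p == q else math.inf) for q in states} for p in states}
    for p, a, w, q in trans:
        if a == EPS:
            d[p][q] = min(d[p][q], w)
    for k in states:
        for i in states:
            for j in states:
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def remove_epsilons(states, trans, finals):
    d = eps_distances(states, trans)
    new_trans = set()
    new_finals = {}
    for p in states:
        for q in states:
            w = d[p][q]
            if w == math.inf:
                continue                      # q is not ε-reachable from p
            for q1, a, w2, r in trans:
                if q1 == q and a != EPS:
                    new_trans.add((p, a, w + w2, r))
            if q in finals:
                new_finals[p] = min(new_finals.get(p, math.inf), w + finals[q])
    return new_trans, new_finals

states = [0, 1, 2]
trans = [(0, EPS, 1.0, 1), (1, "a", 2.0, 2), (2, EPS, 0.5, 1)]
finals = {1: 4.0, 2: 0.0}
new_trans, new_finals = remove_epsilons(states, trans, finals)
```

State 0 inherits both the a-transition of state 1 (with the ε-weight 1.0 added) and its final weight, exactly as described by the scheme: a transition q → r with label a and weight w′ becomes p → r with weight w ⊗ w′.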
Equivalence transformations
ε-removal scheme
[Figure omitted]
Suppose the ε-distance ∆ε(p, q) between states p and q is w.
Suppose q has an outgoing transition $q \xrightarrow{a / w'} r$ with a ∈ Σ.
Then a new transition $p \xrightarrow{a / w \otimes w'} r$ is added.
Note that the following boundary cases are possible: p = q, q = r or even p = q = r.
Since q may become inaccessible after removal of the ε-transitions (line 11), A is made connected in line 12.
Equivalence transformations
ε-removal scheme
Suppose q is a final state. There are two possible cases:
p ∉ F: [Figure omitted]
p ∈ F: [Figure omitted]
Equivalence transformations
ε-removal algorithm: complexity
Equivalence transformations
Weighted determinization
Determinization of weighted FSAs and FSTs is based on an extension of the classical power set construction.
States in deterministic finite-state machines are constructed as sets of pairs 〈q, w〉 such that q is a state of the input WFSM and w is a weight, the so-called delayed weight.
The underlying semiring is required to be weakly left divisible because we try to realize “safe” weights as early as possible.
Equivalence transformations
Weighted determinization: algorithm
Determinize-WFSA(A1, A2)
Require: WFSA A1 = 〈Q1, Σ, q01, F1, δ1, σ1, λ, ρ1〉
Ensure: WFSA A2 = 〈Q2, Σ, q02, F2, δ2, σ2, λ, ρ2〉
1: Q2 ← F2 ← ∅
2: q02 ← {〈q01, 1〉}
3: Enqueue(Queue, q02)
4: while Queue ≠ ∅ do
5:   q2 ← Dequeue(Queue)
6:   if ∃〈q, w〉 ∈ q2 ∧ q ∈ F1 then
7:     F2 ← F2 ∪ {q2}
8:     ρ2(q2) ← ⊕_{q∈F1} w ⊗ ρ1(q)
9:   for each a ∈ Σ : 〈p, x〉 ∈ q2 ∧ δ1(p, a) ≠ ∅ do
10:    J2(a) ← {〈q, w, q′〉 | q′ ∈ δ1(q, a) ∧ 〈q, w〉 ∈ q2}
11:    J1(a) ← {〈q, w〉 | ∃q′ : 〈q, w, q′〉 ∈ J2(a)}
12:    σ2(q2, a) ← ⊕_{〈q,w〉∈J1(a)} w ⊗ ⊕_{q′∈δ1(q,a)} σ1(q, a, q′)
13:    δ2(q2, a) ← ⋃_{〈q,w,q′〉∈J2(a)} {σ2(q2, a)^−1 ⊗ w ⊗ σ1(q, a, q′)}
14:    if q2 ∉ Q2 then
15:      Q2 ← Q2 ∪ {q2}
16:      Enqueue(Queue, q2)
Equivalence transformations
Weighted determinization: algorithm
Notes
Line 3 enqueues the start state q02 of the deterministic WFSA.
Lines 4 to 16 create new states.
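A sketch of weighted determinization for the tropical semiring is shown below. It follows the general idea (states are sets of pairs of an input state and a delayed weight; the minimum weight per label is emitted and the remainder is kept as residual) rather than the exact bookkeeping of the pseudocode. The data layout, the helper `weight_of` and the example automaton are assumptions, and the loop need not terminate for WFSAs that are not determinizable.

```python
import math
from collections import deque

def determinize(start, finals, trans):
    # trans: state -> list of (label, weight, target); finals: state -> final weight
    init = frozenset({(start, 0.0)})
    queue, seen = deque([init]), {init}
    new_trans, new_finals = {}, {}
    while queue:
        subset = queue.popleft()
        final_ws = [w + finals[q] for q, w in subset if q in finals]
        if final_ws:
            new_finals[subset] = min(final_ws)
        by_label = {}
        for q, w in subset:
            for a, w2, r in trans.get(q, []):
                dests = by_label.setdefault(a, {})
                dests[r] = min(dests.get(r, math.inf), w + w2)
        for a, dests in by_label.items():
            wa = min(dests.values())                      # weight emitted on label a
            target = frozenset((r, w - wa) for r, w in dests.items())  # residuals
            new_trans[(subset, a)] = (wa, target)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return init, new_finals, new_trans

def weight_of(word, init, new_finals, new_trans):
    # Weight assigned to a word by the deterministic result.
    state, total = init, 0.0
    for a in word:
        if (state, a) not in new_trans:
            return math.inf
        w, state = new_trans[(state, a)]
        total += w
    return total + new_finals.get(state, math.inf)

trans = {0: [("a", 1.0, 1), ("a", 2.0, 2)], 1: [("b", 3.0, 3)], 2: [("b", 1.0, 3)]}
init, dfinals, dtrans = determinize(0, {3: 0.0}, trans)
ab_weight = weight_of("ab", init, dfinals, dtrans)
```

The input assigns "ab" the weight min(1 + 3, 2 + 1) = 3; the deterministic result reproduces it along a single path, with the difference between the two alternatives carried as a delayed weight.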
Equivalence transformations
Weighted determinization
Example (Determinization in the tropical semiring)
[Figures omitted]
Equivalence transformations
Weighted determinization
Example (Determinization in the real semiring)
[Figures omitted]
Equivalence transformations
Weighted determinization: non-determinizable WFSAs
Not every WFSA has an equivalent deterministic WFSA.
That means that the class of deterministic WFSA is properly contained in the class of non-deterministic WFSA.
Example (Non-determinizable WFSA in the tropical semiring)
[Figures omitted]
Equivalence transformations
Weighted determinization: non-determinizable WFSAs
Example (Non-determinizable WFSA in the probabilistic semiring)
[Figure omitted]
Equivalence transformations
Weighted determinization: twins property
Equivalence transformations
Weight Pushing
Definition (Weight Pushing)
Let A = 〈Q, Σ, q0, F, E, λ, ρ〉 be a WFSA.
Define the potential pot[q] for all states q ∈ Q as follows:
pot[q] = ⊕_{p ∈ F, π ∈ Π(q,p)} w[π] ⊗ ρ(p)
The equivalent WFSA push-weights(A) is defined as:
push-weights(A) = 〈Q, Σ, q0, F, E′, λ′, ρ′〉, where:
ρ′(q) = pot[q]^{-1} ⊗ ρ(q), ∀q ∈ Q
E′ = {〈p, a, pot[p]^{-1} ⊗ w ⊗ pot[q], q〉 | 〈p, a, w, q〉 ∈ E}
λ′ = λ ⊗ pot[q0]
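Why push-weights(A) is equivalent to A can be seen by expanding the weight of one accepting path. The derivation below is a sketch under the assumption that every potential on the path has a multiplicative inverse; the names q_1, ..., q_n for the states on the path and w_1, ..., w_n for its transition weights are introduced here for illustration only.

```latex
% Accepting path \pi from q_0 through q_1, ..., q_n with q_n \in F
\begin{align*}
\lambda' \otimes w'[\pi] \otimes \rho'(q_n)
  &= (\lambda \otimes \mathit{pot}[q_0])
     \otimes \bigotimes_{i=1}^{n}
       \bigl(\mathit{pot}[q_{i-1}]^{-1} \otimes w_i \otimes \mathit{pot}[q_i]\bigr)
     \otimes \bigl(\mathit{pot}[q_n]^{-1} \otimes \rho(q_n)\bigr) \\
  &= \lambda \otimes w_1 \otimes \cdots \otimes w_n \otimes \rho(q_n) \\
  &= \lambda \otimes w[\pi] \otimes \rho(q_n)
\end{align*}
```

Each potential is immediately followed by its own inverse, so the product telescopes and every accepting path keeps its total weight.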
Equivalence transformations
Weight Pushing
Example (Weight Pushing in the tropical semiring)
[Figure: two WFSA diagrams, A and push-weights(A), not recoverable]
Consider the transition with weight 0 between states 3 and 5.
In the tropical semiring ⊗ is + and the inverse is negation, so with pot[3] = 2 and pot[5] = 5 the new weight w′ of this transition is computed as follows:
w′ = −pot[3] + 0 + pot[5] = −2 + 0 + 5 = 3
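The same arithmetic can be applied to a whole automaton. The following is a minimal Python sketch for the tropical semiring, assuming a trimmed automaton (every state reaches a final state) without negative cycles. The function name, the tuple layout of the transitions, and the four-state input are hypothetical and are not the automaton from the figure.

```python
def push_weights_tropical(states, arcs, finals, q0, lam=0):
    """Weight pushing in the tropical semiring (sketch).

    arcs:   list of (source, symbol, weight, target)  (hypothetical layout)
    finals: {state: final weight}
    """
    inf = float("inf")
    # pot[q] = weight of the cheapest path from q to a final state,
    # including the final weight; computed by Bellman-Ford relaxation.
    pot = {q: finals.get(q, inf) for q in states}
    for _ in range(len(states)):
        changed = False
        for p, a, w, q in arcs:
            if w + pot[q] < pot[p]:
                pot[p] = w + pot[q]
                changed = True
        if not changed:
            break
    # Reweight: w' = -pot[p] + w + pot[q], rho' = -pot[q] + rho,
    # lambda' = lambda + pot[q0].
    new_arcs = [(p, a, -pot[p] + w + pot[q], q) for p, a, w, q in arcs]
    new_finals = {q: -pot[q] + r for q, r in finals.items()}
    new_lam = lam + pot[q0]
    return pot, new_arcs, new_finals, new_lam

# Hypothetical input automaton.
states = [0, 1, 2, 3]
arcs = [(0, "a", 1, 1), (0, "b", 4, 2), (1, "c", 5, 3), (2, "c", 1, 3)]
finals = {3: 2}
pot, new_arcs, new_finals, new_lam = push_weights_tropical(states, arcs, finals, 0)
```

Here the path "ac" has weight 1 + 5 + 2 = 8 before pushing and 7 + 1 + 0 + 0 = 8 afterwards; the weight has only moved towards the start state.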
Equivalence transformations
Weight Pushing
Example (Weight Pushing in the real semiring: trigram probabilities)
[Figure: WFSA diagrams, not recoverable]
Equivalence transformations
Weight Pushing
Weight pushing is always possible, regardless of whether the input WFSA is deterministic or not.
In the case of a deterministic WFSA, weight pushing leads to a canonical or normal form of the input WFSA A:
- For every x ∈ Σ∗ there is at most a single path in A.
- On that path, the path weight is pushed as much as possible towards the start state.
Equivalence transformations
Weighted minimization
Weighted-Minimization(A)
Require: A deterministic WFSA A = 〈Q, Σ, q0, F, E, λ, ρ〉
Ensure: A minimal WFSA A′ equivalent to A
1: Push-Weights(A)
2: Encode(A, EM)
3: Unweighted-Minimization(A)
4: Decode(A, EM)
5: return A
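The four steps can be sketched in Python for the tropical semiring. This is an illustration under stated assumptions rather than the slides' implementation: the input is deterministic, trimmed, and free of negative cycles; Encode is modelled by treating each (symbol, weight) pair as a single label; and the unweighted minimization is a simple Moore-style partition refinement. All function and variable names are hypothetical.

```python
def potentials(states, arcs, finals):
    """Cheapest weight from each state to acceptance (Bellman-Ford)."""
    pot = {q: finals.get(q, float("inf")) for q in states}
    for _ in states:
        for p, a, w, q in arcs:
            pot[p] = min(pot[p], w + pot[q])
    return pot

def renumber(sig):
    """Map each distinct signature to a small integer block id."""
    ids = {}
    return {q: ids.setdefault(s, len(ids)) for q, s in sig.items()}

def weighted_minimize(states, arcs, finals, q0, lam=0):
    # 1: Push-Weights
    pot = potentials(states, arcs, finals)
    arcs = [(p, a, -pot[p] + w + pot[q], q) for p, a, w, q in arcs]
    finals = {q: -pot[q] + r for q, r in finals.items()}
    lam = lam + pot[q0]
    # 2: Encode: a (symbol, weight) pair acts as one unweighted label.
    out = {q: {} for q in states}
    for p, a, w, q in arcs:
        out[p][(a, w)] = q
    # 3: Unweighted minimization by partition refinement.
    # Initial blocks: same final weight (None marks non-final states).
    block = renumber({q: finals.get(q) for q in states})
    while True:
        sig = {q: (block[q],
                   frozenset((lab, block[t]) for lab, t in out[q].items()))
               for q in states}
        refined = renumber(sig)
        if len(set(refined.values())) == len(set(block.values())):
            break  # no block was split: partition is stable
        block = refined
    # 4: Decode: split labels back into symbol and weight, merge states.
    new_arcs = sorted({(block[p], a, w, block[q]) for p, a, w, q in arcs})
    new_finals = {block[q]: r for q, r in finals.items()}
    return set(block.values()), new_arcs, new_finals, block[q0], lam

# Hypothetical deterministic input: states 1 and 2 differ only in weights.
min_states, min_arcs, min_finals, min_start, min_lam = weighted_minimize(
    [0, 1, 2, 3],
    [(0, "a", 1, 1), (0, "b", 4, 2), (1, "c", 5, 3), (2, "c", 1, 3)],
    {3: 2},
    0,
)
```

Without the pushing step, states 1 and 2 would carry different labels (c/5 and c/1) and could not be merged; after pushing, both have the single outgoing label c/0.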
Equivalence transformations
Weighted minimization
Example (Weighted minimization in the tropical semiring)
[Figure: two WFSA diagrams, A and Weighted-Minimization(A), not recoverable]
Outline
6 Literature
Literature
Versions
31.10.2008: version 0.1 (initial version)
14.11.2008: version 0.2 (added shortest-distance algorithms)
23.11.2008: version 0.3 (changed Dijkstra algorithm, added shortest path properties, added ε-filter example)
1.12.2008: version 0.4 (added equivalence transformation section)