132
Finite-state Machines: Theory and Applications Weighted Finite-state Automata Thomas Hanneforth Institut f ¨ ur Linguistik Universit¨ at Potsdam December 10, 2008 Thomas Hanneforth (Universit¨ at Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 1 / 132

Finite-state Machines: Theory and Applications - Weighted ... · Finite-state Machines: Theory and Applications Weighted Finite-state Automata Thomas Hanneforth Institut f¨ur Linguistik

  • Upload
    vungoc

  • View
    229

  • Download
    0

Embed Size (px)

Citation preview

Finite-state Machines: Theory and ApplicationsWeighted Finite-state Automata

Thomas Hanneforth

Institut fur LinguistikUniversitat Potsdam

December 10, 2008

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 1 / 132

Overview

1 Semirings2 Weighted finite-state automata3 Semiring properties4 Closure properties and algebra of weighted finite-state automata5 Shortest-distance algorithms6 Equivalence transformations

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 2 / 132

Outline

1 Semirings

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 3 / 132

Semirings

SemiringsAbstract weights

We add weights to finite-state machines to enhance their capabilities.

The notion of a weight is an abstract one: everything which fulfills theconditions of an abstract algebraic structure called a semiring can beused as a weight in a FSM.So weights can be:

I Real numbersI ProbabilitiesI DistancesI StringsI Feature structuresI SetsI MatricesI ...

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 5 / 132

SemiringsMonoid

Definition (Monoid)An algebraic structure 〈W, �, 1〉 is a monoid if:

1 � is a binary operation on W : � : W ×W 7→W2 � is associative: ∀x, y, z ∈W : x � y � z = (x � y) � z = x � (y � z)3 1 is the identity element for �: ∀x ∈W : x � 1 = 1 � x = x

Example (The following structures are monoids:)〈R,+, 0〉 (addition of real numbers)〈R, ·, 1〉 (multiplication of real numbers)〈R ∪ {∞},min,∞〉 (minimum of real numbers)〈2M ,∪, ∅〉 and 〈2M ,∩,M〉 for each set M〈Σ∗, ·, ε〉 (concatenation monoid)〈2Σ∗ , ·, {ε}〉 (formal language monoid)〈F ,t,⊥〉 (unification of feature structures, ⊥ = empty feature structure)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 6 / 132

SemiringsCommutative Monoid

Definition (Commutative monoid)A monoid 〈W, �, 1〉 is called commutative if ∀x, y ∈W : x � y = y � x

ExampleThe following monoids are commutative:

〈R,+, 0〉〈R, ·, 1〉〈R ∪ {∞},min,∞〉,〈F ,t,⊥〉,〈2M ,∪, ∅〉 and 〈2M ,∩,M〉 for each set M

The following monoids are not commutative:

〈Σ∗, ·, ε〉〈2Σ∗ , ·, {ε}〉

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 7 / 132

SemiringsDefinition

Definition (Semiring)

An algebraic structure K = 〈W,⊕,⊗, 0, 1〉 is a semiring if it fulfills thefollowing conditions:

1 〈W,⊕, 0〉 is a commutative monoid with 0 as the identity element for ⊕.2 〈W,⊗, 1〉 is a monoid with 1 as the identity element for ⊗.3 ⊗ distributes over ⊕:∀x, y, z ∈W : x⊗ (y ⊕ z) = (x⊗ y)⊕ (x⊗ z) (left distributivity)∀x, y, z ∈W : (y ⊕ z)⊗ x = (y ⊗ x)⊕ (z ⊗ x) (right distributivity)

4 0 is an annihilator for ⊗: ∀w ∈W,w ⊗ 0 = 0⊗ w = 0.

Notational conventionW is a nonempty set, called the carrier set of K.In the following, we will identify a semiring K with its carrier set.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 8 / 132

SemiringsCommon semirings

Example (Common semirings)

〈W,⊕,⊗,0,1〉 Name〈{t, f},∨,∧, f, t〉 boolean semiring〈R,+, ·, 0, 1〉 real semiring〈[0, 1],+, ·, 0, 1〉 probabilistic semiring〈R+∞,min,+,∞, 0〉 tropical semiring

〈[0, 1],max, ·, 0, 1〉 Viterbi semiring〈R∞,⊕log,+,∞, 0〉 log semiring x⊕log y = −ln(e−x + e−y)〈R−∞,max,+,−∞, 0〉 arctic semiring〈[0, 1],max,min, 0, 1〉 fuzzy semiring〈2Σ∗ ,∪, ·, ∅, {ε}〉 concatenation semiring 2Σ∗ = formal languages〈Σ∗ ∪ {s∞},∧, ·, s∞, ε〉 string semiring ∧ = longest common prefix〈2M ,∪,∩, ∅,M〉 set semiring M is an arbitrary set〈2F ,∪,t, ∅, {⊥}〉 unification semiring F = set of feature structures

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 9 / 132

SemiringsString semiring

String semiringThe string semiring is defined as 〈Σ∗ ∪ {s∞},∧, ·, s∞, ε〉. That means:

The longest common prefix operation ∧ plays the role of abstractaddition:x ∧ y = z if x = zx′ and y = zy′ (z is of maximum length)

We augment Σ∗ with a special string s∞ – the infinite string – which isthe identity element of ∧:∀x ∈ Σ∗ ∪ {s∞} : x ∧ s∞ = s∞ ∧ x = x

Examplecat ∧ car = cacat ∧ dog = εdog ∧ dogs = dogcat ∧ s∞ = cat

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 10 / 132

SemiringsIsomorphic semirings

Definition (Isomorphic semirings)

A semiring isomorphism is a function f such that f and f−1 are semiringhomomorphisms.Let K1 = 〈W1,⊕1,⊗1, 01, 11〉 and K2 = 〈W2,⊕2,⊗2, 02, 12〉 be semirings.A function f : K1 7→ K2 is a semiring homomorphism if:

f(11) = 12

f(a⊕1 b) = f(a)⊕2 f(b), for all a and b in K1

f(a⊗1 b) = f(a)⊗2 f(b) for all a and b in K1

Example (Isomorphic semirings)

The tropical and Viterbi semirings are isomorphic under the −ln().The real and log semirings are isomorphic under the ln() function.

Practical importance of semiring isomorphismsNormally, we −log-transform probabilities to avoid numerical underflow.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 11 / 132

SemiringsOther interesting semirings: matrix semiring

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 12 / 132

SemiringsOther interesting semirings: entropy semiring

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 13 / 132

Outline

2 Weighted finite-state automata

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 14 / 132

Weighted finite-state automata

Weighted Finite-state AutomataWeighted finite-state acceptors

Definition (Weighted finite-state acceptor)A weighted finite-state acceptor (WFSA) A = 〈Σ, Q, q0, F, E, λ, ρ〉 over asemiring K is a 7-tuple with:

Σ, the finite input alphabetQ, the finite set of statesq0 ∈ Q, the start stateF ⊆ Q the set of final statesE ⊆ Q× (Σ ∪ {ε})×K ×Q, the set of transitionsλ ∈ K, the initial weight andρ : F 7→ K, the final weight function

NotationIf t is a transition ∈ E, we denote with w[t] the weight of t and l[t] the label∈ Σ ∪ {ε} of t.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 16 / 132

Weighted Finite-state AutomataWeighted finite-state acceptors

Example (WFSA accepting/generating politicians’ speeches)

� ����

����

���

���

���

Under the assumption that we add weights along a path (and in the endadd the weight of the reached final state), this WFSA defines thefollowing weighted language:L(Ablabla) = {bla→ 1, blabla→ 2, blablabla→ 3, . . . , blan → n}.This means: Ablabla can measure the length of these speeches.

This in turn shows: Weighted finite-state automata can count!

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 17 / 132

Weighted Finite-state AutomataWeighted finite-state acceptors

Example (WFSA mapping words to word indices)

����

��

���

����

���

�� ��

�����

���

�����

��

���

����

����

���

���

�����

�����

���

���

���

���

���

���

��

����

���

Again, if we add weights along a path we obtain for each accepted worda unique number, for example: fox 7→ 2 + 4 + 0 + 0 = 6.

This technique is also called perfect hashing.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 18 / 132

Weighted Finite-state AutomataWeighted finite-state transducers

Definition (Weighted finite-state transducer)A weighted finite-state transducer (WFST) A = 〈Σ,∆, Q, q0, F, E, λ, ρ〉 overa semiring K is a 8-tuple with

Σ, the finite input alphabet

∆, the finite output alphabet

Q, the finite set of states

q0 ∈ Q, the start state

F ⊆ Q the set of final states

E ⊆ Q× (Σ ∪ {ε})× (∆ ∪ {ε})×K ×Q, the set of transitions

λ ∈ K, the initial weight and

ρ : F 7→ K, the final weight function mapping final states to elements inK

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 19 / 132

Weighted Finite-state AutomataWeighted finite-state transducers

Example

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 20 / 132

Weighted Finite-state AutomataWeight of a string - the Master formula

Definition (Master formula)Let A = 〈Σ, Q, q0, F, E, λ, ρ〉 be a WFSA and K = 〈W,⊕,⊗, 0, 1〉 thesemiring associated with A. Let π = t1t2 . . . tk be a path in A, that is, asequence of adjacent transitions.Let ω[π] be the abstract multiplication of the weights along π:ω[π] = w[t1]⊗ w[t2]⊗ . . .⊗ w[tk].Let x ∈ Σ∗ be a string, such that x = l[t1] · l[t2] · . . . · l[tk].Let Π(P, x, P ′) be the set of paths starting at states in P , ending in states inP ′ and thereby deriving the string x.The weight JAK(x) assigned to a given string x is defined as follows:

JAK(x) =⊕

f∈F,π∈Π({q0},x,{f})

λ⊗ ω[π]⊗ ρ(f)

JAK(x) = 0, if Π({q0}, x, F ) = ∅.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 21 / 132

Weighted Finite-state AutomataWhat does the Master formula mean?

For each path labeled with x, starting at the start state and reaching afinal state, we combine a constant weight λ, the weight of the individualtransitions and the weight assigned to that final state by abstractmultiplication.

There may be more than one path for x, if A is non-deterministic.In that case, we abstractly add the weights of all paths.

This in turn means: if A is deterministic there is at most on path for x inA. The contribution of

⊕can then be neglected.

If there is no path for x, the weight assigned to x is 0.Thus, 0 – the identity element of ⊕ – signals, that a certain word x is notaccepted by A.

Note that JAK(x) is a function mapping x ∈ Σ∗ to elements in K.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 22 / 132

Weighted Finite-state AutomataMaster formula

Example (Weight interpretation)

�����

������

�����

�����

���������

���������

����������

����

�����

WFSA A with numerical weights(λ = 1 in all cases)

JAK(ba) =

Weights interpreted in the tropical semiring:min({0.5+0.7+0.4, 0.6+0.6+0.7}) = 1.6

Weights interpreted in the probabilistic SR:0.5 · 0.7 · 0.4 + 0.6 · 0.6 · 0.7 = 0.392

Weights interpreted in the Viterbi semiring:max({0.5 · 0.7 · 0.4, 0.6 · 0.6 · 0.7}) = 0.252

Weights interpreted in the (max,min) SR:max({min({0.5, 0.7, 0.4}),

min({0.6, 0.6, 0.7})}) = 0.6

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 23 / 132

Weighted Finite-state AutomataMaster formula

Example (Alieben representing some forms of German verb lieben)

� ������

������

�����

����� ��

�����

������

�����

� ������

������

�������

� ������

� ������

���� �

� ������

� ������

� ������

�����

�����

������

��

�����

��

�����

�����

�����

�������

����

����

JAliebenK(liebst) = ε · ε · ε · V erb · ε · ε · pres 2 sg

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 24 / 132

Weighted Finite-state AutomataWhy are semirings a suitable weight structure for finite-state automata?

The fact that the semiring is composed out of two monoids gives us:I two identity elements 0 and 1: 0 signals that some string is not part of the

language accepted by the WFSM, and 1, which gives us a notion of a“trivial” weight

I a high degree of freedom in which order to compute weights (byassociativity)

Commutativity of ⊕ gives us the freedom to consider the paths accordingto the master formula in any order

Distributivity of ⊗ enables us to factor out common “weight prefixes” ofdifferent paths (this property is for example important for weighteddeterminization)

The annihilation relationship between ⊗ and 0 causes any path whichcontains a transition with weight 0 to be invalid

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 25 / 132

Weighted Finite-state AutomataSignificance of semirings

Semirings play an import role in processing tasks based on finite-stateautomata, but are not limited to that.

The tropical semiring is the classical semiring of all kinds ofshortest-distance / best-analysis problems

The probabilistic semiring plays a important role in statistical languageprocessing for computing string probabilities.

The same is true for the Viterbi semiring, since it is related to processingtasks where we are interested in the analysis with the highest probability.

The string semiring is used in string processing tasks, computationalmorphology.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 26 / 132

Weighted Finite-state AutomataWeighted rational languages

Weighted rational languages as monoid homomorphisms

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 27 / 132

Weighted Finite-state AutomataWeighted regular relations

Definition (Weighted regular relation)1

2

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 28 / 132

Outline

3 Semiring properties

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 29 / 132

Semiring properties

Semiring propertiesClosureAn important operation on semiring elements is the closure operation ∗.

Definition (Closure)Let K = 〈W,⊕,⊗, 0, 1〉 be a semiring. The closure a∗ of a ∈W is defined as:

a∗ =∞⊕n=0

an = 1⊕ a⊕ (a⊗ a)⊕ (a⊗ a⊗ a) . . . = 1⊕ a⊕ a2 ⊕ a3 . . .

Example (Closures in different semirings)Semiring a a∗

tropical 0.5 0.5∗ = 0 min 0.5 min (0.5 + 0.5) min . . . = 0Viterbi 0.5 0.5∗ = 1 max 0.5 max (0.5 · 0.5) max . . . = 1real 1

2

(12

)∗ = 1 + 12 + 1

4 + 18 + . . . = 2

string cat cat∗ = ε ∧ cat ∧ cat · cat ∧ . . . = εconcatenation {a, b} {a, b}∗ = {ε} ∪ {a, b} ∪ ({a, b} · {a, b}) ∪ . . . =

{ε, a, b, aa, ab, ba, bb, . . .}unification {[f :a]} {[f :a]}∗ = {⊥} ∪ {[f :a]} ∪ . . . = {⊥, [f :a]}

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 31 / 132

Semiring propertiesSemirings differ with respect a number of formal properties.

Definition (Semiring properties)

Let K = 〈W,⊕,⊗, 0, 1〉 be a semiring.

Finiteness: K is said to be finite if W is finite.

Commutativity: K is said to be commutative if ⊗ is commutative:∀x, y ∈W : x⊗ y = y ⊗ x. Note that ⊕ is commutative by definition.

Idempotency: K is said to be idempotent if ⊕ is idempotent:∀x ∈W : x⊕ x = x

Boundedness: K is said to be bounded if 1 is an annihilator for ⊕:∀x ∈W : 1⊕ x = x⊕ 1 = 1k-Closedness: K is said to be k-closed (for a fixed integer k) if

∀x ∈W :k⊕

n=0

xn =k+1⊕n=0

xn

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 32 / 132

Semiring properties

Properties of individual semiringsThe tropical, Viterbi and string semirings are idempotent, bounded and0-closed:

+idempotent +bounded 0-closedtropical a min a = a a min 0 = 0 a0 = a1 = 0 min a = 0Viterbi a max a = a a max 1 = 1 a0 = a1 = 1 max a = 1string a ∧ a = a a ∧ ε = ε a0 = a1 = ε ∧ a = ε

The real semiring is not idempotent, not bounded and not k-closed:

−idempotent −bounded −k-closedReal a+ a = 2a 6= a a+ 1 6= a ∞∑

n=0

an =

{∞ if a ≥ 1

11−a if a < 1

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 33 / 132

Semiring properties

Theorem (Bounded semirings)1 In a bounded semiring, a∗ = 1.2 Every 0-closed semiring is bounded.

Proof.1

a∗ =∞⊕n=0

an =

2

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 34 / 132

Semiring propertiesClosed semiring

Definition (Closed semiring)

Let K = 〈W,⊕,⊗, 0, 1〉 be a semiring. K is called closed if

Every finite or infinite sum of elements ai ∈W is in W too:

∀{a1, . . . , an} ⊆W :n⊕i=1

ai ∈W

Associativity and commutativity hold for countable sums:

Distributivity hold for countable sums:

∀{a1, . . . , an}, {b1, . . . , bm} ⊆W :n⊕i=1

ai ⊗m⊕j=1

bj =n⊕i=1

( m⊕j=1

(ai ⊗ bj))

NoteClosed semirings play an important role in shortest-distance algorithms.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 35 / 132

Semiring properties

Definition (Semiring properties)

Let K = 〈W,⊕,⊗, 0, 1〉 be a semiring.

Left divisibility: K is said to be left divisible if ∀x ∈W , x 6= 0,∃y ∈W such that y ⊗ x = 1 (that is, every element has a left inverse)

Weak left divisibility: K is said to be weakly left divisible if∀x, y ∈W,x⊕ y 6= 0, ∃z ∈W : x = (x⊕ y)⊗ z.If z is unique we write z = (x⊕ y)−1 ⊗ x.

0-divisor-freeness: K is said to be 0-divisor-free if¬∃x, y ∈W,x, y 6= 0, such that x⊗ y = 00-sum-freeness: K is said to be 0-sum-free if¬∃x, y ∈W,x, y 6= 0, such that x⊕ y = 0

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 36 / 132

Semiring propertiesExample (Weak left divisibility)

∀x,y ∈W,x⊕ y 6= 0, ∃z ∈W : x = (x⊕ y)⊗ z.

In the tropical semiring: x = 0.5, y = 0.2, z = 0.3(0.5 min 0.2) + 0.3 = 0.5−(0.5 min 0.2) + 0.5 = −0.2 + 0.5 = 0.3a−1 = −a

In the real semiring: x = 310 , y = 2

10 , z = 35

( 310 + 2

10) · 35 = 3

101

310

+ 210

· 310 = 2 · 3

10 = 35

a−1 = 1a

In the string semiring: x = cats, y = cars, z = ts(cats ∧ car) · ts = ca · ts = cats(cats ∧ car)−1 · cats = ca−1 · cats = tsa−1 = inverse strings

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 37 / 132

Semiring propertiesWeak left divisibility and determinization of WFSAs

Weak left divisibility is an important property for determinizing weightedautomata.

Example (Determinization in the tropical semiring)

������

�����

���������

����

�����

Nondeterministic WFSA realizing theweighted language {ab 7→ 0.9, ac 7→ 0.6}

� ��������

�����������

�����

������

Deterministic WFSA

The value of the c-transition in the deterministic WFSA is computed in the following way:−(0.5 min 0.3) + 0.5 + 0.1 = 0.3

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 38 / 132

Semiring propertiesLeft and right semirings

Definition (Left and right semirings)A semiring K is called a left semiring if it is left distributive.A semiring K is called a right semiring if it is right distributive.

Example (String semiring)The string semiring is only a left semiring:x · (cat ∧ car) = xcat ∧ xcar = xca(cat ∧ car) · s = ca · s = cas 6= cats ∧ cars = ca

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 39 / 132

Semiring propertiesNatural order

Definition (Natural order)Let K be an idempotent semiring. We may define a partial order ≤K – calledthe natural order – on the elements in K in the following way:

a ≤K b ≡ a⊕ b = a

Theorem (≤K is a partial order)≤K is reflexive, antisymmetric and transitive.

Proof.1 Reflexivity:2 Antisymmetry:3 Transitivity:

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 40 / 132

Semiring propertiesNatural order

Based on the notion of a natural order we can define more semiring properties:

DefinitionLet K = 〈W,⊕,⊗, 0, 1〉 be a semiring. Let ≤K be a partial order on theelements in K.

Negativity: K is said to be negative if 1 ≤K 0.K is said to be positive if 0 ≤K 1.

Monotonicity: K is said to be monotonic if for all x, y, z ∈W :(x ≤K y)→ (x⊕ z ≤K y ⊕ z)(x ≤K y)→ (x⊗ z ≤K y ⊗ z)(x ≤K y)→ (z ⊗ x ≤K z ⊗ y)Superiority: K is said to be superior if for all x, y ∈W :x ≤K x⊗ y andy ≤K x⊗ y

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 41 / 132

Semiring propertiesNatural order: significance

Superiority and monotonicity are important properties for shortest-distancealgorithms on WFSA:

Superiority, that is x ≤K x⊗ y, means that the weight x of a string willnot get better if you multiply it with the weight y of another transition.

Monotonicity ensures an optimal substructure of shortest-distanceproblems: a problem where we have to decide whether x⊗ z ≤K y ⊗ zholds, can be reduced to the question whether x ≤K y holds.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 42 / 132

Semiring properties

Theorem1 Every superior semiring is negative.2 Every bounded semiring is idempotent.

Proof.1

∀a ∈ K : 1 ≤ 1⊗ a = a (by superiority)∀a ∈ K : a ≤ 0⊗ a = 0 (by superiority and annihilator property)∀a ∈ K : 1 ≤ a ≤ 0

2

a⊕ 1 = a (Boundedness)1⊕ 1 = 1 (a = 1)a⊕ a = a (multiplication with a)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 43 / 132

Semiring propertiesSummary: Semirings and their properties

〈W,⊕,⊗,0,1〉 Properties〈{t, f},∨,∧, f, t〉 boolean semiring: finite, commutative, positive, idem-

potent, bounded, 0-closed, closed〈R,+, ·, 0, 1〉 real semiring: infinite, commutative, non-idempotent,

monotonic, non-bounded, non-k-closed, non-closed〈R∞,⊕log,+,∞, 0〉 log semiring: infinite, commutative, non-idempotent,

monotonic, non-bounded, non-k-closed, non-closed〈R+∞,min,+,∞, 0〉 tropical semiring: infinite, commutative, negative,

idempotent, monotonic, 0-closed, bounded〈[0, 1],max, ·, 0, 1〉 Viterbi semiring: infinite, commutative, negative, idem-

potent, monotonic, 0-closed, bounded〈R−∞,max,+,−∞, 0〉 arctic semiring〈[0, 1],max,min, 0, 1〉 fuzzy semiring

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 44 / 132

Semiring propertiesSummary: Semirings and their properties

〈W,⊕,⊗,0,1〉 Properties〈2Σ∗ ,∪, ·, ∅, {ε}〉 concatenation semiring: infinite, non-commutative,

negative, idempotent, non-k-closed, closed, natural or-der: ⊆

〈Σ∗,∧, ·, ∅, ε〉 string semiring: infinite, left semiring, non-commutative, negative, idempotent, monotonic,bounded, 0-closed, natural order: is prefix of

〈2F ,∪,t, ∅, {⊥}〉 unification semiring: infinite, left semiring, commu-tative, negative, idempotent, monotonic, bounded, 1-closed, natural order: v (subsumption)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 45 / 132

Semiring propertiesSemiring properties and FSM algebra

Some operations of the FSM algebra require certain properties of theunderlying semiring:

Intersection and composition are only defined for FSMs based oncommutative semirings.

Composition of WFSTs with ε : x or x : ε transitions must beparameterized for non-idempotent semirings.

Shortest-distance algorithms usually require idempotent and negativesemirings

Weighted determinization is only defined for weakly left-divisiblesemirings.

Making an FSM connected (remove all paths with path weight 0) usuallyrequires 0-divisor-freeness

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 46 / 132

Outline

4 Closure properties and algebra of weighted finite-state automataWeighted closureWeighted intersectionWeighted compositionAlgebra of WRLs and WRTs

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 47 / 132

Closure properties and algebra

of weighted finite-state automata

Closure properties and algebra of WFSA

Weighted finite-state acceptors are closed under the following operations:

Union

Concatenation

Closure

Intersection

Difference with unweighted finite-state acceptors

Reversal

Substitution / homomorphism

Cross product

Weighted finite-state acceptors are not closed under:

Complementation

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 49 / 132

Closure properties and algebra of WFSM

The set of weighted finite state transducers is closed under

Union

Concatenation

Closure

Reversal

Projection (note that this leads to FSAs)

Composition

Inversion

Weighted finite state transducers are not closed under

Complementation

Intersection (but acyclic and ε-free transducers are)

Difference

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 50 / 132

Closure properties and algebra of WFSAWeighted closure

Definition (Weighted closure)Let A = 〈Q,Σ, q0, F, E, λ, ρ〉 be a weighted finite-state acceptor withunderlying semiring K = 〈W,⊕,⊗, 0, 1〉.The closure of A – denoted by A∗ – is defined as:

A∗ = 〈Q ∪ {q′0},Σ, q′0, F ∪ {q′0}, E ∪ Eε, λ, ρ′〉

with

Eε = {〈q, ε, ρ(q), q0〉 | ∀q ∈ F} ∪ {〈q′0, ε, 1, q0〉}

and

ρ′(q) =

{ρ(q) ∀q ∈ F1 q = q′0

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 51 / 132

Closure properties and algebra of WFSAWeighted closure

Example (Weighted closure in the tropical semiring)

���

� ������

���

���

���

A

��� ����

���

���

���

����

���

��

���

���

A∗

Example (Weighted closure in the probabilistic semiring)

�����

A

�����

�����

������

A∗

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 52 / 132

Closure properties and algebra of WFSAWeighted intersection

Definition (Intersection of two weighted regular languages)Let A1 and A2 be two weighted finite-state acceptors over alphabets Σ1 andΣ2, resp. The intersection A1 ∩A2 is defined language-theoretically in thefollowing way:

∀x ∈ Σ∗1 ∩ Σ∗2 : JA1 ∩A2K(x) = JA1K(x)⊗ JA2K(x)

NotesIf one of the automata does not accept x, the corresponding weight is 0.By the annihilation property of the semiring, JA1∩A2K(x) will be also 0.

The weight combining operation must indeed be ⊗.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 53 / 132

Closure properties and algebra of WFSAWeighted intersection

Definition (Weighted intersection of two finite-state acceptors)Let A1 = 〈Q1,Σ1, q01 , F1, E1, λ1, ρ1〉 and A2 = 〈Q2,Σ2, q02 , F2, E2, λ2, ρ2〉be two WFSA. A1∩A2, the intersection of A1 and A2, is a weighted acceptor:

A = 〈Q1 ×Q2,Σ1 ∩ Σ2, 〈q01 , q02〉, F1 × F2, E, λ1 ⊗ λ2, ρ〉 where

〈〈p, q〉, a, w1 ⊗ w2, 〈p′, q′〉〉 ∈ Eif 〈p, a, w1, p

′〉 ∈ E1 and 〈q, a, w2, q′〉 ∈ E2, for all a ∈ Σ1 ∩ Σ2.

ρ(〈p, q〉) = ρ1(p)⊗ ρ2(q),∀p ∈ F1, ∀q ∈ F2

CommentsThe individual weights of the two operands are abstractly multipliedduring intersection.Note that weighted intersection – unlike unweighted intersection – is nolonger idempotent: A ∩A 6= A.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 54 / 132

Closure properties and algebra of WFSAWeighted intersection

Example (Weighted intersection in the probabilistic SR)

� ������ ������ ������ ����� ��������� ∩�

������

�����

���������

����

������

�����

����

=

� ������ ������ ����� ����� ����� ����

NoteIntersecting (or composing) a trivially (1-) weighted, acyclic WFSA representing an input string (or a finiteset of input strings) with a cyclic weighted finite-state machine is the typical scenario for HMM-styleprocessing tasks like tagging etc.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 55 / 132

Closure properties and algebra of WFSAWeighted intersection: why is it only possible in commutative semirings?

1 Let A1 and A2 be two ε-free WFSA.2 Suppose that some input x is in A1 ∩A2 with JA1 ∩A2K(x) 6= 0.3 Let t11 . . . t

1n and t21 . . . t

2n be accepting paths in A1 and A2, respectively

4 The algebraic definition of intersection is:JA1 ∩A2K(x) = (JA1K(x)⊗ JA2K(x).In our example that means:JA1 ∩A2K(x) = (w[t11]⊗ . . .⊗ w[t1n])⊗ (w[t21]⊗ . . .⊗ w[t2n])

5 The weight of the corresponding path in A1 ∩A2 is(w[t11]⊗ w[t21])⊗ (w[t12]⊗ w[t22])⊗ . . .⊗ (w[t1n]⊗ w[t2n])

6 But 4. and 5. are the same only if the underlying semiring of A1 and A2

is commutative.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 56 / 132

Closure properties and algebra of WFSAWeighted intersection in non-commutative semirings?

This result limits the usefulness of intersection and composition innon-commutative semirings like the string semiring.

On the other hand, if one of the operands in A1 ∩A2 is triviallyweighted, then no problem arises.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 57 / 132

Closure properties and algebra of WFSAWeighted composition

Definition (Composition of two weighted regular relations)Let T1 and T2 be two weighted finite-state transducers over alphabets Σ1, ∆1

and Σ2, ∆2, resp. The composition T1 ◦ T2 is defined in the following way:

∀x ∈ Σ∗1, y ∈ ∆∗2 : JT1 ◦ T2K(x, y) =⊕

z∈∆∗1∩Σ∗2

JT1K(x, z)⊗ JT2K(z, y)

ExampleConsider the two weighted relations in the real semiringR1 = {〈abcd, ad〉 7→ 1} andR2 = {〈ad, dea〉 7→ 1}.According to the definition above,R1 ◦R2 = {〈abcd, dea〉 7→ 1} with z = ad.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 58 / 132

Closure properties and algebra of WFSAWeighted composition

Definition (Composition of two weighted finite-state transducers)Let T1 = 〈Q1,Σ1,∆1, q01 , F1, E1, λ1, ρ1〉 andT2 = 〈Q2,Σ2,∆2, q02 , F2, E2, λ2, ρ2〉 be two WFSTs.T1 ◦ T2, the composition of T1 and T2, is a transducer:

T = 〈Q1 ×Q2,Σ1,∆2, 〈q01 , q02〉, F1 × F2, E ∪Eε ∪Ei,ε ∪Eo,ε, λ1 ⊗ λ2, ρ〉where

1 E = {〈〈p, q〉, a, b, w1 ⊗ w2, 〈p′, q′〉〉 |∃c ∈ ∆1 ∩ Σ2 : 〈p, a, c, w1, p

′〉 ∈ E1 ∧ 〈q, c, b, w2, q′〉 ∈ E2}

2 Eε = {〈〈p, q〉, a, b, w1 ⊗ w2, 〈p′, q′〉〉 |〈p, a, ε, w1, p

′〉 ∈ E1 ∧ 〈q, ε, b, w2, q′〉 ∈ E2}

3 Ei,ε = {〈〈p, q〉, ε, a, w, 〈p, q′〉〉 | 〈q, ε, a, w, q′〉 ∈ E2 ∧ p ∈ Q1}4 Eo,ε = {〈〈p, q〉, a, ε, w, 〈p′, q〉〉 | 〈p, a, ε, w, p′〉 ∈ E1 ∧ q ∈ Q2}5 ρ(〈p, q〉) = ρ1(p)⊗ ρ2(q),∀p ∈ F1, ∀q ∈ F2

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 59 / 132

Closure properties and algebra of WFSAWeighted composition

Example (Weighted composition in the prob. SR (non-idempotent))

� ��������

��������

������

������������

T1

◦ � ��������

�������

�����������

T2

����� �����������

����������

��� �����

����

�����

�����

����

������������

�����������

������ ��������������

T1 ◦ T2

T1 ◦ T2 corresponds to the weighted relation {〈abcd, dea〉 7→ 4}

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 60 / 132

Closure properties and algebra of WFSAWeighted composition: ε-filter FSTs

The problem is, that there are 4 paths deriving the string pair bc : e withweight 1.0

By the master formula, paths weights are added

In idempotent semirings this doesn’t cause harm, since a⊕ a = a.But in non-idempotent semirings, we are faced with a problem

Solution: introduce a filter such that all paths except one are deleted

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 61 / 132

Closure properties and algebra of WFSAWeighted composition: ε-filter FSTs

Example (ε-filter)

�����

������������

����

�����

������

�����

����

ε-filter Fε

ε-transitions in T1 are renamed to ε1, while ε-transitions in T2 arerenamed to ε2

The composition T1 ◦ T2 is replaced by T ′1 ◦ Fε ◦ T ′2where T ′1 and T ′2 are the ε-renamed versions of the original operands.Note that ε-renaming and filtering may be incorporated into thecomposition algorithm.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 62 / 132

Closure properties and algebra of WFSAWeighted composition: ε-filter FSTs

Example (Composition with ε-filter)

�������

������

�������

�������

�������

������

�������

�����

�������

T ′1

�������

������

�������

�������

�������

�����

�������

T ′2

������� ������������

������

����

���������

�����

�����

� ����������

� ���������

����� ��������

T ′1 ◦ Fε ◦ T ′2The final connect-step (which will remove the gray states and transitions) has been omitted.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 63 / 132

Closure properties and algebra of WFSAAlgebra of WRLs and WRTs

Definition (Algebra of WRLs)Let L and M denote two WRLs mapping Σ∗ → K.

singleton {u}(x) = 1 iff u = x

union (sum) (L ∪M)(x) = L(x)⊕M(x)

concatenation (L ·M)(x) =⊕uv=x

L(u)⊗M(v)

scaling kL(u) = k ⊗ L(u)

power L0(ε) = 1

L0(x 6= ε) = 0

Ln+1(x) = (L · Ln)(x)

closure L∗(x) =⊕k≥0

Lk(x)

crossproduct (L×M)(x, y) = L(x)⊗M(y)

intersection (L ∩M)(x) = L(x)⊗M(x)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 64 / 132

Closure properties and algebra of WFSAAlgebra of WRLs and WRTs

Definition (Algebra of WRTs)Let S and Q denote two WRTs mapping Σ∗ ×∆∗ → K and ∆∗ × Γ∗ → K, respectively.

singleton {(u, v)}(x, y) = 1 iff u = x and v = y

union (sum) (S ∪ Q)(x, y) = S(x, y)⊕ Q(x, y)

concatenation (S · Q)(w, z) =⊕

uv=w,xy=z

S(u, x)⊗ Q(v, y)

scaling kQ(u, v) = k ⊗ Q(u, v)

power Q0(x, y) =

{1 if x = ε ∧ y = ε

0 otherwise

Qn+1(x, y) = (Q · Qn)(x, y)

closure Q∗(x, y) =⊕k≥0

Qk(x, y)

composition (S ◦ Q)(x, y) =⊕z∈∆∗

S(x, z)⊗ Q(z, y)

1st projection π1(S)(x) =⊕y∈∆∗

S(x, y)

2nd projection π2(S)(y) =⊕x∈Σ∗

S(x, y)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 65 / 132

Closure properties and algebra of WFSAAlgebra of WRLs and WRTs

Example (Application)

Q[L](x) =

π2(ID(L) ◦ Q)(x)

π2(⊕x∈∆∗

ID(L(x))⊗ Q(x, y))

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 66 / 132

Outline

5 Distance algorithmsAll-pairs-distances algorithmSingle-source-distances algorithms

DAG-Distance

Computing best pathsε-RemovalWeight PushingWeighted minimization

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 67 / 132

Distance algorithms

in weighted finite-state automata

Distance algorithms

(Shortest-)Distance algorithms are very important in NLP fordisambiguating results.There are generally two classes of algorithms:

I All-pair-distance algorithms compute the distance between every pair ofstates.

I Single-source-distance algorithms compute the distance from a specificstate – most often the start state – to any other state.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 69 / 132

Distance algorithms

Definition (Semiring distance)Let A = 〈Q,Σ, q0, F, E, λ, ρ〉 a WFSA with underlying semiringK = 〈W,⊕,⊗, 0, 1〉.Let Π(p, q) denote the (possibly infinite) set of paths originating in p andending in q. If π = t1 . . . tk is a path in Π(p, q), let w[π] denote the weight ofthis path: w[π] = w[t1]⊗ w[t2]⊗ . . .⊗ w[tk].The distance ∆(p, q) between states p, q ∈ Q is defined in the following way:

∆(p, q) =⊕

π∈Π(p,q)

w[π]

In idempotent semirings we call the distance between p and q also theshortest distance.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 70 / 132

Distance algorithmsClosed semiring: recapitulation of the definition

Definition (Closed semiring)

Let K = 〈W,⊕,⊗, 0, 1〉 be a semiring. K is called closed if

Every finite or infinite sum of elements ai ∈W is in W too:

∀{a1, . . . , an} ⊆W :n⊕i=1

ai ∈W

Associativity and commutativity hold for countable sums:

Distributivity hold for countable sums:

∀{a1, . . . , an}, {b1, . . . , bm} ⊆W :n⊕i=1

ai ⊗m⊕j=1

bj =n⊕i=1

( m⊕j=1

(ai ⊗ bj))

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 71 / 132

Distance algorithmsAll-pairs-distances algorithm

All-pairs-distances(A)

Require: A WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉 with closed semiring K = 〈W,⊕,⊗, 0, 1〉Ensure: A matrix d|Q| of size |Q| × |Q| with the distances between states qi and qj .1: for i← 1 to |Q| do2: for j ← 1 to |Q| do3: d0[i, j]← w[i, j]4: for k ← 1 to |Q| do5: for i← 1 to |Q| do6: for j ← 1 to |Q| do7: d(k)[i, j]← d(k−1)[i, j]⊕

(d(k−1)[i, k]⊗ (d(k−1)[k, k])∗ ⊗ d(k−1)[k, j]

)Definition (Initial distance)Let A = 〈Q,Σ, q0, F, E, λ, ρ〉 a WFSA with closed semiring K = 〈W,⊕,⊗, 0, 1〉.The initial distance w[i, j] between states qi and qj ∈ Q is defined as:

w[i, j] =

〈qi,a,w,qj〉∈Ew if i 6= j

1⊕⊕

〈qi,a,w,qj〉∈Ew if i = j

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 72 / 132

Distance algorithmsAll-pairs-distance algorithm

NotesLines 1 to 3 initialize the matrix. There are four cases to consider:

I We know that by definition every state is reachable from itself by ε.Since the weight associated with ε is 1, we initialize d[i, i] with 1.

I If in addition there is some transition 〈qi, a, w, qi〉 ∈ E, we abstractly addw to d[i, i].

I For the state pairs qi 6= qj there are two possibilities: if there is a transitionfrom qi to qi with weight w in E, d[i, j] is w. Otherwise, d[i, j] is 0.

Lines 4 to 7 compute a sequence of matrices (see next slide).

The time complexity is in O(|Q|3), since the algorithm uses three nestedloops.

The space complexity is in O(|Q|2), since the algorithm uses a|Q| × |Q|-matrix.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 73 / 132

Distance algorithmsAll-pairs-distance algorithm: idea

Let π be a (possibly infinite) path between states qi and qj such that the index lof all intermediate states ql is less or equal to some constant k (1 ≤ k ≤ |Q|).Now consider state qk:

If qk is not an intermediate state on path π, then all the indices of allintermediate states on π are less or equal to k − 1.If qk is an intermediate state on path π, the indices of the intermediatestates of paths π1 = qi qk, π2 = qk qk and π3 = qk qj are thenall less or equal to k − 1.

��

���������

���

�����

�����

Recursive equation:

d(k)[i, j] =

{w[i, j] if k = 0

d(k−1)[i, j]⊕(d(k−1)[i, k]⊗ (d(k−1)[k, k])∗ ⊗ d(k−1)[k, j]

)if k > 0

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 74 / 132

Distance algorithmsAll-pairs-distance algorithm: modificationsSince a∗ equals 1 in bounded semirings, we may simplify line 7 of thealgorithm as follows:

d(k)[i, j]← d(k−1)[i, j]⊕(d(k−1)[i, k]⊗ d(k−1)[k, j]

)Example (Path weight computation in bounded semirings)Tropical semiring:

d(k)[i, j]← d(k−1)[i, j] min(d(k−1)[i, k] + d(k−1)[k, j]

)Viterbi semiring:

d(k)[i, j]← d(k−1)[i, j] max(d(k−1)[i, k] · d(k−1)[k, j]

)Boolean semiring:

d(k)[i, j]← d(k−1)[i, j] ∨(d(k−1)[i, k] ∧ d(k−1)[k, j]

)Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 75 / 132

Distance algorithmsAll-pairs-distance algorithm: negative cost cycles

Suppose, we admit negative weights in the tropical semiring:〈R∞,min,+,∞, 0〉.Then the semiring is no longer closed.Nevertheless, negative weights may be allowed in WFSA under certaincircumstances.

Definition (Closed WFSA)Let A = 〈Q,Σ, q0, F, E, λ, ρ〉 a WFSA with tropical semiring〈R∞,min,+,∞, 0〉.A is called closed if the weight of each cycle in A is ≥ 0.

Example (Negative cost cycle)

� ������ ������

����� ����

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 76 / 132

Distance algorithmsAll-pairs-distance algorithm

Example (All-pairs-distance algorithm in the tropical SR)

���

��

��

��

d0

=

0 3 8 ∞ −4∞ 0 ∞ 1 7∞ 4 0 ∞ ∞2 ∞ −5 0 ∞∞ ∞ ∞ 6 0

d1

=

0 3 8 ∞ −4∞ 0 ∞ 1 7∞ 4 0 ∞ ∞2 5 −5 0 −2∞ ∞ ∞ 6 0

d2

=

0 3 8 4 −4∞ 0 ∞ 1 7∞ 4 0 5 112 5 −5 0 −2∞ ∞ ∞ 6 0

d3

=

0 3 8 4 −4∞ 0 ∞ 1 7∞ 4 0 5 112 −1 −5 0 −2∞ ∞ ∞ 6 0

d4

=

0 3 −1 4 −43 0 −4 1 −17 4 0 5 32 −1 −5 0 −28 5 1 6 0

d5

=

0 1 −3 2 −43 0 −4 1 −17 4 0 5 32 −1 −5 0 −28 5 1 6 0

Note that we neglect the weights of the final states and the transition symbols.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 77 / 132

Distance algorithmsAll-pairs-distance algorithm and matrix multiplication

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 78 / 132

Distance algorithmsAll-pairs-distance algorithm: applications

The all-pairs-distance algorithm has – depending on the semiring – a numberof interpretations:

In idempotent semirings based on real numbers (tropical, Viterbi), itcomputes the shortest distances / highest probabilities between eachpair of states.

In the boolean semiring, it computes the reflexive and transitive closureof a binary relation.

In the concatenation semiring, it corresponds to Kleene’s algorithm tocompute a regular expression for some given finite-state automaton.

NoteNote that the concatenation semiring is not bounded. Instead of computingthe closure r∗ for some regular expression r, eventually resulting in an infiniteset, we simply let r∗ denote itself. If we denote ∪ by |, we get the regularexpression semiring.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 79 / 132

Distance algorithmsAll-pairs-distance algorithm: Kleene’s algorithm

Example (All-pairs-distance algorithm in the RE semiring)

� ��

���

d0

=

ε a ∅b ε c∅ ∅ (d|ε)

d1

=

ε a ∅b (ε|ba) c∅ ∅ (d|ε)

d2

=

(ε|a(ba∗)b) (a|a(ba∗)(ε|ba)) a(ba∗)c(b|(ε|ba)(ba∗)b) ((ε|ba)|(ε|ba)(ba∗)(ε|ba)) (c|(ε|ba)(ba∗)c)

∅ ∅ (d|ε)

d3

=

(ε|a(ba∗)b) (a|a(ba∗)(ε|ba)) (a(ba∗)c|a(ba∗)c(d∗)(d|ε))(b|(ε|ba)(ba∗)b) ((ε|ba)|(ε|ba)(ba∗)(ε|ba)) ((c|(ε|ba)(ba∗)c)|(c|(ε|ba)(ba∗)c)(d∗)(d|ε))

∅ ∅ ((d|ε)|(d|ε)(d∗)(d|ε))

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 80 / 132

Distance algorithmsAll-pairs-distance algorithm: Kleene’s algorithm

Kleene’s algorithmRequire: A FSA A = 〈Q,Σ, q0, F, E〉Ensure: A to A equivalent regular expression1: All-pairs-distances(A)2: return

⋃q∈F

d|Q|[q0, q]

Example

� ��

���

Since state 2 and 3 are final states, we consider cells d3[1, 2] and d3[1, 3].The corresponding regular expression is (a|a(ba∗)(ε|ba)) | (a(ba∗)c|a(ba∗)c(d∗)(d|ε)).

CorollaryThe structure 〈FSA,∪, ·, ∅, {ε}〉 is a semiring.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 81 / 132

Distance algorithmsSingle-source-distance algorithms

Single-source-distance algorithms compute the distances from a givenstate q (in most cases the start state q0) to every other state in the WFSA.They have a lot of applications:

I Computing the best analysis (tropical semiring)I Computing the path with the highest probability (Viterbi semiring)I Statistical language processing: forward algorithm (probabilistic semiring)

There are a number of different algorithms, depending of the acyclicityof the input WFSA and the semiring:

1 Lawler’s algorithm2 Dijkstra’s algorithm3 Bellman-Ford algorithm4 Mohri’s algorithm

Mohri’s algorithm generalizes algorithms 1 to 3.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 82 / 132

Distance algorithmsSingle-source-shortest-distance: initialization

Initialize-Single-Source(A, q0)

Require: WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉Ensure:1: d[q0]← 12: for each q ∈ Q− {q0} do3: d[q]← 04: Π[q]← nil

Single-source-distance algorithms maintain a vector d such that upontermination d[q] will hold the distance from the start state q0 to q for allq ∈ Q.

Initially, d is initialized with 0 for all states except the start state, sincethe distances are not yet established.

Note that single-source-distance algorithms maintain a second vector Πwhich holds the predecessor of each state. We use it to actually constructthe best path. Initially, there are no predecessors

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 83 / 132

Distance algorithmsSingle-source-shortest-distance: Relaxation

All single-source-shortest-distance algorithms are based on a principlecalled relaxation:

I Suppose, we consider a transition t from state p to state q with weight wI If the distance d[q] from the start state q0 to q becomes better if we includet in our best path, we update d[q] accordingly.

Note that this procedure requires some ordering of the semiring elements.Recall that the natural order is based on the idempotence of the semiring.

Relax(〈p, a, w, q〉)1: if d[p]⊗ w ≤K d[q] then2: d[q]← d[p]⊗ w3: Π[q]← p

NoteThe first two lines of Relax() can be formulated more concisely:

d[q]← d[q]⊕ (d[p]⊗ w)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 84 / 132

Distance algorithmsSingle-source-shortest-distance: path properties

Theorem (Path properties in bounded semirings)Let A be a WFSM and K a bounded semiring.

1 Triangle inequalityFor every transition p w−→ q ∈ E : ∆(s, q) ≤K ∆(s, p)⊗ w

2 Upper-bound propertyFor all p ∈ Q, ∆(s, p) ≤K d[p].

3 No-path propertyIf there is no path from s to p, then d[p] = ∆(s, p) = 0

4 Convergence propertyIf s p→ q is a shortest path in A and if d[p] = ∆(s, p) beforerelaxation of p→ q, then d[q] = ∆(s, q) at any time afterwards.

5 Path-relaxation propertyIf π = 〈p0, p1, . . . pk〉 is a shortest path from s = p0 to pk, and thetransitions of π are relaxed in the order p0 → p1, p1 → p2, . . .,pk−1 → pk, then d[pk] = ∆(s, pk).

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 85 / 132

Distance algorithmsDAG-Distance: topological sort

Definition (Topological order)Let A be a finite automaton with state set Q and transition set E.The topological ordering of Q is a sequence q1, q2, . . . , q|Q| such thatwhenever E contains a transition qi → qj , i < j.

NoteOnly the state set of acyclic FSMs can be topologically ordered.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 86 / 132

Distance algorithmsDAG-Distance: topological sort

Definition (Topological sort)Let A be an acyclic WFSM with state set Q and transition set E. A is said tobe topologically sorted if whenever E contains a transition p→ q, then q > p.

Example (Topologically sorted WFSA)

����

����

��

��

����

���

��

���

����

����

���

����

���

���

����

���

���

NotesA cyclic WFSM cannot be topologically sorted.

An acyclic WSFM can be topologically sorted in O(|Q|+ |E|).

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 87 / 132

Distance algorithmsDAG-Distance(A, d)

DAG-Distance(A, d)

Require: WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉 in adjacency list representation.Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(q0, q)1: Topological-Sort(A)2: Initialize-Single-Source(A, q0)3: for each q ∈ Q in topological order do4: for each t ∈ Adj[q] do5: Relax(t)

PropertiesAssumes an adjacency list representation: Adj[q] contains the outgoing transitions of q.

Works only for acyclic WFSA, but then for any semiring (whether bounded or not).

Complexity: O(|Q|+ |E|).

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 88 / 132

Distance algorithmsDAG-Distance: example

Example (DAG-Distance in the tropical semiring)The topological order is: r s t x y z

���

����

����

��

���� ��

���

��

After Initialize-Single-Source

���

����

����

��

��� �

���

��

Exploring state r: d[s] and d[t] become better

���

����

����

��

�� � ��

��

Exploring state s

���

����

����

���

���� ��

��

Exploring state t

���

����

����

���

�� � ���

��

Exploring state x: d[y] becomes better

���

����

����

���

�� � ���

��

Exploring state y: d[z] becomes better

���

����

����

���

�� � ���

��

Exploring state z

���

����

����

���

�� � ���

��

Distances after termination

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 89 / 132

Distance algorithmsDijkstra’s algorithm

Dijkstra(A, d)

Require: WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉 in adjacency list representation.Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(q0, q)1: Initialize-Single-Source(A, q0)2: Enqueue(PQ, q0)3: processed← ∅4: while PQ 6= ∅ do5: p← Extract-Min(PQ)6: if p /∈ processed then7: processed← processed ∪ {p}8: for each 〈p, a, w, q〉 ∈ Adj[p] do9: if q /∈ processed then

10: Relax(〈p, a, w, q〉)11: if q ∈ PQ then12: Update(q, PQ)13: else14: Insert(q, PQ)

PropertiesA bounded semiring is assumed.Complexity: O(|Q|log|Q|+ |E|) (with a priority queue based on a Fibonacci heap).

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 90 / 132

Distance algorithmsDijkstra’s algorithm: example

Example (Dijkstra’s algorithm in the Viterbi semiring)

���

���

������

���

���

������

��

����

��

���������

���

������

After Initialize-Single-Source

���

���

��������

���

���

������

��

����

��

���������

���

������

After extracting state 0

���

���

��������

������

���

������

��

��

����

���

���������

��

������

After extracting state 1

���

���

��������

�������

���

������

���

��

����

��

������

���

���

���

������

After extracting state 2

���

���

��������

�������

���

������

���

��

�������

��

������

������

���

������

After extracting state 3

���

���

��������

�������

���

������

���

��

�������

��

������

������

���

����������

After extracting state 5

���

���

��������

�������

���

������

���

��

�������

��

������

������

���

����������

After extracting state 4

���

���

��������

�������

���

������

���

��

�������

��

������

������

���

����������

After extracting state 6

���

���

��������

�������

���

������

���

��

�������

��

������

������

���

����������

Distances after termination

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 91 / 132

Distance algorithmsBellman-Ford algorithm

Bellman-Ford(A, d)

Require: WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(q0, q).

Returns false, if a negative cost cycle exists, otherwise true.1: Initialize-Single-Source(A, q0)2: for i← 1 to |Q| − 1 do3: for each t ∈ E do4: Relax(t)5: for each 〈p, a, w, q〉 ∈ E do6: if ¬(dist[q] ≤ dist[p]⊗ w) then7: return false8: return true

PropertiesWorks for non-bounded semirings but the weight of each cycle in A must be bounded

Detects non-bounded cycles (for example negative cost cycles in the tropical semiring)

Complexity: O((|Q||E|)).

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 92 / 132

Distance algorithmsBellman-Ford algorithm

Example (Bellman-Ford algorithm in the tropical semiring)

���

����

���

��

���

���

��

��

After Initialize-Single-Source

���

����

���

��

���

���

��

��

After pass 1

���

����

���

��

���

����

After pass 2

���

����

���

��

���

����

After pass 3

���

����

���

� �

���

����

After pass 4

���

����

���

� �

���

����

After termination

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 93 / 132

Distance algorithmsBellman-Ford algorithm: how does algorithm work?

Since we assume a bounded (that is 0-closed) semiring, every shortestpath in A is acyclic.

If A has |Q| states, every acyclic path cannot have more than |Q| − 1transitions.

Path-relaxation property

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 94 / 132

Distance algorithmsBellman-Ford algorithm: why does the negative cost cycle check work?

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 95 / 132

Distance algorithmsMohri’s algorithm for arbitrary semirings

They key idea to generalize the single-source-distance algorithms alreadyseen is, that they all are based on a certain queue discipline, that is, the statesare processed in a certain order:

DAG-distance uses a queue which contains the states in topologicalorder. Every state is processed once.

The queue in Dijkstra’s algorithm contains the states in best-first order.Every state is processed once.

The queue in Bellman-Ford’s algorithm contains the states in any order,but every state is enqueued |Q| − 1 times.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 96 / 132

Distance algorithmsMohri’s algorithm for arbitrary semirings

Single-Source-SR-Distances(A, s, d)

Require: WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉 in adjacency list representation and a source state s.Ensure: A vector d such that ∀q ∈ Q : d[q] = ∆(s, q).1: for each q ∈ Q do2: d[q]← r[q]← 03: d[s]← r[s]← 14: Enqueue(S, s)5: while S 6= ∅ do6: q ← Dequeue(S)7: r′ ← r[q]8: r[q]← 09: for each 〈q, a, w, q′〉 ∈ Adj[q] do

10: rw ← r′ ⊗ w11: if d[q′] 6= d[q′]⊕ rw then12: d[q′]← d[q′]⊕ rw13: r[q′]← r[q′]⊕ rw14: if q′ /∈ S then15: Enqueue(S, q′)16: d[s]← 1

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 97 / 132

Distance algorithmsMohri’s algorithm for arbitrary semirings

NotesLines 1 to 3 initialize d[q] and r[q] for all q ∈ Q\{s} to 0.d[s] and r[q] are initialized with 1.

Lines 5 to 15 define a while loop in which a state queue is processed.The states in the queue may be processed in any order, but the queue mayalso make use of semiring properties (boundedness) or acyclicity of theWFSA.

Lines 10 to 13 define the actual relaxation step. Since we do not requirebounded semirings, the test with ≤ is replaced by 6=.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 98 / 132

Distance algorithmsMohri’s algorithm for arbitrary semirings: Queue disciplinesPossible queue disciplines are:

1 Topological order: the states in the queue are ordered by the in-degreeof a state, that is, the number of incoming transitions. This disciplinemay be chosen in the case of acyclic WFSA.

2 Topological sort order: the queue is ordered by state number. Thisdiscipline may be chosen in the case of acyclic WFSA which aretopologically sorted.

3 Best-first: the states in the queue are ordered by the natural order of theweights in d.

4 Bellman-Ford: every state is enqueued |Q| − 1 times.5 Strongly-connected components: the queue contains subqueues which

hold the states of the strongly-connected components. The main queue isordered by the component graph.

6 LIFO or FIFO7 Arbitrary

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 99 / 132

Distance algorithmsMohri’s algorithm for arbitrary semirings

Example (Tropical semiring)

�����������

���������

����

������

�����

����

State q 0 1 2 3d[q] 0 ∞ ∞ ∞r[q] 0 ∞ ∞ ∞

Queue [ 0 ]

State q 0 1 2 3d[q] 0 0.2 ∞ 0.8r[q] ∞ 0.2 ∞ 0.8

Queue [ 3 1 ]

State q 0 1 2 3d[q] 0 0.2 ∞ 0.8r[q] ∞ 0.2 ∞ ∞

Queue [ 1 ]

State q 0 1 2 3d[q] 0 0.2 0.8 0.5r[q] ∞ ∞ 0.8 0.5

Queue [ 3 2 ]

State q 0 1 2 3d[q] 0 0.2 0.8 0.5r[q] ∞ ∞ 0.8 ∞

Queue [ 2 ]

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 100 / 132

Distance algorithmsMohri’s algorithm for arbitrary semirings: Approximation in non-k-closed semirings

Example

� ������

���

State q 0 1d[q] 1 0r[q] 1 0

State q 0 1d[q] 1 1r[q] 0 1

State q 0 1d[q] 1 1.5r[q] 0 0.5

State q 0 1d[q] 1 1.75r[q] 0 0.25

State q 0 1d[q] 1 1.875r[q] 0 0.125

State q 0 1d[q] 1 1.9375r[q] 0 0.0625

State q 0 1d[q] 1 1.96875r[q] 0 0.03125

State q 0 1d[q] 1 1.984375r[q] 0 0.015625

State q 0 1d[q] 1 1.9921875r[q] 0 0.0078125

State q 0 1d[q] 1 1.9960938r[q] 0 0.00390625

State q 0 1d[q] 1 1.9980469r[q] 0 0.001953125

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 101 / 132

Distance algorithmsComputing best paths

After using one of the shortest-distance algorithms, we calculate the weight ofthe best path in the following way:

Definition (Best path)

Let A = 〈Q,Σ, q0, F, E, λ, ρ〉 be a WFSA with K = 〈W,⊕,⊗, 0, 1〉 as theunderlying semiring.The weight of the best path in A is defined as follows:

best-path(A) =⊕q∈F

(λ⊗∆(q0, q)⊗ ρ(p)

)The actual best path can be reconstructed using the predecessor vector Π.

NoteNote that this corresponds to the master formula, except that the labels alongthe different paths are neglected.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 102 / 132

Distance algorithmsComputing best paths

Example (Best path in the Viterbi semiring)

�����

������

�����

����

�����

�����

����

��������

�����

�������

A

� ������

������

����

������

�������

best-path(A)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 103 / 132

Distance algorithmsComputing best paths: application in morphological disambiguation

Example (Morphological disambiguation in the tropical semiring)

���

���

���

�������

������

��

����

��

� ��

����

����������

�����

���

�� ��

���

����

�����

�����

�����

���� ����� ����� ��

����

����

��������������

����

�����

����

����

�����

������� �� ������ ����

The tropical WFSA above represents two morphological analyses of German word Kindergarten:Kindergarten [NN SemClass=k_g_artef_geb Gender=masc Number=sg Case=nom_acc_dat] <0>Kind/N\er#Garten [NN SemClass=k_raumort Gender=masc Number=sg Case=nom_acc_dat] <12>

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 104 / 132

Distance algorithmsComputing best paths

NotesNote that the notion of a best path is only defined in idempotentsemirings.

But this is a necessary condition, not a sufficient one:the string semiring is idempotent, but a best path need not exist.

Example (Best path operation in the string semiring)

�����

����

��

����

The abstract sum over all paths in this WFSA – that is computing the longest prefix of all paths – isxyz ∧ xzz = x. But there is no successful path for x.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 105 / 132

Shortest-distance algorithmsComputing n-best paths

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 106 / 132

Equivalence transformations

Equivalence transformations

A number of equivalence transformations are defined for WFSAs/WFSTs:

ε-Removal

Determinization

Pushing

Minimization

ε-Normalization (only for WFSTs)

Synchronization (only for WFSTs)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 108 / 132

Equivalence transformationsε-Removal

For weighted automata, the strategy of computing ε-closures does notwork any longer.

Definition (ε-distance)Let A be a WFSM with state set Q, transition set E and alphabet Σ.Let Π(p, x, q) the set of paths from p ∈ Q to q ∈ Q labeled with x ∈ Σ∗.The ε-distance ∆ε(p, q) is defined as:

∆ε(p, q) =⊕

π∈Π(p,ε,q)

w[π]

Define ∆ε(p) as {〈q,∆ε(p, q)〉 | ∃q ∈ Q : ∆ε(p, q) 6= 0}.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 109 / 132

Equivalence transformationsε-Removal

Example (Abigrams)

���

������

��

���

������ �

��

�������

����� �

��������

���

�����������

����������

��

���������

��

������

���������

�����

����

���

���

���

���

���

���

��

�������

��

������������

�����������

���� �������

�� ���

��������

�������

���

���

�����

������

�����

Abigrams representing bigram counts in the real semiring

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 110 / 132

Equivalence transformationsε-Removal

Example (Abigrams after ε-removal)

�������

����

����������������� �������

������

�������

����

�������

������

��

���������

���������

��

������

��

������

��

�������� �������

��

��� ������

����������

��

��������

rmε(Abigrams)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 111 / 132

Equivalence transformationsε-removal algorithm

rmε(A)

Require: WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉 in adjacency list representationwith underlying semiring 〈W,⊕,⊗, 0, 1〉.

Ensure: An equivalent WFSA A′ without ε-transitions1: for each p ∈ Q do2: for each 〈q, w〉 ∈ ∆ε(p) do3: for each 〈q, a, w′, r〉 ∈ Adj[q], a 6= ε do4: E ← E ∪ {〈p, a, w ⊗ w′, r〉}5: if q ∈ F then6: if p ∈ F then7: ρ(p)← ρ(p)⊕ (w ⊗ ρ(q))8: else9: F ← F ∪ {p}

10: ρ(p)← w ⊗ ρ(q)11: E ← E\{〈p, ε, w, q〉 ∈ E}12: Connect(A)13: return A

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 112 / 132

Equivalence transformationsε-removal scheme

������

����������

������

Suppose the ε-distance ∆ε(p, q) between states p and q is w.

Suppose q has an outgoing transition q© a / w′−−−−−→ r© with a ∈ Σ.

Then, a new transition p© a / w⊗w′−−−−−−−→ r© is added.Note that the following boundary cases are possible:p = q, q = r or even p = q = r.Since q may become inaccessible after removal of the ε-transitions(line 11), A is made connected in line 12.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 113 / 132

Equivalence transformationsε-removal schemeSuppose q is a final state. There are two possible cases:

p /∈ F :

� ������������� ⇒

���������� �������������

p ∈ F :

������� ������������� ⇒

������������������ ������������

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 114 / 132

Equivalence transformationsε-removal algorithm: complexity

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 115 / 132

Equivalence transformationsWeighted determinization

Determinization of weighted FSAs and FSTs is based on an extension ofthe classical power set construction.

States in deterministic finite-state machines are constructed as sets ofpairs 〈q, w〉 such that q is a state of the input WFSM and w is a weight,the so called delayed weight.

The underlying semiring is required to be weakly left divisible becausewe try to realize “safe” weights as early as possible.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 116 / 132

Equivalence transformationsWeighted determinization: algorithm

Determinize-WFSA(A1, A2)

Require: WFSA A1 = 〈Q1,Σ, q01 , F1, δ1, σ1, λ, ρ1〉Ensure: WFSA A2 = 〈Q2,Σ, q02 , F2, δ2, σ2, λ, ρ2〉1: Q2 ← F2 ← ∅2: q02 ← {〈q01 , 1〉}3: Enqueue(Queue, q02 )4: while Queue 6= ∅ do5: q2 ← Dequeue(Queue)6: if ∃〈q, w〉 ∈ q2 ∧ q ∈ F1 then7: F2 ← F2 ∪ {q2}8: ρ2(q2)←

⊕q∈F1

w ⊗ ρ1(q)

9: for each a ∈ Σ : 〈p, x〉 ∈ q2 ∧ δ1(p, a) 6= ∅ do10: J2(a)← {〈q, w, q′〉 | q′ ∈ δ1(q, a) ∧ 〈q, w〉 ∈ q2}11: J1(a)← {〈q, w〉 | ∃q′ : 〈q, w, q′〉 ∈ J2(a)}12: σ2(q2, a)←

⊕〈q,w〉∈J1(a)

w ⊗⊕

q′∈δ1(q,a)

σ1(q, a, q′)

13: δ2(q2, a)←⋃

〈q,w,q′〉∈J2(a)

{σ2(q2, a)−1 ⊗ w ⊗ σ1(q, a, q′)}

14: if q2 /∈ Q2 then15: Q2 ← Q2 ∪ {q2}16: Enqueue(Queue, q2)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 117 / 132

Equivalence transformationsWeighted determinization: algorithm

NotesLine 3 enqueues the start state q02 of the deterministic WFSA

Lines 4 to 16 create new states

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 118 / 132

Equivalence transformationsWeighted determinization

Example (Determinization in the tropical semiring)

������

�����

�����

�����

����

���������

��������� ��������������������

����

������������ ���

����

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 119 / 132

Equivalence transformationsWeighted determinization

Example (Determinization in the real semiring)

������

�����

����

�����

����

���������

��������� ��������������������

����

�������������������

�����

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 120 / 132

Equivalence transformationsWeighted determinization: non-determinizable WFSAs

Not every WFSA has an equivalent deterministic WFSA.

That means that the class of deterministic WFSA is properly contained inthe class of non-deterministic WFSA.

Example (Non-determinizable WFSA in the tropical semiring)

������

�����

�����

�����

����

���� �����

��������� ��������������������

���������������������

��������������������� ����

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 121 / 132

Equivalence transformationsWeighted determinization: non-determinizable WFSAs

Example (Non-determinizable WFSA in the probabilistic semiring)

����������

������

�����

�����

������

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 122 / 132

Equivalence transformationsWeighted determinization: twins property

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 123 / 132

Equivalence transformationsWeight Pushing

Definition (Weight Pushing)Let A = 〈Q,Σ, q0, F, E, λ, ρ〉 a WFSA.Define the potential pot[q] for all states q ∈ Q as follows:

pot[q] =⊕

p∈F,π∈Π(q,p)

w[π]⊗ ρ(p)

The equivalent WFSA push-weights(A) is defined as:push-weights(A) = 〈Q,Σ, q0, F, E

′, λ′, ρ′〉, where:

ρ′(q) = pot[q]−1 ⊗ ρ(q),∀q ∈ QE′ = {〈p, a, pot[p]−1 ⊗ w ⊗ pot[q], q〉 | 〈p, a, w, q〉 ∈ E}λ′ = λ⊗ pot[q0]

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 124 / 132

Equivalence transformationsWeight Pushing

Example (Weight Pushing in the tropical semiring)

���

����

���

�����

��

����

��

��

���

�������

�������

����

�� ���

������

����� � �����

A

���

����

���

�����

��

����

��

��

���

�������

�������

����

������

������

����� � �����

push-weights(A)

Consider the transition with weight 0 between states 3 and 5.The new weight w′ of this transition is computed as follows:

w′ = −pot[3] + 0 + pot[5] = −2 + 0 + 5 = 3

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 125 / 132

Equivalence transformationsWeight Pushing

Example (Weight Pushing in the real SR: trigram probabilities)

���

����

���

����

�����

��

���

�����

���

����

�����

���

���������

��������

���������

���������

��� ���

�����

���

����

���

������

�������

����

�����

��������

��

�������

�������

������

�������

���

������

� ��������

������

�������

�������

������� �

���

������

��

���

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 126 / 132

Equivalence transformationsWeight Pushing

Weight pushing is always possible, regardless whether the input WFSAis deterministic or not.In case of deterministic WFSA, weight pushing leads to a canonical ornormal form of the input WFSA A:

I For every x ∈ Σ∗ there is at most a single path in A.I On that path, the path weight is pushed as much as possible towards the

start state.

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 127 / 132

Equivalence transformationsWeighted minimization

Weighted-Minimization(A)

Require: A deterministic WFSA A = 〈Q,Σ, q0, F, E, λ, ρ〉Ensure: A to A equivalent minimal WFSA A′

1: Push-Weights(A)2: Encode(A,EM)3: Unweighted-Minimization(A)4: Decode(A,EM)5: return A

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 128 / 132

Equivalence transformationsWeighted minimization

Example (Weight minimization in the tropical semiring)

���

����

���

�����

��

����

��

��

���

�������

�������

����

������

������

����� � �����

A

���

����

���

�����

��

����

��

��

���

�������

�������

����

������

������

����� � �����

Weighted-Minimization(A)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 129 / 132

Outline

6 Literature

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 130 / 132

Literature

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 131 / 132

Versions

31.10.2008: version 0.1 (initial version)

14.11.2008: version 0.2 (added shortest-distance algorithms)

23.11.2008: version 0.3 (changed Dijkstra algorithm, added shortest pathproperties, added ε-filter example)

1.12.2008: version 0.4 (added equivalence transformation section)

Thomas Hanneforth (Universitat Potsdam) Finite-state Machines: Theory and Applications December 10, 2008 132 / 132