Finite State Transducers

Mark Stamp



Page 2: Finite State Automata

FSA states and transitions
o Represented as labeled directed graphs
o FSA has one label per edge
States are circles:
o Double circles for end states
Beginning state:
o Denoted by arrowhead
o Or, sometimes a bold circle is used


Page 3: FSA Example

Nodes are states
Transitions are (labeled) arrows
For example…


[Figure: FSA with states 1, 2, 3; edges labeled a, c, y, z]
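To make the states-and-transitions picture concrete, here is a minimal Python sketch of an FSA as a transition table. The particular machine is hypothetical (the figure's full topology is not recoverable from the slide), but the acceptance check is the standard one.

```python
# Minimal FSA sketch: transitions stored as a dict keyed by (state, symbol).
# The machine below is hypothetical, loosely based on the figure's labels.
def fsa_accepts(transitions, start, finals, s):
    """Run string s through the FSA; accept iff we end in a final state."""
    state = start
    for ch in s:
        if (state, ch) not in transitions:
            return False        # no edge for this symbol: reject
        state = transitions[(state, ch)]
    return state in finals

# States 1, 2, 3 with edges 1 -a-> 2 and 2 -c-> 3; state 3 is final
trans = {(1, "a"): 2, (2, "c"): 3}
print(fsa_accepts(trans, 1, {3}, "ac"))   # True
print(fsa_accepts(trans, 1, {3}, "a"))    # False
```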

Page 4: Finite State Transducer

FST has input & output labels on edges
o That is, 2 labels per edge
o Can be more labels (e.g., edge weights)
o Recall, FSA has one label per edge
FST represented as a directed graph
o Same symbols used as for FSA
o FSTs may be useful in malware analysis…

Page 5: Finite State Transducer

FST has input and output "tapes"
o Transducer, i.e., can map input to output
o Often viewed as a "translating" machine
o But somewhat more general
FST is a finite automaton with output
o Usual finite automaton only has input
o Used in natural language processing (NLP)
o Also used in many other applications

Page 6: FST Graphically

Edges/transitions are (labeled) arrows
o Of the form i : o, that is, input:output
Nodes labeled numerically
For example…


[Figure: FST with states 1, 2, 3; edges labeled a:b, c:d, y:q, z:x]

Page 7: FST Modes

FST usually viewed as a translating machine
But FST can operate in several modes
o Generation
o Recognition
o Translation (left-to-right or right-to-left)
Examples of modes considered next…


Page 8: FST Modes

Consider this simple example:
Generation mode
o Write equal number of a and b to first and second tape, respectively
Recognition mode
o "Accept" when 1st tape has same number of a as 2nd tape has b
Translation mode on next slide


[Figure: one-state FST with self-loop labeled a:b]

Page 9: FST Modes

Consider this simple example:
Translation mode
o Left-to-right: for every a read from 1st tape, write b to 2nd tape
o Right-to-left: for every b read from 2nd tape, write a to 1st tape
Translation is the mode we usually want to consider


[Figure: one-state FST with self-loop labeled a:b]
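A left-to-right run of this one-state machine is just a loop over the input tape. The sketch below uses assumed names (not from the slides) and maps each symbol through the a:b rule.

```python
# Left-to-right translation mode for the one-state a:b machine:
# every "a" read from the input tape writes a "b" to the output tape.
def translate(s, rules):
    out = []
    for ch in s:
        if ch not in rules:
            raise ValueError(f"no transition for input symbol {ch!r}")
        out.append(rules[ch])
    return "".join(out)

print(translate("aaa", {"a": "b"}))   # bbb
```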

Page 10: WFST

WFST == Weighted FST
o Include a "weight" on each edge
o That is, edges of the form i : o / w
Often, probabilities serve as weights…


[Figure: WFST with states 1, 2, 3; edges labeled a:b/1, c:d/0.6, y:q/1, z:x/0.4]
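One convenient in-memory form for a WFST is a flat edge list. In the sketch below the placement of edges between states is a guess; only the labels and weights come from the figure.

```python
# WFST edges as (src, dst, inp, out, weight) tuples; state placement is
# hypothetical, the labels/weights follow the figure.
edges = [
    (1, 2, "a", "b", 1.0),
    (2, 3, "c", "d", 0.6),
    (2, 3, "z", "x", 0.4),
    (3, 3, "y", "q", 1.0),
]

# When weights are probabilities, the weights leaving a state sum to 1;
# here the two alternatives out of state 2 give 0.6 + 0.4 = 1.0.
out_of_2 = sum(w for (src, _, _, _, w) in edges if src == 2)
print(out_of_2)   # 1.0
```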

Page 11: FST Example

Homework…


Page 12: Operations on FSTs

Many well-defined operations on FSTs
o Union, intersection, composition, etc.
o These also apply to WFSTs
Composition is especially interesting
In malware context, might want to…
o Compose detectors for same family
o Compose detectors for different families
Why might this be useful?


Page 13: FST Composition

Compose 2 FSTs (or WFSTs)
o Suppose 1st WFST has nodes 1,2,…,n
o Suppose 2nd WFST has nodes 1,2,…,m
o Possible nodes in composition labeled (i,j), for i = 1,2,…,n and j = 1,2,…,m
o Generally, not all of these will appear
Edge from (i1,j1) to (i2,j2) only when composed labels "match" (next slide…)


Page 14: FST Composition

Suppose we have the following labels
o In 1st WFST, edge from i1 to i2 is x:y/p
o In 2nd WFST, edge from j1 to j2 is w:z/q
Consider nodes (i1,j1) and (i2,j2) in composed WFST
o Edge between nodes provided y == w
o I.e., output from 1st matches input to 2nd
o And, resulting edge label is x:z/pq

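The matching rule above can be sketched directly on edge lists. This is a deliberately simplified composition (no final-state pruning, no epsilon handling), run on tiny hypothetical edges rather than the slide's machines.

```python
# Simplified (W)FST composition on edge lists: x:y/p composed with w:z/q
# yields x:z/(p*q) exactly when y == w; composed states are pairs (i, j).
def compose(edges1, edges2):
    out = []
    for (i1, i2, x, y, p) in edges1:
        for (j1, j2, w, z, q) in edges2:
            if y == w:                  # output of 1st matches input of 2nd
                out.append(((i1, j1), (i2, j2), x, z, p * q))
    return out

e1 = [(1, 2, "a", "b", 0.1)]                          # x:y/p = a:b/0.1
e2 = [(1, 2, "b", "b", 0.1), (1, 2, "c", "d", 0.5)]   # only b:b matches
result = compose(e1, e2)
print(result[0][2:4], round(result[0][4], 2))   # ('a', 'b') 0.01
```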

Page 15: WFST Composition

Consider composition of WFSTs

And…


[Figure: two WFSTs, each with states 1, 2, 3, 4; edge labels include a:b/0.1, a:b/0.2, b:b/0.3, a:b/0.5, a:a/0.6, b:b/0.4, b:b/0.1, a:b/0.3, b:a/0.5, a:b/0.4, b:a/0.2]

Page 16: WFST Composition Example


[Figure: the two WFSTs from the previous slide, and their composition with nodes (1,1), (1,2), (2,2), (3,2), (4,2), (4,3), (4,4) and edges a:b/.01, a:a/.02, a:a/.04, b:a/.06, b:a/.08, a:a/.1, a:b/.18, a:b/.24]

Page 17: WFST Composition

In the previous example, the composition is…

But the (4,3) node is useless
o Must always end in a final state


[Figure: the composed WFST, repeated from the previous slide]

Page 18: FST Approximation of HMM

Why would we want to approximate an HMM by an FST?
o Faster scoring using FST
o Easier to correct misclassification in FST
o Possible to compose FSTs
o Most important, it's really cool and fun…
Down side?
o FST may be less accurate than the HMM


Page 19: FST Approximation of HMM

How to approximate an HMM by an FST?
We consider 2 methods, known as
o n-type approximation
o s-type approximation
These usually focus on "problem 2"
o That is, uncovering the hidden states
o This is the usual concern in NLP, such as "part of speech" tagging

Page 20: n-type Approximation

Let V be the distinct observations in the HMM
o Let λ = (A,B,π) be a trained HMM
o Recall, A is N x N, B is N x M, π is 1 x N
Let (input : output / weight) = (Vi : Sj / p)
o Where i ∈ {1,2,…,M} and j ∈ {1,2,…,N}
o And Sj are hidden states (rows of B)
o And weight is max probability (from λ)
Examples later…

Page 21: More n-type Approximations

Range of n-type approximations
o n0-type only uses the B matrix
o n1-type see previous slide
o n2-type for 2nd order HMM
o n3-type for 3rd order HMM, and so on
What is a 2nd order HMM?
o Transitions depend on 2 consecutive states
o In 1st order, only depend on previous state

Page 22: s-type Approximation

"Sentence type" approximation
Use sequences and/or natural breaks
o In n-type, max probability over one transition using A and B matrices
o In s-type, all sequences up to some length
Ideally, break at boundaries of some sort
o In NLP, a sentence is such a boundary
o For malware, not so clear where to break
o So in malware, maybe just use a fixed length


Page 23: HMM to FST

Exact representation also possible
o That is, resulting FST is "same" as HMM
Given model λ = (A,B,π)
Nodes for each (input : output) = (Vi : Sj)
o Edge from each node to all other nodes…
o …including loop to same node
o Edges labeled with target node
o Weights computed from probabilities in λ

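The node/edge bookkeeping above is easy to sketch. This is structure only (the weights from λ are omitted, since computing them is left as homework later in the deck), and the 2-coin observation/state names are assumed from the example that follows.

```python
# Structure-only sketch of the exact HMM-to-FST construction: one node per
# (observation : state) pair plus an initial node, with an edge from every
# node to every non-initial node (weights from lambda omitted).
from itertools import product

obs, states = ["H", "T"], ["F", "U"]          # 2-coin example names
nodes = ["init"] + [f"{v}:{s}" for v, s in product(obs, states)]
edges = [(src, dst) for src in nodes for dst in nodes if dst != "init"]

print(len(nodes))   # 5
print(len(edges))   # 20
```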

Page 24: HMM to FST

Note that some probabilities may be 0
o Remove edges with 0 probabilities
A lot of probabilities may be small
o So, maybe approximate by removing edges with "small" probabilities?
o Could be an interesting experiment…
o A reasonable way to approximate an HMM that does not seem to have been studied

Page 25: HMM Example

Suppose we have 2 coins
o 1 coin is fair and 1 unfair
o Roll a die to decide which coin to flip
o We see resulting sequence of H and T
o We do not know which coin was flipped…
o …and we do not see the roll of the die
Observations? Hidden states?

Page 26: HMM Example

Suppose probabilities are as given
o Then what is λ = (A,B,π)?


[Figure: two states, fair and unfair; fair stays fair with 0.9, goes to unfair with 0.1, and emits H/0.5, T/0.5; unfair stays unfair with 0.2, goes to fair with 0.8, and emits H/0.7, T/0.3. Observations: H, T. Hidden states: fair, unfair]

Page 27: HMM Example

HMM is given by λ = (A,B,π), where

A = |0.9  0.1|    B = |0.5  0.5|    π = [1.0  0.0]
    |0.8  0.2|        |0.7  0.3|

This π implies we start in the F (fair) state
o Also, state 1 is F and state 2 is U (unfair)
Suppose we observe HHTHT
o Then probability of, say, FUFFU is
  πF bF(H) aFU bU(H) aUF bF(T) aFF bF(H) aFU bU(T)
  = 1.0(0.5)(0.1)(0.7)(0.8)(0.5)(0.9)(0.5)(0.1)(0.3) = 0.000189

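The product above is easy to check numerically. A sketch, with A, B, π as implied by the slide's computation (the function name is mine, not the slides'):

```python
# Probability of hidden path FUFFU given observations HHTHT, computed as
# pi * b * (a * b) * (a * b) * ... exactly as in the product above.
A  = {"F": {"F": 0.9, "U": 0.1}, "U": {"F": 0.8, "U": 0.2}}
B  = {"F": {"H": 0.5, "T": 0.5}, "U": {"H": 0.7, "T": 0.3}}
pi = {"F": 1.0, "U": 0.0}

def path_prob(path, observed):
    """Joint probability of a hidden-state path and an observation string."""
    p = pi[path[0]] * B[path[0]][observed[0]]
    for t in range(1, len(observed)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][observed[t]]
    return p

print(round(path_prob("FUFFU", "HHTHT"), 6))   # 0.000189
```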

Page 28: HMM Example

We have

A = |0.9  0.1|    B = |0.5  0.5|    π = [1.0  0.0]
    |0.8  0.2|        |0.7  0.3|

And observe HHTHT
o Probabilities in table

state   score     probability
FFFFF   .020503   .664086
FFFFU   .001367   .044272
FFFUF   .002835   .091824
FFFUU   .000425   .013774
FFUFF   .001215   .039353
FFUFU   .000081   .002624
FFUUF   .000387   .012243
FFUUU   .000057   .001836
FUFFF   .002835   .091824
FUFFU   .000189   .006122
FUFUF   .000392   .012697
FUFUU   .000059   .001905
FUUFF   .000378   .012243
FUUFU   .000025   .000816
FUUUF   .000118   .003809
FUUUU   .000018   .000571

Page 29: HMM Example

So, most likely state sequence is
o FFFFF
o Solves problem 2
Problem 1, scoring?
o Next slide
Problem 3?
o Not relevant here


Page 30: HMM Example

How to score sequence HHTHT?
Sum over all states
o Sum the "score" column in table: P(HHTHT) = .030874
o Forward algorithm is way more efficient

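As a check on the table, the forward algorithm below (a standard implementation sketch, not code from the slides) reproduces P(HHTHT) without enumerating every hidden path.

```python
# Forward algorithm for the 2-coin HMM: scores an observation sequence in
# O(N^2 T) time by summing over hidden paths incrementally.
A  = [[0.9, 0.1],          # rows/cols: F, U
      [0.8, 0.2]]
B  = [[0.5, 0.5],          # rows: F, U; cols: H, T
      [0.7, 0.3]]
pi = [1.0, 0.0]
sym = {"H": 0, "T": 1}

def forward_score(observed):
    o = [sym[c] for c in observed]
    alpha = [pi[i] * B[i][o[0]] for i in range(2)]          # t = 0
    for t in range(1, len(o)):                              # t = 1..T-1
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o[t]]
                 for j in range(2)]
    return sum(alpha)

print(round(forward_score("HHTHT"), 6))   # 0.030874
```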

Page 31: n-type Approximation

Consider the 2-coin HMM with

A = |0.9  0.1|    B = |0.5  0.5|    π = [1.0  0.0]
    |0.8  0.2|        |0.7  0.3|

For each observation, only include the most probable hidden state
o So, only possible FST labels in this case are H:F/w1, H:U/w2, T:F/w3, T:U/w4
o Where weights wi are probabilities

Page 32: n-type Approximation

Consider example

A = |0.9  0.1|    B = |0.5  0.5|    π = [1.0  0.0]
    |0.8  0.2|        |0.7  0.3|

For each observation, most probable state
o Weight is probability


[Figure: n-type FST with states 1, 2, 3; from state 1, edges H:F/0.5 and T:F/0.5; thereafter, edges H:F/0.45 and T:F/0.45]
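The weights in the figure follow from a simple argmax. A sketch with the 2-coin values (function names are mine, not the slides'):

```python
# n1-type weights for the 2-coin HMM: for each observation keep only the
# most probable hidden state, weighted by that maximum probability.
A  = [[0.9, 0.1],          # rows/cols: F, U
      [0.8, 0.2]]
B  = [[0.5, 0.5],          # rows: F, U; cols: H, T
      [0.7, 0.3]]
pi = [1.0, 0.0]
states, sym = ["F", "U"], {"H": 0, "T": 1}

def best_initial(o):
    """Most probable first state for observation o, and its probability."""
    probs = [pi[i] * B[i][sym[o]] for i in range(2)]
    k = max(range(2), key=probs.__getitem__)
    return states[k], probs[k]

def best_next(prev, o):
    """Most probable next state after prev for observation o."""
    i0 = states.index(prev)
    probs = [A[i0][i] * B[i][sym[o]] for i in range(2)]
    k = max(range(2), key=probs.__getitem__)
    return states[k], probs[k]

print(best_initial("H"))    # ('F', 0.5)
print(best_next("F", "T"))  # ('F', 0.45)
```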

Page 33: n-type Approximation

Suppose instead…

A =    B =    π =

Most probable state for each observation?
o Weight is probability

[Figure: n-type FST with states 1, 2, 3, 4; edges include H:U/0.42, H:U/0.35, H:F/0.30, T:F/0.30, T:F/0.25, T:F/0.20]

Page 34: HMM as FST

Consider 2-coin HMM where

A =    B =    π =

Then FST nodes correspond to…
o Initial state
o Heads from fair coin (H:F)
o Tails from fair coin (T:F)
o Heads from unfair coin (H:U)
o Tails from unfair coin (T:U)


Page 35: HMM as FST

Suppose HMM is specified by

A =    B =    π =

Then FST is…


[Figure: exact FST with initial state 1 and states 2 (H:F), 3 (T:F), 4 (H:U), 5 (T:U); each state has edges to all of states 2-5, labeled H:F, T:F, H:U, or T:U according to the target]

Page 36: HMM as FST

This FST is boring and not very useful
o Weights make it a little more interesting
Computing the weights is homework…


[Figure: the same FST as on the previous slide]

Page 37: Why Consider FSTs?

FST used as "translating machine"
Well-defined operations on FSTs
o Composition is an interesting example
Can convert HMM to FST
o Either exact or approximation
o Approximations may be much simplified, but might not be as accurate
Advantages of FST over HMM?

Page 38: Why Consider FSTs?

Scoring/translating faster with FST
Able to compose multiple FSTs
o Where FSTs may be derived from HMMs
One idea…
o Multiple HMMs trained on malware (same family and/or different families)
o Convert each HMM to FST
o Compose resulting FSTs


Page 39: Bottom Line

Can we get best of both worlds?
o Fast scoring, composition with FSTs
o Simplify/approximate HMMs via FSTs
o Tweak FST to improve scoring
o Efficient training using HMMs
Other possibilities?
o Directly compute an FST without an HMM
o Or FST as a first pass (e.g., disassembly?)

Page 40: References

A. Kempe, Finite state transducers approximating hidden Markov models

J. R. Novak, Weighted finite state transducers: Important algorithms

K. Striegnitz, Finite state transducers
