MCS680: Foundations Of Computer Science

Brian Mitchell ([email protected]) - Drexel University MCS680-FCS

Patterns, Automata & Regular Expressions

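    /* graph is a size x size matrix of edge weights; n = size */
    int MSTWeight(int size, int graph[size][size])
    {
        int i, j;
        int weight = 0;                    /* O(1) */

        for (i = 0; i < size; i++)         /* runs n times: O(n) */
            for (j = 0; j < size; j++)     /* runs n times per pass: O(n) */
                weight += graph[i][j];

        return weight;                     /* O(1) */
    }

Running Time = 2O(1) + O(n^2) = O(n^2)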

Introduction

• A pattern is a set of objects with some recognizable property
  – Programming language identifiers
    • Start with a letter, which may be followed by zero or more letters, digits or underscores
    • [a-z|A-Z][a-z|A-Z|0-9|‘_’]*

• Problems with patterns
  – Definition of patterns
  – Recognition of patterns

• Uses for patterns
  – Programming language design
  – Circuit design
  – Text editors
    • Searching for words
  – Operating system command processors
    • dir *.exe
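
The identifier pattern above can be recognized with a few lines of C. This is only a sketch: the function name is_identifier is hypothetical, and it assumes the default "C" locale, where isalpha and isalnum cover exactly a-z, A-Z and 0-9.

```c
#include <ctype.h>

/* Returns 1 if s matches [a-zA-Z][a-zA-Z0-9_]*, otherwise 0. */
int is_identifier(const char *s)
{
    /* The first character must be a letter (this also rejects ""). */
    if (!isalpha((unsigned char)*s))
        return 0;

    /* Every remaining character must be a letter, a digit or '_'. */
    for (s++; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    }
    return 1;
}
```

For example, is_identifier("count_2") returns 1, while is_identifier("2count") returns 0 because the first character is a digit.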

State Machines and Automata

• Programs that search for patterns in data often have a special structure

• They track progress toward an overall goal
  – Manage states

• The overall behavior of the program can be viewed as moving from state to state as it reads its input

• We can use a graph to represent the behavior of programs that search for patterns in data
  – The graph is called an automaton
  – Special nodes
    • Start node
    • Accepting nodes (may be more than 1)

• Edges are called transitions

• The input is not accepted if we are not at an accepting node after all of the input is read

Example Automaton (Finite State Machine)

• Consider an automaton to recognize a sequence of characters that contains the characters ‘aeiou’ in order
  – Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, a-z and A-Z
    • Sigma (Σ) is also used in many texts to represent the alphabet

[Figure: automaton with states 0-5; transitions 0→1 on a, 1→2 on e, 2→3 on i, 3→4 on o, 4→5 on u; states 0-4 loop on Λ-a, Λ-e, Λ-i, Λ-o and Λ-u respectively]
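
The automaton can be coded by keeping the current state in an integer. The sketch below makes three assumptions that the slide does not state: the match is case-sensitive, state 5 keeps accepting whatever letters follow, and any character outside a-z and A-Z rejects the input. The name has_vowels_in_order is hypothetical.

```c
#include <ctype.h>

/* Returns 1 if s contains 'a', 'e', 'i', 'o', 'u' in that order
   (not necessarily adjacent), otherwise 0. */
int has_vowels_in_order(const char *s)
{
    static const char next[] = "aeiou"; /* next[k] advances state k */
    int state = 0;                      /* start state */

    for (; *s != '\0'; s++) {
        if (!isalpha((unsigned char)*s))
            return 0;                   /* outside the alphabet: "die" */
        if (state < 5 && *s == next[state])
            state++;                    /* labeled edge k -> k+1 */
        /* any other letter: self-loop, state unchanged */
    }
    return state == 5;                  /* state 5 is the accepting state */
}
```

With this sketch, "facetious" is accepted and "education" is rejected, because its vowels do not appear in the required order.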

Example Automaton (Finite State Machine)

• Consider an automaton to recognize a signed integer
  – May begin with a ‘+’, a ‘-’ or a digit in the range 0-9
  – Followed by zero or more occurrences of the digits 0-9
  – Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, 0-9

[Figure: signed-integer automaton with states 0-3, including transitions labeled ‘+’ and ‘-’]
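
A signed-integer recognizer can be written as a small switch-based state machine. This sketch does not reproduce the four-state diagram on the slide; it uses its own hypothetical states (START, SIGN, DIGITS, DEAD) and assumes that at least one digit is required, so a lone ‘+’ or ‘-’ is rejected.

```c
#include <ctype.h>

enum { START, SIGN, DIGITS, DEAD };   /* hypothetical state names */

/* Returns 1 if s is an optional '+' or '-' followed by one or more digits. */
int is_signed_integer(const char *s)
{
    int state = START;

    for (; *s != '\0' && state != DEAD; s++) {
        int digit = isdigit((unsigned char)*s);
        switch (state) {
        case START:
            if (*s == '+' || *s == '-') state = SIGN;
            else if (digit)             state = DIGITS;
            else                        state = DEAD;
            break;
        case SIGN:
        case DIGITS:
            state = digit ? DIGITS : DEAD;   /* no transition: "die" */
            break;
        }
    }
    return state == DIGITS;   /* only DIGITS is accepting */
}
```

For example, is_signed_integer("+42") returns 1 and is_signed_integer("12a") returns 0.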

Deterministic and Nondeterministic Automata

• Deterministic Automata
  – For any state s and any input x there is at most one transition out of state s whose label includes x
  – Simulating deterministic automata
    • Given that we are in state s and the next input is x
      – We either transition out of state s or,
      – We “die” at state s
  – Easy to convert a deterministic automaton into a program

• Nondeterministic Automata
  – Nondeterministic automata are allowed (but not required) to have two or more transitions containing the same symbol out of the same state
  – Nondeterministic automata are allowed to have ε-moves (e-moves), where we transition out of a state without reading any input (the edge is labeled with the empty string, ε)

Nondeterministic Automata Example

• Consider the language that accepts strings ending with “man”
  – Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, a-z and A-Z
    • There is an error in your book's representation

[Figure: an NFA and a DFA for strings ending with “man”, each with states 0-3 and transitions 0→1 on m, 1→2 on a, 2→3 on n]
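
An NFA cannot be followed one state at a time, but it can be simulated by tracking the set of states it might be in. The sketch below does this for the “man” NFA with one bit per state. It assumes the NFA edges are 0→0 on any letter, 0→1 on m, 1→2 on a and 2→3 on n, with state 3 accepting; the function name is hypothetical.

```c
#include <ctype.h>

/* Simulates the NFA by tracking the set of states it could be in.
   Bit k of `current` is set when the NFA may be in state k. */
int nfa_ends_with_man(const char *s)
{
    unsigned current = 1u << 0;               /* start in {0} */

    for (; *s != '\0'; s++) {
        unsigned next = 0;
        if (!isalpha((unsigned char)*s))
            return 0;                         /* not in the alphabet */
        if (current & (1u << 0)) {
            next |= 1u << 0;                  /* 0 -> 0 on any letter */
            if (*s == 'm') next |= 1u << 1;   /* 0 -> 1 on 'm' */
        }
        if ((current & (1u << 1)) && *s == 'a')
            next |= 1u << 2;                  /* 1 -> 2 on 'a' */
        if ((current & (1u << 2)) && *s == 'n')
            next |= 1u << 3;                  /* 2 -> 3 on 'n' */
        current = next;
    }
    return (current & (1u << 3)) != 0;        /* state 3 is accepting */
}
```

The input is accepted when the final set contains the accepting state, so "woman" is accepted and "many" is not.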

Nondeterministic Automata Example

• Consider the language that accepts zero or more occurrences of the sub-strings:
  – {ab} or {aba}
  – L = [ab|aba]* (regular expression)

[Figure: a DFA with states 0-4 and two alternative NFAs with states 0-2 (one of them using an ε-move) for [ab|aba]*]

Deterministic and Nondeterministic Automata

• Deterministic automata are easy to code because all possible transitions are accounted for
  – From every possible state, every possible input must be accounted for
  – Makes the state machine tough to construct

• Nondeterministic automata are simpler to construct; however, they cannot be directly coded due to the nondeterminism
  – Not every possible input needs to be accounted for at every possible state
  – Input is not accepted if a transition out of a state is not defined
  – May use ε-moves to move to new states in the absence of input

• Nondeterministic automata can be converted to deterministic automata by using the subset construction method

Subset Construction

• Elimination of the nondeterminism from an automaton

• Given a nondeterministic automaton:
  – Build a new starting state by expanding ε-moves
  – Start at the starting state
  – Build new states based on allowable transitions out of the starting state
    • Treat new states as sets of states from the original NFA
  – Take the new states (from above) and build more new states by considering the allowable transitions out of each of them
  – Continue this process until no new states are developed
  – Any state (which is a set of states from the original NFA) that contains an accepting state of the NFA is also an accepting state
  – Construct the resultant DFA
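
The steps above can be sketched in C by representing each set of NFA states as a bitmask. The example is hard-wired to an NFA without ε-moves that accepts strings ending with “man” (edges 0→0 on any letter, 0→1 on m, 1→2 on a, 2→3 on n). The four symbol classes and all names are assumptions made for the illustration.

```c
#define NFA_STATES 4
#define SYMBOLS    4   /* 0:'m' 1:'a' 2:'n' 3:any other letter */
#define MAX_DFA    (1 << NFA_STATES)

/* nfa[s][c] is the bitmask of states reachable from s on symbol class c. */
static const unsigned nfa[NFA_STATES][SYMBOLS] = {
    { 0x1 | 0x2, 0x1, 0x1, 0x1 },  /* 0: loops on everything, 'm' also -> 1 */
    { 0,         0x4, 0,   0   },  /* 1: 'a' -> 2 */
    { 0,         0,   0x8, 0   },  /* 2: 'n' -> 3 */
    { 0,         0,   0,   0   },  /* 3: accepting, no outgoing edges */
};

unsigned dfa_state[MAX_DFA];          /* each DFA state is a set of NFA states */
int      dfa_next[MAX_DFA][SYMBOLS];  /* DFA transition table (state indices) */
int      dfa_count;

/* Returns the index of `set` in dfa_state, adding it if it is new. */
static int find_or_add(unsigned set)
{
    int i;
    for (i = 0; i < dfa_count; i++)
        if (dfa_state[i] == set)
            return i;
    dfa_state[dfa_count] = set;
    return dfa_count++;
}

/* Builds the DFA; returns the number of DFA states discovered. */
int subset_construction(void)
{
    int i, c, s;
    dfa_count = 0;
    find_or_add(1u << 0);                      /* start state {0} */

    /* dfa_count grows while we scan, so this loop acts as the worklist. */
    for (i = 0; i < dfa_count; i++) {
        for (c = 0; c < SYMBOLS; c++) {
            unsigned target = 0;
            for (s = 0; s < NFA_STATES; s++)
                if (dfa_state[i] & (1u << s))
                    target |= nfa[s][c];       /* union of the NFA moves */
            dfa_next[i][c] = find_or_add(target);
        }
    }
    return dfa_count;
}

/* A DFA state accepts if its set contains NFA state 3. */
int dfa_accepting(int i)
{
    return (dfa_state[i] & (1u << 3)) != 0;
}
```

Calling subset_construction() on this NFA discovers four DFA states, the bitmasks for {0}, {0,1}, {0,2} and {0,3}, and only the last one is accepting. An NFA with ε-moves would also need each target set expanded by its ε-closure, which this sketch leaves out.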

Subset Construction (Example) - No ε-moves

• Consider the NFA that accepts as input all strings that end with the substring “man”
  – Begin at the starting state (state 0): {0}
    • From state 0 we stay at state 0 for any letter other than ‘m’: ({0}, -m, {0})
      – We already have state {0}
    • We go to state 0 or state 1 with the letter ‘m’: ({0}, m, {0,1})
      – We create the new state {0,1}
  – From state {0,1}:
    • If we get an ‘a’ we go to state 2 (from state 1) or state 0 (from state 0)
      – ({0,1}, a, {0,2}) - A new state
    • If we get an ‘m’ we go to state 0 or state 1 (from state 0); there is nowhere to go from state 1
      – ({0,1}, m, {0,1}) - Already have {0,1}

[Figure: NFA with states 0-3 and transitions 0→1 on m, 1→2 on a, 2→3 on n]

Subset Construction (Example)

• Subset construction example continued
  – From state {0,1} (cont'd)
    • Anything besides an ‘a’ or ‘m’: we go to state 0 (from state 0); there is nowhere to go from state 1
      – ({0,1}, -a-m, {0}) - Already have {0}
  – From state {0,2}
    • If we get an ‘n’ we go to state 3 (from state 2) or state 0 (from state 0)
      – ({0,2}, n, {0,3}) - A new state
    • If we get an ‘m’ we go to state 0 or state 1 (from state 0); there is nowhere to go from state 2
      – ({0,2}, m, {0,1}) - Already have {0,1}
    • Anything besides an ‘n’ or ‘m’: we go to state 0 (from state 0); there is nowhere to go from state 2
      – ({0,2}, -n-m, {0}) - Already have {0}

[Figure: NFA with states 0-3 and transitions 0→1 on m, 1→2 on a, 2→3 on n]

Subset Construction (Example)

• Subset construction example continued
  – From state {0,3}
    • If we get an ‘m’ we go to state 0 or state 1 (from state 0); there is nowhere to go from state 3
      – ({0,3}, m, {0,1}) - Already have {0,1}
    • Anything besides an ‘m’: we go to state 0 (from state 0); there is nowhere to go from state 3
      – ({0,3}, -m, {0}) - Already have {0}

• There are no new states

• Recap on the set of found states
  – {0}, {0,1}, {0,2}, {0,3}

• State {0} is the starting state

• State {0,3} is the only accepting state
  – It is the only state containing an accepting state from the original NFA

[Figure: NFA with states 0-3 and transitions 0→1 on m, 1→2 on a, 2→3 on n]

Subset Construction (Example)

• Now let's recall the transitions that we discovered
  – ({0}, -m, {0}), ({0}, m, {0,1}), ({0,1}, a, {0,2}), ({0,1}, m, {0,1}), ({0,1}, -a-m, {0}), ({0,2}, n, {0,3}), ({0,2}, m, {0,1}), ({0,2}, -n-m, {0}), ({0,3}, m, {0,1}), ({0,3}, -m, {0})

[Figure: the resulting DFA with states {0}, {0,1}, {0,2}, {0,3}; the edges m, a, n lead from {0} to {0,3}, every state goes to {0,1} on m, and all other input returns to {0}]
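
Because the result is deterministic, the transitions listed above translate directly into a lookup table. This is a minimal sketch: the DFA states {0}, {0,1}, {0,2}, {0,3} are numbered 0 to 3, the columns m, a, n and "other" are an assumed encoding of the input, and the function names are hypothetical.

```c
#include <ctype.h>

/* Rows: DFA states {0}, {0,1}, {0,2}, {0,3} numbered 0..3.
   Columns: 'm', 'a', 'n', any other letter. */
static const int next_state[4][4] = {
    /*            m  a  n  other */
    /* {0}   */ { 1, 0, 0, 0 },
    /* {0,1} */ { 1, 2, 0, 0 },
    /* {0,2} */ { 1, 0, 3, 0 },
    /* {0,3} */ { 1, 0, 0, 0 },
};

/* Maps a letter to its column in next_state. */
static int column(char c)
{
    switch (c) {
    case 'm': return 0;
    case 'a': return 1;
    case 'n': return 2;
    default:  return 3;
    }
}

/* Returns 1 if s consists of letters and ends with "man". */
int dfa_ends_with_man(const char *s)
{
    int state = 0;                       /* start state {0} */
    for (; *s != '\0'; s++) {
        if (!isalpha((unsigned char)*s))
            return 0;                    /* outside the alphabet */
        state = next_state[state][column(*s)];
    }
    return state == 3;                   /* {0,3} is the only accepting state */
}
```

Each input character costs one table lookup, which is the practical payoff of removing the nondeterminism.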

Subset Construction (Example) NFA with ε-moves

• Construct the DFA from the NFA

• Notice how the language only consists of two characters {a,b}

• Step 1: Build the new start state by expanding the ε-moves
  – From state 0 we can reach states 1, 2, 3 by ε-moves
  – Thus the new “logical” start state is {0,1,2,3}
  – State {0,1,2,3}:
    • Input ‘a’: get to states {0,1,2,3,4} - New state
    • Input ‘b’: get to states {2,3,4} - New state

[Figure: NFA with states 0-4, transitions labeled a and b, and four ε-moves]
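
Expanding ε-moves amounts to computing an ε-closure: keep adding states reachable by ε-moves until the set stops growing. The slide's diagram is not reproduced here, so the ε-edges in this sketch (0→1, 1→2, 2→3, 4→3) are hypothetical, chosen only so that the closure of {0} comes out as {0,1,2,3} as stated above.

```c
#define N_STATES 5

/* eps[s] is the bitmask of states reachable from s by ONE epsilon-move.
   Hypothetical edges: 0->1, 1->2, 2->3, 4->3. */
static const unsigned eps[N_STATES] = {
    1u << 1,   /* 0 -> 1 */
    1u << 2,   /* 1 -> 2 */
    1u << 3,   /* 2 -> 3 */
    0,         /* 3 has no epsilon-moves */
    1u << 3,   /* 4 -> 3 */
};

/* Returns `set` extended with every state reachable through epsilon-moves. */
unsigned epsilon_closure(unsigned set)
{
    unsigned before;
    int s;

    do {                                  /* repeat until nothing is added */
        before = set;
        for (s = 0; s < N_STATES; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != before);
    return set;
}
```

With these assumed edges, epsilon_closure(1u << 0) yields the bitmask 0xF, that is, the set {0,1,2,3}.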

Subset Construction (Example) NFA with ε-moves

• Construct the DFA from the NFA (cont'd)
  – State {0,1,2,3,4}
    • Input ‘a’: get to state {0,1,2,3,4} - Already known
    • Input ‘b’: get to state {2,3,4} - Already known
  – State {2,3,4}
    • Input ‘a’: get to state {3,4} - New state
    • Input ‘b’: get to state {3,4} - Just discovered
  – State {3,4}
    • Input ‘a’: get to state {3,4} - Already known
    • Input ‘b’: get to state {} - New state
  – State {}
    • Input ‘a’ or ‘b’: get to state {} - Already known

[Figure: NFA with states 0-4, transitions labeled a and b, and four ε-moves]

Subset Construction (Example) NFA with ε-moves

• States
  – {0,1,2,3}, {0,1,2,3,4}, {2,3,4}, {3,4}, {}

• Start state: {0,1,2,3}, obtained by traversing the ε-moves from the start state in the NFA

• Accepting states: {0,1,2,3,4}, {2,3,4}, {3,4}, because they all contain state 4, which was an accepting state in the NFA

• Construct the DFA using the states and discovered transitions

[Figure: the resulting DFA over the five states above, with one edge labeled a and one edge labeled b leaving each state]

Regular Expressions

• An automaton graphically defines a pattern

• A regular expression algebraically defines a pattern

• A regular expression consists of a sequence of atomic operands
  – A character
  – The empty string character, ε
  – The empty set character, ∅
  – A variable that can be defined with any regular expression

• A regular expression represents a set of strings that is often called a language
  – Language of atomic operands:
    • L(x) = {x}
    • L(ε) = {ε}
    • L(∅) = {}

Regular Expression Operators

• Union
  – Denoted by ‘|’
  – If R and S are regular expressions then R|S denotes the union of the languages of R and S
    • L(R|S) = L(R) ∪ L(S)

• Concatenation
  – No special symbol for the concatenation operation
  – If R and S are regular expressions then RS denotes the language formed by appending a string of L(S) onto the back of a string of L(R)
    • L(RS) = L(R)L(S)

• Closure
  – Denoted by ‘*’ - Kleene closure or closure
  – If R is a regular expression then R* indicates zero or more occurrences of strings from the language of R
    • L(R*) = {ε} ∪ L(R) ∪ L(R)L(R) ∪ L(R)L(R)L(R) ∪ L(R)L(R)L(R)L(R) ∪ ... = ε|R|RR|RRR|...
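
The three operators can also be written in set notation. The block below restates the definitions above and works one small case; the choice of R = a|ab and S = c|bc is only an illustration.

```latex
\begin{align*}
L(R \mid S) &= L(R) \cup L(S) \\
L(RS)       &= \{\, xy : x \in L(R),\ y \in L(S) \,\} \\
L(R^{*})    &= \bigcup_{i \ge 0} L(R)^{i},
  \qquad L(R)^{0} = \{\varepsilon\},\quad L(R)^{i+1} = L(R)^{i}\,L(R) \\[4pt]
\text{With } R = a \mid ab,\ S = c \mid bc:\quad
L(RS) &= \{a, ab\}\{c, bc\} = \{ac,\ abc,\ abc,\ abbc\} = \{ac,\ abc,\ abbc\}
\end{align*}
```

The string abc arises twice (a followed by bc, and ab followed by c), but a language is a set, so it is listed once.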

Regular Expression Examples

• Order of precedence of regular expression operators (highest first)
  – Kleene star
  – Concatenation
  – Union

• Examples
  – [a|b] = {a,b}
  – [ab] = {ab}
  – [a|ab] = {a,ab}
  – [c|bc] = {c,bc}
  – [a|ab][c|bc] = {ac,abc,abbc} (the second {abc} is omitted)
  – [a*] = {ε,a,aa,aaa,aaaa,aaaaa,...}
  – [a|b]* = {ε, a, b, aa, ab, ba, bb, ...}
  – [a|bc*d] = {a,bd,bcd,bccd,bcccd,bccccd,...}

• Make the grouping explicit by using precedence
  – [a|bc*d] = [a|b(c*)d] = [a|(b(c*))d] = [a|((b(c*))d)] = [(a|((b(c*))d))]

Regular Expression Example

• Build a regular expression for programming language identifiers
  – Begins with a letter
  – Followed by any number of letters, integers (digits 0-9) or the underscore (‘_’) character

• Solution
  – letter = [a|b|...|y|z|A|B|...|Y|Z]
  – integer = [0|1|2|3|4|5|6|7|8|9]
  – underscore = [ _ ]
  – identifier = [letter(letter|integer|underscore)*]

• Construct a regular expression for a signed integer
  – signed integer = [(+|-|ε)(0|1|2|3|4|5|6|7|8|9)+]
  – The ‘+’ is usually used to indicate one or more occurrences. The ‘+’ is only a simplification:
    • [a+] = [aa*]
    • [(0|1|...|8|9)+] = [(0|1|...|8|9)(0|1|...|8|9)*]
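
On a POSIX system the two expressions can be tried out with the regcomp, regexec and regfree functions from <regex.h>. This is a sketch, not part of the course material: the helper name matches and the two pattern constants are hypothetical, and the patterns are rewritten in POSIX extended syntax, where [a-z] replaces the long union, ? replaces the (+|-|ε) choice, and ^ and $ force the whole string to match.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` matches the extended regular expression `pattern`,
   0 if it does not, and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}

/* POSIX spellings of the two slide expressions (assumed equivalents). */
const char *const IDENTIFIER     = "^[a-zA-Z][a-zA-Z0-9_]*$";
const char *const SIGNED_INTEGER = "^[+-]?[0-9]+$";
```

For example, matches(SIGNED_INTEGER, "-12") returns 1 and matches(IDENTIFIER, "1x") returns 0.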

Converting A Regular Expression into an NFA

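• There are 3 simple rules for converting a regular expression into an NFA

• The NFA can then be converted into a DFA using the subset construction method

[Figure: the three construction rules
 Union R1|R2: a new start state has ε-moves into the NFAs for R1 and R2, and each of them has an ε-move to a new accepting state
 Concatenation R1R2: an ε-move enters the NFA for R1, an ε-move links R1 to R2, and an ε-move leaves R2
 Closure R1*: ε-moves enter and leave the NFA for R1, one ε-move loops from the end of R1 back to its start, and one ε-move bypasses R1]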

Example: Converting A Regular Expression into an NFA

• Convert (ab|aab)* to an NFA
  – This is ((ab)|(aab))* = ((ab)|((aa)b))*
    • Concatenation has higher precedence than union

Step 1: Handle the concatenation
[Figure: NFA for ab (a, ε, b) and NFA for aab (a, ε, a, ε, b)]

Step 2: Handle the union
[Figure: the ab and aab NFAs joined in parallel by four ε-moves, giving ab|aab]

Step 3: Handle the closure
[Figure: the ab|aab NFA wrapped with four more ε-moves, giving (ab|aab)*]
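
The three steps can be mirrored in code by building NFA fragments and joining them with ε-moves. The sketch below is one possible encoding, not the course's implementation: all names (Frag, lit, cat, alt, star, nfa_match, build_example) are hypothetical, concatenation uses the single linking ε-move shown in step 1, and the state arrays have a fixed size with no overflow check.

```c
#define MAX_STATES 32   /* no overflow check: enough for this example */

/* Each state has at most one labeled edge and two epsilon edges. */
static char sym[MAX_STATES];                     /* edge label, 0 if none */
static int  to[MAX_STATES];                      /* labeled edge target */
static int  eps1[MAX_STATES], eps2[MAX_STATES];  /* epsilon targets or -1 */
static int  n_states;
typedef struct { int start, end; } Frag;  /* fragment: one entry, one exit */

static int new_state(void)
{
    sym[n_states] = 0;
    to[n_states] = eps1[n_states] = eps2[n_states] = -1;
    return n_states++;
}

static void add_eps(int from, int target)
{
    if (eps1[from] < 0) eps1[from] = target; else eps2[from] = target;
}

Frag lit(char c)                     /* a single character */
{
    Frag f;
    f.start = new_state();
    f.end   = new_state();
    sym[f.start] = c;
    to[f.start]  = f.end;
    return f;
}

Frag cat(Frag a, Frag b)             /* concatenation: a, epsilon, b */
{
    add_eps(a.end, b.start);
    a.end = b.end;
    return a;
}

Frag alt(Frag a, Frag b)             /* union: four epsilon-moves */
{
    Frag f;
    f.start = new_state();
    f.end   = new_state();
    add_eps(f.start, a.start);
    add_eps(f.start, b.start);
    add_eps(a.end, f.end);
    add_eps(b.end, f.end);
    return f;
}

Frag star(Frag a)                    /* closure: four epsilon-moves */
{
    Frag f;
    f.start = new_state();
    f.end   = new_state();
    add_eps(f.start, a.start);       /* enter */
    add_eps(f.start, f.end);         /* bypass: zero occurrences */
    add_eps(a.end, a.start);         /* loop back: another occurrence */
    add_eps(a.end, f.end);           /* leave */
    return f;
}

/* Extends a set of states (one bit per state) with its epsilon-closure. */
static unsigned long closure(unsigned long set)
{
    unsigned long before;
    int s;
    do {
        before = set;
        for (s = 0; s < n_states; s++) {
            if (!(set & (1UL << s))) continue;
            if (eps1[s] >= 0) set |= 1UL << eps1[s];
            if (eps2[s] >= 0) set |= 1UL << eps2[s];
        }
    } while (set != before);
    return set;
}

/* Returns 1 if the NFA fragment f accepts the whole string s. */
int nfa_match(Frag f, const char *s)
{
    unsigned long cur = closure(1UL << f.start), nxt;
    int q;
    for (; *s != '\0'; s++) {
        for (nxt = 0, q = 0; q < n_states; q++)
            if ((cur & (1UL << q)) && sym[q] == *s)
                nxt |= 1UL << to[q];
        cur = closure(nxt);
    }
    return (cur & (1UL << f.end)) != 0;
}

Frag build_example(void)             /* (ab|aab)* */
{
    n_states = 0;
    return star(alt(cat(lit('a'), lit('b')),
                    cat(cat(lit('a'), lit('a')), lit('b'))));
}
```

After Frag f = build_example(), the call nfa_match(f, "abaab") returns 1 and nfa_match(f, "aba") returns 0. The matcher simulates the NFA directly on sets of states; running the subset construction on the same fragment would give an equivalent DFA.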