28
Regular Expressions CS 154, Omer Reingold

Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Regular Expressions

CS 154, Omer Reingold

Page 2: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Regular ExpressionsComputation as simple, logical description

A totally different way of thinking about computation:What is the complexity of describing the strings in the language?

Page 3: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Inductive Definition of Regexp

For all ∊ Σ, is a regexp

ε is a regexp

is a regexp

If R1 and R2 are both regexps, then

(R1R2), (R1 + R2), and (R1)* are regexps

Let Σ be an alphabet. We define the regular expressions over Σ inductively:

Page 4: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Precedence Order:

* then then +

· R2R1*(Example: R1*R2 + R3 = ( ) ) + R3

Page 5: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Definition: Regexps Represent Languages

The regexp ∊ Σ represents the language {}

The regexp ε represents {ε}

The regexp represents

If R1 and R2 are regular expressions representing L1 and L2 then:

(R1R2) represents L1 L2

(R1 + R2) represents L1 L2

(R1)* represents L1*

Page 6: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Regexps Represent Languages

For every regexp R, define L(R) to be the language that R represents

A string w ∊ Σ* is accepted by R(or, w matches R) if w ∊ L(R)

Examples: 0, 010, and 01010 match (01)*0

110101110100100 matches (0+1)*0

Page 7: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

{ w | w has exactly a single 1 }

0*10*

Assume Σ = {0,1}

{ w | w contains 001 }

(0+1)*001(0+1)*

Page 8: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

What language does the regexp * represent?

{ε}

Assume Σ = {0,1}

Page 9: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

{ w | w has length ≥ 3 and its 3rd symbol is 0 }

(0+1)(0+1)0(0+1)*

Assume Σ = {0,1}

Page 10: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

{ w | every odd position in w is a 1 }

(1(0 + 1))*(1 + ε)

Assume Σ = {0,1}

Page 11: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

1 + 0 + ε + 0(0+1)*0 + 1(0+1)*1

{ w | w has equal number ofoccurrences of 01 and 10}

= { w | w = 1, w = 0, or w = ε, or w starts with a 0 and ends with a 0, orw starts with a 1 and ends with a 1 }

Claim: A string w has equal occurrences of 01 and 10w starts and ends with the same bit.

Assume Σ = {0,1}

Page 12: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

L can be represented by some regexp L is regular

DFAs ≡NFAs ≡ Regular Expressions!

Page 13: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

L can be represented by some regexp L is regular

Page 14: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Given any regexp R, we will construct an NFA N s.t.N accepts exactly the strings accepted by R

Proof by induction on the length of the regexp R

L can be represented by some regexp L is regular

Base Cases (R has length 1):

R =

R = ε

R =

Page 15: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Induction Step: Suppose every regexp of length < krepresents some regular language.

Three possibilities for R:

R = R1 + R2

R = R1 R2

R = (R1)*

Consider a regexp R of length k > 1

Page 16: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Induction Step: Suppose every regexp of length < k represents some regular language.

Three possibilities for R:

R = R1 + R2

R = R1 R2

R = (R1)*

Consider a regexp R of length k > 1

By induction, R1 and R2 representsome regular languages, L1 and L2

But L(R) = L(R1 + R2) = L1 L2

so L(R) is regular, by the union theorem!

Page 17: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Induction Step: Suppose every regexp of length < k represents some regular language.

Three possibilities for R:

R = R1 + R2

R = R1 R2

R = (R1)*

Consider a regexp R of length k > 1

By induction, R1 and R2 represent

some regular languages, L1 and L2

But L(R) = L(R1·R2) = L1· L2

so L(R) is regular by the concatenationtheorem

Page 18: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Induction Step: Suppose every regexp of length < krepresents some regular language.

Three possibilities for R:

R = R1 + R2

R = R1 R2

R = (R1)*

Consider a regexp R of length k > 1

By induction, R1 and R2 represent

some regular languages, L1 and L2

But L(R) = L(R1*) = L1*so L(R) is regular, by the star theorem

Page 19: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Induction Step: Suppose every regexp of length < krepresents some regular language.

Three possibilities for R:

R = R1 + R2

R = R1 R2

R = (R1)*

Consider a regexp R of length k > 1

By induction, R1 and R2 represent

some regular languages, L1 and L2

But L(R) = L(R1*) = L1*so L(R) is regular, by the star theorem

Therefore: If L is represented by a regexp,then L is regular

Page 20: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Give an NFA that accepts the language represented by (1(0 + 1))*

1ε 0,1

ε

1(0+1)( )*Regular expression:

Page 21: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

L can be represented by a regexp

L is a regular language

Idea: Transform an NFA for L into a regular expression by removing states and

re-labeling the arcs with regular expressions

Generalized NFAs (GNFA)

Rather than reading in just letters from the string on a step, we can read in entire substrings

Page 22: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

This GNFA recognizes L(a*b(cb)*a)

Is aaabcbcba accepted or rejected?

Is bba accepted or rejected?

Is bcba accepted or rejected?

Generalized NFA (GNFA)

Page 23: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

NFA

Add unique start and accept states

Page 24: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

While the machine has more than 2 states:

Pick an internal state, rip it out and re-label the arrows with regexps,

to account for paths through the missing state

0

1

001*0

NFA

Page 25: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

In general:

R(q1,q2)

R(q2,q2)

R(q2,q3)R(q1,q2)R(q2,q2)*R(q2,q3) + R(q1,q3)

q1q2 q3

G

R(q1,q3)

While the machine has more than 2 states:

NFA

Page 26: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

q1

b

a

εq2

a,b

εa*b(a*b)(a+b)*q0 q3

R(q0,q3) = (a*b)(a+b)* represents L(N)

Page 27: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

DFAs NFAs

RegularLanguages

RegularExpressions

DEFINITION

Page 28: Regular Expressions · 9/11/2020  · Inductive Definition of Regexp For all ∊Σ, is a regexp εis a regexp is a regexp If R 1 and R 2 are both regexps, then (R 1 R 2), (R 1 + R

Parting thought:

Regular Languages can be defined by their

closure properties