
cs3102: Theory of Computation

Class 10:

DFAs in Practice

Spring 2010

University of Virginia

David Evans

Menu

• Today:

– Preparing for Exam 1

– Language class for Deterministic PDAs

– Applications of DFAs

• Thursday:

– Exam Review (if you send questions and/or topics)

– Applications of probabilistic DFAs and Grammars

Exam 1

• In class, next Tuesday, 2 March

• Covers:

Classes 1-9 (10 and 11)

Sipser Ch 0-2

Problem Sets 1-3 + Comments

Exam 1

Note: unlike nearly all other sets we draw in this class, all of these sets are finite, and the size (roughly) represents the relative size.

What’s on the Exam?

• Definitions

– Language, problem, sets

• Constructing and understanding computing models

– Finite automata (DFA, NFA)

– Pushdown automata (DPDA, NPDA)

– Grammars (Context-Free Grammar)

• Language Classes: Regular and Context Free

– Show a language is in the class

– Show a language is not in the class

– Prove or disprove a closure property

• Proof Methods

– Proof by Induction

– Proof by Construction

– Understand and use the pumping lemmas for RL and CFL

Sample exam on website should give you a good idea what to expect

Your exam will probably also have “what’s wrong with this proof” questions

Exam 1 Notesheet

For Exam 1, you may use only:

– Your own brain and body

– A low-tech writing instrument (pen or pencil)

– A single page (both sides) of notes that you create

You may work with others to create your notes page.

[Photos: Admiral Grace Hopper, John von Neumann, Albert Einstein]


Exam Help Available

• Office Hours:

– Thursdays, 8:30-9:30am

– Thursdays, after class

– Fridays, 10-11:30am (Sonali in Stacks)

– Mondays, 1:15-3pm

• TA’s Exam Review Session

– This Sunday, 5-6:30pm, Olsson 228E

[Figure: nested language classes. All Languages ⊃ Context-Free (CFG or NPDA) ⊃ Regular Languages (DFA, NFA, RE, RG) ⊃ Finite Languages. Example languages marked: w, a^n, a^n b^n c^n, ww]

Where are the languages recognized by a Deterministic PDA?

Proving Set Equivalence

A = B ⇔ A ⊆ B and B ⊆ A

Sets A and B are equivalent if A is a subset of B and B is a subset of A.

[Figure: sets A and B, illustrating A ⊆ B and B ⊆ A]

Proving Formalism Equivalence

Proving Formalism Non-Equivalence


[Figure: All Languages ⊃ Context-Free (CFG or NPDA) ⊃ Regular Languages (DFA, NFA, RE, RG), with a^n b^n marked]

Which of these could be true?

[Figure: two diagrams, each showing Regular Languages (DFA, NFA, RE, RG), Context-Free (NPDA), and DPDA]

How can we distinguish these two plausible possibilities?

Find some language A that can be recognized by some NPDA but not by any DPDA.

Prove by construction: for any NPDA, there is a DPDA that recognizes the same language.

[Figure: NPDA that recognizes A. Transition labels, in the order extracted: ε, ε→$; a, ε→+; b, +→ε; ε, $→ε; b, +→ε; b, ε→ε; ε, $→ε]

Proof by contradiction: Assume there is a DPDA that recognizes A. Show how to construct an NPDA that recognizes some language we know is not context free.

Proved by construction: We showed an NPDA that recognizes A.
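The nondeterministic guess can be made concrete with a small simulator. This is only a sketch under assumptions: it takes A to be {a^i b^i} ∪ {a^i b^2i} (suggested by the proof that follows, not stated on the slide), the state names are hypothetical, and the transition structure is one plausible reading of the labels in the figure. The simulator explores every reachable configuration, which is how it stands in for nondeterminism.

```python
from collections import deque

# (state, input symbol or "", stack symbol to pop or "") -> [(next state, symbol to push or "")]
# State names are hypothetical; the language A is an assumption (see lead-in).
DELTA = {
    ("start", "", ""): [("push", "$")],            # mark the bottom of the stack
    ("push", "a", ""): [("push", "+")],            # push one + per a
    ("push", "", ""): [("one", ""), ("two", "")],  # guess: one b per a, or two b's per a
    ("one", "b", "+"): [("one", "")],              # pop one + per b
    ("one", "", "$"): [("accept", "")],
    ("two", "b", "+"): [("two_mid", "")],          # pop one + on the first b of a pair
    ("two_mid", "b", ""): [("two", "")],           # read the second b of the pair
    ("two", "", "$"): [("accept", "")],
}
ACCEPTING = {"accept"}


def accepts(word):
    """Breadth-first search over (state, input position, stack) configurations."""
    start = ("start", 0, ())
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if state in ACCEPTING and pos == len(word):
            return True
        for (src, sym, pop), targets in DELTA.items():
            if src != state:
                continue
            if sym and (pos >= len(word) or word[pos] != sym):
                continue
            if pop and (not stack or stack[-1] != pop):
                continue
            rest = stack[:-1] if pop else stack
            next_pos = pos + 1 if sym else pos
            for nxt, push in targets:
                new_stack = rest + (push,) if push else rest
                config = (nxt, next_pos, new_stack)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False
```

For example, `accepts("aabb")` and `accepts("aabbbb")` both succeed through different guesses, while `accepts("aabbb")` fails on every branch.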


Proof by contradiction. Suppose there is a DPDA M that recognizes A.

It must be in an accept state only after processing a^i b^i and a^i b^2i.

[Figure: M's computation. 2i transitions, consuming a^i b^i, reach an accept state; i more transitions, consuming b^i, reach an accept state again]

Construct M’: copy all the states on the second half, replacing b with c:

… …

What is the language of M’?


Not a Context-Free Language!

We have a contradiction: if A is in L(DPDA), we could use the DPDA that recognizes A to construct a DPDA that recognizes a non-context-free language! Hence, A must not be in L(DPDA).

[Figure: All Languages ⊃ Context-Free (CFG or NPDA) ⊃ Regular Languages (DFA, NFA, RE, RG), with a^n b^n and A marked]

Deterministic Context-Free Languages

Recognized by a DPDA (or DCFG)

Context-Free Languages ⊃ Deterministic Context-Free Languages ⊃ Regular Languages

DFAs in Practice

Malware Scanner

W32.Bolzano.Gen:

576a222bd2c20400558b4c240cd9ffff07fbffffff{0-2}5c4e544c445200{0-2}5c57494e4e545c73797374656d33325c6e746f736b726e6c2e65786500{0-29}3b4658

W32.MyLife.E:

7a6172793230*40656d61696c2e636f6d

Note: These are the signatures from ClamAV, an open source virus scanner.

[Figure: inputs to the malware scanner: Files, Network Traffic]
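To see why such signatures describe regular languages, here is a minimal sketch that translates a hex signature into a regular expression over bytes. It assumes only the two wildcard forms visible in the signatures above: `*` for any number of bytes and `{n-m}` for between n and m arbitrary bytes. The function name `signature_to_regex` is hypothetical, and Python's `re` module is a backtracking matcher rather than a DFA, so this shows the language being matched, not how a production scanner runs.

```python
import re


def signature_to_regex(signature):
    """Translate a hex signature with * and {n-m} wildcards into a compiled bytes regex."""
    parts = []
    i = 0
    while i < len(signature):
        ch = signature[i]
        if ch == "*":
            # Any number of arbitrary bytes.
            parts.append(b".*")
            i += 1
        elif ch == "{":
            # Between low and high arbitrary bytes.
            close = signature.index("}", i)
            low, high = signature[i + 1:close].split("-")
            parts.append(b".{%d,%d}" % (int(low), int(high)))
            i = close + 1
        else:
            # Two hex digits encode one literal byte.
            parts.append(re.escape(bytes.fromhex(signature[i:i + 2])))
            i += 2
    # DOTALL so that "." also matches the newline byte.
    return re.compile(b"".join(parts), re.DOTALL)


MYLIFE = signature_to_regex("7a6172793230*40656d61696c2e636f6d")
```

Decoding the literal bytes of the W32.MyLife.E signature gives the ASCII text `zary20`, then anything, then `@email.com`, so `MYLIFE.search(data)` reports a hit on any byte string containing those two pieces in that order.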

String Matching

q0 –t→ q1 –r→ q2 –u→ q3 –t→ q4 –h→ q5

We hold these truths to be self-evident, that …

How much work is it to scan a string of length N for a signature?
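One way to approach the question is to run the matching DFA directly: it takes exactly one transition per input character, so the scan is linear in N once the table is built. The sketch below is an illustration, not the construction from the lecture. The function names are hypothetical, the table is built by brute force for clarity, and any character outside the pattern sends the machine back to state 0.

```python
def build_dfa(pattern):
    """dfa[state][char] is the next state; state k means 'the last k characters match pattern[:k]'."""
    alphabet = set(pattern)
    dfa = []
    for k in range(len(pattern) + 1):
        row = {}
        for c in alphabet:
            seen = pattern[:k] + c
            # Next state: the longest prefix of the pattern that is a suffix of what was seen.
            n = min(len(pattern), len(seen))
            while n > 0 and not seen.endswith(pattern[:n]):
                n -= 1
            row[c] = n
        dfa.append(row)
    return dfa


def find(pattern, text):
    """Return the index of the first match, or -1. One table lookup per text character."""
    dfa = build_dfa(pattern)
    state = 0
    for i, c in enumerate(text):
        # Characters that do not occur in the pattern reset the machine to state 0.
        state = dfa[state].get(c, 0)
        if state == len(pattern):
            return i - len(pattern) + 1
    return -1


TEXT = "We hold these truths to be self-evident, that …"
position = find("truth", TEXT)
```

For the pattern `truth`, `build_dfa` produces six rows, matching the states q0 through q5 in the figure.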


Faster String Matching

q0 –t→ q1 –r→ q2 –u→ q3 –t→ q4 –h→ q5

We hold these truths to be self-evident, that …

[Figure: the pattern “truth” slid along the text. Comparisons shown: s[4] = h? s[10] = h? s[9] = t? s[8] = u?]

Skip table:

a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q, s, v, w, x, y, z: 6

h: 0

r: 4

t: 1

u: 2

DFA / Skipping DFA

Is a “Skipping DFA” still a DFA? (That is, does it still only accept the Regular Languages?)

J. Strother Moore (UT Austin)

Boyer-Moore Fast String Searching Algorithm (1977)

Best case: N/(w+1) comparisons where N is the length of the text and w is the length of the search string
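The skipping idea can be sketched with the Horspool simplification of Boyer-Moore, which uses only a bad-character table. This is a swapped-in variant chosen for brevity, not the full 1977 algorithm, and the function names are hypothetical. In this convention a text character that does not occur in the pattern shifts the window by the pattern length w, so the best case is roughly N/w character inspections; the numbers in the slide's skip table and its N/(w+1) figure appear to follow a slightly different convention.

```python
def build_skip(pattern):
    """Shift for each character: distance from its last occurrence (excluding the final position) to the end."""
    m = len(pattern)
    skip = {}
    for i, c in enumerate(pattern[:-1]):
        skip[c] = m - 1 - i
    return skip


def horspool_find(pattern, text):
    """Return (index of first match or -1, number of character comparisons made)."""
    m = len(pattern)
    skip = build_skip(pattern)
    comparisons = 0
    end = m - 1  # text index aligned with the last pattern character
    while end < len(text):
        # Compare right to left, starting from the end of the window.
        j = 0
        while j < m:
            comparisons += 1
            if text[end - j] != pattern[m - 1 - j]:
                break
            j += 1
        if j == m:
            return end - m + 1, comparisons
        # Characters absent from the table shift the window by the full pattern length.
        end += skip.get(text[end], m)
    return -1, comparisons


TEXT = "We hold these truths to be self-evident, that …"
position, comparisons = horspool_find("truth", TEXT)
```

On a text with no characters from the pattern, such as `"aaaaaaaaaa"`, the search inspects only every fifth character.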

Is this fast enough for a malware scanner?

Virus Detection

Total number of signatures: 720,033

[Figure: signature database size in MB (axis from 2 to 12) over time, 11/01 to 03/06, for Symantec and RAV AV]

Nate Paul’s study

Can we scan one input for many possible malware signatures quickly?

Combining DFAs?

Regular languages closed under union:

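[Figure: NFA for the union of machines A and B. A new start state q0 has ε-transitions to qA0 and qB0; further states shown are qA1 and qB1, with edges labeled a]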

How many states are there now?
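The ε-transition construction adds one new start state to the states of both machines, but the result is an NFA. A deterministic machine for the union can be sketched with the product construction instead, which is a different technique from the one in the figure: it tracks a pair of states, so it may need up to |QA| × |QB| states. The two example DFAs below are hypothetical and exist only to make the count concrete.

```python
ALPHABET = "ab"

# Each DFA is (start state, accepting states, transition table). Both examples are hypothetical.
# DFA_A accepts strings with an even number of a's.
DFA_A = ("even", {"even"}, {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
})
# DFA_B accepts strings that end in b.
DFA_B = ("no", {"yes"}, {
    ("no", "a"): "no", ("no", "b"): "yes",
    ("yes", "a"): "no", ("yes", "b"): "yes",
})


def product_union(dfa_a, dfa_b, alphabet):
    """Build a DFA for the union; return (start, accepting, delta, reachable states)."""
    start = (dfa_a[0], dfa_b[0])
    delta = {}
    accepting = set()
    seen = {start}
    todo = [start]
    while todo:
        pair = todo.pop()
        a, b = pair
        # Accept if either component machine accepts.
        if a in dfa_a[1] or b in dfa_b[1]:
            accepting.add(pair)
        for sym in alphabet:
            nxt = (dfa_a[2][(a, sym)], dfa_b[2][(b, sym)])
            delta[(pair, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, accepting, delta, seen


def run(machine, word):
    """Run a DFA given as (start, accepting, delta, ...) on a word."""
    start, accepting, delta = machine[:3]
    state = start
    for sym in word:
        state = delta[(state, sym)]
    return state in accepting


UNION = product_union(DFA_A, DFA_B, ALPHABET)
```

Here two 2-state machines give a 4-state product. With hundreds of thousands of signatures, that multiplicative growth is the reason to look for a more compact combined structure.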

Signatures

First byte: Set of signatures:

00000000 ~720000/256

00000001 ~720000/256

00000010 ~720000/256

...

11111111 ~720000/256


Try a Trie

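[Figure: trie over signature bytes. The root q0 branches on the first byte to q00, q01, q02, ..., qFF; the next level branches on the second byte to states such as q0000, q0001, q0002, ..., q01FF. Edge label shown: 0x02]

720000/(256*256) ≈ 11 signatures per two-byte prefix

Alfred V. Aho and Margaret J. Corasick, 1975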
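A minimal sketch of the Aho-Corasick idea follows: build a trie of all patterns, add failure links so the scan never backs up in the input, and report every pattern that ends at each position. The function names and the example patterns are hypothetical, and the sketch handles only fixed byte strings, not the wildcards that appear in the signatures earlier.

```python
from collections import deque


def build_automaton(patterns):
    """Return (goto, fail, out): trie edges, failure links, and patterns ending at each node."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        node = 0
        for byte in p:
            if byte not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][byte] = len(goto) - 1
            node = goto[node][byte]
        out[node].append(p)
    # Breadth-first pass: depth-1 nodes fail to the root; deeper nodes follow their parent's links.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for byte, child in goto[node].items():
            f = fail[node]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(byte, 0)
            # Inherit matches that end at the failure target (proper suffixes of this node).
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def scan(data, automaton):
    """Return a list of (start index, pattern) for every match, in one pass over data."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, byte in enumerate(data):
        while node and byte not in goto[node]:
            node = fail[node]
        node = goto[node].get(byte, 0)
        for p in out[node]:
            hits.append((i - len(p) + 1, p))
    return hits


AUTOMATON = build_automaton([b"he", b"she", b"his", b"hers"])
HITS = scan(b"ushers", AUTOMATON)
```

The scan reads each input byte once no matter how many patterns are loaded, which is the property a scanner with hundreds of thousands of signatures needs.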


Scanner Demo

http://www.virustotal.com

Evasive Malware

Metamorphic Code: as a virus propagates, each new copy is different

How hard is it to automatically modify code without changing its behavior?

Detecting Evasive Malware

• Less exact signatures (e.g., W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d)

– Dangerous: you start matching benign programs if you’re not careful!

• Behavioral signatures: match the behavior, not the program text

– Undecidable in general (we’ll see in a few weeks)

– Expensive and difficult in practice (but done by all decent scanners)

Faster String Scanning Charge

• We focus on DFAs, NFAs, PDAs, CFGs, etc. as abstract models: the number of states, time to process, etc. don’t matter

• There are lots of real applications of these models, but in practice, what matters is different

If you have topics you want me to review, post comments (on today’s class announcement) by 5pm tomorrow.