34
CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans Lecture 4: Dynamic Programming, Trees http://www.cs.virginia.edu/cs216

CS216: Program and Data Representation University of Virginia Computer Science

  • Upload
    jace

  • View
    40

  • Download
    0

Embed Size (px)

DESCRIPTION

CS216: Program and Data Representation University of Virginia Computer Science Spring 2006 David Evans. Lecture 4: Dynamic Programming,Trees. http://www.cs.virginia.edu/cs216. Schedule This Week. Sections today and Tuesday Thornton D rooms Wednesday, Feb 1: Go to Ron Rivest’s talk - PowerPoint PPT Presentation

Citation preview

Page 1: CS216: Program and Data Representation University of Virginia Computer Science

CS216: Program and Data RepresentationUniversity of Virginia Computer Science

Spring 2006 David EvansLe

cture

4:

Dynam

ic

Pro

gra

mm

ing,

Tre

es

http://www.cs.virginia.edu/cs216

Page 2: CS216: Program and Data Representation University of Virginia Computer Science

2UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Schedule This Week• Sections today and Tuesday

– Thornton D rooms

• Wednesday, Feb 1:– Go to Ron Rivest’s talk– 10:30am-noon, Newcomb Hall South

Meeting Room

• Thursday, Feb 2:– Portman Wills, “Saving the World with a

Computer Science Degree”– 3:30pm, MEC 205 (CS290/CS390 +)

Page 3: CS216: Program and Data Representation University of Virginia Computer Science

3UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

RSA(Rivest, Shamir, Adelman 1978)

• Public-Key Encryption– Allows parties who have not established

a shared secret to communicate securely

– Your web browser used it when you see

• Most important algorithm invented in past 100 years

• Link from notes

Page 4: CS216: Program and Data Representation University of Virginia Computer Science

4UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Wednesday’s Class

• Ron Rivest, “Security of Voting Systems”

• Newcomb Hall South Meeting Room, 10:30am-noon

• This replaces CS216 class (so you don’t have to walk out at 11am)

Page 5: CS216: Program and Data Representation University of Virginia Computer Science

5UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Sequence Alignment

• Brute force algorithm in PS1bestAlignment (U, V) =

base case if |U| == 0 or |V| == 0 otherwise f (bestAlignment (U[1:], V),

bestAlignment (U, V[1:]), bestAlignment (U[1:], V[1:])

• Compare to Fibonacci:fibonacci (n) =

base case if n == 0 or n == 1 g (fibonacci (n-1), fibonacci (n-2))

Runningtime (n) = 1.618…

Page 6: CS216: Program and Data Representation University of Virginia Computer Science

6UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Sequence Alignment• Input size = n = |U| + |V|

bestAlignment (U, V) = base case if |U| == 0 or |V| == 0

otherwise f (bestAlignment (U[1:], V), size = n-1

bestAlignment (U, V[1:]), size = n-1

bestAlignment (U[1:], V[1:]) size = n-2

a(n) = a(n-1) + a(n-1) + a(n-2)> a(n-1) + a(n-2) (n)

Running time of bestAlignment (n)

Page 7: CS216: Program and Data Representation University of Virginia Computer Science

7UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Growth of Best Alignment

1

10

100

1000

10000

100000

1000000

10000000

100000000

1000000000

10000000000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

3n

2n

n

f(n) = 2f(n-1) + f(n-2)

Page 8: CS216: Program and Data Representation University of Virginia Computer Science

8UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

BLAST

http://www.ncbi.nlm.nih.gov/BLAST/GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCTGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAAGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAGATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACTCCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCTGGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGGCCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAACGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCGCACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAACAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTDinosaur DNA from Jurassic Park (p. 103)

Basic Local Alignment Search Tool

Page 9: CS216: Program and Data Representation University of Virginia Computer Science

9UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Page 10: CS216: Program and Data Representation University of Virginia Computer Science

10UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

|U| = 1200|V| > 16.5 Billion

Page 11: CS216: Program and Data Representation University of Virginia Computer Science

11UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Needleman-Wunsch Algorithm

• Avoid effort duplication by storing intermediate results

• Computing a matrix showing the best possible score up to each pair of positions

• Find the alignment by following a path in that matrix

Page 12: CS216: Program and Data Representation University of Virginia Computer Science

12UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

N-W Initialization

- C A T G

- 0

A

T

G

G

Sequ

ence

U

Sequence V

Page 13: CS216: Program and Data Representation University of Virginia Computer Science

13UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Filling Matrix

- C A T G

- 0

A

T

G

G

Sequ

ence

U

Sequence V

For each square consider: Gap in U score = score[] - g Gap in V score = score[] - g No gap match: score = score[] + c

no match: score = score[]

Page 14: CS216: Program and Data Representation University of Virginia Computer Science

14UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Filling Matrix

- C A T G

- 0

A

T

G

G

score = score[] - g score = score[] - g match: score = score[] + c no match: score = score[]

-2 -4 -6 -8CATG----

-2

-4

-6

-8

0 8 6 4 CATG-A--

Page 15: CS216: Program and Data Representation University of Virginia Computer Science

15UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Finished Matrix

- C A T G

- 0 -2 -4 -6 -8

A -2 0 8 6 4

T -4 -2 6 18 16

G -6 -4 4 16 28

G -8 -6 2 14 26

Page 16: CS216: Program and Data Representation University of Virginia Computer Science

16UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Finding Alignment- C A T G

- 0 -2 -4 -6 -8

A -2 0 8 6 4

T -4 -2 6 18 16

G -6 -4 4 16 28

G -8 -6 2 14 26

Start in bottom right corner; follow arrows:gap V gap U no gap

GG

-G

TT

AA

C-

Page 17: CS216: Program and Data Representation University of Virginia Computer Science

17UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

N-W Correctness

• Informal argument:– Fills each cell by picking the best

possible choice– Finds the best possible path using the

filled in matrix

• Guaranteed to find the best possible alignment since all possibilities are considered

Page 18: CS216: Program and Data Representation University of Virginia Computer Science

18UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

N-W Analysis

• What is the space usage?

Need to store the matrix:= (|U| + 1) * (|V| + 1)

Page 19: CS216: Program and Data Representation University of Virginia Computer Science

19UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

N-W Running Time

• Time to fill matrix– Each square O(1)– Assumes:

• Lookups are O(1)– Time scales with number of cells:

(|U| |V|)• Time to find alignment

– One decision for each entry in answer

(|U| + |V|)• Total running time (|U| |V|)

score = score[] - g score = score[] - g match: score = score[] + c no match: score = score[]

Page 20: CS216: Program and Data Representation University of Virginia Computer Science

20UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

|U| = 1200|V| > 16.5 Billion

Good enough?

Page 21: CS216: Program and Data Representation University of Virginia Computer Science

21UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Heuristic Alignment

• BLAST needs to be faster• No asymptotically faster algorithm is

guaranteed to find best alignment– Only way to do better is reject some

alignments without considering them

• BLAST uses heuristics:– Looks for short sequences (~3 proteins = 9

nucleotides) that match well without gaps– Extend those sequences into longer sequences

Page 22: CS216: Program and Data Representation University of Virginia Computer Science

22UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

PS2• Part 1 (1-3): List representations• Part 2 (4-5): Dynamic programming

implementation of sequence alignment• Part 3 (6-10): Brute force implementation

of phylogeny– Like alignment, brute force doesn’t scale

beyond very small inputs– Unlike alignment, to do better we will need to

use heuristics (give up correctness!)• This will be part of PS3

PS2 is longer and harder than PS1.You have 10 days to complete it -

Get started early!

Page 23: CS216: Program and Data Representation University of Virginia Computer Science

23UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Data Structures• If we have a good list implementation, do

we need any other data structures?• For computing: no

– We can compute everything with just lists (actually even less). The underlying machine memory can be thought of as a list.

• For thinking: yes– Lists are a very limited way of thinking about

problems.

Page 24: CS216: Program and Data Representation University of Virginia Computer Science

24UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

List Limitations

L

Node

Info:

Next:

1

Node

Info:

Next:

2 Info:

Next:

3

NodeIn a list, every element has direct relationships with only two things: predecessor and successor

Page 25: CS216: Program and Data Representation University of Virginia Computer Science

25UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Complex Relationships

Bill Cheswick’sMap of the Internet

http://research.lumeta.com/ches/map/gallery/

Page 26: CS216: Program and Data Representation University of Virginia Computer Science

26UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

List Tree• List: each element has relationships

with up to 2 other elements:

• Binary Tree: each element has relationships with up to 3 other elements:

ElementPredecessor Successor

Element

Parent

Right ChildLeft Child

Page 27: CS216: Program and Data Representation University of Virginia Computer Science

27UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Language Phylogeny

EnglishGerman Anglo-Saxon

Norwegian

Czech

SpanishItalian

Romanian

French

From Lecture 1...

Page 28: CS216: Program and Data Representation University of Virginia Computer Science

28UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Tree Terms

EnglishGerman Anglo-Saxon

Norwegian

Czech

SpanishItalian

Romanian

French

Leaf

Root

Note that CS treesare usually drawnupside down!

Height = 3

Height = 0

Depth = 4

Page 29: CS216: Program and Data Representation University of Virginia Computer Science

29UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Tree Terms• Root: a node with no parent

– There can only be one root

• Leaf: a node with no children• Height of a Node: length of the longest

path from that node to a leaf• Depth of a Node: length of the path

from the Root to that node• Height of a Tree: maximum depth of a

node in that tree = height of the root

Page 30: CS216: Program and Data Representation University of Virginia Computer Science

30UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Perfect Binary Tree

All leaves have the same depth

How many leaves?

2hhHow

many nodes?

2h+1-1

# Nodes = 1 + 2 + 22 + ... + 2h

Page 31: CS216: Program and Data Representation University of Virginia Computer Science

31UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Tree Node Operations

• getLeftChild (Node)• getRightChild (Node)• getInfo (Node)

def isLeaf (self): return self.getLeftChild () == None and self.getRightChild () == None

Page 32: CS216: Program and Data Representation University of Virginia Computer Science

32UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Calculating Height

def height (self): if self.isLeaf (): return 0 else: return 1 + max(self.getLeftChild().height(),

self.getRightChild().height())

Height of a Node: length of the longest path from that node to a leaf

Page 33: CS216: Program and Data Representation University of Virginia Computer Science

33UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Analyzing Height

• What is the asymptotic running time or our height procedure?

def height (self): if self.isLeaf (): return 0 else: return 1 + max(self.getLeftChild().height(),

self.getRightChild().height())

Page 34: CS216: Program and Data Representation University of Virginia Computer Science

34UVa CS216 Spring 2006 - Lecture 4: Dynamic Programming, Trees

Charge• Today and tomorrow:

– Sections in Thorton D classrooms

• Wednesday:– Instead of CS216, go to Ron Rivest’s talk– 10:30am, Newcomb Hall South Meeting

Room

• Get started on PS2!