CSC 212 – Data Structures Lecture 34: Strings and Pattern Matching



Problem of the Day

You drive a bus from Rotterdam to Delft. At the 1st stop, 33 people get in. At the 2nd stop, 7 more people get in and 11 passengers leave. At the 3rd stop, 5 people leave and 2 get in. After one hour, the bus arrives in Delft. What is the name of the driver?

Read the question: You are the driver!

Strings

Algorithmically, a String is just a sequence of concatenated data:
  • “CSC212 STUDENTS IN DA HOUSE”
  • “I can’t believe this is a String!”
  • Java programs
  • HTML documents
  • Digitized images
  • DNA sequences

Strings In Java

Java Strings are immutable

Java maintains a pool (in effect, a Map) from text to String objects
  • For each String literal, the pool is checked
  • If the text exists, Java uses the String object to which it is mapped
  • Otherwise, Java makes a new String and adds the text and object to the pool
  • Strings built at run time (for example, with new String(...)) join the pool only through intern()

Happens “under the hood”
  • Makes String work like a primitive type
  • Also makes it cheap to do lots of text processing
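The pooling behavior can be observed by comparing references with ==. The following is a minimal sketch; the class name StringPoolDemo and the helper sameObject are made up for illustration and are not part of the lecture.

```java
class StringPoolDemo {
    // Hypothetical helper: true when both references point to the same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = "CSC212";              // literal: looked up in the pool
        String b = "CSC212";              // same text, so the same pooled object
        String c = new String("CSC212");  // explicitly creates a distinct object
        String d = c.intern();            // returns the pooled object for this text

        System.out.println(sameObject(a, b)); // true
        System.out.println(sameObject(a, c)); // false
        System.out.println(sameObject(a, d)); // true
        System.out.println(a.equals(c));      // true: equals compares the text
    }
}
```

Because == compares object identity rather than text, equals remains the safe way to compare String contents.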

String Terminology

String drawn from elements in an alphabet
  • ASCII or Unicode
  • Bits
  • Pixels
  • DNA bases

Substring P[i ... j] contains characters from P[i] through P[j]
  • A substring starting at rank 0 is called a prefix
  • A substring ending at the string’s last rank is called a suffix
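The P[i ... j] notation includes both end ranks, while Java's substring method excludes its end index. A small sketch of the mapping, assuming hypothetical helper names (sub, isPrefix, isSuffix) chosen only for this example:

```java
class SubstringDemo {
    // P[i ... j] with both ends inclusive; substring(i, j + 1) excludes index j + 1.
    static String sub(String p, int i, int j) {
        return p.substring(i, j + 1);
    }

    // s is a prefix of p when p begins with s.
    static boolean isPrefix(String s, String p) {
        return p.startsWith(s);
    }

    // s is a suffix of p when p ends with s.
    static boolean isSuffix(String s, String p) {
        return p.endsWith(s);
    }

    public static void main(String[] args) {
        String p = "I am the Lizard King!";
        System.out.println(sub(p, 2, 3));          // am
        System.out.println(sub(p, 16, 20));        // King!
        System.out.println(isPrefix("I am", p));   // true
        System.out.println(isSuffix("King!", p));  // true
        System.out.println(isPrefix("Lizard", p)); // false
    }
}
```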

Suffixes and Prefixes

“I am the Lizard King!”

Prefixes:
  • “I”
  • “I ”
  • “I a”
  • “I am”
  • …
  • “I am the Lizard Kin”
  • “I am the Lizard King”
  • “I am the Lizard King!”

Suffixes:
  • “!”
  • “g!”
  • “ng!”
  • “ing!”
  • …
  • “am the Lizard King!”
  • “ am the Lizard King!”
  • “I am the Lizard King!”
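The same lists can be generated mechanically. This is an illustrative sketch only: the class and method names are assumptions, and it lists only the non-empty prefixes and suffixes, shortest first.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixSuffixDemo {
    // All non-empty prefixes of p, shortest first.
    static List<String> prefixes(String p) {
        List<String> out = new ArrayList<>();
        for (int end = 1; end <= p.length(); end++) {
            out.add(p.substring(0, end)); // characters at ranks 0 .. end - 1
        }
        return out;
    }

    // All non-empty suffixes of p, shortest first.
    static List<String> suffixes(String p) {
        List<String> out = new ArrayList<>();
        for (int start = p.length() - 1; start >= 0; start--) {
            out.add(p.substring(start)); // characters from rank start to the last rank
        }
        return out;
    }

    public static void main(String[] args) {
        String p = "I am the Lizard King!";
        for (String s : prefixes(p)) System.out.println("[" + s + "]");
        for (String s : suffixes(p)) System.out.println("[" + s + "]");
    }
}
```

A string of length n has n non-empty prefixes and n non-empty suffixes, and the whole string appears in both lists.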

Pattern Matching Problem

Given strings T & P, find the first substring of T matching P
  • T is the “text”
  • P is the “pattern”

Has many, many, many applications
  • Search engines
  • Database queries
  • Biological research
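Java's String.indexOf answers exactly this question: it returns the rank in T where the first occurrence of P starts, or -1 if there is none. A brief sketch, where the wrapper name firstMatch is an assumption used only to mirror the T and P notation:

```java
class IndexOfDemo {
    // Rank in t where the first match of p begins, or -1 when p does not occur.
    static int firstMatch(String t, String p) {
        return t.indexOf(p);
    }

    public static void main(String[] args) {
        String t = "CSC212 STUDENTS IN DA HOUSE";
        System.out.println(firstMatch(t, "STUDENTS"));  // 7
        System.out.println(firstMatch(t, "HOUSE"));     // 22
        System.out.println(firstMatch(t, "PROFESSOR")); // -1
    }
}
```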

Brute-Force Approach

Common method of solving problems

Easy to develop
  • Often requires little coding
  • Needs little brain power to figure out

Uses computer’s speed for analysis
  • Examines every possible option
  • Can be painfully slow and may use lots of memory
  • Generally good only with small problems

Brute-Force Pattern Matching

Compare P with every substring of T until we either
  • find a substring of T equal to P, or
  • reject all possible substrings of T

If |P| = m and |T| = n, takes O(nm) time

Worst case:
  T = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  P = aaag

Common case for images & DNA data
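To see where the O(nm) bound comes from, one can count character comparisons on a worst-case input. The sketch below is an illustration, not the lecture's code: the class name, the helper repeat, and the choice of a 35-character text are all assumptions. With n = 35 and m = 4 there are n - m + 1 = 32 alignments, and each one makes 4 comparisons before failing on the final g.

```java
import java.util.Arrays;

class WorstCaseDemo {
    // Counts the character comparisons made while brute-force matching p against t.
    static long countComparisons(String t, String p) {
        long comparisons = 0;
        for (int i = 0; i <= t.length() - p.length(); i++) {
            int j = 0;
            while (j < p.length()) {
                comparisons++;                              // one character comparison
                if (t.charAt(i + j) != p.charAt(j)) break;  // mismatch: try next alignment
                j++;
            }
            if (j == p.length()) return comparisons;        // full match found, stop early
        }
        return comparisons;
    }

    // Hypothetical helper: a string made of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String t = repeat('a', 35);
        System.out.println(countComparisons(t, "aaag")); // 128 = 32 alignments * 4 comparisons
    }
}
```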

Brute-Force Pattern Matching

Algorithm BruteForceMatch(String T, String P)
  // Check if each rank of T starts a matching substring
  for i ← 0 to T.length() - P.length()
    // Compare substring starting at T[i] with P
    j ← 0
    while j < P.length() && T.charAt(i + j) == P.charAt(j)
      j ← j + 1
    if j == P.length()
      return i    // Return 1st place in T we find P
  return -1       // No matching substring exists
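The pseudocode translates almost line for line into Java. The following is one possible rendering, with the class and method names chosen here for illustration:

```java
class BruteForceMatch {
    // Returns the first rank in t at which p occurs, or -1 if p never occurs.
    static int match(String t, String p) {
        // Check whether each rank of t starts a matching substring.
        for (int i = 0; i <= t.length() - p.length(); i++) {
            // Compare the substring starting at t[i] with p, one character at a time.
            int j = 0;
            while (j < p.length() && t.charAt(i + j) == p.charAt(j)) {
                j++;
            }
            if (j == p.length()) {
                return i; // every character of p matched
            }
        }
        return -1; // no matching substring exists
    }

    public static void main(String[] args) {
        System.out.println(match("I am the Lizard King!", "Lizard")); // 9
        System.out.println(match("aaaaaaaaaaaaaaa", "aaag"));         // -1
    }
}
```

Two edge cases follow from the loop bounds: an empty pattern matches at rank 0, and a pattern longer than the text returns -1 without any comparisons.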

Your Turn

Get back into groups and do the activity

Before Next Lecture…

Keep up with your reading!
  • Cannot stress this enough

Get ready for Lab Mastery Exam

Start thinking about questions for Final