
# String Matching

### Text of String Matching

• Slide 1
• String Matching. String matching: definition of the problem (text, pattern). The approach depends on what we have, the text or the patterns:
  • Exact matching; approximate matching
  • 1 pattern → the algorithm depends on |p| and | |
  • k patterns → the algorithm depends on k, |p| and | |
  • The text → data structure for the text (suffix tree, ...)
  • The patterns → data structures for the patterns
  • Dynamic programming
  • Sequence alignment (pairwise and multiple)
  • Extensions
  • Regular expressions
  • Probabilistic search
  • Sequence assembly: hash algorithm
  • Hidden Markov Models
• Slide 2
• Approximate string matching. For instance, given the sequence CTACTACTTACGTGACTAATACTGATCGTAGCTAC, search for the pattern ACTGA allowing one error. But what is the meaning of "one error"?
• Slide 3
• Edit distance. We accept three types of errors:
  1. Mismatch: ACCGTGAT → ACCGAGAT
  2. Insertion: ACCGTGAT → ACCGATGAT
  3. Deletion: ACCGTGAT → ACCGGAT
  Insertions and deletions together are called indels. The edit distance d between two strings is the minimum number of substitutions, insertions and deletions needed to transform the first string into the second one. What are d(ACT,ACT), d(ACT,AC), d(ACT,C), d(ACT,), d(AC,ATC) and d(ACTTG,ATCTG)?
• Slide 4
• Edit distance. We accept three types of errors:
  1. Mismatch: ACCGTGAT → ACCGAGAT
  2. Insertion: ACCGTGAT → ACCGATGAT
  3. Deletion: ACCGTGAT → ACCGGAT
  Insertions and deletions together are called indels. The edit distance d between two strings is the minimum number of substitutions, insertions and deletions needed to transform the first string into the second one: d(ACT,ACT)=0, d(ACT,AC)=1, d(ACT,C)=2, d(ACT,)=3, d(AC,ATC)=1, d(ACTTG,ATCTG)=2.
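The definition can be checked directly with a short program. The following is a minimal Python sketch, not part of the original slides: the function name `edit_distance` and the memoised recursion over prefixes are illustrative choices, and each of the three branches corresponds to one of the three error types.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) is the distance between the prefixes a[:i] and b[:j].
        if i == 0:
            return j  # only insertions are left
        if j == 0:
            return i  # only deletions are left
        mismatch = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            d(i - 1, j - 1) + mismatch,  # match or substitution
            d(i, j - 1) + 1,             # insertion of b[j - 1]
            d(i - 1, j) + 1,             # deletion of a[i - 1]
        )

    return d(len(a), len(b))


# The six pairs asked for on the slide.
pairs = [("ACT", "ACT"), ("ACT", "AC"), ("ACT", "C"),
         ("ACT", ""), ("AC", "ATC"), ("ACTTG", "ATCTG")]
for a, b in pairs:
    print(f"d({a},{b}) = {edit_distance(a, b)}")
```

Running it prints the distances 0, 1, 2, 3, 1 and 2 for the six pairs, in that order.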
• Slide 5
• Edit distance and alignment of strings. Candidate alignments (first string over second string):
  • ACT and ACT: ACT over ACT
  • ACT and AT: ACT over A-T
  • ACTTG and ATCTG: ACTTG over ATCTG, or ACT-TG over A-TCTG
  Given d(ACT,ACT)=0, d(ACT,AC)=1 and d(ACTTG,ATCTG)=2, which is the best alignment in every case? The edit distance is related to the best alignment of the strings: the alignment suggests the substitutions, insertions and deletions that transform one string into the other.
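To make the link between distance and alignment concrete, here is a hedged Python sketch that computes the distance table and then walks back through it to recover one best alignment. The name `best_alignment` and the tie-breaking order (diagonal first, then deletion, then insertion) are assumptions made for illustration; when several alignments are equally good, another order would return a different but equally valid one.

```python
def best_alignment(a: str, b: str) -> tuple:
    """Return (distance, aligned_a, aligned_b) for one optimal alignment."""
    n, m = len(a), len(b)
    # D[i][j] = edit distance between a[:i] and b[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + cost, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Trace back from the bottom-right cell, emitting one column per step.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 1 if i == 0 or j == 0 or a[i - 1] != b[j - 1] else 0
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + cost:
            top.append(a[i - 1]); bottom.append(b[j - 1])   # match or mismatch
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-")        # deletion
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])        # insertion
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


for x, y in [("ACT", "ACT"), ("ACT", "AT"), ("ACTTG", "ATCTG")]:
    dist, top, bottom = best_alignment(x, y)
    print(dist, top, bottom)
```

The number of columns in which the two aligned rows differ equals the edit distance, which is the sense in which the distance and the best alignment are related.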
• Slide 6
• Edit distance and alignment of strings. But what is the distance between the strings ACGCTATGCTATACG and ACGGTAGTGACGC, and what is the best alignment between them? This problem was first discussed in 1966, and the algorithm was proposed in 1968 and 1970, using the technique called dynamic programming.
• Slide 7
• Edit distance and alignment of strings. [Figure: empty dynamic programming matrix with the string CTACTACTACGT along the columns and the string ACTGA along the rows.]
• Slide 8
• Edit distance and alignment of strings. [Figure: the same empty matrix.]
• Slide 9
• Edit distance and alignment of strings. [Figure: the same matrix with one cell marked.] The cell contains the distance between AC and CTACT.
• Slide 10
• Edit distance and alignment of strings. [Figure: the same matrix with a "?" marking a cell still to be computed.]
• Slide 11
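• Edit distance and alignment of strings. [Figure: the matrix with 0 in the first cell of the top row and a "?" marking the next cell to compute.]
• Slide 12
• Edit distance and alignment of strings. [Figure: the top row now starts 0 1.] The second value corresponds to aligning the empty string with C: - over C.
• Slide 13
• Edit distance and alignment of strings. [Figure: the top row now starts 0 1 2.] The third value corresponds to aligning the empty string with CT: -- over CT.
• Slide 14
• Edit distance and alignment of strings. [Figure: the top row reads 0 1 2 3 4 5 6 7 8.] For example, the value 6 corresponds to the alignment ------ over CTACTA.
• Slide 15
• Edit distance and alignment of strings. [Figure: the top row reads 0 1 2 3 4 5 6 7 8, and the first column has a "?" in the rows A, C and T.]
• Slide 16
• Edit distance and alignment of strings. [Figure: the top row reads 0 1 2 3 4 5 6 7 8, and the first column reads 1, 2, 3 in the rows A, C and T.] The value 3 corresponds to the alignment ACT over ---.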
• Slide 17
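• Edit distance and alignment of strings. [Figure: the matrix with the top row 0 1 2 3 4 5 6 7 8 and the first column 1, 2, 3 filled in.] The best alignment BA(AC,CTAC) is the best of three candidates:
  • BA(AC,CTA) followed by the column - over C
  • BA(A,CTA) followed by the column C over C
  • BA(A,CTAC) followed by the column C over -
  Therefore d(AC,CTAC) = min{ d(AC,CTA)+1, d(A,CTA), d(A,CTAC)+1 }.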
• Slide 18
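• Edit distance and alignment of strings. [Figure: the matrix with the top row 0 1 2 3 4 5 6 7 8 and the first column 1, 2, 3 filled in.] In the same way, d(AC,CTACT) = min{ d(A,CTAC)+1, d(A,CTACT)+1, d(AC,CTAC)+1 }. Here the diagonal term also costs +1 because the last characters, C and T, differ.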
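The cell-by-cell rule can be sketched in Python as follows. This is an illustrative implementation, not the code behind the slides: the helper names `distance_matrix` and `show` are hypothetical, the rows are indexed by prefixes of ACTGA and the columns by prefixes of CTACTACTACGT, matching the matrix layout of the figures.

```python
def distance_matrix(rows: str, cols: str) -> list:
    """D[i][j] = edit distance between rows[:i] and cols[:j]."""
    D = [[0] * (len(cols) + 1) for _ in range(len(rows) + 1)]
    for j in range(len(cols) + 1):
        D[0][j] = j  # top row: empty string against a prefix of cols
    for i in range(len(rows) + 1):
        D[i][0] = i  # first column: a prefix of rows against the empty string
    for i in range(1, len(rows) + 1):
        for j in range(1, len(cols) + 1):
            cost = 0 if rows[i - 1] == cols[j - 1] else 1
            D[i][j] = min(
                D[i][j - 1] + 1,         # left neighbour plus a gap
                D[i - 1][j - 1] + cost,  # diagonal neighbour, +1 only on mismatch
                D[i - 1][j] + 1,         # upper neighbour plus a gap
            )
    return D


def show(rows: str, cols: str, D: list) -> None:
    """Print the matrix with its row and column labels."""
    print("  " + "".join(f"{c:>3}" for c in " " + cols))
    for i, row in enumerate(D):
        label = " " if i == 0 else rows[i - 1]
        print(f"{label:>2}" + "".join(f"{v:>3}" for v in row))


D = distance_matrix("ACTGA", "CTACTACTACGT")
show("ACTGA", "CTACTACTACGT", D)
print("d(AC,CTAC)  =", D[2][4])
print("d(AC,CTACT) =", D[2][5])
```

The bottom-right cell of the printed matrix is the distance between the two complete strings; every other cell is the distance between two prefixes.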
• Slide 19
• Edit distance and alignment of strings. Connect to http://alggen.lsi.upc.es/docencia/ember/leed/Tfc1.htm and use the global method.
• Slide 20
• Edit distance and alignment of strings. How can this algorithm be applied to approximate search, that is, to K-approximate string searching?
• Slide 21
• K-approximate string searching. [Figure: matrix with the text CTACTACTACGTACTGGTGAA along the columns and the pattern ACTGA along the rows; one cell is marked "This cell".]
• Slide 22
• K-approximate string searching. [Figure: the same matrix.] This cell gives the distance between (ACTGA, CTGTA), but we are only interested in the last characters.
• Slide 23
• K-approximate string searching. [Figure: the same matrix.] This cell gives the distance between (ACTGA, CTGTA), but we are only interested in the last characters.
• Slide 24
• K-approximate string searching. [Figure: the same matrix with the first six text characters replaced by *, so the columns read * * * * * * C T A C G T A C T G G T G A A.] This cell gives the distance between (ACTGA, CTGTA), but we are only interested in the last characters, no matter where they appear in the text; then
• Slide 25
• K-approximate string searching. [Figure: the same matrix with 0 in the first cell of the top row.] This cell gives the distance between (ACTGA, CTGTA), but we are only interested in the last characters, no matter where they appear in the text; then
• Slide 26
• K-approximate string searching. [Figure: the same matrix with 0 in the first cell of the top row.] This cell gives the distance between (ACTGA, CTGTA), but we are only interested in the last characters, no matter where they appear in the text; then
• Slide 27
• K-approximate string searching. [Figure: matrix with the full text CTACTACTACGTACTGGTGAA along the columns and every cell of the top row set to 0.] This cell gives the distance between (ACTGA, CTGTA), but we are only interested in the last characters, no matter where they appear in the text; then the top row is filled with zeros.
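A zero top row means that an occurrence may start at any text position at no cost. The sketch below shows one way to turn that idea into a search routine in Python; it is an assumption-laden illustration rather than the slides' own code. The name `k_approximate_search` is hypothetical, the matrix is processed one text character (one column) at a time, and a match is reported as the pair (end position, distance), where the end position counts text characters from 1.

```python
def k_approximate_search(pattern: str, text: str, k: int) -> list:
    """Report (end, dist) for every text position where the pattern
    ends with at most k errors; end counts text characters from 1."""
    m = len(pattern)
    # Column for the empty text prefix: i pattern characters cost i deletions.
    prev = list(range(m + 1))
    matches = []
    for end, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra text character
                cur[i - 1] + 1,      # missing pattern character
            )
        if cur[m] <= k:
            matches.append((end, cur[m]))
        prev = cur
    return matches


text = "CTACTACTACGTACTGGTGAA"
for k in (0, 1, 2):
    print(k, k_approximate_search("ACTGA", text, k))
```

Only the bottom row of the matrix is inspected: a value of at most k in column j says that some substring of the text ending at position j is within k errors of the pattern. Keeping two columns instead of the whole matrix is a space-saving choice; recovering where each occurrence starts would require the full matrix and a traceback.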
• Slide 28
• K-approximate string searching. Connect to http://alggen.lsi.upc.es/docencia/ember/leed/Tfc1.htm and use the semi-global method.