Upload
kasimir-harvey
View
28
Download
3
Embed Size (px)
DESCRIPTION
Recuperació de la informació. Modern Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Neto Flexible Pattern Matching in Strings (2002) Gonzalo Navarro and Mathieu Raffinot http://static.ppurl.com/chmview-V1JRYFF-BnMAZgFqD1NVOlZ0VzMMZgdqUDABMwI9BWc=/0001.html - PowerPoint PPT Presentation
Citation preview
Recuperació de la informació
•Modern Information Retrieval (1999)Ricardo-Baeza Yates and Berthier Ribeiro-Neto
•Flexible Pattern Matching in Strings (2002)Gonzalo Navarro and Mathieu Raffinot
http://static.ppurl.com/chmview-V1JRYFF-BnMAZgFqD1NVOlZ0VzMMZgdqUDABMwI9BWc=/0001.html
•Algorithms on strings (2001)M. Crochemore, C. Hancart and T. Lecroq
•http://www-igm.univ-mlv.fr/~lecroq/string/index.html
String Matching
String matching: definition of the problem (text,pattern) depends on what we have: text or patterns• Exact matching:
• Approximate matching:
• 1 pattern ---> The algorithm depends on |p| and || • k patterns ---> The algorithm depends on k, |p| and ||
• The text ----> Data structure for the text (suffix tree, ...)
• The patterns ---> Data structures for the patterns
• Dynamic programming • Sequence alignment (pairwise and multiple)
• Extensions • Regular Expressions
• Probabilistic search:
• Sequence assembly: hash algorithm
Hidden Markov Models
Extended string matching
• Classes of characters: when in some DNA files or patterns there are new characters as N or R that means N={A,C,G,T} and R={G,A}.
• Bounded length gaps: we find pattern as ATx(2,3)TA where x(2,3) means any 2 or 3 characters.
• Optional characters: we find pattern as AC?ACT?T?A where C? means that C may or may not appear in the text.
• Wild cards: we find pattern as AT*TA where * means an arbitrary long string.
• Repeatable characters: we find pattern as AT[TA]*AT where [TA]* means that TA can appear zero or more times..
Classes of characters
There are characters in the text that represent sets of simbols
1. Classes of characters in the tetx.
There are characters in the pattern that represent sets of simbols
2. Classes of characters in the pattern.
There are classes of characters represented by onesymbol. For instace the IUPAC code for the
DNA alphabet is:R = {G,A} Y = {T,C} K = {G,T} M = {A,C} S = {G,C} W = {A,T}
B = {G,T,C } D = {G,A,T} H = {A,C,T} V = {G,C,A} N = {A,G,C,T} (any)
Extended alphabets
First part
Classes in the text
Classes in the text: Brute force algorithm
Text : over 2|∑|
Pattern over From left to right: prefix
• Which is the next position of the window?
• How the comparison is made?
Pattern :
Text :
The window is shifted only one cell
We need the operation: belongs to a set ?
?
Classes in the text: Brute force algorithm
Every subset of is represented by a string of bits of length | |.
When || < computer word
For instance, given the DNA alphabet ={A,C,G,T} : I(A)=(1,0,0,0), I(C)=(0,1,0,0),... I(R)=I(G,A)=( , , , )
Classes in the text: Brute force algorithm
Every subset of is represented by a string of bits of length | |.
When || < computer word
For instance, given the DNA alphabet ={A,C,G,T} : I(A)=(1,0,0,0), I(C)=(0,1,0,0),... I(R)=I(G,A)=(1,0,1,0)...I(N)=( , , , )
Classes in the text: Brute force algorithm
Every subset of is represented by a string of bits of length | |.
When || < computer word
For instance, given the DNA alphabet ={A,C,G,T} : I(A)=(1,0,0,0), I(C)=(0,1,0,0),... I(R)=I(G,A)=(1,0,1,0)...I(N)=(1,1,1,1)
Then the operation “A belongs to set X” is made with ...
Classes in the text: Brute force algorithm
Every subset of is represented by a string of bits of length | |.
G T A R T R N A G G A ...A T G T A
A T G T A
When || < computer word
For instance, given the DNA alphabet ={A,C,G,T} : I(A)=(1,0,0,0), I(C)=(0,1,0,0),... I(R)=I(G,A)=(1,0,1,0)...I(N)=(1,1,1,1)
Then the operation “A belongs to set X” is made with I(A) and I(X) >0
I(A) & I(T)>0
Classes in the text: Brute force algorithm
Every subset of is represented by a string of bits of length | |.
G T A R T R N A G G A ...A T G T A
A T G T A
A T G T A
When || < computer word
For instance, given the DNA alphabet ={A,C,G,T} : I(A)=(1,0,0,0), I(C)=(0,1,0,0),... I(R)=I(G,A)=(1,0,1,0)...I(N)=(1,1,1,1)
Then the operation “A belongs to set X” is made with I(A) and I(X) >0
I(A) & I(T)>0
I(A) & I(R)>0 I(T) & I(T)>0 I(G) & I(R)>0 I(T) & I(A)>0
Classes in the text: Brute force algorithm
Every subset of is represented by a string of bits of length | |.
G T A R T R N A G G A ...A T G T A
A T G T A
A T G T A
When || < computer word
For instance, given the DNA alphabet ={A,C,G,T} : I(A)=(1,0,0,0), I(C)=(0,1,0,0),... I(R)=I(G,A)=(1,0,1,0)...I(N)=(1,1,1,1)
Then the operation “A belongs to set X” is made with I(A) and I(X) >0
I(A) & I(T)>0
I(A) & I(R)>0 I(T) & I(T)>0 I(G) & I(R)>0 I(T) & I(A)>0
I(A) & I(N)>0 I(T) & I(R)>0
...Which is the cost?
Classes in the text
Experimental efficiency (Navarro & Raffinot)
2 4 8 16 32 64 128 256
64
32
16
8
4
2
| |
Long. pattern
Horspool
BNDMBOM
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
w
Classes in the text: Horspool algorithm
Text :
Pattern :Sufix search
• Which is the next position of the window?
• How the comparison is made?
Pattern :
Text : a
Shift until the next ocurrence of “a” (or “t”,”r”,…) in the pattern:
aa a
a a a
We need a shift table with the extended alphabet.
Classes in the text :Horspool example
Given the pattern ATGTA
• The shift table is:
A 4C 5G 2T 1R ?…N ?
Classes in the text :Horspool example
Given the pattern ATGTA
• The shift table is:
A 4C 5G 2T 1R 2…N ?
Classes in the text :Horspool example
Given the pattern ATGTA
• The shift table is:
A 4C 5G 2T 1R 2…N 1
text : G T A R T R N A A G G A …A T G T A
A T G T A
A T G T A
Classes in the text :Horspool example
Given the pattern ATGTA
• The shift table is:
A 4C 5G 2T 1R 2…N 1
text : G T A R T R N A A G G A ...A T G T A
A T G T A
A T G T A A T G T A
…
Classes in the text
Experimental efficiency (Navarro & Raffinot)
2 4 8 16 32 64 128 256
64
32
16
8
4
2
| |
Long. pattern
Horspool
BNDMBOM
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
w
Text :
Pattern :
Search for suffixes of T that are factors of the pattern
Classes in the text: BNDM algorithm
• Which is the next position of the window ?
• How the comparison is made?
…that is denoted as
D2 = 1 0 0 0 1 0 0
Depends on the value of the leftmost bit of D
Once the next character x is read D3 = D2<<1 & B(x)
B(x): mask of x in the pattern P. For instance, if B(x) = ( 0 0 1 1 0 0 0)
D = (0 0 0 1 0 0 0) & (0 0 1 1 0 0 0 ) = (0 0 0 1 0 0 0 )
x
Classes in the text : BNDM example
Given the pattern ATGTA
• The masks of bits are
B(A) = ( 1 0 0 0 1 ) B(R)=( )B(C) = ( 0 0 0 0 0 ) …B(G) = ( 0 0 1 0 0 ) B(N)=( )B(T) = ( 0 1 0 1 0 )
Classes in the text : BNDM example
Given the pattern ATGTA B(A) = ( 1 0 0 0 1 ) B(R)=(1 0 1 0 1)B(C) = ( 0 0 0 0 0 ) …B(G) = ( 0 0 1 0 0 ) B(N)=( )B(T) = ( 0 1 0 1 0 )
• The masks of bits are
Classes in the text : BNDM example
Given the pattern ATGTA
• text : G T A R T R N A G G A C G ...A T G T A
A T G T A
B(A) = ( 1 0 0 0 1 ) B(R)=(1 0 1 0 1)B(C) = ( 0 0 0 0 0 ) …B(G) = ( 0 0 1 0 0 ) B(N)=(1 1 1 1 1)B(T) = ( 0 1 0 1 0 )
D1 = ( 0 1 0 1 0 )D2 = ( 1 0 1 0 0 ) & ( 1 0 1 0 1 ) = ( 1 0 1 0 0 )D2 = ( 0 1 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )
• The masks of bits are
Classes in the text : BNDM example
Given the pattern ATGTA
• text : G T A R T R N A G G A C G ...A T G T A
A T G T A
B(A) = ( 1 0 0 0 1 ) B(R)=(1 0 1 0 1)B(C) = ( 0 0 0 0 0 ) …B(G) = ( 0 0 1 0 0 ) B(N)=(1 1 1 1 1)B(T) = ( 0 1 0 1 0 )
D1 = ( 0 1 0 1 0 )D2 = ( 1 0 1 0 0 ) & ( 1 0 1 0 1 ) = ( 1 0 1 0 0 )
D1 = ( 1 0 0 0 1 )D2 = ( 0 0 0 1 0 ) & ( 1 1 1 1 1 ) = ( 0 0 0 1 0 )
D2 = ( 0 1 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )
D3 = ( 0 0 1 0 0 ) & ( 1 0 1 0 1 ) = ( 0 0 1 0 0 )D4 = ( 0 1 0 0 0 ) & ( 0 1 0 1 0 ) = ( 0 1 0 0 0 )D5 = ( 1 0 0 0 0 ) & ( 1 0 1 0 1 ) = ( 1 0 0 0 0)
• The masks of bits are
Classes in the text : BNDM example
Given the pattern ATGTA
• text : G T A R T R N A G G A C G ...A T G T A
A T G T A
A T G T A
B(A) = ( 1 0 0 0 1 ) B(R)=(1 0 1 0 1)B(C) = ( 0 0 0 0 0 ) …B(G) = ( 0 0 1 0 0 ) B(N)=(1 1 1 1 1)B(T) = ( 0 1 0 1 0 )
D1 = ( 0 1 0 1 0 )D2 = ( 1 0 1 0 0 ) & ( 1 0 1 0 1 ) = ( 1 0 1 0 0 )
D1 = ( 1 0 0 0 1 )D2 = ( 0 0 0 1 0 ) & ( 1 1 1 1 1 ) = ( 0 0 0 1 0 )
D2 = ( 0 1 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )
D3 = ( 0 0 1 0 0 ) & ( 1 0 1 0 1 ) = ( 0 0 1 0 0 )D4 = ( 0 1 0 0 0 ) & ( 0 1 0 1 0 ) = ( 0 1 0 0 0 )D5 = ( 1 0 0 0 0 ) & ( 1 0 1 0 1 ) = ( 1 0 0 0 0)
…
• The masks of bits are
Classes in the text
Experimental efficiency (Navarro & Raffinot)
2 4 8 16 32 64 128 256
64
32
16
8
4
2
| |
Long. pattern
Horspool
BNDMBOM
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
w
BOM algorithm (Backward Oracle Matching)
• Which is the next position of the window?
• How the comparison is made?
Text :
Pattern : Automata: Factor Oracle
Check if the suffix is a factor
The position determined by the last character of the text with a transition in the automata
Classes in the text: BOM example
The we build the AFO of the inverse pattern of ATGTATG
… and we try to find… : G T A R T R N A A T G…A T G T A T G
GG AT T ATTA
G
It’s not possible any improvement!
Multiple string matching
5 10 15 20 25 30 35 40 45
8
4
2
| |Wu-Manber
SBOMlmin
(5 strings)
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM(10 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM (1000 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-ManberSBOM
(100 strings)
Ad AC
Classes in the text: Set Horspool algorithm
Text :
Patterns:
By suffixes
• Which is the next position of the window?
• How the comparison is made?
Trie of all inverse patterns
?
Set Horspool algorithm
Search for ATGTATG,TATG,ATAAT,ATGTG
4. Find the patterns
T A
G
GA
TTT
T
G
A
A
AA T
1. Construct the trie of GTATGTA, GTAT, TAATA i GTGTA
2. Determine lmin=4A 1C 4 (lmin)G 2T 1
3. Determine the shift table
Classes in the text: Set Horspool
Search for the patterns ATGTATG,TATG,ATAAT,ATGTG
T A
A
G
GA
TTT
T
G
A
A
AA T
text: ARTGNCTATGTGACA…
It’s not possible any improvement!
Multiple string matching
5 10 15 20 25 30 35 40 45
8
4
2
| |Wu-Manber
SBOMlmin
(5 strings)
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM(10 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM (1000 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-ManberSBOM
(100 strings)
Ad AC
Classes in the text: SBOM algorithm
• Which is the next position of the window?
• How the comparison is made?
Text :
Pattern : Automata: Factor Oracle (Inverse patterns of length lmin)
Check if the suffix is a factor of any pattern
The position determined by the last character of the text with a transition in the automata
Classes in the text: SBOM example
Search for the patterns ATGTATG, TAATG,TAATAAT i AATGTG
GG AT TTTA
G A
T A
A1 4
2 3
text: ACATN C TAGC TA TA ATAATGTATG
A
It’s not possible any improvement!
Extended alphabets
Classes in the:text pattern
Horspool ✓BNDM ✓BOM ✗
Set-Horspool ✗SBOM ✗
Extended search
Second part
Classes in the pattern
Classes in the pattern: Brute force algorithm
Text : over From left to right: prefix
• Which is the next position of the window?
• How the comparison is made?
Pattern :
Text :
The window is shifted only one cell
We need the operation: belongs to a set ?
?
Pattern : over 2|∑|
Classes in the pattern: Brute force algorithm
Every subset is represented by a string of bits of length | |.
G T A C T A G A G G A C G T A T G T A C T G ...A T N T R
A T N T R
When || < computer word
For instance, given the DNA alphabet ={A,C,G,T} : I(A)=(1,0,0,0), I(C)=(0,1,0,0),... I(R)=(1,0,1,0,),..., I(N)=(1,1,1,1)
Then the operation “A belongs to set X” is made with I(A) and I(X) >0
I(T) and I(R) >0
I(A) and I(R) >0 I(T) and I(T) >0 I(C) and I(N) >0 I(A) and I(T) >0 …
Classes in the text
Experimental efficiency (Navarro & Raffinot)
2 4 8 16 32 64 128 256
64
32
16
8
4
2
| |
Long. pattern
Horspool
BNDMBOM
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
w
Classes in the pattern: Horspool algorithm
Text :
Pattern :Sufix search
• Which is the next position of the window?
• How the comparison is made?
Pattern :
Text : a
Shift until the next ocurrence of “a” in the pattern:
aa a
a a a
We need a preprocessing phase to construct the shift table.
Classes in the pattern: Horspool example
Given the pattern ATNTR
• The shift table is:
A C G T
Classes in the pattern: Horspool example
Given the pattern ATNTR
• The shift table is:
A 2C G T
Classes in the pattern: Horspool example
Given the pattern ATNTR
• The shift table is:
A 2C 2G T
Classes in the pattern: Horspool example
Given the pattern ATNTR
• The shift table is:
A 2C 2G 2T
Classes in the pattern: Horspool example
Given the pattern ATNTR
• The shift table is:
A 2C 2G 2T 1
text : G T A C T A G A T A T G A G ...A T N T R
A T N T R
A T N T R
A T N T R A T N T R
Classes in the pattern: Horspool example
Given the pattern ATNTR
• The shift table is:
A 2C 2G 2T 1
text : G T A C T A G A T A T G A G ...A T N T R
A T N T R
A T N T R
A T N T R A T N T R
A T G T A
Shorter shifts!
Classes in the text
Experimental efficiency (Navarro & Raffinot)
2 4 8 16 32 64 128 256
64
32
16
8
4
2
| |
Long. pattern
Horspool
BNDMBOM
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
w
Text :
Pattern :
Search for suffixes of T that are factors of the pattern
Classes in the text: BNDM algorithm
• Which is the next position of the window ?
• How the comparison is made?
…that is denoted as
D2 = 1 0 0 0 1 0 0
Depends on the value of the leftmost bit of D
Once the next character x is read D3 = D2<<1 & B(x)
B(x): mask of x in the pattern P. For instance, if B(x) = ( 0 0 1 1 0 0 0)
D = (0 0 0 1 0 0 0) & (0 0 1 1 0 0 0 ) = (0 0 0 1 0 0 0 )
x
Classes in the pattern : BNDM example
Given the pattern ATNTR
• The masks of bits of symbols are
B(A) = ( )B(C) = ( )B(G) = ( )B(T) = ( )
Classes in the pattern : BNDM example
Given the pattern ATNTR
• The masks of bits of symbols are
B(A) = ( 1 0 1 0 1 )B(C) = ( )B(G) = ( )B(T) = ( )
Classes in the pattern : BNDM example
Given the pattern ATNTR
• The masks of bits of symbols are
B(A) = ( 1 0 1 0 1 )B(C) = ( 0 0 1 0 0 )B(G) = ( )B(T) = ( )
Classes in the pattern : BNDM example
Given the pattern ATNTR
• The masks of bits of symbols are
B(A) = ( 1 0 1 0 1 )B(C) = ( 0 0 1 0 0 )B(G) = ( 0 0 1 0 1 )B(T) = ( )
Classes in the pattern : BNDM example
Given the pattern ATNTR
• text : G T A C T A G A G G A C G T A T G T A C T G ...A T N T R
A T N T R
A T N T R
• The masks of bits of symbols are
B(A) = ( 1 0 1 0 1 )B(C) = ( 0 0 1 0 0 )B(G) = ( 0 0 1 0 1 )B(T) = ( 0 1 1 1 0 )
D1 = ( 0 1 1 1 0 )D2 = ( 1 1 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 )
D1 = ( 0 0 1 0 1 )D2 = ( 0 1 0 1 0 ) & ( 0 0 1 0 1 ) = ( 0 0 0 0 0 )
D3 = ( 0 1 0 0 0 ) & ( 1 0 1 0 1 ) = ( 0 0 0 0 0 )
…
D1 = ( 1 0 1 0 1 )D2 = ( 0 1 0 1 0 ) & ( 0 1 1 1 0 ) = ( 0 1 0 1 0 )
D3 = ( 1 0 1 0 0 ) & ( 0 0 1 0 1 ) = ( 0 0 1 0 0 )D4 = ( 0 1 0 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 0 0 0 ) A T N T R
Classes in the text
Experimental efficiency (Navarro & Raffinot)
2 4 8 16 32 64 128 256
64
32
16
8
4
2
| |
Long. pattern
Horspool
BNDMBOM
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
w
BOM algorithm (Backward Oracle Matching)
• Which is the next position of the window?
• How the comparison is made?
Text :
Pattern : Automata: Factor Oracle
Check if the suffix is a factor
The position determined by the last character of the text with a transition in the automata
Classes in the pattern: BOM example
• Given the pattern ATGTATG, the AFO is
GG AT T ATTA
G
We should apply the SBOM algorithm!
but for the patter ATNTRTG?
Multiple string matching
5 10 15 20 25 30 35 40 45
8
4
2
| |Wu-Manber
SBOMlmin
(5 strings)
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM(10 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM (1000 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-ManberSBOM
(100 strings)
Ad AC
Set Horspool algorithm
Text :
Patterns:
By suffixes
• Which is the next position of the window?
• How the comparison is made?
a
Trie of all inverse patterns
We shift until a is aligned with the first a in the trie not longer than lmin, or lmin
Set Horspool algorithm
Search for ATNTARG,RTGR,NTTNAR,ATRTG
4. Find the patterns
1. Construct the trie of the 46 possible inverse patterns
2. Determine lmin=4
A 1C 2G 1T 1
3. Determine the shift table
Multiple string matching
5 10 15 20 25 30 35 40 45
8
4
2
| |Wu-Manber
SBOMlmin
(5 strings)
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM(10 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-Manber
SBOM (1000 strings)
Ad AC
5 10 15 20 25 30 35 40 45
8
4
2
Wu-ManberSBOM
(100 strings)
Ad AC
SBOM algorithm
• Which is the next position of the window?
• How the comparison is made?
Text :
Pattern : Automata: Factor Oracle (Inverse patterns of length lmin)
Check if the suffix is a factor of any pattern
The position determined by the last character of the text with a transition in the automata
Classes in the patterns: SBOM example
the Automata Factor Oracle of all 21 possible patterns is built …
Given the patternsATGNARG, TRATR,TAATAAT i ANTNTGR
Extended alphabets
Classes in the:text pattern
Horspool ✓ ✓BNDM ✓ ✓BOM ✗ ≈
Set-Horspool ✗ ≈ SBOM ✗ ≈
Extended string matching
• Classes of characters: when in some DNA files or patterns there are new characters as N or R that means N={A,C,G,T} and R={G,A}.
• Bounded length gaps: we find pattern as ATx(2,3)TA where x(2,3) means any 2 or 3 characters.
• Optional characters: we find pattern as AC?ACT?T?A where C? means that C may or may not appear in the text.
• Wild cards: we find pattern as AT*TA where * means an arbitrary long string.
• Repeatable characters: we find pattern as AT[TA]*AT where [TA]* means that TA can appear zero or more times..
Bounded length gaps : BNDM example
Given the pattern ATx(2,3)TA
• The masks of bits are
B(A) = ( 1 0 1 1 1 0 1 ) B(C) = ( 0 0 1 1 1 0 0 )B(G) = ( 0 0 1 1 1 0 0 )B(T) = ( 0 1 1 1 1 1 0 )
Bounded length gaps : BNDM example
Given the pattern ATx(2,3)TA
• text : A T A G T A G A G T ...
D2 = ( 0 1 1 1 0 1 0 ) & ( 0 1 1 1 1 1 0 ) = ( 0 1 1 1 0 1 0 )D3 = ( 1 1 1 0 1 0 0 ) & ( 0 0 1 1 1 0 0 ) = ( 0 0 1 0 1 0 0 )
• The masks of bits are• The masks of bits are
B(A) = ( 1 0 1 1 1 0 1 ) B(C) = ( 0 0 1 1 1 0 0 )B(G) = ( 0 0 1 1 1 0 0 )B(T) = ( 0 1 1 1 1 1 0 )
D1 = ( 1 0 1 1 1 0 1 )
D4 = ( 0 1 0 1 0 0 0 ) & ( 1 0 1 1 1 0 1 ) = ( 0 0 0 1 0 0 0 )
D5 = ( 0 0 1 0 0 0 0 ) & ( 0 1 1 1 1 1 0) = ( 0 0 1 0 0 0 0 )
D6 = ( 0 1 0 0 0 0 0 ) & ( 1 0 1 1 1 0 1) = ( 0 0 0 0 0 0 0 ) ?
Bounded length gaps : BNDM example
Given the pattern ATx(2,3)TA • text : A T A G T A G A G T ...
D2 = ( 0 1 1 1 0 1 0 ) & ( 0 1 1 1 1 1 0 ) = ( 0 1 1 1 0 1 0 )D3 = ( 1 1 1 0 1 0 0 ) & ( 0 0 1 1 1 0 0 ) = ( 0 0 1 0 1 0 0 )
D1 = ( 1 0 1 1 1 0 1 )
D4 = ( 0 1 0 1 0 0 0 ) & ( 1 0 1 1 1 0 1 ) = ( 0 0 0 1 0 0 0 )D5 = ( 0 0 1 0 0 0 0 ) & ( 0 1 1 1 1 1 0) = ( 0 0 1 0 0 0 0 )
D6 = ( 0 1 0 0 0 0 0 ) & ( 1 0 1 1 1 0 1) = ( 0 0 0 0 0 0 0 ) ?AT***TA
Let’s see the automaton:
ε0 1 0 0 0 0 0 -0 0 0 1 0 0 0
0 0 1 1 0 0 0
= F= I
D [ F - (I & D) ] & ¬ F
( 0 0 1 1 0 0 0 ) 1 1 1 1