Linear-time computation of local periods


Gregory Kucherov
INRIA/LORIA, Nancy, France

Joint work with Roman Kolpakov (Moscow) and Jean-Pierre Duval, Thierry Lecroq, Arnaud Lefebvre (Rouen)

Haifa Stringology Workshop, April 3-8 2005

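Periodicities (repetitions) in strings

period of a word $w$: an integer $p \ge 1$ such that $w[i] = w[i+p]$ for all $i$ with $1 \le i \le |w| - p$
the (global) period of $w$: its minimal period
periodicity = word $w$ of period $p \le |w|/2$
Examples: square, cube; a fractional periodicity has a (cyclic) root and a non-integer exponent $|w|/p$, such as 8/3
periodicities = "runs" of squares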
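The definitions above can be made concrete with a short sketch. It computes the minimal period through the classical border (failure) table and derives the exponent from it. The function names and the sample word `abaabaab` are illustrative assumptions, chosen only because that word has exponent 8/3.

```python
from fractions import Fraction


def minimal_period(w):
    """Return the smallest p >= 1 with w[i] == w[i + p] for all valid i."""
    n = len(w)
    if n == 0:
        raise ValueError("the empty word has no period")
    # border[i] = length of the longest proper border of w[:i + 1]
    border = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and w[i] != w[k]:
            k = border[k - 1]
        if w[i] == w[k]:
            k += 1
        border[i] = k
    # The minimal period is the length minus the longest border.
    return n - border[-1]


def exponent(w):
    """Exponent of w: its length divided by its minimal period."""
    return Fraction(len(w), minimal_period(w))


def is_periodicity(w):
    """A periodicity is a word whose minimal period is at most half its length."""
    return exponent(w) >= 2
```

For instance, `exponent("abab")` is 2 (a square), `exponent("aaa")` is 3 (a cube), and `exponent("abaabaab")` is the fraction 8/3.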

Finding periodicities

CGCGGCAGTTTTGCCGACTGTTTGGGACTTGCTCGAACTTGCCTATGCCAAGCTGCCGACGATTCCGCCCACCCTGTTGGAACGCGATTTTAATTTCCCGCCTTTTTCCGAACTCGAAGCCGAAGTCGCCAAAATCGCCGATTATCAAACGCGTGCCGGAAAGGAATGCCGCCGTGCAGCCTGAAACCTCCGCCCAATACCAGCACCGTTTCGCCCAAGCCATACGCGGGGGCGAAGCCGCAGACGGTCTGCCGCAAGACCGACTGAACGTCTATATCCGCCTGATACGCAACAATATCTACAGCTTTATCGACCGTTGTTATACCGAAACGCTGCAATACTTTGACCGCGAAGAATGGGGCCGTCTGAAAGAAGGTTTCGTCCGCGACGCGTGCGCCCAAACGCCCTATTTTCAAGAAATCCCCGGCGAGTTCCTCCAATATTGCCAAAGCCTGCCGCTTTTAGACGGCATTTTGGCACTGATGGATTTTGAATATACCCAATTGCTGGCAGAAGTTGCTCAAATTCCGGATATTCCCGACATTCATTATTCAAATGACAGCAAATACACACCTTCCCCTGCGGCCTTTATCCGGCAATATCGATATGATGTTACCGATGATTTGCATGAAGCGGAAACAGCCTTGTTAATATGGCGAAACGCCGAAGATGATGTGATGTACCAAACATTGGACGGCTTCGATATGATGCTGCTAGAAATAATGGGGTTCTCCGCGCTTTCGTTTGACACCCTCGCCCAAACCCTTGTCGAATTTATGCCTGAGGACGATAATTGGAAAAATATTTTGCTTGGGAAATGGTCAGGCTGGACTGAACAAAGGATTATCATCCCCTCCTTGTCCGCCATATCCGAAAATATGGAAGACAATTCCCCGGGCC


Some work has been done ...

... see R. Kolpakov, G. Kucherov, Periodic structures in words, chapter of the 3rd Lothaire volume Applied Combinatorics on Words, Cambridge University Press, 2005

Different results are based on common simple techniques: extension functions and s-factorization

Rest of this talk

Basics
– extension functions
– computing periodicities in $O(n \log n)$ time
– s-factorization (Lempel-Ziv factorization)
– computing periodicities in $O(n)$ time

Computing all local periods in $O(n)$ time

Extension function: simplest definition

All values can be computed in $O(n)$ time [Main&Lorentz 84]

A refined algorithm is presented in [Lothaire 05] (inspired by Manacher's linear-time algorithm for computing palindromes)
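As an illustration, here is a minimal sketch of one common form of extension function: for each position `i`, the length of the longest common prefix of `w` and `w[i:]`. This particular definition and the name `prefix_extension` are assumptions made for the example; the sketch uses the familiar linear-time "rightmost match window" scheme rather than the exact algorithm of the cited references.

```python
def prefix_extension(w):
    """ext[i] = length of the longest common prefix of w and w[i:]."""
    n = len(w)
    ext = [0] * n
    if n == 0:
        return ext
    ext[0] = n
    # w[left:right] is the rightmost segment known to match a prefix of w.
    left = right = 0
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed inside the matching window.
            ext[i] = min(right - i, ext[i - left])
        # Extend the match by direct comparison.
        while i + ext[i] < n and w[ext[i]] == w[i + ext[i]]:
            ext[i] += 1
        if i + ext[i] > right:
            left, right = i, i + ext[i]
    return ext
```

Each letter comparison that succeeds moves `right` forward, which is why the total work stays linear in the length of the word.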

Extension function: variants

Using extension functions to compute periodicities

Lemma: There exists a square of period iff

Example:

a t a c g a a c g a a c g g t a c g a a c g a
      c g a a c g a a
        g a a c g a a c
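The example can be reproduced with a small sketch of the extension-based criterion. It assumes the usual formulation: for a position `j` and a period `p`, let `left` be the longest common suffix of `t[:j]` and `t[:j + p]`, and `right` the longest common prefix of `t[j:]` and `t[j + p:]`; squares of period `p` exist around `j` exactly when `left + right >= p`. The helper names are hypothetical, and the extensions are computed naively here, whereas a linear-time algorithm would read them from precomputed tables.

```python
def common_suffix_len(t, i, j):
    """Length of the longest common suffix of t[:i] and t[:j]."""
    k = 0
    while i - k > 0 and j - k > 0 and t[i - k - 1] == t[j - k - 1]:
        k += 1
    return k


def common_prefix_len(t, i, j):
    """Length of the longest common prefix of t[i:] and t[j:]."""
    k = 0
    while i + k < len(t) and j + k < len(t) and t[i + k] == t[j + k]:
        k += 1
    return k


def squares_of_period(t, j, p):
    """Start positions of the squares of period p found around position j."""
    if p < 1 or j < 0 or j + p > len(t):
        return []
    left = common_suffix_len(t, j, j + p)
    right = common_prefix_len(t, j, j + p)
    if left + right < p:
        return []  # fewer than p consecutive matches: no square
    # t[j - left : j + p + right] has period p; slide a window of length 2p over it.
    return list(range(j - left, j + right - p + 1))
```

On the word of the example, `squares_of_period("atacgaacgaacggtacgaacga", 7, 4)` yields the start positions 2, 3, 4 and 5; positions 3 and 4 are the two squares `cgaacgaa` and `gaacgaac` shown above.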

This implies (using binary division) that
– one can compute a compact representation of all squares (maximal periodicities) in $O(n \log n)$ time
– one can compute all squares in $O(n \log n + S)$ time, where $S$ is the output size [Crochemore 81, Main&Lorentz 84]
– one can test square-freeness in $O(n \log n)$ time

s-factorization (Lempel-Ziv factorization)

$w = u_1 u_2 \cdots u_k$, where $u_i$ is defined inductively:
– if the letter $a$ which immediately follows $u_1 \cdots u_{i-1}$ does not occur in $u_1 \cdots u_{i-1}$, then $u_i = a$
– otherwise $u_i$ is the longest subword occurring at least twice in $u_1 \cdots u_i$

The s-factorization (Lempel-Ziv factorization) can be computed in linear time using a suffix tree or DAWG
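The definition can be illustrated with a deliberately naive sketch. It is quadratic or worse, not the linear-time suffix tree or DAWG construction mentioned above, and the function name and sample words are assumptions for the example. An earlier occurrence is allowed to overlap the factor being built, which matches "occurring at least twice" in the prefix ending with that factor.

```python
def s_factorization(w):
    """Return the list of factors u_1, ..., u_k of the s-factorization of w."""
    factors = []
    n = len(w)
    pos = 0
    while pos < n:
        longest = 0
        # Try every earlier start; the earlier occurrence may overlap position pos.
        for start in range(pos):
            k = 0
            while pos + k < n and w[start + k] == w[pos + k]:
                k += 1
            longest = max(longest, k)
        # A letter never seen before forms a factor of length 1.
        length = max(longest, 1)
        factors.append(w[pos:pos + length])
        pos += length
    return factors
```

For example, `s_factorization("abaababaabaab")` gives `a`, `b`, `a`, `aba`, `baaba`, `ab`, and `s_factorization("aaaa")` gives `a`, `aaa`.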

Why s-factorization is useful here

Lemma of [Main 89]

Computing (a compact representation of) all squares in linear time

1. compute the s-factorization $w = u_1 u_2 \cdots u_k$ (in $O(n)$ time, $n = |w|$)
2. for each factor $u_i$:
   A. compute all maximal periodicities ending inside $u_i$ and crossing the border between $u_{i-1}$ and $u_i$
   B. recover all maximal periodicities occurring inside $u_i$ from a left copy of $u_i$

Important: the number of maximal periodicities is linear, while the number of squares can be super-linear
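To make the gap between the two counts tangible, here is a brute-force sketch that lists maximal periodicities and counts square occurrences. It is quadratic or worse and is not the linear-time procedure outlined above; the function names and the output format `(start, end, period)` are assumptions for the illustration.

```python
def maximal_periodicities(w):
    """Return (start, end, period) for each maximal periodicity w[start:end]."""
    n = len(w)
    runs = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            start = i
            while i + p < n and w[i] == w[i + p]:
                i += 1
            end = i + p  # w[start:end] has period p and cannot be extended
            if end - start < 2 * p:
                continue
            # Keep the segment only if p is its minimal period.
            minimal = all(
                any(w[k] != w[k + q] for k in range(start, end - q))
                for q in range(1, p)
            )
            if minimal:
                runs.append((start, end, p))
    return runs


def count_squares(w):
    """Number of pairs (start, period) such that a square occurs there."""
    n = len(w)
    return sum(
        1
        for p in range(1, n // 2 + 1)
        for s in range(n - 2 * p + 1)
        if w[s:s + p] == w[s + p:s + 2 * p]
    )
```

On `"a" * 8` there is a single maximal periodicity, `(0, 8, 1)`, but 16 square occurrences, which is why the compact representation is the one worth computing.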

Using extension functions + s-factorization to compute periodicities

This implies that
– one can compute a compact representation of all squares (maximal periodicities) in $O(n)$ time [Kolpakov,Kucherov 99]
– one can compute all squares (but also cubes, ...) in $O(n + S)$ time, where $S$ is the output size
– one can test square-freeness in $O(n)$ time [Crochemore 83, Main&Lorentz 85]

Local periods

minimal (local) square at position $i$ = minimal square centered at $i$
local period at $i$ = root length of the minimal square at $i$

The minimal square can be:
– an internal square
– a right-external square (or, symmetrically, a left-external one)
– a left- and right-external square
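A brute-force sketch of these definitions follows. It assumes that position `i` denotes the boundary between `w[:i]` and `w[i:]`, and that a square may stick out of the word on either side, in which case only the letters that exist are compared. The function name and the returned labels are illustrative assumptions.

```python
def minimal_square(w, i):
    """Return (local period at i, kind of the minimal square centered at i)."""
    n = len(w)
    if not 0 < i < n:
        raise ValueError("position must satisfy 0 < i < len(w)")
    for p in range(1, n + 1):
        # Compare w[k] with w[k + p] wherever both letters exist.
        lo, hi = max(0, i - p), min(i, n - p)
        if all(w[k] == w[k + p] for k in range(lo, hi)):
            left_external = i - p < 0
            right_external = i + p > n
            if left_external and right_external:
                kind = "left- and right-external"
            elif left_external:
                kind = "left-external"
            elif right_external:
                kind = "right-external"
            else:
                kind = "internal"
            return p, kind
    # Not reached: for p = len(w) there is nothing left to compare.
    raise AssertionError("unreachable")
```

For the word `abaab`, positions 1 to 4 give `(2, "left-external")`, `(3, "left-external")`, `(1, "internal")` and `(3, "right-external")`.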

Critical Factorization Theorem

For any position $i$ of a word $w$, the local period at $i$ is at most the global period of $w$

Critical Factorization Theorem: For every word $w$, there exists a position $i$ such that the local period at $i$ equals the global period of $w$
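The statement can be checked exhaustively on short words with a brute-force sketch. The helper names, the binary alphabet and the length bound are assumptions for the illustration; positions are boundaries `0 < i < len(w)`, and squares may extend beyond the ends of the word.

```python
from itertools import product


def global_period(w):
    """Smallest p >= 1 such that w[k] == w[k + p] for all valid k."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[k] == w[k + p] for k in range(n - p)))


def local_period(w, i):
    """Root length of the minimal square centered at boundary i."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[k] == w[k + p]
                       for k in range(max(0, i - p), min(i, n - p))))


def critical_positions(w):
    """Positions whose local period equals the global period."""
    g = global_period(w)
    return [i for i in range(1, len(w)) if local_period(w, i) == g]


def find_counterexample(max_len=8, alphabet="ab"):
    """Return a word violating the theorem, or None if none is found."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            periods = [local_period(w, i) for i in range(1, n)]
            if max(periods) != global_period(w):
                return w
    return None
```

`critical_positions("abaab")` returns `[2, 4]`, and `find_counterexample()` returns `None` for all binary words of length 2 to 8, as the theorem predicts.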

Computing local periods (minimal squares)

compute separately
– internal minimal squares
– left-external and right-external minimal squares
– both left- and right-external minimal squares

focus on internal minimal squares
compute the s-factorization
for each factor $u_i$, compute minimal squares ending in this factor

Minimal squares inside a factor

Minimal squares crossing factor border

focus on squares crossing the left border of the factor
focus on those of them centered inside the factor
general idea: compute squares and pick the minimal ones
be careful, the number of squares can be super-linear!
compute maximal periodicities in increasing order of periods
only a linear number of squares need to be tested for minimality!

Sketch of the proof

assume we are looking at squares of period
consider largest period for which squares have been found
if , then test all squares of period (at most )
if , then either , or
at most squares need to be tested

Computing (right-)external squares

use extension functions!
for each , find minimal such that
can be done in linear time

Conclusions

All local periods can be computed in $O(n)$ time

Note that the global period of $w$ is the maximum of its local periods
