Periodic pattern mining


1 I NAME OF PRESENTER

Periodic Pattern Mining in Time Series Databases

Ashis Kumar Chanda, Swapnil Saha

Department of Computer Science and Engineering, University of Dhaka

Topics to be covered

- Introduction
- Time Series Database
- Key Terms
- Suffix Tree Generation
- Periodic Pattern Detection
- Conclusion

Introduction

What is a time-series database?

A time-series database consists of sequences of values or events obtained over repeated measurements of time. The values are typically measured at fixed time intervals (e.g., hourly, daily, weekly).

MATHEMATICAL REPRESENTATION

A time series is a set of observations taken at specified times.

For a time series involving a variable Y, if the observed values are y1, y2, y3, ... at times t1, t2, t3, ..., then Y can be written as a function of time: Y = F(t).

CATEGORIES OF TIME SERIES

- Long-term movements
- Cyclic movements
- Seasonal movements
- Irregular or random movements

We can denote these movements by the variables L, C, S, and I respectively.

The time-series variable Y is then modeled as Y = L + C + S + I (additive) or Y = L * C * S * I (multiplicative).
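As a small illustration of the additive and multiplicative models, the sketch below combines four component lists into one series. The function name `compose_series` and all sample numbers are made up for this example; they are not from the slides.

```python
def compose_series(long_term, cyclic, seasonal, irregular, multiplicative=False):
    """Combine the four movements into Y, additively or multiplicatively."""
    components = zip(long_term, cyclic, seasonal, irregular)
    if multiplicative:
        return [l * c * s * i for l, c, s, i in components]
    return [l + c + s + i for l, c, s, i in components]


# Hypothetical component values for four time steps.
L = [10, 11, 12, 13]        # long-term movement (trend)
C = [0, 1, 0, -1]           # cyclic movement
S = [2, -2, 2, -2]          # seasonal movement
I = [0.5, -0.5, 0.0, 0.0]   # irregular movement

Y = compose_series(L, C, S, I)
print(Y)  # [12.5, 9.5, 14.0, 10.0]
```

In practice the task runs the other way (estimating L, C, S, and I from an observed Y), which this sketch does not attempt.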

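TYPES OF PERIODICITY

- Symbol periodicity: axy apq amn (the symbol a repeats every 3 positions)
- Sequence periodicity: abxy abpq abmn (the sequence ab repeats every 4 positions)
- Segment periodicity: abxy abxy abxy (the whole segment abxy repeats)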

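KEY TERMS

Perfect periodicity: in abxy abpq abmn, the pattern ab occurs at every expected position (every 4 characters, starting at position 0).

Confidence: in abxy acpq abmn, ab occurs at only 2 of the 3 expected positions, so conf(4, 0, ab) = 2/3 = 0.67.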
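The confidence value in this example can be reproduced with a few lines of Python. This is a minimal sketch that assumes confidence is the fraction of expected positions (stPos, stPos + p, ...) at which the pattern actually occurs; the function name `confidence` and its parameter names are chosen for illustration only.

```python
def confidence(series, pattern, period, st_pos):
    """Fraction of expected positions (st_pos, st_pos + period, ...) where pattern occurs."""
    last_start = len(series) - len(pattern)   # last index where the pattern still fits
    expected = range(st_pos, last_start + 1, period)
    if len(expected) == 0:
        return 0.0
    hits = sum(1 for pos in expected if series.startswith(pattern, pos))
    return hits / len(expected)


print(confidence("abxyabpqabmn", "ab", 4, 0))            # 1.0  (perfect periodicity)
print(round(confidence("abxyacpqabmn", "ab", 4, 0), 2))  # 0.67 (ab missing at position 4)
```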

KEY TERMS

Periodicity in a subsection of a time series

T = gbxy asdf abpq abmn
stPos = 8
endPos = 15

So the periodic subsection is abpq abmn (positions 8 to 15 of T).

KEY TERMS

Periodicity with time tolerance

We cannot always get noise-free time series data, so we also check a few positions on either side of each expected position. This extra margin is known as the time tolerance (tt).

If X is a pattern with period p in T, then we check for X at stPos, stPos + p ± tt, stPos + 2p ± tt, ...
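The tolerance check can be sketched as follows. This is a simplified illustration, not the exact procedure from the referenced papers: the expected positions stay fixed at stPos + k*p and are not re-anchored after a shifted match. The function name and the noisy sample string are assumptions made for the example.

```python
def matches_with_tolerance(series, pattern, period, st_pos, tt):
    """Return the positions where pattern is found within +/- tt of each expected position."""
    offsets = sorted(range(-tt, tt + 1), key=abs)  # try the exact position first
    found = []
    for expected in range(st_pos, len(series), period):
        for off in offsets:
            pos = expected + off
            if pos >= 0 and series.startswith(pattern, pos):
                found.append(pos)
                break  # at most one match per expected position
    return found


noisy = "abxyabpqzabmn"  # hypothetical series: a stray 'z' shifts the third 'ab' by one
print(matches_with_tolerance(noisy, "ab", 4, 0, tt=0))  # [0, 4]
print(matches_with_tolerance(noisy, "ab", 4, 0, tt=1))  # [0, 4, 9]
```

With tt = 0 the shifted occurrence at position 9 is missed; with tt = 1 it is accepted as the match for expected position 8.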

KEY TERMS

A period in a time series may be represented by a 5-tuple (S, p, stPos, endPos, Conf):

- S = the periodic pattern (a sequence of symbols)
- p = the period: the pattern is checked every p characters
- stPos, endPos = the starting and ending positions of the segment in which the pattern matches
- Conf = the confidence

KEY TERMS

Suppose T = abxy acpq abdd abmn. Then (ab, 4, 0, 11, 1) means: look for the pattern ab in T from position 0 to position 11, checking every 4 characters.

a b x y a c p q a b d  d  a b m n
0 1 2 3 4 5 6 7 8 9 10 11
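One possible way to hold such a 5-tuple in code is a named tuple, shown below with a helper that lists the positions the tuple asks us to check. The class name `Period` and the helper `checked_positions` are hypothetical names for this sketch.

```python
from typing import NamedTuple


class Period(NamedTuple):
    S: str        # the periodic pattern
    p: int        # the period
    stPos: int    # start of the segment
    endPos: int   # end of the segment (inclusive)
    conf: float   # confidence


def checked_positions(period):
    """Positions stPos, stPos + p, stPos + 2p, ... up to endPos."""
    return list(range(period.stPos, period.endPos + 1, period.p))


example = Period("ab", 4, 0, 11, 1)
print(checked_positions(example))  # [0, 4, 8]
```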

KEY TERMS

Occurrence vector: the positions at which a pattern occurs in the series.

a b c a b b a b b a $
0 1 2 3 4 5 6 7 8 9 10

Occurrence vector of a: (0 3 6 9)
Occurrence vector of ab: (0 3 6)

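KEY TERMS

Difference vector: the differences between consecutive entries of an occurrence vector; each difference is a candidate period.

a b c a b b a b b a $
0 1 2 3 4 5 6 7 8 9 10

Occurrence vector of a: (0 3 6 9), difference vector: (3 3 3)
Occurrence vector of bb: (4 7), difference vector: (3)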
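Both vectors are easy to compute directly from the string, without a suffix tree. The sketch below does so by repeated searching; the function names are illustrative, and this brute-force scan is only meant to make the definitions concrete.

```python
def occurrence_vector(series, pattern):
    """All start positions of pattern in series, including overlapping ones."""
    positions = []
    pos = series.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = series.find(pattern, pos + 1)  # advance by 1 so overlaps are kept
    return positions


def difference_vector(occurrences):
    """Differences between consecutive occurrence positions."""
    return [b - a for a, b in zip(occurrences, occurrences[1:])]


T = "abcabbabba$"
print(occurrence_vector(T, "a"))                      # [0, 3, 6, 9]
print(occurrence_vector(T, "ab"))                     # [0, 3, 6]
print(difference_vector(occurrence_vector(T, "bb")))  # [3]
```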

How do we get a string from a transactional database?

By using a discretization technique.

DISCRETIZATION TECHNIQUE

We need to define ranges (groups) over the values in the DB and represent each range by a unique ASCII character.

Suppose, in our previous example:

- log in is denoted by a
- log out is denoted by x
- before log in is denoted by b
- before log out is denoted by c
- after log out is denoted by d
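The mapping above can be written as a lookup table. In the sketch below the table comes from the slide, while the function name `discretize` and the sample event sequence are assumptions made for illustration.

```python
# Symbol assigned to each state, as listed on the slide.
SYMBOLS = {
    "log in": "a",
    "log out": "x",
    "before log in": "b",
    "before log out": "c",
    "after log out": "d",
}


def discretize(events):
    """Turn a sequence of state labels into a string, one character per label."""
    return "".join(SYMBOLS[event] for event in events)


# Hypothetical sequence of four time slots.
day = ["log in", "before log out", "before log out", "log out"]
print(discretize(day))  # accx
```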


DISCRETIZATION TECHNIQUE

accx acxd axdd bacx

SUFFIX TREE GENERATION

The string 'abcabbabb$' has the following ten suffixes. The 10th suffix ($ alone) can be ignored when generating the suffix tree.

1. abcabbabb$

2. bcabbabb$

3. cabbabb$

4. abbabb$

5. bbabb$

6. babb$

7. abb$

8. bb$

9. b$

10. $
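The list of suffixes can be generated with a one-line slice loop. The function name `suffixes` is chosen for this sketch.

```python
def suffixes(text):
    """All suffixes of text, longest first."""
    return [text[i:] for i in range(len(text))]


for number, suffix in enumerate(suffixes("abcabbabb$"), start=1):
    print(f"{number}. {suffix}")
```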

The tree is built by inserting the suffixes one at a time, from suffix 1 (abcabbabb$) to suffix 9 (b$). Suffixes that begin with the same characters share a path from the root.

[Figures: the suffix tree after inserting each of the suffixes 1 to 9]

SUFFIX TREE FOR abcabbabb$

- Each leaf node holds a number that represents the starting position of its suffix.
- Each intermediate node holds a number that is the length of the substring read from the root to that node.

[Figure: suffix tree of abcabbabb$ with the numbers on its leaf and intermediate nodes]

SUFFIX TREE FOR abcabbabb$: FINDING OCCURRENCE VECTORS

The occurrence vector of an intermediate node is collected from the leaf nodes below it:

- abb: (3,6)
- ab: (0,3,6)
- bb: (4,7)
- b: (1,5,8,4,7)

[Figures: the suffix tree with the occurrence vectors attached to its intermediate nodes]
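To make the idea concrete, the sketch below builds a naive suffix trie and reads occurrence vectors from its leaves. Note the assumptions: this is an uncompressed trie with one character per edge, built in quadratic time, not the compressed suffix tree shown on the slides, and the function names are illustrative. It returns positions in sorted order, whereas the slides list them in tree traversal order.

```python
def build_suffix_trie(text):
    """Naive suffix trie as nested dicts; text should end with a unique '$'."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        # "leaf" cannot clash with an edge label because edges are single characters.
        node["leaf"] = start
    return root


def occurrence_vector(trie, pattern):
    """Start positions of all suffixes that begin with pattern."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [node]
    while stack:  # collect every leaf below the node reached by the pattern
        current = stack.pop()
        for key, child in current.items():
            if key == "leaf":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_suffix_trie("abcabbabb$")
print(occurrence_vector(trie, "abb"))  # [3, 6]
print(occurrence_vector(trie, "ab"))   # [0, 3, 6]
print(occurrence_vector(trie, "b"))    # [1, 4, 5, 7, 8]
```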

PERIODICITY DETECTION

Input: a time series of size n
Output: positions of periodic patterns

Process:
    for each occurrence vector of size k
        find p (the difference between consecutive positions)
        for 0 to k
            check each position after p characters
        count the confidence
        add to the list if it is at least the threshold
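The process can be sketched in Python for a single occurrence vector. This is a simplified reading of the steps, not the exact algorithm from the referenced papers: each pair of consecutive occurrences proposes a candidate period p, confidence is taken as matched positions divided by expected positions between stPos and the last occurrence, and a candidate already covered by an earlier one with the same p is skipped. All names are hypothetical.

```python
def periodic_candidates(occ, threshold):
    """Return (p, stPos, endPos, conf) tuples for one sorted occurrence vector."""
    if len(occ) < 2:
        return []
    results, seen = [], set()
    occ_set = set(occ)
    end_pos = occ[-1]
    for j in range(len(occ) - 1):
        st_pos = occ[j]
        p = occ[j + 1] - occ[j]          # candidate period from the difference vector
        if (p, st_pos % p) in seen:      # same period and phase as an earlier candidate
            continue
        seen.add((p, st_pos % p))
        expected = range(st_pos, end_pos + 1, p)
        count = sum(1 for pos in expected if pos in occ_set)
        conf = count / len(expected)
        if conf >= threshold:
            results.append((p, st_pos, end_pos, conf))
    return results


# Occurrence vector of ab in abcdabcabcab$ is (0, 4, 7, 10).
print(periodic_candidates([0, 4, 7, 10], threshold=0.7))  # [(3, 4, 10, 1.0)]
# Occurrence vector of ab in abcabbabb$ is (0, 3, 6).
print(periodic_candidates([0, 3, 6], threshold=0.7))      # [(3, 0, 6, 1.0)]
```

With a lower threshold such as 0.6, the first call would also report the weaker candidate with p = 4 starting at position 0, whose confidence is 2/3.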

STEPS

T = abcabbabb$

ab: (0,3,6)
abb: (3,6)
bb: (4,7)
b: (1,5,8,4,7)

For ab: stPos = 0, endPos = 6, p = 3 - 0 = 3

Now check the occurrence vector of ab: each time a difference equals p, increment the count.

Check the confidence, and add the pattern to the pattern list if confidence >= Θ.

STEPS

T = abcdabcabcab$

ab: (0,4,7,10)

stPos = 0, endPos = 10, p = 4 - 0 = 4

Now check the occurrence vector of ab: each time a difference equals p, increment the count.

Only one repetition of ab (from position 0 to position 4) is found between positions 0 and 10 with p = 4.

STEPS

T = abcdabcabcab$

ab: (0,4,7,10)

stPos = 4, endPos = 10, p = 7 - 4 = 3

Now check the occurrence vector of ab: each time a difference equals p, increment the count.

Three occurrences of ab (at positions 4, 7, and 10) are found between positions 4 and 10 with p = 3.


ALGORITHM

DISCUSSION

- Elfeky proposed two separate algorithms, CONV and WARP, to detect symbol and segment periodicity. But they do not detect periodicity within a sub-sequence, and their complexities are O(n log n) and O(n^2).
- The algorithm in Han's paper works on sub-sequences, but it needs user input.

DISCUSSION

- From this perspective, the algorithm discussed here is better than the previous ones
- Complexity: O(n log n)
- Works online

References

- Rasheed, Al-Shalalfa, and Alhajj (2011). Periodic pattern mining using suffix tree.
- Nishi, Farhan, Samiullah, and Jeong. Effective periodic pattern mining in time series database.
- J. Han and M. Kamber. Data Mining: Concepts and Techniques.
- Abraham Silberschatz, Korth, and Sudarshan. Database System Concepts.


Questions


Thank You
