122
Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K¨ arkk¨ ainen Dominik Kempa Simon J. Puglisi University of Helsinki, Finland CPM 2013

Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

  • Upload
    others

  • View
    25

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

Linear Time Lempel-Ziv Factorization:Simple, Fast, Small

Juha Karkkainen Dominik Kempa Simon J. Puglisi

University of Helsinki, Finland

CPM 2013

Page 2: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Outline

1 Introduction

2 Existing solutions

3 2n log n algorithm

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 3: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2 `10 = 3p10 = 1p10 = 4p10 = 6

X = b a b b a b a b b b a b

X = b a b b a b a b b b a bX = b a b b a b a b b b a b

1 2 3 4 5 6 7

77

8 9 10

10

11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 4: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2 `10 = 3p10 = 1p10 = 4p10 = 6

X = b a b b a b a b b b a b

X = b a b b a b a b b b a b

X = b a b b a b a b b b a b

1 2 3 4 5 6

7

7

7

8 9 10

10

11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 5: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2 `10 = 3p10 = 1p10 = 4p10 = 6

X = b a b b a b a b b b a b

X = b a b b a b a b b b a b

X = b a b b a b a b b b a b

1 2 3 4 5 6

7

7

7

8 9 10

10

11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 6: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2

`10 = 3p10 = 1p10 = 4p10 = 6

X = b a b b a b a b b b a b

X = b a b b a b a b b b a b

X = b a b b a b a b b b a b

1 2 3 4 5 6

7

7

7

8 9 10

10

11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 7: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2 `10 = 3p10 = 1p10 = 4p10 = 6

X = b a b b a b a b b b a bX = b a b b a b a b b b a b

X = b a b b a b a b b b a b1 2 3 4 5 6

77

7 8 9

10

10 11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 8: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2

`10 = 3p10 = 1

p10 = 4p10 = 6

X = b a b b a b a b b b a bX = b a b b a b a b b b a b

X = b a b b a b a b b b a b1 2 3 4 5 6

77

7 8 9

10

10 11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 9: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2

`10 = 3

p10 = 1

p10 = 4

p10 = 6

X = b a b b a b a b b b a bX = b a b b a b a b b b a b

X = b a b b a b a b b b a b1 2 3 4 5 6

77

7 8 9

10

10 11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 10: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2

`10 = 3

p10 = 1p10 = 4

p10 = 6

X = b a b b a b a b b b a bX = b a b b a b a b b b a b

X = b a b b a b a b b b a b1 2 3 4 5 6

77

7 8 9

10

10 11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 11: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Example

`7 = 3p7 = 2

`10 = 3

p10 = 1p10 = 4

p10 = 6

X = b a b b a b a b b b a bX = b a b b a b a b b b a b

X = b a b b a b a b b b a b1 2 3 4 5 6

77

7 8 9

10

10 11 12

Definition

Pairs (pi, `i) define the LPF array: LPF[i] = (pi, `i).

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 12: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LPF array

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

0

0

1

3

2

4

3

2

4

3

2

1

⊥⊥1

1

2

1

2

3

3

4

5

6

1 2 3 4 5 6 7 8 9 10 11 12

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 13: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:

LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 14: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:

LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 15: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:

LZ77: (b,0)

LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 16: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:

LZ77: (b,0)

LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 17: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)

LZ77: (b,0),(a,0)

LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 18: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

11

31

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)

LZ77: (b,0),(a,0)

LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 19: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

11

31

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)

LZ77: (b,0),(a,0),(1,1)

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 20: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

11

31

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)

LZ77: (b,0),(a,0),(1,1)

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 21: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

11

31

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 22: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 23: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 24: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 25: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 26: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Lempel-Ziv Factorization

LPFpi `ii

1131

32

34

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

LZ77:LZ77: (b,0)LZ77: (b,0),(a,0)LZ77: (b,0),(a,0),(1,1)LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3),(4,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 27: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Outline

1 Introduction

2 Existing solutions

3 2n log n algorithm

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 28: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Existing solutions

Space excludes the input and output (both of size n log σ).

Algorithm Extra space

Abouelhoda et al., 2004 4n log nChen et al. (CPS1), 2007 3n log nCrochemore and Ilie, 2008 (3n+

√n) log n

Ohlebusch and Gog, 2011 3n log nGoto and Bannai (BGL), 2013 3n log n

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 29: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra space

minimizes the number of cache missesfastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra space

most space efficient linear algorithm for LZ77based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 30: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra space

minimizes the number of cache missesfastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra space

most space efficient linear algorithm for LZ77based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 31: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra spaceminimizes the number of cache misses

fastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra space

most space efficient linear algorithm for LZ77based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 32: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra spaceminimizes the number of cache missesfastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra space

most space efficient linear algorithm for LZ77based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 33: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra spaceminimizes the number of cache missesfastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra space

most space efficient linear algorithm for LZ77based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 34: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra spaceminimizes the number of cache missesfastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra space

most space efficient linear algorithm for LZ77based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 35: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra spaceminimizes the number of cache missesfastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra spacemost space efficient linear algorithm for LZ77

based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 36: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Our contribution

Two new linear time algorithms:

1 K3: 3n log n bits of extra spaceminimizes the number of cache missesfastest, when the input is not highly repetitive

2 K2: 2n log n bits of extra spacemost space efficient linear algorithm for LZ77based on combinatorics of suffix arrays

Algorithm K3

1: SA[0]← SA[n+ 1]← top← 02: for i← 1 to n+ 1 do3: while SA[top] > SA[i] do4: NSV[SA[top]]← SA[i]5: PSV[SA[top]]← SA[top− 1]6: top← top− 17: top← top+ 18: SA[top]← SA[i]9: i← 110:while i ≤ n do11: i← LZ-Factor(i,NSV[i],PSV[i])

Procedure LZ-Factor(i, nsv, psv)1: `nsv ← lcp(i, nsv)2: `psv ← lcp(i, psv)3: if `nsv > `psv then4: (p, `)← (nsv, `nsv)5: else6: (p, `)← (psv, `psv)7: if ` = 0 then p← X[i]8: output factor (p, `)9: return i+max(`, 1)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 37: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Outline

1 Introduction

2 Existing solutions

3 2n log n algorithm

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 38: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 39: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 40: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 41: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12

$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 42: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

=

= = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12$

$ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 43: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

=

=

= 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12$ $

$

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 44: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

==

=

6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 45: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== =

6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 46: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing

in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12$ $ $

LZ77:LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 47: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12$ $ $

LZ77:LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 48: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Observation

The pi component of LPFarray is enough to computeLZ77 parsing in linear time.

Goal

Space efficient computationof all pi values.

LPFpi `ii b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

001324324321

3

⊥⊥1121233456

1 2 3 4 5 6 7 8 9 10 11 12$ $ $

LZ77:LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 49: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 50: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 51: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 52: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 53: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 54: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 55: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 56: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 57: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient computationof all pi values.

Goal

Space efficient simulationof all pi values.

1 Consider text position i,e.g., let i = 4

2 Locate in SA closestsmaller elements

Lemma [Crochemore, Ilie]

Either PSV[i] or NSV[i] is avalid choice for pi

Not quite what we wanted...

...but sufficient for LZ77.

SA

a a ba ba b a b b a a ba b b a a ba b b a b a b b a a bbb a a bb a b a b b a a bb a b b a a bb a b b a b a b b a a bb b a a bb b a b a b b a a b

SA

1011572

12946183

4

2

1

PSV[4] =

NSV[4] =

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 58: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `iP

SV

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥122347

107

⊥1⊥12345⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 59: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥122347

107

⊥1⊥12345⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 60: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 61: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥122347

107

⊥1⊥12345⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 62: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 63: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 64: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

=

= = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $

$

$ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 65: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

=

=

= 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $

$ $

$

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 66: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

==

=

6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $

$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 67: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== =

6=

= = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12

$ $

$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 68: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6=

=

= 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12$

$

$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 69: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6==

=

6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 70: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== =

6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 71: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)

LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 72: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 73: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 74: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

2n log n algorithm

Goal

Space efficient simulationof all pi values.

Fact

Given PSV and NSVarrays, LZ77 parsing canbe computed in lineartime.

LPFpi `i

PS

V

NS

V

i b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

b a b b a b a b b b a b

b a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a bb a b b a b a b b b a b

== = 6== = 6=

1

2

3

4

5

6

7

8

9

10

11

12

⊥⊥1121233456

⊥⊥12⊥12

2

347

107

⊥1⊥1234

5

⊥3454

1 2 3 4 5 6 7 8 9 10 11 12$ $$ $ $

LZ77:

LZ77: (b,0),(a,0),(1,1),(1,3)LZ77: (b,0),(a,0),(1,1),(1,3),(2,3)

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 75: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 7

7

9

9 6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 76: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 7

7

9

9 6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 77: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 7

7

9

9 6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 78: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 7

7

9

9 6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 79: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 7

7

9

9

6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 80: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 7

7

99 6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 81: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 77 9

9

6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 82: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 55 8 7

7

9

9

6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 83: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Computing NSV/PSV arrays

PSV/NSV can be computed from SA in linear time

no extra space required

computation of PSV:

1 6 3 6 8 5

5

8 7

7

9

9

6

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 84: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LZ77 with NSV/PSV arrays

Algorithm:1 Compute SA

2 Compute PSV/NSV (in place)3 Compute the factorization

O(n) time

, 3n log n bits of extra space

Observation

We only need to access each PSV/NSV value once, in aleft-to-right scan.

Lemma

A scan of NSV and PSV can be simulated with only one of them.It takes linear time and requires no extra space.

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 85: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LZ77 with NSV/PSV arrays

Algorithm:1 Compute SA2 Compute PSV/NSV (in place)

3 Compute the factorization

O(n) time

, 3n log n bits of extra space

Observation

We only need to access each PSV/NSV value once, in aleft-to-right scan.

Lemma

A scan of NSV and PSV can be simulated with only one of them.It takes linear time and requires no extra space.

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 86: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LZ77 with NSV/PSV arrays

Algorithm:1 Compute SA2 Compute PSV/NSV (in place)3 Compute the factorization

O(n) time

, 3n log n bits of extra space

Observation

We only need to access each PSV/NSV value once, in aleft-to-right scan.

Lemma

A scan of NSV and PSV can be simulated with only one of them.It takes linear time and requires no extra space.

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 87: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LZ77 with NSV/PSV arrays

Algorithm:1 Compute SA2 Compute PSV/NSV (in place)3 Compute the factorization

O(n) time

, 3n log n bits of extra space

Observation

We only need to access each PSV/NSV value once, in aleft-to-right scan.

Lemma

A scan of NSV and PSV can be simulated with only one of them.It takes linear time and requires no extra space.

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 88: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LZ77 with NSV/PSV arrays

Algorithm:1 Compute SA2 Compute PSV/NSV (in place)3 Compute the factorization

O(n) time, 3n log n bits of extra space

Observation

We only need to access each PSV/NSV value once, in aleft-to-right scan.

Lemma

A scan of NSV and PSV can be simulated with only one of them.It takes linear time and requires no extra space.

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 89: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LZ77 with NSV/PSV arrays

Algorithm:1 Compute SA2 Compute PSV/NSV (in place)3 Compute the factorization

O(n) time, 3n log n bits of extra space

Observation

We only need to access each PSV/NSV value once, in aleft-to-right scan.

Lemma

A scan of NSV and PSV can be simulated with only one of them.It takes linear time and requires no extra space.

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 90: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

LZ77 with NSV/PSV arrays

Algorithm:1 Compute SA2 Compute PSV/NSV (in place)3 Compute the factorization

O(n) time, 3n log n bits of extra space

Observation

We only need to access each PSV/NSV value once, in aleft-to-right scan.

Lemma

A scan of NSV and PSV can be simulated with only one of them.It takes linear time and requires no extra space.

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 91: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 92: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 01

2 3 4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 93: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 01

2 3 4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 94: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1

2

3 4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 95: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1

2

3 4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 96: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2

3

4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 97: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2

3

4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 98: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3

4

5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 99: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3

4

5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 100: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4

5

681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 101: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4

5

681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 102: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 5

6

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 103: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 5

6

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 104: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 5681 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 105: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

8

1 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 106: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 107: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 108: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 109: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] =

3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 110: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 111: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] =

4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 112: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] = 4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 113: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] = 4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 56

81 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 114: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] = 4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 568

1 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 115: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] = 4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 568

1 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 116: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] = 4

Updating the links:

O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 568

1 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 117: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Simulating the scan of NSV/PSV

Goal: at step i know the value of PSV[i] and NSV[i].

PSV[8] = 3

NSV[8] = 4

Updating the links: O(1) time

0 13 1 9 2 3 11 8 10 4 12 6 15 7 14 5 16 0

1 2 3 4 568

1 2 3 4 6 7 5

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 118: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

The end

Thank you!

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 119: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Experiments

Dataset from Pizza & Chilli corpus

Measured the time to compute LZ factorization

◦ we exclude the time to compute SA

Alg. Mem pro eng dna src cor cer ker ein tm29

K3 13n 74.5 75.7 81.7 50.5 43.6 63.2 45.7 56.9 38.2K2 9n 84.1 80.6 92.7 54.8 40.2 53.2 41.5 43.5 35.1

ISA6r 6n - - - - 43.3 51.8 39.2 31.1 34.2ISA6s 6n 198.0 171.0 175.2 115.0 49.4 56.3 45.7 37.1 39.6ISA9 9n 92.7 83.9 86.1 59.3 41.9 53.0 42.8 45.2 36.4

iBGS 17n 99.8 93.2 97.5 69.3 51.5 65.5 52.9 60.0 44.1iBGL 17n 123.2 108.6 113.4 77.8 52.2 66.1 53.0 58.6 44.2iBGT 13n 171.4 153.9 188.0 99.8 55.4 84.1 56.2 52.8 44.4

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 120: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Experiments

Dataset from Pizza & Chilli corpus

Measured the time to compute LZ factorization

◦ we exclude the time to compute SA

Alg. Mem pro eng dna src cor cer ker ein tm29

K3 13n 74.5 75.7 81.7 50.5 43.6 63.2 45.7 56.9 38.2K2 9n 84.1 80.6 92.7 54.8 40.2 53.2 41.5 43.5 35.1

ISA6r 6n - - - - 43.3 51.8 39.2 31.1 34.2ISA6s 6n 198.0 171.0 175.2 115.0 49.4 56.3 45.7 37.1 39.6ISA9 9n 92.7 83.9 86.1 59.3 41.9 53.0 42.8 45.2 36.4

iBGS 17n 99.8 93.2 97.5 69.3 51.5 65.5 52.9 60.0 44.1iBGL 17n 123.2 108.6 113.4 77.8 52.2 66.1 53.0 58.6 44.2iBGT 13n 171.4 153.9 188.0 99.8 55.4 84.1 56.2 52.8 44.4

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 121: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Experiments

Dataset from Pizza & Chilli corpus

Measured the time to compute LZ factorization

◦ we exclude the time to compute SA

Alg. Mem pro eng dna src cor cer ker ein tm29

K3 13n 74.5 75.7 81.7 50.5 43.6 63.2 45.7 56.9 38.2K2 9n 84.1 80.6 92.7 54.8 40.2 53.2 41.5 43.5 35.1

ISA6r 6n - - - - 43.3 51.8 39.2 31.1 34.2ISA6s 6n 198.0 171.0 175.2 115.0 49.4 56.3 45.7 37.1 39.6ISA9 9n 92.7 83.9 86.1 59.3 41.9 53.0 42.8 45.2 36.4

iBGS 17n 99.8 93.2 97.5 69.3 51.5 65.5 52.9 60.0 44.1iBGL 17n 123.2 108.6 113.4 77.8 52.2 66.1 53.0 58.6 44.2iBGT 13n 171.4 153.9 188.0 99.8 55.4 84.1 56.2 52.8 44.4

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Page 122: Linear Time Lempel-Ziv Factorization: Simple, Fast, Smallstelo/cpm/cpm13/16_kempa.pdf · Linear Time Lempel-Ziv Factorization: Simple, Fast, Small Juha K arkk ainen Dominik Kempa

IntroductionExisting solutions

2n log n algorithm

Experiments

Dataset from Pizza & Chilli corpus

Measured the time to compute LZ factorization

◦ we exclude the time to compute SA

Alg. Mem pro eng dna src cor cer ker ein tm29

K3 13n 74.5 75.7 81.7 50.5 43.6 63.2 45.7 56.9 38.2K2 9n 84.1 80.6 92.7 54.8 40.2 53.2 41.5 43.5 35.1

ISA6r 6n - - - - 43.3 51.8 39.2 31.1 34.2ISA6s 6n 198.0 171.0 175.2 115.0 49.4 56.3 45.7 37.1 39.6ISA9 9n 92.7 83.9 86.1 59.3 41.9 53.0 42.8 45.2 36.4

iBGS 17n 99.8 93.2 97.5 69.3 51.5 65.5 52.9 60.0 44.1iBGL 17n 123.2 108.6 113.4 77.8 52.2 66.1 53.0 58.6 44.2iBGT 13n 171.4 153.9 188.0 99.8 55.4 84.1 56.2 52.8 44.4

Juha Karkkainen, Dominik Kempa, Simon J. Puglisi Linear Time Lempel-Ziv Factorization: Simple, Fast, Small