An Efficient Regular Expressions Compression Algorithm From A New Perspective Author: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo Publisher: INFOCOM, 2011 Presenter: Yuen-Shuo Li Date: 2013/02/27


An Efficient Regular Expressions Compression Algorithm From A New Perspective

Author: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo

Publisher: INFOCOM, 2011

Presenter: Yuen-Shuo Li

Date: 2013/02/27


Background

Deep packet inspection (DPI) is widely recognized as a powerful and important technology for network security and application-specific services, e.g. firewalls, traffic monitoring, and packet classification.

Regular expressions are currently replacing exact strings for describing patterns in most popular software tools, because of their expressive power, simplicity, and flexibility in expressing signatures.


About DFA

Deterministic Finite Automata (DFA) and Nondeterministic Finite Automata (NFA) are two classical, equivalent representations of regular expressions.

DFA is the preferred representation for deep packet inspection in high-speed network environments, for two reasons:

It triggers only one state transition (one corresponding memory access) for each input symbol processed.

Multiple regular expressions can be compiled into a composite DFA that inspects the input in a single pass.
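The single-lookup property can be illustrated with a small table-driven matcher. This is only a sketch and is not taken from the paper: the composite DFA below, built by hand for the hypothetical patterns "ab" and "ba", and the names SYMBOLS, TRANSITIONS, ACCEPTING, and scan are all assumptions made for illustration.

```python
# Hypothetical composite DFA for the two patterns "ab" and "ba" over the
# alphabet {a, b}; rows are states, columns are input symbols.
SYMBOLS = {"a": 0, "b": 1}
TRANSITIONS = [
    [1, 2],  # state 0: start
    [1, 3],  # state 1: last symbol was "a"
    [4, 2],  # state 2: last symbol was "b"
    [4, 2],  # state 3: just matched "ab"
    [1, 3],  # state 4: just matched "ba"
]
ACCEPTING = {3: "ab", 4: "ba"}


def scan(text):
    """Return (end_index, pattern) pairs found in a single pass over text."""
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Exactly one table lookup (one memory access) per input symbol.
        state = TRANSITIONS[state][SYMBOLS[ch]]
        if state in ACCEPTING:
            matches.append((i, ACCEPTING[state]))
    return matches
```

Both patterns are reported in one pass over the input, but the table needs one entry per state and symbol, which is the memory cost the rest of the talk addresses.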


About DFA

Unfortunately, DFAs demand a large memory space to store the state transition tables of current regular expression sets.



Our goal

In this paper, we focus on reducing memory usage of composite DFAs by compressing transitions.


CSCA

We introduce a new method, named Cluster-based Splitting Compression Algorithm (CSCA).


We get a unique, deterministic trie-tree by traversing the DFA level by level, provided that we visit the son states of each state in increasing order of their label characters.



Cluster

In the trie-tree, if state r has a transition to state s, we call r the father state of s; conversely, s is a son state of r. A set of states is called a cluster if it consists of all the son states of some state.
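The level-order traversal and the resulting clusters can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that a state becomes the son of the first state that reaches it during the traversal, and the function name build_clusters and the dictionary-based DFA layout are hypothetical.

```python
from collections import deque


def build_clusters(transitions, start=0):
    """Traverse the DFA level by level and group son states by father.

    transitions[state] maps a label character to the next state.
    Assumption: a state becomes the son of the state that first reaches it.
    """
    father = {start: None}
    clusters = {}
    queue = deque([start])
    while queue:
        r = queue.popleft()
        # Visit outgoing labels from small to large so the tree is unique.
        for label in sorted(transitions[r]):
            s = transitions[r][label]
            if s not in father:
                father[s] = r
                clusters.setdefault(r, []).append(s)
                queue.append(s)
    return father, clusters


# Hypothetical five-state DFA over the alphabet {a, b}.
example_dfa = {
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 3},
    2: {"a": 4, "b": 2},
    3: {"a": 4, "b": 2},
    4: {"a": 1, "b": 3},
}
father, clusters = build_clusters(example_dfa)
```

Here clusters[r] lists all son states of r, so each value in the dictionary is one cluster.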


We divide all the transitions and store them in three different matrices T1, T2, and T3.

sparse matrix


Combinative Row

In matrix M (M is T1 or T2), for row s, if there is a row r such that, for every character c, M[r, c] = X, or M[s, c] = X, or M[r, c] = M[s, c], we say that row r is a combinative row of row s.

State  A  B  ^
0      X  3  4
1      2  X  4


If rows r and s are combinative rows, we merge them according to the following rules:

For character c, if M[s, c] = X, set M[s, c] = M[r, c].

If M[s, c] = M[r, c] or M[r, c] = X, keep M[s, c] unchanged.

State  A  B  ^
0      X  3  4
1      2  X  4

State  A  B  ^
0      2  3  4
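The combinative test and the merge rules can be sketched in a few lines. This is a minimal illustration, assuming that X marks an entry with no stored value and modeling it as None; the function names is_combinative and merge_rows are hypothetical.

```python
X = None  # assumption: X (an entry with no stored value) is modeled as None


def is_combinative(row_r, row_s):
    """True if, in every column, one entry is X or both entries are equal."""
    return all(a is X or b is X or a == b for a, b in zip(row_r, row_s))


def merge_rows(row_r, row_s):
    """Merge row r into row s: fill the X entries of s from r, keep the rest."""
    if not is_combinative(row_r, row_s):
        raise ValueError("rows are not combinative")
    return [r_val if s_val is X else s_val for r_val, s_val in zip(row_r, row_s)]


# Rows 0 and 1 from the slide example (columns A, B, ^).
merged = merge_rows([2, X, 4], [X, 3, 4])
```

With the slide's rows, merged holds the single row 2 3 4.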


find base

State  A  B  ^
0      X  2  3
1      3  X  4

State  A  B  ^  Base
0      X  0  1  2
1      0  X  1  3

After the base is extracted, rows 0 and 1 are combinative rows.
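The base extraction in this example can be sketched as below. The slides do not state how the base is chosen; this sketch assumes it is the smallest filled entry of each row, which reproduces the slide's numbers. The name extract_base and the use of None for X are assumptions.

```python
X = None  # assumption: X (an entry with no stored value) is modeled as None


def extract_base(matrix):
    """Split each row into a base value and a row of offsets from that base.

    Assumption: the base of a row is its smallest filled entry.
    """
    offsets, bases = [], []
    for row in matrix:
        filled = [v for v in row if v is not X]
        base = min(filled) if filled else 0
        bases.append(base)
        offsets.append([v if v is X else v - base for v in row])
    return offsets, bases


# The matrix from the slide example (columns A, B, ^).
offsets, bases = extract_base([[X, 2, 3], [3, X, 4]])
```

The two original rows conflict in column ^ (3 versus 4), but their offset rows agree wherever both are filled, which is why the conversion produces combinative rows.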


Add a new index array equal and set equal[r] = s, meaning that row r is now represented by row s, so the values of row r can be read from row s.

Delete row r from matrix M.

State  A  B  ^  Base
0      X  0  1  2
1      0  X  1  3

State  A  B  ^  Base  equal
0      X  0  1  2     0
1               3     0
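How the equal and base arrays work together at lookup time can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: rows are merged greedily in state order, X is modeled as None, and the names merge_combinative_rows and next_state are hypothetical.

```python
X = None  # assumption: X (an entry with no stored value) is modeled as None


def merge_combinative_rows(offsets):
    """Greedily merge combinative offset rows and record them in equal."""
    rows = {}    # representative state -> merged offset row that is kept
    equal = []   # equal[r] = state whose stored row now serves state r
    for r, row in enumerate(offsets):
        for s, kept in rows.items():
            if all(a is X or b is X or a == b for a, b in zip(row, kept)):
                # Fill the X entries of row s from row r, then drop row r.
                rows[s] = [a if b is X else b for a, b in zip(row, kept)]
                equal.append(s)
                break
        else:
            rows[r] = list(row)
            equal.append(r)
    return rows, equal


def next_state(rows, equal, base, state, column):
    """Recover a transition: shared offset row plus the state's own base."""
    offset = rows[equal[state]][column]
    return None if offset is X else offset + base[state]


# Offset matrix and base array from the slide example (columns A, B, ^).
rows, equal = merge_combinative_rows([[X, 0, 1], [0, X, 1]])
base = [2, 3]
```

Only one offset row is stored for both states, and each state's base restores its original targets. A position that was X in the original row may return a value borrowed from the merged row, so the sketch assumes the caller only queries positions that were filled for that state.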


The main idea of compressing matrices T1 and T2 is to convert each matrix into an offset matrix, which produces many combinative rows, and then to merge those rows in order to reduce memory usage.

T3 is a sparse matrix; in this paper we do not discuss how to compress it.


The lookup function first needs to decide which matrix holds the next state. We add two bitmaps to distinguish the three parts, so that this information can be obtained quickly.

[Figure: the three parts T1, T3, and T2]


One advantage of our work is that it is orthogonal to many previous compression schemes.

This is because our work exploits the transition characteristics inside states and reduces memory usage by extracting the base value of each cluster, while most previous schemes are based on the transition characteristics among states.


EXPERIMENT RESULTS

Pattern sets


• n: the number of rows of matrix T
• n1 (n2): the number of rows of matrix R1 (R2)
• R1 (R2): offset matrices
• base1 (base2): an int-type array
• equal1 (equal2): an array
• r: the ratio of effective elements in T3

$\mathrm{SCR} = \frac{\mathrm{CSS}}{\mathrm{UCSS}}$, where CSS is the compressed storage space and UCSS is the uncompressed storage space.


We extract the base values of the DFA matrices of the regular expression groups to obtain the corresponding offset matrices, and then compress both the DFA matrices and the offset matrices with previous compression schemes.

The results are shown in Table V, where each value is the ratio of effective transitions remaining after compressing the DFAs.
