Approximate Tree Kernels, by Konrad Rieck, Tammo Krueger, Ulf Brefeld, and Klaus-Robert Müller. Presented by Niharjyoti Sarangi, Indian Institute of Technology Madras, April 21, 2012.

Approximate Tree Kernels

Page 1: Approximate Tree Kernels

BACKGROUND PARSE TREE KERNELS APPROXIMATE TREE KERNELS RESULTS Conclusion

Approximate Tree Kernels

Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller

Presented by Niharjyoti Sarangi

Indian Institute of Technology Madras

April 21, 2012

Page 2: Approximate Tree Kernels

OUTLINE OF THE PRESENTATION

1 BACKGROUND

Learning from tree-structured data
Application Domains

2 PARSE TREE KERNELS

Computing PTK
Computational constraints

3 APPROXIMATE TREE KERNELS

Computing ATK
Validity of ATK
Types of learning

4 RESULTS

Performance
Time
Memory

5 Conclusion

Page 3: Approximate Tree Kernels

TREE-STRUCTURED DATA

Trees: carry hierarchical information
Flat feature vectors: fail to capture the underlying dependency structure

Parse Tree
An ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar.

A tree X is called a parse tree of a grammar G = (S, P, s), with symbols S, production rules P, and start symbol s ∈ S, if X is derived by assembling productions p ∈ P such that every node x ∈ X is labeled with a symbol l(x) ∈ S.

Page 4: Approximate Tree Kernels

EXAMPLES

Figure: Parse trees for natural language text and the HTTP network protocol.

Page 5: Approximate Tree Kernels

LEARNING FROM TREES

Kernel functions for structured data
Convolution of local kernels
Parse tree kernel proposed by Collins and Duffy (2002)

Kernel Functions
k : χ × χ → ℝ is a symmetric and positive semi-definite function, which implicitly computes an inner product in a reproducing kernel Hilbert space.

Page 6: Approximate Tree Kernels

APPLICATION DOMAINS

Natural Language Processing
Web Spam Detection
Network Intrusion Detection
Information Retrieval from structured documents
...

Page 7: Approximate Tree Kernels

COMPUTING PTK

A generic technique for defining kernel functions over structured data is the convolution of local kernels defined over sub-structures.

Parse Tree Kernel

\[
k(X, Z) = \sum_{x \in X} \sum_{z \in Z} c(x, z)
\]

where X and Z are two parse trees.

Notations
x_i: i-th child of a node x
|X|: number of nodes in X
χ: set of all possible trees

Page 8: Approximate Tree Kernels

ILLUSTRATION

Figure: Shared subtrees in two parse trees.

Page 9: Approximate Tree Kernels

COUNTING FUNCTION

c(x, z) is known as the counting function, which recursively determines the number of shared subtrees rooted in the tree nodes x and z.

Defining c(x, z)

\[
c(x, z) =
\begin{cases}
0 & \text{if } x, z \text{ are not derived from the same production in } P \\
\lambda & \text{if } x, z \text{ are leaf nodes} \\
\lambda \prod_{i=1}^{|x|} c(x_i, z_i) & \text{otherwise}
\end{cases}
\]

The parameter 0 ≤ λ ≤ 1 balances the contribution of subtrees, such that small values of λ decay the contribution of lower nodes in large subtrees.
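As a minimal sketch of the recursion above, the Python below implements c(x, z) and the double sum k(X, Z) in the form written on these slides. The `Node` class, the function names, and the convention that a production is a node's symbol together with the ordered symbols of its children are assumptions made for illustration; |x| is read as the number of children of x. Collins and Duffy's original recursion uses the factor (1 + c(x_i, z_i)) inside the product; the sketch keeps the form shown on this slide.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node: a grammar symbol plus ordered children."""
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        # Assumption: a production is identified by the node symbol and
        # the ordered symbols of its children.
        return self.symbol, tuple(child.symbol for child in self.children)


def nodes(tree: Node) -> Iterator[Node]:
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def count(x: Node, z: Node, lam: float) -> float:
    """Counting function c(x, z) in the form given on the slide."""
    if x.production() != z.production():
        return 0.0                      # not derived from the same production
    if not x.children:
        return lam                      # both nodes are leaves
    result = lam
    for xi, zi in zip(x.children, z.children):
        result *= count(xi, zi, lam)    # product over aligned children
    return result


def parse_tree_kernel(X: Node, Z: Node, lam: float = 0.5) -> float:
    """k(X, Z): sum of c(x, z) over all node pairs."""
    return sum(count(x, z, lam) for x in nodes(X) for z in nodes(Z))


# Two copies of the toy tree S -> NP VP
X = Node("S", [Node("NP"), Node("VP")])
Z = Node("S", [Node("NP"), Node("VP")])
print(parse_tree_kernel(X, Z, lam=0.5))
```

The double loop in `parse_tree_kernel` visits all |X| · |Z| node pairs, which is the source of the quadratic cost discussed next.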

Page 10: Approximate Tree Kernels

COMPUTATIONAL COMPLEXITY

The complexity is O(n²), where n is the number of nodes in each parse tree.

Experimental data

The computation of a parse tree kernel for two HTML documents comprising 10,000 nodes each requires about 1 gigabyte of memory and takes over 100 seconds on a recent computer system.

Learning requires comparing a large number of parse trees. At this cost per comparison, PTKs are impractical for large trees because of the computing resources required.
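To make the quadratic cost concrete, the figures quoted above can be broken down per node pair. This is only a back-of-the-envelope reading of the reported numbers, taking 1 gigabyte as 10^9 bytes and 100 seconds as a lower bound on the run-time:

```latex
n = 10^{4} \;\Rightarrow\; n^{2} = 10^{4} \cdot 10^{4} = 10^{8} \text{ node pairs}, \qquad
\frac{10^{9} \text{ bytes}}{10^{8} \text{ pairs}} = 10 \text{ bytes per pair}, \qquad
\frac{100 \text{ s}}{10^{8} \text{ pairs}} = 10^{-6} \text{ s per pair}
```

Each node pair is cheap on its own; the cost comes from the number of pairs.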

Page 11: Approximate Tree Kernels

ATTEMPTED IMPROVEMENTS

A feature selection procedure based on statistical tests (Suzuki et al.)
Limiting computation to node pairs with matching grammar symbols (Moschitti)

Page 12: Approximate Tree Kernels

COMPUTING ATK

Approximation of tree kernels is based on the observation that trees often contain redundant parts that are not only irrelevant for the learning task but also slow down the kernel computation unnecessarily.

Approximate Tree Kernel

\[
\hat{k}(X, Z) = \sum_{s \in S} w(s) \sum_{\substack{x \in X \\ l(x) = s}} \sum_{\substack{z \in Z \\ l(z) = s}} \tilde{c}(x, z)
\]

where X and Z are two parse trees and \tilde{c} is the approximate counting function.

Selection function: w : S → {0, 1}
Controls whether subtrees rooted in nodes with the symbol s ∈ S contribute to the convolution (w(s) = 0 or w(s) = 1).

Page 13: Approximate Tree Kernels

APPROXIMATE COUNTING FUNCTION

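\tilde{c}(x, z) is the approximate counting function.

Defining \tilde{c}(x, z)

\[
\tilde{c}(x, z) =
\begin{cases}
0 & \text{if } x, z \text{ are not derived from the same production in } P \\
0 & \text{if } x \text{ or } z \text{ is not selected} \\
\lambda & \text{if } x, z \text{ are leaf nodes} \\
\lambda \prod_{i=1}^{|x|} \tilde{c}(x_i, z_i) & \text{otherwise}
\end{cases}
\]

The selection function w(s) is decided based on the domain and data. The exact parse tree kernel is obtained as a special case of the ATK if w(s) = 1 for all symbols s ∈ S.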
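The definition above can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' implementation: the `Node` class, the function names, and the representation of w as a dictionary from symbols to 0 or 1 are all assumptions, and a node counts as "selected" when w maps its symbol to 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical parse-tree node: a grammar symbol plus ordered children."""
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def production(self):
        # Assumption: symbol plus ordered child symbols identify a production.
        return self.symbol, tuple(child.symbol for child in self.children)


def nodes(tree: Node) -> Iterator[Node]:
    yield tree
    for child in tree.children:
        yield from nodes(child)


def approx_count(x: Node, z: Node, w: Dict[str, int], lam: float) -> float:
    """Approximate counting function: zero for unselected symbols."""
    if x.production() != z.production():
        return 0.0
    if w.get(x.symbol, 0) == 0:         # same production implies same symbol
        return 0.0
    if not x.children:
        return lam
    result = lam
    for xi, zi in zip(x.children, z.children):
        result *= approx_count(xi, zi, w, lam)
    return result


def approx_tree_kernel(X: Node, Z: Node, w: Dict[str, int], lam: float = 0.5) -> float:
    """Sum the approximate counts over node pairs with the same selected symbol."""
    by_symbol: Dict[str, List[Node]] = {}
    for z in nodes(Z):
        by_symbol.setdefault(z.symbol, []).append(z)
    total = 0.0
    for x in nodes(X):
        if w.get(x.symbol, 0) == 0:
            continue                    # rejected symbols are never compared
        for z in by_symbol.get(x.symbol, []):
            total += approx_count(x, z, w, lam)
    return total


X = Node("S", [Node("NP"), Node("VP")])
Z = Node("S", [Node("NP"), Node("VP")])
select_all = {"S": 1, "NP": 1, "VP": 1}
drop_vp = {"S": 1, "NP": 1, "VP": 0}
print(approx_tree_kernel(X, Z, select_all), approx_tree_kernel(X, Z, drop_vp))
```

With every symbol selected the sketch reduces to the exact kernel on this toy pair; dropping a symbol removes both its own node pairs and every larger subtree that contains it.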

Page 14: Approximate Tree Kernels

ATK IS A VALID KERNEL

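Proof
Let Φ(X) be the vector of frequencies of all subtrees occurring in X. Then, by definition, the approximate tree kernel K_w can always be written as

\[
K_w(X, Z) = \langle P_w \Phi(X), P_w \Phi(Z) \rangle,
\]

where P_w projects Φ(X) onto the dimensions of subtrees rooted in symbols selected by w. For any w, the projection P_w is independent of the actual X and Z, and hence K_w is a valid kernel.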
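The last step of the proof can be spelled out. The trees X_1, ..., X_n and real coefficients a_1, ..., a_n below are generic symbols introduced only for this check. Because K_w is an inner product of fixed feature vectors, for any such choice:

```latex
\sum_{i,j=1}^{n} a_i a_j K_w(X_i, X_j)
  = \Big\langle \sum_{i=1}^{n} a_i P_w \Phi(X_i),\; \sum_{j=1}^{n} a_j P_w \Phi(X_j) \Big\rangle
  = \Big\| \sum_{i=1}^{n} a_i P_w \Phi(X_i) \Big\|^{2} \;\ge\; 0
```

So every Gram matrix of K_w is positive semi-definite, and symmetry follows from the symmetry of the inner product.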

Page 15: Approximate Tree Kernels

ATK IS FASTER THAN PTK

Speed-up factor q_w

\[
q_w = \frac{\sum_{s \in S} \#_s(X) \, \#_s(Z)}{\sum_{s \in S} w(s) \, \#_s(X) \, \#_s(Z)}
\]

where #_s(X) denotes the number of nodes x ∈ X with label l(x) = s.

Since w(s) ≤ 1, the denominator never exceeds the numerator, so q_w ≥ 1 for every w. Rejecting even a single symbol that occurs in both trees already gives q_w > 1.
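A small sketch of the speed-up factor, assuming the per-symbol node counts of each tree are already available as dictionaries (the function name and the dictionary representation are illustrative, not from the slides):

```python
from typing import Dict


def speedup(counts_x: Dict[str, int], counts_z: Dict[str, int], w: Dict[str, int]) -> float:
    """q_w: node comparisons of the exact kernel divided by those of the ATK."""
    exact = sum(n * counts_z.get(s, 0) for s, n in counts_x.items())
    approx = sum(w.get(s, 0) * n * counts_z.get(s, 0) for s, n in counts_x.items())
    if approx == 0:
        return float("inf")             # nothing left to compare
    return exact / approx


counts_x = {"A": 3, "B": 2}             # hypothetical symbol counts for X
counts_z = {"A": 4, "B": 1}             # hypothetical symbol counts for Z
print(speedup(counts_x, counts_z, {"A": 0, "B": 1}))
```

In this toy case rejecting the frequent symbol A removes 12 of the 14 node comparisons, which shows why frequent but uninformative symbols are the most attractive ones to drop.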

Page 16: Approximate Tree Kernels

SUPERVISED SETTING

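Given n labeled parse trees (X_1, y_1), ..., (X_n, y_n), where the y_i are the class labels, an ideal kernel Gram matrix Y is given as follows:

\[
Y_{ij} = [\![ y_i = y_j ]\!] - [\![ y_i \neq y_j ]\!]
\]

Kernel Target Alignment

\[
\langle Y, K_w \rangle_F = \sum_{y_i = y_j} K_{ij} - \sum_{y_i \neq y_j} K_{ij}
\]

where K_{ij} is the (i, j) entry of the Gram matrix K_w. The goal is to maximize this alignment with respect to w.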
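The alignment above is a signed sum over the entries of the Gram matrix. A minimal sketch, assuming the Gram matrix is given as a nested list and using illustrative function names:

```python
from typing import List, Sequence


def ideal_gram(labels: Sequence[int]) -> List[List[int]]:
    """Y_ij = +1 if the labels agree, -1 otherwise."""
    return [[1 if yi == yj else -1 for yj in labels] for yi in labels]


def alignment(K: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Frobenius inner product <Y, K>_F."""
    Y = ideal_gram(labels)
    n = len(labels)
    return sum(Y[i][j] * K[i][j] for i in range(n) for j in range(n))


labels = [1, 1, -1]
K = [[2.0, 1.0, 0.0],                   # hypothetical Gram matrix for three trees
     [1.0, 2.0, 0.5],
     [0.0, 0.5, 2.0]]
print(alignment(K, labels))
```

Entries between trees of the same class raise the score and entries between trees of different classes lower it, so a high alignment means the kernel separates the classes well.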

Page 17: Approximate Tree Kernels

SUPERVISED SETTING (CONTD.)

Optimization Problem

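\[
w^{*} = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \sum_{\substack{i,j=1 \\ i \neq j}}^{n} Y_{ij} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \sum_{\substack{z \in X_j \\ l(z) = s}} c(x, z)
\]

subject to

\[
\sum_{s \in S} w(s) \le N, \qquad N \in \mathbb{N}
\]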
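Because the objective is linear in w, each symbol contributes a fixed coefficient, and under the budget constraint an optimum is reached by selecting up to N symbols with the largest positive coefficients. The sketch below assumes those per-symbol coefficients have already been computed; the name `gain` and the function name are hypothetical, and the slides do not state which solver is used.

```python
from typing import Dict


def select_top_symbols(gain: Dict[str, float], budget: int) -> Dict[str, int]:
    """Pick at most `budget` symbols with the largest positive coefficients."""
    ranked = sorted(gain, key=gain.get, reverse=True)
    chosen = {s for s in ranked[:budget] if gain[s] > 0}
    return {s: int(s in chosen) for s in gain}


# Hypothetical per-symbol coefficients of the linear objective
gain = {"A": 5.0, "B": -1.0, "C": 2.0, "D": 0.5}
w = select_top_symbols(gain, budget=2)
print(w)
```

Symbols with a negative coefficient are never selected, even when the budget would allow it, since including them can only lower the objective.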

Page 18: Approximate Tree Kernels

UNSUPERVISED SETTING

Average Frequency of Node Comparison

\[
f(s) = \frac{1}{n^{2}} \sum_{i,j=1}^{n} \#_s(X_i) \, \#_s(X_j)
\]

Comparison Ratio

\[
\rho = \frac{\text{expected number of node comparisons}}{\text{actual number of node comparisons in the PTK}}
\]
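The average frequency f(s) can be computed directly from per-tree symbol counts. The sketch below assumes each tree is summarized as a dictionary from symbols to node counts; that representation and the function name are illustrative.

```python
from typing import Dict, List


def average_comparisons(counts: List[Dict[str, int]]) -> Dict[str, float]:
    """f(s): average number of node comparisons for symbol s over all tree pairs."""
    n = len(counts)
    symbols = set().union(*counts)
    return {
        s: sum(ci.get(s, 0) * cj.get(s, 0) for ci in counts for cj in counts) / n**2
        for s in symbols
    }


# Hypothetical symbol counts for two trees
counts = [{"A": 2, "B": 1}, {"A": 4}]
f = average_comparisons(counts)
print(f)
```

Symbols that occur many times per tree dominate f(s), because the number of comparisons grows with the product of the counts.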

Page 19: Approximate Tree Kernels

UNSUPERVISED SETTING (CONTD.)

Optimization Problem

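\[
w^{*} = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \sum_{\substack{i,j=1 \\ i \neq j}}^{n} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \sum_{\substack{z \in X_j \\ l(z) = s}} c(x, z)
\]

subject to

\[
\frac{\sum_{s \in S} w(s) \, f(s)}{\sum_{s \in S} f(s)} \le \rho
\]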
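With w relaxed to [0, 1], this problem has the shape of a fractional knapsack: each symbol has a value (its coefficient in the objective) and a cost f(s), and the total cost may not exceed ρ times the sum of all f(s). One simple way to solve that relaxation is the greedy rule below. It is a sketch under the assumptions that all f(s) are positive and that the coefficients, here called `gain`, are precomputed; the slides do not say which solver is used, and a final rounding step would be needed to obtain a 0/1 selection.

```python
from typing import Dict


def select_by_ratio(gain: Dict[str, float], freq: Dict[str, float], rho: float) -> Dict[str, float]:
    """Greedy solution of the relaxed problem: best value per unit of cost first."""
    budget = rho * sum(freq.values())
    w = {s: 0.0 for s in gain}
    ranked = sorted(gain, key=lambda s: gain[s] / freq[s], reverse=True)
    for s in ranked:
        if gain[s] <= 0 or budget <= 0:
            break                       # nothing more worth adding
        take = min(1.0, budget / freq[s])
        w[s] = take
        budget -= take * freq[s]
    return w


gain = {"A": 6.0, "B": 4.0, "C": 1.0}   # hypothetical objective coefficients
freq = {"A": 2.0, "B": 4.0, "C": 4.0}   # hypothetical f(s) values
w = select_by_ratio(gain, freq, rho=0.5)
print(w)
```

At most one symbol receives a fractional weight; the others are fully selected or fully rejected.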

Page 20: Approximate Tree Kernels

SYNTHETIC DATA

Figure: Classification performance for the supervised synthetic data.

Figure: Detection performance for the unsupervised synthetic data.

Page 21: Approximate Tree Kernels

REAL DATA

Figure: Classification performance for the question classification task.

Figure: Detection performance for the intrusion detection task (FTP).

Page 22: Approximate Tree Kernels

TIME

Figure: Training and testing time of SVMs using the exact and the approximate tree kernel.

Page 23: Approximate Tree Kernels

TIME (CONTD.)

Figure: Run-times for web spam (WS) and intrusion detection (ID).

Page 24: Approximate Tree Kernels

MEMORY

Figure: Memory requirements for web spam (WS) and intrusion detection (ID).

Page 25: Approximate Tree Kernels

CONCLUSION

Approximate tree kernels give us a fast and efficient way to work with parse trees.
Improvements in run-time and memory requirements: for large trees, the approximation reduces the memory needed for a single kernel computation from 1 gigabyte to less than 800 kilobytes, accompanied by run-time improvements of up to three orders of magnitude.
Best results were obtained for network intrusion detection.

Page 26: Approximate Tree Kernels

QUESTIONS

Any questions?