Approximate Tree Kernels


Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller

Presented by Niharjyoti Sarangi

Indian Institute of Technology Madras

April 21, 2012


OUTLINE OF THE PRESENTATION

1 BACKGROUND

Learning from tree-structured data
Application Domains

2 PARSE TREE KERNELS

Computing PTK
Computational constraints

3 APPROXIMATE TREE KERNELS

Computing ATK
Validity of ATK
Types of learning

4 RESULTS

Performance
Time
Memory

5 CONCLUSION


TREE-STRUCTURED DATA

Trees: carry hierarchical information
Flat feature vectors: fail to capture the underlying dependency structure

Parse Tree
An ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar.

A tree X is called a parse tree of a grammar G = (S, P, s), with symbols S, production rules P and start symbol s, if X is derived by assembling productions p ∈ P such that every node x ∈ X is labeled with a symbol l(x) ∈ S.
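A parse tree in this sense can be sketched as a small data structure. The class name `Node`, its methods and the toy grammar below are assumptions made for illustration; they do not come from the original slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A parse tree node x labeled with a grammar symbol l(x)."""
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        """Production that derives this node: symbol -> child symbols."""
        return (self.symbol, tuple(c.symbol for c in self.children))

    def nodes(self) -> List["Node"]:
        """All nodes of the subtree rooted here, in pre-order."""
        out = [self]
        for child in self.children:
            out.extend(child.nodes())
        return out


# Toy tree built from the productions S -> NP VP, NP -> N, VP -> V
tree = Node("S", [Node("NP", [Node("N")]), Node("VP", [Node("V")])])
size = len(tree.nodes())             # number of nodes in the tree
root_production = tree.production()  # ("S", ("NP", "VP"))
```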


EXAMPLES

Figure: Parse trees for natural language text and the HTTP network protocol.


LEARNING FROM TREES

Kernel functions for structured data
Convolution of local kernels
Parse tree kernel proposed by Collins and Duffy (2002)

Kernel Functions
k : χ × χ → ℝ is a symmetric and positive semi-definite function, which implicitly computes an inner product in a reproducing kernel Hilbert space.


APPLICATION DOMAINS

Natural Language Processing
Web Spam Detection
Network Intrusion Detection
Information Retrieval from structured documents
...


COMPUTING PTK

A generic technique for defining kernel functions over structured data is the convolution of local kernels defined over sub-structures.

Parse Tree Kernel
$$k(X, Z) = \sum_{x \in X} \sum_{z \in Z} c(x, z),$$
where X and Z are two parse trees.

Notations
x_i: i-th child of a node x
|X|: number of nodes in X
χ: set of all possible trees


ILLUSTRATION

Figure: Shared subtrees in two parse trees.


COUNTING FUNCTION

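c(x, z) is known as the counting function, which recursively determines the number of shared subtrees rooted in the tree nodes x and z.

Defining c(x, z)
$$c(x, z) = \begin{cases}
0 & \text{if } x, z \text{ are not derived from the same production in } P \\
\lambda & \text{if } x, z \text{ are leaf nodes} \\
\lambda \prod_{i=1}^{|x|} c(x_i, z_i) & \text{otherwise}
\end{cases}$$

Here |x| is the number of children of x. The parameter λ, 0 ≤ λ ≤ 1, balances the contribution of subtrees: small values of λ decay the contribution of lower nodes in large subtrees.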
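The kernel and its counting function can be sketched in a few lines. This is a minimal, unoptimized sketch that follows the recursion exactly as written on the slide; the names `Node`, `count` and `parse_tree_kernel` and the two toy trees are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def production(self):
        return (self.symbol, tuple(c.symbol for c in self.children))

    def nodes(self):
        out = [self]
        for child in self.children:
            out.extend(child.nodes())
        return out


def count(x: Node, z: Node, lam: float) -> float:
    """Counting function c(x, z), case by case as on the slide."""
    if x.production() != z.production():
        return 0.0                    # not derived from the same production
    if not x.children:
        return lam                    # matching leaf nodes
    result = lam
    for xi, zi in zip(x.children, z.children):
        result *= count(xi, zi, lam)  # product over aligned children
    return result


def parse_tree_kernel(X: Node, Z: Node, lam: float = 1.0) -> float:
    """k(X, Z): sum of c(x, z) over all node pairs (quadratic in tree size)."""
    return sum(count(x, z, lam) for x in X.nodes() for z in Z.nodes())


# Hypothetical toy trees: Z differs from X in its VP production.
X = Node("S", [Node("NP", [Node("N")]), Node("VP", [Node("V")])])
Z = Node("S", [Node("NP", [Node("N")]),
               Node("VP", [Node("V"), Node("NP", [Node("N")])])])

k_xx = parse_tree_kernel(X, X)           # every node matches itself
k_xz = parse_tree_kernel(X, Z, lam=0.5)  # only NP, N and V subtrees are shared
```

With λ < 1, the pairs rooted at NP contribute less than the leaf pairs because they span two levels, which is the decay effect described above.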


COMPUTATIONAL COMPLEXITY

The complexity is O(n²), where n is the number of nodes in each parse tree.

Experimental data

The computation of a parse tree kernel for two HTML documents comprising 10,000 nodes each requires about 1 gigabyte of memory and takes over 100 seconds on a recent computer system.

Learning requires comparing a large number of parse trees. Given these figures, PTKs are impractical for large trees because of the computing resources required.
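A rough back-of-the-envelope check shows why the quadratic cost is the bottleneck. The sketch assumes, hypothetically, that one 8-byte value is stored per node pair; the implementation behind the quoted measurement may store its table differently.

```python
def node_pairs(n_x: int, n_z: int) -> int:
    """Number of node pairs (x, z) that a naive PTK computation considers."""
    return n_x * n_z


BYTES_PER_ENTRY = 8  # assumption: one double-precision value per node pair

entries = node_pairs(10_000, 10_000)
memory_mb = entries * BYTES_PER_ENTRY / 10**6
print(f"{entries:,} node pairs, about {memory_mb:.0f} MB")
```

Under this assumption the table alone is of the same order of magnitude as the reported memory footprint.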


ATTEMPTED IMPROVEMENTS

A feature selection procedure based on statistical tests (Suzuki et al.)
Limiting computation to node pairs with matching grammar symbols (Moschitti)


COMPUTING ATK

Approximation of tree kernels is based on the observation that trees often contain redundant parts that are not only irrelevant for the learning task but also slow down the kernel computation unnecessarily.

Approximate Tree Kernel
$$k(X, Z) = \sum_{s \in S} w(s) \sum_{\substack{x \in X \\ l(x) = s}} \sum_{\substack{z \in Z \\ l(z) = s}} c(x, z),$$
where X and Z are two parse trees.

Selection function: w : S → {0, 1}
Controls whether subtrees rooted in nodes with the symbol s ∈ S contribute to the convolution (w(s) = 0 or w(s) = 1).


APPROXIMATE COUNTING FUNCTION

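For the ATK, c(x, z) is the approximate counting function.

Defining c(x, z)
$$c(x, z) = \begin{cases}
0 & \text{if } x, z \text{ are not derived from the same production in } P \\
0 & \text{if } x \text{ or } z \text{ is not selected} \\
\lambda & \text{if } x, z \text{ are leaf nodes} \\
\lambda \prod_{i=1}^{|x|} c(x_i, z_i) & \text{otherwise}
\end{cases}$$

The selection function w(s) is chosen based on the domain and the data. The exact parse tree kernel is obtained as a special case of the ATK if w(s) = 1 for all symbols s ∈ S.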
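The effect of the selection function can be illustrated with a small sketch. It follows the cases of the approximate counting function as listed on the slide; the names `Node`, `approx_count` and `approx_tree_kernel`, the dictionary representation of w and the toy tree are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def production(self):
        return (self.symbol, tuple(c.symbol for c in self.children))

    def nodes(self):
        out = [self]
        for child in self.children:
            out.extend(child.nodes())
        return out


def approx_count(x: Node, z: Node, w: Dict[str, int], lam: float) -> float:
    """Approximate counting function, following the cases on the slide."""
    if x.production() != z.production():
        return 0.0                    # not derived from the same production
    if not w.get(x.symbol, 0):
        return 0.0                    # symbol rejected (x and z share it here)
    if not x.children:
        return lam                    # matching leaf nodes
    result = lam
    for xi, zi in zip(x.children, z.children):
        result *= approx_count(xi, zi, w, lam)
    return result


def approx_tree_kernel(X: Node, Z: Node, w: Dict[str, int],
                       lam: float = 1.0) -> float:
    """ATK: only node pairs whose shared symbol s has w(s) = 1 are compared."""
    total = 0.0
    for s, selected in w.items():
        if not selected:
            continue                  # all pairs labeled s are skipped
        xs = [x for x in X.nodes() if x.symbol == s]
        zs = [z for z in Z.nodes() if z.symbol == s]
        total += sum(approx_count(x, z, w, lam) for x in xs for z in zs)
    return total


X = Node("S", [Node("NP", [Node("N")]), Node("VP", [Node("V")])])
w_all = {s: 1 for s in ("S", "NP", "VP", "N", "V")}
w_no_root = dict(w_all, S=0)

k_exact = approx_tree_kernel(X, X, w_all)       # w(s) = 1 everywhere: exact PTK
k_approx = approx_tree_kernel(X, X, w_no_root)  # pairs labeled "S" are skipped
```

Grouping nodes by symbol before comparing them is what saves work: pairs whose symbol is rejected are never visited.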


ATK IS A VALID KERNEL

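Proof
Let Φ(X) be the vector of frequencies of all subtrees occurring in X. Then, by definition, the approximate kernel K_w can always be written as

$$K_w = \langle P_w \Phi(X), P_w \Phi(Z) \rangle,$$

where P_w projects the feature vector onto the subtrees rooted in symbols selected by w. For any w, the projection P_w is independent of the actual X and Z, and hence K_w is a valid kernel.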
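The last step can be spelled out with the standard positive semi-definiteness argument. The trees X_1, ..., X_n and the real coefficients a_1, ..., a_n are names introduced here for illustration; symmetry follows directly from the symmetry of the inner product.

```latex
\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j K_w(X_i, X_j)
  = \Big\langle \sum_{i=1}^{n} a_i P_w \Phi(X_i),\;
                \sum_{j=1}^{n} a_j P_w \Phi(X_j) \Big\rangle
  = \Big\| \sum_{i=1}^{n} a_i P_w \Phi(X_i) \Big\|^2 \;\ge\; 0 .
```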


ATK IS FASTER THAN PTK

Speed-up factor q_w
$$q_w = \frac{\sum_{s \in S} \#_s(X)\, \#_s(Z)}{\sum_{s \in S} w(s)\, \#_s(X)\, \#_s(Z)},$$
where #_s(X) denotes the number of occurrences of nodes x ∈ X with l(x) = s.

Since w(s) ∈ {0, 1}, the denominator never exceeds the numerator, so q_w ≥ 1. Rejecting even a single symbol that occurs in both trees gives a strict speed-up, q_w > 1.
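The speed-up factor can be computed from symbol counts alone. In this sketch each tree is represented, hypothetically, by the list of its node symbols, and w is a dictionary mapping symbols to 0 or 1; the function name `speedup` is likewise an assumption.

```python
from collections import Counter
from typing import Dict, Iterable


def speedup(symbols_x: Iterable[str], symbols_z: Iterable[str],
            w: Dict[str, int]) -> float:
    """q_w: node pairs compared by the PTK divided by those compared by the ATK."""
    count_x, count_z = Counter(symbols_x), Counter(symbols_z)
    shared = set(count_x) & set(count_z)  # other symbols contribute zero pairs
    exact = sum(count_x[s] * count_z[s] for s in shared)
    approx = sum(w.get(s, 0) * count_x[s] * count_z[s] for s in shared)
    if approx == 0:
        raise ValueError("no selected symbol occurs in both trees")
    return exact / approx


x_symbols = ["S", "NP", "N", "VP", "V", "NP", "N"]
z_symbols = ["S", "NP", "N", "VP", "V"]
w_all = {s: 1 for s in ("S", "NP", "VP", "N", "V")}

q_all = speedup(x_symbols, z_symbols, w_all)              # nothing rejected
q_no_n = speedup(x_symbols, z_symbols, dict(w_all, N=0))  # symbol "N" rejected
```

Here the exact kernel compares 7 node pairs and the approximation 5, so rejecting "N" gives a factor of 7/5.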


SUPERVISED SETTING

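Given n labeled parse trees (X_1, y_1), ..., (X_n, y_n), where the y_i are the class labels, an ideal kernel Gram matrix Y is given as follows:

$$Y_{ij} = [\![ y_i = y_j ]\!] - [\![ y_i \neq y_j ]\!],$$
where [[·]] is 1 if its argument is true and 0 otherwise.

Kernel Target Alignment
$$\langle Y, K_w \rangle_F = \sum_{y_i = y_j} K_{ij} - \sum_{y_i \neq y_j} K_{ij}$$

The goal is to maximize this alignment with respect to w.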
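The alignment is straightforward to evaluate once a Gram matrix is available. The function names and the small example matrix below are assumptions for illustration, not data from the presentation.

```python
from typing import List, Sequence


def ideal_gram(labels: Sequence[int]) -> List[List[int]]:
    """Y_ij = +1 if y_i == y_j, and -1 otherwise."""
    return [[1 if yi == yj else -1 for yj in labels] for yi in labels]


def alignment(labels: Sequence[int], K: Sequence[Sequence[float]]) -> float:
    """Frobenius inner product <Y, K>_F = sum over i, j of Y_ij * K_ij."""
    Y = ideal_gram(labels)
    n = len(labels)
    return sum(Y[i][j] * K[i][j] for i in range(n) for j in range(n))


labels = [1, 1, -1]
K = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.5],
     [0.0, 0.5, 2.0]]
score = alignment(labels, K)  # same-class entries add, cross-class entries subtract
```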


SUPERVISED SETTING (CONTD.)

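Optimization Problem
$$w^* = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \sum_{\substack{i,j=1 \\ i \neq j}}^{n} Y_{ij} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \sum_{\substack{z \in X_j \\ l(z) = s}} c(x, z)$$
$$\text{subject to} \quad \sum_{s \in S} w(s) \le N, \quad N \in \mathbb{N}$$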
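One simple way to solve a problem of this shape is sketched below. It assumes that the coefficient multiplying each w(s) has been precomputed as a fixed per-symbol score that does not itself depend on w; under that assumption the objective is linear and picking the N highest positive scores is optimal. The function name and the scores are hypothetical, and this is not claimed to be the solver used by the authors.

```python
from typing import Dict


def select_symbols(scores: Dict[str, float], budget: int) -> Dict[str, int]:
    """Maximize sum_s w(s) * scores[s] subject to sum_s w(s) <= budget."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Symbols with a non-positive score can only lower the objective.
    chosen = {s for s in ranked[:budget] if scores[s] > 0}
    return {s: int(s in chosen) for s in scores}


# Hypothetical per-symbol scores (contribution of each symbol to the objective)
scores = {"S": 0.5, "NP": 3.0, "VP": 2.0, "N": -1.0, "V": 0.0}
w = select_symbols(scores, budget=2)
```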


UNSUPERVISED SETTING

Average Frequency of Node Comparison
$$f(s) = \frac{1}{n^2} \sum_{i,j=1}^{n} \#_s(X_i)\, \#_s(X_j)$$

Comparison Ratio
$$\rho = \frac{\text{expected node comparisons}}{\text{actual number of comparisons in PTK}}$$
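The average comparison frequency f(s) can be computed directly from symbol counts. As an assumption for this sketch, each tree is given as the list of its node symbols; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def average_comparisons(trees: List[List[str]]) -> Dict[str, float]:
    """f(s) = (1 / n^2) * sum over i, j of #s(X_i) * #s(X_j), for every symbol s."""
    n = len(trees)
    counts = [Counter(t) for t in trees]
    symbols = set().union(*counts)
    return {
        s: sum(ci[s] * cj[s] for ci in counts for cj in counts) / n**2
        for s in symbols
    }


trees = [
    ["S", "NP", "N", "VP", "V"],
    ["S", "NP", "N", "NP", "N", "VP", "V"],
]
f = average_comparisons(trees)
```

Symbols that occur often, such as "NP" in the second tree, receive a larger f(s) because they cause more node comparisons per kernel evaluation.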


UNSUPERVISED SETTING (CONTD.)

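Optimization Problem
$$w^* = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \sum_{\substack{i,j=1 \\ i \neq j}}^{n} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \sum_{\substack{z \in X_j \\ l(z) = s}} c(x, z)$$
$$\text{subject to} \quad \frac{\sum_{s \in S} w(s) f(s)}{\sum_{s \in S} f(s)} \le \rho$$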
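With w relaxed to the interval [0, 1], a problem of this shape is a fractional knapsack: each symbol has a value (its coefficient in the objective) and a cost f(s), and the budget is ρ times the total cost. The sketch below assumes precomputed, non-negative scores that do not depend on w and strictly positive f(s); the names and numbers are hypothetical, and fractional values would still need to be rounded to obtain w(s) ∈ {0, 1}.

```python
from typing import Dict


def select_under_ratio(scores: Dict[str, float], f: Dict[str, float],
                       rho: float) -> Dict[str, float]:
    """Greedy solution of the relaxed problem (a fractional knapsack)."""
    capacity = rho * sum(f.values())
    w = {s: 0.0 for s in scores}
    # Symbols with the highest score per node comparison go first.
    for s in sorted(scores, key=lambda t: scores[t] / f[t], reverse=True):
        if capacity <= 0 or scores[s] <= 0:
            break
        w[s] = min(1.0, capacity / f[s])
        capacity -= w[s] * f[s]
    return w


scores = {"A": 6.0, "B": 4.0, "C": 1.0}  # hypothetical objective coefficients
f = {"A": 2.0, "B": 1.0, "C": 1.0}       # hypothetical comparison frequencies
w = select_under_ratio(scores, f, rho=0.5)
used_ratio = sum(w[s] * f[s] for s in f) / sum(f.values())
```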


SYNTHETIC DATA

Figure: Classification performance for the supervised synthetic data.

Figure: Detection performance for the unsupervised synthetic data.


REAL DATA

Figure: Classification performance for the question classification task.

Figure: Detection performance for the intrusion detection task (FTP).


TIME

Figure: Training and testing time of SVMs using the exact and the approximate tree kernel.


TIME (CONTD.)

Figure: Run-times for web spam (WS) and intrusion detection (ID).


MEMORY

Figure: Memory requirements for web spam (WS) and intrusion detection (ID).


CONCLUSION

Approximate parse tree kernels give us a fast and efficient way to work with parse trees.
Improvements in terms of run-time and memory requirements: for large trees, the approximation reduces the memory of a single kernel computation from 1 gigabyte to less than 800 kilobytes, accompanied by run-time improvements of up to three orders of magnitude.
Best results were obtained for network intrusion detection.


QUESTIONS

Any questions?