Efficient Equal Interval Neighborhood Ring (P-trees technology is patented by NDSU)


OUTLINE

• Review: HOBBit Metric
• Equal Interval Neighborhood Ring (EINring)
  – Prototype Problem
  – Definition of Range Mask
  – Propositions
  – Definition of EINring
  – Calculation of EINring Using P-trees

• Summary

Review: HOBBit Similarity Metric

• Let X and Y be two values. The HOBBit similarity between X and Y is defined by

$m(X, Y) = \max\{\, i \mid x_i \oplus y_i = 1 \,\}$

where $x_i$ and $y_i$ are the ith bits of X and Y respectively, and $\oplus$ denotes XOR. In other words, m(X, Y) is the leftmost bit position at which X and Y differ.

Review: HOBBit Distance & Ring

• The HOBBit distance between two tuples X and Y is defined by

$d(X, Y) = 2^{m(X, Y)}$

• HOBBit Ring: The HOBBit ring of radii $r_1$ and $r_2$ centered at c is defined as $R(c, r_1, r_2) = \{\, x \in X \mid r_1 \le d(c, x) < r_2 \,\}$, where d(c, x) is the HOBBit distance.
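A minimal Python sketch of the two definitions, treating X and Y as non-negative integers with bit 0 as the least significant bit. The function names are illustrative, and returning None (similarity) and 0 (distance) for identical values is an assumption, since the maximum over an empty set is undefined.

```python
from typing import Optional


def hobbit_similarity(x: int, y: int) -> Optional[int]:
    """m(X, Y) = max{i | x_i XOR y_i = 1}: the leftmost bit position where x and y differ.

    Assumption: returns None when x == y, because then no bit differs.
    """
    diff = x ^ y                   # 1 exactly at the positions where the bits differ
    if diff == 0:
        return None
    return diff.bit_length() - 1   # index of the highest set bit of diff


def hobbit_distance(x: int, y: int) -> int:
    """d(X, Y) = 2 ** m(X, Y); identical values get distance 0 (assumption)."""
    m = hobbit_similarity(x, y)
    return 0 if m is None else 2 ** m


print(hobbit_similarity(22, 16), hobbit_distance(22, 16))  # 2 4
```

Using `int.bit_length` avoids scanning the bits one at a time: the highest set bit of `x ^ y` is exactly the leftmost differing position.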

Diagram of HOBBit Ring

[Figure: diagram of a HOBBit ring]

Example of HOBBit Ring

HOBBit Ring    Binary Range         Decimal Range
R(22,7,8)      00010110-00010111    22-23
R(22,6,8)      00010100-00010111    20-23
R(22,5,8)      00010000-00010111    16-23
R(22,4,8)      00010000-00011111    16-31
R(22,3,8)      00000000-00011111    0-31
R(22,2,8)      00000000-00111111    0-63
R(22,1,8)      00000000-01111111    0-127
R(22,0,8)      00000000-11111111    0-255
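The rows of the example can be regenerated by reading R(22, r1, 8) as the set of 8-bit values that share the r1 leading bits of 22. That reading is inferred from the table itself, and `hobbit_range` is a hypothetical helper, not part of the original material.

```python
from typing import Tuple


def hobbit_range(c: int, r1: int, bits: int = 8) -> Tuple[int, int]:
    """Smallest and largest value that agree with c on its r1 leading bits."""
    free = bits - r1                  # number of trailing bits left unconstrained
    lo = (c >> free) << free          # keep the r1 leading bits, zero the rest
    hi = lo | ((1 << free) - 1)       # keep the r1 leading bits, set the rest
    return lo, hi


for r1 in range(7, -1, -1):
    lo, hi = hobbit_range(22, r1)
    print(f"R(22,{r1},8)  {lo:08b}-{hi:08b}  {lo}-{hi}")
```

Each step down in r1 frees one more bit, which is why the covered range doubles from row to row.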

Summary of HOBBit Metric

• The HOBBit metric is based on the most significant matching bit positions starting from the left.

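• The HOBBit ring is a geometric ring whose diameter increases exponentially.

• The HOBBit ring is an eccentric ring: its center is generally not at the middle of the range it covers (e.g., R(22,4,8) covers 16-31).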

Equal Interval Neighborhood Ring (EINring)

Outline

• Prototype Problem

• Definition of Range Mask

• Propositions and Theorem

• Definition of EINring

• Calculation of EINring using P-trees

Prototype Problem

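• Problem: find the data points with $x > (4)_{10} = (100)_2$.

• Conjecture: $P_{x > (100)_2} = P_3 \land (P_2 \lor P_1)$, where $P_3$, $P_2$ and $P_1$ are the basic P-trees of the high, middle and low bits of the 3-bit values.

8x8 data set:

7 7 7 7 5 5 1 1
7 7 7 7 1 1 1 1
5 5 7 7 4 4 1 1
5 7 7 7 4 5 5 1
6 6 6 6 3 3 0 0
6 6 6 6 0 0 0 0
2 2 6 6 3 3 0 0
2 6 6 6 3 3 3 0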
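The conjecture can be checked with plain Python integers used as uncompressed bit vectors in place of Peano trees. This is a simplification (real P-trees are compressed quadrant trees), and `bit_slice` and `root_count` are hypothetical helpers. Each mask has one bit per data point, so AND/OR on masks mirrors AND/OR on P-trees, and the order in which the 64 values are stored does not affect the comparison.

```python
# 8x8 data set from the slide, in the order the values are listed.
DATA = [
    7, 7, 7, 7, 5, 5, 1, 1,
    7, 7, 7, 7, 1, 1, 1, 1,
    5, 5, 7, 7, 4, 4, 1, 1,
    5, 7, 7, 7, 4, 5, 5, 1,
    6, 6, 6, 6, 3, 3, 0, 0,
    6, 6, 6, 6, 0, 0, 0, 0,
    2, 2, 6, 6, 3, 3, 0, 0,
    2, 6, 6, 6, 3, 3, 3, 0,
]


def bit_slice(bit: int) -> int:
    """Uncompressed stand-in for a basic P-tree: bit p is set iff DATA[p] has `bit` set."""
    return sum(1 << pos for pos, v in enumerate(DATA) if (v >> bit) & 1)


def root_count(mask: int) -> int:
    """Number of 1 bits, i.e. the root count of the corresponding P-tree."""
    return bin(mask).count("1")


# The slide's P3, P2, P1 are taken to be the high, middle and low bits of the 3-bit values.
P3, P2, P1 = bit_slice(2), bit_slice(1), bit_slice(0)

conjecture = P3 & (P2 | P1)                                   # P3 AND (P2 OR P1)
crude = sum(1 << pos for pos, v in enumerate(DATA) if v > 4)  # compare every value directly
print(conjecture == crude, root_count(conjecture))  # True 33
```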

Walk Through: Peano Trees

Result of Crude Method

Result of Conjecture

Definition of Range Mask

• Range Mask: A range mask is a P-tree mask that selects every data point x satisfying a range inequality such as $x \ge c_1$, $x > c_1$, $x \le c_2$ or $x < c_2$, where $c_1$ and $c_2$ are integers.

• Example: $P_{x > 100}$ is the P-tree mask that selects every data point greater than 100.

Proposition 1

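• Let the jth attribute of data point x have m+1 bits, let $P_{j,m}, P_{j,m-1}, \ldots, P_{j,0}$ be the basic P-trees of bits m, m-1, …, 0 of the jth attribute, and let $c = b_m b_{m-1} \ldots b_0$ be an integer, where $b_i$ is the ith bit of c. Let $P_{x_j \ge c}$ be the range mask that satisfies the inequality $x_j \ge c$. Then

$P_{x_j \ge c} = P_{j,m}\ op_{j,m} \cdots P_{j,i}\ op_{j,i} \cdots P_{j,k}$,

where 1) $op_{j,i}$ is $\land$ if $b_i = 1$, 2) $op_{j,i}$ is $\lor$ if $b_i = 0$, 3) operators bind to the right within each attribute, and 4) $b_k = 1$ and $b_i = 0$ for all $i < k$ (trailing zero bits of c impose no constraint).

• Example: $P_{x_j \ge (70)_{10}} = P_{x_j \ge (01000110)_2} = P_7 \lor (P_6 \land (P_5 \lor (P_4 \lor (P_3 \lor (P_2 \land P_1)))))$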

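Proof Sketch

Without loss of generality, assume data point x has one attribute. Let $c = b_m \ldots b_i \ldots b_0$, where $b_i$ is the ith bit of c, and let $T_i$ be the mask of the points whose lower bits satisfy $x_i \ldots x_0 \ge b_i \ldots b_0$, so that $P_{x \ge c} = T_m$.

If $b_i = 1$, the ith bit of x must be 1 and the lower bits must still satisfy the inequality, i.e., $T_i = P_i \land T_{i-1}$.

If $b_i = 0$, two cases satisfy the inequality: $x_i = 1$, in which case the lower bits do not matter, or $x_i = 0$ and the lower bits satisfy it. Thus, by the complement rule $X \lor (X' \land Y) = X \lor Y$,

$T_i = P_i \lor (P'_i \land T_{i-1}) = P_i \lor T_{i-1}$.

The empty suffix always satisfies the inequality ($T_{-1}$ is the all-ones mask), so every zero bit below the lowest 1 bit $b_k$ yields an all-ones mask and drops out. Unrolling the recursion gives $P_{x \ge c} = P_m\ op_m \cdots P_i\ op_i \cdots P_k$. Done!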
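Proposition 1 can be checked mechanically. The sketch below is an illustration, not the authors' implementation: uncompressed Python integers stand in for P-trees (bit p of a mask belongs to data point p), and the data set is a hypothetical one holding every 8-bit value once, so the mask can be compared against $x \ge c$ for every pair. `range_mask_ge` evaluates the right-binding chain from bit 0 upward; `formula_ge` prints the same chain with `&` for AND and `|` for OR.

```python
BITS = 8
DATA = list(range(2 ** BITS))   # hypothetical data set: every 8-bit value exactly once
ALL = (1 << len(DATA)) - 1      # all-ones mask
# P[i] stands in for the basic P-tree of bit i (uncompressed bit vector over DATA).
P = [sum(1 << pos for pos, v in enumerate(DATA) if (v >> i) & 1) for i in range(BITS)]


def range_mask_ge(c: int) -> int:
    """x >= c: AND on the 1 bits of c, OR on its 0 bits, evaluated innermost (bit 0) first."""
    acc = ALL                   # empty suffix: x >= c holds trivially
    for i in range(BITS):
        acc = (P[i] & acc) if (c >> i) & 1 else (P[i] | acc)
    return acc


def wrap(expr: str) -> str:
    """Parenthesize compound sub-expressions only."""
    return f"({expr})" if " " in expr else expr


def formula_ge(c: int) -> str:
    """The same chain as text; zero bits below the lowest 1 bit of c drop out."""
    expr = None                 # None stands for "always true"
    for i in range(BITS):
        if (c >> i) & 1:
            expr = f"P{i}" if expr is None else f"P{i} & {wrap(expr)}"
        elif expr is not None:
            expr = f"P{i} | {wrap(expr)}"
    return expr or "ALL"


print(formula_ge(70))  # P7 | (P6 & (P5 | (P4 | (P3 | (P2 & P1)))))
```

For c = 70 the chain stops at P1: bit 0 of 70 is 0, so it places no constraint on x.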

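Proposition 2

• Let the jth attribute of data point x have m+1 bits, let $P'_{j,m}, P'_{j,m-1}, \ldots, P'_{j,0}$ be the complement P-trees of bits m, m-1, …, 0 of the jth attribute, and let $c = b_m b_{m-1} \ldots b_0$ be an integer, where $b_i$ is the ith bit of c. Let $P_{x_j \le c}$ be the range mask that satisfies $x_j \le c$. Then

$P_{x_j \le c} = P'_{j,m}\ op_{j,m} \cdots P'_{j,i}\ op_{j,i} \cdots P'_{j,k}$,

where 1) $op_{j,i}$ is $\land$ if $b_i = 0$, 2) $op_{j,i}$ is $\lor$ if $b_i = 1$, 3) operators bind to the right within each attribute, and 4) $b_k = 0$ and $b_i = 1$ for all $i < k$ (trailing one bits of c impose no constraint).

• Example: $P_{x_j \le (198)_{10}} = P_{x_j \le (11000110)_2} = P'_7 \lor (P'_6 \lor (P'_5 \land (P'_4 \land (P'_3 \land (P'_2 \lor (P'_1 \lor P'_0))))))$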
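The same kind of exhaustive check works for Proposition 2, again assuming uncompressed integer masks in place of P-trees and a hypothetical data set holding every 8-bit value once. Complement P-trees are obtained by XOR with the all-ones mask, and `'` in the printed formula marks a complement.

```python
BITS = 8
DATA = list(range(2 ** BITS))   # hypothetical data set: every 8-bit value exactly once
ALL = (1 << len(DATA)) - 1
P = [sum(1 << pos for pos, v in enumerate(DATA) if (v >> i) & 1) for i in range(BITS)]
PC = [ALL ^ p for p in P]       # complement P-trees P'_i


def range_mask_le(c: int) -> int:
    """x <= c: AND on the 0 bits of c, OR on its 1 bits, over complement P-trees."""
    acc = ALL                   # empty suffix: x <= c holds trivially
    for i in range(BITS):
        acc = (PC[i] | acc) if (c >> i) & 1 else (PC[i] & acc)
    return acc


def wrap(expr: str) -> str:
    return f"({expr})" if " " in expr else expr


def formula_le(c: int) -> str:
    """Chain as text; one bits below the lowest 0 bit of c drop out."""
    expr = None                 # None stands for "always true"
    for i in range(BITS):
        if ((c >> i) & 1) == 0:
            expr = f"P{i}'" if expr is None else f"P{i}' & {wrap(expr)}"
        elif expr is not None:
            expr = f"P{i}' | {wrap(expr)}"
    return expr or "ALL"


print(formula_le(198))  # P7' | (P6' | (P5' & (P4' & (P3' & (P2' | (P1' | P0'))))))
```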

Proposition 3

• Let the jth attribute of data point x have m+1 bits, let $P_{j,m}, P_{j,m-1}, \ldots, P_{j,0}$ be the basic P-trees of bits m, m-1, …, 0 of the jth attribute, and let $c = b_m b_{m-1} \ldots b_0$ be an integer, where $b_i$ is the ith bit of c. Let $P_{x_j > c}$ be the range mask that satisfies the inequality $x_j > c$. Then

$P_{x_j > c} = P_{j,m}\ op_{j,m} \cdots P_{j,i}\ op_{j,i} \cdots P_{j,k}$,

where 1) $op_{j,i}$ is $\land$ if $b_i = 1$, 2) $op_{j,i}$ is $\lor$ if $b_i = 0$, 3) operators bind to the right within each attribute, and 4) $b_k = 0$ and $b_i = 1$ for all $i < k$.

• Example: $P_{x_j > (72)_{10}} = P_{x_j > (01001000)_2} = P_7 \lor (P_6 \land (P_5 \lor (P_4 \lor (P_3 \land (P_2 \lor (P_1 \lor P_0))))))$
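For the strict inequality the empty suffix never satisfies $x > c$, so the evaluation starts from the empty mask instead of the all-ones mask. The sketch below makes the same assumptions as before (uncompressed integer masks, a hypothetical data set holding every 8-bit value once); `formula_gt` prints `EMPTY` for c = 255, where no 8-bit value qualifies.

```python
BITS = 8
DATA = list(range(2 ** BITS))   # hypothetical data set: every 8-bit value exactly once
ALL = (1 << len(DATA)) - 1
P = [sum(1 << pos for pos, v in enumerate(DATA) if (v >> i) & 1) for i in range(BITS)]


def range_mask_gt(c: int) -> int:
    """x > c: AND on the 1 bits of c, OR on its 0 bits, starting from the empty mask."""
    acc = 0                     # empty suffix: x > c cannot hold
    for i in range(BITS):
        acc = (P[i] & acc) if (c >> i) & 1 else (P[i] | acc)
    return acc


def wrap(expr: str) -> str:
    return f"({expr})" if " " in expr else expr


def formula_gt(c: int) -> str:
    """Chain as text; trailing one bits of c drop out (they AND with an empty mask)."""
    expr = None                 # None stands for "never" (empty mask)
    for i in range(BITS):
        if (c >> i) & 1:
            if expr is not None:
                expr = f"P{i} & {wrap(expr)}"
        else:
            expr = f"P{i}" if expr is None else f"P{i} | {wrap(expr)}"
    return expr or "EMPTY"


print(formula_gt(72))  # P7 | (P6 & (P5 | (P4 | (P3 & (P2 | (P1 | P0))))))
```

For c = 72 the lowest bit is 0, so the chain runs all the way down to P0; stopping at P3 would also admit x = 72 itself.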

Proposition 4

• Let the jth attribute of data point x have m+1 bits, let $P'_{j,m}, P'_{j,m-1}, \ldots, P'_{j,0}$ be the complement P-trees of bits m, m-1, …, 0 of the jth attribute, and let $c = b_m b_{m-1} \ldots b_0$ be an integer, where $b_i$ is the ith bit of c. Let $P_{x_j < c}$ be the range mask that satisfies $x_j < c$. Then

$P_{x_j < c} = P'_{j,m}\ op_{j,m} \cdots P'_{j,i}\ op_{j,i} \cdots P'_{j,k}$,

where 1) $op_{j,i}$ is $\land$ if $b_i = 0$, 2) $op_{j,i}$ is $\lor$ if $b_i = 1$, 3) operators bind to the right within each attribute, and 4) $b_k = 1$ and $b_i = 0$ for all $i < k$.

• Example: $P_{x_j < (72)_{10}} = P_{x_j < (01001000)_2} = P'_7 \land (P'_6 \lor (P'_5 \land (P'_4 \land P'_3)))$
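A matching sketch for Proposition 4, under the same assumptions (uncompressed integer masks, a hypothetical data set holding every 8-bit value once). Here the trailing zero bits of c drop out, which is why the chain for c = 72 ends at P3'.

```python
BITS = 8
DATA = list(range(2 ** BITS))   # hypothetical data set: every 8-bit value exactly once
ALL = (1 << len(DATA)) - 1
P = [sum(1 << pos for pos, v in enumerate(DATA) if (v >> i) & 1) for i in range(BITS)]
PC = [ALL ^ p for p in P]       # complement P-trees P'_i


def range_mask_lt(c: int) -> int:
    """x < c: AND on the 0 bits of c, OR on its 1 bits, over complement P-trees."""
    acc = 0                     # empty suffix: x < c cannot hold
    for i in range(BITS):
        acc = (PC[i] | acc) if (c >> i) & 1 else (PC[i] & acc)
    return acc


def wrap(expr: str) -> str:
    return f"({expr})" if " " in expr else expr


def formula_lt(c: int) -> str:
    """Chain as text; trailing zero bits of c drop out."""
    expr = None                 # None stands for "never" (empty mask)
    for i in range(BITS):
        if (c >> i) & 1:
            expr = f"P{i}'" if expr is None else f"P{i}' | {wrap(expr)}"
        elif expr is not None:
            expr = f"P{i}' & {wrap(expr)}"
    return expr or "EMPTY"


print(formula_lt(72))  # P7' & (P6' | (P5' & (P4' & P3')))
```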

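More Examples

• $c = (72)_{10} = (01001000)_2$:
$P_{x < c} = P'_7 \land (P'_6 \lor (P'_5 \land (P'_4 \land P'_3)))$

• $c = (72)_{10} = (01001000)_2$:
$P_{x > c} = P_7 \lor (P_6 \land (P_5 \lor (P_4 \lor (P_3 \land (P_2 \lor (P_1 \lor P_0))))))$

• $c = (198)_{10} = (11000110)_2$:
$P_{x \le c} = P'_7 \lor (P'_6 \lor (P'_5 \land (P'_4 \land (P'_3 \land (P'_2 \lor (P'_1 \lor P'_0))))))$

• $c = (198)_{10} = (11000110)_2$:
$P_{x = c} = P_7 \land (P_6 \land (P'_5 \land (P'_4 \land (P'_3 \land (P_2 \land (P_1 \land P'_0))))))$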
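The equality mask ANDs every bit position, taking $P_i$ where $b_i = 1$ and $P'_i$ where $b_i = 0$. Because AND is associative, the nesting can be flattened. A short sketch, assuming uncompressed integer masks and a hypothetical data set holding every 8-bit value once, so that exactly one point (the one at position c) should survive.

```python
BITS = 8
DATA = list(range(2 ** BITS))   # hypothetical data set: every 8-bit value exactly once
ALL = (1 << len(DATA)) - 1
P = [sum(1 << pos for pos, v in enumerate(DATA) if (v >> i) & 1) for i in range(BITS)]


def equal_mask(c: int) -> int:
    """x == c: AND of P_i for the 1 bits of c and P'_i for its 0 bits."""
    acc = ALL
    for i in range(BITS):
        acc &= P[i] if (c >> i) & 1 else ALL ^ P[i]
    return acc


def formula_eq(c: int) -> str:
    """Flat AND chain from the most significant bit down; ' marks a complement."""
    terms = [f"P{i}" if (c >> i) & 1 else f"P{i}'" for i in range(BITS - 1, -1, -1)]
    return " & ".join(terms)


print(formula_eq(198))  # P7 & P6 & P5' & P4' & P3' & P2 & P1 & P0'
```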

Theorem – Range Mask Complement Rule

• Theorem (Range Mask Complement Rule). Let $P_{x_j < c}$, $P_{x_j \le c}$, $P_{x_j \ge c}$ and $P_{x_j > c}$ be the range masks of the jth attribute of any data point x, where c is an integer. Then

$P_{x_j \ge c} = P'_{x_j < c}$ and $P_{x_j \le c} = P'_{x_j > c}$

hold.
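The complement rule can be confirmed by building all four range masks with one loop and comparing them. `range_mask` is a hypothetical helper that folds the four propositions together; as in the other sketches, uncompressed integer masks stand in for P-trees and the data set holds every 8-bit value once.

```python
BITS = 8
DATA = list(range(2 ** BITS))   # hypothetical data set: every 8-bit value exactly once
ALL = (1 << len(DATA)) - 1
P = [sum(1 << pos for pos, v in enumerate(DATA) if (v >> i) & 1) for i in range(BITS)]


def range_mask(c: int, op: str) -> int:
    """Propositions 1-4 in one loop; op is one of '>=', '>', '<=', '<'."""
    upper = op in ("<=", "<")                 # Propositions 2 and 4 use complement P-trees
    acc = ALL if op in (">=", "<=") else 0    # value of the empty suffix
    for i in range(BITS):
        bit = (c >> i) & 1
        p = (ALL ^ P[i]) if upper else P[i]
        use_and = (bit == 1) != upper         # AND on 1 bits for >=, >; on 0 bits for <=, <
        acc = (p & acc) if use_and else (p | acc)
    return acc


def complement(mask: int) -> int:
    return ALL ^ mask


rule_holds = all(
    range_mask(c, ">=") == complement(range_mask(c, "<"))
    and range_mask(c, "<=") == complement(range_mask(c, ">"))
    for c in range(2 ** BITS)
)
print(rule_holds)  # True
```

In practice the rule means only two of the four masks need to be built; the other two cost one complement each.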

Definition of Neighborhood Ring

• Neighborhood Ring: The neighborhood ring of radii $r_1$ and $r_2$ centered at c is defined as $R(c, r_1, r_2) = \{\, x \in X \mid r_1 < |x - c| \le r_2 \,\}$, where $|x - c|$ is the absolute distance between x and c.

Definition of Equal Interval Neighborhood Ring (EINring)

• The Equal Interval Neighborhood ring of radii $r_1$ and $r_2$ centered at c is defined as $R(c, r_1, r_2) = \{\, x \in X \mid r_1 < |x - c| \le r_2 \,\}$ with $|r_2 - r_1| = 2^k$, where $k = 1, 2, \ldots$, $|x - c|$ is the absolute distance between x and c, and $2^k$ is the interval.

Diagram of EINring

[Figure: diagram of an EINring]

Example of EINring

EINring       Binary Range         Decimal Range
R16(86,1)     01000110-01100110    70-102
R16(86,2)     00110110-01110110    54-118
R16(86,3)     00100110-10000110    38-134
R16(86,4)     00010110-10010110    22-150
…             …                    …
R16(86,9)     00000000-11100110    0-230
R16(86,10)    00000000-11110110    0-246
R16(86,11)    00000000-11111111    0-255
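Each row of the example is consistent with reading R16(86, n) as the values from 86 - 16n to 86 + 16n, clipped to the 8-bit range 0-255. That reading is inferred from the listed ranges, and `ein_range` is a hypothetical helper.

```python
from typing import Tuple

BITS = 8
MAX_VALUE = 2 ** BITS - 1   # 255 for 8-bit attributes


def ein_range(c: int, n: int, interval: int = 16) -> Tuple[int, int]:
    """Value range c - n*interval .. c + n*interval, clipped to 0..MAX_VALUE."""
    lo = max(0, c - n * interval)
    hi = min(MAX_VALUE, c + n * interval)
    return lo, hi


for n in range(1, 12):
    lo, hi = ein_range(86, n)
    print(f"R16(86,{n})  {lo:08b}-{hi:08b}  {lo}-{hi}")
```

Unlike the HOBBit ring, the covered range grows by the same 2 × 16 values at every step until it is clipped.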

Neighbor Count within EINring

• For any data point x, let $x = (x_1, x_2, \ldots, x_m)$, where $x_j$ is the jth attribute of x. Let $r = (r_1, r_2, \ldots, r_m)$ be a vector with m elements. We define the range mask $P_{x > c - r}$ as

$P_{x > c - r} = P_{x_1 > c - r_1} \land P_{x_2 > c - r_2} \land \cdots \land P_{x_m > c - r_m}$

and define the range mask $P_{x \le c + r}$ as

$P_{x \le c + r} = P_{x_1 \le c + r_1} \land P_{x_2 \le c + r_2} \land \cdots \land P_{x_m \le c + r_m}$

where c is a constant.

Neighbor Count within EINring:

The range mask for the data points x within the neighborhood ring R(c, 0, r) is calculated by

$P_{c,r} = P_{x > c - r} \land P_{x \le c + r}$

The neighbor count for x within the neighborhood ring R(c, 0, r) is calculated by

$NC(c, 0, r) = RC(P_{c,r})$, where RC is the root count of a P-tree (its number of 1 bits).
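A multi-attribute sketch of $P_{c,r}$ and its root count. The two-attribute points are hypothetical, uncompressed integer masks stand in for P-trees, and thresholds that fall outside 0..255 (such as c - r < 0) are clamped, an edge case the definitions leave open. Following the slides, c is a single constant shared by all attributes while r has one entry per attribute.

```python
BITS = 8
MAX_VALUE = 2 ** BITS - 1

# Hypothetical two-attribute data set (not from the slides): one tuple per data point.
POINTS = [(86, 90), (70, 100), (102, 86), (10, 200), (86, 86), (120, 60)]
ALL = (1 << len(POINTS)) - 1


def bit_slices(j: int) -> list:
    """Stand-ins for the basic P-trees of attribute j: one uncompressed mask per bit."""
    return [sum(1 << pos for pos, x in enumerate(POINTS) if (x[j] >> i) & 1)
            for i in range(BITS)]


SLICES = [bit_slices(j) for j in range(len(POINTS[0]))]


def mask_gt(j: int, c: int) -> int:
    """P_{x_j > c} via Proposition 3; thresholds outside 0..MAX_VALUE are clamped."""
    if c < 0:
        return ALL
    if c >= MAX_VALUE:
        return 0
    acc = 0
    for i in range(BITS):
        p = SLICES[j][i]
        acc = (p & acc) if (c >> i) & 1 else (p | acc)
    return acc


def mask_le(j: int, c: int) -> int:
    """P_{x_j <= c} via Proposition 2 (complement P-trees), clamped the same way."""
    if c >= MAX_VALUE:
        return ALL
    if c < 0:
        return 0
    acc = ALL
    for i in range(BITS):
        p = ALL ^ SLICES[j][i]
        acc = (p | acc) if (c >> i) & 1 else (p & acc)
    return acc


def neighborhood_mask(c: int, r: tuple) -> int:
    """P_{c,r} = P_{x > c - r} AND P_{x <= c + r}, ANDed over all attributes."""
    acc = ALL
    for j, rj in enumerate(r):
        acc &= mask_gt(j, c - rj) & mask_le(j, c + rj)
    return acc


def root_count(mask: int) -> int:
    return bin(mask).count("1")


print(root_count(neighborhood_mask(86, (16, 16))))  # 3
```

With c = 86 and r = (16, 16), a point qualifies when both attributes lie in 70 < x_j ≤ 102, which holds for (86, 90), (102, 86) and (86, 86).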

Neighbor Count within EINring

The neighbor count NC(c, r1, r2) of c within the EINring R(c, r1, r2) is calculated as

$NC(c, r_1, r_2) = RC(P_{c,r_2}) - RC(P_{c,r_1})$

Summary

• The Equal Interval Neighborhood Ring (EINring) is much finer than the HOBBit ring: its radius grows in equal steps rather than exponentially.

• Calculating the EINring using P-trees is efficient, comparable to calculating the HOBBit ring.
