© Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de Geometric Algorithms and Data Structures Prof. Neeraj Suri Andreas Johansson Constantin Sarbu Abdelmajid Khelil

Geometric Algorithms and Data Structures



ICS-II - 2006, Lecture 14: Geometric Algorithms and Data Structures

Outline

Introduction
Geometric Data Structures
Quadtree
□ Region quadtree
□ Point quadtree
K-d tree
Strip tree
K-d trie
Binary trie
Multidimensional Data
Z-Order
Multidimensional data
Data mining

Geometric Problems (1)

Algorithmic geometry: the study of the algorithmic complexity of elementary geometric problems.

Geometric problems are often abstract formulations of practical problems (similar to graph theory).

Some geometric problems and their interpretations:

Given a set of points in the plane, find all the points within a rectangle.
□ "Clipping" in VR
□ Find tuples in a database with values within given bounds for attributes A1 and A2
□ Generalization to searching in k dimensions (all points contained in a k-dimensional range)

Geometric Problems (2)

Given a set of rectangles in the plane, find all pairwise intersecting rectangles.
□ Correctness checks in Very Large Scale Integration (VLSI) design, with chip layers modeled as rectangles

Given a set of 3-dimensional objects (compounds), find all pairwise intersecting objects.
□ Ensuring the required clearance or safety margin in CAD

Given a set of rectangles in the plane, find their common area (intersection).
□ Geographic Information Systems (GIS): approximation of generic forms through rectangles, determining areas with specific properties on distinct maps (e.g. find regions which are sandy (map 1), wet (map 2), and between 200 and 300 m altitude (elevation map))

Geometric Problems (3)

Given a set of polyhedra in space, determine the edges or portions of edges that are visible or hidden from a viewpoint.
□ Computation of a realistic view of a 3-dimensional scene
□ Determining the coverage area of a transmitter and the areas with no reception

Given a set of points in a k-dimensional space and a query point P, find the point S closest to P.
□ Voice recognition: a spoken word is characterized by features and compared with the vocabulary (point set in a k-dimensional space)

Classification of Geometric Problems

2 classes of problems:
□ Set problems: compute a property of interest of a set of objects S, e.g. the outline of the area covered by S
□ Search problems: given a set of objects S and a query object q, find all objects in S that have a specific relation with q

Set problems are often reducible to search problems
□ E.g. plane-sweep algorithms reduce a k-dimensional set problem to a (k-1)-dimensional search problem

Search problems are solved by organizing S with the aid of appropriate data structures and indexing


First Problem

How do we efficiently represent this figure?

Representing Figures (1)

How about a matrix representation? Black = 1, empty = 0.

[Figure: the figure stored as a matrix of 0s and 1s, one entry per pixel]

Not very efficient

Representing Figures (2)

Idea: represent areas, not points.

Now represent the areas using another structure. Quadtrees do this.

[Figure: the 1-entries of the matrix grouped into square areas]

Overview of Quadtrees

"Quadtree" is a generic term: a class of hierarchical data structures that are based on recursive decomposition of space.

Differentiation is possible based on:
□ Data type represented by the quadtree: point data, regions, curves, surfaces, and volumes
□ Principle of decomposition: regular vs. input-driven
□ Resolution: fixed vs. variable number of decomposition steps

Examples: region quadtree, point quadtree

Literature: Samet, H.; "The Quadtree and Related Hierarchical Data Structures", ACM Comp. Surveys, Vol. 16, No. 2, June 1984 (available from ACM DL)

Region Quadtree

Successive subdivision of the image array into 4 equal-sized quadrants.

Basic idea:
□ The figure is given as an image array, i.e. every pixel of the figure has a value of 1, all other pixels have a value of 0
□ The entire area (image array) is subdivided into 4 equal-sized quadrants (the array usually has 2^k x 2^k pixels)
□ Upon each division one has to check whether the image array of a quadrant is homogeneous (i.e. only 1s or only 0s)
□ Homogeneous: no further subdivision
□ Heterogeneous: further subdivision until homogeneous (possibly down to single pixels)
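
The subdivision rule can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the slides: the image is a square list of lists of 0/1 whose side is a power of two, a leaf is stored as the integer 0 (WHITE) or 1 (BLACK), and an inner (GREY) node is a 4-tuple in the order NW, NE, SW, SE. The names `build_region_quadtree` and `count_nodes` are made up for this sketch.

```python
WHITE, BLACK = 0, 1  # leaf colours; an inner (GREY) node is a 4-tuple of sons


def build_region_quadtree(image, row=0, col=0, size=None):
    """Return WHITE, BLACK, or a tuple (NW, NE, SW, SE) for a GREY node.

    Assumes `image` is a square list of lists of 0/1 with a power-of-two side.
    """
    if size is None:
        size = len(image)
    first = image[row][col]
    homogeneous = all(
        image[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous:
        return first  # only 1s or only 0s: no further subdivision
    half = size // 2
    return (
        build_region_quadtree(image, row, col, half),                # NW
        build_region_quadtree(image, row, col + half, half),         # NE
        build_region_quadtree(image, row + half, col, half),         # SW
        build_region_quadtree(image, row + half, col + half, half),  # SE
    )


def count_nodes(node):
    """Number of nodes (leaves and GREY nodes) in the quadtree."""
    if isinstance(node, tuple):
        return 1 + sum(count_nodes(son) for son in node)
    return 1
```

The homogeneity test rescans the quadrant on every level, which keeps the sketch short; a bottom-up construction would avoid the repeated scans.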

Region Quadtree: Terminology

[Figure: a block divided into the quadrants NW, NE, SW and SE, with the compass directions N, E, S, W]

[Figure: a GREY root node with the sons NW, NE, SW, SE; leaves are BLACK (1) or WHITE (0)]

Leaf nodes are said to be either BLACK or WHITE.
Non-leaf nodes are said to be GREY.

Region Quadtree: Example

[Figures: Step 1, Step 2 and Step 3 of the successive subdivision of the example figure]

Region Quadtree: Set Operations

Quadtrees are especially useful for performing set operations
□ Overlap (intersection)
□ Overlay (union)

Example: From data provided on forests, grassland, fields, nature reserves and polders, identify which areas are in agricultural use (typical overlay problem).


Overlays with Quadtrees: Example

Overlays with Quadtrees: Algorithm (1)

Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2:

If the node in QT1 is BLACK, then the corresponding node in the resulting quadtree is also BLACK.

If the node in QT1 is WHITE, then the node in the resulting quadtree is set to the node in QT2.

If the node in QT1 is GREY, then set the node in the resulting quadtree to
□ GREY if QT2 is GREY
□ GREY if QT2 is WHITE (the subtree of QT1 is taken over)
□ BLACK if QT2 is BLACK

If both nodes are GREY, the algorithm first processes the next level and then returns to consolidate (merge) the sons if necessary.

Overlays with Quadtrees: Algorithm (2)

Decision table (x = any node):

QT1     QT2     Result
BLACK   x       BLACK
WHITE   x       x
GREY    GREY    GREY 1)

1) A check for a merger needs to be performed to determine whether all 4 sons are BLACK.

[Figure: example overlay]
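
The overlay rules can be sketched as a short recursive function. The representation is an assumption made for this sketch, not something the slides prescribe: a leaf is the integer 0 (WHITE) or 1 (BLACK), and a GREY node is a 4-tuple of sons in a fixed quadrant order. The function name `overlay` is hypothetical.

```python
WHITE, BLACK = 0, 1  # leaves; a GREY node is a 4-tuple of sons


def overlay(a, b):
    """Union of two region quadtrees that cover the same area."""
    if a == BLACK or b == BLACK:
        return BLACK          # BLACK wins regardless of the other node
    if a == WHITE:
        return b              # result is the node (or subtree) of the other tree
    if b == WHITE:
        return a
    # Both nodes are GREY: process the next level first ...
    sons = tuple(overlay(x, y) for x, y in zip(a, b))
    # ... then check for a merger: four BLACK sons collapse into one BLACK leaf.
    if all(son == BLACK for son in sons):
        return BLACK
    return sons
```

Because a WHITE node simply hands back the other tree's subtree, untouched subtrees are shared rather than copied, which is fine as long as the trees are treated as immutable.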


Intersection with Quadtrees (Example)

Intersection with Quadtrees: Algorithm (1)

Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2:

If the node in QT1 is BLACK and the node in QT2 is BLACK, then set the corresponding node in the resulting quadtree to BLACK.

If the node in QT1 or QT2 is WHITE, then the resulting node is WHITE.

If the node in QT1 is GREY, then set the node to
□ GREY if QT2 is also GREY
□ WHITE if QT2 is WHITE
□ GREY if QT2 is BLACK

If both nodes are GREY, the algorithm first processes the next level and then returns to consolidate (merge) the sons if necessary.

Intersection with Quadtrees: Algorithm (2)

Decision table (x = any node):

QT1     QT2     Result
WHITE   x       WHITE
BLACK   x       x
GREY    GREY    GREY 1)

1) A check for a merger needs to be performed to determine whether all 4 sons are WHITE.

[Figure: example intersection]
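
The intersection is the mirror image of the overlay, with the roles of BLACK and WHITE swapped. A minimal sketch, again assuming (for illustration only) that a leaf is 0 (WHITE) or 1 (BLACK) and that a GREY node is a 4-tuple of sons; `intersection` is a made-up name.

```python
WHITE, BLACK = 0, 1  # leaves; a GREY node is a 4-tuple of sons


def intersection(a, b):
    """Intersection of two region quadtrees that cover the same area."""
    if a == WHITE or b == WHITE:
        return WHITE          # WHITE wins regardless of the other node
    if a == BLACK:
        return b              # result is the node (or subtree) of the other tree
    if b == BLACK:
        return a
    # Both nodes are GREY: process the next level first ...
    sons = tuple(intersection(x, y) for x, y in zip(a, b))
    # ... then check for a merger: four WHITE sons collapse into one WHITE leaf.
    if all(son == WHITE for son in sons):
        return WHITE
    return sons
```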

Complexity Analysis

Complexity is proportional to the number of nodes in the quadtree
□ Best case: whole area unicolored (1 node)
□ Worst case: "salt and pepper", i.e. all inner nodes are GREY and one needs to go down to pixel level (depends on the resolution)

Point-Quadtree: Definition

Point data
□ 2-D points can be stored and indexed in a point-quadtree
□ A point-quadtree splits the space into 4 quadrants at the insertion point
□ The insertion order is thus important (it determines the structure of the tree)

Point-Quadtree (Example)

Points in the square from (0,0) to (100,100):

City      (x,y)
Chicago   (35,40)
Mobile    (50,10)
Toronto   (60,75)
Buffalo   (80,65)
Denver    (5,45)
Omaha     (25,35)
Atlanta   (85,15)
Miami     (90,5)

Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami

Resulting tree:

Chicago (root): sons Denver (NW), Toronto (NE), Omaha (SW), Mobile (SE)
Toronto: son Buffalo (SE)
Mobile: sons Atlanta (NE), Miami (SE)

Point-Quadtree (Search Example)

"Find all points (records) within a given distance from another point (record)"

Find all the cities at most 8 units from the point (83,10).

The root is Chicago (35,40): NW, NE, SW can be ignored.
Next is Mobile (50,10): NW and SW can be ignored.
Are Atlanta or Miami within 8 units?
□ Solutions based on approximations with rectangles (bounding box) can contain false positives
□ Exact solution with a circle

Search in Point-Quadtrees

Especially suitable for search problems of the following type: "find all points (records) within a given distance from another point (record)".

Point quadtrees are quite efficient in 2 dimensions. In k > 2 dimensions, however, point quadtrees have a large branching factor (2^k) and thus contain many NULL pointers.
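
Insertion and the distance search from the city example can be sketched as follows. The class and function names (`PQNode`, `insert`, `within_distance`) are invented for this illustration, and the convention that points on a dividing line go to the N or E side is an assumption. Quadrants are pruned with the bounding box of the query circle, and the exact circle test is applied to each visited point.

```python
import math


class PQNode:
    def __init__(self, x, y, name):
        self.x, self.y, self.name = x, y, name
        self.children = {}  # quadrant label ("NW", "NE", "SW", "SE") -> PQNode


def quadrant(node, x, y):
    """Quadrant of (x, y) relative to the point stored in `node`."""
    return ("N" if y >= node.y else "S") + ("E" if x >= node.x else "W")


def insert(root, x, y, name):
    """Insert a point and return the (possibly new) root."""
    if root is None:
        return PQNode(x, y, name)
    node = root
    while True:
        q = quadrant(node, x, y)
        if q not in node.children:
            node.children[q] = PQNode(x, y, name)
            return root
        node = node.children[q]


def within_distance(node, qx, qy, radius, found=None):
    """Names of all points at most `radius` away from (qx, qy)."""
    if found is None:
        found = []
    if node is None:
        return found
    if math.hypot(node.x - qx, node.y - qy) <= radius:  # exact circle test
        found.append(node.name)
    for q, child in node.children.items():
        # Descend only into quadrants that the bounding box of the circle touches.
        ns_ok = (qy + radius >= node.y) if q[0] == "N" else (qy - radius < node.y)
        ew_ok = (qx + radius >= node.x) if q[1] == "E" else (qx - radius < node.x)
        if ns_ok and ew_ok:
            within_distance(child, qx, qy, radius, found)
    return found
```

With the eight cities inserted in the order given on the slides, the query for (83,10) with radius 8 visits only Chicago, Mobile, Atlanta and Miami. Miami lies inside the bounding box but about 8.6 units away, so only the circle test rejects it.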

K-d Trees

k-dimensional point data

We want to avoid the large fan-out of the point quadtree
□ Quadtrees (2^2 = 4-way split)
□ Octrees (2^3 = 8-way split)
□ In general: 2^k-way split

A k-d tree is a binary search tree with the distinction that at each level a different coordinate (dimension) is tested to determine the direction of the branch (2-way split).

A node consists of
□ 2 child pointers
□ Name
□ Key

K-d Tree: Basic Idea

Construct a binary tree.

At each step, choose one of the coordinates as a basis for dividing the rest of the points.

For example, at the root, choose x as the basis
□ Like in binary search trees, all items to the left of the root will have an x-coordinate less than that of the root
□ All items to the right of the root will have an x-coordinate greater than (or equal to) that of the root

Choose y as the basis for discrimination for the root's children.

Choose x again for the root's grandchildren.
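
A minimal sketch of insertion with an alternating discriminator. The names `KdNode` and `kd_insert` are illustrative; the rule "smaller goes left, greater or equal goes right" follows the description above.

```python
class KdNode:
    def __init__(self, point, name):
        self.point, self.name = point, name
        self.left = self.right = None


def kd_insert(root, point, name, depth=0):
    """Insert `point` (a tuple of k coordinates) and return the root."""
    if root is None:
        return KdNode(point, name)
    axis = depth % len(point)  # x at the root, y for its children, x again, ...
    if point[axis] < root.point[axis]:
        root.left = kd_insert(root.left, point, name, depth + 1)
    else:
        root.right = kd_insert(root.right, point, name, depth + 1)
    return root
```

Inserting the eight example cities (Chicago first) with this function puts Denver to the left of Chicago, Mobile to the right, and Atlanta four levels down below Buffalo.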

K-d Tree: Example

Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami

K-d tree with alternation of the discriminator (each node is prefixed with the coordinate it tests):

x Chicago (35,40)
  x < 35:  y Denver (5,45)
             y < 45:  Omaha (25,35)
  x >= 35: y Mobile (50,10)
             y < 10:  Miami (90,5)
             y >= 10: x Toronto (60,75)
                        x >= 60: y Buffalo (80,65)
                                   y < 65: x Atlanta (85,15)

Fewer NULL pointers!

Adaptive k-d Tree

Like the k-d tree, but
□ Division is between (not on) data points
□ Division is not by alternating the discriminator, but according to the dimension with the maximum spread (max - min)
□ Balanced k-d tree
□ Internal nodes contain only the split coordinate and its value (e.g. X=30)
□ The records are stored at the terminal nodes (leaves)
□ Insertion of one record requires rebuilding the tree (static structure)
□ Deletion of one record is highly complex
□ Search is like in the k-d tree
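
The construction can be sketched as a recursive build over the whole point set. Several details are assumptions made for this sketch: the split value is the midpoint between the two middle points, one record is kept per leaf, all coordinate values along the chosen dimension are distinct, and nodes are plain dicts. The name `build_adaptive` is hypothetical.

```python
def build_adaptive(points):
    """Build an adaptive k-d tree from a list of (name, coordinates) records.

    Returns the record itself for a leaf, otherwise a dict with the split
    dimension ("axis"), the split value ("split") and two subtrees.
    """
    if len(points) == 1:
        return points[0]  # records are stored at the leaves
    k = len(points[0][1])

    def spread(axis):
        values = [p[1][axis] for p in points]
        return max(values) - min(values)

    axis = max(range(k), key=spread)  # dimension with the maximum spread
    ordered = sorted(points, key=lambda p: p[1][axis])
    mid = len(ordered) // 2
    # The split lies between (not on) the two middle data points.
    split = (ordered[mid - 1][1][axis] + ordered[mid][1][axis]) / 2
    return {
        "axis": axis,
        "split": split,
        "left": build_adaptive(ordered[:mid]),
        "right": build_adaptive(ordered[mid:]),
    }
```

For the eight example cities this yields a root split at x = 55, followed by x = 30 and y = 40 on the second level.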

Example: adaptive k-d tree (k=2)

Same points as before: Chicago (35,40), Mobile (50,10), Toronto (60,75), Buffalo (80,65), Denver (5,45), Omaha (25,35), Atlanta (85,15), Miami (90,5)

Internal nodes show the split value and the split coordinate; the records are at the leaves (left: smaller, right: greater):

55,x
  left:  30,x
           left:  15,x  ->  Denver (5,45) | Omaha (25,35)
           right: 25,y  ->  Mobile (50,10) | Chicago (35,40)
  right: 40,y
           left:  10,y  ->  Miami (90,5) | Atlanta (85,15)
           right: 70,x  ->  Toronto (60,75) | Buffalo (80,65)

Comparison

Region quadtree: parallelizable
Point quadtree: parallelizable, dynamic
K-d tree: not easily parallelizable, dynamic, better sequential data structure
Adaptive k-d tree: not easily parallelizable, static, balanced, optimized search

Curvilinear Data: Strip Tree (Example)

Basic idea: represent the curve by strips enclosing portions of it.

[Figure: a curve from P to Q enclosed by the root strip A, which is split further into the strips B, C, D and E. Wl and Wr are the strip widths on the two sides of line PQ. One point is marked as the splitting point for A, selected since Wl > Wr; another is marked as the splitting point for C.]

• Strips become successively thinner
• The splitting finishes when all strips are thinner than a predefined value

Strip Tree: Algorithm

Recursive splitting
□ Join the endpoints of the curve (i.e. P and Q)
□ The root corresponds to a rectangle that encloses the curve and whose sides are parallel to line PQ
□ The next split point lies on the curve and on one side of the strip rectangle, and has maximum distance to line PQ

Node structure: the node is an 8-tuple and contains
□ 2 pairs of X,Y coordinates (the endpoints)
□ The strip width on each side of the line connecting the endpoints
□ Pointers to the 2 sons
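
The recursive splitting can be sketched for a curve given as a polyline. This is a simplification under stated assumptions: the curve is a list of (x, y) vertices with distinct endpoints P and Q, only the two strip widths are computed (not the full rectangle), a node is a dict rather than an 8-tuple, and the threshold `max_width` stands in for the "predefined value". The names `strip` and `build_strip_tree` are made up.

```python
import math


def strip(points):
    """Return (w_left, w_right, split_index) for a polyline of (x, y) points.

    The widths are measured on either side of the directed line from the
    first point P to the last point Q; split_index is the interior vertex
    farthest from that line (None if every vertex lies on the line).
    """
    (px, py), (qx, qy) = points[0], points[-1]
    length = math.hypot(qx - px, qy - py)  # assumes P != Q
    w_left = w_right = 0.0
    split_index, max_dist = None, 0.0
    for i, (x, y) in enumerate(points[1:-1], start=1):
        # Signed distance: positive on the left of the directed line P -> Q.
        d = ((qx - px) * (y - py) - (qy - py) * (x - px)) / length
        w_left = max(w_left, d)
        w_right = max(w_right, -d)
        if abs(d) > max_dist:
            split_index, max_dist = i, abs(d)
    return w_left, w_right, split_index


def build_strip_tree(points, max_width):
    """Split recursively until every strip is at most `max_width` wide."""
    w_left, w_right, split_index = strip(points)
    node = {"p": points[0], "q": points[-1],
            "w_left": w_left, "w_right": w_right, "sons": None}
    if split_index is not None and w_left + w_right > max_width:
        node["sons"] = (
            build_strip_tree(points[: split_index + 1], max_width),
            build_strip_tree(points[split_index:], max_width),
        )
    return node
```

The two sons share the split point, so the strips of one level always chain together along the curve.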


Representation of Arbitrary Curves

Curves are well represented by chains, however indexing them is difficult

A strip-tree is a quadtree variant for representing arbitrary curves by hierarchical decomposition

Useful in applications that involve search and set operations


Trees and Tries

We have seen (normal) trees for storing figures

We can also use Tries! Tries store the key “along the way”

Kd-Tries: Example

[Figure: kd-trie whose levels alternate between the X dimension (branches L: left, R: right) and the Y dimension (branches D: down, U: up)]

• The key is stored along the path from the root, e.g. "RDRU"
• The complete keys are located at the leaves

Binary Tries

A binary trie is a binary tree in which left sons correspond to a "0" at the corresponding position in the key, and right sons correspond to a "1".

[Figure: binary trie with branches labelled 0 and 1; example key 100101]

Geometric Interpretation of the Binary Trie

A trie compresses a 1-dimensional space with 2^d addresses, through coding, to a string with d characters
□ In the previous example: d = 3 + 3 = 6

The root represents the complete space
□ The left son (first character = 0) represents the lower half of the search space
□ The right son (first character = 1) represents the upper half of the search space

Binary Tries, Revisited

In 2D each key is a pair of bit sequences (x,y).

The path to the key is composed of bits that are taken from the x and y coordinates on a rotating basis.

[Figure: the binary trie next to an 8 x 8 grid; the columns are labelled with the binary x coordinate of the cell, X0X1X2 (000 to 111), and the rows with the binary y coordinate of the cell, Y0Y1Y2 (000 to 111); example key 100101]

Observations

A kd-trie splits by rotating the x and y coordinates.

A kd-trie is unique for a given set of keys
□ The trie structure does not depend on the insertion order

Geometric kd-tries generate a total order of the search space
□ Two points P1 and P2 in the kd-space will always have the same order
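
Both observations can be checked with a small binary trie. The sketch stores the trie as nested dicts and marks a leaf with a `"key"` entry; this representation and the function names are assumptions chosen for brevity, not the structure used on the slides.

```python
def trie_insert(trie, key):
    """Insert a bit string such as "100101" into a trie of nested dicts."""
    node = trie
    for bit in key:
        node = node.setdefault(bit, {})  # "0" = left son, "1" = right son
    node["key"] = key  # the complete key is located at the leaf
    return trie


def trie_contains(trie, key):
    """True if `key` was inserted as a complete key."""
    node = trie
    for bit in key:
        if bit not in node:
            return False
        node = node[bit]
    return node.get("key") == key


def keys_in_order(trie):
    """Yield the stored keys in trie order (0-branch before 1-branch)."""
    if "key" in trie:
        yield trie["key"]
    for bit in "01":
        if bit in trie:
            yield from keys_in_order(trie[bit])
```

Inserting the same keys in a different order produces an equal trie, and traversing it always lists the keys in the same total order.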

Building a Linear Order

Given a 2D grid, how can we
(1) find a linear order for the cells of the grid such that cells close together in space are also (as far as possible) close to each other in the linear order, and
(2) define this order recursively for a grid that is obtained by a hierarchical subdivision of space?

The most popular solution is bit interleaving (Z-order).

Z-Order

• Addresses in a 2-dimensional space are identified by pairs (x,y) of values
• Each x and y value is a sequence of d bits
• This results in a grid with 2^d x 2^d cells
• How to build the addresses using bit interleaving?

Start with a vertical split for X0 (Z = X0).

[Figure: 8 x 8 grid with columns X0X1X2 = 000 to 111 and rows Y0Y1Y2 = 000 to 111; all cells in the left half are labelled 0, all cells in the right half are labelled 1]

Z-Order

The following splits alternate between the two dimensions; each split appends one bit to the z-value of every cell:

Horizontal split for Y0 (Z = X0Y0)
Vertical split for X1 (Z = X0Y0X1)
Horizontal split for Y1 (Z = X0Y0X1Y1)
Vertical split for X2 (Z = X0Y0X1Y1X2)
Horizontal split for Y2 (Z = X0Y0X1Y1X2Y2)

After the last split every cell of the 8 x 8 grid has a 6-bit z-value (columns: X0X1X2, rows: Y0Y1Y2):

Y \ X   000     001     010     011     100     101     110     111
111     010101  010111  011101  011111  110101  110111  111101  111111
110     010100  010110  011100  011110  110100  110110  111100  111110
101     010001  010011  011001  011011  110001  110011  111001  111011
100     010000  010010  011000  011010  110000  110010  111000  111010
011     000101  000111  001101  001111  100101  100111  101101  101111
010     000100  000110  001100  001110  100100  100110  101100  101110
001     000001  000011  001001  001011  100001  100011  101001  101011
000     000000  000010  001000  001010  100000  100010  101000  101010

z-low and z-hi are located in the lower left and upper right corners: the lowest z (000000) is at the lower left, the highest z (111111) at the upper right.

Z-Order

If each possible z-value represents a cell in the grid, this yields the following space-filling curve:

[Figure: the cells of the 8 x 8 grid connected in the order of increasing z-value]

Example: Point Data

Data point: A = (3, 5) = (011, 101)

Bit interleaving: z = 011011

This gives a simple method for translating between x,y coordinates and z-values.

[Figure: point A marked in cell 011011 of the z-value grid]
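
The translation between coordinates and z-values can be sketched with integer bit operations. The function names `z_encode` and `z_decode` are illustrative; `d` is the number of bits per coordinate, and the bit order X0 Y0 X1 Y1 ... follows the splits described for the Z-order.

```python
def z_encode(x, y, d):
    """Interleave the d-bit integers x and y as X0 Y0 X1 Y1 ... (X0 first)."""
    z = 0
    for i in range(d - 1, -1, -1):      # from the most significant bit down
        z = (z << 1) | ((x >> i) & 1)   # next bit of x
        z = (z << 1) | ((y >> i) & 1)   # next bit of y
    return z


def z_decode(z, d):
    """Inverse of z_encode: split a 2d-bit z-value back into (x, y)."""
    x = y = 0
    for i in range(d - 1, -1, -1):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y
```

For the data point A = (3, 5) with d = 3, `format(z_encode(3, 5, 3), "06b")` gives `"011011"`, and `z_decode` maps that value back to (3, 5).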

Example: Region Data

The object with a z-value of 001 contains all elements with a prefix equal to 001.

[Figure: regions with the z-values 001, 10 and 0111 marked in the z-value grid]
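
A region given by a z-value prefix corresponds to one contiguous interval of full-length z-values. A minimal sketch of that correspondence; the helper name `prefix_range` is made up, and `total_bits` is the length of a complete z-value (6 in the 8 x 8 example).

```python
def prefix_range(prefix, total_bits):
    """Smallest and largest full z-value that start with the bit string `prefix`."""
    free = total_bits - len(prefix)      # number of bits left unspecified
    low = int(prefix, 2) << free         # prefix followed by all 0s
    high = low | ((1 << free) - 1)       # prefix followed by all 1s
    return low, high
```

For example, the region 001 covers the z-values 001000 to 001111, i.e. the integers 8 to 15.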

Bit Interleaving: Recursive Definition

A vertical split differentiates values of X0.
A horizontal split differentiates values of Y0.

        X0=0   X0=1
Y0=1    01     11
Y0=0    00     10

The address is given by the z-value (00, 01, 10, 11).
The z-value represents the path in the kd-trie.
We can use the z-values alone, so that we don't need the kd-trie anymore.

[Figure: kd-trie with the branches L/R and D/U leading to the leaves 00, 01, 10, 11; cell 11 is subdivided recursively into 1100, 1101, 1110 and 1111]

Explanation

Z-order encoding preserves the spatial proximity of points (as far as possible)
□ Homogeneous regions are represented compactly
□ The elements are clustered => efficient access to secondary storage

Z-order coded data can be stored in secondary storage using conventional prefix B+ trees
□ Efficient "range queries" are possible
□ Direct access via z-value
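
A range query over z-ordered data can be sketched with a sorted list standing in for the B+ tree; that substitution, like the function names, is an assumption made for this illustration. The z-values of the lower left and upper right corners of the query rectangle bound the interval to scan, and because the interval also contains cells outside the rectangle, each candidate is decoded and checked.

```python
from bisect import bisect_left, bisect_right


def z_encode(x, y, d):
    """Interleave the d-bit integers x and y as X0 Y0 X1 Y1 ..."""
    z = 0
    for i in range(d - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def z_decode(z, d):
    """Split a 2d-bit z-value back into (x, y)."""
    x = y = 0
    for i in range(d - 1, -1, -1):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y


def range_query(sorted_z, x_lo, y_lo, x_hi, y_hi, d):
    """All stored points inside the rectangle [x_lo, x_hi] x [y_lo, y_hi]."""
    z_lo = z_encode(x_lo, y_lo, d)   # lowest z: lower left corner
    z_hi = z_encode(x_hi, y_hi, d)   # highest z: upper right corner
    start = bisect_left(sorted_z, z_lo)
    stop = bisect_right(sorted_z, z_hi)
    hits = []
    for z in sorted_z[start:stop]:   # one contiguous scan of the sorted keys
        x, y = z_decode(z, d)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            hits.append((x, y))
    return hits
```

The scan is correct because increasing x or y never decreases the interleaved value, so every point inside the rectangle has a z-value between the two corner values.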

Geometric Data Structures for non-Geometric Data?

Application of geometric data structures to geometric problems is obvious
□ Geographic Information Systems (GIS)
□ Computer graphics

A further application of geometric data structures: multidimensional databases
□ OLAP (Online Analytical Processing)
□ Data mining

Multidimensional Data Space

[Figure: data cube with the dimensions Product (Coke, Fanta, Beer, Milk, Juice, Water), Day (1 to 7) and Region (West, East, South, North)]

Each cell corresponds to an observation point, described by the attributes of the individual cell. Each cell contains an observation, e.g. the sales value of Product "Coke" on Day "4" in Region "East".

Multidimensional (MD) Data Space

Each observed fact w can be expressed as a function of the dimensions, which define the multidimensional data space:

w = f(x,y,z)
DOM(f) = DOM(x) x DOM(y) x DOM(z)

A fact w0 is the value of function f for the specific values (x0,y0,z0):

w0 = f(x0,y0,z0)

Sparseness in the MD Space

Typically, only a small fraction of the space defined by DOM(a) x … x DOM(z) is actually used
□ Addressing in the MD space (a multi-dimensional array) is easy and fast
□ However, memory usage is inefficient

Need to find mechanisms to compress the MD space
□ Linearization of the data space by totally ordering the facts with the aid of space-filling curves
□ Extraction of all facts into a table, which is then joined with descriptive dimension tables

Linearization of the MD Space

Linearization with the aid of space-filling curves (e.g. Z-order or Hilbert curve)
□ The principle is based on a coding that generates a total order of all points in the data space
□ The indexing is done by conventional, order-preserving indexing methods (e.g. B+-trees)
□ The mechanism is well suited for 2-4 dimensions (x,y,z,t), for tracking applications and range queries

Data-Mining

Till now: storage and search of data
Evaluation and interpretation of results is done using data mining

Typical problem: "Where in the supermarket should we put the beer that should be sold as early as possible (close expiry date, low sales volume, ...)?"

Data-Mining

Overview of basic techniques for data mining:

Forecast, Prediction
□ Classification
□ Numerical Prediction

Knowledge Discovery
□ Association
□ Clustering
□ Variance Detection

Prediction: Classification

Data entries are classified according to a certain property:

Purchased year   Lending total   Lending last year   To sort out
1994             1578            5                   yes
2000             3410            203                 no
1982             2558            310                 yes
...              ...             ...                 ...

A new data entry is automatically assigned:

Purchased year   Lending total   Lending last year   To sort out
1988             589             39                  ?

Prediction: Numerical Prediction

Numerical prediction is similar to classification; however, a value is predicted instead of a class.

Most important application: weather forecast

Yesterday           Today               Tomorrow
Temp.   Pressure    Temp.   Pressure    Temp.
17.0    990         19.2    1001        20.5
10.8    1011        12.1    973         8.2
30.5    1000        30.4    994         29.9
...     ...         ...     ...         ...
14.2    980         17.0    991         ?

Knowledge Discovery: Association

Tries to find common rules between the characteristics of data. Interesting relations are returned.

Example: From the previous weather data one could derive the following rules:

With a probability of 0.89:
IF "Air pressure today" > "Air pressure yesterday" AND "Temperature today" > 12°
THEN "Temperature tomorrow" > "Temperature today"

With a probability of 0.75:
IF "Air pressure today" < "Air pressure yesterday" AND "Temperature today" > 15°
THEN "Temperature tomorrow" < "Temperature today"

Knowledge Discovery: Variance Detection

Given a data pool, variance detection tries to distinguish normal data entries from "outlier" entries.

Example: A home security system with 100 sensors (temperature, light barrier, sound detector, ...) should detect intruders. Flying birds, shadows in the moonlight or car headlights should not have any impact on the operation of the system.

The system gets a database describing "safe" configurations (where no alarm has to be triggered). The system creates a model of the non-alarm cases. Data for real intrusions are not provided!

Using this model, updates from the sensors can be checked: if they do not fit the non-alarm cases, an alarm is triggered.

Knowledge Discovery: Clustering

Find similar data entries and group them into clusters.

Example: an exam; what percentage of exercises E1 .. E5 did each student answer correctly?

Student   E1     E2    E3     E4    E5
S1        20     84    11     17    74
S2        62     41    57     81    19
S3        79     33    60     68    30
S4        19     93    25     23    87
S5        28     89    0      26    79
Ø         41.6   68    30.6   43    57.8

Clustering may divide the students taking the exam into 2 groups:
G1 = {S1, S4, S5}: good at exercises E2 and E5
G2 = {S2, S3}: good at exercises E1, E3 and E4

Possibility of individual support!

k-means Clustering: Example

[Figure: the data points F to Q plotted against the attributes Data1 and Data2]

k-means Clustering: Algorithm

1. Fix the number of desired clusters: parameter k.
2. Place k random points into the space: the initial group centroids.
3. For all m data objects: determine the Euclidean distance of the object (as a vector) from all centroids and assign the object to the closest centroid.
4. For all k centroids: determine the real center (average) of the assigned cluster. These are the new centroids.
5. Repeat steps 3 and 4 until the centroids no longer move (old and new ones are so close to each other that no noticeable improvement is made).
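
The five steps can be sketched in plain Python. Two choices deviate from the description and are assumptions of this sketch: the initial centroids are k randomly chosen data points rather than arbitrary points in space, and the loop stops when the centroids are exactly unchanged (or after `max_rounds`). The function name `k_means` and its parameters are illustrative.

```python
import random


def k_means(points, k, max_rounds=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups.

    Returns (centroids, clusters), where clusters[i] lists the points
    assigned to centroids[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # step 2 (variant: pick data points)
    clusters = [[] for _ in range(k)]
    for _ in range(max_rounds):
        clusters = [[] for _ in range(k)]
        for p in points:                         # step 3: assign to closest centroid
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):   # step 4: recompute centers
            if cluster:
                new_centroids.append(tuple(sum(v) / len(cluster) for v in zip(*cluster)))
            else:
                new_centroids.append(old)        # keep the centroid of an empty cluster
        if new_centroids == centroids:           # step 5: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters
```

Squared distances are compared instead of Euclidean distances, which selects the same closest centroid. Applied with k = 2 to the five exam records S1 to S5 (each as a 5-tuple of percentages), the sketch should separate {S1, S4, S5} from {S2, S3}, since the two groups are far apart.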

k-means Algorithm: Properties

Finds a local optimum, but does not necessarily find the best configuration (global optimum)
□ Is a heuristic

Significantly sensitive to the initial, randomly selected cluster centers
□ Optimizations: randomly modify the results between different rounds; the k-means algorithm can be run multiple times

Operates with linear optimization
Highly stable and frequently used approach
Also works for very large data sets, with controllable complexity

Literature: Ian H. Witten, Eibe Frank; "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Academic Press, San Diego, CA; 2000; ISBN 1-55860-552-5