Spatial Indexing and Visualizing Large Multi-dimensional Databases

I. Csabai, M. Trencséni, L. Dobos, G. Herczegh, P. Józsa, N. Purger

Eötvös University, Budapest

T. Budavári, A. Szalay

Johns Hopkins University, Baltimore

Telegraph Message

FROM: Natural Scientists
TO: DB Community

URGENT!
We have a lot of data, and are still collecting … STOP
The data is complex … STOP
We want to do complex stuff with it … STOP
We want to interactively visualize it … STOP
Files are not good enough for us … STOP
Current DBMS are not designed for us … STOP
Please help! … SOS!

Doing Science with Elephants

E = mc²

The data

5 years of Sloan Digital Sky Survey data

Public archive: SkyServer (SQL Server, A. Szalay, J. Gray)

Large: 3 TB, 270M objects

Multi-dimensional: 300 parameters/object

• Index only for key values (1D) and sky coordinates (2D)

Spatial …

Upcoming surveys (Pan-STARRS, 1.4 Gpixel camera) will produce the same amount of data in 1 week

120 Mpixel camera

Filters: u g r i z

270 million points in 5+ dimensions

The magnitude space

- Multidimensional point data
- Highly non-uniform distribution
- Outliers

The questions astronomers ask

petroMag_i > 17.5
and (petroMag_r > 15.5 or petroR50_r > 2)
and (petroMag_r > 0 and g > 0 and r > 0 and i > 0)
and ( (petroMag_r-extinction_r) < 19.2
  and (petroMag_r - extinction_r < (13.1 + (7/3) * (dered_g - dered_r) + 4 * (dered_r - dered_i) - 4 * 0.18) )
  and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) < 0.2)
  and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > -0.2)
  and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r)) < 24.2) )
or ( (petroMag_r - extinction_r < 19.5)
  and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > (0.45 - 4 * (dered_g - dered_r)) )
  and ( (dered_g - dered_r) > (1.35 + 0.25 * (dered_r - dered_i)) ) )
and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r) ) < 23.3 ) )

From the SkyServer log: one query out of the 12 million

Star/galaxy separation, quasar target selection

• Combination of inequalities: multi-dimensional polyhedron query
• Drop outliers, search for rare objects: point density estimation
• Find similar galaxies: k-nearest neighbor search

The goal

TRADITIONAL APPROACH
Flat files, Fortran, C code
+ Complex manipulation of data
- Sequential slow access

SQL DATABASES
Oracle, MS SQL Server, PostgreSQL …
+ Organized, efficient data access
- Hard to implement complex algorithms
- Multi-dimensional support (OLAP) is limited to categorical data

MULTI-DIMENSIONAL INDEXING
B-tree, R-tree, K-d tree, BSP-tree …
+ Many for low D, some for higher D
+ Fast, tuned for various problems
- Implemented mostly as memory algorithms, maybe suboptimal in databases

VISUALIZATION
Tools using OpenGL, DirectX
+ Fast
- Using files; some tools access a database, but not interactively

INTEGRATE
• use for astronomical data-mining
• and for fast interactive visualization

Implemented indexing techniques

MS SQL Server 2005, .NET, C#

• CLR support – run complex procedural code inside the RDBMS

Quad-tree (32-tree)
• Build (SQL, 1h)
• Range search, k nearest neighbor, visualization support (SQL)
• Large query time variation in 5D with non-uniform data

Balanced k-d tree
• Build: T-SQL (12h)
• Range search, k nearest neighbor (C#)
• Local polynomial regression (C#)

Voronoi tessellation
• Limited number of random seeds (build: 10000 points 1h, insertion: 270M points 12h)
• Density estimation, NN-search
• C# wrapper for Qhull

Usage: Geometric queries

[Figure: kd-tree, plotted against the ratio of rows returned (0 to 0.35)]

First run the query against the index

Classify each cell as

• fully covered
• fully outside
• intersected

Run detailed SQL only on intersected cells
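The three-way cell classification above can be sketched as follows. This is a minimal illustration in Python, not the SQL/CLR code used in the project. It assumes the query polyhedron is given as a list of linear inequalities a · x <= b and that each index cell is an axis-aligned box; the names `classify_cell` and `polyhedron_query` are hypothetical. The test is conservative: a cell reported as "intersected" may in fact lie outside, which is harmless because its points are filtered individually.

```python
from typing import List, Sequence, Tuple

HalfSpace = Tuple[Sequence[float], float]       # (a, b), meaning a . x <= b
Box = Tuple[Sequence[float], Sequence[float]]   # (lower corner, upper corner)


def classify_cell(box: Box, polyhedron: List[HalfSpace]) -> str:
    """Return 'covered', 'outside' or 'intersected' for an axis-aligned cell."""
    lo, hi = box
    covered = True
    for a, b in polyhedron:
        # Smallest and largest value of a . x over the box, taken corner-wise.
        smallest = sum(ai * (l if ai >= 0 else h) for ai, l, h in zip(a, lo, hi))
        largest = sum(ai * (h if ai >= 0 else l) for ai, l, h in zip(a, lo, hi))
        if smallest > b:
            return "outside"        # the whole box violates this inequality
        if largest > b:
            covered = False         # part of the box violates this inequality
    return "covered" if covered else "intersected"


def polyhedron_query(cells, polyhedron):
    """cells: list of (box, points). Returns the points inside the polyhedron."""
    result = []
    for box, points in cells:
        kind = classify_cell(box, polyhedron)
        if kind == "covered":
            result.extend(points)   # no per-point test needed
        elif kind == "intersected":
            result.extend(
                p for p in points
                if all(sum(ai * xi for ai, xi in zip(a, p)) <= b
                       for a, b in polyhedron)
            )
    return result
```

Only the intersected cells pay for the per-point test, which is the role the slide assigns to the detailed SQL.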

Usage: Non-parametric estimation

Template fitting

Nearest neighbor + polynomial fit

foreach (Galaxy g in UnknownSet)
{
    var neighbors = NearestNeighbors(g, ReferenceSet);
    var polynomCoeffs = FitPolynomial(neighbors.Colors, neighbors.Redshift);
    g.Redshift = Estimate(g.Colors, polynomCoeffs);
}

• SDSS can measure redshift for 1M galaxies (reference set) but not for the remaining 269M (unknown set)
• Kd-tree based nearest neighbor search
• Polynomial regression implemented in C# runs as CLR code in SQL Server
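A runnable sketch of the nearest neighbor + polynomial fit idea, in Python rather than the C#/CLR code described above. Several things here are assumptions made for illustration: the neighbor search is brute force instead of kd-tree based, the polynomial is first order (a local linear fit), and the function names are hypothetical. The fit fails with a division by zero if the chosen neighbors are degenerate (for example, all collinear).

```python
import math


def nearest_neighbors(query, reference, k):
    """Brute-force k nearest neighbors; returns indices into `reference`."""
    order = sorted(range(len(reference)),
                   key=lambda i: math.dist(query, reference[i]))
    return order[:k]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(colors, redshifts):
    """Least-squares fit z ~ c0 + sum_j c_j * color_j via the normal equations."""
    rows = [[1.0] + list(c) for c in colors]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atz = [sum(r[i] * z for r, z in zip(rows, redshifts)) for i in range(n)]
    return solve_linear_system(ata, atz)


def estimate_redshift(colors, reference_colors, reference_redshifts, k=8):
    """Fit a local linear model on the k nearest reference galaxies."""
    idx = nearest_neighbors(colors, reference_colors, k)
    coeffs = fit_linear([reference_colors[i] for i in idx],
                        [reference_redshifts[i] for i in idx])
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], colors))
```

Because the model is fitted only on nearby reference galaxies, it needs no global functional form, which is what makes the estimate non-parametric.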

Usage: Search for similar spectra

PCA:
• AMD optimized LAPACK routines called from SQL Server
• Dimension reduced from 3000 to 5
• Kd-tree based nearest neighbor search

Matching with simulated spectra, where all the physical parameters are known, would estimate age, chemical composition, etc. of galaxies.
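The dimension reduction step can be illustrated with a small PCA sketch. The project calls LAPACK routines for this; the version below swaps in plain power iteration with orthogonalization, chosen only because it fits in a few lines of dependency-free Python. The function names are hypothetical, and the code assumes the data has at least `n_components` directions of non-zero variance.

```python
import math
import random


def principal_components(rows, n_components, iterations=200, seed=0):
    """Return (mean, components) using power iteration on X^T X."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            # Remove the directions already found, so that the iteration
            # converges to the next principal component.
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            # Multiply by the (unnormalized) covariance: w = X^T (X v).
            proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
            w = [sum(p * row[j] for p, row in zip(proj, centered))
                 for j in range(d)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        components.append(v)
    return mean, components


def project(row, mean, components):
    """Coordinates of `row` in the reduced space."""
    return [sum((x - m) * c for x, m, c in zip(row, mean, comp))
            for comp in components]
```

After projection, each spectrum is a short vector, so the same kd-tree based nearest neighbor search used in magnitude space applies to it.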

Adaptive Visualizer

• Using managed DirectX
• Visualize more data than fits into memory
• Towards graphical SQL: mouse actions are converted to queries and passed to SQL Server

• LOD, zoom in and out of 270M points
• Voronoi, kd-tree visualization
• Brush select, click-connect to SkyServer
• Select nearest neighbors
• Multi-resolution density maps
• Multidim: quickly change axes
• Interact with other Virtual Observatory

SDSS Database

Magnitude table

Kd-tree index

Voronoi index

Stored procedures

Visualization application

Internet

Plugin

Visualizer Demo

The Tools

MS SQL Server 2005
• OODB vs. RDBMS
• SDSS SkyServer using SQL Server
• SQL Server 2005 CLR support – run complex procedural code inside the DB
- No support for vector data

C# + native SQL
• VS.2005, rapid prototyping
• Managed DirectX
• Web Services support for Virtual Observatories

Why is magnitude space interesting?

[Diagram labels]

GALAXY: elliptic, spiral

PHYSICAL PARAMETERS: age, dust, chemical comp.

REDSHIFT

LIGHT

Spectrum: 1M objects, 3000-dimensional point data

PCA: 3-10 dimensions

BROADBAND FILTERS

MAGNITUDE SPACE: 270M objects, 5-dimensional point data

Similar to SkyServer HTM indexing

… but in 5 dimensions

Spatial indexing

Quad-trees

• 32-tree in 5D
• No need to store the structure
• Number of nodes grows exponentially
• Breaks down in high dimensions or if data is highly non-uniformly distributed
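A short sketch of why the structure need not be stored: the cell that contains a point follows directly from its coordinates by repeated halving of the bounding box. The function names are hypothetical and the code is an illustration in Python, not the SQL implementation. The second function shows the node count that makes the approach break down.

```python
def cell_key(point, lower, upper, depth):
    """Path of child indices (each in 0 .. 2**D - 1) from the root to the
    cell at `depth` that contains `point`."""
    lo, hi = list(lower), list(upper)
    path = []
    for _ in range(depth):
        child = 0
        for axis, x in enumerate(point):
            mid = (lo[axis] + hi[axis]) / 2
            if x >= mid:
                child |= 1 << axis   # upper half along this axis
                lo[axis] = mid
            else:
                hi[axis] = mid
        path.append(child)
    return tuple(path)


def cells_at_depth(dimensions, depth):
    """Number of cells of a full 2**D-ary tree at the given depth."""
    return (2 ** dimensions) ** depth
```

In 5D every node has 32 children, so `cells_at_depth(5, 4)` is already 1,048,576. With highly non-uniform data most of those cells are empty while a few hold most of the points.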

K-d trees

• Only one cut in each level
• Store bounding boxes
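The two properties above can be sketched in a few lines of Python. This is an in-memory illustration under assumptions of my own (median split, cycling through the axes, a hypothetical `leaf_size` parameter), not the T-SQL build used for the 270M-point table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lower: Point                          # bounding box, lower corner
    upper: Point                          # bounding box, upper corner
    points: Optional[List[Point]] = None  # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, depth=0, leaf_size=4):
    """Balanced k-d tree: one median cut per level, cycling through the axes."""
    dims = len(points[0])
    lower = tuple(min(p[a] for p in points) for a in range(dims))
    upper = tuple(max(p[a] for p in points) for a in range(dims))
    if len(points) <= leaf_size:
        return Node(lower, upper, points=list(points))
    axis = depth % dims
    ordered = sorted(points, key=lambda p: p[axis])
    half = len(ordered) // 2
    return Node(lower, upper,
                left=build(ordered[:half], depth + 1, leaf_size),
                right=build(ordered[half:], depth + 1, leaf_size))


def range_search(node, qlo, qhi):
    """All points inside the axis-aligned query box [qlo, qhi]."""
    dims = range(len(qlo))
    # Prune the subtree if its bounding box misses the query box.
    if any(node.upper[a] < qlo[a] or node.lower[a] > qhi[a] for a in dims):
        return []
    if node.points is not None:
        return [p for p in node.points
                if all(qlo[a] <= p[a] <= qhi[a] for a in dims)]
    return range_search(node.left, qlo, qhi) + range_search(node.right, qlo, qhi)
```

Storing the tight bounding box of each node, rather than only the cut position, lets a search discard subtrees whose points happen to cluster away from the query region.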

Voronoi tessellation

• each point of the cell is closer to the seed than to any other
• the solution space for NN
• more spherical cells, 50 neighbors, 1000 vertices
• density estimation, clustering
• complex code, computation intensive in higher dimensions
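The defining property in the first bullet gives a direct way to assign points to cells without constructing the tessellation: a point belongs to the cell of its nearest seed. The sketch below uses a brute-force scan over the seeds, and its function names are hypothetical. Computing the cell volumes needed for an actual density estimate requires the tessellation itself (the project uses Qhull for that) and is not attempted here.

```python
import math
from collections import Counter


def nearest_seed(point, seeds):
    """Index of the Voronoi cell (seed) that contains `point`."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


def cell_counts(points, seeds):
    """Number of points falling into each Voronoi cell."""
    return Counter(nearest_seed(p, seeds) for p in points)
```

Dividing each count by the volume of its cell would give a local density estimate, with cells adapting in size to the distribution of the seeds.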

Complex code in SQL/CLR

Spectrum Services
• Composite, continuum and line fit, convolving filters and spectra, dereddening

Non-parametric estimation
• Find k-nearest neighbors
• Polynomial fit (AMD optimized LAPACK)
• DR5: photometric redshift
• Garching DR4: ‘photometric’ Dn(4000), HδA, age, mass
