Joint Interface and Management Review • Tucson, Arizona • May 30th – June 1st, 2012
XLDB Asia 2012 • Beijing, China • June 22-23, 2012

XLDB and the Large Synoptic Survey Telescope

Kian-Tat Lim - 林建达
LSST Data Management System Architect

XLDB Asia 2012

What is LSST?

Proposed telescope to be built in Chile

Large

3.2 gigapixel camera

8.4 meter diameter mirror

Synoptic Survey

Wide: entire visible sky

Fast: image every 15 seconds

Deep: faint and distant objects

Results

− Thousand-frame movie of the sky

Results

− Catalogs

Image Metadata

Moving Objects Catalog

Object Catalog

Source Catalog

Difference Image Source Catalog

Provenance

Statistics

Summaries

Calibration Engineering and Facility Database

Lots of databases, but Object and Source (and ForcedSource) are most important and largest.

How Big?

− Tens of billions of Objects
  • Hundreds of columns per Object

− Trillions of Sources
  • High signal-to-noise observations of Objects
  • Dozens of columns per Source

− Tens of trillions of ForcedSources
  • All observations of Objects
  • 7 columns

− Total space required at end of survey including all overheads, replication, and compression: 35 PB

Queries

− All about an object
− All objects meeting criteria
− All objects near objects meeting criteria
− All objects with interesting time series
− All pairs of objects with similar time series

Criteria may involve 1–30 attributes/columns, not entire row

Selectivity on individual attributes may be low

When interesting objects are identified, may need large fraction of the row

Near-neighbor queries involve self-join on multi-billion row table, but spatially localized

Pairing time series may involve self-join on multi-trillion row table!
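
The remark that near-neighbor self-joins are "spatially localized" can be illustrated with a small sketch. It assumes a flat 2-D plane rather than the sky's spherical geometry, and the function names are hypothetical; this is not how qserv implements the join. Bucketing points into grid cells whose side equals the search radius means each point is compared only with points in its own and adjacent cells, instead of with every other row.

```python
import math
from collections import defaultdict
from itertools import combinations

def near_pairs_brute(points, radius):
    """All index pairs (i, j), i < j, closer than radius: O(n^2) comparisons."""
    return {(i, j)
            for (i, p), (j, q) in combinations(enumerate(points), 2)
            if math.dist(p, q) < radius}

def near_pairs_grid(points, radius):
    """Same result, but each point is compared only with points in its own
    grid cell and the 8 neighboring cells (cell side = radius)."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(math.floor(x / radius), math.floor(y / radius))].append(i)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get() avoids creating empty cells while iterating
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and math.dist(points[i], points[j]) < radius:
                            pairs.add((i, j))
    return pairs
```

Both functions return the same set of pairs; the grid version simply skips comparisons between points that are too far apart to match.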

Usual Needs

Scalable

Fast

Fault-tolerant

Cost-effective

Open Source

qserv

Prototype system

Demonstrates feasibility

Useful for large-scale Data Challenges

Will be turned into production system during construction

Don’t expect too much. Mostly the work of one person, Daniel Wang.

Supporting ad hoc Queries

− Random small queries
  • Indexing and sharding (also key/value)

− Narrow, full-table scans and aggregates
  • Vertical partitioning

− Diverse, simultaneous scans
  • Shared scans

− qserv may need to support all three
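
As an illustration of the shared-scan idea, here is a minimal sketch in which several queries are answered during a single pass over the data. The row layout, the query tuple format, and the name `shared_scan` are assumptions made for this example; qserv's actual scheduler is not shown in these slides.

```python
def shared_scan(chunks, queries):
    """Read each chunk once and feed every row to all pending queries.

    chunks  : iterable of lists of row dicts (stand-in for table chunks on disk)
    queries : dict mapping name -> (predicate, initial value, fold function)
    Returns the per-query results and the number of chunks read.
    """
    results = {name: init for name, (_, init, _) in queries.items()}
    chunks_read = 0
    for chunk in chunks:
        chunks_read += 1            # one read serves every query
        for row in chunk:
            for name, (pred, _, fold) in queries.items():
                if pred(row):
                    results[name] = fold(results[name], row)
    return results, chunks_read

# Hypothetical usage: two unrelated full-table queries share one scan.
chunks = [
    [{"id": 1, "mag": 18.0}, {"id": 2, "mag": 22.5}],
    [{"id": 3, "mag": 19.5}, {"id": 4, "mag": 24.0}],
]
queries = {
    "bright_count": (lambda r: r["mag"] < 20, 0, lambda acc, r: acc + 1),
    "faintest": (lambda r: True, float("-inf"), lambda acc, r: max(acc, r["mag"])),
}
results, chunks_read = shared_scan(chunks, queries)
```

Running the two queries separately would read every chunk twice; here the read cost is paid once, however many queries are attached to the scan.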

Architecture

− MPP RDBMS on shared-nothing commodity cluster, with incremental scaling, non-disruptive failure recovery

− Data clustered spatially and by time, partitioned with overlaps
  • Two-level partitioning
  • 2nd level materialized on-the-fly
  • Transparent to end-users

− Selective indices to speed up interactive queries, spatial searches, joins including time series analysis

− Shared scans

− Custom software based on open source: RDBMS (MySQL) + xrootd
  • SciSQL: MySQL UDFs for HTM-based spatial indexing

Apologies to Martin Kersten for independently choosing a name close to his SciQL.
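
To show why partitioning "with overlaps" helps, the sketch below uses a deliberately simplified model: a single partitioning level along one coordinate, with hypothetical function names. The real system partitions the sky in two levels, which is not reproduced here. Each chunk stores its own rows plus copies of neighboring rows that lie within the overlap distance of its boundary, so a near-neighbor join can run chunk by chunk without fetching data from other nodes, provided the search radius does not exceed the overlap.

```python
import math
from collections import defaultdict

def partition_with_overlap(xs, chunk_width, overlap):
    """Assign each coordinate to its owning chunk, and also copy it into the
    overlap area of an adjacent chunk when it lies within `overlap` of the
    shared boundary. Returns {chunk_id: {"own": [...], "overlap": [...]}}."""
    chunks = defaultdict(lambda: {"own": [], "overlap": []})
    for x in xs:
        c = math.floor(x / chunk_width)
        chunks[c]["own"].append(x)
        if x - c * chunk_width < overlap:           # close to the lower edge
            chunks[c - 1]["overlap"].append(x)
        if (c + 1) * chunk_width - x <= overlap:    # close to the upper edge
            chunks[c + 1]["overlap"].append(x)
    return chunks

def near_pairs_local(chunks, radius):
    """Near-neighbor pairs computed chunk by chunk, with no data exchanged
    between chunks. Correct as long as radius <= overlap.
    Assumes all coordinates are distinct."""
    pairs = set()
    for part in chunks.values():
        for a in part["own"]:
            for b in part["own"] + part["overlap"]:
                if a != b and abs(a - b) < radius:
                    pairs.add((min(a, b), max(a, b)))
    return pairs
```

The cost of this design is extra storage for the duplicated boundary rows; the benefit is that pairs straddling a chunk boundary are still found locally.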

Baseline Architecture

Prototype Implementation

Intercepting user queries

Worker dispatch, query fragmentation generation, spatial indexing, query recovery, optimizations, scheduling, aggregation

Communication, replication

Metadata, result cache

MySQL dispatch, shared scanning, optimizations, scheduling

Single node RDBMS

RDBMS-agnostic
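
The fragmentation, dispatch, and aggregation steps named above can be sketched roughly as follows. This is an illustration only: in-memory SQLite databases stand in for the workers' single-node RDBMS (the prototype uses MySQL), dispatch is a plain loop rather than xrootd communication, and the table layout and function names are hypothetical.

```python
import sqlite3

def make_worker(rows):
    """One in-memory SQLite database stands in for a worker's single-node RDBMS."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Object (objectId INTEGER, ra REAL, mag REAL)")
    db.executemany("INSERT INTO Object VALUES (?, ?, ?)", rows)
    return db

def average_magnitude(workers, max_mag):
    """Answer 'SELECT AVG(mag) FROM Object WHERE mag < ?' across all workers.

    Per-worker averages cannot simply be averaged again, so each worker
    receives a fragment asking for SUM and COUNT, and the partial results
    are combined here."""
    fragment = "SELECT SUM(mag), COUNT(*) FROM Object WHERE mag < ?"
    total, count = 0.0, 0
    for db in workers:
        part_sum, part_count = db.execute(fragment, (max_mag,)).fetchone()
        total += part_sum or 0.0    # SUM is NULL when no row matches
        count += part_count
    return total / count if count else None

# Hypothetical usage with three workers holding different rows.
workers = [
    make_worker([(1, 10.0, 18.0), (2, 11.0, 22.0)]),
    make_worker([(3, 200.0, 20.0)]),
    make_worker([(4, 300.0, 25.0)]),
]
avg = average_magnitude(workers, 23.0)
```

Because the workers only ever see ordinary SQL, the coordinating layer stays independent of which RDBMS runs underneath, which is one way to read the "RDBMS-agnostic" label.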

Large Scale Tests

− Setup
  • 150 nodes
  • ~10% of DR1 data set: realistically distributed
    2 billion objects, 55 billion sources, total ~32 TB

− Tested queries
  • Interactive (object retrieval, object time series, spatially restricted filter)
  • Scans (full sky filter, densities)
  • Joins (near neighbor, sources not near objects)
  • Concurrency

Object retrieval: ~4-9 s

Full-sky density: ~3-8 min

~10 min – 5 h

Concurrency Test

Scalability Testing

− Constant data/node

Status

− Cleaning up for end-user testing

− Then adding features:
  • Shared scans
  • User tables
  • Fault tolerance
  • Updates
  • Query management

− Code available:
  • git://git.lsstcorp.org/LSST/DMS/qserv.git
  • https://launchpad.net/scisql

Thoughts on the Future

− qserv
  • Incorporate MonetDB back-end

− SciDB
  • What about the petabytes of raw image data?
  • Perhaps store in an array database
  • Cutouts, mosaics, image manipulation become queries
  • UDFs for detection, measurement
  • Evaluation before end of 2013
