42
A UWS service to cross-match (very) large catalogues Thomas Boch, Fran¸cois-XavierPineau, S´ ebastien Derri` ere and Brice Gassmann CDS, Observatoire Astronomique de Strasbourg IVOA Interop, Nara, 07 December 2010 Thomas Boch (CDS) UWS X-Match service 7/12/2010 1 / 17

A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

A UWS service to cross-match (very) large catalogues

Thomas Boch, Francois-Xavier Pineau, Sebastien Derriereand Brice Gassmann

CDS, Observatoire Astronomique de Strasbourg

IVOA Interop, Nara, 07 December 2010

Thomas Boch (CDS) UWS X-Match service 7/12/2010 1 / 17

Page 2: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Context

Context

CDS cross-match service (in development)

Based on UWS (job submission)

Catalogues :

Algorithms :

A B A B

Particularity : deal with (very) large catalogues

Thomas Boch (CDS) UWS X-Match service 7/12/2010 2 / 17

Page 3: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Context

Context

CDS cross-match service (in development)

Based on UWS (job submission)

Catalogues :

Algorithms :

A B A B

Particularity : deal with (very) large catalogues

Thomas Boch (CDS) UWS X-Match service 7/12/2010 2 / 17

Page 4: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Context

Context

CDS cross-match service (in development)

Based on UWS (job submission)

Catalogues :

Algorithms :

A B A B

Particularity : deal with (very) large catalogues

Thomas Boch (CDS) UWS X-Match service 7/12/2010 2 / 17

Page 5: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Context

Context

CDS cross-match service (in development)

Based on UWS (job submission)

Catalogues :

Algorithms :

A B A B

Particularity : deal with (very) large catalogues

Thomas Boch (CDS) UWS X-Match service 7/12/2010 2 / 17

Page 6: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Context

Dealing with (very) large catalogues

Example

2MASSI ∼ 470x106 sourcesI minimal data ∼ 15 GB

F identifier (integer 4 Bytes)F positions (double 8 B+8 B)F errors (float 4 B+4 B+4 B)

USNO-B1I ∼ 109 sourcesI minimal data ∼ 28 GB

F identifier (integer 4 B)F positions (double 8 B+8 B)F errors (float 4 B+4 B)

LSST projection at 5 years:I V>26, ∼ 3x109 unique sourcesI minimal data ∼ 96 GB

ProblemsData size

I do not fit into memory

Performance issuesI data loadingI looking for candidates

SolutionsScalability: Healpix partitioning

Efficiency:I special indexed binary fileI kd-tree (cone search queries)I multithreadingI parallel processing

Thomas Boch (CDS) UWS X-Match service 7/12/2010 3 / 17

Page 7: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Context

Dealing with (very) large catalogues

Example

2MASSI ∼ 470x106 sourcesI minimal data ∼ 15 GB

F identifier (integer 4 Bytes)F positions (double 8 B+8 B)F errors (float 4 B+4 B+4 B)

USNO-B1I ∼ 109 sourcesI minimal data ∼ 28 GB

F identifier (integer 4 B)F positions (double 8 B+8 B)F errors (float 4 B+4 B)

LSST projection at 5 years:I V>26, ∼ 3x109 unique sourcesI minimal data ∼ 96 GB

ProblemsData size

I do not fit into memory

Performance issuesI data loadingI looking for candidates

SolutionsScalability: Healpix partitioning

Efficiency:I special indexed binary fileI kd-tree (cone search queries)I multithreadingI parallel processing

Thomas Boch (CDS) UWS X-Match service 7/12/2010 3 / 17

Page 8: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Context

Dealing with (very) large catalogues

Example

2MASSI ∼ 470x106 sourcesI minimal data ∼ 15 GB

F identifier (integer 4 Bytes)F positions (double 8 B+8 B)F errors (float 4 B+4 B+4 B)

USNO-B1I ∼ 109 sourcesI minimal data ∼ 28 GB

F identifier (integer 4 B)F positions (double 8 B+8 B)F errors (float 4 B+4 B)

LSST projection at 5 years:I V>26, ∼ 3x109 unique sourcesI minimal data ∼ 96 GB

ProblemsData size

I do not fit into memory

Performance issuesI data loadingI looking for candidates

SolutionsScalability: Healpix partitioning

Efficiency:I special indexed binary fileI kd-tree (cone search queries)I multithreadingI parallel processing

Thomas Boch (CDS) UWS X-Match service 7/12/2010 3 / 17

Page 9: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Healpix

Hierarchical sky pixelisationI level 0 12 pixelsI level 1 12x4 pixelsI . . .I level n 12x22n

Pixels of equal area

Developed at NASA:healpix.jpl.nasa.gov

Available inI C, C++I FortranI IDLI JavaI . . . ?

Thomas Boch (CDS) UWS X-Match service 7/12/2010 4 / 17

Page 10: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Scalable cross-match

Independent pixels cross-matchI but border effects

Cat. B pixel sources put in a kd-tree

Optimal partitioning levelI available memoryI minimisation of:

nPixels∑i=0

NAi log(1 + NBi + NbBi

)

I I/O cost

Level 0

Level 1

A BThomas Boch (CDS) UWS X-Match service 7/12/2010 5 / 17

Page 11: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Scalable cross-match

Independent pixels cross-matchI but border effects

Cat. B pixel sources put in a kd-tree

Optimal partitioning levelI available memoryI minimisation of:

nPixels∑i=0

NAi log(1 + NBi + NbBi

)

I I/O cost

Level 0

Level 1

A BThomas Boch (CDS) UWS X-Match service 7/12/2010 5 / 17

Page 12: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Scalable cross-match

Single machine

All sky correlation (small catalogues)I allow “on the fly” correlation

Correlation pixel by pixel (large catalogues)

Computer grid

Parallel processingI Distribute job pieces on separate

machines

“On the fly” correlation possible

Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Page 13: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Scalable cross-match

Single machine

All sky correlation (small catalogues)I allow “on the fly” correlation

Correlation pixel by pixel (large catalogues)

Computer grid

Parallel processingI Distribute job pieces on separate

machines

“On the fly” correlation possible

Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Page 14: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Scalable cross-match

Single machine

All sky correlation (small catalogues)I allow “on the fly” correlation

Correlation pixel by pixel (large catalogues)

Computer grid

Parallel processingI Distribute job pieces on separate

machines

“On the fly” correlation possible

Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Page 15: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Scalable cross-match

Single machine

All sky correlation (small catalogues)I allow “on the fly” correlation

Correlation pixel by pixel (large catalogues)

Computer grid

Parallel processingI Distribute job pieces on separate

machines

“On the fly” correlation possible

Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Page 16: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Scalability

Scalable cross-match

Single machine

All sky correlation (small catalogues)I allow “on the fly” correlation

Correlation pixel by pixel (large catalogues)

Computer grid

Parallel processingI Distribute job pieces on separate

machines

“On the fly” correlation possible

Master

Slave 1 Slave 2

Slave 3 Slave 4

Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Page 17: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

Loading data: indexed binary files

Index filesOne by healpix level

For each pixelI offsetI nSources

Binary data file

Organized by blocks:I positionsI position errorsI identifiersI ...

Sources ordered by healpix pixel index

level 0 index fileIdx offset nSrcs11 xxx xxx

Binary file

Block position

Block posErr

. . .

Thomas Boch (CDS) UWS X-Match service 7/12/2010 7 / 17

Page 18: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

Loading data: indexed binary files

Index filesOne by healpix level

For each pixelI offsetI nSources

Binary data file

Organized by blocks:I positionsI position errorsI identifiersI ...

Sources ordered by healpix pixel index

level 0 index fileIdx offset nSrcs0 xxx xxx

1 xxx xxx

.

.

....

.

.

.

11 xxx xxx

Binary file

Block position

Block posErr

. . .

Thomas Boch (CDS) UWS X-Match service 7/12/2010 7 / 17

Page 19: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

Loading data: indexed binary files

Index filesOne by healpix level

For each pixelI offsetI nSources

Binary data file

Organized by blocks:I positionsI position errorsI identifiersI ...

Sources ordered by healpix pixel index

level 2 index fileIdx offset nSrcs

.

.

....

.

.

.

84 xxx xxx

.

.

....

.

.

.

Binary file

Block position

Block posErr

. . .

Thomas Boch (CDS) UWS X-Match service 7/12/2010 7 / 17

Page 20: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

kd-tree

What is a kd-Tree?A space-partitioning data structure

Allows for fast k-nearest neighbour/cone search queriesI nearest neighbour query in O(log(n))

ProblemNaive implementation can be memory consuming

We want a memory efficient kd-tree (capacity > 1 billion sources)

Solution

To use a single array (sorted using a kd-tree scheme)

Thomas Boch (CDS) UWS X-Match service 7/12/2010 8 / 17

Page 21: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

kd-tree

What is a kd-Tree?A space-partitioning data structure

Allows for fast k-nearest neighbour/cone search queriesI nearest neighbour query in O(log(n))

ProblemNaive implementation can be memory consuming

We want a memory efficient kd-tree (capacity > 1 billion sources)

Solution

To use a single array (sorted using a kd-tree scheme)

Thomas Boch (CDS) UWS X-Match service 7/12/2010 8 / 17

Page 22: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

kd-tree

What is a kd-Tree?A space-partitioning data structure

Allows for fast k-nearest neighbour/cone search queriesI nearest neighbour query in O(log(n))

ProblemNaive implementation can be memory consuming

We want a memory efficient kd-tree (capacity > 1 billion sources)

Solution

To use a single array (sorted using a kd-tree scheme)

Thomas Boch (CDS) UWS X-Match service 7/12/2010 8 / 17

Page 23: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 24: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 25: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 26: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 27: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 28: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Thread 1

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 29: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Thread 1

Thread 2

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 30: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

A kd-tree can be a simple sorted array of sourcesAlgorithm: quicksort alternating the sorted coordinate

α

δS1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

α

δα ≤ αS3

αS3

S3

αS3 ≤ α

α

δδ ≤ δS10 δS10

S10

δS10 ≤ δ

S3

δ ≤ δS8 δS8

S8

δS8 ≤ δ

α

δα ≤ αS11

S11

≤ α

S10

α ≤ αS6

S6

≤ α

S3

α ≤ αS4

S4

≤ α

S8

α ≤ αS2

S2

≤ α

α

δS5 S11 S13 S10 S15 S6 S14 S3 S7 S4 S9 S8 S1 S2 S12

Thread 1

Thread 2

Thread 3 Thread 4

Creation speed up by using multi-threading

Thomas Boch (CDS) UWS X-Match service 7/12/2010 9 / 17

Page 31: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

Modified kd-tree and multithreading

Modified kd-treeClassical kd-tree adapted for euclidian spaces

Solution 1: (rejected)I cartesian coordinates (x , y , z)

F time consuming (conversion)F memory consuming (+50%)

Solution 2: (approved)I spherical coordinates (α, δ)I classical creation algorithmI modified query algorithm

F angular distances (Haversine formula)F modified circle/rectangle intersection to enter a sub-tree

Multithreading

Single kNN or cone search query not multithread

Pool of threads executing multiple queries simultaneously

Thomas Boch (CDS) UWS X-Match service 7/12/2010 10 / 17

Page 32: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

Modified kd-tree and multithreading

Modified kd-treeClassical kd-tree adapted for euclidian spaces

Solution 1: (rejected)I cartesian coordinates (x , y , z)

F time consuming (conversion)F memory consuming (+50%)

Solution 2: (approved)I spherical coordinates (α, δ)I classical creation algorithmI modified query algorithm

F angular distances (Haversine formula)F modified circle/rectangle intersection to enter a sub-tree

Multithreading

Single kNN or cone search query not multithread

Pool of threads executing multiple queries simultaneously

Thomas Boch (CDS) UWS X-Match service 7/12/2010 10 / 17

Page 33: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Efficiency

Modified kd-tree and multithreading

Modified kd-treeClassical kd-tree adapted for euclidian spaces

Solution 1: (rejected)I cartesian coordinates (x , y , z)

F time consuming (conversion)F memory consuming (+50%)

Solution 2: (approved)I spherical coordinates (α, δ)I classical creation algorithmI modified query algorithm

F angular distances (Haversine formula)F modified circle/rectangle intersection to enter a sub-tree

Multithreading

Single kNN or cone search query not multithread

Pool of threads executing multiple queries simultaneously

Thomas Boch (CDS) UWS X-Match service 7/12/2010 10 / 17

Page 34: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Tests

Test Machine

Dell machine 2 600d(∼$3 600):I 24 GB of 1333 MHz memoryI 2x Quad Core 2.27 GHz (Xeon)I 16 threads (Hyper-Threading)I High speed HDD (10 000 rpm)

Thomas Boch (CDS) UWS X-Match service 7/12/2010 11 / 17

Page 35: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Tests

Test results

Full catalogue cross-correlation

SDSS DR7 (∼357 000 000 sources) 2MASS (∼470 000 000 sources)

Simple cross-match: ∼9 minI radius of 5′′

I Healpix level 3 (∼7.3◦)I Level 9 borders (∼7’)I ∼49 209 000 associations

With elliptical errors: ∼10 minI distance of 3.44σI distance max of 5′′

I Healpix level 3I ∼37 507 000 associations

Thomas Boch (CDS) UWS X-Match service 7/12/2010 12 / 17

Page 36: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Tests

Test results

Full catalogue cross-correlation

SDSS DR7 (∼357 000 000 sources) 2MASS (∼470 000 000 sources)

Simple cross-match: ∼9 minI radius of 5′′

I Healpix level 3 (∼7.3◦)I Level 9 borders (∼7’)I ∼49 209 000 associations

With elliptical errors: ∼10 minI distance of 3.44σI distance max of 5′′

I Healpix level 3I ∼37 507 000 associations

Thomas Boch (CDS) UWS X-Match service 7/12/2010 12 / 17

Page 37: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Tests

Test results

Full catalogue cross-correlation

SDSS DR7 (∼357 000 000 sources) 2MASS (∼470 000 000 sources)

Simple cross-match: ∼9 minI radius of 5′′

I Healpix level 3 (∼7.3◦)I Level 9 borders (∼7’)I ∼49 209 000 associations

With elliptical errors: ∼10 minI distance of 3.44σI distance max of 5′′

I Healpix level 3I ∼37 507 000 associations

Thomas Boch (CDS) UWS X-Match service 7/12/2010 12 / 17

Page 38: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Tests

Test results

Full all-sky catalogues cross-correlation

2MASS (∼470 000 000 sources) USNO-B1 (∼1 046 000 000 sources)

Simple cross-match: ∼30 minI radius of 5′′

I Healpix level 3I Level 9 bordersI ∼583 300 000 associations

Thomas Boch (CDS) UWS X-Match service 7/12/2010 13 / 17

Page 39: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

UWS

General architecture

Thomas Boch (CDS) UWS X-Match service 7/12/2010 14 / 17

Page 40: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

UWS

UWS layer

Provided by a Java library developed at CDS by Gregory ManteletI Documentation and tutorial :

http://saada.u-strasbg.fr/uwstuto/index.htmlI Distributed under LPGL licenceI Will be presented at GWS2 session on Friday

Internal UWS enables communication between master and slaves

Thomas Boch (CDS) UWS X-Match service 7/12/2010 15 / 17

Page 41: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

UWS

Web interface

Simple front-end to access the UWS interface

Job submission, retrieval of jobs status through JSON calls

Integrated with the CDS login service (used for the Annotations and thePortal)

I allow users to upload (and cross-match) their own tables

Demonstration

Thomas Boch (CDS) UWS X-Match service 7/12/2010 16 / 17

Page 42: A UWS service to cross-match (very) large catalogueswiki.ivoa.net/internal/IVOA/InterOpDec2010Applications/NaraInterop-… · Thomas Boch (CDS) UWS X-Match service 7/12/2010 6 / 17

Lessons

Lessons learned

HardwareFor our application:

RAM frequency does matter (lots of memory access)

Hyper-Threading does matter (on 8 cores, 16 threads ∼2x faster than 8threads)

Software: don’t have a priori

Efficient full Java code

Efficient modifed kd-trees (in our case)

Service

Existing and future (very) large catalogues can be processed

Bottleneck is data transfer (without surprise)I service colocated with data

Thomas Boch (CDS) UWS X-Match service 7/12/2010 17 / 17