VOMegaPlot Efficient Plotting of Large VOTable Datasets


Page 1: VOMegaPlot Efficient Plotting of Large VOTable Datasets

VOMegaPlot

Efficient Plotting of Large VOTable Datasets

Page 2: VOMegaPlot Efficient Plotting of Large VOTable Datasets

VOPlot

VOPlot is a tool for visualizing astronomical data that is available in the VOTable format.

VOPlot reads the XML file, loads the entire dataset into memory, and then processes it to draw various types of plots.

Because everything must fit in memory at once, this approach cannot be used for very large VOTable files.

Page 3: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Approach for VOMegaPlot

VOMegaPlot preprocesses the XML file to create intermediate files, which are subsequently used for plotting.

The data is divided into fixed-size blocks, and only individual blocks are loaded into memory, which reduces the memory requirement.

The number of intermediate files created is equal to the number of columns present in the XML file.
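The pre-processing idea can be sketched roughly as follows. This is an illustrative sketch only, not VOMegaPlot's actual implementation: the binary layout (raw 8-byte doubles), the file names `col_<i>.bin`, the block size `BLOCK_ROWS`, and both function names are assumptions made for the example.

```python
import os
from array import array

BLOCK_ROWS = 4  # hypothetical block size in rows; the slides do not state the real value


def write_column_files(rows, n_columns, out_dir, block_rows=BLOCK_ROWS):
    """Write one binary file per column, flushing one fixed-size block at a time."""
    paths = [os.path.join(out_dir, f"col_{i}.bin") for i in range(n_columns)]
    files = [open(p, "wb") for p in paths]
    buffers = [array("d") for _ in range(n_columns)]
    try:
        for row in rows:
            for i, value in enumerate(row):
                buffers[i].append(float(value))
            if len(buffers[0]) == block_rows:
                # A block is full: append it to each column file and empty the buffer.
                for f, buf in zip(files, buffers):
                    buf.tofile(f)
                    del buf[:]
        # Flush the final, possibly shorter, block.
        for f, buf in zip(files, buffers):
            if buf:
                buf.tofile(f)
    finally:
        for f in files:
            f.close()
    return paths


def read_blocks(path, block_rows=BLOCK_ROWS):
    """Yield the blocks of one column file in order, holding one block in memory at a time."""
    item_size = array("d").itemsize
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_rows * item_size)
            if not chunk:
                break
            block = array("d")
            block.frombytes(chunk)
            yield block
```

With m columns this produces m files, and a reader only ever needs the blocks of the columns it is interested in.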

Page 4: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Pre-processing operation: Creation of array blocks

[Figure: the original XML file with m columns (Col 1, Col 2, ..., Col m) and n rows is split into m intermediate files on disk (File 1, File 2, ..., File m), each containing Block 1, Block 2, ..., Block k.]

Page 5: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Algorithm for drawing a scatter plot

1) Input the columns to be plotted, say A vs. B

2) Load a set of corresponding blocks for both columns, A and B.

3) Take corresponding data elements from both the blocks and plot them.

4) After plotting all the points, discard the blocks.

5) If more blocks of data remain, repeat from step 2; otherwise stop.
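The steps above can be sketched as a simple loop. The names `iter_blocks`, `scatter_blockwise` and `plot_point` are hypothetical; `iter_blocks` merely simulates reading fixed-size blocks from a column file, and `plot_point` stands in for whatever drawing routine the plotting front end provides.

```python
from itertools import islice


def iter_blocks(values, block_rows):
    """Simulate reading a column from disk in fixed-size blocks."""
    it = iter(values)
    while True:
        block = list(islice(it, block_rows))
        if not block:
            return
        yield block


def scatter_blockwise(blocks_a, blocks_b, plot_point):
    """Plot column A against column B, one pair of corresponding blocks at a time."""
    plotted = 0
    for block_a, block_b in zip(blocks_a, blocks_b):  # step 2: load corresponding blocks
        for x, y in zip(block_a, block_b):            # step 3: plot corresponding elements
            plot_point(x, y)
            plotted += 1
        # step 4: the two blocks are no longer referenced after this iteration
    return plotted                                    # step 5: the loop ends when blocks run out


if __name__ == "__main__":
    points = []
    count = scatter_blockwise(
        iter_blocks(range(7), 3),
        iter_blocks(range(10, 17), 3),
        lambda x, y: points.append((x, y)),
    )
    print(count, points[:2])
```

Only one block per plotted column is resident at any moment, so memory use depends on the block size rather than on the number of rows.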

Page 6: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Advantages

The complexity of plotting is O(2n), i.e. linear in n, where n is the number of rows: only the two plotted columns are read. This complexity is independent of the number of columns in the XML file.

If the user needs to plot only a subset of the data (as in a zoom operation), a second set of intermediate files is used for this purpose.

Page 7: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Dealing with subset of data

Data for every column is stored in an indexed fashion.

This helps in accessing the subset of data without having to go through the entire set of data.

As a result, operations like zoom become much faster.
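One way to picture such an index is a set of value-range buckets per column, so that a zoom only touches the buckets overlapping the requested range. The sketch below is an assumption about the general idea, not the actual on-disk format: the fixed `bucket_width` and the function names are made up for illustration, and the index lives in memory rather than in a file.

```python
import math
from collections import defaultdict


def build_range_index(values, bucket_width):
    """Group (value, row number) pairs into buckets covering fixed-width value ranges."""
    index = defaultdict(list)
    for row, value in enumerate(values):
        index[math.floor(value / bucket_width)].append((value, row))
    return index


def rows_in_range(index, bucket_width, lo, hi):
    """Return the row numbers whose value lies in [lo, hi], visiting only overlapping buckets."""
    first = math.floor(lo / bucket_width)
    last = math.floor(hi / bucket_width)
    rows = []
    for bucket in range(first, last + 1):
        for value, row in index.get(bucket, ()):
            if lo <= value <= hi:
                rows.append(row)
    return sorted(rows)


if __name__ == "__main__":
    column = [5, 12, 18, 25, 37, 3]
    idx = build_range_index(column, bucket_width=10)
    print(rows_in_range(idx, 10, 10, 26))
```

Buckets that lie entirely outside the zoom window are never read, which is what makes subset operations cheaper than a full scan.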

Page 8: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Pre-processing operation: Creation of tree blocks

[Figure: the original XML file with m columns (Col 1, Col 2, ..., Col m) and n rows is converted into intermediate files with indexed data, one indexed file per column. Each indexed file is organized by value range, for example 0-10, 10-20, ... for col 1; 2-4, 4-6, ... for col 2; and 0.1–0.3, 0.3–0.6, ... for col m.]

Page 9: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Pre-processing operation: Creation of tree blocks (contd)

[Figure: indexed file for a column, drawn as a tree of value ranges. Top level: 0-20, 20-40, 40-60. Next level: 20-30, 30-40. Next level: 30-35, 35-40.]

Page 10: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Results

Catalogue   Data size                         Pre-processing time   Plotting time for scatter plot
Tycho       1 million rows, 56 columns        18 minutes            9 seconds
Tycho-2     2.5 million rows, 32 columns      30 minutes            22 seconds
UCAC2       48.3 million rows, 9 columns      3 hours 26 minutes    5 minutes 46 seconds

Page 11: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Features of VOMegaPlot

Scatter Plot with zoom, reverse axis and logged axis

Projection Plot

Density Plot

Histogram

Page 12: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Scatter Plot: Tycho-1 catalogue (RA vs. Vmag)

Page 13: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Density Plot: Tycho-1 catalogue (RA vs. Vmag)

Page 14: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Density Plot: Tycho-2 catalogue (DEC vs. RA)

Page 15: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Scatter Plot: UCAC2 catalogue (2m_J vs. U2Rmag)

Page 16: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Density Plot: UCAC2 catalogue (2m_J vs. U2Rmag)

Page 17: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Future Enhancements

Support for reading data stored in binary format

Block level compression while creating intermediate files

Client Server version

Page 18: VOMegaPlot Efficient Plotting of Large VOTable Datasets

References

VOTable: http://www.ivoa.net/Documents/latest/VOT.html

VOPlot: http://vo.iucaa.ernet.in/~voi/voplot.htm

VOMegaPlot: http://vo.iucaa.ernet.in/~voi/vomegaplot.htm

IUCAA: http://www.iucaa.ernet.in

Persistent Systems Pvt. Ltd.: http://www.persistentsys.com

Page 19: VOMegaPlot Efficient Plotting of Large VOTable Datasets

Sample VOTable

<TABLE>
  <FIELD name="RAJ2000" datatype="double"></FIELD>
  <FIELD name="DEC2000" datatype="double"></FIELD>
  <DATA>
    <TABLEDATA>
      <TR><TD>12.4524</TD><TD>34.2331</TD></TR>
      <TR><TD>25.1321</TD><TD>47.9055</TD></TR>
      <TR><TD>18.0723</TD><TD>33.5802</TD></TR>
    </TABLEDATA>
  </DATA>
</TABLE>
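A table like this sample can be read row by row without building the whole document tree, which is the kind of streaming a pre-processing pass over a large file would need. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse`; the function name `iter_rows` is hypothetical, the fragment is assumed to carry no XML namespace (real VOTable files usually declare one, in which case the tag names would need the namespace prefix), and every cell is assumed to hold a number.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = """<TABLE>
  <FIELD name="RAJ2000" datatype="double"></FIELD>
  <FIELD name="DEC2000" datatype="double"></FIELD>
  <DATA>
    <TABLEDATA>
      <TR><TD>12.4524</TD><TD>34.2331</TD></TR>
      <TR><TD>25.1321</TD><TD>47.9055</TD></TR>
      <TR><TD>18.0723</TD><TD>33.5802</TD></TR>
    </TABLEDATA>
  </DATA>
</TABLE>"""


def iter_rows(source):
    """Yield each <TR> as a list of floats while the file is still being parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "TR":
            yield [float(td.text) for td in elem.findall("TD")]
            # Drop the row's cells; the emptied <TR> element itself stays attached to its parent.
            elem.clear()


if __name__ == "__main__":
    for row in iter_rows(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(row)
```

Each yielded row could be handed directly to a per-column block writer, so the full table never needs to be held in memory.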
