
SDSS Data Release 6
Access to DR6 and SEGUE Catalog Data

Ani Thakar

Alex Szalay, Maria Nieto-Santisteban, Nolan Li,

Wil O’Mullane, Adrian Pope, Tamas Budavari, George Fekete,

Jordan Raddick, Sam Carliles

JHU

Brian Yanny, Svetlana Lebedeva

FermiLab

Jim Gray

Microsoft Research


JHU CAS Seminar, March 6, 2007, Ani Thakar

Outline

• SDSS and Data Overview
• SDSS-II and DR6
• CAS Data Access
  – SkyServer, ImgCutout, CasJobs
  – Help resources and sample queries
  – Restricted collab access
  – SDSS and other datasets
• VO services
• EPO content


SDSS

[Photos: Multi-Fiber Spectrograph (JHU); Apache Point Observatory, NM; Catalog Archive Server (JHU)]

• Digital map in 5 spectral bands covering ¼ of the sky
• 40+ TB of raw pixel data
• Photometric catalog with more than 200 million objects
• Spectra of ~1 million objects
• Data Release 5 (DR5), the last public release: 240 M images, 740 k spectra
• JHU contributions:
  – Multi-Fiber Spectrograph
  – 20” Photometric Telescope
  – Catalog Archive Server DBMS
• All data is served from FermiLab (master archive site)
• SDSS-II is the continuation of SDSS through 2008


SDSS Data Overview

SDSS Data Release: www.sdss.org

Data Archive Server (DAS): das.sdss.org/DRx-cgi-bin/DAS
• FITS files (raw data)
• Images, spectra, corrected frames, atlas images, binned images, masks
• Online form-based access
• Rsync and wget file retrieval

Catalog Archive Server (CAS): cas.sdss.org, skyserver.sdss.org
• Science parameters extracted to catalogs
• Stuffed into relational DBMS (SQL Server)
• Heavily indexed, optimized
• Online access via SkyServer
• Several levels of access, query tools


SDSS Data Releases

Rel   | Date    | CAS Size | Images | Spectra | Area (sq deg) | CAS Mirrors
(DR6) | 2/09/07 | 5 TB     | 300 M  | 885 k   | 8520          | ---
DR5   | 6/28/06 | 4.5 TB   | 240 M  | 740 k   | 8000          | JHU, Portsmouth, STScI, Moscow, UIC
DR4   | 6/29/05 | 3.8 TB   | 180 M  | 608 k   | 6670          | JHU, India, UIC
DR3   | 9/27/04 | 3 TB     | 141 M  | 478 k   | 5200          | JHU, India, Portsmouth, UIC
DR2   | 4/15/04 | 2 TB     | 88 M   | 330 k   | 3324          | JHU, UPitt, SDSC, Germany
DR1   | 6/15/03 | 1 TB     | 53 M   | 186 k   | 2099          | JHU, SDSC, CDAC, UPitt, UK, Germany, Japan, India
EDR   | 6/06/01 | 200 GB   | 14 M   | 54 k    | 462           | JHU, SDSC, UK (ROE), Japan

CAS mirrors: worldwide distribution.

[Chart: CAS growth by release, Jan 2001 to Jan 2007. Labels: EDR 5%, 200 GB; DR1 20%, 1 TB; DR2 35%, 2 TB; DR3 52%, 3 TB; DR4 66%, 3.8 TB; DR5 80%, 4.5 TB; (DR6).]
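As a quick sanity check on the table, the area column can be converted into a fraction of the whole sky (about 41,253 sq deg). A footprint of roughly 10,000 sq deg is about 24% of the sky, which is consistent with the "¼ of the sky" figure. The sketch below copies the area and CAS size values from the table; the function names are illustrative only.

```python
import math

# Whole celestial sphere in square degrees: 4*pi*(180/pi)**2 = 129600/pi
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2

# (area in sq deg, CAS size in TB), copied from the release table above
RELEASES = {
    "EDR": (462, 0.2),
    "DR1": (2099, 1.0),
    "DR2": (3324, 2.0),
    "DR3": (5200, 3.0),
    "DR4": (6670, 3.8),
    "DR5": (8000, 4.5),
    "DR6": (8520, 5.0),
}


def sky_fraction(area_sq_deg: float) -> float:
    """Fraction of the whole sky covered by a footprint of the given area."""
    return area_sq_deg / FULL_SKY_SQ_DEG


def tb_per_1000_sq_deg(release: str) -> float:
    """CAS database size per 1000 sq deg of footprint for one release."""
    area, size_tb = RELEASES[release]
    return 1000.0 * size_tb / area


if __name__ == "__main__":
    for name, (area, _) in RELEASES.items():
        print(f"{name}: {100 * sky_fraction(area):5.1f}% of sky, "
              f"{tb_per_1000_sq_deg(name):.2f} TB per 1000 sq deg")
```

By this measure EDR works out to about 0.43 TB per 1000 sq deg and DR6 to about 0.59.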


SDSS-II

• Legacy
  – Continuation of SDSS-I (fill out 10k sq. deg.)
  – Completeness is the same as for SDSS-I
  – Flux limits are the same
  – Target all galaxies with r_petro < 17.77, plus LRGs (luminous red galaxies)
• SEGUE
  – Detailed 3-D map of the Galaxy
    • Spectra of 240,000 stars in the disk and spheroid
    • Age, composition and phase-space distribution
  – CAS component cataloged in SegueDRx DB
• Supernova Survey
  – Repeated scans of the SDSS Southern Stripe over 3 months/yr
  – Data not available in CAS yet; will be on DAS soon


Publication Policy
• Click on the Collaboration link on www.sdss.org
  – Scroll to the bottom
  – Click on SDSS Publication Procedures
• Proprietary data
  – Announce project on the SDSS Projects Page
  – Add SDSS credits/acknowledgements to papers
  – Reference SDSS Technical Papers
  – Post manuscript to the SDSS Publications Page
• External Collaborators and Participants
  – Post requests to the sdss-coco and sdss-general mailing lists
  – Ask your local CoCo rep (he won’t bite)


CAS Datasets

• BestDRx
  – Latest, greatest calibration of the data
  – Photometric and spectroscopic objects
  – The default and (by far) most accessed dataset
• TargDRx
  – The calibration from which spectroscopic targets were chosen
• RUNS
  – All the runs (processings) other than Best and Target
• SegueDRx
  – New with DR6 (SDSS-II)


CAS Data Model (Best DB)


DR6 CAS

• BestDR6, TargDR6 and SegueDR6 databases
  – SegueDR6 = SEGUE stripes
    • May be rolled into BestDRx in the future
• SkyServer: http://cas.sdss.org/collabdr6
  – Also password access at http://cas.sdss.org/collabdr6pw for non-collab IPs
  – See message sdss-archive/2935 for username/password
• CasJobs:
  – DR6 and DR6QA targets point to BestDR6 DB
  – TARGDR6, SEGUEDR6 targets
  – Need to have “collab” privilege set in user profile


CAS Data Access

• SkyServer
  – Web browser-based synchronous access
  – Meant to support several levels of users
    • From casual to moderately advanced queries
    • From simple form-based to direct SQL queries
    • From cone (radial) search to crossid-type searches
  – Visual tools to browse image and catalog data
  – API access, e.g. emacs interface, sqlcl (command-line)
  – Strict limits on execution time and output size
    • Fair use for everyone; robots/crawlers discouraged
• ImgCutout
  – Finding Chart and JPEG image browser
  – Accessible from SkyServer (Visual Tools)
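A cone (radial) search returns every object within a given angular radius of a position. The sketch below shows only the underlying geometry on an in-memory list, assuming positions in decimal degrees and a radius in arcminutes; it is an illustration of the technique, not how the CAS runs it (the CAS uses spatial indexing inside the database). The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance in degrees between two (RA, Dec) positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # clamp guards against tiny floating-point overshoot above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(objects: Sequence[Tuple[int, float, float]],
                ra: float, dec: float,
                radius_arcmin: float) -> List[Tuple[int, float]]:
    """Hypothetical helper: (obj_id, distance_arcmin) for objects inside the cone, nearest first."""
    hits = []
    for obj_id, o_ra, o_dec in objects:
        dist = angular_separation_deg(ra, dec, o_ra, o_dec) * 60.0
        if dist <= radius_arcmin:
            hits.append((obj_id, dist))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    catalog = [(1, 180.0, 0.0), (2, 180.0, 0.01), (3, 181.0, 0.0)]
    print(cone_search(catalog, 180.0, 0.0, radius_arcmin=1.0))
```

The haversine form is used because it stays accurate at the small (arcsecond to arcminute) separations typical of these searches, and it handles the RA = 0/360 seam without special cases.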


CasJobs
• Link in SkyServer (http://cas.sdss.org/casjobs)
• Batch Query Workbench, personal user DB (MyDB)
  – Quick mode: 1 minute cutoff
  – Submit mode: up to 8 hours in “long” queue
    • 24-hr queue for collab members
• Preferred method for serious queries
• MyDB database to save results of your queries
  – Define your own functions and procedures too
  – Share your tables with collaborators (groups)
• Job history, plotting, FITS/CSV/VOTable output
• Table Import (upload) for your own data
• Groups to share your results with collaborators
• Java command-line access tool also downloadable
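The time limits above translate into a simple rule of thumb for picking a mode. This is a hypothetical helper (`choose_casjobs_mode` is not part of CasJobs); it assumes you already have a rough runtime estimate, for example from a count query or the query plan, and it encodes only the limits listed on this slide.

```python
from typing import Optional

# Time limits quoted on the slide, in seconds
QUICK_LIMIT_S = 60                # quick mode: 1 minute cutoff
LONG_QUEUE_LIMIT_S = 8 * 3600     # submit mode, "long" queue: up to 8 hours
COLLAB_QUEUE_LIMIT_S = 24 * 3600  # 24-hr queue for collab members


def choose_casjobs_mode(estimated_seconds: float, has_collab: bool = False) -> Optional[str]:
    """Hypothetical helper: suggest a CasJobs mode for a rough runtime estimate.

    Returns None when no queue is expected to accommodate the query, a hint to
    restrict it (smaller area, fewer columns) before submitting.
    """
    if estimated_seconds <= QUICK_LIMIT_S:
        return "quick"
    if estimated_seconds <= LONG_QUEUE_LIMIT_S:
        return "submit: long queue"
    if has_collab and estimated_seconds <= COLLAB_QUEUE_LIMIT_S:
        return "submit: 24-hr collab queue"
    return None
```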


Using CasJobs

• Every query has a default “target”
  – The database that it will operate on
  – e.g., MyDB, DR4, DR5, CollabDR4, DR5QA, BESTRUNS etc.
• Each target is hosted on a separate server
  – Provides load balancing and performance
  – Some quirks/restrictions due to distributed execution
  – Help page and FAQ explain these
• Ability to do distributed joins between different datasets
  – e.g., between DR4 and DR5, or RUNS and DR5


Collab-only access

• collabdrx SkyServer sites
  – IP-restricted access to collabdrx URL
  – Password access to collabdrxpw URL from other IPs
  – Larger query limits (e.g., 1 hour/500k rows)
• “collab” privilege in CasJobs
  – Gives you access to restricted data and additional, longer queues (e.g. DR5QA, DR6QA 24-hr queues)
  – If you have collab priv set, you will see these queues
  – If you don’t have it, email [email protected]


Data available only to Collab: RUNS

• RUNS DB
  – SkyServer: http://cas.sdss.org/runs
    • Also http://cas.sdss.org/runspw for password access
  – CasJobs: BESTRUNS context
  – Mostly SEGUE (half) runs and stripe 82 (most)
    • DRx runs will still be added over the next few months
  – Imaging only, no spectra
    • May be possible to link to BEST spectra with a join
  – Use match tables to match up repeat observations in multiple runs
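Match tables pair up detections of the same object in different runs. If such a table is read as a list of (objID, matchObjID) pairs, collecting every repeat observation of an object is a connected-components problem, which union-find handles efficiently. The sketch assumes that pair-list shape and a hypothetical function name; it does not reproduce the actual RUNS match-table schema, so check the Schema Browser for the real column names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_repeat_observations(
    match_pairs: Iterable[Tuple[int, int]],
) -> Dict[int, List[int]]:
    """Group object ids linked by (objID, matchObjID) pairs into connected components.

    Hypothetical input layout. Each group is keyed by its smallest id; members are sorted.
    """
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # attach the larger root under the smaller, so each root is its group's minimum id
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for a, b in match_pairs:
        union(a, b)

    groups: Dict[int, List[int]] = defaultdict(list)
    for obj_id in list(parent):
        groups[find(obj_id)].append(obj_id)
    return {root: sorted(members) for root, members in groups.items()}
```

Chained matches (A matches B in one pair, B matches C in another) end up in one group even when A and C were never paired directly.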


SkyServer Help Resources

• Help menu option on top right of SkyServer
• Start with Archive Intro
• Next look at Query Limits and How To pages
• Then Introduction to SQL and Sample Queries
  – Look at Optimizing Queries page (esp. bookmark bug)
  – Try out some of the sample queries
  – Cut and paste to SQL search page (Tools → Search → SQL)
• Browse FAQ and Schema Browser
• Glossary, Table Descriptions and Algorithms
  – Searchable, dynamically loaded from DB, interlinked
  – The small symbol icons are links to Glossary, Algorithm and Table Description entries
• Data release and technical papers


CasJobs Help

• Shares SQL Intro, Schema Browser with SkyServer
• Has its own FAQ page
  – Lists differences between CasJobs and SkyServer due to distributed query execution
• Advanced CasJobs Queries page
  – Neighbor searches with fixed and variable search radii
  – Cursors
  – Compound queries
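The variable-radius neighbor search idea can be sketched in memory: each source carries its own search radius, and positions are turned into Cartesian unit vectors so that each match test is a single dot product compared against cos(radius). CAS photometric tables also store unit-vector columns (cx, cy, cz), but the function names, input shapes and the 3 arcsec default radius below are assumptions for illustration, not the Advanced Queries page’s SQL.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def unit_vector(ra_deg: float, dec_deg: float) -> Vec3:
    """Cartesian unit vector for an (RA, Dec) position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def variable_radius_neighbors(
    sources: Sequence[Tuple[str, float, float, Optional[float]]],
    catalog: Sequence[Tuple[int, float, float]],
    default_radius_arcsec: float = 3.0,  # hypothetical fallback radius
) -> Dict[str, List[int]]:
    """For each source (id, ra, dec, radius_arcsec or None), list catalog ids within its radius."""
    cat_vecs = [(cid, unit_vector(ra, dec)) for cid, ra, dec in catalog]
    result: Dict[str, List[int]] = {}
    for sid, ra, dec, radius in sources:
        r_arcsec = default_radius_arcsec if radius is None else radius
        # two points lie within angle r exactly when their unit vectors' dot product >= cos(r)
        cos_r = math.cos(math.radians(r_arcsec / 3600.0))
        sx, sy, sz = unit_vector(ra, dec)
        result[sid] = [cid for cid, (x, y, z) in cat_vecs
                       if sx * x + sy * y + sz * z >= cos_r]
    return result
```

A fixed-radius search is the special case where every source passes None and the default applies.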


Sample Queries

• 50+ sample queries from simple to complex
• Available in SkyServer and CasJobs
• Clean photometry meta-flags sample
• INNER/OUTER JOIN samples
• Sector/Region tables usage sample
• Variability queries from Robert and Zeljko
• CasJobs Advanced Queries Help page
  – Has examples of neighbor searches, cursors etc.


SkyServer General Tips

• Use astro or collab sites
  – Fewer “frills”, more direct access to tools
  – More generous query limits (timeouts, row limits)
• See Help → Query Limits page
• Collab site is restricted access, largest query limits
  – Some extra features
    • e.g. Imaging/Spectro form query
• Each release has separate sites
  – http://cas.sdss.org/collab/ (the public release)
  – http://cas.sdss.org/collabdr6/ (not yet public)
• Use Contact link when emailing help-desk


Find the right tool for the job

• Visual exploration: Tools → Visual Tools
  – Browse objects one at a time: Explore page
    • Shows all parameters for object, also its image and spectrum
  – Browse and find objects on a frame: Finding Chart
  – Navigate image frames: Navigate
  – View multiple objects with query: Image List
• Browse images: Tools → Get Images
  – Frames: Fields browser
  – Spectroscopic plates: Plates browser
  – View individual spectra: Spectra browser


Finding the right tool (contd.)

• SQL search: Tools → Search
  – Cone (radial) search: Radial search form
  – Region (rectangle) search: Rectangular search form
  – Imaging form query: Imaging Query form
  – Spectroscopic form query: Spectro Query form
  – All other searches: SQL search page
• Cross-matching: Tools → Object Crossid
  – Imaging crossid: Upload
  – Spectro crossid: SpecList
• Advanced, unrestricted SQL queries: CasJobs
  – Your own personal DB
  – Retrieve results when you are ready
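One detail to watch with rectangular (RA/Dec) searches is a region that crosses RA = 0°, as an equatorial stripe can: a naive `ra_min <= ra <= ra_max` test then selects the wrong side of the sky. A minimal sketch of the check, assuming RA values already normalized to [0, 360) and the convention (an assumption here, not necessarily the Rectangular form’s) that ra_min > ra_max means the box wraps through 0°:

```python
def in_ra_dec_box(ra: float, dec: float,
                  ra_min: float, ra_max: float,
                  dec_min: float, dec_max: float) -> bool:
    """Hypothetical helper: True if (ra, dec) lies in the box; angles in degrees, RA in [0, 360).

    If ra_min > ra_max the box is taken to cross RA = 0 (e.g. 350..10 deg).
    """
    if not dec_min <= dec <= dec_max:
        return False
    if ra_min <= ra_max:
        return ra_min <= ra <= ra_max
    # box wraps through RA = 0/360: accept either side of the seam
    return ra >= ra_min or ra <= ra_max
```

For example, a box from RA 350° to 10° accepts RA 355° and 5° but rejects 180°.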


CAS Dos and Don’ts

• Do not submit a query unless you have some idea how long it will take!
  – It could tie up the server for hours (sometimes days)!
  – Do a “count” query first if necessary
    • CasJobs also has a graphical query plan (Plan button)
  – Look at samples, query optimization pages
  – If not sure, use form queries at first
• Use the predefined views for unique/primary objects
  – PhotoObj, PhotoPrimary for photometry
  – Consider using PhotoTag table if you only need popular fields
    • Makes better use of cache
  – SpecObj for spectra
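The “count first” advice can be made mechanical: reuse the FROM and WHERE parts of the planned query in a COUNT(*) probe, then compare the count with the site’s row limit to decide between SkyServer and CasJobs. The helpers below are hypothetical and only build SQL text and compare numbers; you still run the count yourself. The 500,000-row figure is the collab SkyServer limit quoted on the Collab-only access slide, and the example predicate simply reuses the Legacy r_petro < 17.77 flux limit as a filter.

```python
def count_query(from_clause: str, where_clause: str = "") -> str:
    """Hypothetical helper: turn the FROM/WHERE part of a planned query into a COUNT(*) probe."""
    sql = f"SELECT COUNT(*) AS n FROM {from_clause}"
    if where_clause:
        sql += f" WHERE {where_clause}"
    return sql


def suggest_tool(row_count: int, row_limit: int) -> str:
    """Hypothetical helper: SkyServer if the result fits its row limit, otherwise CasJobs/MyDB."""
    return "SkyServer" if row_count <= row_limit else "CasJobs (MyDB)"


COLLAB_ROW_LIMIT = 500_000  # collab SkyServer row limit from the slides

if __name__ == "__main__":
    print(count_query("PhotoPrimary", "petroMag_r < 17.77"))
```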


Dos and Don’ts (contd.)

• Use the Contact link to contact Help Desk
  – Fill in the short form, which gives us necessary information
  – In CasJobs, press Contact after logging in
    • Automatically attaches your userid to the message
  – Will speed up response to your request
• Do not contact Help Desk staff directly
  – Questions are answered by a pool of experts as available
  – More likely to get delayed or no response (unless you can bug them in person)
• If you run out of MyDB space, ask for more!!
  – We’re pretty liberal about giving more space, but you have to ask (to avoid empty/unused MyDBs taking up space)


SDSS and other datasets

• GALEX
  – Has its own CasJobs page hosted by MAST
  – SDSS vs GALEX cross-matches
    • DR5 vs GR2 and DR4 vs GR2 available now
    • DR5 vs GR3 coming soon
    • Link table with IDs from both catalogs
    • SDSS parameters for GALEX matches also extracted
• Some older datasets matched in BEST DB
  – FIRST, USNOB, ROSAT, USNOB proper motions
• Open SkyQuery site for other datasets
  – Only small-area cross-matches possible at the moment
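A link table stores the IDs from both catalogs, so combining SDSS and GALEX parameters is a join through it. The sketch below assumes a hypothetical layout of (SDSS id, GALEX id, separation in arcsec) rows and assumes that, when one SDSS object has several GALEX candidates, you want the nearest one; the real cross-match tables may use different columns and matching rules.

```python
from typing import Dict, Iterable, List, Tuple


def nearest_links(link_rows: Iterable[Tuple[int, str, float]]) -> Dict[int, str]:
    """Keep only the closest GALEX id for each SDSS id.

    link_rows: (sdss_id, galex_id, distance_arcsec) tuples (hypothetical layout).
    """
    best: Dict[int, Tuple[str, float]] = {}
    for sdss_id, galex_id, dist in link_rows:
        if sdss_id not in best or dist < best[sdss_id][1]:
            best[sdss_id] = (galex_id, dist)
    return {sdss_id: galex_id for sdss_id, (galex_id, _) in best.items()}


def join_catalogs(sdss: Dict[int, dict], galex: Dict[str, dict],
                  links: Dict[int, str]) -> List[dict]:
    """Merge SDSS and GALEX rows through the link mapping; ids missing from either side are skipped."""
    rows = []
    for sdss_id, galex_id in links.items():
        if sdss_id in sdss and galex_id in galex:
            rows.append({"sdss_id": sdss_id, "galex_id": galex_id,
                         **sdss[sdss_id], **galex[galex_id]})
    return rows
```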


Virtual Observatory

• JHU is one of the main participants
  – SDSS is one of the drivers for NVO
  – Co-PI (Szalay) and Project Manager (Hanisch) here
• JHU VO services
  – Open SkyQuery (http://openskyquery.net/)
  – VO services (http://voservices.org/)
    • Spectrum services
    • Filter profiles
    • Footprint services (new!)
    • VO registry (with STScI)
    • Standard VO services: Cone Search, SIAP
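The IVOA Simple Cone Search standard queries a service with three HTTP GET parameters: RA and DEC (ICRS, decimal degrees) and SR, the search radius in degrees; the service answers with a VOTable. A minimal URL builder, where the base URL in the usage is a placeholder rather than an actual JHU endpoint and the function name is hypothetical:

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, sr_deg: float) -> str:
    """Build a Simple Cone Search request URL (RA, DEC, SR in decimal degrees)."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if sr_deg <= 0.0:
        raise ValueError("SR must be positive")
    # append to any query string the service URL already carries
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": sr_deg})
```

For example, `cone_search_url("http://example.org/scs", 185.0, 0.5, 0.1)` yields `http://example.org/scs?RA=185.0&DEC=0.5&SR=0.1`.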


Educational Resources

• Extensive EPO content in SkyServer
  – Use Projects link on the menu at the top
  – K-12 and college-level student exercises and teacher resources
• Open SkyQuery / Cross-match EPO project
• Jordan Raddick’s talk on May 1


Coming Attractions (or not …)

• FITS cutout service
  – VO-India is developing a service
  – There are technical difficulties with mosaicking multiple SDSS frames
    • How to handle different S/N, PSFs across frames?
• Non-SDSS datasets in CasJobs
  – Merging of CasJobs and Open SkyQuery features
  – Ability to do large-scale cross-matches with other datasets within the CasJobs environment


Thanks!

http://www.sdss.org (main site and DAS)
http://cas.sdss.org/ (public CAS, redirected to latest public release)
http://cas.sdss.org/drX (public CAS, release X)
http://cas.sdss.org/astro (astronomers, latest pub release)
http://cas.sdss.org/astrodrX (astronomers, release X)
http://cas.sdss.org/collab (IP-restricted collab site, latest pub release)
http://cas.sdss.org/collabdrX (IP-restricted collab site, release X)
http://cas.sdss.org/collabpw (collab pwd access, latest pub release)
http://cas.sdss.org/collabdrXpw (collab pwd access, release X)
http://www.voservices.org/ (VO services @ JHU)
http://www.openskyquery.net/ (Open SkyQuery)
http://www.skyserver.org (support site): software downloads, mirror site resources, data download info
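The CAS URL scheme above is regular enough to generate. A small hypothetical helper (not an SDSS tool) that reproduces the listed patterns; it covers only the public, astro and collab sites, not the RUNS or CasJobs addresses.

```python
from typing import Optional

CAS_BASE = "http://cas.sdss.org"


def cas_url(site: str = "public", release: Optional[int] = None, password: bool = False) -> str:
    """Hypothetical helper: build a CAS SkyServer URL following the listed naming pattern.

    site: "public", "astro" or "collab"; release=None means the latest public release.
    password=True selects the password-access collab variant (for non-collab IPs).
    """
    if site not in ("public", "astro", "collab"):
        raise ValueError(f"unknown site: {site!r}")
    if password and site != "collab":
        raise ValueError("password access is only listed for the collab sites")
    if site == "public":
        path = "" if release is None else f"dr{release}"
    else:
        path = site if release is None else f"{site}dr{release}"
    if password:
        path += "pw"
    return f"{CAS_BASE}/{path}"
```

For example, `cas_url("collab", 6, password=True)` gives `http://cas.sdss.org/collabdr6pw`, the DR6 password-access address.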