
From Sky to Earth: Data Science Methodology Transfer

dame.dsf.unina.it/astroinformatics2016/lectures/Mahabal_IAU325.pdf

Page 1

Ashish Mahabal aam at astro.caltech.edu

Center for Data Driven Discovery, Caltech IAU 325: AstroInformatics, Sorrento, Italy

2016-10-23

From Sky to Earth: Data Science Methodology Transfer

JPL Data Science Initiative

NASA Advanced Information Systems Technology Program (AIST)

Western States Water Architecture Study

EarthCube

VIFI

Page 2

Broad Outline

• Similarities in Big Data between astronomy and Earth sciences

• The Hydrology case

• Example projects from BigSkyEarth

• EarthCube - the Earth VO

• Domain Adaptation

• Summary

Page 3

Generic Big Data

• Complex rather than just voluminous

• Real-time needs

• Complexity in terms of

  • spatial distribution

  • spatial and temporal resolution

  • time epochs (number of and irregularity)

  • coverage (overlap)

Volume, Velocity, Volatility, Veracity, Value, …

Page 4

[Figure: Peak Luminosity [erg s−1] and Peak Luminosity [MV] versus Characteristic Timescale [day] for transient classes: Classical Novae, Luminous Red Novae, Thermonuclear Supernovae, Core-Collapse Supernovae, Luminous Supernovae, .Ia Explosions, Ca-rich Transients]

Big Data - Astronomy

• Complex rather than just voluminous (catalogs, spectra, polarimetry)

• Real-time needs (e.g. transient classification)

• Understanding in terms of existing models (e.g. Tabby’s star, HB stars)

• Complexity in terms of

  • spatial distribution (data archives at different locations)

  • spatial and temporal resolution (HST ~0”.1 -> TESS ~10”)

  • time epochs (number of and irregularity) (SDSS - Kepler)

  • coverage (overlap) (DLS -> Gaia)

J Cooke

Page 5

Big Data - Earth Science

• In-situ measurements

• Satellite-based observations

• Models (predictive, computational)

• Real-time needs (e.g. predicting flash floods, ephemeral water flow)

• Complexity in terms of

  • spatial and temporal resolution (wells, snow, moisture, underground water, overground water)

  • time epochs (number of and irregularity) (snow, wells)

  • coverage (overlap)

Page 6

Big Data - Earth Science

Tools: Python, R, GrADS, IDL, Matlab, ArcGIS, HydroDesktop, and Google’s Earth Engine.

Multi-dimensional indexing: GeoMesa, GeoWave
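Multi-dimensional indexing of this kind is commonly built on space-filling curves, which map several coordinates (for example longitude, latitude and time) to one sortable key. The sketch below illustrates the idea with a Z-order (Morton) key. It is not the GeoMesa or GeoWave implementation; the function names, the 10-bit precision and the 366-day time window are assumptions chosen for illustration.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} outside [{lo}, {hi}]")
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(idx, cells - 1)  # value == hi falls into the last cell


def interleave(coords, bits):
    """Interleave the low `bits` bits of each integer coordinate into one key."""
    key = 0
    n = len(coords)
    for bit in range(bits):
        for dim, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * n + dim)
    return key


def z3_key(lon, lat, t_days, bits=10, t_max_days=366.0):
    """Hypothetical 3-D key over (longitude, latitude, time in days)."""
    cells = (
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        quantize(t_days, 0.0, t_max_days, bits),
    )
    return interleave(cells, bits)


if __name__ == "__main__":
    # Two observations that fall in the same space-time cell share a key.
    print(z3_key(-114.40, 36.10, 9.0), z3_key(-114.41, 36.12, 9.1))
```

Records sorted by such a key can be stored in an ordinary one-dimensional sorted store, and a query box in space and time then becomes a small number of key ranges to scan.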

Page 7

Ontologies

Astronomical Objects

PDS: Steve Hughes, Dan Crichton

PDS -> Earth Science (NASA)

Page 8

Multiplicity of ontologies

Metadata (and ontologies) are good.

Too many, or non-conforming, systems may be hurtful.

ONTOLOG, EarthCube, OGC, SWEET

Page 9

The Hydrology Case

Parallels in Earth Science and astronomy methodology

• Water vapour

• Precipitation

• Surface Water

• Ground Water

• Snow

• Evaporation

• Rivers/Lakes/…

Page 10

GRACE AQUA

Next few slides from ARSET

Page 11

Page 12

From less than one up to two measurements per day

Page 13

Data latency: under 3 hours to 3 months

TESS will have downlinks every 15 days

Page 14

Very distributed and not talking enough to each other

Page 15

Water

JPL Data Science Initiative

NASA Advanced Information Systems Technology Program (AIST)

Western States Water Architecture Study

A Scalable Data Processing System for Hydrological Science

Input-Forcing (e.g., GPM)

For Data Assimilation (e.g., MODSCAG)

Snow-Water Equivalent, Surface Water, Ground Water

Single-Month Estimates, Short and Long-Term Trends

Standard Reports, Ad Hoc Queries and Custom Reports (Web-Based Interface)

Research, Applications, Decision Support

Data Science Infrastructure (Tools, Services, Methods for Massive Data Analysis)

Page 16

Western States Water Mission (WSWM)

hydrological state estimation on water availability, at 3 km² resolution for the Western US

timely actionable information

a close collaboration of hydrological modeling and data science expertise in a mission-style project architecture

WaterTrek: an interactive, web-based analytics environment

Regularization of spatial resolution

Time series regularization

Integration of datasets
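Time series regularization can be sketched as resampling an irregularly sampled series onto a regular grid. The example below uses plain linear interpolation and leaves grid points empty when they fall outside the observed range or inside a long gap. This is an illustrative assumption, not the method used in WSWM or WaterTrek; the function name and the `max_gap` parameter are hypothetical.

```python
from bisect import bisect_right


def regularize(times, values, start, stop, step, max_gap=None):
    """Linearly interpolate an irregular series onto a regular time grid.

    `times` must be sorted in ascending order. Grid points outside the
    observed range, or inside a gap longer than `max_gap`, become None.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    n_steps = int((stop - start) / step + 1e-9) + 1
    grid, out = [], []
    for k in range(n_steps):
        t = start + k * step
        grid.append(t)
        if not times or t < times[0] or t > times[-1]:
            out.append(None)              # no extrapolation
            continue
        i = bisect_right(times, t)        # index of first observation after t
        if times[i - 1] == t:             # grid point coincides with a sample
            out.append(values[i - 1])
            continue
        t0, t1 = times[i - 1], times[i]
        if max_gap is not None and (t1 - t0) > max_gap:
            out.append(None)              # gap too long to interpolate across
            continue
        w = (t - t0) / (t1 - t0)
        out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return grid, out


if __name__ == "__main__":
    # Made-up sample: irregular epochs (days) and a measured quantity.
    t_obs = [0.0, 1.0, 4.0, 4.5, 10.0]
    v_obs = [0.0, 2.0, 8.0, 9.0, 20.0]
    print(regularize(t_obs, v_obs, start=0.0, stop=10.0, step=2.0, max_gap=5.0))
```

Once every layer (snow, wells, surface water) sits on the same regular grid, the layers can be joined on the grid index, which is one simple route to the dataset integration listed above.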

Page 17

WSWM

Pacific Northwest

California

Great Basin

Lower Colorado

Upper Colorado

WSWM domain: Continental US west of divide

Page 18

WSWM

Franklin D Roosevelt Lake

Lake Koocanusa

Shasta Lake

Lake Mead

Lake Powell

Contains 5 of the 15 largest US reservoirs

Page 19

WSWM

Getting ready for SWOT

Actual model resolution

Largest rivers

Page 20

WSWM

658,702 river reaches (1,410,328 total length)

7,532 gauges (many now inactive)

Hyper-resolution with assimilation
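The two counts on this slide give a rough sense of how sparse the gauge network is relative to the river network. The short calculation below uses only the slide's numbers; treating each gauge as sitting on its own reach is an assumption made here, so the gauged fraction is an upper bound.

```python
river_reaches = 658_702  # from the slide
gauges = 7_532           # from the slide; many are now inactive

# Average number of river reaches per gauge.
reaches_per_gauge = river_reaches / gauges

# Upper bound on the fraction of reaches with a gauge, assuming
# (for illustration) that no two gauges share a reach.
gauged_fraction_upper_bound = gauges / river_reaches

print(f"about {reaches_per_gauge:.0f} reaches per gauge")
print(f"at most {gauged_fraction_upper_bound:.1%} of reaches gauged")
```

That works out to roughly 87 reaches per gauge, so most reaches have no direct measurement, which is consistent with the emphasis on modeling with assimilation.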

Page 21

WSWM

Facilitates informed decisions at the local level

High-resolution modeling over large spatial domain

Page 22

High Level Concept of Data Management and Data Analytics

Page 23

COST’s BigSkyEarth

First Training School at Oberpfaffenhofen, 2016

https://github.com/marcoq/BSE_TS2016_Oberpfaffenhofen/

Page 24

EarthCube

NSF, 2011

Cyberinfrastructure: sharing, visualization, analysis

Interoperability standards, better integration, democratizing data

JPL, Caltech: Scalable Arch, Test Environment

Page 25

EarthCube Funded Projects

• BCube: Broker for Next generation Geoscience (mediating interactions)

• Integrating Long-Tail Data and Models

• Scalable Community Driven Architecture (SG Djorgovski, E Law, D Crichton, A Mahabal)

• ECITE (Graves, Yang, Law, Djorgovski, Mahabal)

• … (other Building Blocks)

Page 26

Scalable Community Driven Architecture

• Identify Stakeholders, key use cases

• Incorporate cross-agency informatics efforts to capture architectural drivers, principles, models

• Roadmap for extensible and sustainable participation coherent with cyberinfrastructure

• Design architecture, data intensive system leading to discovery in the big data era

Team: S.Caltagirone, D.Crichton, S.G.Djorgovski, T.Huang, S.Hughes, E.Law, A.Mahabal, D.Pilone, T.Pilone

Page 27

EarthCube Integration and Test Environment (ECITE)

• Seamless federated system of scalable, location-independent resources

• Compute and storage with minimal administration

• Integration, test, and evaluation

• Share ideas, concepts, experiments

SarvabhaumMandlik

Caltech + GMU

Page 28

Domain Adaptation

With Jingling Li, Samarth Vaijanapurkar, Brian Bui, …

Page 29

Feature Correlations

Sample from Drake et al.

Page 30

RF, GFK, CODA, …

• Examine the baseline performance for three combinations of data using random forest blindly:

  • Source to Target

  • Source + Target to Target

  • Target to Target

• Compare performance with Domain Adaptation

• Misclassified objects and outliers

To be used with the various hydrology layers having irregular time series
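The three baseline combinations listed above can be laid out as a small evaluation harness: train on source only, on source plus labelled target, or on target only, and always score on held-out target data. In the sketch below a nearest-centroid classifier stands in for the random forest so the example stays dependency-free; the classifier, the function names and the toy data are all assumptions for illustration, and no domain adaptation method (GFK, CODA) is implemented here.

```python
def fit_centroids(X, y):
    """Nearest-centroid 'training': mean feature vector for each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, row):
    """Return the class whose centroid is closest to `row`."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist2)


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


def baselines(Xs, ys, Xt_train, yt_train, Xt_test, yt_test):
    """Accuracy on held-out target data for the three training sets."""
    training_sets = {
        "source -> target": (Xs, ys),
        "source + target -> target": (Xs + Xt_train, ys + yt_train),
        "target -> target": (Xt_train, yt_train),
    }
    return {
        name: accuracy(fit_centroids(X, y), Xt_test, yt_test)
        for name, (X, y) in training_sets.items()
    }


if __name__ == "__main__":
    # Toy 1-D data: the target domain is shifted relative to the source.
    Xs, ys = [[0.0], [2.0]], [0, 1]
    Xt_train, yt_train = [[2.2], [4.2]], [0, 1]
    Xt_test, yt_test = [[2.3], [4.3]], [0, 1]
    for name, acc in baselines(Xs, ys, Xt_train, yt_train, Xt_test, yt_test).items():
        print(f"{name}: {acc:.2f}")
```

The gap between the source-trained and target-trained scores is the quantity a domain adaptation step is meant to reduce, and the objects misclassified under each baseline are the natural place to look for outliers.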

Aspects to be explored through VIFI

Talukdar, Mahabal, Djorgovski, Crichton

Page 31

Earth to Sky: Pokemon Go to Transient Go

Binary Transient Brokers combined with AR

CRTS + LSST; Gaia?

SUNY Oswego CS undergrads

Page 32

Summary

• Many parallels in Astro- and Earth-sciences

• In Earth science, many datasets are still analyzed separately

• One big difference: intervention possible

  • water distribution

• Citizen Science not explored enough

  • monitoring presence of lead at different locations

• Many other use cases being explored