Environmental Science, Big Data and the Cloud


DESCRIPTION

Scientific instruments, environmental sensors, and large-scale simulations are creating more scientific data than ever before. By using advanced, large-scale information processing facilities, scientists can now analyze massive volumes of data in ways that would not have been possible just a few years ago. While a few researchers have access to these large computer systems, most are limited by the processing capacity they can access conveniently and quickly. Cloud computing solutions built on Microsoft Azure give environmental science researchers the compute and storage resources they need, when they need them, without an up-front financial investment, and help reduce the time between progress and breakthroughs. Microsoft Azure brings on-demand computing and data access to environmental scientists and researchers everywhere.


Microsoft Research: Computational Ecology and Environmental Science Group

http://research.microsoft.com/en-us/groups/ecology/

Manual Measurement

Automated Measurement

Sample Collection

Historical Photographs

Counting

Ubiquitous

Motes

Aircraft Surveys

Model Output

Typing

Monitoring

Collation

Quality assurance

Aggregation

Analysis

Reporting

Forecasting

Distribution

Done poorly, but a few notable counter-examples

Done poorly to moderately, not easy to find

Sometimes done well, generally discoverable and available, but could be improved

Integration

(I. Zaslavsky & CSIRO, BOM, WMO)

Data-intensive Science

Data acquisition & modelling

Collaboration and visualisation

Analysis & data mining

Dissemination & sharing

Archiving and preserving

fourthparadigm.org

Complex shared detector

Simple instrument (if any)

Complex and heavy process by experts

Ad hoc observations and models

KB

GB

TB

PB

Science happens when PBs, TBs, GBs, and KBs can be mashed up simply

Provenance and trust vary widely

Data acquisition, early processing, and reporting range from large government agencies to individual scientists.

Smaller data often passed around in email; big data downloads can take days (if at all)
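The claim that big data downloads can take days can be checked with simple arithmetic. The sketch below is illustrative only: the link speed, the 80% efficiency factor, and the function name `transfer_time_hours` are assumptions and do not come from the slides.

```python
def transfer_time_hours(size_bytes: float, bandwidth_mbps: float,
                        efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_bytes over a network link.

    bandwidth_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    effective_bits_per_second = bandwidth_mbps * 1_000_000 * efficiency
    return bits / effective_bits_per_second / 3600


# Decimal units, matching the KB / GB / TB / PB scale on the slide.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15

# Assumed 100 Mbit/s link for every data set size.
for label, size in [("1 KB", KB), ("1 GB", GB), ("1 TB", TB), ("1 PB", PB)]:
    print(f"{label}: {transfer_time_hours(size, 100):.4g} hours")
```

Under these assumptions a 1 TB data set needs roughly 28 hours, so a multi-terabyte archive takes days, and a petabyte is impractical to move at all on such a link.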

Data sharing concerns and patterns vary

Open access followed by (non-repeatable and tedious) pre-processing

True science-ready data sets, but with concerns about misuse and misunderstanding, particularly for hard-won data

Computational tools differ. Not everyone can get an account at a supercomputer center

Very large computations require engineering (error handling)
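One common piece of that engineering is retrying work that fails for transient reasons. The following is a minimal sketch of retry with exponential backoff, not taken from the slides: `run_with_retries` and `flaky_tile_job` are hypothetical names, and the delays are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], max_attempts: int = 4,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run task, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Double the wait each time and add a little random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Hypothetical job that fails twice before succeeding.
calls = {"n": 0}


def flaky_tile_job() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient storage error")
    return "tile processed"


# The sleep function is replaced so the demonstration runs instantly.
result = run_with_retries(flaky_tile_job, sleep=lambda seconds: None)
print(result, "after", calls["n"], "attempts")
```

Across thousands of tasks, even rare transient failures become routine, which is why this kind of handling is part of the engineering cost of very large computations.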

Space and time aren’t always simple dimensions

Getting what you need, when you need it

Cloud computing is good for…

http://github.com/windowsazure

Customer Data Center

http://fetchclimate2.cloudapp.net/

Data Marketplaces

Web search: “open weather data azure”

Weather Forecast Computation as a Service

http://aka.ms/oljnt2

http://weatherservice.cloudapp.net

http://research.microsoft.com/en-us/projects/azure/technical-papers.aspx

http://aka.ms/dm0

http://research.microsoft.com/projects/msrceesdm/

MODIS Azure: Computing Evapotranspiration (ET) in the Cloud

A pipeline for download, processing, and reduction of diverse NASA MODIS satellite imagery.

Catharine van Ingen (Microsoft Research), Jie Li, Marty Humphrey (UVA), Youngryel Ryu (UCB), Deb Agarwal (BWC/LBL), Keith Jackson (BL), Jay Borenstein (Stanford), Team SICT: Vlad Andrei, Klaus Ganser, Samir Selman, Nandita Prabhu (Stanford), Team Nimbus: David Li, Sudarshan Rangarajan, Shantanu Kurhekar, Riddhi Mittal (Stanford)

[Figure: MODIS Azure Service architecture. Labels in the diagram: Scientists; MODIS Azure Service Web Role Portal; Request Queue; Data Collection Stage (Download Queue, Source Imagery Download Sites, Source Metadata); Reprojection Stage (Reprojection Queue); Derivation Reduction Stage (Reduction #1 Queue); Analysis Reduction Stage (Reduction #2 Queue); Scientific Results]
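The queue-per-stage layout named in the MODIS Azure Service figure can be sketched with in-process queues. This is an illustrative stand-in, not the project's implementation: the stage handlers do no real work, the tile names are placeholders, and `make_stage` and `run_pipeline` are hypothetical helpers.

```python
from queue import Queue
from typing import Callable, Dict, List

# Stage names follow the queues labeled in the figure.
STAGES = ["download", "reprojection", "reduction_1", "reduction_2"]


def make_stage(name: str) -> Callable[[dict], dict]:
    """Return a placeholder handler that records the stage it ran in."""
    def handler(task: dict) -> dict:
        return {**task, "history": task["history"] + [name]}
    return handler


def run_pipeline(requests: List[str]) -> List[dict]:
    """Push each request through every stage queue in order."""
    queues: Dict[str, Queue] = {name: Queue() for name in STAGES}
    results: List[dict] = []
    for tile in requests:
        queues[STAGES[0]].put({"tile": tile, "history": []})
    for index, name in enumerate(STAGES):
        handler = make_stage(name)
        stage_queue = queues[name]
        # Drain this stage and hand finished work to the next queue.
        while not stage_queue.empty():
            finished = handler(stage_queue.get())
            if index + 1 < len(STAGES):
                queues[STAGES[index + 1]].put(finished)
            else:
                results.append(finished)
    return results


results = run_pipeline(["tile-A", "tile-B"])
for item in results:
    print(item["tile"], "->", " -> ".join(item["history"]))
```

Decoupling stages with queues means each stage can be scaled or retried independently. In a cloud deployment the in-process queues would be replaced by a durable queue service and the handlers by separate worker processes.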


Use laptops & desktop computers

Overwhelmed by data

Finding analysis ever more difficult; sharing even harder

www.azure4research.com
