The Data Author’s Perspective: Lessons Learned From Data Creation to Data Curation. Jeff Dozier, James E. Frew.
1
The Data Author’s Perspective: Lessons Learned From Data Creation to Data Curation
Collect
Store
Search
Retrieve
Analyze
Present
Jeff Dozier
James E. Frew
2
Snow spectral reflectance and absorption coefficient of ice
[Figure: snow spectral reflectance (0.0 to 1.0) and absorption coefficient of ice (1.0E-09 to 1.0E-02) versus wavelength (0.4 to 2.4 μm); curves labeled 0.05 mm, 0.2 mm, 0.5 mm, 1.0 mm, and absorption coefficient]
3
Landsat Thematic Mapper (TM) band combinations
[Figure panels: Bands 4,3,2 (R,G,B); Bands 5,4,2 (R,G,B)]
4
What you see, through Earth’s atmosphere
[Figure: reflectance (%, 0 to 100) versus wavelength (0.3 to 2.3 μm) for snow, vegetation, rock, and three mixtures: equal snow-veg-rock; 80% snow, 10% veg, 10% rock; 20% snow, 50% veg, 30% rock]
5
Spatial, spectral characteristics of Landsat and MODIS
[Figure: instantaneous field of view (IFOV, 10 to 1000 m) versus wavelength (0.4 to 15 μm). Landsat bands: panchromatic, visible, NIR, SWIR, thermal. MODIS bands: high-resolution, “land”, and ocean/atmosphere groups across the visible, NIR, SWIR, and thermal]
6
What a multispectral sensor sees
[Figure: reflectance (%, 0 to 100) versus wavelength (0.3 to 2.3 μm) for snow, vegetation, rock, and three mixtures: equal snow-veg-rock; 80% snow, 10% veg, 10% rock; 20% snow, 50% veg, 30% rock]
7
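Set of equations for each pixel
One equation per spectral band $k = 1, 2, \dots, N$, with $M$ non-snow endmembers:
$$R_k = F_{snow}\, r_k + \sum_{i=1}^{M} F_i\, c_{k,i}$$
The equations also carry per-band terms $a_1, a_2, \dots, a_N$.
where $0 \le F \le 1$ and $\sum F = 1$
Solve for $F_{snow}$, the $F_i$, and the remaining unknowns (least squares)
Still to consider: better corrections for illumination angle, viewing angle, subpixel topography, and vegetation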
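The per-pixel system on this slide can be read as a linear spectral mixture solved by constrained least squares. The sketch below is one way to do that for a single pixel; it is not the authors' algorithm. It uses projected gradient descent onto the simplex (fractions between 0 and 1 that sum to 1), and the four-band endmember spectra are made-up illustrative numbers, not MODIS or Landsat values. The names `project_to_simplex`, `mix`, and `unmix` are hypothetical.

```python
# Illustrative sketch: constrained linear unmixing of one pixel.
# The spectra below are invented for the example, not measured values.
ENDMEMBER_NAMES = ["snow", "vegetation", "rock"]
ENDMEMBER_SPECTRA = [
    [0.95, 0.90, 0.60, 0.10],  # snow (hypothetical reflectance per band)
    [0.05, 0.45, 0.30, 0.15],  # vegetation (hypothetical)
    [0.20, 0.25, 0.30, 0.30],  # rock (hypothetical)
]


def project_to_simplex(values):
    """Euclidean projection onto {f : f_i >= 0 and sum(f) = 1}."""
    ordered = sorted(values, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for count, value in enumerate(ordered, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / count
        if value - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in values]


def mix(fractions, spectra):
    """Forward model: per-band reflectance of a linear mixture."""
    n_bands = len(spectra[0])
    return [sum(f * s[k] for f, s in zip(fractions, spectra)) for k in range(n_bands)]


def unmix(spectra, pixel, iterations=5000):
    """Least-squares fractions with 0 <= F <= 1 and sum(F) = 1."""
    n_end = len(spectra)
    # 1 / (sum of squared entries) is a safe step size for this quadratic.
    step = 1.0 / sum(r * r for spectrum in spectra for r in spectrum)
    fractions = [1.0 / n_end] * n_end
    for _ in range(iterations):
        modeled = mix(fractions, spectra)
        residual = [m - p for m, p in zip(modeled, pixel)]
        gradient = [sum(res * r for res, r in zip(residual, spectrum)) for spectrum in spectra]
        fractions = project_to_simplex([f - step * g for f, g in zip(fractions, gradient)])
    return fractions


# Build a synthetic pixel of 80% snow, 10% vegetation, 10% rock, then recover it.
pixel = mix([0.8, 0.1, 0.1], ENDMEMBER_SPECTRA)
estimated = unmix(ENDMEMBER_SPECTRA, pixel)
for name, fraction in zip(ENDMEMBER_NAMES, estimated):
    print(f"{name}: {fraction:.3f}")
```

A production solver would more likely call an established constrained least-squares routine and would also need the illumination, viewing-angle, and topographic corrections the slide lists as still to consider.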
8
Fractional snow cover, Sierra Nevada, March 7 2004
9
Sierra Nevada topography
10
Daily MODIS acquisition, processing for Sierra Nevada snow cover and albedo
Ingest from NASA DAACs
Sierra Nevada = 36 MB/day
Snow-covered land = 8 GB/day
Reproject, mosaic, subset, format
MODIS snow cover & albedo algorithm
Database
Sierra Nevada = 10 MB/day
Snow-covered land = 2 GB/day
MODster
TerraServer
Alexandria
11
Examples of fractional snow cover, January through April 2004
Jan 01 2004
Jan 17 2004
Mar 26 2004
Apr 08 2004
12
Examples of grain size, January through April 2004
Jan 01 2004
Jan 17 2004
Mar 26 2004
Apr 08 2004
13
Effect of vegetation
[Figure panels, 2004: March 3 vs March 4; March 4 vs March 5; March 5 vs March 7; March 7 vs March 8]
Date      SCA (%)   CCA (%)   Sensor zenith (degrees)
March 3   73        0         50
March 4   74        18        48
March 5   78        27        36
March 7   69        9         15
March 8   55        31        62
14
Applications: snowmelt modeling, Marble Fork of the Kaweah River (Molotch et al., GRL, 2004)
$$\text{Melt Flux} = \left( R_{net}\, m_q + T_d\, a_r \right) \times SCA$$
where:
SCA = snow-covered area
$R_{net}$ = net radiation > 0
$T_d$ = degree days > 0
$m_q$ = energy to water depth conversion, 0.026 cm W$^{-1}$ m$^{2}$ day$^{-1}$
$a_r$ = convection parameter, based on wind speed, humidity, and roughness
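As a worked example, the melt-flux expression on this slide can be read as (R_net × m_q + T_d × a_r) × SCA, with only positive net radiation and positive degree days contributing. The sketch below evaluates it for one grid cell. Only m_q = 0.026 comes from the slide; the function name `melt_flux`, the value of `a_r`, its units, and treating SCA as a fraction between 0 and 1 are assumptions made for illustration.

```python
M_Q = 0.026  # cm of water per (W m^-2) per day, as given on the slide


def melt_flux(net_radiation, degree_days, a_r, sca, m_q=M_Q):
    """Melt for one cell, in cm of water per day.

    net_radiation: W m^-2 (values <= 0 contribute nothing)
    degree_days:   degrees C (values <= 0 contribute nothing)
    a_r:           convection parameter, assumed here in cm per degree C per day
    sca:           snow-covered area as a fraction between 0 and 1 (assumed)
    """
    if not 0.0 <= sca <= 1.0:
        raise ValueError("sca must be a fraction between 0 and 1")
    radiative = max(net_radiation, 0.0) * m_q
    convective = max(degree_days, 0.0) * a_r
    return (radiative + convective) * sca


# Hypothetical inputs: 100 W m^-2, 2 degree days, a_r = 0.2, half the cell snow covered.
print(melt_flux(net_radiation=100.0, degree_days=2.0, a_r=0.2, sca=0.5))
```

With these inputs the radiative term is 2.6 cm/day and the convective term 0.4 cm/day, so a half-covered cell yields about 1.5 cm/day.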
15
Magnitude of snowmelt: Modeled – Observed snow water equivalent
[Figure: SWE difference (cm), Tokopah basin, Sierra Nevada; panels: assumed albedo, assumed w/ update, AVIRIS albedo]
16
The data author’s perspective on drivers and constraints
• The science information user:
– I want reliable, timely, usable science information products
» Accessibility
» Accountability
• The funding agencies and the science community:
– We want this to be done by a distributed federation of providers, not just by data centers
» Scalability
• The science information provider:
– I’m doing just fine, thanks.
» Transparency
17
Research vs. production computing
Research computing is …
• Heterogeneous
– multiple platforms, applications, languages
• Idiosyncratic
– researchers typically have highly customized computing environments
• Problem-driven
– focus on results, not processes
Production computing is …
• Robust
– reliable, not just correct
• Standardized
– can easily substitute components for repair, upgrade, etc.
• Scalable
– accommodates steady or increasing demand for product
18
Principles
• Goal
– Help scientists become information providers in a federated data system
• Prime Directive
– Minimal disruption of a working scientist’s computational environment
• Ultimate product
– Software, system architecture, and procedures for turning science projects into a federation of providers
19
Model structure for MODIS snow-covered area and albedo
Basin mask
Processing lineage
Watershed info
MODIS cloud mask (48 bits)
MODIS 7 land bands (112 bits)
MODIS quality flags
Topography
MODIS snow cover and grain size
MODIS view angles
Solar zenith, azimuth
Snow fraction
Albedo
RMS error
Veg fraction
Soil fraction
Shade fraction
Open water fraction
Quality flag
20
Lineage: current best practice
21
ESSW: Our Earth System Science Workbench
Producer and consumer issues can both be addressed by a laboratory metaphor
• Experiment
– Network of models
– … ingesting / synthesizing data
– … generating products
• Laboratory
– Experiment execution environment
» Computing + storage = accessibility + scalability
• Lab Notebook
– Persistent storage that can be queried
– Keeps track of all experiments
» Documentation + lineage = accountability
22
Use existing science applications
• No “standard” Earth science computing environment
– commercial packages (ArcGIS, ENVI, MATLAB, …)
– public packages/models (MM5, MODTRAN, …)
– locally-developed codes
• Example: Snow cover from AVHRR
– commercial + standalone programs
– parameters highly customized for UCSB
• How do we get these programs to
– communicate
– cooperate
with the Earth System Science Workbench (ESSW), without rewriting?
[Diagram: Navigate (Manual/Automatic); Receive; Ingest and Calibrate; Rectify; Snow-Covered Area; Snow Maps]
23
Wrap Your App: Scripts talk to ESSW
• No changes, just additions
– Wrapper scripts
» Make program (groups) look like ESSW experiments
– ESSW daemon
» Converts wrapper output to database input
– ESSW database
» Stores converted wrapper output
[Diagram: Navigate (Manual/Automatic); Receive; Ingest and Calibrate; Rectify; Snow-Covered Area; Snow Maps]
[Diagram labels: ESSW Database; Perl API; ESSW daemon; XML + SQL; MySQL; JDBC; Java; Perl]
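The wrapper idea can be sketched in a few lines: run the existing program unchanged, then emit a structured report that a daemon could load into a database. The original wrappers were Perl scripts using the ESSW Perl API, which this transcript does not show, so the sketch below is in Python and everything in it is hypothetical: the function `run_wrapped`, the XML element names, and the file names. The experiment name `avhrr_ingest` is borrowed from the detailed-example slide, and a do-nothing child process stands in for the real science program.

```python
import subprocess
import sys
import time
import xml.etree.ElementTree as ET


def utc_now():
    """Current UTC time as an ISO 8601 string."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())


def run_wrapped(experiment, command, inputs, outputs):
    """Run `command` unchanged and return an XML report describing the run."""
    started = utc_now()
    completed = subprocess.run(command, capture_output=True, text=True)
    # Element and attribute names are invented for this sketch.
    report = ET.Element("experiment", {"name": experiment})
    ET.SubElement(report, "command").text = " ".join(command)
    ET.SubElement(report, "started").text = started
    ET.SubElement(report, "finished").text = utc_now()
    ET.SubElement(report, "exit_status").text = str(completed.returncode)
    for path in inputs:
        ET.SubElement(report, "input", {"path": path})
    for path in outputs:
        ET.SubElement(report, "output", {"path": path})
    return ET.tostring(report, encoding="unicode")


# Stand-in for a real science program: a child Python process that does nothing.
STAND_IN_PROGRAM = [sys.executable, "-c", "pass"]
report_xml = run_wrapped("avhrr_ingest", STAND_IN_PROGRAM, ["pass.raw"], ["scene.l1b"])
print(report_xml)
```

Note that the wrapper reports the inputs and outputs it was told about, not what the program actually touched.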
24
Detailed example
[Diagram labels: AVHRR telemetry ingest; AVHRR Level 0 product; AVHRR Level 1B product; Hand navigation procedure; Hand navigation details; AVHRR Level 1B: navigated; Copy navigated image; Multi-channel snow-covered area algorithm; Snow-covered area; SCA: navigated]
[Diagram identifiers: avhrr_ingest; avhrr_L0; avhrr_l1b; avhrr_handNav; avhrr_navd_l1b; avhrr_copyNav; avhrr_snowModel; avhrr_sca; avhrr_navd_sca]
25
ESSW Lessons
• Providers are customers
– Federations aren’t much good unless scientists are happy to put information in them
• A light touch is the right touch
– Wrapping is easier for scientists and their programmers to deal with than complete re-engineering
• Scientists do write scripts, but not necessarily Perl
– Scripting (gluing stuff together) comes naturally to scientists
• Scientists don’t write DTDs
• Nobody calls metadata APIs
ESSW was automatic, but not automatic enough…
26
ES3 : Earth System Science Server
[Diagram: two Back Up Bricks (BUBs), each a cheap server with RAID 5 controller plus a mirror cheap server with RAID 5 controller; access paths labeled read, read (backup), write]
data lineage tracking
BUB data storage
ROCKS processing clusters
Alexandria Digital Library
Microsoft TerraServer
MODster
OpenDAP
MODIS
Corona
AVHRR
Watershed-scale snow product
Global-scale snow product
27
From ESSW to ES3: Summary
• Perl wrappers → Probulators
• Perl API → web services + RDF messages
• SQL → XML database(s)
28
From wrappers to probulators
Wrappers: active lineage
• Good
– Complete control over what gets recorded
– Single language/API for all wrapped events
– Not tied to execution
» You can even lie about what happened
• Bad
– Must explicitly script everything
– Scripts can drift from reality
» You can even lie about what happened
29
From wrappers to probulators
Probulators: passive lineage
• Good
– Record what actually happened
» Not just what you think happened
» Not what didn’t happen
– Automatic: don’t have to write new scripts for everything
• Bad
– Different flavors for different environments
» Can’t just do everything in Perl…
30
Probulator flavors
• Instrumentation
– Insert lineage capture instructions directly into science codes
» e.g. “I just created file ‘foo’”
– Typical implementation: preprocessor/precompiler
• Overriding
– Replace standard routines/libraries with lineage-capturing versions
» e.g. open(…) → snoopy_open(…)
– Typical implementation: modify execution environment
» environment variables
» configuration files
• Passive monitoring
– Trace program execution
» e.g. “called open() with args foo, bar, …”
– Typical implementation: strace’d shell
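The overriding flavor can be illustrated without touching the science code at all. The sketch below temporarily swaps Python's built-in `open()` for a lineage-capturing version, in the spirit of the slide's open(…) → snoopy_open(…) example. It is only an analogy: ES3's actual probulators are not shown in this transcript, and the event format, the `lineage_capture` context manager, and the `science_code` stand-in are all hypothetical.

```python
import builtins
import contextlib
import os
import tempfile


@contextlib.contextmanager
def lineage_capture(events):
    """Temporarily replace the built-in open() with a lineage-capturing version."""
    original_open = builtins.open

    def snoopy_open(file, mode="r", *args, **kwargs):
        handle = original_open(file, mode, *args, **kwargs)
        # Record the event only after the real open() has succeeded.
        writing = any(flag in mode for flag in "wax+")
        events.append({"event": "wrote" if writing else "read", "file": str(file)})
        return handle

    builtins.open = snoopy_open
    try:
        yield events
    finally:
        builtins.open = original_open  # always restore the standard routine


def science_code(path):
    """Stand-in for an unmodified science program; it just calls open()."""
    with open(path, "w") as out:
        out.write("0.8\n")
    with open(path) as src:
        return src.read()


events = []
with tempfile.TemporaryDirectory() as workdir:
    foo = os.path.join(workdir, "foo")
    with lineage_capture(events):
        science_code(foo)
for event in events:
    print(event["event"], os.path.basename(event["file"]))
```

Because the override lives in the execution environment, `science_code` needs no edits, which is the point of this flavor; the cost is that each language or runtime needs its own override.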
31
ES3 lineage architecture
[Diagram labels: probulator 1 … probulator n; logger; transmitter; ES3 core; log files]
32
Now What?
• Probulator reports not universally unique
– Q: How to hook separate reports together?
– A: Logger assigns UUIDs to
» Data streams
» Processes
» Jobs (workflows)
• Lineage not explicit
– Q: How to publish lineage?
– A: ES3 Core builds serialized graph
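The two answers on this slide can be sketched together: a logger that hands out one UUID per distinct data stream or process, and a core step that links the reports into a graph and serializes it. This is a minimal sketch under stated assumptions, not ES3's design. The report format, the `Logger` and `build_graph` names, and the choice of JSON are hypothetical; the identifiers come from the detailed-example slide, but which process reads or writes which product is assumed here for illustration. Jobs (workflows) could get UUIDs the same way with a third kind.

```python
import json
import uuid


class Logger:
    """Hands out one UUID per distinct data stream or process name."""

    def __init__(self):
        self._ids = {}

    def uuid_for(self, kind, local_name):
        key = (kind, local_name)
        if key not in self._ids:
            self._ids[key] = str(uuid.uuid4())
        return self._ids[key]


def build_graph(logger, reports):
    """Link probulator-style reports into a lineage graph and serialize it."""
    nodes = {}
    edges = []

    def node(kind, name):
        node_id = logger.uuid_for(kind, name)
        nodes[node_id] = {"kind": kind, "label": name}
        return node_id

    for report in reports:
        process_id = node("process", report["process"])
        for name in report.get("read", []):
            edges.append({"from": node("data", name), "to": process_id})
        for name in report.get("wrote", []):
            edges.append({"from": process_id, "to": node("data", name)})
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2, sort_keys=True)


# Hypothetical reports; the read/wrote pairings are assumed for illustration.
REPORTS = [
    {"process": "avhrr_ingest", "read": ["avhrr_L0"], "wrote": ["avhrr_l1b"]},
    {"process": "avhrr_snowModel", "read": ["avhrr_l1b"], "wrote": ["avhrr_sca"]},
]
graph_json = build_graph(Logger(), REPORTS)
print(graph_json)
```

Because both reports mention `avhrr_l1b` and the logger returns the same UUID each time, the two separate reports join into one chain from the Level 0 product to the snow-covered area product.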
33
Products available from http://www.snow.ucsb.edu (forthcoming)
• Fractional snow-covered area, grain size (and contaminants) from daily MODIS images
– Quality flags for cloud cover, highly oblique viewing
– Fractional coverage of other endmembers
• Best estimate of snow-covered area and broadband albedo on that date
– Extrapolating from previous values to that date and smoothing
• End-of-season reanalysis of daily snow-covered area and broadband albedo
– Interpolation, smoothing, comparison with in situ snow pillow data