Community Earth Science Informatics Initiatives & Their Impacts Lee Allison, Arizona Geological...


Community Earth Science Informatics Initiatives & Their Impacts

Lee Allison, Arizona Geological Survey
Association of American State Geologists

In 5-10 years, if your data are not online in an integrated, interoperable network, you won’t exist.

Prediction

200 million+ websites – if you don’t have a website, you don’t exist.

1000’s of National and Regional Databases

• topographic, orthoimagery, hydrography
• mineral resources
• water
• geochemistry
• geophysics (aeromag, gravity, aerorad)
• earthquake catalogs
• biological surveys
• vegetation/speciation maps

Tower of Babel

Conclusions: Growing Consensus for an NGS

Goals – interoperable, distributed, Web-service based, synoptic 4-D system

Challenges
• Technical – adapting-adopting existing capabilities
• Cultural, organizational – controls, recognition

How do we get there?
• Agreement on standards, protocols, architecture
• Geological Surveys as data archives, providers
• Parallel community efforts are linking
• Implementation is underway
• Sustainability is an issue

Current electronic delivery

The Goal

Most of the technology exists

Challenges are cultural and organizational

One system to rule them all, one system to find them, one system to bring them all, and in the darkness bind them.

With apologies to JRR Tolkien

How do we get there?

NSF to the Solid Earth Sciences: how do you build a sustainable community system? - 2-year community engagement process underway

Earth science cyberinfrastructure

Distributed, Web-based, Interoperable

Early paradigm: Central databases for each topic

Goal is making data interoperable

Ian Jackson, BGS

Interoperability

"The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units." ISO/IEC 2382-01 (SC36 Secretariat, 2003)

Example: the electrical utility

Simple interface – put plug in wall, get electricity

Country          Voltage   Frequency
Afghanistan      220 V     50 Hz
Andorra          230 V     50 Hz
Anguilla         110 V     60 Hz
Antigua          230 V*    60 Hz
Cayman Islands   120 V     60 Hz
Cyprus           240 V     50 Hz
Czech Republic   230 V     50 Hz
…

Complexity

Other complex things

National Geoinformatics System

• “Killer applications”
• Use cases & best practices in meeting stakeholder needs
• Data discovery, catalogs, inventories, metadata profiles, metadata aggregation service(s) – 4D search engines
• Informatics specifications, data model, interoperability, & standards
• Web portal & Registry development and implementation
• Accessing & licensing protocols, recognition & credit
• Community of practice
• Communication, dissemination, & awareness
• Ontologies, vocabularies
• Access to high-resolution spatial geological & applied datasets
• “Big Iron” – high performance computing
• Digitization of legacy data
• Liaison and integration with related groups & initiatives
• Sustainability

Computer printer services

Old days

[Diagram: Word Processor and Spreadsheet each connect through their own HP, Calcomp, and Brothers drivers (Driver1, Driver2) to the HP printer, Calcomp plotter, and Brothers printer.]

Each application has a driver for each printer

Now

[Diagram: Word Processor and Spreadsheet each connect through one printer driver (wrapper) to a printing service; metafile interpreters (wrappers) connect the service to the Laserwriter, film writer, and large format inkjet.]

Printing service uses Metafile = interchange format

Advantages
• One driver (wrapper) per application
• Application need know nothing about the printer: separation of concerns
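The printing-service arrangement can be sketched in a few lines of Python. This is only an illustration and is not taken from the slides: the class names, the function names, and the text-based `Metafile` are all hypothetical. It shows each application writing the interchange format once, while each device supplies a single interpreter.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Metafile:
    """Hypothetical interchange format: device-neutral drawing commands."""
    commands: list[str]


class Interpreter(Protocol):
    """One wrapper per output device; it alone knows the device details."""
    def render(self, metafile: Metafile) -> str: ...


class LaserwriterInterpreter:
    def render(self, metafile: Metafile) -> str:
        return "laserwriter:" + "|".join(metafile.commands)


class FilmWriterInterpreter:
    def render(self, metafile: Metafile) -> str:
        return "filmwriter:" + "|".join(metafile.commands)


class PrintingService:
    """Routes a metafile to whichever device interpreter was registered."""
    def __init__(self) -> None:
        self._devices: dict[str, Interpreter] = {}

    def register(self, name: str, interpreter: Interpreter) -> None:
        self._devices[name] = interpreter

    def submit(self, device: str, metafile: Metafile) -> str:
        return self._devices[device].render(metafile)


def word_processor_export(text: str) -> Metafile:
    # The application's single "driver": it only knows the metafile format.
    return Metafile(commands=[f"TEXT {line}" for line in text.splitlines()])


def spreadsheet_export(rows: list[list[int]]) -> Metafile:
    return Metafile(commands=["ROW " + ",".join(map(str, r)) for r in rows])


service = PrintingService()
service.register("laserwriter", LaserwriterInterpreter())
service.register("filmwriter", FilmWriterInterpreter())

doc_output = service.submit("laserwriter", word_processor_export("hello\nworld"))
sheet_output = service.submit("filmwriter", spreadsheet_export([[1, 2], [3, 4]]))
```

With A applications and P printers, the old arrangement needs A × P drivers, while the service arrangement needs only A + P wrappers.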

Interoperability via web service

[Diagram: data providers GA, BGS, USGS (NGMDB), and GSC, each with its own schema, connect through wrappers to Web Services; the client connects through its own wrapper.]

• Communication between service providers and clients takes place using XML markup.
• Use of a standard markup language means schema mapping only needs to be done once.
• Participants implement one interface for each service.
• Applications focus on application logic, not data access.
• Wrapper implements the interface to the service: formulate requests, interpret results.
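The "map your schema once" idea can be sketched as follows. This is a minimal illustration under stated assumptions: the two surveys, their field names, and the shared element names (`GeologicUnit`, `name`, `lithology`) are invented for the example and are not the real GeoSciML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical local records: each survey keeps its own field names.
SURVEY_A_ROWS = [{"unit_nm": "Coconino Sandstone", "lith": "sandstone"}]
SURVEY_B_ROWS = [{"FormationName": "Kaibab Formation", "RockType": "limestone"}]

# Each provider maps its schema to the shared one exactly once.
# The shared element names are illustrative, not real GeoSciML.
SURVEY_A_MAPPING = {"unit_nm": "name", "lith": "lithology"}
SURVEY_B_MAPPING = {"FormationName": "name", "RockType": "lithology"}


def wrap(rows, mapping, provider):
    """Provider-side wrapper: translate local rows into the shared XML markup."""
    root = ET.Element("GeologicUnits", provider=provider)
    for row in rows:
        unit = ET.SubElement(root, "GeologicUnit")
        for local_field, shared_field in mapping.items():
            ET.SubElement(unit, shared_field).text = row[local_field]
    return ET.tostring(root, encoding="unicode")


def read_units(xml_text):
    """Client-side wrapper: interpret results without knowing any local schema."""
    root = ET.fromstring(xml_text)
    return [
        (unit.findtext("name"), unit.findtext("lithology"))
        for unit in root.iter("GeologicUnit")
    ]


responses = [
    wrap(SURVEY_A_ROWS, SURVEY_A_MAPPING, "survey-a"),
    wrap(SURVEY_B_ROWS, SURVEY_B_MAPPING, "survey-b"),
]
units = [unit for xml_text in responses for unit in read_units(xml_text)]
```

The client code in `read_units` never sees `unit_nm` or `FormationName`; adding a third provider means writing one more mapping, not changing the client.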

GeoSciML developers

• GeoServer – Keyworth, UK
• GeoServer – Canberra
• Cocoon – Uppsala, Sweden
• Mapserver – Arizona
• Cocoon – Ottawa, Canada
• GeoServer – Melbourne, Australia
• Cocoon – Virginia, USA
• Ionic – Orleans, France
• Tsukuba, Japan

Mark-up language “wrapper” translates your data

Using a web service – step 1

GeoSciML Web Services: Request

Web service request – step 2

GeoSciML Web Services: Response

Web service response – part 1

Web service response – part 2
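A request and response exchange of this kind might look as follows. This is a sketch under assumptions: the slides do not say which service type is used, so it assumes an OGC Web Feature Service (WFS 1.1.0) with key-value parameters. The endpoint URL and the sample response are placeholders, and the response element names are simplified rather than taken from the GeoSciML schema.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
import xml.etree.ElementTree as ET

# Hypothetical endpoint and feature type; substitute a real service's values.
ENDPOINT = "https://example.org/geoserver/wfs"
FEATURE_TYPE = "gsml:MappedFeature"


def build_getfeature_url(endpoint, type_name, max_features=10):
    """Formulate a WFS 1.1.0 GetFeature request as a key-value URL."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "maxFeatures": str(max_features),
    }
    return endpoint + "?" + urlencode(params)


# Stand-in for the XML a server might return; element names are simplified
# placeholders, not the real GeoSciML schema.
SAMPLE_RESPONSE = """
<FeatureCollection>
  <member><MappedFeature id="mf.1"><unitName>Unit A</unitName></MappedFeature></member>
  <member><MappedFeature id="mf.2"><unitName>Unit B</unitName></MappedFeature></member>
</FeatureCollection>
"""


def parse_response(xml_text):
    """Interpret the response: pull out feature ids and unit names."""
    root = ET.fromstring(xml_text.strip())
    return {
        feature.get("id"): feature.findtext("unitName")
        for feature in root.iter("MappedFeature")
    }


url = build_getfeature_url(ENDPOINT, FEATURE_TYPE)
query = parse_qs(urlsplit(url).query)
features = parse_response(SAMPLE_RESPONSE)
```

No network call is made here; in practice the URL would be fetched and the returned XML passed to a parser that understands the full schema.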

Geoscience Information Network (GIN)

ORGANIZATION: Unique missions of geological surveys - collect, archive, disseminate data

2,000 – 3,000 databases 1000’s of collections

80,000+ geologic maps

Distributed, Web-based, Interoperable

We agree on a data network that:
• is distributed (vs centralized)
• is interoperable
• uses open source standards and common protocols (OGC, GeoSciML)
• respects and acknowledges data ownership
• fosters communities of practice to grow
• facilitates development of new web services and clients

System overview

GIN

Geologic map service scenario

[Diagram: survey map servers register with a catalog (NGMDB? OneGeology? NDC? GEON? NGDS?); an ArcGIS client searches the catalog via OGC CSW and loads maps into ArcMap via OGC WMS.]
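The register, discover, and access steps of this scenario can be sketched in Python. This is an illustration under assumptions: the catalog is an in-memory list standing in for an OGC CSW endpoint, and all titles, URLs, and layer names are hypothetical. Only the WMS 1.1.1 GetMap parameter names come from the OGC specification.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# In-memory stand-in for a catalog; a real deployment would query an
# OGC CSW endpoint instead. All names and URLs here are hypothetical.
catalog = []


def register(title, keywords, wms_url, layer):
    """Registration: a survey map server announces a layer to the catalog."""
    catalog.append(
        {"title": title, "keywords": set(keywords), "wms_url": wms_url, "layer": layer}
    )


def search(keyword):
    """Discovery: return catalog records whose keywords include the term."""
    return [rec for rec in catalog if keyword in rec["keywords"]]


def getmap_url(record, bbox, width=800, height=600):
    """Access: build a WMS 1.1.1 GetMap URL for a discovered layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": record["layer"],
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return record["wms_url"] + "?" + urlencode(params)


register("Survey A geologic map", ["geology", "arizona"],
         "https://example.org/survey-a/wms", "geologic_units")
register("Survey B aeromagnetics", ["geophysics"],
         "https://example.org/survey-b/wms", "aeromag")

hits = search("geology")
url = getmap_url(hits[0], bbox=(-115.0, 31.0, -109.0, 37.0))
params = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

The client only needs the catalog address up front; each map server's URL and layer name come back from the search, so surveys can add or move services without clients changing.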

National Geologic & Geophysical Data Preservation Program

-$1M per year

-National inventory

-Metadata catalogue

-National Digital Catalogue

79,000+ maps, images, data, and products from 350+ publishers

Lexicon of Geologic Names of the United States

Data discovery -

Defining GIN

• collections of service definitions, interchange formats, and vocabularies
• independent of hardware, operating system, or lower-level network protocols
• new technology will only require implementation of network elements in a new environment
• architecture allows for the use of multiple conventions for different user groups

[Diagram: GIN at the center, surrounded by interchange format standards, community engagement, vocabularies, discovery tools, and service definitions.]

WWW
• http – hypertext transfer protocol (& ftp, etc)
• html – hypertext mark-up language
• url – uniform resource locator
• browser – built by others

GIN
• Open source standards – Open Geospatial Consortium
• data interchange tool – GeoSciML
• distributed data catalogues (National Geologic Map DB; National Data Catalogue, etc)
• Web services & applications – built by others

Challenges to building community

• Who sets the standards?
• Who controls the system?
• Who makes the decisions?

Interoperability

The network is voluntary, not imposed from above

• We won’t take your data away – they stay with you
• Your participation is voluntary
• Keep your formats, system, servers
• Will 3,000 interoperable data bases become an 800-lb gorilla?

GIN is partnering with the global Earth science community

PARTNERS & COLLABORATORS:
• AASG & USGS
• National Geoinformatics System
• OneGeology-Europe – 21 nations
• Marine Metadata Interoperability Initiative
• US DOE National Geothermal Data System (NGDS)
• US DOE Geothermal Technologies Program
• Energy Industry Metadata Standards Working Group - Energistics

MS SciScope – geospatial data discovery

Welcome to SciScope! SciScope is a tool by Microsoft Research to help geoscientists discover data from numerous data repositories with ease through a single, intuitive interface.

Users can display multiple map layers related to the scope of their study and interact with geographical features on the map including dams, rivers, water bodies, geology, aquifer systems, ecological regions and river basins.

GIN DEMO PROGRAM

NSF INTEROP GIN
• 3 year development of standards, services
• Demos in ~6 SGSs; ~$80K subcontracts

“Circuit Riders”
• Part trainer, part management consultant, part computer expert
• Write GeoSciML “wrappers”
• Guide server configurations
• Training, short courses
• $80K for demos across AASG

ADOPTION & DEPLOYMENT

US Dept. of Energy (May, 2009)
• National Geothermal Data System (NGDS)
• GIN architecture, standards
• $5M, 5 years
• Adopted by US Geothermal Technologies Program

National Geothermal Data System

[Diagram: NGDS linking discovery, access, exchange (GIN); distributed data sources; legacy data repository; desktop applications (GeoSciNet); portals (GeoSciNet, SciScope); and ontologies, vocabularies.]

National Geothermal Data System

• Data discovery, access, exchange: GIN
• Distributed content: geothermal community
• Legacy data repository: NGDS
• Desktop applications (economic modeling tool, etc): GeoSciNet
• Portals: GeoSciNet, SciScope

NATIONAL DEPLOYMENT

US DOE “Geothermal Data Development, Collection, and Maintenance”
• $20M, 1-5 awards
• AASG proposal submitted

106 nations

29 countries and European organizations have committed to creating a geological map at 1:1,000,000 scale, integrated with metadata initially available in the following languages: English, French, Italian, Spanish, Swedish, Czech and Norwegian.

Network sustainability

• tipping point at which users and providers will see the network as critical to their basic functions
• populating and using the network becomes a necessary cost of doing business
• how do we maintain network functions?

How do we get there?

NSF to the Solid Earth Sciences: how do you build a community system? - 2-year community engagement process underway

Geological Surveys as drivers? - USGS, 51 state surveys, 21 European surveys, 106+ nations

Linkage with other communities and natural science domains - MMI, OOS, CUAHSI-HIS, Geoscience Australia, iPlant, GBIF, ESIP, Energistics,…..

‘TIPPING POINT’

Energy Industry Metadata Standards Working Group
• End-to-end discovery, access, and exchange of upstream petroleum data

97 members
• American Geological Institute (AGI)
• Baker Hughes
• BP
• British Geological Survey (BGS)
• Chevron
• ConocoPhillips
• Department of Interior (U.S. DOI-BLM-MMS)
• Directorate General of Hydrocarbons (India) (DGH)
• ExxonMobil
• Ground Water Protection Council
• Halliburton
• IBM Corporation
• IFP - Institut Francais du Petrole
• Norwegian Petroleum Directorate (NPD)
• Open Geospatial Consortium (OGC)
• Pioneer Natural Resources
• SAIC-Science Applications Intl. Corp.
• Saudi Aramco
• Schlumberger
• Shell
• Smith International
• StatoilHydro ASA
• TOTAL
• Woodside Energy Inc.

Geoscience Information Network

http://usgin.org