1. Cyberinfrastructure to integrate simulation, data and sensors for collaborative eScience in CReSIS
CERSER and CReSIS, http://nia.ecsu.edu/
Elizabeth City State University, October 19 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
[email_address] http://www.infomall.org
2. Abstract
- Cyberinfrastructure supports eScience, or collaborative science, with distributed scientists, computers, data repositories and sensors.
- We describe the emerging Grid software for eScience and the underlying Cyberinfrastructure such as the TeraGrid.
- We give one example in detail: iSERVO, the International Solid Earth Research Virtual Organization supporting Earthquake Science
- This illustrates Computing Grids, Geographical Information System Grids and Sensor Grids
- We suggest implications for CReSIS, the Center for Remote Sensing of Ice Sheets
3. Why Cyberinfrastructure Useful
- Supports distributed science: data, people, computers
- Exploits Internet technology (Web 2.0), adding management, security, supercomputers etc.
- It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
- Parallel needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem
- Distributed aspect integrates already distinct components
- Cyberinfrastructure is in general a distributed collection of parallel systems
- Grids are made of services that are just programs or data sources packaged for distributed access
4. e-moreorlessanything and the Grid
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it", from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
- e-Science is about developing tools and technologies that allow scientists to do faster, better or different research
- Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
  - The growing use of outsourcing is one example
- The Grid provides the information technology e-infrastructure for e-moreorlessanything.
- A deluge of data of unprecedented and inevitable size must be managed and understood.
- People, computers, data and instruments must be linked.
- On demand assignment of experts, computers, networks and storage resources must be supported
5. TeraGrid: Integrating NSF Cyberinfrastructure
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Today 100 teraflops; tomorrow a petaflop; Indiana 20 teraflops today.
Map labels: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, Wisc
6. Virtual Observatory Astronomy Grid
Integrate Experiments
Image panels: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Galaxy Density Map
7. Grid Capabilities for Science
- Open technologies for any large scale distributed system, adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
  - Security, Reliability, Management and state standards
- Service and messaging specifications
- User interfaces via portals and portlets virtualizing to desktops, email, PDAs etc.
  - ~20 TeraGrid Science Gateways (their name for portals)
  - OGCE Portal technology effort led by Indiana
- Uniform approach to access distributed (super)computers supporting single (large) jobs and spawning lots of related jobs
- Data and meta-data architecture supporting real-time and archives as well as federation
  - Links to Semantic web and annotation
- Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
- Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment
  - http://www.nsf.gov/od/oci/ci-v7.pdf
8. APEC Cooperation for Earthquake Simulation
- ACES is a seven year-long collaboration among scientists interested in earthquake and tsunami prediction
  - iSERVO is Infrastructure to support work of ACES
  - SERVOGrid is (completed) US Grid that is a prototype of iSERVO
  - http://www.quakes.uq.edu.au/ACES/
- Chartered under APEC, the Asia Pacific Economic Cooperation of 21 economies
9. Grid of Grids: Research Grid and Education Grid
Figure labels: Database; Analysis and Visualization Portal; Repositories, Federated Databases; Data Filter Services; Streaming Data; Sensors; SERVOGrid Research Simulations; Research; Education; Customization Services; From Research to Education; Education Grid Computer Farm; Sensor Grid; Database Grid; Compute Grid; Field Trip Data; Discovery Services; GIS Grid
10. SERVOGrid and Cyberinfrastructure
- Grids are the technology based on Web services that implement Cyberinfrastructure i.e. support eScience or science as a team sport
  - Internet scale managed services that link computers, data repositories, sensors, instruments and people
- There is a portal and services in SERVOGrid for
  - Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs ..
  - Job management and monitoring web services for running the above codes.
  - File management web services for moving files between various machines.
  - Geographical Information System services
  - QuakeTables earthquake specific database
  - Sensors as well as databases
  - Context (dynamic metadata) and UDDI system long term metadata services
  - Services support streaming real-time data
11. Figure labels: Topography; 1 km; Stress Change; Earthquakes; PBO; Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Ice Sheets; Volcanoes; Long Valley, CA; Northridge, CA; Hector Mine, CA; Greenland
12. Some Grid Concepts I
- Services are just (distributed) programs sending and receiving messages with well defined syntax
- Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
  - Services can be in any language: Fortran, Shell scripts, C, C#, C++, Java, Python, Perl; your choice!
  - Web Services supported by all vendors (IBM, Microsoft)
- Service overhead will be just a few milliseconds (more now) which is < typical network transit time
  - Any program that is distributed can be a Web service
  - Any program with an execution time of 20 ms or more can be an efficient Web service
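The overhead argument above is simple arithmetic, sketched here. The 2 ms default and the function name `service_efficiency` are assumptions standing in for "a few milliseconds"; they are not measurements from the talk.

```python
def service_efficiency(execution_ms: float, overhead_ms: float = 2.0) -> float:
    """Fraction of wall-clock time spent on useful work when a program
    is invoked as a service with a fixed per-call overhead.

    overhead_ms = 2.0 is an assumed stand-in for "a few milliseconds".
    """
    if execution_ms < 0 or overhead_ms < 0:
        raise ValueError("times must be non-negative")
    total = execution_ms + overhead_ms
    return execution_ms / total if total > 0 else 0.0


if __name__ == "__main__":
    # Compare a very short call, a 20 ms call and a 1 s simulation step.
    for exec_ms in (1, 20, 1000):
        share = service_efficiency(exec_ms)
        print(f"{exec_ms:>5} ms of work -> {share:.1%} of time is useful")
```

Under this assumed overhead, a 20 ms program already spends most of its time on useful work, while a 1 ms call is dominated by the service overhead.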
13. Web services
- Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
- Web Services interact by exchanging messages in SOAP format
- The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
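To illustrate what "exchanging messages in SOAP format" means, here is a minimal sketch that builds and reads back a SOAP 1.1 envelope with the Python standard library. The application namespace `urn:example:grid-service` and the operation name `runGeoFEST` are hypothetical; a real service would take both from its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:grid-service"  # hypothetical application namespace


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1] if "}" in tag else tag


def build_soap_request(operation: str, params: dict) -> bytes:
    """Serialize an operation name and its parameters as a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_soap_request(message: bytes) -> tuple:
    """Recover (operation, params) from an envelope built by build_soap_request."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = list(body)[0]  # first child of Body is the operation element
    params = {_local_name(child.tag): child.text for child in call}
    return _local_name(call.tag), params


if __name__ == "__main__":
    request = build_soap_request("runGeoFEST", {"mesh": "fault.msh", "steps": 10})
    print(request.decode("utf-8"))
    print(parse_soap_request(request))
```

Note that all parameter values travel as text, so the receiver sees `steps` as the string "10" unless a schema says otherwise.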
14. A typical Web Service
- In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
- The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
Figure labels: Payment, Credit Card; Warehouse, Shipping control; WSDL interfaces; Web Services; Security; Catalog; Portal Service
15. Some Grid Concepts II
- Systems are built from contributions from many different groups; you do not need one vendor for all components as Web services allow interoperability between components
  - One reason DoD likes Grids (called Net-Centric computing)
- Grids are distributed in services and data, allowing anybody to store their data and to produce their view
  - Some think that University Library of future will curate/store data of their faculty
- 2 level programming model: classic programming of services, and services are composed using workflow consistent with industry standards (BPEL)
- Grid of Grids (System of Systems): realistically, Grid-like systems will be built using multiple technologies and standards; integrate separate Grids for Sensors, GIS, Visualization, computing etc. with OGSA (Open Grid Services Architecture from OGF) system Grid (Security, registry) into a single Grid
- Existing codes UNCHANGED; wrap as a service with metadata
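A minimal sketch of the "wrap as a service with metadata" idea follows. The `ServiceMetadata` record and `invoke` function are hypothetical names for illustration, not SERVOGrid interfaces; the Python interpreter stands in for a legacy Fortran or C++ executable so that the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class ServiceMetadata:
    """Hypothetical metadata record: where a code lives and how to invoke it."""
    name: str
    executable: str
    working_dir: str = "."
    fixed_args: list = field(default_factory=list)


def invoke(meta: ServiceMetadata, user_args: list) -> str:
    """Run the unchanged program described by `meta` and return its stdout."""
    command = [meta.executable, *meta.fixed_args, *user_args]
    result = subprocess.run(command, cwd=meta.working_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a legacy binary: a one-line program run by the interpreter.
    legacy = ServiceMetadata(
        name="uppercase-code",
        executable=sys.executable,
        fixed_args=["-c", "import sys; print(sys.argv[1].upper())"],
    )
    print(invoke(legacy, ["mesh ready"]))
```

The wrapped program is never modified; only the metadata record changes when the code moves to another machine or directory.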
16. TeraGrid User Portal
17. LEAD Gateway Portal
NSF Large ITR and TeraGrid Gateway
- Adaptive Response to Mesoscale weather events
- Supports Data exploration, Grid Workflow
18. Grid Workflow Data Assimilation in Earth Science
- Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts
Use a Portlet-based user portal to access and control services and workflow
19. SERVOGrid has a portal
- The Portal is built from portlets providing user interface fragments for each service that are composed into the full interface; it uses OGCE technology, as does the planetary science VLAB portal with University of Minnesota
20. GIS and Sensor Grids
- OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors
- GML (Geography Markup Language) defines specification of geo-referenced data
- SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors
- Services like Web Map Service, Web Feature Service, Sensor Collection Service define services interfaces to access GIS and sensor information
- Grid workflow links services that are designed to support streaming input and output messages
- We built Grid (Web) service implementations of these specifications for NASA's SERVOGrid
- Use Google maps as front end to WMS and WFS
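As an illustration of the Web Map Service interface mentioned above, the sketch below assembles a WMS 1.1.1 GetMap request URL from its standard query parameters. The server address `http://example.org/wms` and the layer name `faults` are placeholders, not SERVOGrid endpoints.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (min_x, min_y, max_x, max_y) in the units of `srs`.
    """
    query = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder server and layer; the bounding box roughly covers Southern California.
    url = wms_getmap_url("http://example.org/wms", ["faults"],
                         (-120, 32, -114, 36), width=512, height=256)
    print(url)
    print(parse_qs(urlsplit(url).query, keep_blank_values=True)["BBOX"])
```

A map client such as a Google maps front end would issue requests of this shape and overlay the returned image.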
21. Grid Workflow Datamining in Earth Science
- Work with Scripps Institute
- Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California
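The chain of filters applied to a sensor stream can be sketched with Python generators, each stage consuming the previous one lazily. Everything here is illustrative: the stage names, the thresholds and the sample values are assumptions, and the jump detector is a deliberately simple stand-in for the data-mining stage, not a Hidden Markov Model.

```python
from typing import Iterable, Iterator, Optional


def check(samples: Iterable[Optional[float]], limit: float = 1000.0) -> Iterator[float]:
    """Data checking: drop missing readings and values outside +/- limit."""
    for value in samples:
        if value is not None and abs(value) <= limit:
            yield value


def transform(samples: Iterable[float], reference: float) -> Iterator[float]:
    """Transformation: express each reading as an offset from a reference."""
    for value in samples:
        yield value - reference


def detect_jumps(samples: Iterable[float], threshold: float) -> Iterator[tuple]:
    """Stand-in for data mining: yield (index, value) when a reading
    differs from its predecessor by more than threshold."""
    previous = None
    for index, value in enumerate(samples):
        if previous is not None and abs(value - previous) > threshold:
            yield index, value
        previous = value


if __name__ == "__main__":
    # Made-up position readings with one gap, one glitch and one real step.
    raw = [10.0, 10.1, None, 10.2, 5000.0, 14.9, 15.0]
    pipeline = detect_jumps(transform(check(raw), reference=10.0), threshold=1.0)
    for index, value in pipeline:
        print(f"possible event at cleaned sample {index}: offset {value:.2f}")
```

Because each stage is a generator, readings flow through one at a time, which is the property a streaming workflow needs.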
Figure labels: NASA GPS; Earthquake; Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS)
22. Earth/Atmosphere Grids built as Grids of (library) Grids
Figure labels: Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations; Physical Network; Registry; Metadata; Earthquake Data, Filters & Simulation Services; Earthquake SERVOGrid; Ice Sheet PolarGrid; Data Access/Storage; Portals; Visualization Grid; Collaboration Grid; Sensor Grid; Compute Grid; GIS Grid; Core Grid Services; TornadoGrid; Security; Workflow; Notification; Messaging
23. CReSIS PolarGrid
- Important CReSIS-specific Cyberinfrastructure components include
  - Managed data from sensors and satellites
  - Data analysis such as SAR processing, possibly with parallel algorithms
  - Electromagnetic simulations (currently commercial codes) to design instrument antennas
  - 3D simulations of ice-sheets (glaciers) with non-uniform meshes
  - GIS Geographical Information Systems
- Also need capabilities present in many Grids
  - Portal i.e. Science Gateway
  - Submitting multiple sequential or parallel jobs
24. What should we do?
- Identify existing programs that should be wrapped as Grid services
  - One can do this even for commercial codes as one keeps existing codes (Fortran, C++) unchanged and constructs a metadata wrapper defining where the program and its data are located and how to invoke it.
- Identify where parallel versions needed and if help needed in creating these
  - Parallel codes can be Grid services
  - Electromagnetic codes are commercial; in principle parallel
  - Ice sheet models can be parallelized for high resolution simulations
- Scope out system: computational needs (identify value of TeraGrid); data storage needs; network requirements
- Examine data model and produce a data Grid architecture
  - Use databases? Distributed? Metadata? Files? What are key performance issues?
- Examine integration of GIS with Grid Services
- Design and implement Science Gateway
- Are there important visualization requirements outside GIS?
- Are there key issues from security?
- Bring up core services such as registries
- Need infrastructure to run services (Linux PC)
25. Benefits of CReSIS PolarGrid
- Shared resources support collaboration among CReSIS scientists
- Integration of Polar related data with appropriate compute resources enabling research on specific topics and studies across topics
- Polar Science Gateway accessing common services (programs), data and their integration as workflow
- Access to TeraGrid with same interface for large scale simulations
- Can share common capabilities (SAR analysis, GIS) with related Grids such as SERVOGrid, GEON, LEAD etc.
- Modular Grid services allow exchange of new capabilities preserving systems
  - e.g. Change EM Simulation service
- Management of dynamic heterogeneous data
26. SERVO/QuakeSim Services Eye Chart
Service: Description
- Web Feature Service: We've built a Web Service version of this OGC standard. We've extended it to support data streaming for increased performance.
- Web Map Service: We built a Web Service version of this Open Geospatial Consortium specification. The WMS constructs images out of abstract feature descriptions.
- Information Service: We have built data model extensions to UDDI to support XPath queries over Geographical Information System capability.xml files. This is designed to replace OGC (Open Geospatial Consortium) Web registry service.
- Authentication and Authorization: This uses capabilities built into portal. Note that simulations are typically performed on machines where user has accounts while data services are shared for read access.
- Portal: We use an OGCE based portal based on portlet architecture.
- File Services: We built a file web service that could do uploads, downloads, and crossloads between different services. Clearly this supports specific operations such as file browsing, creation, deletion and copying.
- Application and Host Metadata Service: We have an Application and a Host Descriptor service based on XML schema descriptors. Portlet interfaces allow code administrators to make applications available through the browser.
- Context Data Service: We store information gathered from users' interactions with the portal interface in a generic, recursively defined XML data structure. Typically we store input parameters and choices made by the user so that we can recover and reload these later. We also use this for monitoring remote workflows. We have devoted considerable effort into developing WS-Context to support the generalization of this initial simple service.
- Specific Applications (Virtual California, GeoFEST, Park, RDAHMM ..): These can be all launched by a single Job Management service or by custom instances of this with metadata preset to a particular application.
- Job Management: SERVO wraps Apache Ant as a web service and uses it to launch jobs. For a particular application, we design a build.xml template. The interface is simply a string array of build properties called for by the template. We've also built a simple generic template engine version of this.
27. Service Eye Chart Continued
Key interfaces/standards/software NOT used (often just for historical reasons as project predated standard): WS-Security, JSDL, WSRF, BPEL, OGSA-DAI
Key interfaces/standards/software used: GML, WFS, WMS, WSDL, XML Schema with pull parser XPP, SOAP with Axis 1.x, UDDI, WS-Context, JSR-168, JDBC, Servlets; WS-Management and VOTables in research
Service: Description
- Data Tables Web Service: We are developing a Web Service based on the National Virtual Observatory's VOTables XML format for tabular data. We see this as a useful general format for ASCII data produced by various application codes in SERVO and other projects.
- Scientific Plotting Services: We are developing Dislin-based scientific plotting services as a variation of our Web Map Service: for a given input service, we can generate a raster image (like a contour plot) which can be integrated with other scientific and GIS map plot images.
- QuakeTables Database Services: The USC QuakeTables fault database project includes a web service that allows you to search for Earthquake faults.
- Notification Service: This supplies alerts to users when filters (data-mining) detect features of interest.
- Messaging Service: This is used to stream data in workflow fed by real-time sources. It is based on NaradaBrokering which can also be used in cases just involving archival data.
- Sensor Grid Services: We are developing infrastructure to support streaming GPS signals and their successive filtering into different formats. This is built over NaradaBrokering (see messaging service). This does not use Web Services as such at present but the filters can be controlled by HPSearch services.
- Workflow/Monitoring/Management Services: The HPSearch project uses HPSearch Web Services to execute JavaScript workflow descriptions. It has more recently been revised to support WS-Management and to support both workflow (where there are many alternatives) and system management (where there is less work). Management functions include life cycle of services and QoS for inter-service links.
28. Key GIS and Related Services
Component: Description
- Sensor Grid: Publish/subscribe system allows data streams to be reorganized using topics.
- Web Map Services: Supports integration of local and remote map services; treats Google maps as an OGC-compliant map server.
- Web Feature Service: Supports both streaming and non-streaming returns of query results.
- WS-Context: Contexts can be used to hold arbitrary content (XML, URIs, name-value pairs); can be used to support distributed session state as well as persistent data; currently researching scalability.
- HPSearch: Support for streaming data between services; supports scriptable workflows so not limited to DAGs; implementation of WS-Distributed Management.
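The Sensor Grid entry above describes topic-based publish/subscribe. The following in-process sketch shows only the concept: `TopicBroker` and the topic names are hypothetical, and this is not the NaradaBrokering API, which is a distributed messaging system.

```python
from collections import defaultdict
from typing import Callable


class TopicBroker:
    """Minimal in-process publish/subscribe broker keyed by topic string."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        """Register a callback to receive every message published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to every subscriber of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    broker = TopicBroker()
    received = []
    # Hypothetical topic names: one raw stream and one filtered stream.
    broker.subscribe("gps/raw", received.append)
    broker.subscribe("gps/raw", lambda msg: broker.publish("gps/filtered", msg))
    broker.subscribe("gps/filtered", lambda msg: print("filtered:", msg))
    broker.publish("gps/raw", {"station": "A", "offset_mm": 4.9})
    print(len(received))
```

Republishing from one topic to another, as the second subscriber does, is how a stream can be reorganized without the original publisher knowing who consumes it.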