Data Grid Research Group
Dept. of Computer Science and Engineering
The Ohio State University
Columbus, Ohio 43210, USA
David Chiu & Gagan Agrawal
Enabling Ad Hoc Queries over Low-Level Scientific Data Sets
2D. Chiu & G. Agrawal. Enabling Ad Hoc Queries over Low-Level Scientific Data Sets
SSDBM ’09
Presentation Outline
• Motivation
  ‣ Current Trends in Scientific Data Management
  ‣ Problem Discussion
• Data Registration Indexing
  ‣ Metadata Extraction
  ‣ Transformation
• Service Composition
• Conclusion
Increased tremendously over the years
Scientific Data Sets
• The collection of scientific data has increased over the years with new instruments, simulations, etc.
• Data sets are stored in repositories around the globe
• Just within U.S. entities in the geospatial domain
  ‣ NOAA: oceanic, climate, water quality, ...
  ‣ NASA: ozone, air quality, tropical, ...
  ‣ NRCS: land quality, watershed, ...
Data Repositories
Web or Data Grid Infrastructure
Mass Storage Systems (MSS)
Scientific Data Sets
• Data sets are typically low level, i.e.,
  ‣ Unstructured or semi-structured
    0101071895  0.34 -2.45  0.50 -0.65 -0.62 -0.71  0.00 -0.96
    0101071896 -1.71  0.49  0.27 -0.79 -1.53  0.60  0.09 -2.21
    0101071897 -0.53  0.14  4.32  1.95 -1.55 -1.68 -1.32 -0.69
    0101071898  1.90 -2.64 -1.70  1.11 -2.18 -1.08 -0.53 -0.25
    0101071899  0.44  0.97  1.65 -0.71 -2.02 -2.10 -0.50 -2.03
    0101071900 -1.65  1.19 -1.34  0.57 -1.37  7.00 -0.48 -1.77
    . . .
• However, data is well-documented
  ‣ Accompanying XML-based metadata describing data sets is typically required in today’s repositories
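To make "low level" concrete, here is a minimal Python sketch that reads rows shaped like the sample above. The slide does not say what the columns mean, so the field names `record_id` and `values` are placeholders; in practice the layout would come from the accompanying metadata.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Placeholder names: the slide does not document the column semantics.
    record_id: str
    values: tuple


def parse_line(line: str) -> Record:
    """Split one whitespace-delimited row into an ID and its float readings."""
    fields = line.split()
    return Record(record_id=fields[0], values=tuple(float(f) for f in fields[1:]))


# Two rows copied from the sample shown on the slide.
sample = """\
0101071895 0.34 -2.45 0.50 -0.65 -0.62 -0.71 0.00 -0.96
0101071896 -1.71 0.49 0.27 -0.79 -1.53 0.60 0.09 -2.21
"""

records = [parse_line(row) for row in sample.splitlines() if row.strip()]
```

Nothing in the file itself says what a value represents, which is why the metadata matters for querying.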
Data Repositories
Mass Storage Systems (MSS)
Grid/Web Services & portals
Web or Data Grid Infrastructure
Data Repositories on a Global Scale
US EU
AU ...
What Do the Users Want?
US
EU
AU
...
I don’t care where data is located.
I also want to share my own data with others!
Don’t just give me the data, but...
- Transform it
- Manipulate it
- Compose it with other processes and data sets
And do this with the least amount of work required from me!
System Goals
• To enable queries over low level data sets, which involves:
  ‣ identification of relevant data sets
  ‣ automatic planning for the composition of dependent services (processes) for derivation
• ... while being non-intrusive to existing schemes, i.e.,
  ‣ avoids a standardized format for storing data sets
  ‣ accommodates heterogeneous metadata
  ‣ this system should fit into existing MSS and scientific computing infrastructures (Data Grid & the Web)
That’s good and all, but...
Challenges
• Not without challenges...
  ‣ dealing with metadata from multiple entities
  ‣ efficiently identifying relevant data sets
  ‣ planning and executing accurate service compositions on the spot
• And without question, the need for DOMAIN KNOWLEDGE & SEMANTICS
The AUSPICE System
AUSPICE: Automatic Service Planning and Execution in Cloud/Grid Environments
The Semantics Layer
A Need for Domain Level Knowledge
• Assume the following service retrieves a satellite image pertaining to (x, y) at a resolution determined by r:
    get_sat_image(double x, double y, double r)
• Questions to ask the system:
  ‣ How to deduce that this service can be used?
  ‣ How to determine what information is needed for input?
  ‣ Did the user provide enough information to invoke this service?
[Figure: longitude, latitude, grid_size (inputsTo) → get_sat_image → (outputsTo) satellite image]
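The three questions above can be sketched in Python as lookups over the `inputsTo` / `outputsTo` relations. This is a minimal sketch, not the AUSPICE implementation: the `Service` class, the helper names, and the pairing of each parameter with a concept (x with longitude, y with latitude, r with grid_size) are assumptions read off the figure.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    inputs: dict   # parameter name -> domain concept (the inputsTo edges)
    output: str    # domain concept the service derives (the outputsTo edge)


# Assumed pairing of parameters and concepts, following the slide's figure.
GET_SAT_IMAGE = Service(
    name="get_sat_image",
    inputs={"x": "longitude", "y": "latitude", "r": "grid_size"},
    output="satellite image",
)


def services_for(concept, registry):
    """Q1: which registered services can derive the requested concept?"""
    return [s for s in registry if s.output == concept]


def required_concepts(service):
    """Q2: which domain concepts must be supplied as input?"""
    return list(service.inputs.values())


def missing_inputs(service, provided):
    """Q3: which required concepts did the user's query leave out?"""
    return [c for c in required_concepts(service) if c not in provided]


registry = [GET_SAT_IMAGE]
query = {"longitude": -83.0, "latitude": 40.0}   # illustrative values only
candidates = services_for("satellite image", registry)
gaps = missing_inputs(candidates[0], query)
```

Here `gaps` names the concept the query still lacks, so the system knows the service cannot be invoked yet, or that the missing value has to be derived some other way.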
In the Semantics Layer
Applying Domain Information
Domain concepts can be derived from executing a service
Domain concepts can also be derived from retrieving an existing data set
Service parameters represent different domain concepts
Data Registration Service
Indexing Data Sets
• Handling heterogeneous metadata
• For instance, just within the geospatial domain,
Country    Metadata Standards
US         CSDGM
AU, NZ     ANZLIC
EU         ???
CDN        ???
...        ...
Data Registration Service
Indexing Data Sets
• Metadata Transformation
(transform to spatial index)
Data Registration Service
Indexing Data Sets
• Metadata to DB transformations
(insert into DB)
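The registration steps on the last few slides (extract selected metadata attributes, transform them, insert them into an index) might look roughly like the Python sketch below. Several things are assumptions: the element names follow the CSDGM bounding-coordinate tags (`westbc`, `eastbc`, `northbc`, `southbc`) as commonly encoded in XML, the coordinates and URL are invented sample values, the `datasets` table layout is hypothetical, and a plain SQLite table with a bounding-box predicate stands in for a real spatial index.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample metadata; only the bounding box is extracted here.
SAMPLE = """<metadata><idinfo><spdom><bounding>
  <westbc>-84.8</westbc><eastbc>-80.5</eastbc>
  <northbc>42.0</northbc><southbc>38.4</southbc>
</bounding></spdom></idinfo></metadata>"""


def extract_bbox(xml_text):
    """Return (west, east, south, north) from CSDGM-style metadata."""
    box = ET.fromstring(xml_text).find("./idinfo/spdom/bounding")
    tags = ("westbc", "eastbc", "southbc", "northbc")
    return tuple(float(box.findtext(tag)) for tag in tags)


def register(conn, url, concept, xml_text):
    """Insert one data set's location, concept, and extent into the index."""
    west, east, south, north = extract_bbox(xml_text)
    conn.execute(
        "INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?)",
        (url, concept, west, east, south, north),
    )


def lookup(conn, concept, lon, lat):
    """Find registered data sets of a concept whose extent covers a point."""
    rows = conn.execute(
        "SELECT url FROM datasets WHERE concept = ? "
        "AND west <= ? AND ? <= east AND south <= ? AND ? <= north",
        (concept, lon, lon, lat, lat),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets "
    "(url TEXT, concept TEXT, west REAL, east REAL, south REAL, north REAL)"
)
register(conn, "file:///hypothetical/region.dat", "water_level", SAMPLE)
hits = lookup(conn, "water_level", -83.0, 40.0)
```

Supporting another metadata standard would only require a different `extract_bbox`, which is one way to stay non-intrusive to how repositories already store their data.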
In the Semantics Layer
Applying Domain Information
Data registration simplifies identification process within
Indexing Services
• Services (inputs, outputs) are also registered in much the same way
The Planning Layer
Service Composition: An Example
A subset of the ontology (unrolled)
The Planning Layer
Service Composition
begin compSrvc(concept, Q[...])
  W := ()
  //perform DFS starting from concept
  let v := concept be the currently visited node
  if v is a data type then
    W := (W, index.getData(v, Q))
  else //v is a service
    let (p1,..,pn) be v’s params
    //recursive call on each pi
    W := (W, (v, compSrvc(p1, Q), ... , compSrvc(pn, Q)))
  end if
  return W
end
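A runnable Python reading of the pseudocode above, offered as a sketch under stated assumptions rather than the system's actual planner. The ontology is modelled as a dict from each service to its ordered parameters, and any node absent from that dict is treated as a data type. `extract_shoreline`, `water_level`, and the `get_data` stub are hypothetical, and the ontology is assumed to be acyclic.

```python
def comp_srvc(concept, query, ontology, get_data):
    """Depth-first expansion of `concept` into a nested execution plan.

    ontology: dict mapping a service name to its ordered parameter nodes;
              a node missing from the dict is treated as a data type.
    get_data: callable(concept, query) standing in for index.getData(v, Q).
    """
    if concept not in ontology:
        # v is a data type: resolve it through the data registration index.
        return get_data(concept, query)
    # v is a service: recurse on each parameter p1..pn, keeping their order.
    params = ontology[concept]
    return (concept, *(comp_srvc(p, query, ontology, get_data) for p in params))


# Hypothetical ontology fragment; only get_sat_image comes from the slides.
ontology = {
    "extract_shoreline": ["get_sat_image", "water_level"],
    "get_sat_image": ["longitude", "latitude", "grid_size"],
}


def get_data(concept, query):
    """Stub index: return the user-supplied value, or None if absent."""
    return ("data", concept, query.get(concept))


query = {"longitude": -83.0, "latitude": 40.0, "grid_size": 0.5}
plan = comp_srvc("extract_shoreline", query, ontology, get_data)
```

The result is a nested tuple with the service at the head and its resolved arguments after it, so running the plan bottom-up yields the requested concept.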
The Planning Layer
Service Composition: An Example
Ontology (unrolled)
A Derived Execution Plan
This is what data registration provides
Planning Times
Conclusion
• The AUSPICE System...
  ‣ unifies heterogeneous metadata
  ‣ extracts certain metadata attributes and indexes low level data sets and services for fast access from distributed repositories
  ‣ automatically composes these services and data sets to answer user queries
• Questions - Comments?
  ‣ David Chiu [email protected]
  ‣ Gagan Agrawal [email protected]