
Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences


Page 1: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Kai Lin, Chaitan Baru

San Diego Supercomputer Center

University of California, San Diego

Page 2: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Data Integration Goal

• Query heterogeneous data sources as a single resource
  – Query: no need to write a program ("ad hoc, non-procedural query languages")
  – Heterogeneous: each local resource controls the definition of its own data
  – Single resource: removes the burden of accessing each data source individually

Page 3: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Data Integration Challenges: Heterogeneities

• Syntactic heterogeneity: heterogeneous data formats
  e.g. 02-04-2004 vs. 02/04/04
• Structural heterogeneity: heterogeneous data models and schemas
  e.g. 02-04-2004 is stored as three columns or as one column
• Semantic heterogeneity: fuzzy metadata, terminology, "hidden" semantics, implicit assumptions

GEON solution:
• Data should be semantically registered to GEON first
• Heterogeneities are resolved by registration
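The syntactic and structural examples above can be made concrete with a small sketch. It assumes that both date strings use month-day-year order (the slide does not say) and that two-digit years fall in 2000-2099; the class and method names are hypothetical and are not part of GEON.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

class DateNormalizer {
    // Assumed source formats; month-day-year order is an assumption.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("MM-dd-uuuu"),  // e.g. 02-04-2004
            DateTimeFormatter.ofPattern("MM/dd/uu"));   // e.g. 02/04/04

    // Syntactic heterogeneity: try each known text format in turn.
    static LocalDate parse(String raw) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return LocalDate.parse(raw.trim(), format);
            } catch (DateTimeParseException ignored) {
                // Not this format; try the next one.
            }
        }
        throw new IllegalArgumentException("Unrecognized date format: " + raw);
    }

    // Structural heterogeneity: the same date stored as three columns.
    static LocalDate fromColumns(int month, int day, int year) {
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(parse("02-04-2004"));
        System.out.println(parse("02/04/04"));
        System.out.println(fromColumns(2, 4, 2004));
    }
}
```

Once every source is mapped to one representation at registration time, queries no longer need to know which format or layout each source used.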

Page 4: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Levels of Registration

• Metadata-level registration
  – Register the metadata associated with a resource: submit the required metadata. Predefined semantics.
• "Item"-level registration
  – Register the "schema" of a resource, e.g. relational database, shapefiles, …
  – Record the semantics of schema elements, e.g. table name, column name
• "Item-detail"-level registration
  – Register individual values in a dataset
  – Record the semantics of each item in a record/column

Page 5: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Registering Structured Data

• Relational databases
• Shapefiles → database tables
• Excel spreadsheets → database tables
• Delimited ASCII files → database tables
• Headers of scientific data files, e.g. netCDF

Page 6: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Item Level Database Registration and Access

[Diagram: Original Database (tables and views) → select tables and views to register → Published Database (table definitions and view definitions) → GEON Mediator → GEON JDBC Driver → Application]

Page 7: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

How to Connect to GEON Databases

• Download the GEON JDBC Driver
• Use the following code to create a connection

// load driver
Class.forName("org.geongrid.jdbc.driver.Driver");

// set the mediator URL
String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f";

// open the connection
Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid");

The URL consists of the GEON JDBC protocol (jdbc:geon), the host name and port number of the GEON Mediator (geon01.sdsc.edu:2532), and the GEON ID (GEON-63cb404c-6038-11d9-a69f).

Note: the original account information is not accessible to end users
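To make the three parts of the mediator URL explicit, here is a minimal sketch that splits such a URL into host, port and GEON ID. The class GeonUrl is hypothetical and only mirrors the layout shown on the slide; it is not part of the GEON JDBC driver.

```java
class GeonUrl {
    final String host;
    final int port;
    final String geonId;

    private GeonUrl(String host, int port, String geonId) {
        this.host = host;
        this.port = port;
        this.geonId = geonId;
    }

    // Expects the layout jdbc:geon://<host>:<port>/<GEON ID>.
    static GeonUrl parse(String url) {
        String prefix = "jdbc:geon://";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a GEON JDBC URL: " + url);
        }
        String rest = url.substring(prefix.length());
        int colon = rest.indexOf(':');
        int slash = rest.indexOf('/');
        if (colon < 0 || slash < 0 || colon > slash) {
            throw new IllegalArgumentException("Expected host:port/id in: " + url);
        }
        String host = rest.substring(0, colon);
        int port = Integer.parseInt(rest.substring(colon + 1, slash));
        String geonId = rest.substring(slash + 1);
        return new GeonUrl(host, port, geonId);
    }

    public static void main(String[] args) {
        GeonUrl u = parse("jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f");
        System.out.println(u.host + " " + u.port + " " + u.geonId);
    }
}
```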

Page 8: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

GEON Mediator Enables Write Protection

• Only accepts SELECT statements
• Rejects any request other than SELECT

[Diagram: an "UPDATE B" request is sent to the Mediator, which sits in front of the Database; table labels: A, B, C, B]
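The SELECT-only rule for write protection can be illustrated with a deliberately simple keyword check. This is a sketch under the assumption that looking at the first keyword is enough; the slides do not describe how the GEON Mediator actually inspects statements, and a production gate would parse the full statement. The class name ReadOnlyGate is hypothetical.

```java
class ReadOnlyGate {
    // Returns true only if the first keyword of the statement is SELECT.
    // This is a keyword check, not a SQL parser.
    static boolean isSelect(String sql) {
        String firstToken = sql.trim().split("\\s+", 2)[0];
        return firstToken.equalsIgnoreCase("SELECT");
    }

    // Rejects every statement that is not a SELECT.
    static void check(String sql) {
        if (!isSelect(sql)) {
            throw new IllegalArgumentException("Only SELECT statements are accepted: " + sql);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSelect("SELECT * FROM A"));
        System.out.println(isSelect("UPDATE B SET x = 1"));
    }
}
```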

Page 9: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Read Protection for Unregistered Tables and Views

An unregistered table or view is invisible to an end user:
• The data in the table cannot be viewed with a SELECT statement
• The schema cannot be fetched

[Diagram: a "SELECT * FROM A" request is sent to the Mediator, which sits in front of the Database; table labels: A, B, C, B]

Page 10: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

GEON Database Integration

The GEON Mediator supports integration at three levels:

Level 1: Federation-Based Integration
• End users need to be knowledgeable about each database

Level 2: View-Based Integration
• End users see "integrated views". An intermediary designs these views.

Level 3: Ontology-Based Integration
• End users can query using familiar concepts
• Requires middleware and a formal representation of domain knowledge

Page 11: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Level 1: Federation-Based Integration

• Use SQL to query the federated database
• Structural and semantic heterogeneities must be resolved by the users themselves

[Diagram: backend tables A to G are exposed through the GEON Mediator; example query: SELECT * FROM A, E WHERE ……]

Page 12: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Level 2: View-Based Integration

• Allows defining views on top of the federated databases
• Allows hiding the original backend schemas
• Integration results can be shared and reused

[Diagram: views V and W are defined in the GEON Mediator over backend tables A to G; example query: SELECT * FROM V, W WHERE ……]

Page 13: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Level 3: Ontology-Based Integration

• Requires ontology annotations for the backend databases
• Uses a simple ontology query language to query the integrated database
• End users do not need to know the backend schemas and local semantics

[Diagram: an ontology-based query is sent to the GEON Mediator, which sits over backend tables A to G]

Page 14: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

GEON Ontology Based Data Integration

• Ontology-enabled semantic integration

Challenges for computer scientists and domain scientists:
  – Computer scientists: build an integration system based on the ontological registration of datasets
  – Domain scientists: create domain ontologies
  – Data providers: register datasets to ontologies

[Diagram: dataset1, dataset2, dataset3 and dataset4 are registered to Ontology1, Ontology2 and Ontology3]

Page 15: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Ontological Data Registration for Data Integration

• Registering a dataset to an ontology for data integration is a procedure that generates a partial model of the ontology from the dataset itself

[Diagram: registration maps the dataset to individuals of the ontology (mapping p)]

Note: not all the constraints in the ontology are satisfied by the generated individuals

Page 16: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Registering Relational Tables to Ontology Classes

• Associate one or more columns, under an optional SQL condition, with a selected class in the ontology
• Provide a mapping method if no explicit names of individuals should be generated

Example: columns Latitude and Longitude registered to the class Location

  …  Latitude  …  Longitude  …
  …  23.5      …  47.9       …
  …  …         …  …          …

(23.5, 47.9) is the name of an individual of the class Location. The same name indicates the same location.

Example: column GeologicAge registered to the class GeologicalAge (Precambrian, Cenozoic, Paleozoic)

  RockSample  GeologicAge        …
  …           Jurassic/Triassic  …
  …           Precambrian        …

Page 17: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Registering Relational Tables to Ontology Object Properties

• Associate two entities that are already registered to the domain class and the range class of a selected object property in the ontology

  …  RockSampleID  …  PERIOD  …
  …  …             …  …       …

[Diagram: object property hasAge with domain Rock and range GeologicAge]

Page 18: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL and SOQL

• Register item/item-detail to an ontology: ODAL (Ontological Database Annotation Language)
• User query: SOQL (Simple Ontology Query Language)

Page 19: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL (Ontological Database Annotation Language)

<odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase">
  <odal:Class odal:resource="http://geon.vt.edu#RockSample" />
  <odal:Table>Samples</odal:Table>
  <odal:Table>RockTexture</odal:Table>
  <odal:Table>RockGeoChemistry</odal:Table>
  <odal:Table>ModalData</odal:Table>
  <odal:Table>MineralChemistry</odal:Table>
  <odal:Table>Images</odal:Table>
  <odal:Column>ssID</odal:Column>
</odal:NamedIndividuals>

[Diagram: the GUI generates ODAL, which is passed to the ODAL processor]

The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample.

• Creates a partial model of ontologies from databases
• Independent of the end interface
• Independent of specific database implementations
• The ODAL mapping is itself a "first-class" object

Page 20: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL: Import Ontologies

The ontologies used for annotating a database can be imported as follows:

<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal#">
  <odal:Ontology>
    <odal:Imports rdf:resource="http://www.library.org/Book.owl"/>
    <odal:Imports rdf:resource="http://www.writer.org/Writer.owl"/>
  </odal:Ontology>
  ……
</odal:ODAL>

Page 21: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL: Database Connection Declaration

The target database for annotation is declared as follows:

<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal#">
  ……
  <odal:Database odal:id="PublicationDatabase">
    <odal:DatabaseProductName>Oracle</odal:DatabaseProductName>
    <odal:DatabaseProductVersion>9.1.21</odal:DatabaseProductVersion>
    <odal:Host>oracle.sdsc.edu</odal:Host>
    <odal:Port>3456</odal:Port>
    <odal:DatabaseName>Publications</odal:DatabaseName>
  </odal:Database>
  ……
</odal:ODAL>

Page 22: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL: Simple Named Individuals

<odal:NamedIndividuals odal:id="BookInTableBookPrice" odal:database="PublicationDatabase">
  <odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/>
  <odal:Schema>Collections</odal:Schema>
  <odal:Table>book-price</odal:Table>
  <odal:Column>ISBN</odal:Column>
</odal:NamedIndividuals>

Suppose the Book ontology contains a class Book and the schema Collections contains a table book-price with a column ISBN.

odal:id gives a name to the declaration and represents the set of individuals generated by the statement.

The statement says that each value in the column ISBN represents a Book individual.

Page 23: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL: Named Individuals from Multiple Columns

<odal:NamedIndividuals odal:id="LocationInTableRockSample">
  <odal:Class odal:resource="http://www.usgs.org/Space.owl#Location"/>
  <odal:Schema>California</odal:Schema>
  <odal:Table>Rock-Sample</odal:Table>
  <odal:Column>Latitude</odal:Column>
  <odal:Column>Longitude</odal:Column>
</odal:NamedIndividuals>

Suppose an ontology contains a class Location and a database contains a table Rock-Sample with two columns, Latitude and Longitude.

The statement says that each pair of latitude and longitude values gives a location.
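As an illustration of how a pair of column values can name an individual, the sketch below builds names such as (23.5, 47.9) from Latitude/Longitude rows and collapses rows with the same name into one individual. The naming format follows the earlier Location example in these slides; the class LocationNames and its methods are hypothetical, not part of ODAL.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class LocationNames {
    // Name of the Location individual generated from one row.
    static String name(double latitude, double longitude) {
        return "(" + latitude + ", " + longitude + ")";
    }

    // Rows are {latitude, longitude} pairs; equal names denote the same individual.
    static Set<String> individuals(double[][] rows) {
        Set<String> names = new LinkedHashSet<>();
        for (double[] row : rows) {
            names.add(name(row[0], row[1]));
        }
        return names;
    }

    public static void main(String[] args) {
        double[][] rows = { {23.5, 47.9}, {10.0, 20.0}, {23.5, 47.9} };
        System.out.println(individuals(rows));
    }
}
```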

Page 24: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL: Named Individuals with Conditions

<odal:NamedIndividuals odal:id="MaleEmployeeInTableEmployee">
  <odal:Class odal:resource="http://www.abc.com/Employee.owl#MaleEmployee"/>
  <odal:Table>employee</odal:Table>
  <odal:Column>EmployeeId</odal:Column>
  <odal:Condition><![CDATA[ Gender='M' ]]></odal:Condition>
</odal:NamedIndividuals>

<odal:NamedIndividuals odal:id="FemaleEmployeeInTableEmployee">
  <odal:Class odal:resource="http://www.abc.com/Employee.owl#FemaleEmployee"/>
  <odal:Table>employee</odal:Table>
  <odal:Column>EmployeeId</odal:Column>
  <odal:Condition><![CDATA[ Gender='F' ]]></odal:Condition>
</odal:NamedIndividuals>

The condition in an odal:Condition element must be a boolean expression that is valid in the WHERE clause of a SQL query.
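Because the condition must be valid in a WHERE clause, one plausible way to evaluate a NamedIndividuals declaration is to splice its table, columns and condition into a SELECT statement. The sketch below shows that idea only; it is an assumption about how a processor could work, not the actual ODAL processor, and it does not quote identifiers (a table such as book-price would need quoting). The class OdalToSql is hypothetical.

```java
import java.util.List;

class OdalToSql {
    // Builds the query that lists the individuals generated by a declaration.
    // A null or blank condition means the declaration has no odal:Condition.
    static String individualsQuery(String table, List<String> columns, String condition) {
        StringBuilder sql = new StringBuilder("SELECT DISTINCT ");
        sql.append(String.join(", ", columns));
        sql.append(" FROM ").append(table);
        if (condition != null && !condition.trim().isEmpty()) {
            sql.append(" WHERE ").append(condition.trim());
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(individualsQuery("employee", List.of("EmployeeId"), " Gender='M' "));
    }
}
```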

Page 25: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

ODAL: Data Type Property Declaration

<odal:NamedIndividuals odal:id="PersonInTablePerson">
  <odal:Class odal:resource="http://www.foo.org/Person.owl#Person"/>
  <odal:Table>Person</odal:Table>
  <odal:Column>ssn</odal:Column>
</odal:NamedIndividuals>

<odal:OntologyProperty>
  <odal:DatatypeProperty odal:resource="http://www.foo.org/Person.owl#hasAge"/>
  <odal:Table>Person</odal:Table>
  <odal:Domain odal:resource="PersonInTablePerson" />
  <odal:Range odal:resource="age" />
</odal:OntologyProperty>

  …  age  …  SSN           …
  …  8    …  1234-56-7890  …

[Diagram: datatype property hasAge with domain Person and range double]

Page 26: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Conditions for Joining Individuals from Different Resources

• To join data across independent resources, we need to know the correspondence between entities.
• For example, does "10001" represent the same rock in the two resources? By default, we assume it does not.
• A set of datatype properties can be declared as a key for a class in the ontology. Joins across multiple resources are based on keys.
  e.g. {hasLatitude, hasLongitude} can be declared as a key of Location: two locations from different resources are the same if they have the same latitude and longitude

[Diagram: class Rock registered from two resources, one with column RockSampleID = 10001 and one with column RockID = 10001]
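The key-based join can be sketched as a hash join on the declared key properties. The example assumes that {hasLatitude, hasLongitude} is the key of Location and that coordinates are compared for exact equality; the KeyJoin class and its Location type are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyJoin {
    // A Location individual together with the resource it came from.
    static final class Location {
        final String resource;
        final double lat;
        final double lon;

        Location(String resource, double lat, double lon) {
            this.resource = resource;
            this.lat = lat;
            this.lon = lon;
        }

        // Value of the declared key {hasLatitude, hasLongitude}.
        List<Double> key() {
            return List.of(lat, lon);
        }
    }

    // Pairs individuals from two resources that agree on every key property.
    static List<Location[]> join(List<Location> left, List<Location> right) {
        Map<List<Double>, List<Location>> index = new HashMap<>();
        for (Location r : right) {
            index.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        List<Location[]> pairs = new ArrayList<>();
        for (Location l : left) {
            for (Location r : index.getOrDefault(l.key(), List.of())) {
                pairs.add(new Location[] { l, r });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Location> a = List.of(new Location("A", 23.5, 47.9), new Location("A", 10.0, 20.0));
        List<Location> b = List.of(new Location("B", 23.5, 47.9));
        System.out.println(join(a, b).size());
    }
}
```

Without a declared key, identifiers such as 10001 from two resources would never be matched, which corresponds to the default assumption stated above.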

Page 27: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

SOQL (Simple Ontology Query Language)

Query single or integrated resources
• via ontologies (i.e., high-level logical views)
• independent of schema-level representation

[Diagram: ontology fragment. RockSample has a location (Location) and hasSiO2 (ValueWithUnit); Location has lat and long (float); ValueWithUnit has value (float) and unit (string)]

SELECT X.location.*
FROM RockSample X
WHERE X.location.lat > 60
  AND X.location.long > 100
  AND X.hasSiO2.value < 30
  AND X.hasSiO2.unit = 'weightPercentage'

[Diagram: the GUI generates SOQL, which is passed to the SOQL processor]

Page 28: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

The Architecture of GEON Semantic Mediator

[Diagram: architecture of the GEON Semantic Mediator. Components shown: Portal or Application; GUI; Mediator JDBC Driver; SOQL Processor (SOQL Parser, Semantic Query Rewriter, Ontology Reasoner); ODAL Processor; OWL and ODAL inputs; spatial SQL against federated schemas; SQL Parser; Query Planning; Query Optimization; Query Execution; Internal Database; backends Oracle, DB2, MySQL, SQL Server, PostgreSQL, PostGIS]

Page 29: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Question: Find all seismic stations within 1 mile of railroads

SOQL query (from the GEON SOQL GUI):

SELECT X.code, X.location.*
FROM SeismicStation X, Railroad Y
WHERE distance(X.location, Y.geometry) < 1

SQL produced by the SOQL Processor:

SELECT X2.stationcode, X2.lat, X2.lon
FROM railroads_of_the_united_states X1, stationdatatable X2
WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1

Sub-queries issued by the Schema Mediator:

Railroad shapefile:
SELECT X1.the_geom FROM railroads X1

Seismic stations:
SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2
WHERE bounding box condition

Join condition: distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
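The "bounding box condition" suggests a cheap prefilter that runs before the exact distance test. The sketch below shows one way such a prefilter could look, assuming planar coordinates in the same unit as the distance threshold; it is not the mediator's actual implementation, and the class BoundingBoxFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BoundingBoxFilter {
    // Axis-aligned box [minX, maxX] x [minY, maxY].
    static final class Box {
        final double minX, minY, maxX, maxY;

        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        // Grows the box by d on every side.
        Box expand(double d) {
            return new Box(minX - d, minY - d, maxX + d, maxY + d);
        }

        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    }

    // Smallest box that contains all vertices of a line geometry.
    static Box boxOf(double[][] vertices) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] v : vertices) {
            minX = Math.min(minX, v[0]);
            minY = Math.min(minY, v[1]);
            maxX = Math.max(maxX, v[0]);
            maxY = Math.max(maxY, v[1]);
        }
        return new Box(minX, minY, maxX, maxY);
    }

    // Keeps only stations that could be within d of the line. Every station
    // within d is kept; some farther ones may also pass, so the exact
    // distance test still has to run on the survivors.
    static List<double[]> prefilter(double[][] line, List<double[]> stations, double d) {
        Box box = boxOf(line).expand(d);
        List<double[]> candidates = new ArrayList<>();
        for (double[] s : stations) {
            if (box.contains(s[0], s[1])) {
                candidates.add(s);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        double[][] railroad = { {0, 0}, {10, 0} };
        List<double[]> stations = List.of(new double[] {5, 0.5}, new double[] {5, 3});
        System.out.println(prefilter(railroad, stations, 1.0).size());
    }
}
```

Pushing the box test into the station sub-query keeps the amount of data shipped to the mediator small, while the distance predicate remains the final arbiter.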