20
SemantAqua: A Semantically- Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano, and Deborah L. McGuinness Tetherless World Constellation Rensselaer Polytechnic Institute Troy, NY, USA

SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Embed Size (px)

Citation preview

Page 1: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal

Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano, and Deborah L. McGuinness

Tetherless World ConstellationRensselaer Polytechnic Institute

Troy, NY, USA

Page 2: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Real Life Motivating Example

• In 2009, in Bristol County, Rhode Island, children became ill with symptoms such as diarrhea. The cause was found to be polluted water (E. Coli) and citizens were asked to boil water until the issue was resolved.

• Public concerns: o “When did the contamination

begin?”o “How did this happen?” o “How can we keep it from

happening again?”http://www.care2.com/news/member/464062319/925799

Page 3: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Challenges

1.Raw data from multiple sources and in different formats – difficult to integrate and query.

2.Semantics of the water quality data are not explicitly encoded in the data – machine can’t process data automatically.

3.Large amount of data due to large spatial region, long time span, and large number of pollutants and regulated limit – analysis can be time consuming and complex.

Page 4: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Semantics Can Help

1.Raw datasets can be represented in RDF and ontologies enable integration of data between sources

2.Ontologies also add meaning using OWL2 Datatype Restrictions and ObjectIntersectionOf

3.SPARQL CONSTRUCT and classification using Pellet allow automated reasoning over small subsets of data for efficiency.

Page 5: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

SemantAqua

• Identifies point sources of water pollution, including water sites monitored by USGS and polluting facilities regulated by EPA.

• Demonstrates the effectiveness of semantic web technologies in addressing the challenges faced by environmental informatics systems.

• Enable/Empower citizens & scientists to better explore water related information.

• Generalized to the SemantEco framework for representing environmental and ecological data

Page 6: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

SemantAQUA Workflow

Archive

CSV2RDF4LODEnhance

derive derive

integrate

archive

Publish

CSV2RDF4LODDirect

Visualize

Reason

Page 7: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

System Architecture

access

Virtuoso

Page 8: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Ontology

• Core SemantEco ontologyo Extends existing best

practice ontologies, e.g. SWEET, OWL-Time.

o Includes terms for relevant pollution concepts

o Can be use to conclude: “any water source that has a measurement outside of its allowable range” is a polluted water source.

Portion of the SemantEco and SemantAqua ontologies.

Page 9: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Ontology

• Regulation Ontologyo models the federal and state

water quality regulations for drinking water sources

o Can be use to define: for example, in California, “any measurement has value 0.01 mg/L is the limit for Arsenic”

o Combined with the core ontology, we can infer “any water source contains 0.01 mg/L of Arsenic is a polluted water source.”

Portion of Cal. Regulation Ontology.

Page 10: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Data Integration

• Adopt the data conversion and organization capabilities enabled by the TWC-LOGD portalo Linking to ontological terms: water:hasCharacteristico Linking to external data: “water:Arsenic”, linked to

“dbpedia:Arsenic” using rdfs:seeAlso• To date, encoded USGS and EPA data from

27 states (still ongoing)o 58,302 facilities, 586,902 water siteso 31,823,304 measurementso 3,096,127,859 triples

Page 11: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Provenance

• Preserves provenance in the Proof Markup Language (PML).

• Data Source Level Provenance:o The captured provenance data are used to

support provenance-based queries.• Reasoning level provenance:

o When water source been marked as polluted, user can access supporting provenance data for the explanations including the URLs of the source data, intermediate data and the converted data.

Page 12: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

User Interface

1

2

3

4

Page 13: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

User Interface

5

Page 14: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Time Series Visualization

• Time series Visualization:o Presents data in time series visualization for

user to explore and analyze the data

Violation, measured value: 110

Violation, measured value: 8664

Violation, measured value: 24196

Violation, measured value: 11199

Limit value: 104

Page 15: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Link to Heath Domain

Page 16: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Data Reduction

• Segment data into state-wide per-agency graphs• Integration, pagination solved via SPARQL 1.1

Construct query:

CONSTRUCT {?site a water:WaterMeasurementSite .…?measurement a water:WaterMeasurement .…}WHERE {{SELECT ?site WHERE {GRAPH <…> { ?site a water:WaterMeasurementSite . }} limit 10}…}

Page 17: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Traditional SPARQL querySELECT DISTINCT ?site ?lat ?lngWHERE {?site a pol:MeasurementSite ; pol:hasMeasurement ?m ; geo:lat ?lat ; geo:long ?lng . ?m pol:hasCharacteristic ?c ; pol:hasValue ?v ; units:hasUnit ?u .?r a owl:Class ;rdfs:subClassOf [ owl:onProperty pol:hasCharacteristic ;owl:hasValue ?c ] ;rdfs:subClassOf [ owl:onProperty units:hasUnit ; owl:hasValue ?u ] ;rdfs:subClassOf [ owl:onProperty pol:hasValue ; owl:someValuesFrom [ ?p ?l ] ] .FILTER( isLiteral(?l) &&((?l < ?v && str(?p) = xsd:minExclusive) || …))}

With OWL reasoning

SELECT DISTINCT ?site ?lat ?lngWHERE { ?site a pol:PollutedSite ;geo:lat ?lat ; geo:long ?lng . }

Querying Simplification

Page 18: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Transparency and trust

• Provenance information encoded using semantic web technology supports transparency and trust. o SemantAqua provides detailed provenance

information: Original data, intermediate data, data source

o “What if” Scenario: user may trust data from certain authorities only. User can apply a stricter regulation from another state to a

local water source.

Page 19: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Future Work

• Expand SemantAqua to support all 50 states.• Add flood/weather information, and their effect on

water sources; regulations can be different under flood conditions

• Support reasoning over contaminants and their corresponding health effects.

• Expand use of SemantEco ontology to other environmental topics: soil quality, air quality (e.g. support data from EPA’s CASTNET)

Page 20: SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,

Questions?

http://tw.rpi.edu/web/project/SemantAQUAhttp://inference-web.org/wiki/Semantic_Water_Quality_Portal