Transcript
Page 1: Data Wiki:  A Semantic Web Approach to Government Data

Data.gov Wiki: A Semantic Web Approach to Government Data

Li Ding, Dominic DiFranzo, Sarah Magidson, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness, Jim Hendler

Tetherless World Constellation

Nov 2, 2009

Page 2: Data Wiki:  A Semantic Web Approach to Government Data

Synergy

• Government: data is out there “as is”

• Loop: gov data and linked data

• Loop: gov data and web developers

• Loop: gov data and end users

Page 3: Data Wiki:  A Semantic Web Approach to Government Data

Government Data on the Web

Page 4: Data Wiki:  A Semantic Web Approach to Government Data

Objectives

• Investigate the role of semantic web in producing, processing and utilizing government datasets

– To enrich the value of data via normalizing, linking and information-extraction

– To realize the value of data via applications, esp. visualization

– To support web developers via machine friendly data access and web services

Page 5: Data Wiki:  A Semantic Web Approach to Government Data

Semantic Web Architecture for Government Data

[Architecture diagram]

• Pipeline stages: Convert Data, Link & Enrich Data, View & Use Data

• Data stores and access: GOV data (RDF), Linked Data, Sem Wiki, Link Annotator, SPARQL End Point

• Data Processors (Web Services & Analyzers): SPARQL Web Service, XSLT Service, Diff Service, RSS Generator

• Formats: RDF/XML, CSV, XSL, …

• Viewers: Google Viz, MIT Exhibit, RSS 1.0, tagCloud, Tabulator

Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Aug 7 2009 · http://data-gov.tw.rpi.edu/
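The "Convert Data" stage can be illustrated with a minimal sketch that turns CSV rows into N-Triples lines. This is not the project's actual converter: the namespace `http://example.org/dataset/`, the function names, and the sample rows are all made up for illustration, and every value is emitted as a plain string literal.

```python
import csv
import io

# Hypothetical namespace; the slides do not state the URI scheme that was used.
BASE = "http://example.org/dataset/"


def escape_literal(value: str) -> str:
    """Escape a string so it can sit inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, dataset_id: str) -> list:
    """Emit one subject per CSV row and one triple per non-empty cell."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{BASE}{dataset_id}/row/{row_number}>"
        for column, value in row.items():
            # Skip overflow cells (column is None) and empty cells.
            if column is None or not value:
                continue
            name = column.strip().lower().replace(" ", "_")
            predicate = f"<{BASE}{dataset_id}/property/{name}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample rows, not taken from any Data.gov dataset.
sample = "site_id,state\nS1,NY\nS2,ME\n"
for line in csv_to_ntriples(sample, "10001"):
    print(line)
```

Deriving property URIs from column headers keeps the conversion mechanical; mapping those properties to shared vocabularies would be a later linking step.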

Page 6: Data Wiki:  A Semantic Web Approach to Government Data

The Landscape

Page 7: Data Wiki:  A Semantic Web Approach to Government Data

The catalog data

Page 8: Data Wiki:  A Semantic Web Approach to Government Data

Data-gov Cloud (Aug 2009)

[Dataset cloud diagram]

• Categories: Budget, Population, Energy and Utilities, Geography and Environment

• (#8) CASTNET Ozone

• (#9) CASTNET Visibility

• (#10) Residential Energy Consumption Survey

• (#34) Worldwide M1+ Earthquakes past 7 days

• (#90) 2005-2007 ACS PUMS Housing

• (#91) 2005-2007 ACS PUMS Population

• (#191) 2005 Toxics Release Inventory

• (#249) 2006 Toxics Release Inventory

• (#397) 2007 Toxics Release Inventory

• (#401) Budget Authority and offsetting receipts 1976-2014

• (#402) Outlays and offsetting receipts 1962-2014

• (#403) Governmental Receipts 1962-2014

• (@10001) CASTNET sites

Page 9: Data Wiki:  A Semantic Web Approach to Government Data

Data-gov Cloud (Oct 2009)

Li Ding and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Oct 2009 · http://data-gov.tw.rpi.edu/

[Dataset cloud diagram]

• Categories: Government, Community, Services, Environment

• Dataset groups: US-COMMUNITY (2005-2007), CASTNET (1990 – Present), RECS (2005), GOV-BUDGET (1962-2014), TOXIC-RELEASE (2005-2008), EARTHQUAKE (Present), STATE-LIB (2006-2007), PUBLIC-LIB (1992-2006), MED-COST (1994-2009), LABOR-STAT (19xx-Present), DATA-GOV-CATALOG (present), USAspending (2008-2010)

• Linked Data: CASTNET sites, RECS code, US agency, US location, GeoNames

Page 10: Data Wiki:  A Semantic Web Approach to Government Data

More statistics

Page 11: Data Wiki:  A Semantic Web Approach to Government Data

Demos

Page 12: Data Wiki:  A Semantic Web Approach to Government Data

Data.gov + epa.gov

Page 13: Data Wiki:  A Semantic Web Approach to Government Data

Gov Data + Corporate Data + User Data

Page 14: Data Wiki:  A Semantic Web Approach to Government Data

Computing Difference of Revisions
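One simple way to compute the difference between two revisions of an RDF dataset is to compare them as sets of triples. The sketch below assumes both revisions are serialized as N-Triples with one triple per line and contain no blank nodes; the function names are illustrative and do not describe the Diff Service shown in the architecture.

```python
def parse_triples(ntriples_text: str) -> set:
    """Collect non-empty, non-comment lines as a set of triple statements."""
    return {
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def diff_revisions(old_text: str, new_text: str) -> dict:
    """Return the triples added to and removed from the old revision."""
    old = parse_triples(old_text)
    new = parse_triples(new_text)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Made-up revisions for illustration.
old_revision = '<urn:a> <urn:p> "1" .\n<urn:a> <urn:q> "2" .\n'
new_revision = '<urn:a> <urn:p> "1" .\n<urn:a> <urn:q> "3" .\n'
print(diff_revisions(old_revision, new_revision))
```

Line-level set comparison treats any textual change as a removal plus an addition, so the two revisions need to be serialized consistently before they are compared.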

Page 15: Data Wiki:  A Semantic Web Approach to Government Data

More demos?

• http://data-gov.tw.rpi.edu/wiki/demos

Page 16: Data Wiki:  A Semantic Web Approach to Government Data

Technical Issues

Page 17: Data Wiki:  A Semantic Web Approach to Government Data

Issues in Data.gov

• Duplicated Datasets - Some datasets are subsets of other datasets.

– Dataset 140 (2005 Toxics Release Inventory data for the state of California (EPA)) is a subset of Dataset 191.

• Formatting Issues - The format of some datasets is not friendly to machine processing.

– Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)).

– Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)) only tells you how to order the data from the government.

• Access Point Issues - Some access points are interactive web pages, which are not friendly to machine access.

– Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics))
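The duplicated-dataset issue above can be checked mechanically when both datasets are available as CSV. The following is a rough sketch only: it assumes the two files share the same column layout, and the sample rows are made up rather than taken from Datasets 140 and 191.

```python
import csv
import io


def rows_of(csv_text: str) -> set:
    """Read the data rows of a CSV text (header skipped) as a set of tuples."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return {tuple(cell.strip() for cell in row) for row in reader if row}


def is_subset_dataset(candidate_csv: str, full_csv: str) -> bool:
    """True if every data row of the candidate also appears in the full dataset."""
    candidate = rows_of(candidate_csv)
    return bool(candidate) and candidate <= rows_of(full_csv)


# Made-up data: a state-level extract and a national file.
national = "facility,state\nF1,CA\nF2,NY\nF3,CA\n"
california = "facility,state\nF1,CA\nF3,CA\n"
print(is_subset_dataset(california, national))
```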

Sarah

Page 18: Data Wiki:  A Semantic Web Approach to Government Data

Linking Data

1. link similar datasets by reusing property namespace

2. link to rdfs:label (via rdfs:subPropertyOf) using semantic wiki

3. link to DBpedia (via owl:sameAs) using Wikipedia widget

4. link instances (via common <property, literal-value> pair)

5. link government data with web data (via time and location)

6. link revisions of government data (via knowledge provenance)
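Item 4 in the list above, linking instances through a common <property, literal-value> pair, can be sketched as a grouping step. The code assumes triples are available as (subject, property, literal) tuples and uses rdfs:seeAlso as a conservative link predicate; both choices are assumptions for illustration, since the slides do not say which predicate was used for this kind of link.

```python
from collections import defaultdict
from itertools import combinations

# Assumed link predicate; the slides do not name one for instance links.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def link_by_shared_literal(triples, key_property):
    """Link every pair of subjects that share a literal value for key_property."""
    subjects_by_value = defaultdict(set)
    for subject, prop, value in triples:
        if prop == key_property:
            # Normalize lightly so "NY" and " ny" fall into the same group.
            subjects_by_value[value.strip().lower()].add(subject)

    links = []
    for subjects in subjects_by_value.values():
        for left, right in combinations(sorted(subjects), 2):
            links.append((left, RDFS_SEE_ALSO, right))
    return sorted(links)


# Made-up triples from two hypothetical datasets.
triples = [
    ("ex:siteA", "ex:zip", "12180"),
    ("ex:libraryB", "ex:zip", "12180 "),
    ("ex:siteC", "ex:zip", "10001"),
]
for link in link_by_shared_literal(triples, "ex:zip"):
    print(link)
```

A shared literal is only evidence that two instances are related, so a stronger predicate such as owl:sameAs would need additional checks before being asserted.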

Page 19: Data Wiki:  A Semantic Web Approach to Government Data

Semantic mapping: AI + CI

need manual disambiguation!

Map to Wikipedia/DBpedia Name

Page 20: Data Wiki:  A Semantic Web Approach to Government Data

RDF => SPARQL => Web

• We use SPARQL to bridge Web developers and Semantic Web data.

• A triple store is used to handle RDF datasets with millions of triples.
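From a web developer's side, using SPARQL as the bridge typically means sending a query string to an endpoint over HTTP. The sketch below only builds the request URL, following the SPARQL Protocol convention of passing the query in a `query` parameter; the endpoint address `http://example.org/sparql` and the helper names are placeholders, not the project's actual service.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"


def build_query(limit: int = 10) -> str:
    """Build a generic query that lists a few triples from the store."""
    return (
        "SELECT ?s ?p ?o\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


print(build_request_url(ENDPOINT, build_query(limit=5)))
```

The resulting URL can be fetched with any HTTP client; asking for `application/sparql-results+json` in the Accept header usually yields results that are easy to consume from JavaScript or Python.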

Page 21: Data Wiki:  A Semantic Web Approach to Government Data

Conclusion

semantic web enabled portal for linked government data

• 5 billion triples from data.gov

• hosts apps, demos & services

• provides education services

• integrates web users’ contributions