US EPA Resource Conservation and Recovery Act published as Linked Open Data

DESCRIPTION

A presentation by 3 Round Stones to the US EPA on the new Linked Open Data Management System, including Linked Open Data on 4M facilities (from FRS), 25 years of Toxic Release Inventory (TRI), chemical substances (SRS), and Resource Conservation and Recovery Act (RCRA) content. This represents one of the largest Open Data projects published by a federal government agency using Open Source Software (OSS), Open Web Standards and government Open Data.

Page 1: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Bernadette Hyland, CEO & co-founder

David Wood, CTO & co-founder

1400 Key Blvd, Ste 100

Arlington VA 22209

Tel. +1-877-290-2127

[email protected] @BernHyland

[email protected] @prototypo

[email protected] @3RoundStones

Extend Your Reach.

Better Data. Smarter Decisions

Resource Conservation and Recovery Act information published as Linked Open Data

Presented: 20-Nov-2014

Page 2: US EPA Resource Conservation and Recovery Act published as Linked Open Data

With everything else happening in the world, why does Linked Open Data matter anyway?

Page 3: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Taxpayers spend billions of dollars for our government to collect data.

We, the people, expect government to treat information as an asset. Information must comply with regulations (Information Quality Act, Section 508, protection of PII) and should be public, accessible, described, reusable, complete, timely, and sustainable over election cycles.

Page 4: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Credits:

WV chemical spill: http://www.nytimes.com/2014/01/11/us/west-virginia-chemical-spill.html

Hurricane Sandy: http://www.nytimes.com/2012/10/28/us/hurricane-sandy-on-collision-course-with-winter-storm.html

Ebola: http://www.nytimes.com/interactive/2014/07/31/world/africa/ebola-virus-outbreak-qa.html

Page 5: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Linked Open Data: a fast way to combine, visualize & share data from government & the public Web

Vital to first responders, scientists, policy makers, journalists & the general public

Page 6: US EPA Resource Conservation and Recovery Act published as Linked Open Data

US Federal Government is listening … “Open Data” per OMB Memorandum M-13-13

• Public

• Accessible

• Described

• Reusable

• Complete

• Timely

• Managed Post-Release

• Project Open Data

• OMB & OSTP online tools, best practices & schema to help agencies implement M13-13. See Project Open Data

• May 9, 2014 the Digital Accountability & Transparency Act (DATA Act) became Public Law 113-101

Page 7: US EPA Resource Conservation and Recovery Act published as Linked Open Data

The goal of treating Information as an asset is

not new …

Page 8: US EPA Resource Conservation and Recovery Act published as Linked Open Data

“Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web took off as a web of hyperlinked documents which were exciting to read, but which could not be effectively used as data.”

- Tim Berners-Lee

Page 9: US EPA Resource Conservation and Recovery Act published as Linked Open Data

We all know the ground truth of data on the Web

Page 10: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Lots of [government] open data without labels or context

Page 11: US EPA Resource Conservation and Recovery Act published as Linked Open Data

What is needed is … data that describes itself

Page 12: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Linked Open Data is called “self-describing” data

Linked Data is “A method of publishing structured data so that it can be interlinked &

become more useful. … Extends Web pages to share information in a way that can be

read automatically by computers.”

- Sir Tim Berners-Lee

Page 13: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Linked Data on the Web

[Diagram: a graph labeled "my data". Recoverable labels: a measurement (date 2011-01-01, value 0, units of measure degrees Centigrade), collected at Galway Airport; a collector, linked by "collected by", who is a Person with first name Michael and last name Hausenblas.]
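The labels on this slide's graph can be read as subject-predicate-object statements, which is the sense in which Linked Data "describes itself". A minimal sketch in plain Python follows. The "ex:" identifiers and the exact edges between nodes are assumptions made for illustration; the slide text only preserves the labels, not the URIs or the arrow directions.

```python
# Each statement is a (subject, predicate, object) triple.
# All "ex:" names are hypothetical; only the literal values come from the slide.
TRIPLES = {
    ("ex:measurement1", "rdf:type", "ex:Measurement"),
    ("ex:measurement1", "ex:date", "2011-01-01"),
    ("ex:measurement1", "ex:value", "0"),
    ("ex:measurement1", "ex:unitsOfMeasure", "degrees Centigrade"),
    ("ex:measurement1", "ex:collectedAt", "ex:GalwayAirport"),
    ("ex:measurement1", "ex:collectedBy", "ex:collector1"),
    ("ex:collector1", "rdf:type", "ex:Person"),
    ("ex:collector1", "ex:firstName", "Michael"),
    ("ex:collector1", "ex:lastName", "Hausenblas"),
}


def describe(subject, triples=TRIPLES):
    """Return a {predicate: object} mapping for one subject."""
    return {p: o for s, p, o in triples if s == subject}


def units_of(measurement, triples=TRIPLES):
    """Look up the units attached to a measurement, or None if absent."""
    return describe(measurement, triples).get("ex:unitsOfMeasure")
```

Because the units and the collector travel with the value, a consumer does not need a separate data dictionary to interpret the number 0.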

Page 14: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Quick update on US Government

Open Data Project

Page 15: US EPA Resource Conservation and Recovery Act published as Linked Open Data
Page 16: US EPA Resource Conservation and Recovery Act published as Linked Open Data

• On 3rd iteration of a data catalog (now using CKAN)

• >500k datasets from 200+ USG authorities

• Sustained executive support for data.gov via OMB & OSTP - Project Open Data

• GSA team engaging with Open Data / OSS / standards community

• Health, Energy, Law, Education & Public Safety specific communities in place

• Agencies are [beginning] to name Chief Data Officers

But we still have a lot to do …

Page 17: US EPA Resource Conservation and Recovery Act published as Linked Open Data

RCRA = Resource Conservation and Recovery Act

A search for “EPA RCRA” displays the first RCRA dataset in 6th position :-(

This dataset is just one piece of a complex set of data for understanding solid waste reporting

First 5 results are for Facilities Registry Service …

Page 18: US EPA Resource Conservation and Recovery Act published as Linked Open Data

For example, The Right-to-Know Network is a

consumer of EPA open data from

data.gov

Page 19: US EPA Resource Conservation and Recovery Act published as Linked Open Data

They’ve built some nice visualizations!

Page 20: US EPA Resource Conservation and Recovery Act published as Linked Open Data

But the Toxics Release Inventory (TRI) is complicated data. The RTK Network would have benefited from more context had it been available from the EPA…

Page 21: US EPA Resource Conservation and Recovery Act published as Linked Open Data

RTK Network provides access to machine-readable content (as XML) but … it lacks context

This data does not use shared vocabularies :-(

No units of measure, no definition of codes

Page 22: US EPA Resource Conservation and Recovery Act published as Linked Open Data

The power of Open

Apps created in days using Open Government Data + Open Source

+ Open Web Standards … On the cloud

Page 23: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Linked Data Management System

For government open data publishing

Funded by

Page 24: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Landing page for new EPA Open Data site

Page 25: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Search for facilities in your neighborhood…

Click through to an individual facility

Page 26: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Site allows people to view by map or by table

Page 27: US EPA Resource Conservation and Recovery Act published as Linked Open Data

This app shows nuclear power plants regulated by EPA

Page 28: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Apps using data from multiple EPA programs +

Open Data

Page 29: US EPA Resource Conservation and Recovery Act published as Linked Open Data
Page 30: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Key to data sources (numbered callouts on the screenshot):

1 OpenStreetMap (OSS)

2 Raw data available for developers (RDF/XML)

3 EPA Resource Conservation and Recovery Act (RCRA)

4 & 5 EPA Facilities (FRS)

6 EPA Toxics Release Inventory (TRI)

Page 31: US EPA Resource Conservation and Recovery Act published as Linked Open Data
Page 32: US EPA Resource Conservation and Recovery Act published as Linked Open Data
Page 33: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Pollution graphs created in < 1 week using Open Source Software & EPA Linked Data

Page 34: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Pollution reports from multiple EPA programs available for a facility, not previously possible

Page 35: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Shared vocabularies, e.g. Places, Geographies, Dublin Core, Geo, FOAF, ORG and vCard, are the “lingua franca” of data interoperability

Page 36: US EPA Resource Conservation and Recovery Act published as Linked Open Data

V4 Handler Module

Page 37: US EPA Resource Conservation and Recovery Act published as Linked Open Data

V5 Handler Module

• HHANDLER5

• HBASIC

• HNAICS5

• LU_NAICS

• HSTATE_ACTIVITY5

• LU_STATE_ACTIVITY

• HOWNER_OPERATOR5

• HUNIVERSAL_WASTE5

• LU_UNIVERSAL_WASTE

• HWASTE_CODE5

• LU_WASTE_CODE

• HCERTIFICATION5

• HOTHER_PERMIT5

New

Page 38: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Linked Data Model

Page 39: US EPA Resource Conservation and Recovery Act published as Linked Open Data

[Diagram: RCRA handler Linked Data model. The arrows between nodes are not recoverable from the extracted text; the labels are listed below.]

Nodes: (RCRA Facility ID), (FRS Facilities ID), (FRS State), (FRS Region), (address), (Appropriate RCRA Class), (land type), (RCRA Activity), (non-notifier code)

Classes: Handler, vcard:Address, vcard:Postal, rcra:LandType, rcra:NonNotifierCode

Properties: a, subClassOf, owl:sameAs, rdfs:label, rdfs:comment, frs:state, frs:region, rcra:isNonNotifer, rcra:landType, rcra:hasActivity, vcard:hasAddress, foaf:based_near

Address fields: street-address, locality, region, postal-code, country-name, county-name
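One way to read this model: an owl:sameAs link between an RCRA facility ID and an FRS facility ID lets a consumer combine statements from both programs into one view of a facility. The sketch below illustrates that idea in plain Python. Only the property names (owl:sameAs, rdfs:label, frs:state, frs:region, rcra:landType) come from the slide; the URIs, the values, and which node each property hangs off are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical facility identifiers and values; property names are from the slide.
TRIPLES = [
    ("rcra:handler/EXAMPLE1", "owl:sameAs", "frs:facility/EXAMPLE1"),
    ("rcra:handler/EXAMPLE1", "rdfs:label", "Example Handler"),
    ("rcra:handler/EXAMPLE1", "rcra:landType", "rcra:landtype/EXAMPLE"),
    ("frs:facility/EXAMPLE1", "frs:state", "frs:state/EXAMPLE"),
    ("frs:facility/EXAMPLE1", "frs:region", "frs:region/EXAMPLE"),
]


def merged_description(resource, triples):
    """Combine statements about a resource and its direct owl:sameAs aliases."""
    # Collect the resource plus anything linked to it by owl:sameAs
    # in either direction (one hop only, to keep the sketch small).
    aliases = {resource}
    for s, p, o in triples:
        if p == "owl:sameAs":
            if s == resource:
                aliases.add(o)
            elif o == resource:
                aliases.add(s)
    # Gather every other statement made about any alias.
    description = defaultdict(set)
    for s, p, o in triples:
        if s in aliases and p != "owl:sameAs":
            description[p].add(o)
    return dict(description)
```

Starting from either identifier yields the same merged view, which is what allows reports from multiple EPA programs to be shown for a single facility.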

Page 40: US EPA Resource Conservation and Recovery Act published as Linked Open Data

[Diagram: RCRA activity Linked Data model. The arrows between nodes are not recoverable from the extracted text; the labels are listed below.]

Nodes: (RCRA Facility ID), (RCRA Activity), (Owner/Operator), (address), (State Activity Type), (NAICS code), (source type), (accessibility), (State or Region), (universal waste type), (waste type), (certification), (point of contact), (other permit)

Classes: rcra:Activity, vcard:Postal, rcra:ReportType or rcra:ActivityType, rcra:AccessibilityCode, frs:State or frs:Region, rcra:UniversalWasteType, rcra:WasteType, rcra:Certification, rcra:OtherPermit, foaf:Person

Datatypes: xsd:Date, xsd:Boolean, xsd:Integer

Properties: a, rdfs:label, rdfs:comment, rcra:hasActivity, rcra:receivedReportOn, rcra:reportType or rcra:activityType, rcra:reportedInCycle, rcra:inaccessibleDueTo, rcra:has_naics, rcra:naics_cycle, rcra:has_current_owner, rcra:has_past_owner, rcra:has_current_operator, rcra:has_past_operator, rcra:state_activity, rcra:active_status, rcra:hasRegulator, rcra:reportsUniversalWasteType, rcra:reportsWasteType, rcra:isActive, rcra:accumulated, rcra:generated, rcra:hasCertification, rcra:hasCertificationSequence, rcra:certifiedOn, rcra:hasPOC, rcra:hasOtherPermit, rcra:hasPermitNumber, vcard:hasAddress, frs:state, foaf:name, foaf:title

Address fields: street-address, locality, region, postal-code, country-name

Page 41: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Issues: Incomplete Human-readable descriptions

LU_UNIVERSAL_WASTE

• Still missing California code descriptions (3)

• Should be entered in V5 Handler Module

LU_STATE_ACTIVITY

• 118 of 224 codes missing descriptions (53%)

• Alabama’s recovered from https://rcrainfo.epa.gov/rcrainfo/help/dataentry/rpt_lu_state_activity.pdf

• Should be entered in V5 Handler Module
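The coverage figure above is simple arithmetic: 118 / 224 ≈ 0.527, which rounds to 53%. A minimal sketch of how such a check might be run against a code lookup table follows. The dictionary layout (code mapped to description) is an assumption for illustration, not the actual structure of LU_STATE_ACTIVITY.

```python
def percent_missing(missing, total):
    """Whole-number percentage of codes that lack a description."""
    return round(100 * missing / total) if total else 0


def missing_description_stats(lookup):
    """Count codes whose description is absent or blank.

    `lookup` is a hypothetical {code: description} mapping.
    Returns (missing_count, total_count, percent_missing).
    """
    missing = [code for code, desc in lookup.items() if not (desc or "").strip()]
    return len(missing), len(lookup), percent_missing(len(missing), len(lookup))
```

For the numbers on this slide, `percent_missing(118, 224)` gives 53.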

Page 42: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Issues: Incomplete Human-readable descriptions

LU_WASTE_CODE

• Most descriptions excluded, e.g.

• “from br conversion”

• “Description”

• “?”

• Some descriptions cleaned, e.g.

• “from br conversion [UN1255 is the UN-NA code for petroleum naphtha]” → “petroleum naphtha”

• “WASTE PCBs” → “Waste PCBs”
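The exclusions and clean-ups listed above could be expressed as a small normalization function. The sketch below reproduces only the examples shown on this slide; the regular expression, the placeholder set, and the capitalization rule are assumptions made for illustration, not the rules actually used to publish the data.

```python
import re

# Placeholder texts that carry no meaning (taken from the slide's examples).
PLACEHOLDERS = {"from br conversion", "description", "?"}

# Assumed pattern for the bracketed note seen in the slide's example.
UN_NA_NOTE = re.compile(r"\[UN\d+ is the UN-NA code for ([^\]]+)\]")


def clean_description(raw):
    """Return a cleaned description, or None if the text should be excluded."""
    text = raw.strip().strip('"“”').strip()
    note = UN_NA_NOTE.search(text)
    if note:
        # Keep only the substance named inside the bracketed note.
        return note.group(1).strip()
    if text.lower() in PLACEHOLDERS:
        return None
    # Capitalize words written entirely in upper case ("WASTE" -> "Waste");
    # mixed-case words such as "PCBs" are left untouched.
    return " ".join(w.capitalize() if w.isupper() else w for w in text.split())
```

One limitation of this capitalization rule is that a true all-caps acronym (for example "PCB" without the trailing "s") would also be lower-cased, so a real pipeline would likely need an acronym allow-list.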

Page 43: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Q&A Next steps for RCRA as

Linked Data

Page 44: US EPA Resource Conservation and Recovery Act published as Linked Open Data

WeatherHealth

A mobile app for chronic asthma/COPD patients with weather alerts

Funded by

Page 45: US EPA Resource Conservation and Recovery Act published as Linked Open Data
Page 46: US EPA Resource Conservation and Recovery Act published as Linked Open Data

[Diagram: data sources combined for the user: NOAA, US EPA AirNow, DBpedia, National Library of Medicine, US EPA SunWise]

Page 47: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Orgpedia

An open organizational data project on public & private companies

Funded by

Page 48: US EPA Resource Conservation and Recovery Act published as Linked Open Data
Page 49: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Callimachus apps allow for crowdsourcing

Page 50: US EPA Resource Conservation and Recovery Act published as Linked Open Data

How did we handle data publishing & application development for US EPA, Sentara Healthcare & Orgpedia?

Page 51: US EPA Resource Conservation and Recovery Act published as Linked Open Data

The leading Web application server for Linked Data

Fanatically standards compliant **

Used to create data-driven applications that combine data across silos

** http://www.w3.org/2013/data/

Page 52: US EPA Resource Conservation and Recovery Act published as Linked Open Data

<HTML>

[Diagram labels: Enterprise Data Documents; Read/Write; Point to, include]

Our customers use Callimachus to:

Create responsive apps with many different data sources & types of data

Page 53: US EPA Resource Conservation and Recovery Act published as Linked Open Data

[Diagram: CONTENT MANAGEMENT SYSTEM vs. LINKED DATA MANAGEMENT SYSTEM, with vertical labels UNSTRUCTURED TEXT, TEXT, STRUCTURED DATA, DATA]

Page 54: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Callimachus Enterprise customers are creating data-driven applications with data from leading

graph databases:

Page 55: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Do not recreate the wheel!

Page 56: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Summary

• Billions of dollars are spent by taxpayers for government to collect useful information - e.g., geospatial data, population, healthcare, medicine & clinical trials, environment, energy, law, education …

• Data consumers must help government to fulfill its goal to treat “information as an asset” by participating & giving feedback

• Steady forward progress has been made; however, take care not to re-create the wheel!

• Use Open Data, Open Source, Web standards & published best practices whenever possible

• More work to be done …

Page 57: US EPA Resource Conservation and Recovery Act published as Linked Open Data

Additional Resources

• “Open by Default” presentation by Dr. David Wood to Virginia Commonwealth officials 10/7/2014, see http://www.slideshare.net/3roundstones/open-by-default-39976290

– Open Data is the idea that “certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control”. Open Data follows similar “open” concepts that have proven to be valuable in the information economy such as Open Standards, Open Source Software, Open Content and has been followed more recently by variations on the theme such as Open Science and Open Government.

– Linked Data Developer website, see http://linkeddatadeveloper.com/

– Linked Data: Structured Data on the Web, see http://books.google.com/books/about/Linked_Data.html?id=rA8-mQEACAAJ

– Add Linked Data to HTML with RDFa.info, see http://semanticweb.com/new-resource-for-web-developers-announced-add-linked-data-to-html_b28813

– See also RDFa website on GitHub, see https://github.com/rdfa/rdfa-website