Open Data Trentino - Seminar at Universidad Simon Bolivar - 15th October 2013

DESCRIPTION

Seminar on Open Data at Universidad Simon Bolivar, presented by Lorenzino Vaccari. Authors: Juan Pane, Lorenzino Vaccari. Contributions (CC-BY) from Maurizio Napolitano: slides 7, 8, 55, 56, 57 and 61 to 69.

Five parts:
1. Open Data: Introduction
2. Open Data: Issues
3. Open Data in Trentino Project
4. Open Data: Applications
5. Open Data: Semantic Issues

11/04/23 Lorenzino Vaccari, Juan Pane http://dati.trentino.it

Open Government Data Seminar @USB*

*This presentation is taken from the “Open Government Data Tutorial” presented at CLEI2013

Lorenzino Vaccari (1), Juan Pane (2)

(1) Autonomous Province of Trento, Trento, Italy lorenzino.vaccari@provincia.tn.it

(2) University of Trento, Trento, Italy – Universidad Nacional de Asuncion, Asuncion, Paraguay pane@disi.unitn.it

Goal of the Seminar
• Introduce Open Government Data
• Intro, Issues (Part 1)
• If you need it, how can you organize it?
• Real experience (Part 2)
• Methods for opening data
• Applications (Part 3)
• Semantic Issues (Part 4)

http://www.point-fort.com/index.php?2012/01/25/805-why-how-what

What?

Open data “is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” *

*(Source: http://www.opendefinition.org)

“open” =
- use
- reuse
- redistribution
- commercial reuse
- derivative works

BUT, may require:
- attribution
- share alike

http://myfbcovers.com/uploads/covers/2012/06/09/16628a1094aa012f7c6e0025902480d2/watermarked_cover.jpg

J. Gray (OKF): http://www.slideshare.net/jwyg/open-government-data-what-why-how

The value is in its use

Maurizio Napolitano: http://www.youtube.com/watch?v=YlkjrVAW43Q

Is open data useful?

Maurizio Napolitano: http://www.youtube.com/watch?v=YlkjrVAW43Q

Open Data Benefits

Open data is the knowledge base to:

• Improve economic growth and entrepreneurship through digital services that reuse Public Sector Information

• Answer social needs through the publication of innovative services and applications

• Reduce the cost of public administration activities through Public-Private Partnerships (PPP)

• Improve the transparency of public institutions' activities and citizens' participation in them

Principles

Tim Berners-Lee (5 Stars of Linked Open Data) vs. Tim Davies (5 Stars of Open Data Engagement)

http://5stardata.info/

http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/

5 Stars of Linked Open Data

Tim Berners-Lee

http://5stardata.info

5-Stars of Open Data Engagement

* Be demand driven
* * Provide context
* * * Support conversation
* * * * Build capacity & skills
* * * * * Collaborate with the community

Tim Davies

http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/

Create Community

http://msnbcmedia.msn.com/j/MSNBC/Components/Photo/_new/pb-121007-spain-tarragona-pyramid-nj-02.photoblog900.jpg

Open Government Data

State of the Art

What is happening around us?
- Globally
- Europe
- Latin America

Open Data Charter - G8

The principles are:
• Open Data by Default
• Quality and Quantity
• Usable by All
• Releasing Data for Improved Governance
• Releasing Data for Innovation

http://opensource.com/government/13/7/open-data-charter-g8

https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex

http://census.okfn.org/

OGD around the world

http://census.okfn.org/country/

OGD in Europe

http://open-data.europa.eu/

OGD in Europe (screenshots)

http://epsiplatform.eu/content/european-psi-scoreboard

OGD in Europe

http://epsiplatform.eu/content/european-psi-scoreboard

http://epsiplatform.eu/content/psi-scoreboard-indicator-list

OGD in Italy

http://www.dati.gov.it/content/infografica

OGD in Latin America*

*In Venezuela some OD projects have been started by the USB

Questions?

OGD: Part 2 - Issues

http://evian-thesource.com/kids-having-fun/

Open Data. Oh ohh

Barriers:
• Legal
• Organizational
• Technical
• Adoption
• Contextual

http://www.wallpapermania.eu/wallpaper/trick-or-treat-cute-pumpkins-lanterns-halloween-wallpaper

http://de.straba.us/wp-content/uploads/2012/08/barrieres_for_implementation_of_ogd.png

Organizational Barriers

• Not ready
  • Lack of resources: IT, human
• Don’t want to be ready

http://montcomediation.org/images/MCMC_MyWayYourWay.jpg

Legal barriers

Open the Data: all data produced using public money has to be made publicly available (with exceptions)

vs.

Privacy: you cannot open data that could allow correlation of private personal data

Or the complete lack of legislation!

Adoption barriers
• Data is not contextualized
• People are not informed
• Opening data is a complex task; opening cleaned data is even more complex
• Unclear licenses

Technical Barriers
• Access to data:
  • Organizational
  • Technical: downtimes, logins, payment fees
• Fragmentation: incomplete, scattered data
• Format
• Cataloging, indexing, search
• Lack of explicit semantics, metadata
• Data is not reliable
• Conflicting standards, models, ontologies

Barriers

Zuiderwijk et al 2010

Listed 118 socio-technical impediments for opening data in the literature:
• Findability
• Usability
• Understandability
• Quality
• Linking
• Comparability and compatibility
• Metadata
• …

http://www.ejeg.com/issue/download.html?idArticle=255

Context Barriers

• Privileged access to data
• Other companies want to avoid privacy legislation
• Transparency is bad for fraudulent business

http://img.gawkerassets.com/img/182n8vzdlg1iojpg/original.jpg

http://netdna.webdesignerdepot.com/uploads/photo_manipulation/manipulation-9.jpg

Questions?

Part 3 - Real Experience

http://goo.gl/T2Xp80

The “Open Data in Trentino” project

• The “Open Data in Trentino” project is a three-year initiative aimed at developing an open data infrastructure to enhance service innovation in Trentino, following the PAT strategy for ICT-enabled service innovation. The project is being developed in a partnership between Trento RISE and the Autonomous Province of Trento (PAT), according to the PAT innovation model.

• Goals
  • Improved quality of life for citizens
  • Open Data and local businesses
  • Transparency
  • Improved efficiency and productivity

Workplan - Steps

Name (Acronym) | Description | Data type | File extension
Comma Separated Value (CSV) | Text format for exchanging tables, in which rows correspond to lines and the values of each column are separated by a comma (or semicolon) | Tabular data | .csv
Geographic Markup Language (GML) | XML format for exchanging vector geospatial data | Vector geographic data | .gml
Keyhole Markup Language (KML) | XML-based format created to handle three-dimensional geospatial data in Google Earth and Google Maps | Vector geographic data | .kml
Open Document Format (ODF) | Format for storing and exchanging text documents, spreadsheets, diagrams and presentations | Tabular data | .odc
Resource Description Framework (RDF) | XML-based; the basic tool proposed by the World Wide Web Consortium (W3C) for encoding, exchanging and reusing structured metadata. It enables interoperability between applications that exchange information on the Web | Structured data | .rdf
ESRI Shapefile (SHP) | A popular vector format for geographic information systems. The geographic data is normally distributed as three or four files (four if the coordinate reference system is specified). ESRI released it as an (almost) open format | Vector geographic data | .shp, .shx, .dbf, .prj
Extensible Markup Language (XML) | A markup format, that is, one based on a mechanism for defining and controlling the meaning of the elements in a document or text through tags (markup) | Structured data | .xml
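The format table above can be read as a lookup from file extension to data type. The following is a minimal sketch of that reading; the dictionary name, the function name and the English type labels are illustrative choices, not part of the project.

```python
from pathlib import Path

# Data type per file extension, transcribed from the format table above.
# The English labels are an illustrative translation of the table's values.
DATA_TYPE_BY_EXTENSION = {
    ".csv": "tabular",
    ".gml": "vector geographic",
    ".kml": "vector geographic",
    ".odc": "tabular",
    ".rdf": "structured",
    ".shp": "vector geographic",
    ".shx": "vector geographic",
    ".dbf": "vector geographic",
    ".prj": "vector geographic",
    ".xml": "structured",
}


def data_type_for(filename: str) -> str:
    """Return the table's data type for a file name, or "unknown"."""
    extension = Path(filename).suffix.lower()
    return DATA_TYPE_BY_EXTENSION.get(extension, "unknown")


print(data_type_for("strade.SHP"))
print(data_type_for("report.pdf"))
```

A catalog could use such a lookup to tag uploaded files, falling back to "unknown" for formats the table does not cover.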

Technological platform

Data sources: Meteo (weather), GeoDati (geodata), Statistica (statistics), Comune Trento (Municipality of Trento), Trasporti (transport), etc.

Catalog

The Open Knowledge Foundation (OKF) is a non-profit organisation founded in 2004 and dedicated to promoting open data and open content in all their forms – including government data, publicly funded research and public domain cultural content.

(2004)

http://okfn.org

http://dati.trentino.it*

Analysis: http://dati.trentino.it/stats
Admin: http://dati.trentino.it/admin
Harvesting: http://dati.trentino.it/harvest

* Available for all the data providers of Trentino  
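The seminar states later that all catalog data comes from a CKAN repository. CKAN exposes an Action API, and `package_search` is one of its standard actions. The sketch below only builds a search URL and reads dataset names from a response of the standard CKAN shape; that this endpoint is reachable on dati.trentino.it is an assumption, and the sample response is invented for illustration.

```python
from urllib.parse import urlencode


def package_search_url(catalog_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API URL that searches datasets by keyword."""
    base = catalog_url.rstrip("/")
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def dataset_names(response: dict) -> list[str]:
    """Extract dataset names from a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [package["name"] for package in response["result"]["results"]]


# Invented sample response, shaped like a CKAN package_search result.
sample_response = {
    "success": True,
    "result": {"count": 2, "results": [{"name": "meteo-stazioni"}, {"name": "meteo-dati"}]},
}

print(package_search_url("http://dati.trentino.it/", "meteo", rows=5))
print(dataset_names(sample_response))
```

Fetching the URL (for example with `urllib.request.urlopen`) and decoding the JSON body would produce the dictionary that `dataset_names` expects.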

Services

Trentino is also going to launch a challenge to build software applications and creative products (multimedia, audiovisual products, posters, illustrations) based on the datasets published on the http://dati.trentino.it open data catalog.

#ODTChallenge will be the official hashtag for our first open data challenge in Trentino!

7 months until now:
• 68,555 visits
• 7,988 unique visits
• 2,516 downloads
• 37.36% returning visitors
• 62.64% new visitors

NOW
- ALL the departments demand to be involved
- Plus other local actors

Agriculture, Culture, Geographical Data, Welfare, Weather Forecast, Social policies, Statistics, Transports… MUNICIPALITY OF TRENTO, and INFORMATICA TRENTINA

567 datasets provided by 10 departments of PAT…

• 20 reporting errors
• 15 asking for new data
• 10 new suggestions
• 6 OD Applications

100% ENTHUSIASTIC REACTIONS

Want to Know & Learn more?

http://www.theodi.org/
http://schoolofdata.org/
http://opendatahandbook.org/pt_BR/
http://www.od4d.org/category/open-data/how-to/
http://schoolofdata.org/online-resources/

Thanks to the project team!
• General Manager: Isabella Bressan
• Project coordinator: Lorenzino Vaccari
• Organizational/Communication issues: Francesca Gleria, Roberto Cibin
• Data gatherer: Luca Paolazzi
• Catalog: Maurizio Napolitano, Samuele Santi
• Semantics: Juan Pane, David Leoni, Alberto Zanella
• Legal issues: Eleonora Bassi, Stefano Leucci
• Communities: Maurizio Napolitano, Francesca De Chiara
• System integration: Marco Combetto, Lorenzo Dallapè
• Statistical Linked Data: Pavel Shvaiko

Questions?

OGD: Part 4 - Applications

Apps4Italy

Best Application: http://parlamento17.openpolis.it/

Open Bilancio

Best Idea: http://opendata.comune.fi.it/open_bilancio/

What?

DAL America Latina (2012): http://desarrollandoamerica.org/aplicaciones-2012/

DAL America Latina (2013): http://2013.desarrollandoamerica.org/appschallenge/

http://limaio.innovacion.pe/ http://www.limaio.com/demo

http://www.mysociety.org/2007/more-travel-maps/morehousing

Johann MITTHEISZ (CIO of the City of Vienna)

http://www.slideshare.net/BrigitteLutz/keynote-mittheisz-cio-stadt-wien/16

Total hours to develop 38 applications: around 2,600

The City of Vienna saved around 208,000 euro
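The slide does not say how the saving was estimated. One way to read the two figures together is to divide them: a rough sketch, assuming the saving corresponds to development work the city did not have to pay for. The variable names are illustrative.

```python
# Figures reported on the slide (both are approximate, "around").
development_hours = 2600   # total hours to develop the applications
savings_eur = 208000       # saving attributed to the City of Vienna
applications = 38

# Implied cost per development hour, under the assumption stated above.
implied_rate_eur_per_hour = savings_eur / development_hours

# Average effort and average saving per application.
hours_per_application = development_hours / applications
savings_per_application = savings_eur / applications

print(implied_rate_eur_per_hour)
print(round(hours_per_application, 1))
print(round(savings_per_application))
```

Under this reading the figures imply a rate of 80 euro per hour and roughly 68 hours of work per application.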

The Open Data Ecosystem (and the OpenStreetMap case)

OpenStreetMap

The OpenStreetMap project creates and provides geographical data, such as road maps, freely available to anyone. The project was established and grew in response to restrictions on the use or availability of map information across much of the world, and with the advent of inexpensive portable satellite navigation devices.

OpenStreetMap is a free map of the world, created by someone like you

http://tools.geofabrik.de/mc/?mt0=mapnik&mt1=googlemap&lon=11.12042&lat=46.07224&zoom=18

http://haiti.ushahidi.com

Watercolor maps

http://content.stamen.com/files/cartography/index_watercolor.html#18.00/46.07204/11.12097

From maps to blankets…

http://softcities.net

Sharing Data Globally (the eHabitat example)

21st Century Challenges

Source: http://www.slideshare.net/angeled/geoss © GEO secretariat

The Group on Earth Observations

Source: http://www.slideshare.net/angeled/geoss © GEO secretariat

84 GEO members and 61 participating organizations

GEOSS Data Sharing Principles

• Full and Open Exchange of Data, recognizing Relevant International Instruments and National Policies

• Data and Products at Minimum Time delay and Minimum Cost

• Free of Charge or minimal Cost for Research and Education

http://www.geoportal.org/web/guest/geo_home

“Venezuela is considered a state with extremely high biodiversity, with habitats ranging from the Andes mountains in the west to the Amazon Basin rainforest in the south, via extensive llanos plains and Caribbean coast in the center and the Orinoco River Delta in the east.”

Source: Wikipedia

GEOSS for biodiversity

http://www.eurogeoss-broker.eu/

The eHabitat Model

http://ehabitat-wps.jrc.ec.europa.eu/ehabitat/

Questions?

OGD: Part 5 - Semantics

Available

Structured

Open formats

Dereferenceable

Linked

Linked Open Data

The best data is open data

Vs.

All data must be perfect
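The five levels on this slide (available, structured, open formats, dereferenceable, linked) follow the 5-star Linked Open Data scheme, in which each star presupposes the previous ones. The sketch below counts stars under that cumulative rule; the flag names and the dictionary representation of a dataset are assumptions made for illustration.

```python
# The five levels from the slide, in order. Flag names are illustrative.
STAR_CRITERIA = ["available", "structured", "open_format", "dereferenceable", "linked"]


def star_rating(dataset: dict) -> int:
    """Count consecutive satisfied criteria, stopping at the first gap."""
    stars = 0
    for criterion in STAR_CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars


csv_on_the_web = {"available": True, "structured": True, "open_format": True}
pdf_with_links = {"available": True, "linked": True}

print(star_rating(csv_on_the_web))
print(star_rating(pdf_with_links))
```

The second example shows why the rule is cumulative: a document that is published but not structured stays at one star, whatever else it offers. This matches the slide's point that data does not have to be perfect before it is opened.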

Lack of explicit semantics

The real meaning of the data was kept in the developer's mind when creating the data

http://goo.gl/npEHKr

Lack of explicit semantics

Can lead to things like:

Semantic heterogeneity

Difference in the meaning of local data

Issues when Opening Trentino Data

• Each department has authority over only part of the data.
• Datasets originally created for internal use only.
• Datasets created for a specific need.
• Datasets created with custom formats:
  • For structure (some exceptions)
  • For data
• Lack of reuse -> duplication.
• Lack of programmers.
• We cannot (always) TELL them what to do or how to do it.
• Data changes

Available

Structured

Open formats

Dereferenceable

Linked

Entity Centric Semantic Layer

Data Catalog

Entity centric: Added value
• Aggregated data
• Accurate data, manually curated
• Unique identifiers, distributed perspectives
  • Re-think identifiers
• Semantified values

E1
name: Juan Pane
nationality: Italian
lives in: Trento
affiliation: Univ. Trento

E2
name: Ignacio P. F.
born in: Paraguay
date of birth: 1980
affiliation: PF-UNA

Entities

Real world: something that has a distinct, separate existence, although it need not be a material (physical) existence. It has a set of properties, which evolve over time. Example:

Mental: a personal (local) model, created and maintained by a person, that references and describes a real-world entity.

Digital: captures the semantics of real-world entities, as provided by people.

Entity Centric Semantic Layer:
• Address the integration problems due to semantic heterogeneity:
  • Different formats
  • Different identifiers
  • Implicit semantics
  • Homonyms, synonyms, aliases
  • Partial knowledge
  • Knowledge evolution

http://www.webfoundation.org/2011/11/5-star-open-data-initiatives/

Entity-based Integration
• Focus on entities as first-class citizens
• Entities are objects so important in our everyday life that they are referred to by name
• Each entity has its own metadata (e.g. name, latitude, longitude, …)
• Each entity is related to many other entities (e.g. Einstein was born in Ulm, his affiliation was Charles University, Ulm is a city in Germany)
• There are relatively “few” commonsense entity types (person, …, event)
• There are many domain-specific entities (bus stops, cycling paths, ...)
• All components have explicit semantics: schema, entities, attributes, values
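The Einstein example above can be sketched as a small entity store in which attributes hold plain values and relations hold links to other entities. This is an illustration only: the `Entity` class, the identifier scheme and the relation names are assumptions, not the project's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    entity_type: str
    attributes: dict = field(default_factory=dict)  # plain metadata values
    relations: dict = field(default_factory=dict)   # relation name -> target entity id


store: dict[str, Entity] = {}


def add(entity: Entity) -> Entity:
    """Register an entity under its identifier."""
    store[entity.entity_id] = entity
    return entity


# Hypothetical identifiers and relation names for the slide's example.
add(Entity("e:germany", "country", {"name": "Germany"}))
add(Entity("e:ulm", "city", {"name": "Ulm"}, {"part_of": "e:germany"}))
add(Entity("e:charles-university", "organization", {"name": "Charles University"}))
add(Entity("e:einstein", "person", {"name": "Einstein"},
           {"born_in": "e:ulm", "affiliation": "e:charles-university"}))


def follow(entity_id: str, *relation_names: str) -> Entity:
    """Walk a chain of relations starting from one entity."""
    current = store[entity_id]
    for relation in relation_names:
        current = store[current.relations[relation]]
    return current


print(follow("e:einstein", "born_in", "part_of").attributes["name"])
```

Because relations are links rather than text, a question such as "in which country was Einstein born?" becomes a two-step walk over the store.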

Importing pipeline, Macro Steps

1. Domain analysis
   • Study the needed entity types and adapt the knowledge base accordingly.
   • First-time bootstrapping.
2. Import entities
   • Semi-automatic tool.
   • Domain experts are expensive.
   • Human attention is a scarce resource.
   • Incremental enrichment and aggregation of entities.

Open Data Peculiarities
• All data comes from a CKAN repository (DCAT).
• Process one data file at a time.
• Each data file can be represented as a table.
• Each row in the table represents a (partial) entity.
• The format of the values might not be enforced in the data files.
• Not all data is relevant.
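The points above (one file at a time, each file a table, each row a partial entity) can be sketched with Python's standard `csv` module. The function name and the sample content are illustrative; real files from the catalog may use other delimiters or encodings.

```python
import csv
import io


def rows_to_partial_entities(csv_text: str, delimiter: str = ",") -> list[dict]:
    """Read one tabular data file and return one partial entity per row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    entities = []
    for row in reader:
        # Keep only the columns that carry a value: a row may describe
        # an entity only partially. Overflow cells (key None) are skipped.
        entity = {
            column: value.strip()
            for column, value in row.items()
            if column is not None and value is not None and value.strip()
        }
        if entity:
            entities.append(entity)
    return entities


# Invented sample file: the first row has no latitude.
sample_file = "nome,funivie,lat\nAndalo,3,\nCanazei,2,511504\n"

for entity in rows_to_partial_entities(sample_file):
    print(entity)
```

Values stay as strings at this stage, since the slide notes that formats are not enforced in the data files; typing is left to a later validation step.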

Importing tool process

1. Source Selection

Import one data file at a time

2. Schema Matching

Select a target type of entity, then define correspondences between the input columns and the output attributes

nome | provincia | descrizione | funivie | lat | long
Andalo (1047) | Provincia di Trento | Sorge su un'ampia sella prativa al centro... | 3 | 654463 | 712857
Canazei (1450) | Trento Prov. | Situato all'estremità settentrionale della... | 2 | 511504 | 147444
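For the sample table above, a set of correspondences can be sketched as a mapping from input column to output attribute. The target attribute names below are hypothetical; the slide does not list the attributes of the target entity type.

```python
# Hypothetical correspondences: input column -> target attribute.
CORRESPONDENCES = {
    "nome": "name",
    "provincia": "part_of",
    "descrizione": "description",
    "funivie": "cableways",
    "lat": "latitude",
    "long": "longitude",
}


def apply_schema_matching(row: dict, correspondences: dict) -> dict:
    """Rename matched columns to target attributes and drop unmatched ones."""
    return {
        correspondences[column]: value
        for column, value in row.items()
        if column in correspondences
    }


# First row of the sample table, plus one column with no correspondence.
sample_row = {
    "nome": "Andalo (1047)",
    "provincia": "Provincia di Trento",
    "descrizione": "Sorge su un'ampia sella prativa al centro...",
    "funivie": "3",
    "lat": "654463",
    "long": "712857",
    "note": "internal use",
}

print(apply_schema_matching(sample_row, CORRESPONDENCES))
```

Dropping unmatched columns reflects the earlier remark that not all data in a file is relevant.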

3. Data Validation

Applies format and structure validation and possible automatic transformations needed to have the input data in the expected format.
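A minimal sketch of this step, assuming the expected format is given as one Python type per target attribute: each value is trimmed, converted automatically where possible, and reported otherwise. The attribute names and the error messages are illustrative.

```python
# Hypothetical expected format: target attribute -> Python type.
EXPECTED_TYPES = {"name": str, "cableways": int, "latitude": float, "longitude": float}


def validate_row(row: dict, expected_types: dict) -> tuple[dict, list[str]]:
    """Return the converted row and a list of validation errors."""
    clean, errors = {}, []
    for attribute, expected in expected_types.items():
        raw = row.get(attribute)
        if raw is None or str(raw).strip() == "":
            errors.append(f"{attribute}: missing value")
            continue
        try:
            # Automatic transformation: trim whitespace, then convert.
            clean[attribute] = expected(str(raw).strip())
        except ValueError:
            errors.append(f"{attribute}: cannot convert {raw!r} to {expected.__name__}")
    return clean, errors


good = {"name": " Andalo ", "cableways": "3", "latitude": "654463", "longitude": "712857"}
bad = {"name": "Canazei", "cableways": "tre", "latitude": "511504"}

print(validate_row(good, EXPECTED_TYPES))
print(validate_row(bad, EXPECTED_TYPES))
```

Returning errors instead of raising lets a semi-automatic tool show all problems in a file to the person doing the import.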

4. Semantic Enrichment (1/2)

Entity disambiguation: Transform text references into links to existing entities.
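In the schema matching sample, the same province is written as "Provincia di Trento" in one row and "Trento Prov." in another. A simple stand-in for entity disambiguation is a lookup over normalized aliases, sketched below. The entity identifier and the alias list are invented for illustration, and a real system would use more than exact alias matching.

```python
import re

# Hypothetical known entities: identifier -> textual aliases.
KNOWN_ENTITIES = {
    "e:provincia-di-trento": ["Provincia di Trento", "Trento Prov."],
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", " ", text)
    return " ".join(without_punctuation.lower().split())


# Index every normalized alias to the entity it refers to.
ALIAS_INDEX = {
    normalize(alias): entity_id
    for entity_id, aliases in KNOWN_ENTITIES.items()
    for alias in aliases
}


def link(text: str):
    """Return the identifier of the referenced entity, or None if unknown."""
    return ALIAS_INDEX.get(normalize(text))


print(link("Provincia di Trento"))
print(link("trento  prov."))
print(link("Bolzano"))
```

After this step both rows point to the same entity instead of carrying two different strings.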

4. Semantic Enrichment (2/2)

Natural Language Processing: Extract concepts and entity references from free-text.

5. Reconciliation

Run Identity Management Algorithms to identify each row as a new or existing entity.

Result:
• No Match
• Match
• Multiple Matches

Action:
• Use ID
• New ID
• Ignore Row
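The slide lists three results and three actions without pairing them explicitly. The sketch below assumes the natural pairing: a single match reuses the existing ID, no match calls for a new ID, and multiple matches cause the row to be ignored. Exact comparison on one key is a stand-in for the identity management algorithms, which the slide does not describe.

```python
def reconcile(row: dict, existing: dict, key: str = "name") -> tuple:
    """Decide what to do with a row, given the entities already stored.

    Returns (action, entity_id). The identifier is None unless an existing
    one is reused; minting new identifiers is left to the caller.
    """
    matches = [
        entity_id
        for entity_id, entity in existing.items()
        if entity.get(key) == row.get(key)
    ]
    if len(matches) == 1:
        return "use_id", matches[0]   # Match
    if not matches:
        return "new_id", None         # No Match
    return "ignore_row", None         # Multiple Matches: ambiguous


# Invented store in which "Trento" is ambiguous.
existing_entities = {
    "e1": {"name": "Andalo"},
    "e2": {"name": "Trento"},
    "e3": {"name": "Trento"},
}

print(reconcile({"name": "Andalo"}, existing_entities))
print(reconcile({"name": "Canazei"}, existing_entities))
print(reconcile({"name": "Trento"}, existing_entities))
```

Ignoring ambiguous rows keeps wrong merges out of the store, at the cost of leaving those rows for manual review.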

6. Exporting

At this point:
• We know what to export.
• All values for target attributes conform to the expected format.
• All text has been semantified (NLP).
• All textual references to entities are converted to links.
• Each row has an identifier.

7. Publishing

• Put the semantified entities back into CKAN so that the entities can be Open Data and can be found in the same catalog as the original data.
• Developers can find the data files of the cleaned, aggregated entities.
• But they can also interact with the entities via the Entitypedia APIs.

8. Visualization

Search and Navigation

Semantic Layer: Services

Tool for aiding the “semantification” of the datasets in the catalog based on:
• Schema matching services
• Identity Management services
  • Entity Matching services
  • Global Unique Identifier services
• Semantic search and indexing services
• Natural Language Processing
• Entity store

Our Goal

TN

UK

BEES

http://www.youtube.com/watch?v=Bq_ZWl1ZXA0

BEYOND

Gracias!

Grazie!

Merci!

Gràcies!

Gratias!

Thanks!

Danke!

Dank u!

Kiitos!

ευχαριστώ

We thank in particular CLEI 2013, Autonomous Province of Trento, TrentoRise association, Universidad Nacional de Asuncion, Universidad Simon Bolivar and University of Trento
