THE ROAD TO OPEN DATA
ENLIGHTENMENT
IS PAVED WITH NICE EXCUSES
Toon Vanagt, CEO data.be, @Toon
3rd Dec 2014
Some data.be features
Autocomplete
Mashing up govsources
Data enrichment
Financial ratios
OCR in PDFs
Entity recognition
Alerts
Belgian Finance department and
Police Force are top users of
data.be
On the
internet you
must always
remember:
If something
of value is
free, you’re
the product!
Definitions
‘Open knowledge’ is any
content, information or
data that people are
free to use, re-use and
redistribute — without
any legal, technological
or social restriction.
(okfn.org)
‘Open data’ and ‘open content’ mean anyone can freely access, use, modify, and share for any purpose —subject, at most, to requirements that preserve provenance and openness. (opendefinition.org)
Open Data Enlightenment vs Buzz
The Age of Enlightenment is the era from the 1650s to the 1780s in which cultural and intellectual forces emphasized reason, analysis and individualism rather than traditional lines of authority….
The current open data philosophy redefines ‘authority’ too and appeals to the analytical power of citizens, hackers, journalists and entrepreneurs to put data to good use.
Open data:
fosters a “bottom-up” approach
encourages users to get more out of the data sets
delivers unexpected results & insights
Beware of fancy alchemy headlines:
Open Data Is The New Oil
Unlocking The Gold Mine
Turning Government Data Into Gold
€40 Billion boost to the EU's economy each year…
Excuse 1: But how will we make
money?
Does your government (department) really have to make money with open data?
Open data quickly evolved into primary state infrastructure & service.
Open data benefits society as a whole, so why tax usage separately?
If you still want or have to charge users, limit the cost in PSI-spirit to your marginal data delivery expense (extra bandwidth).
Who pays for open data gov
cost?
1. Government subsidizes underlying open data department costs as a primary service. Government covers the open data related cost as part of its general expenses.
2. Government agencies charge each other for the cost of data usage between federal, regional and city-level departments
3. 11 open data revenue models for government agencies as authentic sources
3 options at input side
8 options at output side
Charging the INPUT side
Government makes the user pay for (legally required!) data mutations:
1. Creation of data sets (company creation, alarm system registration, publication of annual accounts,…)
2. Change of data: (address move, new stakeholder in company, name changes, corrections…)
3. Deletion of a dataset (inactive company, bankruptcy,…)
Downsides of INPUT based revenue
model
Introduces financial hurdles
Removes incentives to keep data up to date
Results in lower data quality
Requires higher ‘enforcement’ cost
Requires cost to clean up outdated data sets
Charging the OUTPUT side
1. User pays for individual consultation
2. Basic data are free, but user has to pay to consult extended data or meta data
3. User pays for use of structured data sets (csv, xml, batch, API,..)
4. User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update)
5. User pays for removed data (from archive) or for change log (historic overview)
6. User pays for a Service Level Agreement (e.g. guaranteed bandwidth or support outside business hours)
7. User pays for monitoring keywords (or events) in (or about) certain data sets to receive alerts (push notifications, e-mails, SMS,…)
8. User pays for custom benchmarking, segmentations, ratios or advanced filtering options
Downsides of OUTPUT based revenue
model
Financial hurdle for ‘newcomers’
Reduces innovation and consolidates ‘status-quo’
Inequality (more for those who can pay, higher service through faster access, better informed)
Results in limited usage and applications
Requires costs for billing & payment system with back office operations
Belgian example 1: Official State
Gazette / Belgisch Staatsblad /
Moniteur
Input based:
1. Creation of data sets (company creation, publication of annual accounts,…)
2. Change of data: (address move, name changes, capital changes, new stakeholders…)
Belgian example 2:
National Bank Balance sheets
Input
Pay for publication of annual accounts (274 EUR for BVBA/SPRL = limited liability company)
Output
User pays for use of structured data sets via a webservice (roughly between 1.850 EUR and 15.000 EUR per year).
User pays for old archived data sets which are no longer shown on the National Bank’s website
User pays for custom industry benchmarking and ratios of competitors, customers or prospects (but benchmarking one's own company remains free)
Belgian example 3:
Crossroads bank for enterprises
Input
Creation of data sets
Change of data, such as address move or registering extra business entity,…
Output
User pays for use of structured data sets (copy of the public part of the database with names of company stakeholders and self-employed persons) at 75.000 EUR/year
User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update) via API (2.000 API requests for 50 EUR in prepaid balance)
User pays for removed data or for change log (historic overview)
User pays for a Service Level Agreement (e.g. guaranteed bandwidth or support outside business hours)
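To put the two Crossroads Bank prices quoted above side by side, the arithmetic can be sketched as follows. This only uses the figures on the slide; the two products are not equivalent (a bulk copy versus real-time API calls), so the break-even number is an illustration, not a pricing rule.

```python
from fractions import Fraction

# Figures quoted on the slide (EUR).
PREPAID_EUR = 50
PREPAID_REQUESTS = 2000
BULK_COPY_EUR_PER_YEAR = 75000

# Exact per-request price, kept as a fraction to avoid float rounding.
price_per_request = Fraction(PREPAID_EUR, PREPAID_REQUESTS)

# Number of API requests per year that would cost as much as the bulk copy.
break_even_requests = BULK_COPY_EUR_PER_YEAR / price_per_request

print(float(price_per_request))   # 0.025
print(int(break_even_requests))   # 3000000
```

At 0.025 EUR per request, a user would need about three million API requests a year before the prepaid route costs as much as the yearly bulk copy.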
Excuse 1:
Avoid conflict of interest for gov
agencies
Battle for budget: creates competition between government agencies
Inequality in support services and quality between paying and non-paying customers or agencies
Battle to secure authentic source as single gatekeeper and extend reach
Creates competition with the private sector, because government agencies act as commercial data brokers selling personal contact details wholesale to intermediaries
Excuse 2:
Our data quality is too low to release
Open Data is not your real challenge, you have much bigger data quality issues…
Accuracy: Does the data correctly represent the real-world entity or event?
Completeness: Does the data include all data items representing the entity or event?
Conformance: Does the data follow accepted standards?
Consistency: Is the data free of contradictions?
Credibility: Is the data based on trustworthy sources?
Processability: Is the data machine-readable?
Relevance: Does the data include an appropriate amount of data?
Timeliness: Does the data represent the actual situation and is it published soon enough?
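Some of these dimensions can be measured automatically before release. The sketch below is a minimal illustration for completeness, consistency and timeliness; the records, field names and the 30-day threshold are hypothetical and not taken from data.be or any government source.

```python
from datetime import date

# Hypothetical company records; all field names are illustrative only.
records = [
    {"id": "BE001", "name": "Acme BVBA", "postcode": "1000",
     "status": "active", "end_date": None, "last_update": date(2014, 11, 20)},
    {"id": "BE002", "name": "", "postcode": "9000",
     "status": "active", "end_date": None, "last_update": date(2014, 10, 1)},
    {"id": "BE003", "name": "Old Corp NV", "postcode": "2000",
     "status": "active", "end_date": date(2012, 5, 1), "last_update": date(2012, 5, 2)},
]

REQUIRED = ("id", "name", "postcode")  # assumed minimum field set


def completeness(rows):
    """Share of rows in which every required field is non-empty."""
    ok = sum(all(r.get(f) for f in REQUIRED) for r in rows)
    return ok / len(rows)


def inconsistent(rows):
    """Ids of rows that contradict themselves: 'active' but with an end date."""
    return [r["id"] for r in rows
            if r["status"] == "active" and r["end_date"] is not None]


def stale(rows, today, max_age_days):
    """Ids of rows not updated within max_age_days of today."""
    return [r["id"] for r in rows
            if (today - r["last_update"]).days > max_age_days]


print(completeness(records))
print(inconsistent(records))
print(stale(records, date(2014, 12, 3), 30))
```

Publishing such scores alongside a data set is one way to be open about quality instead of withholding the data.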
Excuse 3: Yes, the data are open but
the process and partner chain is
not…
Document data process partners
Describe steps in the information chain upstream of your authentic source (data.be had to reverse engineer processes)
Excuse 4: We think we might have
some privacy sensitive data
elements…
Keep the lawyers out of your open data project if you want to make a fast start
It’s complicated
It’s Personal
Privacy concept evolves over time and is culturally defined
Many grey zones
Don’t forget to try to anonymise your unstructured data too… accidents will happen
We can technologically do much more than we are permitted to culturally, morally or legally…
Beware that very few data points are needed to identify a person in this big data era. Eloquently phrased by Jonathan Mayer: “The idea of personally identifiable information not being identifiable is completely laughable in computer-science circles”.
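The point about few data points can be checked on your own data before publishing. The sketch below counts how many rows become unique once a few harmless-looking fields are combined; the rows and field names are made up for illustration, and the check is a simple uniqueness count, not a full anonymisation method.

```python
from collections import Counter

# Hypothetical, already "anonymised" rows: no names, only coarse attributes.
rows = [
    {"postcode": "1000", "birth_year": 1975, "gender": "F"},
    {"postcode": "1000", "birth_year": 1975, "gender": "F"},
    {"postcode": "1000", "birth_year": 1982, "gender": "M"},
    {"postcode": "9000", "birth_year": 1975, "gender": "F"},
    {"postcode": "9000", "birth_year": 1990, "gender": "M"},
]


def unique_share(rows, fields):
    """Share of rows whose combination of `fields` occurs exactly once."""
    keys = [tuple(r[f] for f in fields) for r in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)


print(unique_share(rows, ["gender"]))                            # 0.0
print(unique_share(rows, ["postcode", "birth_year", "gender"]))  # 0.6
```

Each field alone singles out nobody in this toy set, yet the three together isolate three of the five rows. Rows that are unique on such a combination are candidates for coarsening or suppression.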
Excuse 5: On second thought, we’re
not that open…
Availability: Can the data be accessed now and over time?
Be consistent and offer long term commitments and stable data set formats (integration mapping)
Data.be received a ‘Cease & Desist’ after a government hackathon: “Our government website is the only authentic source for air quality measurement. Stop using our data immediately or …”
Excuse 6: We opened the data in a
layer on our WMS…
Web Map Service (WMS) is a standard protocol for serving geo-referenced map images over the Internet that are generated by a map server using data from a GIS database.
It is very hard to share the layer data… in other applications
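A sketch of what a WMS client asks for makes the limitation concrete. The parameter names below are the standard WMS 1.3.0 GetMap parameters; the endpoint URL and the layer name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600):
    """Build a WMS 1.3.0 GetMap URL. The server answers with a rendered image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        # WMS 1.3.0 follows the axis order of the CRS; for EPSG:4326 that is
        # latitude first, so bbox is (min_lat, min_lon, max_lat, max_lon).
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",  # pixels, not records
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, roughly covering Belgium.
url = wms_getmap_url("https://example.org/wms", "air_quality",
                     (49.5, 2.5, 51.5, 6.4))
print(url)
```

The response is a picture of the data at a given size and extent. The underlying measurements and their attributes are not part of it, which is why a WMS layer alone is hard to re-use in other applications.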
Next frontiers for Open Data
Linked & graph data
Metadata
Unstructured data
Structured feedback loops
Gatekeepers to the rescue
Don’t just ‘input’ the data which are presented
Inform general public on long term use of their ‘public’ data.
Once online, always online…
Evangelise the use of open data inside and outside your organisation
Open up your organisation
Invite a data scientist to work. Share insights internally, learn, optimize quality of data sets
Be open about quality and refresh rates
Specify the license under which the data may be re-used.
Provide a feedback loop (today data.be often serves as the feedback channel for outdated company data…)
Maintenance of metadata and data is critical!
Toon Vanagt, CEO data.be
@Toon
THANK YOU
3rd Dec 2014 #OUP14
Opening up conference in Brussels
Picture copyright & attribution
The brick laying machine pictures can be found at Tiger Stone: http://www.tiger-stone.nl/index.php?option=com_content&view=article&id=47&Itemid=55
Keep calm cup: http://www.keepcalm-o-matic.co.uk/product/mug/keep-calm-and-open-up-67/
Storify with pictures of opening-up.eu event: https://storify.com/openingup_eu/opening-up-final-conference-1