OPEN DATA, BIG DATA STORY Ratko Mutavdzic, PROJEKTURA, Experience Architect

(PROJEKTURA) open data big data @tgg osijek

DESCRIPTION

Presentation from The Geek Gathering 2013 event, Osijek, October 2013. Great event, great people, looking forward to the next year.

Page 1: (PROJEKTURA) open data big data @tgg osijek

OPEN DATA, BIG DATA STORY

Ratko Mutavdzic, PROJEKTURA, Experience Architect

Page 2: (PROJEKTURA) open data big data @tgg osijek

OPEN DATA VS. BIG DATA

THE NEW REALITY BUT TWO DIFFERENT THINGS

• Open Data =/= Big Data (usually)
• Big Data = Open Data (usually)
• Open Data could grow to Big Data
• Big Data = BUSINESS THEN TRANSPARENCY (Business Conversation)
• Open Data = TRANSPARENCY THEN BUSINESS (Government Conversation)

Page 3: (PROJEKTURA) open data big data @tgg osijek

OPEN DATA

THE NEW REALITY

2006 EC MEPSIR Study

International Right-To-Know Goal

“to raise global awareness of individuals’ right to access government information”, and

„to promote access to information as a fundamental human right”

Page 4: (PROJEKTURA) open data big data @tgg osijek

WHY OPEN DATA

TRANSPARENCY
PARTICIPATION
COLLABORATION

Page 5: (PROJEKTURA) open data big data @tgg osijek

WHAT MAKES DATA OPEN

the right information is available for people to make the right decisions

... at all levels of the organization

• open format
  • published via industry standards like XML, RDF, HTML, CSV for data, PDF for documents
• metadata
  • published via standards like Dublin Core
• catalogue of open data sources
  • http://logd.tw.rpi.edu/

PUBLIC DATA
OPEN FORMATS
MACHINE READABLE
ACCESSIBLE
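As a rough illustration of the points on this slide (open format plus metadata), here is a minimal Python sketch that serializes a small dataset to CSV and describes it with a metadata record keyed by Dublin Core element names. The station names, readings, creator and date are invented for the example, and the `dc:` key prefix is just one possible convention.

```python
import csv
import io
import json

# Hypothetical dataset: station names and readings are made up for illustration.
rows = [
    {"station": "Osijek-1", "pm10": 21},
    {"station": "Osijek-2", "pm10": 34},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text (an open, machine-readable format)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Metadata record using Dublin Core element names
# (title, creator, date, format, rights); all values are assumptions.
metadata = {
    "dc:title": "Air quality readings (sample)",
    "dc:creator": "Example City Office",
    "dc:date": "2013-10-01",
    "dc:format": "text/csv",
    "dc:rights": "http://opendefinition.org/licenses/",
}

csv_text = to_csv(rows)
metadata_json = json.dumps(metadata, indent=2)
print(csv_text)
print(metadata_json)
```

Publishing the CSV together with such a record is what lets a catalogue index the dataset without opening the file itself.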

Page 6: (PROJEKTURA) open data big data @tgg osijek

PUBLIC INFORMATION POOL

source: adapted from OECD, 2006

Public Information / Content Pool

CATEGORY: Public Sector Information
EXAMPLE: geo data, statistical data, other numbers
CHARACTERISTICS:
• transformation of raw data by value addition
• frequent combination of information types
OBJECTIVE: INFORMATION RE-USE

CATEGORY: Public Sector Content
EXAMPLE: geo data, statistical data, other numbers
CHARACTERISTICS:
• education and cultural value
• limited commercial exploitation
• content not transformed
OBJECTIVE: CONTENT AVAILABILITY

Page 7: (PROJEKTURA) open data big data @tgg osijek

KEY COMPONENTS

WHAT IS OPEN DATA SOLUTION
few words of wisdom

• Open Data Portal
• Open Data API

Page 8: (PROJEKTURA) open data big data @tgg osijek

CKAN, Cloud

Most respected Open Data implementation

OPEN DATA IN U.K.
open.data.gov.uk

Page 9: (PROJEKTURA) open data big data @tgg osijek

OPEN LINKED DATA

Page 10: (PROJEKTURA) open data big data @tgg osijek

AGENDA

(LINKED) (OPEN) DATA
few words of wisdom

• What is Open (Linked) Data?
• Linked Data Standards and Tools
• Linked Open Data in Practice

Page 11: (PROJEKTURA) open data big data @tgg osijek

PUBLIC DATA vs. OPEN DATA

(LINKED) (OPEN) DATA
few words of wisdom

• difficult to find
• difficult to reuse
• difficult to integrate

Page 12: (PROJEKTURA) open data big data @tgg osijek

WHAT IS OPEN DATA?

Page 13: (PROJEKTURA) open data big data @tgg osijek

WHAT IS LINKED OPEN DATA?

Page 14: (PROJEKTURA) open data big data @tgg osijek

WHAT IS LINKED OPEN DATA?

OSIJEK
• has public administration
• has employment
• has finance
• has entertainment
• has sporting events
• has a university

Page 15: (PROJEKTURA) open data big data @tgg osijek

LINKED OPEN DATA

2 KEY INGREDIENTS tbd

Facilitating data integration through:
• Common data model
• Building relations

Page 16: (PROJEKTURA) open data big data @tgg osijek

KEY INGREDIENTS

2 KEY INGREDIENTS tbd

1. RDF: RESOURCE DESCRIPTION FRAMEWORK (GRAPH BASED DATA)
  • identifies objects (URIs)
  • interlinks information (Relationships)

2. VOCABULARIES (ONTOLOGIES)
  • provide shared understanding of data
  • organize knowledge in a machine comprehensible way
  • give an exploitable meaning to the data

Page 17: (PROJEKTURA) open data big data @tgg osijek

LINKED OPEN DATA

5 STARS OPEN DATA MODEL
Tim Berners-Lee, Linked Data initiative

• 1 star: make your stuff available on the Web (whatever format) under an open licence
• 2 stars: make it available as structured data (e.g. Excel instead of image scan of a table)
• 3 stars: use non-proprietary formats (e.g. CSV instead of Excel)
• 4 stars: use URIs to denote things, so that people can point at your stuff
• 5 stars: link your data to other data to provide context

http://lab.linkeddata.deri.ie/2010/star-scheme-by-example
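The star scheme is cumulative: each level assumes all the previous ones. A minimal Python sketch of that rule follows; the `Dataset` class and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist for one published dataset."""
    on_web_open_licence: bool   # 1 star
    structured: bool            # 2 stars
    non_proprietary: bool       # 3 stars
    uses_uris: bool             # 4 stars
    linked: bool                # 5 stars


def star_rating(d: Dataset) -> int:
    """Count stars in order; stop at the first requirement that is not met."""
    checks = [d.on_web_open_licence, d.structured, d.non_proprietary, d.uses_uris, d.linked]
    stars = 0
    for ok in checks:
        if not ok:
            break
        stars += 1
    return stars


# A CSV file on the web under an open licence, without URIs for its things:
csv_on_web = Dataset(True, True, True, False, False)
print(star_rating(csv_on_web))
```

Note that a dataset which links to other data but is published in a proprietary format still stops at two stars under this reading, because the levels build on each other.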

Page 18: (PROJEKTURA) open data big data @tgg osijek

ON WEB, OPEN LICENSE

1 STAR

• ON THE WEB
  • wide access
  • Google can index it
  • people can find it themselves
• OPEN LICENCE
  • regulates reuse of data
  • helps maintain provenance
  • strengthens business reuse

http://opendefinition.org/licenses/

Page 19: (PROJEKTURA) open data big data @tgg osijek

STRUCTURED DATA

2 STAR

• MACHINE READABLE

Page 20: (PROJEKTURA) open data big data @tgg osijek

FORMATS

2 STAR

• GOOD: XLSX, CSV, JSON, MICRODATA
• „GOOD”: WEB, DOCX
• BAD: PDF
• BAD, BAD: charts, maps, images
• SCREENSCRAPING? http://scraperwiki.com


Page 21: (PROJEKTURA) open data big data @tgg osijek

NON PROPRIETARY FORMATS

3 STAR

• Freedom of how to process, analyse and visualise data
• PROPRIETARY
  • DOCX, XLSX, PDF
• NON PROPRIETARY
  • CSV, XML, JSON, MICRODATA, RDF


Page 22: (PROJEKTURA) open data big data @tgg osijek

USE OF URI

4 STAR

• Unique identifiers enable others to point to the data


Page 23: (PROJEKTURA) open data big data @tgg osijek

LINKING DATA (AND RDF)

5 STAR: Link your data to other data to provide context

http://lod-cloud.net

• The „Linked Data” approach has its use cases in Web applications with a LOT of data and little semantics

• Example: define a simple relationship and apply it to large, heterogeneous data collections

Page 24: (PROJEKTURA) open data big data @tgg osijek

RESOURCE DESCRIPTION FRAMEWORK

Part of the 5 STAR story


• Web is a global, universal information space for documents
• Can we do the same for DATA and make the web into a database?
• RDF is the DATA FORMAT for that database

Page 25: (PROJEKTURA) open data big data @tgg osijek

RDF 101
small pieces, loosely joined, easy to reuse, easy to recombine, unexpected reuse, iterative

Page 26: (PROJEKTURA) open data big data @tgg osijek

TYPICAL DATABASE TABLE

Part of the 5 STAR story


ISBN       TITLE              AUTHOR            PUBLISHERID  PAGES
112349987  Practical RDF      David Nelson Jr.  11692        443
234998021  C# for Dummies     Rick Torrensen    11692        1120
501334301  Calling the Stack  Shelly Monroe     45009        128
...

Page 27: (PROJEKTURA) open data big data @tgg osijek

TYPICAL DATABASE TABLE

Part of the 5 STAR story


ISBN       TITLE              AUTHOR            PUBLISHERID  PAGES
112349987  Practical RDF      David Nelson Jr.  11692        443
234998021  C# for Dummies     Rick Torrensen    11692        1120
501334301  Calling the Stack  Shelly Monroe     45009        128
...

columns: properties
rows: subjects

Intersection is a property of the subject

Page 28: (PROJEKTURA) open data big data @tgg osijek

LINKING DATA

book --title--> C# for Dummies

subject --property--> value

The essence of RDF: the „TRIPLE”
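The triple on this slide can be written down directly. The sketch below uses a plain Python named tuple rather than an RDF library; the `Triple` type and the field name `prop` are choices made for this illustration only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, one of its properties, and the value."""
    subject: str
    prop: str
    value: str


def as_sentence(t: Triple) -> str:
    """Render a triple as a readable statement."""
    return f"{t.subject} --{t.prop}--> {t.value}"


triple = Triple(subject="book", prop="title", value="C# for Dummies")
print(as_sentence(triple))
```

Everything in RDF is built from statements of this shape; a larger description is just more triples about the same subject.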

Page 29: (PROJEKTURA) open data big data @tgg osijek

TYPICAL DATABASE TABLE

SELECTING MULTIPLE PROPERTIES

ISBN       TITLE              AUTHOR            PUBLISHERID  PAGES
112349987  Practical RDF      David Nelson Jr.  11692        443
234998021  C# for Dummies     Rick Torrensen    11692        1120
501334301  Calling the Stack  Shelly Monroe     45009        128
...

Page 30: (PROJEKTURA) open data big data @tgg osijek

LINKING DATA

book --title--> C# for Dummies
book --isbn--> 2349908
book --author--> Rick Torrensen
book --publisher--> (publisher) --name--> Amazon

multiple properties

graphically: think in terms of graphs, not XML or documents

Relationship between „things”
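To connect the database table from the earlier slides with the graph view, here is a minimal sketch that turns each table cell into one triple. The `BASE` namespace and the choice of ISBN as the subject key are assumptions made for the example, not something the slides prescribe.

```python
# Hypothetical namespace used to mint a URI for every book.
BASE = "http://example.com/book/"

columns = ["ISBN", "TITLE", "AUTHOR", "PUBLISHERID", "PAGES"]
rows = [
    ("112349987", "Practical RDF", "David Nelson Jr.", "11692", "443"),
    ("234998021", "C# for Dummies", "Rick Torrensen", "11692", "1120"),
    ("501334301", "Calling the Stack", "Shelly Monroe", "45009", "128"),
]


def rows_to_triples(columns, rows, key="ISBN"):
    """Each row becomes a subject; each non-key cell becomes (subject, property, value)."""
    key_idx = columns.index(key)
    triples = []
    for row in rows:
        subject = BASE + row[key_idx]
        for col, cell in zip(columns, row):
            if col == key:
                continue  # the key is already encoded in the subject URI
            triples.append((subject, col.lower(), cell))
    return triples


triples = rows_to_triples(columns, rows)
for t in triples[:4]:
    print(t)
```

Three rows with four non-key columns give twelve triples; the row identity lives in the subject URI and the column name becomes the property.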

Page 31: (PROJEKTURA) open data big data @tgg osijek

USING THE WEB INFRASTRUCTURE

Part of the 5 STAR story


• For a Web-scale database we need to be able to identify things globally and uniquely
• URIs (URLs) already provide those capabilities
• Name things with URIs, specifically http://
• This is THE KEY to linked data

Page 32: (PROJEKTURA) open data big data @tgg osijek

RDF IN PRACTICE

http://example.com/thing --http://example.com/rel--> http://example.com/other
http://example.com/thing --> „text”
http://example.com/thing --> 3.141592

named relations: http://example.com/rel
named resources: http://example.com/thing, http://example.com/other
numeric values and literals: „text”, 3.141592

• The URI identifies the thing you are describing
• If two people create data using the same URI then they are describing the same thing
• That makes it easy to merge data from different sources together
• RDF data can use URIs from many different websites
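The merging claim above can be sketched with plain Python sets: if two sources use the same URI for the same thing, combining them is a set union and duplicate statements collapse on their own. The property URIs `label` and `value` below are hypothetical additions; only `thing`, `rel` and `other` appear on the slide.

```python
THING = "http://example.com/thing"
REL = "http://example.com/rel"
LABEL = "http://example.com/label"   # hypothetical property URI
VALUE = "http://example.com/value"   # hypothetical property URI

# Two independent sources describing the same URI.
source_a = {
    (THING, REL, "http://example.com/other"),
    (THING, LABEL, "text"),
}
source_b = {
    (THING, VALUE, 3.141592),
    (THING, LABEL, "text"),  # same statement as in source_a
}

# Merging is a set union; the shared statement is stored only once.
merged = source_a | source_b


def describe(graph, subject):
    """Group every value known about `subject` by property."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


print(len(merged))
print(describe(merged, THING))
```

No schema alignment step is needed here because the shared identifier does the work; with different URIs for the same thing, the two descriptions would simply stay separate.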

Page 33: (PROJEKTURA) open data big data @tgg osijek

Cloud

Monitors air and water quality
Citizens rate quality via SMS

OPEN LINKED DATA IN U.K.
open.data.gov.uk

Page 34: (PROJEKTURA) open data big data @tgg osijek

LINKED DATA STANDARDS

Government Linked Data (GLD) WG http://www.w3.org/2011/gld/

Page 35: (PROJEKTURA) open data big data @tgg osijek

SPARQL

Common understanding about „things” supports the automatic generation of new information

• Query language of the semantic web. It lets us:
  • Pull values from STRUCTURED and SEMI STRUCTURED data
  • Explore data by querying UNKNOWN RELATIONSHIPS
  • Perform COMPLEX JOINS OF DISPARATE DATABASES
  • Transform RDF from one vocabulary to another

Page 36: (PROJEKTURA) open data big data @tgg osijek

SPARQL

Common understanding about „things” supports the automatic generation of new information

# prefix declarations
PREFIX foo: <http://example.com/resources/>
...
# result clause
SELECT ...
# dataset definition
FROM ...
# query pattern
WHERE {
    ...
}
# query modifiers
ORDER BY ...
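The WHERE clause is the core of the skeleton above: it holds triple patterns in which some positions are variables. As a rough illustration of that idea only (not of a real SPARQL engine), the Python sketch below matches a single triple pattern against a small in-memory graph. The `match` helper, the `?name` variable convention and the sample data are all assumptions for this example.

```python
def match(graph, pattern):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


graph = [
    ("book1", "title", "C# for Dummies"),
    ("book1", "author", "Rick Torrensen"),
    ("book2", "title", "Practical RDF"),
]

# Roughly: SELECT ?title WHERE { ?book title ?title } ORDER BY ?title
titles = sorted(b["?title"] for b in match(graph, ("?book", "title", "?title")))
print(titles)
```

A real query joins several such patterns on their shared variables, which is how SPARQL can combine data from disparate sources.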

Page 37: (PROJEKTURA) open data big data @tgg osijek

SO, WHAT DO WE DO WITH THE DATA?

Page 38: (PROJEKTURA) open data big data @tgg osijek

HACKATHONS: DATA + APPLICATIONS!

BE SURE THAT YOU HAVE AN APP BUILDING PROCESS…
or paid teams, unpaid volunteers, hackathons, open data camps, student competitions

• value is not in the raw data alone (but the data needs to be published first!)

• applications that use the data are key to open data success

• size of the application does not guarantee its value and success

• value to the Citizen is the bottom line

Page 39: (PROJEKTURA) open data big data @tgg osijek

MANY FORMS, SAME PURPOSE

FOR EXAMPLE, INVOLVE… HACKATHONS
a strange word for a noble cause: building the future together in… 48 hours.

Page 40: (PROJEKTURA) open data big data @tgg osijek

PROTOTYPES

OR RESULTS COMING FROM HACKATHONS
but also many different scenarios of organization and citizen engagement resulting in apps

Page 41: (PROJEKTURA) open data big data @tgg osijek

OPEN GOVERNMENT: PARTICIPATION

Page 42: (PROJEKTURA) open data big data @tgg osijek

OPEN DATA ARCHITECTURE

Page 43: (PROJEKTURA) open data big data @tgg osijek

ARCHITECTURE FOR OPEN: HYBRID?

Department B

Internal Private PORTAL

Published Data

Department A

Internal Network

Model that controls sensitive data and supports external scalability and availability. Bridge to PUBLIC.
Keywords: Public and Private Cloud. Provider Datacenter. SLA.

Agency

External Public PORTAL

Published Data

is using

is publishing

Everybody

External Network

Page 44: (PROJEKTURA) open data big data @tgg osijek

CURRENT VIEW ON OPTIONS

Private Cloud
• private infrastructure that can be built on a Microsoft or Linux based stack
• LINUX VM, MS VM
• CKAN ?

Public Cloud (MS AZURE)
• public Microsoft Azure infrastructure supporting „pure play” PaaS solutions and VM based solutions (MS and nonMS)
• PaaS: ODGI
• IaaS (LINUX VM): CKAN, SOCRATA

Public Cloud (nonMS)
• public cloud infrastructure, non Microsoft (usually AWS or OpenStack or …)
• IaaS (LINUX VM): CKAN, SOCRATA

Legend: solutions on Linux, solutions on MS

Page 45: (PROJEKTURA) open data big data @tgg osijek

CKAN

OPEN SOURCE DATA PORTAL SOFTWARE
CKAN is open source and can be downloaded and used for free

• fully featured, mature, open source data management solution:
  • publish and find datasets
  • store and manage data
  • engage with users and others
  • customize and extend
• rich user base: data.gov.uk, publicdata.eu,…
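For the "find datasets" point, CKAN portals expose an HTTP Action API with a `package_search` action. The sketch below only builds the request URL and parses a response of the shape that action returns; it makes no network call. The portal address `https://portal.example.org` and the sample response content are invented for the example.

```python
import json
from urllib.parse import urlencode


def package_search_url(portal, query, rows=10):
    """Build a CKAN Action API URL for a dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [pkg["title"] for pkg in payload["result"]["results"]]


# Made-up response with the general shape CKAN returns (success flag plus result).
sample_response = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "air-quality-2013", "title": "Air quality 2013"},
            {"name": "water-quality-2013", "title": "Water quality 2013"},
        ],
    },
})

url = package_search_url("https://portal.example.org/", "air quality", rows=5)
print(url)
print(dataset_titles(sample_response))
```

In practice the URL would be fetched with an HTTP client and the body passed to `dataset_titles`; keeping the two steps separate makes the parsing easy to check without a live portal.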

Page 46: (PROJEKTURA) open data big data @tgg osijek

BIG DATA

Page 47: (PROJEKTURA) open data big data @tgg osijek

CIO

Source: Forrester, „Evaluating Big Data Predictive Analytics Solutions”, 2012

„Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all of the data it needs to operate, make decisions, reduce risks, and serve customers.”

“Predictive analytics solutions allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing data sources to improve business outcomes.”

Page 48: (PROJEKTURA) open data big data @tgg osijek

Internet of Things
Internet of Everything

source: „US Unprepared for Internet Device Flood”, Kurt Stammberger, MOCANA

Page 49: (PROJEKTURA) open data big data @tgg osijek

source: „Big Data Analytics”, survey of 325 companies, TDWI 2011

Examples of New MultiStructured Data

Chart: data types collected as big data and/or with advanced analytics (axis: 0% to 100%)
• Structured (tables, records)
• Semistructured (XML and similar)
• Complex (hierarchical or legacy)
• Events (messages, usually in real time)
• Unstructured (text, audio, video)
• Social media (blogs, tweets, social networks)
• Web logs and clickstreams
• Spatial (long/lat coordinates, GPS output)
• Machine-generated (sensors, RFID, devices)
• Scientific (astronomy, genomes, physics)
• Other

Page 50: (PROJEKTURA) open data big data @tgg osijek

source: „Understanding the elements of Big Data”, Karmasphere, 2011

Key Elements of Big Data

Diagram labels:
• DATA: Structured, Unstructured
• BIG DATA: Hadoop; Data Management and Storage
• BIG ANALYTICS: Analytics Development; Analytics and Use
• DATA USE: BI and Visualization; Applications

Page 51: (PROJEKTURA) open data big data @tgg osijek

Where is BigData coming from... today

• twitter: 12+ TB of tweet data every day
• facebook: 25+ TB of log data every day
• google: XX+ TB of search logs every day
• people: 2+ billion people on the web today
• RFID: 30 billion RFID tags today
• smart meters: 76 million smart meters today
• GPS devices: 100+ million GPS devices per year
• trade: 5 million trade events per second
• phone cams: 4.6 billion cams today
• cameras: 100+ thousand video feeds from surveillance

Two types of Big Data
• Data in Motion (Streams)
• Data at Rest (Oceans)

New era of computing

New Questions

VOLUME: SIZE
SOCIAL ANALYTICS: What’s the social sentiment of my product?

VELOCITY: SPEED
LIVE DATA FEED: How do I optimize my services based on patterns of weather, traffic, etc.?

VARIETY: STRUCTURE
ADVANCED ANALYTICS: How do I better predict future outcomes?

... so that is not our ordinary enterprise environment? Well...

Page 52: (PROJEKTURA) open data big data @tgg osijek

So What? Well... Big Data For Finance
Social Media: Trustworthy Borrowers vs. Defaulters

They all use a BigData approach and combine it with „scoring as a service” mechanisms like...

KREDITECH
• Looks at 8,000 indicators like location data, social graph, behavioral analytics, e-commerce shopping behavior and device data...
• So: GPS, likes, friends, locations, posts, movement, duration on page, shopping, apps installed, operating system...

ZESTFINANCE
• Credit scoring information via big data: looks at 70,000 signals and feeds them into 10 separate underwriting models

KLOUT

LENDDO https://www.lenddo.com/
• Looks at the applicant’s connections on Facebook and Twitter
• Key to getting the loan: highly trusted individuals in your social network

LENDUP https://www.lendup.com/
• Looks at social media activity to ensure that factual data provided on the online application matches what can be inferred from Facebook and Twitter

WONGA https://www.wonga.com/
• Considers the time of day and the way a candidate clicks around the site in determining whether to grant a loan

Page 53: (PROJEKTURA) open data big data @tgg osijek

So What? Well... Big Data for Telcos

• Two different strategies for growth and mature markets:
  • Growth: acquisition strategy, simple BI needs (reporting)
  • Mature: differentiation strategy, complex BI needs (data mining)

• Classical BigData problems: Churn Management (on prepaids)
  • When to engage (mid of billing cycle) www.globys.com
  • When not to engage (leave good customers alone) www.venda.com

• How can telcos invent business models?
  • IMPROVE SERVICES: Data = Improved Business (Amazon)
  • MOBILE ADVERTISING: Data = Better Advertising (Google)
  • SELL ACCESS TO INSIGHTS: Data = Business (comScore)
  • BECOME GATEKEEPER: Data = Personal Risk (www.reputation.com)

Telco: competition in a mature market

Early operator initiatives will still involve a strong element of traditional business intelligence and analytics:

Structured records: Call and Billing Records, Electronic Data Records, Location Records...

Unstructured: Phone Calls, Text Messages, Social Media posts...

Page 54: (PROJEKTURA) open data big data @tgg osijek

OPEN DATA + BIG DATA?

IMAGINE THE WORLD...
Where you don't have control over the things that happen around you.

• OPEN DATA
  • You can fetch and use any data that exists around you. You can connect that data to any other source and personalize the use.
• BIG DATA
  • You can fetch and use any volume of data that is flowing from devices around you and from your own usage.
• OPEN BIG DATA
  • WE CAN PREDICT AND REACT TO ANY ACTION IMMEDIATELY

Page 55: (PROJEKTURA) open data big data @tgg osijek

INTRO
just a few words about me

so, if we all nod our heads... we can continue...

• Ratko Mutavdzic is the founder of PROJEKTURA, a consulting company that works with new and emerging technologies and introduces them to corporate and enterprise environments. Before that, he spent 15 years at Microsoft, starting in the consulting practice and then leading several different sales and technology teams.

• He is the author of a number of published papers on different aspects of technology and of successful blogs on new technologies and project management, and an active contributor to a number of social networks exploring new ways to connect and share innovation and invention.

• He frequently speaks at conferences, meetings, workshops, coffee shops and generally at every place where people like to explore, challenge, investigate, think and innovate.

• Keywords: change, project, program, portfolio, innovation, startup

note: more contact info on the last slide