Introduction to the IBM Watson Data Platform

IBM Watson Data Platform and Open Data

27 February 2017

Margriet Groenendijk | Developer Advocate | IBM Watson Data Platform

@MargrietGr

https://medium.com/ibm-watson-data-lab

About me

Developer Advocate, Data scientist

Previous

Research Fellow at University of Exeter, UK

PhD at VU University Amsterdam, the Netherlands

IBM Watson Data Platform

Connect, Discover, Accelerate

IBM Bluemix

https://console.ng.bluemix.net/

https://github.com/snowch/movie-recommender-demo

https://movie-recommender-demo-margrietgroenendijk-1234.mybluemix.net/

APIs

https://github.com/MargrietGroenendijk/Bristol

Example: Twitter

Example: Watson Tone Analyzer

Emotion

Language style

Social propensities

Analyze how you are coming across to others

Cloudant NoSQL

Cloudant is a database

id  firstname  lastname  dob
1   John       Smith     1970-01-01
2   Kate       Jones     1971-12-25

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01" }
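The mapping from a table row to a document can be sketched in a few lines of Python. This is only an illustration of the idea on the slide: the helper name `row_to_document` is made up here, and the rows are the two from the table above.

```python
import json

# Column names and rows copied from the table above.
COLUMNS = ["id", "firstname", "lastname", "dob"]
ROWS = [
    ("1", "John", "Smith", "1970-01-01"),
    ("2", "Kate", "Jones", "1971-12-25"),
]


def row_to_document(columns, row):
    """Turn one table row into a JSON-style document keyed by column name.

    Hypothetical helper for illustration; not part of any Cloudant library.
    """
    doc = dict(zip(columns, row))
    # Cloudant (like CouchDB) keeps the primary key in the "_id" field.
    doc["_id"] = str(doc.pop("id"))
    return doc


documents = [row_to_document(COLUMNS, row) for row in ROWS]
print(json.dumps(documents[0], sort_keys=True))
```

Each row becomes one self-contained document, so nothing outside the document is needed to interpret it.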

Cloudant is "schemaless"

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com" }

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com", "confirmed": true }

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com", "confirmed": true, "tags": ["tall", "glasses"] }

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com", "confirmed": true, "tags": ["tall", "glasses"], "address" : { "number": 14, "street": "Front Street", "town": "Luton", "postcode": "LU1 1AB" } }
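One practical consequence of "schemaless" is that documents with different sets of fields can sit side by side, so reading code has to tolerate missing fields. The sketch below assumes plain Python dictionaries standing in for documents; `town_of` is a hypothetical helper, and Kate's document is built from the table row shown earlier.

```python
# John's document as it appears in its final form above.
john = {
    "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01",
    "email": "john.smith@gmail.com", "confirmed": True,
    "tags": ["tall", "glasses"],
    "address": {"number": 14, "street": "Front Street",
                "town": "Luton", "postcode": "LU1 1AB"},
}

# Kate's document only has the original four fields; no migration is needed.
kate = {"_id": "2", "firstname": "Kate", "lastname": "Jones", "dob": "1971-12-25"}


def town_of(doc):
    """Return the town from the nested address, or None if the field is absent."""
    return doc.get("address", {}).get("town")


for person in (john, kate):
    print(person["firstname"], town_of(person))
```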

Cloudant is built for the web

▪ Stores JSON documents

▪ Speaks an HTTP API

▪ Lives on the web
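Because the database speaks HTTP, storing a document is an ordinary `PUT` request with a JSON body. The sketch below builds such a request with the Python standard library and does not send it. The account URL `https://example.cloudant.com`, the database name `people` and the credentials are placeholders, and the helper name is made up for this example; the `PUT /{db}/{docid}` shape follows the CouchDB-style API that Cloudant exposes.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def build_put_document_request(base_url, db, doc, username, password):
    """Build (but do not send) the HTTP request that stores one JSON document."""
    url = "{}/{}/{}".format(base_url.rstrip("/"), quote(db, safe=""),
                            quote(doc["_id"], safe=""))
    credentials = "{}:{}".format(username, password).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # HTTP basic authentication header.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Placeholder account, database and credentials (assumptions, not real values).
request = build_put_document_request(
    "https://example.cloudant.com", "people",
    {"_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01"},
    "username", "password",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```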

Cloudant is fault tolerant

Cloudant is resilient

"write"

@MargrietGr

Cloudant is resilient

"ok"

"write"

@MargrietGr

Cloudant is scalable

Cloudant replicates

Runkeeper

Open Street Map Data

IBM Cloudant: use from anywhere!

Daily updates: VM, daily cron, Python script

Always up to date! Currently 12,467,460 POIs

Several data sources: world, continent, country, city or a user-defined box

Several data formats for which free-to-use conversion tools exist: pbf, osm, json, shp

Example:

wget -c http://download.geofabrik.de/europe/netherlands-latest.osm.pbf

Extract the POIs with osmosis (easy to install with brew on Mac)

osmosis --read-pbf netherlands-latest.osm.pbf \
  --tf accept-nodes \
    aerialway=station \
    aeroway=aerodrome,helipad,heliport \
    amenity=* craft=* emergency=* \
    highway=bus_stop,rest_area,services \
    historic=* leisure=* office=* \
    public_transport=stop_position,stop_area \
    shop=* tourism=* \
  --tf reject-ways --tf reject-relations \
  --write-xml netherlands.nodes.osm

Some cleaning up with osmconvert

osmconvert netherlands.nodes.osm --drop-ways --drop-author --drop-relations --drop-versions > netherlands.poi.osm

Convert from OSM to JSON format with ogr2ogr

ogr2ogr -f GeoJSON netherlands.poi.json netherlands.poi.osm points

Upload to Cloudant with couchimport

export COUCH_URL="https://username:password@username.cloudant.com"

cat netherlands.poi.json | couchimport --db poi-netherlands --type json --jsonpath "features.*"

https://github.com/glynnbird/couchimport
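The `--jsonpath "features.*"` option treats every element of the GeoJSON `features` array as one document. A rough Python equivalent of that selection step is sketched below. It groups the features into payloads of the `{"docs": [...]}` shape accepted by the CouchDB-style `_bulk_docs` endpoint. This is not couchimport's own code: the function name, the batch size and the tiny sample collection are invented for illustration.

```python
import json


def features_to_batches(geojson_text, batch_size=500):
    """Yield _bulk_docs payloads, one document per GeoJSON feature."""
    features = json.loads(geojson_text).get("features", [])
    for start in range(0, len(features), batch_size):
        yield {"docs": features[start:start + batch_size]}


# Made-up three-feature collection standing in for netherlands.poi.json.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "poi-%d" % i},
         "geometry": {"type": "Point", "coordinates": [4.9 + i, 52.37]}}
        for i in range(3)
    ],
})

batches = list(features_to_batches(sample, batch_size=2))
print([len(batch["docs"]) for batch in batches])
```

Each payload could then be POSTed to `/{db}/_bulk_docs`; batching keeps individual requests small when a country extract contains millions of POIs.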

IBM Cloudant

Examples from

https://console.ng.bluemix.net/docs/services/Cloudant/api/cloudant-geo.html#cloudant-geospatial
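As a hedged illustration of what such a geospatial query looks like, the sketch below only builds the URL for a "points within a radius" search. It assumes the Cloudant Geo convention of querying `/{db}/_design/{ddoc}/_geo/{index}` with `lat`, `lon` and `radius` parameters. The account URL, the design document name `geodd` and the index name `geoidx` are hypothetical and would have to match an index that actually exists in the database.

```python
from urllib.parse import quote, urlencode


def build_radius_query_url(base_url, db, design_doc, index, lat, lon, radius_m):
    """Build the URL for a radius query against a Cloudant Geo index.

    Assumes the /{db}/_design/{ddoc}/_geo/{index} endpoint shape.
    """
    path = "/".join([
        quote(db, safe=""), "_design", quote(design_doc, safe=""),
        "_geo", quote(index, safe=""),
    ])
    params = urlencode([
        ("lat", lat), ("lon", lon), ("radius", radius_m),
        ("include_docs", "true"),
    ])
    return "{}/{}?{}".format(base_url.rstrip("/"), path, params)


# Hypothetical account, design document and index names.
url = build_radius_query_url(
    "https://example.cloudant.com", "poi-netherlands", "geodd", "geoidx",
    lat=52.37, lon=4.89, radius_m=500,
)
print(url)
```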

UK Crime Data from https://data.police.uk/data/

Python - requests
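The slide does not show the code, so the following is only a sketch of one way to work with this data in Python. It assumes the street-level crime endpoint of the data.police.uk API (`/api/crimes-street/all-crime` with `lat`, `lng` and `date` parameters) rather than the bulk downloads, uses approximate coordinates for Bristol, and counts categories on a few invented records instead of a live response.

```python
from collections import Counter
from urllib.parse import urlencode

API_ROOT = "https://data.police.uk/api"


def street_crime_url(lat, lng, month):
    """Build the URL for street-level crimes near a point in a given month (YYYY-MM)."""
    query = urlencode([("lat", lat), ("lng", lng), ("date", month)])
    return "{}/crimes-street/all-crime?{}".format(API_ROOT, query)


def count_by_category(crimes):
    """Count crime records per category; each record is a dict with a 'category' key."""
    return Counter(crime["category"] for crime in crimes)


url = street_crime_url(51.4545, -2.5879, "2017-01")  # approximate Bristol centre
print(url)

# Invented records for illustration only; a real response would come from
# requests.get(url).json().
sample_crimes = [
    {"category": "burglary"},
    {"category": "anti-social-behaviour"},
    {"category": "burglary"},
]
counts = count_by_category(sample_crimes)
print(counts.most_common(1))
```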

dashDB Data warehouse

Add the dashDB service in Bluemix

Add a service

Search for dashDB

Search for tweets

posted:2016-08-01,2016-10-01 followers_count:3000 friends_count:3000 (weather OR sun OR sunny OR rain OR hail OR storm OR rainy OR drought OR flood OR hurricane OR tornado OR cold OR snow OR drizzle OR cloudy OR thunder OR lightning OR wind OR windy OR heatwave)

REST API docs: https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis

Select table

Use an existing service
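A long query like the tweet search above is easier to maintain when it is assembled from its parts. The sketch below rebuilds that query string in Python. The function and parameter names are made up for this example, and it makes no claim about how the Twitter service interprets each filter.

```python
# Search terms copied from the query on the slide.
WEATHER_TERMS = [
    "weather", "sun", "sunny", "rain", "hail", "storm", "rainy", "drought",
    "flood", "hurricane", "tornado", "cold", "snow", "drizzle", "cloudy",
    "thunder", "lightning", "wind", "windy", "heatwave",
]


def build_tweet_query(start_date, end_date, followers_count, friends_count, terms):
    """Assemble the search query: date range, two count filters, OR-ed terms."""
    return "posted:{},{} followers_count:{} friends_count:{} ({})".format(
        start_date, end_date, followers_count, friends_count, " OR ".join(terms)
    )


query = build_tweet_query("2016-08-01", "2016-10-01", 3000, 3000, WEATHER_TERMS)
print(query)
```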

Apache Spark

RDDs: Resilient Distributed Datasets

Data does not have to fit on a single machine

Data is separated into partitions

Creation of RDDs

Load an external dataset

Distribute a collection of objects

Transformations construct a new RDD from a previous one (lazy!)

Actions compute a result based on an RDD
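The difference between lazy transformations and actions can be shown without a cluster. The class below is a toy, single-process stand-in written for this example; `ToyRDD` is not Spark's API and only mimics the idea: data is split into partitions, `map` and `filter` merely record work, and nothing runs until an action such as `collect` is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD: partitioned data plus a recorded pipeline."""

    def __init__(self, partitions, pipeline=()):
        self._partitions = partitions     # list of lists: the partitioned data
        self._pipeline = tuple(pipeline)  # recorded transformations, not yet run

    @classmethod
    def parallelize(cls, items, num_partitions):
        """Distribute a collection of objects over num_partitions partitions."""
        items = list(items)
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, func):
        # Transformation: returns a new ToyRDD, computes nothing.
        return ToyRDD(self._partitions, self._pipeline + (("map", func),))

    def filter(self, predicate):
        # Transformation: returns a new ToyRDD, computes nothing.
        return ToyRDD(self._partitions, self._pipeline + (("filter", predicate),))

    def _compute(self, partition):
        for kind, func in self._pipeline:
            if kind == "map":
                partition = [func(x) for x in partition]
            else:
                partition = [x for x in partition if func(x)]
        return partition

    def collect(self):
        # Action: runs the recorded pipeline on every partition.
        return [x for part in self._partitions for x in self._compute(part)]

    def count(self):
        # Action: counts the elements left after the pipeline has run.
        return sum(len(self._compute(part)) for part in self._partitions)


calls = []


def square(x):
    calls.append(x)  # record every invocation to make laziness visible
    return x * x


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)         # still 0: nothing has been computed
result = sorted(even_squares.collect())  # the action triggers the computation
print(calls_before_action, result, len(calls))
```

In real Spark the same pattern applies per partition across machines, which is why the data does not have to fit on a single one.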

Load tweets from dashDB with Spark SQL

Clean data, summarise and load into a pandas DataFrame

IBM Data Science Experience

datascience.ibm.com

Getting started

▪ Go to datascience.ibm.com and sign in with your Bluemix account if you have one; otherwise sign up for one at the top right of the screen

Create a project

▪ Click the Create New Project link at the top of the screen
▪ Or go to My Projects in the menu on the left of the screen and click Create New Project
▪ Name the project
▪ Choose a Spark service
▪ Choose an Object Storage
▪ Click Create

Add collaborators

▪ Click add collaborator
▪ Search for your project members
▪ Select Permission

Add a notebook

▪ Click add notebooks
▪ Pick your favourite:
  ▪ Python 2
  ▪ Scala
  ▪ R
▪ Choose Spark 1.6 or 2.0
▪ Click Create Notebook

Let’s write some code

▪ Click the pen icon to start adding code (edit mode)
▪ When collaborating, only one person can edit; others can add comments to the notebook when in view mode

Example: Bristol open data

Object-store

Python package PixieDust

Watson Machine Learning

IBM Watson Data Platform

Bluemix

Data storage

Apps

Watson APIs

Weather

Data Science Experience

Watson Machine Learning

Watson Analytics

Thanks!

https://github.com/MargrietGroenendijk/Bristol

http://www.slideshare.net/MargrietGroenendijk/presentations

@MargrietGr

https://medium.com/ibm-watson-data-lab
