
IBM Insight 2015 - 1824 - Using Bluemix and dashDB for Twitter Analysis


© 2015 IBM Corporation

Using Bluemix and dashDB for Twitter Analysis Session # 1824

Torsten Steinbach @torsstei


Please Note:

• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.


dashDB


IBM Insights for Twitter Service in Bluemix


Query exactly the data that your social application needs.

Get IBM analytics enrichments in addition to base Twitter data.

Whenever needed, check whether previously received Tweets are still valid (compliance).

Ingest, enrich, curate, govern Decahose data over time.

Receive & process compliance events.

Social Application using the IBM Insights for Twitter Service

IBM Insights for Twitter Service: Search over enriched Decahose Data

Twitter GNIP APIs

Social Application

IBM Insights for Twitter System on SoftLayer

Twitter Data enriched through IBM Analytics

Store and Index up to 2-year history of enriched Tweets, point in time compliant


PowerTrack collection rules & filters.


Queries

Boolean Operators

Operator precedence: “-” is stronger than “AND”, and “AND” is stronger than “OR”. You can (and should) use parentheses to make operator precedence explicit.

Example: ibm twitter -(lame OR boring) searches for tweets that contain both the terms “ibm” and “twitter” but neither “lame” nor “boring”.

Operator | Example(s) | Description
term1 AND term2 | cat dog; cat AND dog; #cutecat food | Returns tweets that contain both term1 and term2. Whitespace between two terms is treated as AND, so the operator can be omitted.
term1 OR term2 | #money OR broke | Returns tweets that contain either term1 or term2.
-term1 | ibm -apple | Returns tweets that do not contain term1.

Query terms

All of the following query terms can be freely combined with the boolean operators introduced above, e.g. ibm apple followers_count:500

Operator | Description | Example
keyword | Matches tweets that have “keyword” in their body. The search is case-insensitive. | cat
"exact phrase match" | Matches tweets that contain the exact keyword sequence <"exact", "phrase", "match">. | "cats and dogs"
#hashtag | Matches tweets with the hashtag “#hashtag”. | #insight2014
from:twitterHandle | Returns tweets from authors with the preferredUsername twitterHandle. Must not contain the @ sign. | from:alexlang11
followers_count:lower or followers_count:lower,upper | Matches tweets of authors that have at least “lower” followers. The upper bound is optional and both limits are inclusive. | followers_count:500
posted:startTime or posted:startTime,endTime | Matches tweets that have been posted at or after “startTime”. The “endTime” bound is optional and inclusive. Timestamps have to be in one of the following two formats: “yyyy-mm-dd” or “yyyy-mm-dd'T'HH:MM:SS'Z'”. Timezone is UTC. | posted:2014-12-01T00:00:00Z,2014-12-12T00:00:00Z

The query language mimics the Gnip PowerTrack query language; a subset of PowerTrack operators is available. See the documentation in Bluemix as we roll out more query operators.
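The operators on this slide can also be composed programmatically. The following is a minimal Python sketch, not part of the service: the helper names (`posted`, `followers_count`, `any_of`, `exclude`, `all_of`) are hypothetical, and the sketch only builds the query string. It always parenthesizes OR groups, following the advice to make precedence explicit.

```python
from datetime import datetime, timezone

# Hypothetical helpers for illustration; they only assemble query text.

def posted(start, end=None):
    """Build a posted: term from timezone-aware datetimes, rendered in UTC."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    parts = [start.astimezone(timezone.utc).strftime(fmt)]
    if end is not None:
        parts.append(end.astimezone(timezone.utc).strftime(fmt))
    return "posted:" + ",".join(parts)

def followers_count(lower, upper=None):
    """Build a followers_count: term; the upper bound is optional."""
    if upper is None:
        return f"followers_count:{lower}"
    return f"followers_count:{lower},{upper}"

def any_of(*terms):
    """OR group, always parenthesized so precedence is explicit."""
    return "(" + " OR ".join(terms) + ")"

def exclude(term):
    """Negate a term or a parenthesized group."""
    return "-" + term

def all_of(*terms):
    """AND group; whitespace between terms is treated as AND."""
    return " ".join(terms)

query = all_of("ibm", "twitter", exclude(any_of("lame", "boring")))
print(query)
print(posted(datetime(2014, 12, 1, tzinfo=timezone.utc),
             datetime(2014, 12, 12, tzinfo=timezone.utc)))
```

The first `print` reproduces the example query from the slide; the second renders a `posted:` range in the long timestamp format.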


Count: /messages/count?q=QUERY

• Use to find out how many Tweets match a given query

HTTP Code | Description | Example Response
200 | Number of results at json_path(“search.results”). URL to retrieve documents at json_path(“related.search.href”). Note: add your client_id and your client_secret to this URL. | {"search": {"results": 21695}, "related": {"search": {"href": "https://server.bluemix.net/api/v1/messages/search?q=ibm"}}}
4xx | There was a problem with your query. Please have a look at json_path(“error”) to identify the problem. |
5xx | There was a problem with the service. Please have a look at json_path(“error”) and contact support. |
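A short Python sketch of how a client might read the count response. It uses the slide's sample response written as well-formed JSON; the function name `parse_count_response` is hypothetical, and the sketch assumes that error responses are also JSON documents with an "error" field, as the table suggests. No request is sent.

```python
import json

# Sample 200 response from the slide, written as well-formed JSON.
SAMPLE = (
    '{"search": {"results": 21695}, '
    '"related": {"search": {"href": '
    '"https://server.bluemix.net/api/v1/messages/search?q=ibm"}}}'
)

def parse_count_response(status, body):
    """Return (match_count, search_url) or raise on a 4xx/5xx status."""
    doc = json.loads(body)
    if status != 200:
        # Assumption: error details are found at the top-level "error" key.
        raise RuntimeError(f"HTTP {status}: {doc.get('error')}")
    return doc["search"]["results"], doc["related"]["search"]["href"]

count, href = parse_count_response(200, SAMPLE)
print(count, href)
```

The returned URL still needs the client credentials added before it can be called, as noted in the table.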


Search: /messages/search?q=QUERY&size=NUMBER

• Search & retrieve <= NUMBER Tweets matching QUERY

HTTP Code | Description | Example Response
200 | Number of overall results at json_path(“search.results”). First batch of results at json_path("tweets"). URL to retrieve the next batch of documents (if available) at json_path(“related.next.href”). Note: add your client_id and your client_secret to this URL. | { "search": { "results": 16283624 }, "tweets": [ { "message": { … "body": "this is a nice tweet" … "actor": { "followersCount": 456, "displayName": "IBM Tweeter" … "cde": { "sentiment": { "polarity": "POSITIVE" ... "author": { "gender": "male" … }
4xx | There was a problem with your query. Please have a look at json_path(“error”) to identify the problem. |
5xx | There was a problem with the service. Please have a look at json_path(“error”) and contact support. |
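Because results arrive in batches, a client typically follows json_path("related.next.href") until it is absent. A minimal Python sketch of that loop follows; `iter_tweets` and the caller-supplied `fetch` callable (which takes a URL and returns the decoded JSON document) are hypothetical names, and how credentials are attached is left to `fetch`. The demo uses canned pages instead of the real service.

```python
def iter_tweets(first_url, fetch, limit=None):
    """Yield tweets batch by batch, following related.next.href."""
    url, seen = first_url, 0
    while url:
        page = fetch(url)
        for tweet in page.get("tweets", []):
            if limit is not None and seen >= limit:
                return
            yield tweet
            seen += 1
        # A missing "next" link means there are no further batches.
        url = page.get("related", {}).get("next", {}).get("href")

# Canned pages standing in for real responses (illustrative data only).
fake_pages = {
    "page-1": {
        "search": {"results": 3},
        "tweets": [{"message": {"body": "a"}}, {"message": {"body": "b"}}],
        "related": {"next": {"href": "page-2"}},
    },
    "page-2": {
        "search": {"results": 3},
        "tweets": [{"message": {"body": "c"}}],
    },
}

for tweet in iter_tweets("page-1", fake_pages.__getitem__):
    print(tweet["message"]["body"])
```

Passing `fetch` in from outside keeps the paging logic independent of the HTTP client and makes it easy to test without network access.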

   


Example Queries

• Get Tweets about an upcoming movie for a given time frame to sense interest & reactions to trailer:

search?q="posted:2015-02-01T00:00:00Z AND #starwars"&size=5

• Get Tweets with positive/negative sentiment about a product to learn what customers like / dislike about the product:

search?q="IBM Bluemix sentiment:positive"

• Get Tweets about a product being marketed and compare over time to sense audience reaction to the campaign:

search?q="posted:2015-02-01T00:00:00Z,2015-02-15T00:00:00Z AND #IBM"
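When such queries are sent over HTTP, characters like "#", ":" and spaces have to be percent-encoded; an unencoded "#" would otherwise start a URL fragment and cut the query short. A small Python sketch under stated assumptions: the host is the one from the sample response earlier in the deck (yours will differ), `search_url` is a hypothetical helper, and the query is passed without the surrounding double quotes shown on the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Host taken from the sample response in this deck; an assumption for your setup.
BASE = "https://server.bluemix.net/api/v1"

def search_url(query, size=None):
    """Build a /messages/search URL with a percent-encoded query."""
    params = {"q": query}
    if size is not None:
        params["size"] = size
    return f"{BASE}/messages/search?{urlencode(params)}"

url = search_url("posted:2015-02-01T00:00:00Z AND #starwars", size=5)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```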


Built-in Tool to load Tweets to dashDB


R & Python for dashDB


Predictive Analytics With R In dashDB 1/3

• Built-in R runtime & RStudio

• ibmdbR package

Data frames logically representing data physically residing in dashDB tables

> con <- idaConnect("BLUDB", "", "")
> idaInit(con)
> sysusage <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> systems <- ida.data.frame('DB2INST1.SHOWCASE_SYSTEMS')
> systypes <- ida.data.frame('DB2INST1.SHOWCASE_SYSTYPES')

Push down of R data preparation to dashDB

> sysusage2 <- sysusage[sysusage$MEMUSED>50000,c("MEMUSED","USERS")]
> mergedSys <- idaMerge(systems, systypes, by='TYPEID')
> mergedUsage <- idaMerge(sysusage2, mergedSys, by='SID')

Push down of analytic algorithms to in-db execution

> lm1 <- idaLm(MEMUSED~USERS, mergedUsage)

(Diagram: dashDB with its built-in R runtime and ibmdbR, reached from RStudio in the browser, from any R runtime with ibmdbR, or from a REST client via REST.)


Predictive Analytics With R In dashDB 2/3

Dynamite-native implementation of statistical functions

• colnames, cor, cov, dim, head, length, max, mean, min, names, print, sd, summary, var

Logically derived columns pushed down to Dynamite

> myDF <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS

Sampling of tables in Dynamite

> idaSample(myDF, 3)
  SID DATE                       USERS MEMUSED ALERT MemPerUser
1   8 2014-02-14 23:39:00.000000    34    5015     f        147
2   5 2014-01-22 07:52:00.000000    96   11512     f        119
3   7 2013-09-12 05:17:00.000000    39    5592     t        143

Statistics about tables in Dynamite

> summary(myDF)
 SID            USERS            MEMUSED             ALERT        MemPerUser
 Min.   :0.000  Min.   :  3.000  Min.   :  350.000   f   :3655563 Min.   :105.000
 1st Qu.:2.000  1st Qu.: 35.000  1st Qu.: 5113.000   t   :1344437 1st Qu.:135.000
 Median :4.500  Median : 64.000  Median : 9455.000   NA's:     NA Median :150.000
 Mean   :   NA  Mean   :     NA  Mean   :       NA                Mean   :     NA
 3rd Qu.:7.000  3rd Qu.:111.000  3rd Qu.:16517.000                3rd Qu.:165.000
 Max.   :9.000  Max.   :347.000  Max.   :62379.000                Max.   :209.000

Statistics about categorical values

> idaTable(myDF)
ALERT
      f       t
3655563 1344437


Predictive Analytics With R In dashDB 3/3

Store R objects in Dynamite database

> myPrivateObjects <- ida.list(type='private')
> myPrivateObjects['series100'] <- 1:100
> x <- myPrivateObjects['series100']
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
[89] 89 90 91 92 93 94 95 96 97 98 99 100
> names(myPrivateObjects)
[1] "series100"
> myPrivateObjects['series100'] <- NULL

Manage Dynamite tables

> idaExistTable('DB2INST1.SHOWCASE_SYSUSAGE')
[1] TRUE
> idaShowTables()
    Schema Name                   Owner    Type
1 BLUADMIN R_OBJECTS_PRIVATE      BLUADMIN T
2 BLUADMIN R_OBJECTS_PRIVATE_META BLUADMIN T
3 BLUADMIN R_OBJECTS_PUBLIC       BLUADMIN T
4 BLUADMIN R_OBJECTS_PUBLIC_META  BLUADMIN T
> myView <- idaCreateView(myDF)
> idaIsView(myView)
[1] TRUE
> idaDropView(myView)
> idaIsView(myView)
[1] FALSE


Running R in dashDB via REST API

Create your R script with RStudio

• Storing it in home dir inside dashDB

POST <dashdb-server>/dashdb-api/rscript/<fileName>

• Run the specified R script

GET <dashdb-server>/dashdb-api/home

• List all files under user home (recursively)

– E.g. list the output written by your R script

GET <dashdb-server>/dashdb-api/home/<fileName>

• Download the specified file
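The three REST calls can be prepared from Python with the standard library alone. This is a sketch under assumptions: the helper names and the server URL "https://dashdb.example.com" are hypothetical, `<dashdb-server>` is assumed to include the scheme, and authentication is not shown on the slide and therefore not covered here. The code only constructs request objects and does not send them.

```python
from urllib.parse import quote
from urllib.request import Request

def run_rscript_request(server, file_name):
    """POST <dashdb-server>/dashdb-api/rscript/<fileName>: run the R script."""
    return Request(f"{server}/dashdb-api/rscript/{quote(file_name)}", method="POST")

def list_home_request(server):
    """GET <dashdb-server>/dashdb-api/home: list files under the user home."""
    return Request(f"{server}/dashdb-api/home", method="GET")

def download_request(server, file_name):
    """GET <dashdb-server>/dashdb-api/home/<fileName>: download one file."""
    return Request(f"{server}/dashdb-api/home/{quote(file_name)}", method="GET")

server = "https://dashdb.example.com"  # placeholder, not a real endpoint
for req in (run_rscript_request(server, "analysis.R"),
            list_home_request(server),
            download_request(server, "output.csv")):
    print(req.get_method(), req.full_url)
```

A typical sequence would be to run the script, list the home directory to find the files it wrote, and then download the one of interest, for example by passing each request to `urllib.request.urlopen` once credentials have been added.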


Predictive Analytics With Python In dashDB

• Bluemix Analytic Notebooks

• ibmdbPy package: https://pypi.python.org/pypi/ibmdbpy

Data frames logically representing data physically residing in dashDB tables

from ibmdbpy import IdaDataBase, IdaDataFrame
# Open the connection the snippet relies on; the DSN name "BLUDB" is an assumption
idadb = IdaDataBase("BLUDB")
# Logical handle on the IRIS table; rows stay in dashDB
idadf = IdaDataFrame(idadb, "IRIS", indexer = "ID")
idadf = idadf[["ID","sepal_length", "sepal_width"]]
# Derived column, evaluated in the database
idadf['new'] = idadf['sepal_width'] + idadf['sepal_length'].mean()
idadf.head()

Push down of analytic algorithms to in-db execution

from ibmdbpy.learn import KMeans
kmeans = KMeans(3) # clustering with 3 clusters
kmeans.fit_predict(idadf).head()

(Diagram: dashDB reached through ibmdbPy from an Analytics for Spark notebook in Bluemix, opened in the browser, or from any Python runtime.)


Loading Twitter Data to dashDB with Bluemix App

Show Case for box office analysis with Twitter: www.youtube.com/watch?v=9yVNwOs9L4c

Twitter loader app for dashDB: hub.jazz.net/project/torsstei/Twitter-Loader/overview (www.youtube.com/watch?v=ANakSSGM4zU)


Movie Analysis Show Case

• Tweets about movies from Bluemix service

• Box office stats from the-numbers.com

• Public map data for US counties: https://www.census.gov/geo/maps-data/data/tiger-line.html

• In Bluemix: dashDB service for analytics and correlation between Tweets and box office data

• dashDB analysis using built-in R & RStudio

• Interactive app for visualization using Node.js and the D3.js library

https://hub.jazz.net/project/torsstei/movie-analysis


Movie Analysis Show Case https://hub.jazz.net/project/torsstei/movie-analysis


Populating dashDB with Data

Sources feeding dashDB:

• On-premise databases

• Geodata in Esri Shapefiles

• Mobile app data in Cloudant

• GeoJSON

• Twitter

• The Weather Company

• CSVs

• Cloud storage: S3, Swift

• Bluemix

• Open Data: data.gc.ca, data.gov, data.gov.uk, datahub.io, openAFRICA


Open Data Loader


The Weather Company Data Loader Bluemix App


Backup


dashDB: Key Use Cases

DR in the Cloud

• Minimize capital expense of DR solution


We Bring Netezza Compatible Analytic Platform to the Cloud

Analytic Extension Framework

• UDX C++ API

• AE Framework: In-DB R, In-DB LUA, In-DB Python, In-DB Perl

Canned Analytics

• OLAP Functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, STDDEV, COVAR, …

• Linear Regression, Kmeans Clustering, Decision Tree, Association Rules, Naive Bayes, …

• Spatial Operators: Contains, Touches, Within, Intersects, Crosses, Overlaps

Application Integration

• R Wrapper, Watson Analytics, ESRI ArcGIS Connector, …

• Analytics Applications of ISVs and Customers


Analytics of Warehouse Data

This is where we start from: All analytic processing done on application side

Data pulled out and processed in analytic application

(Diagram labels: Analytic Applications; Analytic Code & Algorithms; Analytic Data)


Accelerate Analytics for Warehouse Data

Push Down Step 1: BLU tables only logically represented in analytic application

Simple data lookup & massage operations pushed down as SQL operations

Benefit: Acceleration with no SQL skills required

(Diagram labels: Analytic Applications; Analytic Code & Algorithms; Analytic Data; SQLs)
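The idea behind Push Down Step 1 can be illustrated with a toy lazy table handle. This is a conceptual sketch only: the `LazyTable` class is hypothetical, SQLite (from the Python standard library) stands in for the warehouse, and nothing here reflects how ibmdbR or ibmdbPy are actually implemented. The handle records filters and projections and ships them to the database as a single SQL statement, so only the result rows reach the application.

```python
import sqlite3

class LazyTable:
    """Logical handle on a table: records operations, runs them as one SQL query."""

    def __init__(self, con, table, columns=("*",), conditions=(), params=()):
        self.con = con
        self.table = table
        self.columns = tuple(columns)
        self.conditions = tuple(conditions)
        self.params = tuple(params)

    def select(self, *columns):
        """Projection; nothing is fetched yet."""
        return LazyTable(self.con, self.table, columns, self.conditions, self.params)

    def where(self, condition, *params):
        """Filter; conditions accumulate and are ANDed together."""
        return LazyTable(self.con, self.table, self.columns,
                         self.conditions + (condition,), self.params + params)

    def sql(self):
        text = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            text += " WHERE " + " AND ".join(self.conditions)
        return text

    def fetch(self):
        """Only now does the database do the work and return the result rows."""
        return self.con.execute(self.sql(), self.params).fetchall()

# Illustrative data in an in-memory database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sysusage (sid INTEGER, users INTEGER, memused INTEGER)")
con.executemany("INSERT INTO sysusage VALUES (?, ?, ?)",
                [(1, 34, 5015), (2, 96, 60000), (3, 39, 75000)])

big = LazyTable(con, "sysusage").where("memused > ?", 50000).select("memused", "users")
print(big.sql())
print(big.fetch())
```

The last lines mirror the R expression sysusage[sysusage$MEMUSED>50000, c("MEMUSED","USERS")]: the user writes data-frame style operations and never writes the SQL by hand.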


Accelerate Analytics for Warehouse Data

Push Down Step 2: Typical and popular algorithms pushed down to canned UDFs in the db

Call built-in functions via SQL to execute typical algorithms inside db

Benefit: Bring Standard Analytics to the Data

(Diagram labels: Analytic Applications; Cloud Tooling; Analytic Code & Algorithms; Analytic Data; SQLs; Canned Algorithms)


Accelerate Analytics for Warehouse Data

Push Down Step 3: Execute entire customer analytic programs inside the db

Deploy customer code and call via special SQL function interfaces

Benefit: Bring Custom Analytics to the Data

(Diagram labels: Analytic Applications; Analytic Code & Algorithms; Analytic Data; SQLs; Canned Algorithms; Language Framework (UDX & AE))
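The pattern of Push Down Step 3, registering custom code with the database and calling it through SQL, can be sketched with SQLite's user-defined functions from the Python standard library. This is a loose analogy, not the dashDB UDX or AE interface: the function `mem_per_user` and the table contents are made up for illustration.

```python
import sqlite3

def mem_per_user(memused, users):
    """Custom analytic code; returns None (SQL NULL) when there are no users."""
    if not users:
        return None
    return memused / users

con = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it by name.
con.create_function("mem_per_user", 2, mem_per_user)

con.execute("CREATE TABLE sysusage (sid INTEGER, users INTEGER, memused INTEGER)")
con.executemany("INSERT INTO sysusage VALUES (?, ?, ?)",
                [(1, 10, 1000), (2, 0, 500)])

# The application only issues SQL; the custom code runs row by row in the engine.
rows = con.execute(
    "SELECT sid, mem_per_user(memused, users) FROM sysusage ORDER BY sid"
).fetchall()
print(rows)
```

The application side shrinks to a single SQL statement, while the custom logic executes next to the data.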


We Value Your Feedback!

Don’t forget to submit your Insight session and speaker feedback! Your feedback is very important to us – we use it to continually improve the conference.

Access your surveys at insight2015survey.com to quickly submit your surveys from your smartphone, laptop or conference kiosk.


Notices and Disclaimers

Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.


Notices and Disclaimers (cont'd)

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DB2® , DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, IMS™, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.


Thank You