www.citihub.com
April 2015
Open Data Analytics
Open, flexible technology solutions for Social Listening
Prepared by Ian Tivey, Associate Partner
Citihub at a glance
Tier 1 IT Consultancy
>200 consultants in 3 regions
Client base
Strong heritage with leading
Financial Services firms
Investment Banking
Hedge Fund & Asset Mgt
Retail Financial Services
Service Providers
Growing footprint in other
sectors e.g.
Government
Education
Legal
eCommerce
New York
Toronto
London
Zurich
Hong Kong
Singapore
Social listening
“Listening to the conversations that are going on in social media channels and
using the information gleaned to gain insights in areas like customer sentiment”
According to Forrester [1], many marketing leaders are missing the social data
opportunity
• Few marketing leaders convert available social data into social intelligence
• Most marketers don’t have the ability to analyze social data
• Agencies’ use of social data remains inconsistent
• Most listening platforms are ill equipped to inform marketing strategy
[1] Forrester report: Use Social Data To Improve Your Social Marketing Maturity, Jan 5 2015
3rd party software solutions
The ‘Enterprise Listening’ or ‘Social Listening’ platform market is 10 years old, with
market leaders including Radian6, Synthesio and Sprinklr
Key differentiators:
• Data quality
• Sentiment engine intelligence
• Integration (e.g. with CRM)
Challenges
• Platforms can cost tens of thousands of dollars per month
• Different algorithms (and hence different tools)
can give varying results when applied to
different problems
• Most listening platforms are ill equipped to
inform marketing strategy
Forrester Wave™: Enterprise Listening Platforms 2014
Alternative platform from open source and public cloud
A flexible process is needed for approaching data analysis problems, allowing for
discovery, iteration and experimentation
Using open source tools, running in public cloud, complements this requirement for
flexibility, allowing the problem to define the tools rather than retrofitting the
approach to the tools available
• Best results: Choose the stack and algorithms that best fit the problem and help
produce the best results
• Time to market: Suited to rapid, cost effective prototyping while searching for
business value. No need to build-out internal infrastructure and negotiate license
costs
• Cost: Lowest cost model with open source and public cloud
• Future proofing: No lock-in to expensive 3rd party licenced platforms. Industry
will undoubtedly go through expansion and consolidation
• Scale: Public cloud capacity can be scaled on demand for large data sources and
intensive analytics
• Correlation: simpler integration with internal data sources
Case study 1: Automated Categorisation and Clustering using Machine Learning Techniques
Case study 1
Example business goal
• Sentiment analysis from social media, blogs, public websites, etc.
• What is customer perception of our products and services or those of our
competitors?
• How is that trending and can we correlate to sales figures and pipeline?
Technical challenges
• Understanding nuances of good vs bad sentiment especially when dealing with
slang and abbreviated text (in this example from Facebook)
• Language support: how to deal with multiple languages e.g. Chinese
Solution
• Open source stack including NLP (Natural Language Processing) from Stanford
University and Machine Learning software from Apache Foundation running on
AWS platform
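To illustrate what segmenting Chinese text into “words” involves, here is a minimal sketch of dictionary-based forward maximum matching. It is not the Stanford NLP tooling named above, only a simpler stand-in technique; the function name, the vocabulary and the sample comments are invented for illustration.

```python
def segment(text, vocabulary, max_word_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink; a single character always matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Toy vocabulary, invented for illustration only.
VOCAB = {"海洋", "公園", "海洋公園", "好玩", "排隊", "太久"}

print(segment("海洋公園好玩", VOCAB))
print(segment("排隊太久了", VOCAB))
```

A dictionary approach like this struggles with unseen words and slang, which is one reason to prefer a trained segmenter for real social media text.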
Case study 1 – sample output
Sample visualisation of negative sentiment from Ocean Park public Facebook page
Case Study 1 - Architecture Diagram
Components shown:
• EC2
• Facebook Graph API
• Scripts: cluster, visualise
• Java App
• EMR Cluster: Master Node, three Slave Nodes (each with HDFS)
• NLP App: Chinese Word Segmentation
• Unsupervised Clustering App
• Machine Learning
Citihub Facebook Data Structure:
• Comments: post_id, user_id, created_time, content, number_of_likes
• Likes: post_id, user_id
• Posts: page_id, post_id, post_message
• Users: user_id, name, gender, birthday
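The data structure in the diagram can be sketched as four record types. The field names come from the diagram; the Python types and the `comments_by_post` helper are assumptions added for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    page_id: str
    post_id: str
    post_message: str


@dataclass
class Comment:
    post_id: str
    user_id: str
    created_time: str  # kept as text here; the stored format is an assumption
    content: str
    number_of_likes: int


@dataclass
class Like:
    post_id: str
    user_id: str


@dataclass
class User:
    user_id: str
    name: str
    gender: str
    birthday: str


def comments_by_post(comments):
    """Group comments under the post they belong to (hypothetical helper)."""
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment.post_id].append(comment)
    return dict(grouped)


sample = [
    Comment("p1", "u1", "2015-04-01T10:00:00", "Great day out", 2),
    Comment("p1", "u2", "2015-04-01T11:30:00", "Too crowded", 0),
    Comment("p2", "u1", "2015-04-02T09:15:00", "Back again", 1),
]
print({post_id: len(items) for post_id, items in comments_by_post(sample).items()})
```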
Case Study 1 - Workflow Diagram
• Input the Facebook page ID into a webpage
• Node.js scripts pull data via the Facebook API
• Results are stored in an S3 bucket. Analyse with the unsupervised clustering tool for early insight
• Create a training data set: manually categorise the first 200 comments into +ve, -ve, questions and “noise”
• Train the machine learning tool for supervised classification of comments, based on the training data set, using the Naïve Bayes algorithm; this produces a classification model
• Classify the full data set in the machine learning tool using the classification model
• Pre-process text using NLP tools; here, we segment Chinese comments into “words”
• Analyse results with the unsupervised clustering tool
Tools shown: Unsupervised Clustering App, NLP App, Machine Learning
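The train-then-classify steps can be sketched with a small multinomial Naïve Bayes classifier. This is an illustration in plain Python, not the Apache machine learning software used in the case study; the class name, the tiny training set and the English tokens are all invented, and a real training set would hold the 200 manually categorised comments after segmentation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training comments
        self.vocabulary = set()

    def train(self, labelled_comments):
        for words, label in labelled_comments:
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def classify(self, words):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data: (tokenised comment, manually assigned category).
TRAINING = [
    (["love", "the", "dolphin", "show"], "positive"),
    (["great", "day", "out"], "positive"),
    (["queue", "was", "too", "long"], "negative"),
    (["tickets", "too", "expensive"], "negative"),
    (["what", "time", "do", "you", "open"], "question"),
    (["how", "much", "are", "tickets"], "question"),
    (["click", "here", "free", "phone"], "noise"),
    (["follow", "me", "back"], "noise"),
]

model = NaiveBayes()
model.train(TRAINING)
print(model.classify(["queue", "too", "long"]))
```

Because the model only sees tokens, the same code applies to any language once the text has been split into words, which is why the segmentation step matters for Chinese comments.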
Case Study 1 – Lessons
• Open Source tools and technology exist which,
used in conjunction with public
cloud ecosystem, provide powerful yet
cost-effective capabilities for data analytics
• The techniques used to create classification models
are language independent and not limited to classifying
sentiment nor social media alone
• For language-based analysis, NLP pre-processing creates
strong models
• Social media channels such as Facebook and Twitter
contain a significant amount of noise. Cutting through
the noise to find useful insights requires:
• a good understanding of the business goals
• an understanding of how to break down the problem into various tasks and
workflows
• an iterative process allowing experimentation and improvement
• tool selection appropriate to the task
Case Study 2: Twitter workflow
Case study 2
Example business goal
• Geographical sentiment trending on social media (in this example Twitter)
Technical challenges
• Geographical data needs to be traced via multiple techniques
• Handling sheer volume of real-time data
Solution
• AWS Kinesis can be used to buffer Twitter firehose (real-time feed)
• Hadoop Cluster to extract, transform and load data
- enrich non-geotagged data with geo-coordinates
- standardise table format and load into data warehouse (RedShift).
• Visualisation tools (e.g. CartoDB) load data from data warehouse
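One way to picture the enrichment step is a fallback chain: keep the geotag when a tweet has one, otherwise try to resolve the free-text location from the user profile. The sketch below assumes a simplified, hypothetical tweet record (`lat`, `lon`, `user_location`) and a tiny hand-made gazetteer with approximate city-centre coordinates; it is not the case study's actual ETL code.

```python
# Hypothetical gazetteer: lower-cased place name -> approximate (lat, lon).
GAZETTEER = {
    "hong kong": (22.3193, 114.1694),
    "london": (51.5074, -0.1278),
    "new york": (40.7128, -74.0060),
}


def enrich(tweet):
    """Return a copy of the tweet with lat/lon filled in where possible."""
    enriched = dict(tweet)
    if enriched.get("lat") is not None and enriched.get("lon") is not None:
        enriched["geo_source"] = "geotag"
        return enriched
    # Fall back to the free-text profile location, normalised for lookup.
    place = (enriched.get("user_location") or "").strip().lower()
    if place in GAZETTEER:
        enriched["lat"], enriched["lon"] = GAZETTEER[place]
        enriched["geo_source"] = "profile_location"
    else:
        enriched["lat"] = enriched["lon"] = None
        enriched["geo_source"] = "unknown"
    return enriched


tweets = [
    {"text": "At the harbour", "lat": 22.29, "lon": 114.17, "user_location": "HK"},
    {"text": "Rainy again", "lat": None, "lon": None, "user_location": " London "},
    {"text": "Hello", "lat": None, "lon": None, "user_location": "somewhere nice"},
]
for tweet in tweets:
    print(enrich(tweet)["geo_source"])
```

Recording the source of each coordinate (`geo_source`) keeps inferred locations distinguishable from true geotags when the table is loaded into the warehouse.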
Case study 2 – example output
Time-series geo-visualisation of tweets
• Recorded demo: https://www.youtube.com/watch?v=pqOxq5G9lkE
Case Study 2 – Twitter Visualisation Workflow
• Capture Twitter stream using Java API
• Queue Twitter stream in Kinesis
• ETL using EMR, experimenting with Hive and Impala running on Hadoop, and Spark
• Capture data in Redshift data warehouse
• Outputs: capture & visualisation of historical tweets; time-series geo-visualisation of tweets; sentiment scoring based on word valence; other visualisations
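Sentiment scoring based on word valence can be sketched as a lexicon lookup: each known word carries a signed score and a tweet's score is the sum. The lexicon and its values below are invented for illustration; a real workflow would load a published valence word list.

```python
import re

# Hypothetical valence lexicon: word -> signed strength.
VALENCE = {"love": 3, "great": 3, "good": 2, "bad": -2, "awful": -3, "hate": -3}


def valence_score(text):
    """Sum the valence of every known word; unknown words count as zero."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(VALENCE.get(word, 0) for word in words)


print(valence_score("I love it"))
print(valence_score("Great show, but awful queues and bad food"))
```

A plain sum ignores negation and sarcasm, so scores like these are better suited to aggregate trends than to judging individual tweets.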
Case Study 3: Customer insights based on Facebook page comments & likes
Case study 3
Example business goal
• Customer profiling using social media (Facebook in this example)
• How do I learn more about my customers’ likes and habits?
• Can I correlate to internal CRM systems?
Technical challenges
• Normalising Facebook Graph API model into a relational model for analysis
• Facebook “opt-in” permissioning system for gaining access to customer data
Solution
• Facebook data is normalised by a Node.js script and stored in cloud storage (AWS
S3)
• Use of Hive in Hadoop cluster to analyse normalized data with a SQL-like
interface
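Normalising here means flattening the nested Graph API response into flat rows that a SQL-like engine can join. The sketch below shows the idea in Python rather than the Node.js used in the case study; the nested input shape (`comments.data`, `likes.data`, `from.id`, `like_count`) is an assumption about the API response, and the output columns follow the deck's data structure.

```python
def normalise_post(page_id, post):
    """Flatten one nested post into a post row, comment rows and like rows."""
    post_row = {
        "page_id": page_id,
        "post_id": post["id"],
        "post_message": post.get("message", ""),
    }
    comment_rows = [
        {
            "post_id": post["id"],
            "user_id": comment["from"]["id"],
            "created_time": comment["created_time"],
            "content": comment.get("message", ""),
            "number_of_likes": comment.get("like_count", 0),
        }
        for comment in post.get("comments", {}).get("data", [])
    ]
    like_rows = [
        {"post_id": post["id"], "user_id": like["id"]}
        for like in post.get("likes", {}).get("data", [])
    ]
    return post_row, comment_rows, like_rows


# Invented sample in the assumed nested shape.
sample_post = {
    "id": "p1",
    "message": "New show this weekend",
    "comments": {"data": [
        {"from": {"id": "u1", "name": "A"}, "message": "Can't wait",
         "created_time": "2015-04-01T10:00:00+0000", "like_count": 2},
    ]},
    "likes": {"data": [{"id": "u1", "name": "A"}, {"id": "u2", "name": "B"}]},
}

post_row, comment_rows, like_rows = normalise_post("page9", sample_post)
print(post_row)
print(comment_rows)
print(like_rows)
```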
Case study 3 – example Facebook page
Illustration using an
example Facebook
page against which to
get deeper information
about ‘friends’, i.e.
customers
Offers page gives
customers the option to
opt into giving access to
deeper personal
information
Case Study 3 - Architecture
Components shown:
• EC2
• Facebook Graph API
• Scripts
• App
• EMR Cluster: Master Node, three Slave Nodes (each with HDFS)
Citihub Facebook Data Structure:
• Comments: post_id, user_id, created_time, content, number_of_likes
• Likes: post_id, user_id
• Posts: page_id, post_id, post_message
• Users: user_id, name, gender, birthday
• Users Like: user_id, category, page_id, page_name
• Liked Pages: page_id, name, category, sub-categories, about, description, general_info, products, num_likes, city, country
Case Study 3 – Analysis Workflow
Log in to the Facebook scraping app and click on “scrap
pages I administer”
Node.js scripts pull data via the Facebook API
Results stored into S3 bucket
Issue SQL-like commands to load data from S3 into
EMR/Hadoop HDFS
Issue SQL-like commands to analyse the data using
Hadoop running on EMR
• show me the list of Facebook page categories that
commenters on my Facebook page like, and the number of pages
in each category
• search for keywords in the feeds of commenters on
my Facebook page and return the names and
demographics of those users
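The first request above could look like the following query. This is a sketch only: it runs against an in-memory SQLite database from Python as a stand-in for Hive on EMR, the table names `comments` and `users_like` are adapted from the deck's data structure, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (post_id TEXT, user_id TEXT, content TEXT);
CREATE TABLE users_like (user_id TEXT, category TEXT, page_id TEXT, page_name TEXT);
INSERT INTO comments VALUES
  ('p1', 'u1', 'Great day out'),
  ('p1', 'u2', 'Too crowded'),
  ('p2', 'u1', 'Back again');
INSERT INTO users_like VALUES
  ('u1', 'Travel', 'pg1', 'Example Airline'),
  ('u1', 'Food',   'pg2', 'Example Cafe'),
  ('u2', 'Travel', 'pg1', 'Example Airline'),
  ('u2', 'Travel', 'pg3', 'Example Hotel'),
  ('u3', 'Sport',  'pg4', 'Example Team');
""")

# Categories liked by people who commented on the page, with distinct pages per category.
QUERY = """
SELECT ul.category, COUNT(DISTINCT ul.page_id) AS num_pages
FROM users_like AS ul
WHERE ul.user_id IN (SELECT DISTINCT user_id FROM comments)
GROUP BY ul.category
ORDER BY num_pages DESC, ul.category
"""

rows = conn.execute(QUERY).fetchall()
for category, num_pages in rows:
    print(category, num_pages)
```

User `u3` never commented, so the Sport category is excluded from the result.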
Case study 3 - Example output
Other ideas
• Analyse active users’ interactions with us
and other friends, and link to internal data about
that person
• Search for specific keywords in friends’
personal timelines
• Use toolsets to perform unsupervised page
categorisation based on the page
descriptions, rather than Facebook
categories
• Create our own social graph to reveal
clusters of friends, analyse clusters
Example social graph
Working with Citihub Consulting
Proofs of concept
• Citihub can work with your business, marketing group or technology
departments to establish rapid proof of concepts where you are looking to prove
the value of analytics in the social media space
Mobilise and integrate
• We can work with your technology teams to establish an internal capability that
can adapt to your business needs and social / technology trends
• We can help you with bespoke integration of internal and external data sources
for correlation of data e.g. social sentiment analysis correlated with regional
sales; client profiling on Facebook correlated with internal CRM systems
Service-based analytics
• We can provide managed services where Citihub runs analytics and data
trending for you in the public cloud
www.citihub.com
Keith Maitland
Managing Partner
757 3rd Avenue, 20th Floor
New York
NY 10017
+1 212 878 8840
Chris Allison
Managing Director
12F ICC
1 Austin Road, Kowloon
Hong Kong
+852 8108 2777