Akhmedov Khumoyun Storm based Real Time Analytics for Recommending Trending Topics and Sentiment Analysis on Cloud Computing Environment Konkuk 2015 [email protected] SMCC Lab Social Media Cloud Computing Research Center

Apache Storm based Real Time Analytics for Recommending Trending Topics and Sentiment Analysis on Cloud Computing Environment

Akhmedov Khumoyun

Storm based Real Time Analytics for Recommending Trending Topics and Sentiment Analysis on Cloud Computing Environment

Konkuk 2015

SMCC Lab

Social Media Cloud Computing Research Center

Outline

• Motivation
• Real Time Systems and CEP
• Storm Introduction
• Used Technologies
• Related Work
• System Overview
• System Architecture
• Use Case: Social Media Analytics by SAS

Motivation

• Real time computation is in demand
• Responding to the problem almost instantly
• Business value
• Tightly connected to Cloud Computing
• Batch processing limitations
• and …

Real Time Systems and CEP

• Real Time System? A real-time system has been described as one which “controls an environment by receiving data, processing them, and returning the results sufficiently quickly to affect the environment at that time”. Real-time response latency is often on the order of seconds or milliseconds.

• CEP (Complex Event Processing)? CEP is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of CEP is to identify meaningful events (such as threats of attacks) and respond to them as quickly as possible.
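As a rough illustration of the CEP idea above, the following sketch combines events from two sources inside a sliding time window and infers a composite event. The event kinds (`login_failed`, `port_scan`), the source names, the thresholds, and the `ThreatDetector` class are all hypothetical and chosen only for the example; events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical source name, e.g. "auth" or "firewall"
    kind: str         # hypothetical event type
    host: str
    timestamp: float  # seconds


class ThreatDetector:
    """Infers a composite 'possible_attack' event when one host shows
    several failed logins AND a port scan inside the same time window."""

    def __init__(self, window_seconds=60.0, min_failed_logins=3):
        self.window_seconds = window_seconds
        self.min_failed_logins = min_failed_logins
        self.recent = deque()  # events still inside the window

    def process(self, event):
        # Add the new event, then drop events that left the sliding window.
        self.recent.append(event)
        cutoff = event.timestamp - self.window_seconds
        while self.recent and self.recent[0].timestamp < cutoff:
            self.recent.popleft()

        # Combine evidence from both sources for the same host.
        same_host = [e for e in self.recent if e.host == event.host]
        failed = sum(1 for e in same_host if e.kind == "login_failed")
        scanned = any(e.kind == "port_scan" for e in same_host)
        if failed >= self.min_failed_logins and scanned:
            return Event("cep", "possible_attack", event.host, event.timestamp)
        return None
```

Neither source alone triggers the composite event; only the combination within the window does, which is the point of the definition.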

Apache Storm is a distributed real time computation system, originally developed by Nathan Marz at BackType (acquired by Twitter).

• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source

Conceptual and Physical View of Storm

Real Time Streaming: Apache Storm and Apache Kafka

Why we need Kafka

Apache Kafka is an ideal source for Storm topologies. It provides everything necessary for:

- At most once processing
- At least once processing
- Exactly once processing

Apache Storm includes Kafka spout implementations for all levels of reliability. Kafka supports a wide variety of languages and integration points for both producers and consumers.
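The difference between the three processing guarantees can be illustrated with a small simulation. This is not Kafka or Storm code: the list `log` stands in for a partition, `consume` and `run_with_one_crash` are hypothetical helpers, and the only assumption is that a consumer resumes from its last committed offset after a crash.

```python
def consume(log, process, commit_before, crash_at=None, start=0):
    """Read messages from `log` starting at offset `start`.

    Returns the committed offset, i.e. where a restarted consumer resumes.
    commit_before=True  -> at-most-once  (commit, then process)
    commit_before=False -> at-least-once (process, then commit)
    `crash_at` simulates a failure between the two steps for that offset.
    """
    committed = start
    for offset in range(start, len(log)):
        if commit_before:
            committed = offset + 1
            if offset == crash_at:
                return committed  # lost: committed but never processed
            process(offset, log[offset])
        else:
            process(offset, log[offset])
            if offset == crash_at:
                return committed  # will be replayed after the restart
            committed = offset + 1
    return committed


def run_with_one_crash(log, process, commit_before, crash_at):
    """Run a consumer, crash it once, then restart from the committed offset."""
    resume = consume(log, process, commit_before, crash_at=crash_at)
    consume(log, process, commit_before, start=resume)


log = ["a", "b", "c"]

at_most = []
run_with_one_crash(log, lambda o, m: at_most.append(m), True, crash_at=1)

at_least = []
run_with_one_crash(log, lambda o, m: at_least.append(m), False, crash_at=1)

# Exactly-once effect: at-least-once delivery plus an idempotent handler
# that remembers which offsets it has already applied.
seen, exactly = set(), []


def idempotent(offset, message):
    if offset not in seen:
        seen.add(offset)
        exactly.append(message)


run_with_one_crash(log, idempotent, False, crash_at=1)

print(at_most)   # ['a', 'c']
print(at_least)  # ['a', 'b', 'b', 'c']
print(exactly)   # ['a', 'b', 'c']
```

With a crash at offset 1, at-most-once drops "b", at-least-once processes it twice, and the idempotent handler yields each message exactly once.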

Used Technologies

• Apache Storm
• Apache HBase
• MySQL
• Hadoop2
• Apache ZooKeeper
• Apache Kafka (message broker)
• Java and some Python
• jQuery and Bootstrap
• Play Framework (Java) or Django (Python)

System Overview

• Trending Topics? “Twitter Trends are automatically generated by an algorithm that attempts to identify topics that are being talked about more now than they were previously.” The Trends list is designed to help people discover the hottest topics and breaking news from across the world in real time.

• Sentiment Analysis? Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall contextual polarity of a document.

Trending Topics

• Fashion: uniqlo, adidas, chanel, #armani
• Politics: putin, NATO, #obama, ISIS
• Sports: #messi, UEFA, #NBA, archery
• Economics: crisis, #Greece, loan, finance
• Health: #MERS, CDO, #Cardio, Cancer

Trending Topics (real time feel)

[Chart: axis ticks 100, 1000, 10000, 100K and 0 to 10; topic labels #MERS, obama, NATO, #cancer, chanel, #crisis]
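One simple way to read the definition of a trend (a topic talked about more now than previously) is to compare topic counts in the current time window against the previous one. The sketch below does that with a smoothed ratio. The function name, the `smoothing` parameter, and the scoring formula are assumptions made for illustration, not the algorithm used by Twitter or by this system.

```python
from collections import Counter


def trending_topics(previous_window, current_window, top_n=3, smoothing=1.0):
    """Rank topics by how much more often they occur now than before.

    Both arguments are lists of topic strings (e.g. hashtags), one entry
    per mention. The score is a smoothed ratio current / previous, so a
    topic that was already popular does not trend merely by staying popular.
    """
    before = Counter(previous_window)
    now = Counter(current_window)
    scores = {
        topic: (count + smoothing) / (before[topic] + smoothing)
        for topic, count in now.items()
    }
    # Sort by score (descending), then by name for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _ in ranked[:top_n]]


previous = ["#NBA"] * 50 + ["#MERS"] * 2
current = ["#NBA"] * 55 + ["#MERS"] * 40 + ["#Greece"] * 5
print(trending_topics(previous, current, top_n=2))  # ['#MERS', '#Greece']
```

In the usage example, #NBA has the highest raw count but barely changes between windows, so it ranks below #MERS and #Greece.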

Sentiment Analysis (of tweets)

• Positive
• Negative
• Neutral

Top Ten Trending Tweets

N | User | Tweets | Sentiment
1 | BigData | Red Hat Offers Apache Hadoop Big Data Services For Business Critical Workloads : http://tinyurl.com/qb83boj | Positive
2 | Checkmax | Secure your source code. http://bit.ly/1MnVRwQ Get a full vulnerability report and prevent security breaches | Negative
3 | Time.com | 5 players to follow in the Women’s World Cup http://ti.me/1LkM0Ku | Neutral
4 | …. | …. | ….
. | …. | …. | ….
. | …. | …. | ….
8 | …. | …. | ….
9 | Iran | #Iran, #Russia discuss regional development, #SCO membership http://theiranproject.com/blog/2015/06/20/iran-russia-discuss-regional-development-sco-membership/ … | Negative
10 | ….. | …. | ….

Sentiment Analysis

To determine the sentiment of incoming tweets, I will use machine learning algorithms such as the Naïve Bayes classifier (predictive learning) and other related techniques.
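A minimal sketch of how a Naïve Bayes classifier could label tweets is shown here, using word counts with add-one (Laplace) smoothing. The class name, the whitespace tokenizer, and the tiny training set are made up for illustration; a real system would train on a large labelled corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace-separated lowercase words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocabulary = set()

    def train(self, labelled_tweets):
        for text, label in labelled_tweets:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example only.
model = NaiveBayesSentiment()
model.train([
    ("great service love it", "positive"),
    ("love this happy day", "positive"),
    ("terrible breach bad news", "negative"),
    ("bad service hate it", "negative"),
])
print(model.predict("love this great day"))  # positive
print(model.predict("bad terrible news"))    # negative
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps unseen words from zeroing out a class.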

In addition, I will use a predefined reference sentiment dictionary as a model to determine the sentiment value of tweets efficiently.
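A dictionary-based approach can be sketched as a lookup that sums per-word scores. The lexicon below is a hypothetical handful of words with invented weights; the slides do not say which reference dictionary is used.

```python
import re

# Hypothetical mini-lexicon with invented weights; a real system would
# load a published sentiment dictionary instead.
SENTIMENT_LEXICON = {
    "good": 1, "great": 2, "love": 2, "secure": 1,
    "bad": -1, "breach": -2, "crisis": -2, "hate": -2,
}


def lexicon_sentiment(tweet, lexicon=SENTIMENT_LEXICON):
    """Label a tweet Positive, Negative, or Neutral by summing word scores."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(lexicon.get(word, 0) for word in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(lexicon_sentiment("I love this great phone"))        # Positive
print(lexicon_sentiment("Security breach causes crisis"))  # Negative
print(lexicon_sentiment("5 players to follow in the World Cup"))  # Neutral
```

A lookup like this is cheap per tweet, which suits a streaming setting, but it ignores negation and context, so it is best seen as a complement to a trained classifier.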

System Architecture

[Architecture diagram; recoverable labels: TCrawler (three instances), Dashboard]

System Workflow

[Workflow diagram; recoverable labels: TweetSpout (two instances), TweetManipulationBolt, SentimentAnalyserBolt, TrendingTopicsBolt, DBWriterBolt, MySQL, Dashboard, AllTweets, HBase]
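The component names in the workflow can be read as a spout feeding a chain of bolts. The sketch below imitates that flow with plain Python generators; it is not Storm API code. The order of the stages is an assumption (the arrows of the diagram are not recoverable), and the in-memory list stands in for the MySQL and HBase stores.

```python
import re
from collections import Counter


def tweet_spout(raw_tweets):
    """Stand-in for TweetSpout: emits one tweet at a time."""
    for text in raw_tweets:
        yield {"text": text}


def tweet_manipulation_bolt(stream):
    """Trims the text and extracts lowercase hashtags."""
    for tweet in stream:
        text = tweet["text"].strip()
        yield {"text": text, "hashtags": re.findall(r"#\w+", text.lower())}


def sentiment_analyser_bolt(stream, classify):
    """Attaches a sentiment label using the supplied classifier function."""
    for tweet in stream:
        yield {**tweet, "sentiment": classify(tweet["text"])}


def trending_topics_bolt(stream, counts):
    """Updates running hashtag counts and passes the tweet on."""
    for tweet in stream:
        counts.update(tweet["hashtags"])
        yield tweet


def db_writer_bolt(stream, table):
    """Stand-in for DBWriterBolt: appends rows to an in-memory list."""
    for tweet in stream:
        table.append(tweet)


def run_topology(raw_tweets, classify):
    """Wire the stages together in the assumed order and drain the stream."""
    counts, table = Counter(), []
    stream = tweet_spout(raw_tweets)
    stream = tweet_manipulation_bolt(stream)
    stream = sentiment_analyser_bolt(stream, classify)
    stream = trending_topics_bolt(stream, counts)
    db_writer_bolt(stream, table)
    return table, counts
```

Calling `run_topology(["#MERS update"], lambda text: "Neutral")` returns the stored rows and the hashtag counts. In an actual Storm topology each stage would run as parallel tasks on a cluster, which this single-process sketch does not attempt to model.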

Social Media Analytics by SAS

THANK YOU

Any questions are welcome…