TWITTER MESSAGE DATA TRANSFORMATION USING R-TOOL AND MONGODB
PRESENTED BY:
HELLY PATEL, KUSH PATEL
Data: Real-time Twitter tweets
Tools used: R tool
NoSQL system: MongoDB
Creating the Developer Account
• The first step is to create a Twitter developer account.
Get the API key and Access Tokens
Access tokens
Twitter Authentication Workflow
Tool installation for getting Data
• Next, install RStudio, which is used to fetch the data with the help of several library packages.
Installing the Packages required
• Fetching the Twitter data requires the 'streamR', 'ROAuth' and 'twitteR' packages.
Installing Packages (cont.)
• Installing the twitteR package automatically installs all the dependent packages it requires.
Commands used for installing packages
The following commands install the packages in RStudio:
• install.packages("streamR")
• install.packages("ROAuth")
• install.packages("twitteR")
Commands for loading the libraries, which also confirms that they are installed:
• library(streamR)
• library(ROAuth)
• library(twitteR)
Code for handshaking in R
• Run the handshake code in R. It requires the consumer key and consumer secret obtained when the Twitter developer account was created.
Authorizing user
• After the code is run, the authorization URL opens automatically and the user is authorized with a PIN.
Capturing Tweets
Now the PIN has to be entered here.
Capturing Tweets
• Next, set the timeout and the number of tweets to collect in the filterStream command.
Twitter Data
Twitter returns the data in JSON format, as structured, log-like records that look as follows:
Twitter Data
• The Twitter data is stored in a JSON file, as shown:
Fields in twitter data
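As a rough illustration of the format, the sketch below parses text that holds one JSON tweet per line, which is the layout mongoimport reads by default. It is written in JavaScript, the language of the mongo shell used later in these slides. The two sample tweets are made up for illustration and show only a handful of the fields Twitter returns; parseTweets is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical sample: two made-up tweets, one JSON document per line.
const sampleFile = [
  '{"text":"hello world #rstats","lang":"en","created_at":"Mon Mar 02 10:15:00 +0000 2015","user":{"screen_name":"alice","followers_count":120,"friends_count":80},"entities":{"hashtags":[{"text":"rstats"}]}}',
  '{"text":"hola #mongodb","lang":"es","created_at":"Tue Mar 03 18:40:00 +0000 2015","user":{"screen_name":"bob","followers_count":45,"friends_count":300},"entities":{"hashtags":[{"text":"mongodb"}]}}',
].join("\n");

// Hypothetical helper: split on newlines, skip blank lines, parse each line.
function parseTweets(fileContents) {
  return fileContents
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const tweets = parseTweets(sampleFile);

// Print a few of the fields used in the queries later on.
for (const t of tweets) {
  const tags = t.entities.hashtags.map((h) => h.text);
  console.log(t.user.screen_name, t.lang, t.user.followers_count, tags);
}
```

The nested fields (user.followers_count, entities.hashtags) are the ones the later queries and plots rely on.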
Storing the Twitter Data in MongoDB
After getting the Twitter data in JSON format from RStudio, we need to import it into the NoSQL system used here, MongoDB, with the following steps:
First, connect to MongoDB:
• mongo
Switch to the database:
• use <database name>
Create the collection:
• db.createCollection("kushtwitter")
Storing the JSON file in MongoDB
• Command used:
mongoimport --db helludb --collection kushtwitter --file /home/hduser/Downloads/hellutweets_test.json
Twitter data in MongoDB
Data mining on Twitter Data
• Query to find the top five hashtags in the data:
> db.kushtwitter.aggregate([{$unwind: '$entities.hashtags'},{$group: {_id: '$entities.hashtags.text',tagCount: {$sum:1}}}, {$sort: {tagCount: -1}}, {$limit:5}]);
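To make the four pipeline stages easier to follow, here is a minimal sketch that mimics them in plain JavaScript on an in-memory array, with no MongoDB server involved. The sample documents are invented and contain only the field the pipeline touches; this only approximates what the server does and is not how MongoDB implements it.

```javascript
// Hypothetical sample documents with only the entities.hashtags field.
const sampleDocs = [
  { entities: { hashtags: [{ text: "rstats" }, { text: "mongodb" }] } },
  { entities: { hashtags: [{ text: "rstats" }] } },
  { entities: { hashtags: [] } }, // no hashtags: dropped by the unwind step
  { entities: { hashtags: [{ text: "bigdata" }, { text: "rstats" }] } },
];

// $unwind: one record per hashtag; tweets with an empty array disappear.
const unwound = sampleDocs.flatMap((d) =>
  d.entities.hashtags.map((h) => h.text)
);

// $group with {$sum: 1}: count the occurrences of each hashtag text.
const counts = new Map();
for (const tag of unwound) {
  counts.set(tag, (counts.get(tag) || 0) + 1);
}

// $sort {tagCount: -1} followed by $limit 5.
const topFive = [...counts.entries()]
  .map(([tag, tagCount]) => ({ _id: tag, tagCount }))
  .sort((a, b) => b.tagCount - a.tagCount)
  .slice(0, 5);

console.log(topFive);
```

With this sample, "rstats" comes first with a count of 3, followed by the two hashtags that appear once.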
Top five Hashtags:
'lang' field: tweet counts for the different languages in the Twitter data
> db.kushtwitter.aggregate([{$group: {_id: '$lang', count: {$sum: 1}}}]);
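The grouping step can be pictured as a simple tally per language code. The sketch below reproduces it in plain JavaScript on a few invented documents, and the output shape (_id, count) mirrors the query above; it is an illustration only and does not talk to MongoDB.

```javascript
// Hypothetical sample documents with only the lang field.
const langDocs = [
  { lang: "en" },
  { lang: "es" },
  { lang: "en" },
  { lang: "fr" },
  { lang: "en" },
];

// Equivalent of {$group: {_id: '$lang', count: {$sum: 1}}}.
const tally = {};
for (const doc of langDocs) {
  tally[doc.lang] = (tally[doc.lang] || 0) + 1;
}

// Reshape into the {_id, count} documents the aggregation returns.
const langCounts = Object.entries(tally).map(([lang, count]) => ({
  _id: lang,
  count,
}));

console.log(langCounts);
```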
Ordering the data by creation time
To order the data by the time it was produced, the command is:
> db.kushtwitter.find().sort({"created_at": -1});
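One caveat worth checking: Twitter delivers created_at as a string such as "Wed Mar 04 09:00:00 +0000 2015". If the field was imported as a plain string, sorting on it compares text, so the weekday name decides the order rather than the actual time. The sketch below shows the difference on three invented tweets. parseCreatedAt is a hypothetical helper that assumes the "+0000" offset shown here.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Hypothetical helper: parse "Wed Mar 04 09:00:00 +0000 2015" into a Date.
// Assumes the UTC offset is always +0000, as in the sample strings below.
function parseCreatedAt(s) {
  const m = /^\w{3} (\w{3}) (\d{2}) (\d{2}):(\d{2}):(\d{2}) \+0000 (\d{4})$/.exec(s);
  if (!m) throw new Error("unexpected created_at format: " + s);
  const month = MONTHS.indexOf(m[1]);
  if (month < 0) throw new Error("unknown month: " + m[1]);
  return new Date(Date.UTC(+m[6], month, +m[2], +m[3], +m[4], +m[5]));
}

// Hypothetical sample tweets, listed oldest first.
const sample = [
  { text: "first", created_at: "Wed Mar 04 09:00:00 +0000 2015" },
  { text: "second", created_at: "Fri Mar 06 08:30:00 +0000 2015" },
  { text: "third", created_at: "Mon Mar 09 12:00:00 +0000 2015" },
];

// Descending sort on the raw string: ordered by weekday name, not by time.
const byString = [...sample].sort((a, b) =>
  a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : 0
);

// Descending sort on the parsed date: newest tweet first.
const byDate = [...sample].sort(
  (a, b) => parseCreatedAt(b.created_at) - parseCreatedAt(a.created_at)
);

console.log(byString.map((t) => t.text));
console.log(byDate.map((t) => t.text));
```

If the order matters, one option is to convert the field to a real date before or after import.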
Tweets where a retweeted status exists
db.kushtwitter.findOne({"retweeted_status": {$exists: true}})
Tweets containing the word 'hello'
Tweets = db.kushtwitter.findOne({'text': {'$regex': 'hello'}})
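The regular expression here is an unanchored, case-sensitive match, and findOne returns only the first matching document. The sketch below imitates that behaviour in plain JavaScript on a few invented tweets, and also shows a case-insensitive variant, which in MongoDB can be requested with the $options: 'i' setting.

```javascript
// Hypothetical sample tweets.
const sampleTweets = [
  { text: "hello world" },
  { text: "Hello there" },
  { text: "just saying hello" },
  { text: "goodbye" },
];

// Case-sensitive, unanchored match, like {'$regex': 'hello'}.
const pattern = new RegExp("hello");

// find() analogue: every tweet whose text matches.
const allMatches = sampleTweets.filter((t) => pattern.test(t.text));

// findOne() analogue: only the first matching tweet.
const firstMatch = sampleTweets.find((t) => pattern.test(t.text));

// Case-insensitive variant, like adding $options: 'i'.
const anyCase = sampleTweets.filter((t) => /hello/i.test(t.text));

console.log(allMatches.length, firstMatch.text, anyCase.length);
```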
Friends count vs Followers Count
Plotting Graph
Displaying the first ten characters of each tweet
substring(tweet_df$text, 1, 10)
Word count by column
Word count of tweet_df
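The slides do not show how the word counts were computed. One simple way to count words per tweet and across all tweets is sketched below in plain JavaScript, splitting on whitespace; the sample texts and the countWords helper are hypothetical.

```javascript
// Hypothetical sample tweet texts.
const tweetTexts = [
  "hello world from R",
  "storing tweets in MongoDB",
  "hello again",
];

// Hypothetical helper: count whitespace-separated words in one text.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Word count for each tweet, then the total over all tweets.
const perTweet = tweetTexts.map(countWords);
const totalWords = perTweet.reduce((sum, n) => sum + n, 0);

console.log(perTweet, totalWords);
```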
Thank You