RForcecom: An R package which
provides a connection to Force.com
and Salesforce.com
Takekatsu Hiramura
2014-07-02 The R User Conference 2014 @ UCLA
Agenda
1. About me
2. Brief introduction of Force.com and Salesforce.com
3. Overview of RForcecom
4. Features of RForcecom
5. Example of the analysis using RForcecom
-Visualizing consumers’ voice-
Takekatsu Hiramura
» IT consultant, Software Engineer and Data scientist
» Private website:
http://thira.plavox.info/
» Blog:
http://hiratake55.wordpress.com/
» R-bloggers:
http://www.r-bloggers.com/author/takekatsu-hiramura/
About Salesforce.com/Force.com
» “Salesforce.com” is one of the best-known SaaS (Software-as-a-Service)
based CRM (Customer Relationship Management) services.
» “Force.com” is the application platform of Salesforce.com; it is
classified as PaaS (Platform-as-a-Service).
Specific features of CRM
» Campaign Management
» Contract Management
» Customer Management
» Product Management
» Sales Forecasting
» Case Management
» etc.
Overview of the Application/Service Architecture
[Diagram labels: Application, Platform, Service; Custom Object, Apex / VisualForce, Web API, etc.]
The RForcecom package
I developed an R package, “RForcecom”, which provides a connection to
Salesforce.com and Force.com via the REST API.
» R: Statistical Analysis, Machine Learning, Data Manipulation, Visualization
» Salesforce.com: Customer Relationship Management, Dashboard,
Collaboration Platform (Chatter, Schedule, ToDo, etc.)
» Operations between R and Salesforce.com: Insert, Update/Upsert, Delete,
Data extract, SOQL query, Search
The CRAN page of RForcecom
http://cran.r-project.org/web/packages/RForcecom/
Source code is available on GitHub
https://github.com/hiratake55/RForcecom
Features of RForcecom
 #   Feature                        Function name
 1   Sign in to Force.com           rforcecom.login()
 2   Create a record                rforcecom.create()
 3   Retrieve records               rforcecom.retrieve()
 4   Update a record                rforcecom.update()
 5   Upsert a record                rforcecom.upsert()
 6   Delete a record                rforcecom.delete()
 7   Execute a SOQL query           rforcecom.query()
 8   Execute a SOSL search          rforcecom.search()
 9   Retrieve a list of objects     rforcecom.getObjectList()
 10  Retrieve object descriptions   rforcecom.getObjectDescription()
 11  Retrieve a server timestamp    rforcecom.getServerTimestamp()
Sign in to the Force.com
> library(RForcecom)
> username <- "yourname@example.com"  # placeholder; use your own login address
> password <- "YourPasswordSECURITY_TOKEN"  # password followed by the security token
> instanceURL <- "https://xxx.salesforce.com/"
> apiVersion <- "26.0"
> session <- rforcecom.login(username, password, instanceURL, apiVersion)
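The password placeholder in the sign-in example is the account password with the security token appended. A minimal sketch of assembling that value from environment variables instead of hard-coding it follows; the helper name build_password() and the variable names SF_PASSWORD and SF_TOKEN are assumptions for illustration and are not part of RForcecom.

```r
# Hypothetical helper (not part of RForcecom): concatenate the account
# password and the security token into the single string used at login.
build_password <- function(password = Sys.getenv("SF_PASSWORD"),
                           token = Sys.getenv("SF_TOKEN")) {
  if (!nzchar(password) || !nzchar(token)) {
    stop("Both the password and the security token must be set.")
  }
  paste0(password, token)
}

# Example with explicit dummy values
build_password("YourPassword", "SECURITY_TOKEN")
```

The result could then be passed as the password argument of rforcecom.login().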
Retrieve records
> objectName <- "Case"
> fields <- c("CaseNumber", "Subject", "Status")
> rforcecom.retrieve(session, objectName, fields, order=c("CaseNumber"), limit=12)
Execute a SOQL query
> soqlQuery <- "SELECT Id, Name, Industry FROM Account order by CreatedDate"
> rforcecom.query(session, soqlQuery)
RForcecom demo: Visualizing consumers’ voice
» Assume you are a manager at a company and want to learn what
consumers are saying from the CRM. The consumers’ voices are recorded
in Salesforce.com by the call center staff.
» Call Center (data collection / operation): enters customer and case data
» Salesforce.com (data management): Customer Mgmt., Case Mgmt., exposed via the REST/SOAP API
» R (data analysis): API client (RForcecom), NLP (TreeTagger), visualization (Wordcloud), …
» Managers (reporting)
Sample Dataset:
Delta’s Twitter Social Customer Support Account
» It is difficult to use an actual dataset, so I crawled Delta Air Lines’ Twitter
account (@DeltaAssist) and stored the tweets in Salesforce.com instead.
https://twitter.com/DeltaAssist/with_replies
Step 1: Retrieving a dataset from Salesforce.com
» Tweets sent to @DeltaAssist are stored in Salesforce.com.
» Load the required library and sign in to Salesforce.com.
> library(RForcecom)
> username <- "yourname@example.com"  # placeholder; use your own login address
> password <- "YourPasswordSECURITY_TOKEN"  # password followed by the security token
> instanceURL <- "https://xxx.salesforce.com/"
> apiVersion <- "26.0"
> session <- rforcecom.login(username, password, instanceURL, apiVersion)
» Retrieve the dataset by specifying the object name and the field names.
> CustomerVoice <-rforcecom.retrieve(session,"CustomerVoice__c",c("TweetDate__c","Tweet__c"))
> head(CustomerVoice$Tweet__c,10)
Step 2: Extracting high-frequency keywords
» Tag the word class of each word using the “koRpus” package and TreeTagger.
> library(koRpus)
> temp.file.name <- tempfile()
> write.table(CustomerVoice$Tweet__c, temp.file.name, col.names=F, row.names=F)
> tagged <- treetag(temp.file.name, lang="en", treetagger="manual",
+   TT.options=list(path="C:/Apps/TreeTagger", preset="en", encoding="UTF-8"))
> tagged.DF <- tagged@TT.res
> head(tagged.DF, 10)
» Keep only the nouns from the tagged list.
> term <- tagged.DF[tagged.DF$wclass=="noun",]$token
> term <- tolower(term)
> head(term, 20)
» Count the frequency of each term.
> term.unique <- unique(term)
> term.freq <- unlist(lapply(term.unique, function(x){length(term[term==x])}))
> termfreq <- data.frame(term=term.unique, freq=term.freq)
> termfreq <- termfreq[order(termfreq$freq, decreasing=T),]
> head(termfreq, 10)
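The same counts can also be obtained with base R's table(), which tallies every distinct value in one pass instead of scanning the vector once per unique term. The sketch below uses a toy vector in place of the real noun list, so the values are illustrative only.

```r
# Toy vector standing in for the lower-cased nouns in `term`
term <- c("flight", "bag", "flight", "seat", "bag", "flight")

# table() counts each distinct value in a single pass
counts <- table(term)
termfreq <- data.frame(term = names(counts),
                       freq = as.integer(counts),
                       stringsAsFactors = FALSE)

# Most frequent terms first
termfreq <- termfreq[order(termfreq$freq, decreasing = TRUE), ]
head(termfreq, 10)
```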
Step 3: Visualizing the words as a word cloud
» Visualize the terms using the wordcloud package. “flight” is the most frequent term.
> library(wordcloud)
> termfreq.top <- head(termfreq, n=100)
> pal <- brewer.pal(8, "Dark2")
> windowsFonts(SegoeUI = "Segoe UI")  # Windows only
> wordcloud(termfreq.top$term, termfreq.top$freq, random.color=T, colors=pal, family="SegoeUI")
Step 4: Visualize the Buzz-word of the day
» Retrieve the daily datasets with SOQL.
> CustomerVoice.sun <- rforcecom.query(session, "select Tweet__c, TweetDate__c from CustomerVoice__c where TweetDate__c >= 2014-05-25T00:00:00-04:00 and TweetDate__c < 2014-05-26T00:00:00-04:00")
> CustomerVoice.mon <- rforcecom.query(session, "select Tweet__c, TweetDate__c from CustomerVoice__c where TweetDate__c >= 2014-05-26T00:00:00-04:00 and TweetDate__c < 2014-05-27T00:00:00-04:00")
> CustomerVoice.tue <- rforcecom.query(session, "select Tweet__c, TweetDate__c from CustomerVoice__c where TweetDate__c >= 2014-05-27T00:00:00-04:00 and TweetDate__c < 2014-05-28T00:00:00-04:00")
> CustomerVoice.wed <- rforcecom.query(session, "select Tweet__c, TweetDate__c from CustomerVoice__c where TweetDate__c >= 2014-05-28T00:00:00-04:00 and TweetDate__c < 2014-05-29T00:00:00-04:00")
> CustomerVoice.thu <- rforcecom.query(session, "select Tweet__c, TweetDate__c from CustomerVoice__c where TweetDate__c >= 2014-05-29T00:00:00-04:00 and TweetDate__c < 2014-05-30T00:00:00-04:00")
> CustomerVoice.fri <- rforcecom.query(session, "select Tweet__c, TweetDate__c from CustomerVoice__c where TweetDate__c >= 2014-05-30T00:00:00-04:00 and TweetDate__c < 2014-05-31T00:00:00-04:00")
> CustomerVoice.sat <- rforcecom.query(session, "select Tweet__c, TweetDate__c from CustomerVoice__c where TweetDate__c >= 2014-05-31T00:00:00-04:00 and TweetDate__c < 2014-06-01T00:00:00-04:00")
> CustomerVoice.all <- rbind(CustomerVoice.sun, CustomerVoice.mon, CustomerVoice.tue, CustomerVoice.wed, CustomerVoice.thu, CustomerVoice.fri, CustomerVoice.sat)
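The seven daily queries differ only in their date boundaries, so the query strings could be generated rather than typed by hand. This is a minimal sketch, assuming the same object and field names as in the slides; the helper daily.soql() is hypothetical and not part of RForcecom.

```r
# Hypothetical helper: build one SOQL query per day, starting at `start`
# and covering `n_days` consecutive days.
daily.soql <- function(start, n_days, tz_offset = "-04:00") {
  days <- as.Date(start) + 0:n_days   # n_days + 1 day boundaries
  stamp <- paste0(format(days, "%Y-%m-%d"), "T00:00:00", tz_offset)
  paste0("select Tweet__c, TweetDate__c from CustomerVoice__c",
         " where TweetDate__c >= ", stamp[-length(stamp)],
         " and TweetDate__c < ", stamp[-1])
}

queries <- daily.soql("2014-05-25", 7)
queries[1]

# With a live session, each query could be run and the results combined:
# daily <- lapply(queries, function(q) rforcecom.query(session, q))
# CustomerVoice.all <- do.call(rbind, daily)
```

Keeping the daily results in a list also makes it possible to apply the later tagging and TF-IDF steps with lapply() rather than one call per weekday.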
» Tag, extract nouns and calculate the Term Frequency (TF).
*TF (Term Frequency):
the number of occurrences of term i in document j
# Tag, extract nouns, calculate TF
make.treetag <- function(CustomerVoice){
  temp.file.name <- tempfile()
  write.table(CustomerVoice$Tweet__c, temp.file.name, col.names=F, row.names=F)
  tagged <- treetag(temp.file.name, lang="en", treetagger="manual",
    TT.options=list(path="C:/Apps/TreeTagger", preset="en", encoding="UTF-8"))
  tagged.DF <- tagged@TT.res
  # Extract nouns, convert to lower case
  term <- tagged.DF[tagged.DF$wclass=="noun",]$token
  term <- tolower(term)
  # Count frequency of each term
  term.unique <- unique(term)
  term.freq <- unlist(lapply(term.unique, function(x){length(term[term==x])}))
  termfreq.DF <- data.frame(term=term.unique, freq=term.freq, stringsAsFactors=F)
  termfreq.DF <- termfreq.DF[order(termfreq.DF$freq, decreasing=T),]
  return(termfreq.DF)
}
# Apply to each dataset
termfreq.sun <- make.treetag(CustomerVoice.sun)
termfreq.mon <- make.treetag(CustomerVoice.mon)
termfreq.tue <- make.treetag(CustomerVoice.tue)
termfreq.wed <- make.treetag(CustomerVoice.wed)
termfreq.thu <- make.treetag(CustomerVoice.thu)
termfreq.fri <- make.treetag(CustomerVoice.fri)
termfreq.sat <- make.treetag(CustomerVoice.sat)
termfreq.all <- make.treetag(CustomerVoice.all)
» Calculate the Inverse Document Frequency (IDF).
*IDF (Inverse Document Frequency):
IDF measures “term specificity”.
$\mathrm{IDF}_i = \log\frac{N}{\mathrm{df}_i}$
df_i: number of documents containing term i
N: total number of documents (here N = 7, one document per day)
Reference: http://dovgalecs.com/blog/matlab-simple-tf-idf/
# Calculate IDF
IDF.documents <- sapply(termfreq.all$term, function(x){
  sum(
    nrow(termfreq.sun[termfreq.sun$term==x,])>0,
    nrow(termfreq.mon[termfreq.mon$term==x,])>0,
    nrow(termfreq.tue[termfreq.tue$term==x,])>0,
    nrow(termfreq.wed[termfreq.wed$term==x,])>0,
    nrow(termfreq.thu[termfreq.thu$term==x,])>0,
    nrow(termfreq.fri[termfreq.fri$term==x,])>0,
    nrow(termfreq.sat[termfreq.sat$term==x,])>0
  )
})
IDF <- data.frame(term=termfreq.all$term, IDF=log(7/IDF.documents))
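To make the IDF definition concrete, here is a small worked example on three toy "documents" (stand-ins for the daily datasets; the terms are invented for illustration). A term that occurs in every document gets an IDF of log(1) = 0, while a term that occurs in only one document gets the largest value.

```r
# Three toy "documents" (one per day), each a vector of distinct terms
docs <- list(sun = c("flight", "boston"),
             mon = c("flight", "bag"),
             tue = c("flight", "bag", "award"))

terms <- unique(unlist(docs))
N <- length(docs)   # total number of documents

# df: number of documents that contain each term
df <- sapply(terms, function(x) sum(sapply(docs, function(d) x %in% d)))

# IDF = log(N / df), using the natural logarithm as in the slides
idf <- log(N / df)
round(idf, 3)
```

Because "flight" appears in all three toy documents its IDF is zero, which illustrates why a word that is frequent every day drops out once TF is weighted by IDF.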
» Calculate the TF-IDF of each dataset.
*TF-IDF (Term Frequency–Inverse Document Frequency):
TF-IDF measures how important a word is in a document.
TF-IDF = TF × IDF
# Calculate TF and return TF-IDF
calc.tfidf <- function(termfreq){
  # TF (relative frequency within the dataset)
  termfreq$TF <- termfreq$freq/sum(termfreq$freq)
  # TF-IDF
  tfidf.val <- lapply(termfreq$term, function(x){
    tf_i <- termfreq[termfreq$term==x,]$TF
    idf_i <- IDF[IDF$term==x,]$IDF
    return(tf_i * idf_i)
  })
  tfidf <- data.frame(term=termfreq$term, TFIDF=unlist(tfidf.val))
  return(tfidf)
}
tfidf.sun <- calc.tfidf(termfreq.sun)
tfidf.mon <- calc.tfidf(termfreq.mon)
tfidf.tue <- calc.tfidf(termfreq.tue)
tfidf.wed <- calc.tfidf(termfreq.wed)
tfidf.thu <- calc.tfidf(termfreq.thu)
tfidf.fri <- calc.tfidf(termfreq.fri)
tfidf.sat <- calc.tfidf(termfreq.sat)
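The product TF × IDF can also be computed without a per-term lookup by joining the two tables on the term column. The sketch below uses toy counts and toy IDF values (not taken from the real dataset) and base R's merge(); it is one possible alternative, not the method used in the slides.

```r
# Toy term counts for one day and toy IDF values (illustrative only)
termfreq <- data.frame(term = c("flight", "bag", "award"),
                       freq = c(6, 3, 1),
                       stringsAsFactors = FALSE)
IDF <- data.frame(term = c("flight", "bag", "award", "boston"),
                  IDF = log(c(3/3, 3/2, 3/1, 3/1)),
                  stringsAsFactors = FALSE)

# Join on term, then multiply relative frequency by IDF
tfidf <- merge(termfreq, IDF, by = "term")
tfidf$TF <- tfidf$freq / sum(tfidf$freq)
tfidf$TFIDF <- tfidf$TF * tfidf$IDF

# Highest-scoring terms first
tfidf <- tfidf[order(tfidf$TFIDF, decreasing = TRUE), c("term", "TFIDF")]
tfidf
```

In this toy data "flight" is the most frequent term but scores zero, so the less frequent yet more day-specific terms rank first.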
» Output the word clouds in PNG format.
# Wordcloud
draw.wordcloud <- function(tfidf, title=""){
  png.filename <- paste("wordcloud-", title, ".png", sep="")
  png(png.filename, width=7, height=7, units="in", res=600)
  tfidf <- tfidf[order(tfidf$TFIDF, decreasing=T),]
  tfidf.head <- head(tfidf, n=100)  # Keep the top 100 terms
  par(oma = c(0, 1, 2, 1))  # Set margin
  pal <- brewer.pal(8, "Dark2")
  wordcloud(tfidf.head$term, tfidf.head$TFIDF, random.color=T, colors=pal, main=title)  # Plot word cloud
  par(oma = c(0, 0, 0, 0))  # Unset margin
  title(title)  # Add title
  dev.off()  # Close file
}
# Plot a word cloud for each dataset
draw.wordcloud(tfidf.sun, title="2014-05-25(Sun)")
draw.wordcloud(tfidf.mon, title="2014-05-26(Mon)")
draw.wordcloud(tfidf.tue, title="2014-05-27(Tue)")
draw.wordcloud(tfidf.wed, title="2014-05-28(Wed)")
draw.wordcloud(tfidf.thu, title="2014-05-29(Thu)")
draw.wordcloud(tfidf.fri, title="2014-05-30(Fri)")
draw.wordcloud(tfidf.sat, title="2014-05-31(Sat)")
» These word clouds show the trends of each day.
Word cloud of a Sunday
» Specific locations such as “Vancouver”, “Boston” and “Phoenix” appear.
» It seems that this day had more questions about routes or bookings than
other days, or troubles at specific airports.
Word cloud of a Thursday
» The words “seatback”, “pain” and “captains” appeared in the word cloud.
» It seems that there were troubles with the fleet, cabin or in-flight
service somewhere.
Word cloud of a Friday
» The words “award” and “platinum” appear.
» It seems that this day had more questions about the frequent flyer
program than a normal day.
Conclusion
» I gave a brief introduction to the SaaS-based CRM “Salesforce.com” and its
application platform “Force.com”.
» The R package RForcecom provides various functions for exchanging data
with Salesforce.com/Force.com.
» I built a sample use case of RForcecom using a Twitter dataset and
visualized customers’ voices.
» The framework might also be applied to sentiment (negative/positive)
analysis and to analyzing customer feedback on a specific product or
service to improve customer satisfaction.
Thank you
» Any Questions?
Takekatsu Hiramura
http://thira.plavox.info/
http://rforcecom.plavox.info/