Shrinking Big Data for Real-time Marketing Strategy
A Statistical Analysis Report
(Using R – statistical Language)
(DeBois, 2015)
Authors: Manidipa Banerjee (MBA-MIS)
University of Massachusetts Dartmouth
Ankita Zaveri (MBA-Marketing)
University of Massachusetts Dartmouth
Abstract:
Marketing is increasingly data driven. Developing strategies requires efficient tools to
analyze the data that are valuable in the decision-making process. An online shopping experience
involves customer interactions, customer sentiment toward products, and the products' price
value. To make sense of this semi-structured data, we need a tool that provides the right
structure to analyze it, identify core competencies, and predict product value as
well as market share. The R statistical language proves to be an efficient environment that
provides the capabilities needed for these marketing decisions.
This report is based on data provided by an online retailer known as
“DiamondStuds” (DiamondStuds, n.d.). A wide range of data, including details of products,
revenue, transactions, orders, and social trends, is used for the evaluation.
I. Contents
II. Introduction
III. The Case Study
IV. Results and Discussion
1. Identifying Product Performance
2. Identifying Target market
3. Twitter Analysis
4. Traffic Analysis
V. Recommendations - Predictions
VI. Conclusion
VII. References
VIII. Appendices: R-Code
II. Introduction
“Big Data is the biggest game-changing opportunity for marketing and sales since the Internet
went mainstream almost 20 years ago. That statement often prompts vigorous head nodding from
executives, but is quickly followed by head scratching. ‘How can we make this happen?’” (Forbes, 2013)
Data reveal insights that help identify products, their market value, and social
trends, and that inform the decision-making process. The data journey starts
with the consumer's decision and runs through product evaluation and transactions all the way to
shipping. With this volume of data, effective analytics opens a wide range of possibilities
that help retailers make sound marketing decisions.
The R statistical language helps marketers make better decisions based on historical data
and produce predictions from it. This project examines marketing decisions that are
based on statistics and algorithms programmed in R.
III. The Case Study
DiamondStuds.com is an online jewelry store that offers a wide range of customized
diamond stud earrings. They specialize in providing affordable diamond studs with a wide
variety of options.
Their list of services includes Certificate of Authenticity & Free Appraisal Report, Safe
packaging and insured products, free 30-days returns & exchange, Lifetime warranty and
Lifetime upgrades. They use extensive marketing tactics to gain more customers each year.
Their sales have increased tremendously every year and continue to do so.
The company makes most of its revenue during the months of November and December
(the holiday season), so this period is very important for them. Of the more than 2000 orders
placed in 2015, approximately 1200 came in November and December. Because sales are so
concentrated, the company is looking for a unique marketing plan for the
2016 holiday season.
Their current marketing plan includes Facebook target marketing, Google AdWords,
Email Marketing and SEO. They also offer Deal of the Day and Sign-up Discount offer on their
website.
An online business faces multiple marketing problems that require tough decisions.
One problem with offering a wide variety is the difficulty of identifying the star product.
Customization makes it difficult for the company to distinguish between what is selling well and
what just seems to be selling well. It is also difficult to analyze customer traffic in detail,
segment it, and make predictions based on that information.
The company would benefit from knowing which variables to consider, which products are most
popular, and which sources bring in the most revenue. We believe that our analysis will help answer
some of these tough questions and support their marketing plan as a whole. It will allow them
to make more informed decisions that could improve sales for the 2016 holiday season.
IV. Results and Discussion
A. Back-end Application
1. Identifying Product Performance
To sell diamond jewelry, one has to acquire sufficient knowledge of quality, timing,
and price. Nevertheless, the jewelry market is unpredictable and prone to change over
a short period of time. To maximize profit, the product and its value play
a crucial role, so it is important to know each product
and its market worth. With standardized pricing informed by statistical reports from
previous years, organizations can improve their profit. Such reports also
provide an opportunity to identify the products that generate the most revenue.
Figure 1 below shows the products listed on the online shopping
website DiamondStuds.com.
Figure 1: Back- end view of Products at DiamondStuds.com
Methodology:
The highlighted area shows the corresponding dates and product revenues.
Using R, the data can be visualized to determine which range of products
generated revenue during a particular time of year.
Figure 2 below gives a clear picture of sales during one such period
(November - December) in 2014-2015.
Figure 2: Revenue Vs Sales Period
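The seasonal concentration behind this figure can also be summarized numerically. The following is a minimal sketch, assuming order-level data with Date and Revenue columns as in the appendix; the six rows of data are made up for illustration and are not the DiamondStuds figures.

```r
# Hypothetical order-level data; column names Date and Revenue mirror the appendix.
productdata <- data.frame(
  Date    = as.Date(c("2015-03-14", "2015-07-02", "2015-11-20",
                      "2015-11-27", "2015-12-05", "2015-12-18")),
  Revenue = c(400, 650, 1200, 900, 1500, 1100)
)

# Extract the calendar month (1-12) from each order date.
productdata$Month <- as.integer(format(productdata$Date, "%m"))

# Total revenue per month.
monthly <- aggregate(Revenue ~ Month, data = productdata, FUN = sum)

# Share of total revenue earned in November and December.
holiday_share <- sum(monthly$Revenue[monthly$Month %in% c(11, 12)]) /
  sum(monthly$Revenue)

print(monthly)
print(round(holiday_share, 3))
```

A share close to 1 would indicate that almost all revenue falls in the holiday season, which is the pattern the box plot is meant to expose.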
A geographical map is useful for locating high-sales areas.
Figure 3: Locating High SALES area
A list of products that are sold all over the world during that period of time is shown in
the figure below:
Figure 4: Identifying the top list
Figure 5: Identifying the bottom list
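A top and bottom list of this kind can be derived by totalling revenue per product and sorting. The sketch below is only an illustration under assumed data: the product names and revenue values are hypothetical, and the cut-off of two products per list is arbitrary.

```r
# Hypothetical product-level data; product names and revenues are made up.
productdata <- data.frame(
  Product = c("Round 1ct", "Princess 0.5ct", "Round 1ct",
              "Halo 0.75ct", "Princess 0.5ct", "Bezel 0.25ct"),
  Revenue = c(5200, 1800, 4900, 3100, 2100, 600)
)

# Total revenue per product, sorted from highest to lowest.
by_product <- aggregate(Revenue ~ Product, data = productdata, FUN = sum)
by_product <- by_product[order(by_product$Revenue, decreasing = TRUE), ]

top_list    <- head(by_product, 2)   # two highest-revenue products
bottom_list <- tail(by_product, 2)   # two lowest-revenue products

print(top_list)
print(bottom_list)
```

Ranking on totals rather than on a fixed revenue threshold keeps the lists the same length when overall sales rise or fall; a threshold, as used in the appendix, is the alternative when an absolute revenue target matters.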
2. Identifying Target market
a) Interests- Affinity Categories
Affinity categories are used to reach potential customers, to make them aware of
your brand or product. These are users higher in the purchase funnel, near the
beginning of the process. While using AdWords you can add audience targeting to ad
groups in your campaigns to reach people interested in products and services similar to
those your business offers, even when these people are browsing websites, using apps,
or watching videos not directly related to your products and services. By doing so, you
can help boost your campaigns’ performance. Depending on your advertising goals and
the stage of the purchase process your customers are in, you can choose to add different
audiences to your ad groups. Affinity Categories are also used by Facebook to help you
reach specific audiences by looking at their interests, activities, the Pages they have liked
and closely related topics. These interests are combined to expand your ad's reach.
Figure: 5
We used the following ten variables for cluster analysis of the Interests of Users
as per their Affinity Categories.
Affinity Category            Sessions
% New Sessions               New Users
Pages / Session              Quantity
Avg. Session Duration        Transactions
Revenue                      Ecommerce Conversion Rate
Figure 7: Scatter Plot Sessions vs Revenue
The scatter plot above shows a relationship between
the number of user sessions and Revenue. It also shows the general
distribution of the Affinity Interests, which appear to form three clusters. We use this information
for further analysis in determining the ideal number of clusters as well as the placement
of the Affinity Interests in relation to relevant variables.
We then normalize all the variables except the first column and calculate the distance
matrix using Euclidean distance, the default. (#Normalize & Calculate) This allows us to create
a cluster dendrogram with complete linkage as well as one with average
linkage. (#Cluster Dendrogram) These diagrams show the clusters
formed under each linkage and help us classify the interests accordingly.
Figure 9: Cluster Dendrogram (average linkage)
We then characterized clusters by creating a vector showing the cluster membership.
(#Characterizing clusters) This allowed us to plot the following silhouette plot.
Figure 10: Silhouette Plot
The silhouette plot above shows three clusters formed from the 32 interests,
with 25, 6 and 1 interests in the respective clusters. This supports the conclusion that
three clusters are appropriate for the analysis.
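One way to check a choice of cluster count is to compare the average silhouette width across several candidate values of k. The sketch below is a minimal illustration on synthetic data: the matrix x and its three groups are assumptions, not the DiamondStuds data. It uses silhouette() from the cluster package, the same package loaded in the appendix.

```r
library(cluster)  # provides silhouette()

set.seed(42)
# Hypothetical data: three well-separated groups of 10 points in two dimensions.
x <- rbind(
  matrix(rnorm(20, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 10, sd = 0.5), ncol = 2)
)

z        <- scale(x)                          # normalize each column
distance <- dist(z)                           # Euclidean distance matrix
hc.a     <- hclust(distance, method = "average")

# Average silhouette width for k = 2..5; larger values mean better-separated clusters.
ks <- 2:5
avg_width <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc.a, k), distance)
  mean(sil[, "sil_width"])
})
names(avg_width) <- ks

print(round(avg_width, 2))
best_k <- ks[which.max(avg_width)]
print(best_k)
```

The candidate with the largest average width is the one whose clusters are most compact relative to their neighbours, which gives a numeric counterpart to reading the silhouette plot by eye.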
We then used K-means clustering analysis to conclude that the contributing
variables towards cluster formation are Sessions, New Users, Pages / Session,
Transactions, Revenue and Quantity. (#K-means Clustering)
We can identify these variables by observing the difference between the highest
and lowest values of their respective cluster means. A large difference indicates that
the variable strongly influences the separation of the clusters. Using this information, we plotted
the clusters to examine how Ecommerce Conversion Rate relates to Revenue. Ecommerce
Conversion Rate is the percentage of visits that resulted in an e-commerce transaction.
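The comparison of cluster means described here can be computed directly from the centers component of a kmeans() result. The following sketch assumes synthetic data in which Sessions and Revenue differ between two groups while a third column, Noise, does not; all values are hypothetical.

```r
set.seed(1)
# Hypothetical data: Sessions and Revenue differ between two groups, Noise does not.
df <- data.frame(
  Sessions = c(rnorm(10, 100, 5),   rnorm(10, 500, 5)),
  Revenue  = c(rnorm(10, 1000, 50), rnorm(10, 9000, 50)),
  Noise    = rnorm(20, 0, 1)
)

z  <- scale(df)                              # normalize so cluster means are comparable
kc <- kmeans(z, centers = 2, nstart = 10)    # several random starts for a stable result

# Spread of each variable's cluster means: highest minus lowest across clusters.
spread <- apply(kc$centers, 2, function(v) max(v) - min(v))
print(round(sort(spread, decreasing = TRUE), 2))
```

Variables with a large spread are the ones that separate the clusters; a variable whose cluster means are nearly equal contributes little to the grouping.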
The below scatter plot helps us classify users according to their spending
patterns. The users in green have High Ecommerce Conversion Rate and High Revenue
i.e. they are more likely to buy expensive products. The users in black have High
Ecommerce Conversion Rate and Low Revenue i.e. they are more likely to buy
inexpensive products. The users in red have Low Ecommerce Conversion Rate and Low
Revenue i.e. they are less likely to buy expensive products.
Figure 11: Scatter Plot of Clusters - Revenue vs Ecommerce Conversion Rate
Figure 11 annotations: Less likely to Buy Expensive Products; More likely to Buy Inexpensive Products; More likely to Buy Expensive Products
From this distribution the company can decide where to direct
resources in its marketing plan. For example, we now know that Business
Professionals, Outdoor Enthusiasts, Do-It-Yourselfers and Thrill Seekers tend to have a
higher ecommerce conversion rate but purchase less expensive products. The company
can target this audience with customized marketing
plans that cater to their needs.
Another example is the group of users who are less likely to buy expensive products.
We now know that Family-Focused users, Fashionistas, Pet Lovers and Cooking Enthusiasts have
a lower ecommerce conversion rate and purchase less expensive products. The company
can target this audience with marketing plans that focus on increasing their
ecommerce conversion rate.
b) Interests- In-Market segments
Users in these segments are more likely to be ready to purchase products or
services in the specified category. These are users lower in the purchase funnel, near the
end of the process.
Using AdWords, companies can select from these audiences to find
customers who are in the market, meaning that they are researching products and
actively considering buying a service or product like those the company offers. In-market
audiences are available to advertisers in all AdWords languages.
These audiences are designed for advertisers focused on getting conversions
from customers most likely to make a purchase. In-market audiences can help drive
remarketing performance and reach consumers close to completing a purchase. We used
the following ten variables for cluster analysis of the Interests of Users as per their
In-Market Segments.
In-Market Segment            Sessions
% New Sessions               New Users
Pages / Session              Avg. Session Duration
Transactions                 Revenue
Ecommerce Conversion Rate    Quantity
We used the same method from Interests- Affinity Categories and created the following
graphs.
Figure 12: Scatter Plot Sessions vs Revenue
Figure 13: Cluster Dendrogram (average linkage)
Figure 14: Silhouette Plot
The scatter plot in Figure 12 shows a
relationship between the number of user sessions and Revenue. It also shows
the general distribution of the In-Market Segments, which appear to form three clusters. The
silhouette plot above shows three clusters formed from the 18 segments, with
3, 10 and 5 segments in the respective clusters. This supports the conclusion that
three clusters are appropriate for the analysis.
K-means clustering with 3 clusters of sizes 9, 6, 3
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
As per the above analysis the contributing variables towards cluster formation
are Sessions, New Users, Transactions, Revenue and Quantity.
We can identify these variables by observing the difference between the highest and
lowest values of their respective cluster means. A large difference indicates that the
variable strongly influences the separation of the clusters.
Using this information, we plotted clusters to find the relevance of Ecommerce
Conversion Rate with Revenue. Ecommerce Conversion Rate is the percentage of visits
that resulted in an e-commerce transaction.
Figure: 15 Scatter Plot
The above scatter plot helps us classify segmented users according to their
spending patterns. The users in black have Mid Ecommerce Conversion Rate and High
Revenue i.e. they are likely to buy expensive products. The users in green have High
Ecommerce Conversion Rate and Low Revenue i.e. they are more likely to buy
inexpensive products. The users in red have Low Ecommerce Conversion Rate and Low
Revenue i.e. they are less likely to buy expensive products.
Using this distribution, the company can make decisions on segmented
marketing. Because these users are already prepared to buy and are researching the product, it is
easier for the company to market to them and increase their ecommerce conversion rate.
From the above cluster analysis, the company can target the segments it prefers and
influence sales in those segments. For example, users looking for Dating Services
are more likely to purchase diamond jewelry than those in Sports &
Fitness. The company can run further analyses to target and influence that market
segment.
We now know that Beauty Products & Services, Gift & Occasions and Dating
Services are the segments that are less likely to buy expensive products. The company can
target this audience with marketing plans that focus on increasing their
ecommerce conversion rate.
3. Twitter Analysis
We believe that Twitter analysis is an important aspect of online marketing for an e-commerce
website such as DiamondStuds.com. Twitter analysis using word clouds helps the company
understand overall user sentiment and stay aware of its competitors in the market.
Figure 16: Tweet Clouds (1)
The first word cloud is created from a compilation of 35 tweets containing “#DiamondStuds”. We
created this word cloud using the ‘Dark2’ color palette and words with a minimum frequency of 5.
It highlights words and terms such as diamond jewelry, mom, gifts and mother’s
day. This suggests that Mother’s Day was approaching and was a popular topic in
relation to gifting jewelry. These words give us an overview of general public opinion.
Words like doyleanddoyle and londongold point to competing brands that are already
popular in relation to diamond studs.
Figure 16: Tweet Clouds (2)
The second word cloud is created from a compilation of 100 tweets containing “Diamonds”.
We created this word cloud using the ‘Dark2’ color palette and words with a minimum
frequency of 10. This word cloud helps us analyze user sentiment towards diamonds in
general. It highlights words and terms such as gemstones, flawless and gianews.
These words give us an overview of general public opinion and indicate whether we need
to look out for anything that could affect our trade in the future.
If the company creates an automated word cloud generator for analyzing the overall
user sentiment, they will know the right action to take at any given moment. For example, if
they want to launch a new product in the market and after Twitter analysis they observe that
the market is very optimistic and positive in relation to diamond studs, they can go ahead with
the launch. But if they observe that the market isn’t doing so well or that not many people are
interested in purchasing diamond studs right now, they can change their tactics to make the
launch more attractive to their customers.
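The core of such an automated generator is a term-frequency count over the collected tweets. The sketch below shows that step in base R only, under stated assumptions: the three tweet texts and the short stop-word list are invented for illustration, and the resulting wordFreq vector plays the same role as the one passed to wordcloud() in the appendix.

```r
# Hypothetical tweet texts; in practice these would come from the Twitter search results.
tweet_text <- c(
  "Diamond studs make perfect gifts for mom",
  "Looking for gifts? Diamond jewelry for Mother's Day",
  "New diamond studs in stock"
)

# Lower-case, strip everything except letters and spaces, then split into words.
clean <- gsub("[^a-z ]", "", tolower(tweet_text))
words <- unlist(strsplit(clean, " +"))
words <- words[nchar(words) > 0]

# Drop a small, illustrative stop-word list and count the remaining terms.
stop_words <- c("for", "in", "new", "make")
words      <- words[!words %in% stop_words]
wordFreq   <- sort(table(words), decreasing = TRUE)

# Keep only terms that appear at least min_freq times.
min_freq <- 2
print(wordFreq[wordFreq >= min_freq])
```

Running this count on a schedule and comparing the top terms over time would give the company an early view of shifting topics before any word cloud is drawn.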
4. Traffic Analysis
Online shopping provides various payment gateways along with a wide range of
referral sources that offer the products at the same price. These sources also bring a
good amount of revenue to the parent company. Identifying these sources helps estimate the
market share that they contribute to the parent company. These referral sources can also be used
as campaign platforms to attract more customers.
Figure: 16 displays the list of valued referral sources.
We can also identify the ecommerce conversion rate from the data, which shows the
percentage of views that are converted to transactions.
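As a worked illustration of this metric, the sketch below computes the conversion rate and the average order value per referral source. The source names and all counts are hypothetical; the column names Ecommerce.Conversion.Rate and Average.Order.Value follow the appendix.

```r
# Hypothetical traffic data; the source names and counts are made up.
trafficdata <- data.frame(
  Source       = c("google", "facebook", "email"),
  Sessions     = c(20000, 8000, 2500),
  Transactions = c(240, 64, 50),
  Revenue      = c(216000, 51200, 47500)
)

# Ecommerce conversion rate: percentage of sessions that ended in a transaction.
trafficdata$Ecommerce.Conversion.Rate <-
  100 * trafficdata$Transactions / trafficdata$Sessions

# Average order value: revenue per transaction.
trafficdata$Average.Order.Value <-
  trafficdata$Revenue / trafficdata$Transactions

# Sources ordered from highest to lowest conversion rate.
print(trafficdata[order(-trafficdata$Ecommerce.Conversion.Rate), ])
```

In this made-up example the source with the fewest sessions converts best, which is the kind of contrast that the conversion rate exposes and raw traffic counts hide.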
Figure 17: depicts the e-Conversion Ratio
a) Product suggestions for e-shopping
Online shopping has the distinctive feature of suggesting items related to those
shoppers are looking at. It exposes customers to a wide range of products they have not yet seen
but that other users have viewed, and provides information about their visibility, custom price and availability.
To establish this function, an algorithm is needed that categorizes
products according to their type and cost, so that viewers can see related products viewed by other
users.
Methodology:
Figure: 18 depicts the preliminary rules.
We can also view the number of items included in each rule, which we refer
to as its “order”. Figure 18 shows the hierarchical orders of the items.
Figure: 19 Number of Items in the Rules
Finally, using the support, lift and confidence parameters, 12 useful rules can be
identified that help select which products and related items to display in the front end of
the web application.
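The three parameters can be computed by hand for a single rule, which helps when reading the rule listings. The sketch below does this in base R for the rule {studs} => {pendant}; the five baskets and the item names are hypothetical, and the appendix itself relies on apriori() rather than this manual calculation.

```r
# Hypothetical baskets; item names are made up for illustration.
baskets <- list(
  c("studs", "pendant"),
  c("studs", "pendant", "ring"),
  c("studs", "ring"),
  c("pendant"),
  c("studs", "pendant")
)
n <- length(baskets)

# Fraction of baskets that contain every item in 'items'.
support <- function(items) {
  sum(sapply(baskets, function(b) all(items %in% b))) / n
}

# Metrics for the rule {studs} => {pendant}.
lhs  <- "studs"
rhs  <- "pendant"
supp <- support(c(lhs, rhs))   # share of baskets containing both sides
conf <- supp / support(lhs)    # share of lhs baskets that also contain rhs
lift <- conf / support(rhs)    # confidence relative to the baseline share of rhs

print(c(support = supp, confidence = conf, lift = lift))
```

A lift above 1 means the two sides occur together more often than independence would predict; a lift below 1, as in this toy example, means the pairing is weaker than chance and would be a poor candidate for a product suggestion.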
Figure 20: Depicts the 12 Association Rules
V. Recommendations - Predictions
Data collected from the back end of an e-shopping application are mainly used to
study past performance, i.e. revenue acquired, quantity of products sold and transaction conversion
rate, as well as to identify products that are not receiving enough exposure to customers
and are therefore declining in sales and, consequently, revenue.
Such products can be identified as shown in Figure 5, and their
quantity to be sold and corresponding revenue can be predicted. Figure 15 depicts the rules
that can be used to trace the sales workflow of the products and their
potential to generate revenue.
Figure 21: Product Quantity - Revenue - Prediction
VI. Conclusion
R provided valuable insights from the acquired data, ranging from product
performance and target market segments to referral traffic sources and social trends. These insights
can help the online retailer build a data-driven strategy and supply predictive data for future campaigns
and product placement in the market. The data can also be used to estimate the likelihood of
capturing market share when new products are launched.
VII. References
DeBois, P. (2015, July 31). How to Shrink Big Data To Fit Your Marketing Strategy. Retrieved from
cmswire: http://www.cmswire.com/analytics/how-to-shrink-big-data-to-fit-your-marketing-strategy/?utm_source=MainRSSFeed&utm_medium=Web&utm_campaign=RSS-News
DiamondStuds. (n.d.). Diamond online store. Retrieved from diamondstuds:
https://www.diamondstuds.com/
Forbes. (2013, July 22). Big Data, Analytics And The Future Of Marketing And Sales. Retrieved from
Forbes: http://www.forbes.com/sites/mckinsey/2013/07/22/big-data-analytics-and-the-future-of-marketing-sales/#3afb7b52344d
VIII. Appendices: R-Code
Figure 2: Revenue Vs Sales Period
# Box plot of revenue for each sales date
boxplot(Revenue~Date, data=productdata, main="Revenue VS Sales Date", xlab="Date", ylab="Revenue",
horizontal=FALSE, col=terrain.colors(10))
Figure 4: Identifying the top Product list:
myvars <-subset(productdata, Product.Revenue >5000, select=c(Product.Revenue,Product))
str(myvars)
View(myvars)
Figure 5: Identifying the bottom list:
# Mean revenue per product
aggregate(Revenue~Product, productdata, mean)
boxplot(Revenue~Product,productdata)
productprice <- subset(productdata,Product=="Product" |
Revenue=="Product.Revenue",select=c(Product,Revenue,Quantity))
# Inspect the resulting subset
View(productprice)
Figure 7: Scatter Plot Sessions vs Revenue
#Scatter plot with labels for points
plot(Revenue~Sessions, data = Interests)
with(Interests,text(Revenue~Sessions, labels=Affinity.Category,pos=3, cex=0.5))
#Normalize & Calculate
# Normalize
> z = Interests[,-c(1)]
> m <- apply(z,2,mean)
> s <- apply(z,2,sd)
> z <- scale(z,center=m,scale=s)
#calculate distance matrix (default is Euclidean distance)
> distance <- dist(z)
> print(distance, digits = 2)
Figure 9: Cluster Dendrogram (average linkage)
#Cluster Dendrogram
#Cluster Dendrogram (complete linkage)
hc.c <- hclust(distance)
plot(hc.c,hang=-1,labels=Interests$Affinity.Category)
#Cluster Dendrogram (average linkage)
hc.a<-hclust(distance,method="average")
plot(hc.a,hang=-1,labels=Interests$Affinity.Category)
Figure 10: Silhouette Plot
#Characterizing clusters
#Create a vector showing the cluster membership
> member.c = cutree(hc.c,3)
> table(member.c)
> member.a = cutree(hc.a,3)
> table(member.a)
> table(member.c,member.a)
        member.a
member.c  1  2  3
       1 15  0  0
       2 10  1  0
       3  0  5  1
> aggregate(z,list(member.a),mean)
> aggregate(Interests[,-c(1)],list(member.a),mean)
> library(cluster)
> plot(silhouette(cutree(hc.a,3), distance))
#K-means Clustering
> kc<-kmeans(z,3)
> kc
K-means clustering with 3 clusters of sizes 12, 14, 6
Cluster means:
Sessions X..New.Sessions New.Users Pages...Session Transactions
1 -0.3428882 -0.7494206 -0.3540434 -0.3589429 -0.2811070
2 -0.4785232 0.6950469 -0.4694847 -0.3339416 -0.5513192
3 1.8023304 -0.1229350 1.8035511 1.4970830 1.8486254
Within cluster sum of squares by cluster:
[1] 25.38145 35.21799 27.20816
(between_SS / total_SS = 68.5 %)
Figure 11: Scatter Plot of Clusters - Revenue vs Ecommerce Conversion Rate
plot(Revenue~ Ecommerce.Conversion.Rate, Interests,col = kc$cluster)
with(Interests,text(Revenue~ Ecommerce.Conversion.Rate, labels=Affinity.Category,pos=3, cex=0.5))
Figure 12: Scatter Plot Sessions Vs Revenue
# Normalize
> z = Interests[,-c(1)]
> m <- apply(z,2,mean)
> s <- apply(z,2,sd)
> z <- scale(z,center=m,scale=s)
#calculate distance matrix (default is Euclidean distance)
> distance <- dist(z)
> print(distance, digits = 2)
     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
 2 2.9
 3 4.8 7.1
 4 5.1 7.6 2.4
 5 4.8 7.1 3.8 4.0
 6 6.4 8.5 6.2 5.3 2.9
 7 3.4 5.7 4.1 4.1 1.6 3.2
 8 4.8 7.2 1.6 3.2 2.7 5.4 3.4
 9 5.0 7.2 3.0 3.3 1.8 4.0 2.8 2.1
10 4.6 6.7 1.5 3.7 4.8 7.4 4.9 2.4 4.0
11 3.2 5.5 2.6 3.8 2.4 5.0 2.2 1.9 2.6 2.9
12 3.2 5.6 1.7 3.1 3.4 5.9 3.2 1.9 3.0 1.8 1.4
13 1.9 1.4 6.4 6.7 6.0 7.3 4.6 6.4 6.3 6.2 4.7 4.8
14 2.0 4.5 3.5 3.4 3.3 4.9 2.1 3.5 3.4 3.9 2.2 2.3 3.5
15 6.1 8.2 5.3 5.2 1.6 2.2 2.8 4.0 2.7 6.3 3.9 5.0 7.1 4.7
16 4.7 7.2 1.9 2.4 2.0 4.4 2.7 1.2 1.6 3.1 2.1 2.2 6.3 3.1 3.4
17 5.3 7.8 3.3 3.4 1.1 3.2 2.4 2.2 1.9 4.5 2.6 3.3 6.7 3.7 2.2 1.4
18 2.8 4.7 3.3 3.9 3.0 5.2 2.5 3.0 2.5 3.6 1.9 2.1 3.9 1.8 4.3 2.9 3.5
Figure 13: Cluster Dendrogram (average linkage)
#Cluster Dendrogram (complete linkage)
hc.c <- hclust(distance)
plot(hc.c,hang=-1,labels=Interests$In.Market.Segment)
#Cluster Dendrogram (average linkage)
hc.a<-hclust(distance,method="average")
plot(hc.a,hang=-1,labels=Interests$In.Market.Segment)
#Characterizing clusters
#Create a vector showing the cluster membership
> member.c = cutree(hc.c,3)
> table(member.c)
member.c
 1  2  3
 3 13  2
> member.a = cutree(hc.a,3)
> table(member.a)
member.a
 1  2  3
 3 10  5
> table(member.c,member.a)
        member.a
member.c  1  2  3
       1  3  0  0
       2  0 10  3
       3  0  0  2
> aggregate(z,list(member.a),mean)
Group.1 Sessions X..New.Sessions New.Users Pages...Session Transactions
1 1 1.9459877 0.09052571 1.9377519 -0.4328091 1.8769426
2 2 -0.2173128 0.44471812 -0.2116305 -0.3503525 -0.2554043
3 3 -0.7329671 -0.94375167 -0.7393901 0.9603904 -0.6153569
Revenue Ecommerce.Conversion.Rate Quantity Avg..Session.Duration
1 1.8845803 -0.3757202 1.8852053 -0.4663532
2 -0.2424465 -0.4660330 -0.2565385 -0.4090817
3 -0.6458552 1.1574981 -0.6180462 1.0979754
aggregate(Interests[,-c(1)],list(member.a),mean)
Group.1 Sessions X..New.Sessions New.Users Pages...Session Transactions
1 1 64886.33 0.6740667 43200.33 3.703730 851.0
2 2 18791.40 0.6833700 12620.40 3.725137 235.5
3 3 7804.00 0.6469000 5111.80 4.065426 131.6
Revenue Ecommerce.Conversion.Rate Quantity Avg..Session.Duration
1 737293.1 0.01190 897.0 161.3333
2 215153.0 0.01147 245.9 162.5000
3 116124.8 0.01920 136.0 193.2000
> library(cluster)
> plot(silhouette(cutree(hc.a,3), distance))
> kc<-kmeans(z,3)
> kc
K-means clustering with 3 clusters of sizes 9, 6, 3
Cluster means:
Sessions X..New.Sessions New.Users Pages...Session Transactions
1 -0.2562221 0.62691169 -0.2458183 -0.4638935 -0.3025588
2 -0.5886608 -0.98563039 -0.6001485 0.9122448 -0.4846330
3 1.9459877 0.09052571 1.9377519 -0.4328091 1.8769426
Revenue Ecommerce.Conversion.Rate Quantity Avg..Session.Duration
1 -0.2919728 -0.5577461 -0.3018960 -0.5372607
2 -0.5043310 1.0244792 -0.4897586 1.0390676
3 1.8845803 -0.3757202 1.8852053 -0.4663532
Clustering vector:
[1] 3 3 1 1 2 2 2 1 2 1 1 1 3 1 2 1 2 1
Within cluster sum of squares by cluster:
[1] 29.339950 16.134436 4.722658
(between_SS / total_SS = 67.2 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
Figure: 15 Scatter Plot
plot(Revenue~ Ecommerce.Conversion.Rate, Interests,col = kc$cluster)
with(Interests,text(Revenue~ Ecommerce.Conversion.Rate, labels=In.Market.Segment,pos=3, cex=0.5))
Figure 16: Tweet Clouds (1 & 2)
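library(twitteR)   # searchTwitter(); authenticate first with setup_twitter_oauth()
library(tm)        # corpus and term-document matrix helpers
library(wordcloud) # wordcloud(); also loads RColorBrewer for brewer.pal()
# Assumed construction of the term-document matrix (tdm), which is used but not defined in this listing
wordfreq <- function(tweets) {
  text <- sapply(tweets, function(t) t$getText())
  corpus <- Corpus(VectorSource(text))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  tdm <- TermDocumentMatrix(corpus)
  m <- as.matrix(tdm)
  sort(rowSums(m), decreasing=TRUE)
}
set.seed(1000)
# Word cloud 1: tweets containing #diamondstuds, minimum frequency 5
tweets <- searchTwitter('#diamondstuds', n=100, lang='en')
wordFreq <- wordfreq(tweets)
wordcloud(words=names(wordFreq), freq=wordFreq, min.freq=5, random.order=FALSE, colors=brewer.pal(6, "Dark2"))
# Word cloud 2: tweets containing diamonds, minimum frequency 10
tweets <- searchTwitter('diamonds', n=100, lang='en')
wordFreq <- wordfreq(tweets)
wordcloud(words=names(wordFreq), freq=wordFreq, min.freq=10, random.order=FALSE, colors=brewer.pal(6, "Dark2"))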
Figure 17: depicts the e-Conversion Ratio
library(scatterplot3d)
myvars <- subset(trafficdata, Transactions > 10, select=c(Source,Ecommerce.Conversion.Rate,Average.Order.Value))
str(myvars)
Figure: 18 depicts the preliminary rules.
scatterplot3d(myvars$Source,myvars$Ecommerce.Conversion.Rate,myvars$Average.Order.Value,
main="Ecommerce Conversion via Referral sources", xlab="Source ", ylab="e-Conversion Rate ",pch=19,
highlight.3d=TRUE,type="h")
Figure: 19 Number of Items in the Rules
# Uses the rules object created by the apriori() call listed under Figure 20
plot(rules, shading="order", control=list(main = "Number of Items in the Rules"))
Figure 20: Depicts the 12 Association Rules
library(arules)     # apriori(), inspect()
library(arulesViz)  # plot() method for rules
rules<-apriori(mydata,parameter = list(minlen=1,maxlen=5,supp=.7))
plot(rules)
inspect(rules)
plot(rules, shading="order", control=list(main = "12 Association Rules"))
Figure 21: Prediction
myvars <-subset(productdata, Revenue < 5000, select=c(Revenue,Product,Quantity))
str(myvars)
predictdata <- myvars
library(party)
# Model Quantity from Revenue; the response must not also appear as a predictor
mytree <- ctree(Quantity~Revenue,predictdata)
plot(mytree,type="simple",main="Product, Quantity VS Revenue Prediction(Bottom list)")
End