
Customer behaviour analysis


PREDICTIVE MODELING SZP0052

RESEARCH QUESTION: Zappos.com is an online shoe and clothing shop currently based in Las Vegas, Nevada. Zappos.com wants to know who is coming to its website and what they do when they visit. It wants to find efficient ways to improve sales by analyzing its customer base using factors such as the platform customers use, the site they visit most, product page views, visits, orders, etc.

Aim of the project: to answer the following questions:
1. Who is coming to the website, and what do they do when they visit?
2. Do they buy a product or just visit the website?
3. Do they only view product pages, or do they also search?
4. Which platform do they normally use?
5. Which site do they normally visit?
6. Do they only search for a product, or do they go on to buy it?
7. Develop a model to forecast gross sales.


Data Visualization:

I. Site Vs Sales


Observations: From the above plots it is clear that the most visited site is Acme, followed by Pinnacle and Sortly. Although Pinnacle and Sortly receive substantial visits, those visits do not translate into comparable sales, so customers visiting Pinnacle and Sortly contribute relatively little to company sales. Visits to Acme, by contrast, do translate into sales, so customers visiting Acme produce substantial sales for the company.
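The comparison of visits against sales can be made concrete with a sales-per-visit figure for each site. The sketch below assumes each row carries a site, a visit count and gross sales (column names borrowed from the regression formulas later in the report); the numbers are hypothetical toy values, not the Zappos data.

```python
from collections import defaultdict

# Hypothetical toy rows: (site, visits, gross_sales). Illustrative only.
rows = [
    ("Acme", 1200, 90000.0),
    ("Pinnacle", 900, 12000.0),
    ("Sortly", 800, 8000.0),
    ("Acme", 1000, 70000.0),
]


def sales_per_visit_by_site(rows):
    """Sum visits and gross sales per site, then divide to get sales per visit."""
    visits = defaultdict(int)
    sales = defaultdict(float)
    for site, v, s in rows:
        visits[site] += v
        sales[site] += s
    return {site: sales[site] / visits[site] for site in visits if visits[site] > 0}


per_visit = sales_per_visit_by_site(rows)
# Print sites from highest to lowest sales per visit.
for site, value in sorted(per_visit.items(), key=lambda kv: -kv[1]):
    print(f"{site:10s} {value:8.2f}")
```

A site with many visits but a low sales-per-visit value is the pattern the observation describes for Pinnacle and Sortly.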

II. Customer Vs Sales (new_customer: 0 = old customer, 1 = new customer)


Observations: From the above graphs it is clear that most of the people who visit the site are old (returning) customers, and old customers contribute significantly more to company sales than new customers do. However, the number of old and new customers visiting the site does not appear to differ much.
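The contrast between similar visit counts and very different sales contributions can be expressed as shares of the totals. This is a minimal sketch on hypothetical rows of (new_customer, visits, gross_sales); the values are invented for illustration.

```python
# Hypothetical toy rows: (new_customer, visits, gross_sales); 0 = old, 1 = new.
# Illustrative only, not the Zappos data.
rows = [
    (0, 500, 40000.0),
    (1, 450, 15000.0),
    (0, 300, 25000.0),
    (1, 250, 5000.0),
]


def shares_by_customer_type(rows):
    """Return {flag: (share of visits, share of gross sales)} for flags 0 and 1."""
    total_visits = sum(visits for _, visits, _ in rows)
    total_sales = sum(sales for _, _, sales in rows)
    shares = {}
    for flag in (0, 1):
        visits = sum(v for f, v, _ in rows if f == flag)
        sales = sum(s for f, _, s in rows if f == flag)
        shares[flag] = (visits / total_visits, sales / total_sales)
    return shares


for flag, (visit_share, sales_share) in shares_by_customer_type(rows).items():
    label = "new" if flag else "old"
    print(f"{label}: {visit_share:.0%} of visits, {sales_share:.0%} of sales")
```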


III. Platform Vs Sales


Observations: From the plots we can see that, of all the platforms, most customers who visit the company site use iOS, followed by Android and Windows. However, although most customers use iOS and Android, it is Windows and Mac OS X that contribute most to the company's sales; the large share of iOS users is not reflected in sales. The majority of customers who use Mac OS X and Windows visit the Acme site.
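The last claim, that Mac OS X and Windows users mostly visit Acme, is a platform-by-site cross-tabulation. The sketch below finds each platform's most visited site and that site's share of the platform's visits; the rows are hypothetical and the platform labels follow the factor levels in the platform model.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (platform, site, visits). Illustrative only.
rows = [
    ("Windows", "Acme", 700), ("Windows", "Pinnacle", 100),
    ("MacOSX", "Acme", 400), ("MacOSX", "Sortly", 150),
    ("iOS", "Pinnacle", 600), ("iOS", "Acme", 300),
]


def top_site_by_platform(rows):
    """For each platform, return (most visited site, its share of that platform's visits)."""
    table = defaultdict(Counter)
    for platform, site, visits in rows:
        table[platform][site] += visits
    result = {}
    for platform, counts in table.items():
        site, visits = counts.most_common(1)[0]
        result[platform] = (site, visits / sum(counts.values()))
    return result


for platform, (site, share) in top_site_by_platform(rows).items():
    print(f"{platform:8s} -> {site} ({share:.0%} of visits)")
```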


IV. Orders Vs Sales


Observations: From the above graphs, the majority of order counts lie between 0 and 200. There is a clear linear relationship between orders and sales, and the majority of orders are placed on the Acme site from the Windows platform.
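A linear relationship read off a scatter plot can be quantified with the Pearson correlation coefficient, and the "0-200" range with a simple proportion. This is a sketch on hypothetical order and sales values; the report itself does not state a correlation value.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, illustrative only.
orders = [12, 45, 80, 150, 190, 320]
sales = [1100.0, 4300.0, 7900.0, 15200.0, 18800.0, 31500.0]

r = pearson(orders, sales)
share_0_200 = sum(0 <= o <= 200 for o in orders) / len(orders)
print(f"r = {r:.4f}, share of order counts in [0, 200] = {share_0_200:.2f}")
```

A value of r close to 1 supports the "linear relation" reading; it says nothing about causation.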


V. Product page Views Vs Sales


Observations: The majority of product page views come from the Windows platform, followed by iOS and Mac OS X. The majority of sales for the Widgetry site come from iOS users. There also appears to be a proportional relationship between product page views and sales, which suggests that customers who view product pages are actually buying.
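"Proportional" is a stronger statement than "linear": it means sales ≈ k × views with no intercept. One way to check it is a least-squares fit through the origin, sketched below on hypothetical values (product_page_views is an assumed column name). If the per-row ratios sales/views stay close to k, proportionality is a reasonable description.

```python
def fit_through_origin(x, y):
    """Least-squares slope k for the model y ≈ k * x (no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx


# Hypothetical values, illustrative only.
product_page_views = [100, 200, 400, 800]
sales = [240.0, 510.0, 990.0, 2010.0]

k = fit_through_origin(product_page_views, sales)
ratios = [s / v for v, s in zip(product_page_views, sales)]
print(f"k = {k:.4f}; per-row sales/views ratios = {ratios}")
```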


VI. Search page views Vs Sales


Observations: The majority of sales come from the Windows platform, followed by Mac OS X and iOS. Acme is also the most searched site, and, as seen earlier, Acme also has the highest sales, which suggests that search page views are proportional to sales.
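The argument here compares site rankings: the most searched site is also the top-selling one. A Spearman rank correlation (a technique swapped in for illustration; the report does not compute it) summarizes how well the two rankings agree across all sites. The sketch assumes no tied values and uses hypothetical per-site totals.

```python
def rank_desc(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    rx, ry = rank_desc(x), rank_desc(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical per-site totals, illustrative only.
sites = ["Acme", "Botly", "Pinnacle", "Sortly", "Tabular", "Widgetry"]
search_page_views = [5000, 300, 2100, 1800, 400, 350]
gross_sales = [410000.0, 9000.0, 60000.0, 45000.0, 30000.0, 32000.0]

rho = spearman_no_ties(search_page_views, gross_sales)
top_searched = sites[rank_desc(search_page_views).index(1)]
print(f"most searched: {top_searched}, Spearman rho = {rho:.3f}")
```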


VII. Add to cart Vs Sales


Observations: Most products are added to the cart from the Acme site, and the most used platform is Windows, followed by Mac OS X and iOS. There also appears to be a strong relationship between add-to-cart and sales.


VIII. Conversion Rate Vs Sales
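The conversion metrics used in this section can be computed directly from per-site totals. The report defines the golden ratio as add_to_cart / orders but does not define conversion rate; the sketch below assumes conversion rate = orders / visits. The totals are hypothetical.

```python
# Hypothetical per-site totals: (visits, add_to_cart, orders). Illustrative only.
site_totals = {
    "Acme": (10000, 1500, 900),
    "Tabular": (800, 200, 40),
}


def conversion_metrics(visits, add_to_cart, orders):
    """Return (conversion_rate, golden_ratio).

    conversion_rate = orders / visits        (assumed definition)
    golden_ratio    = add_to_cart / orders   (lower is better)
    """
    conversion_rate = orders / visits if visits else float("nan")
    golden_ratio = add_to_cart / orders if orders else float("inf")
    return conversion_rate, golden_ratio


metrics = {site: conversion_metrics(*totals) for site, totals in site_totals.items()}
for site, (rate, ratio) in metrics.items():
    print(f"{site:8s} conversion={rate:.3f} golden_ratio={ratio:.2f}")
```

A site with no orders gets an infinite golden ratio here, which keeps it at the "worst" end when sites are sorted.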


Observations: The conversion rate is fairly good for Acme, Pinnacle and Sortly. Golden ratio: add_to_cart / orders; we want this to be as low as possible, since a lower value means fewer cart additions per completed order. It is fairly high for Tabular and Widgetry.

MODELLING

Checking for Assumptions: Normality of Response:


As the response (gross_sales) does not follow a normal distribution, we need to transform it; here we use a log transformation to bring it closer to normal.
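The effect of the log transformation on a right-skewed response can be checked with the sample skewness, which should move toward 0. This sketch uses invented, strictly positive gross_sales values (the log is undefined for zero sales) and the moment-based skewness g1.

```python
import math


def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical, strictly positive gross sales values (log requires > 0).
gross_sales = [120.0, 150.0, 200.0, 260.0, 400.0, 900.0, 2500.0, 12000.0]
log_sales = [math.log(s) for s in gross_sales]

skew_raw = skewness(gross_sales)
skew_log = skewness(log_sales)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

A skewness statistic is only a rough check; a histogram or Q-Q plot of the transformed values is the usual companion.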

After the transformation the response is approximately normally distributed, so we can fit linear regression models.

FITTING SIMPLE LINEAR REGRESSION

1. lm(formula = log(gross_sales) ~ visits, data = sales)

Coefficients:
              Estimate    Std. Error   t value   Pr(>|t|)
(Intercept)   6.435e+00   2.510e-02    256.33    <2e-16 ***
visits        2.109e-04   7.122e-06     29.61    <2e-16 ***
Multiple R-squared: 0.07096

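The predictor visits is significant (p < 2e-16), but on its own it explains only about 7% of the variance in log(gross_sales) (Multiple R-squared = 0.071).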
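Simple linear regression on a log response has a closed-form least-squares solution, and the slope reads as a multiplicative effect on gross_sales. The sketch below is a plain-Python version of what `lm(log(gross_sales) ~ visits)` estimates (not R's implementation), followed by the interpretation of the reported visits coefficient.

```python
import math


def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for y = b0 + b1*x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return b0, b1, 1 - ss_res / ss_tot


# On the log scale, each extra visit adds the slope to log(gross_sales),
# so 1,000 extra visits multiply expected gross sales by exp(1000 * slope).
slope_visits = 2.109e-4  # estimate reported in the table above
multiplier_per_1000 = math.exp(1000 * slope_visits)
print(f"x{multiplier_per_1000:.3f} gross sales per 1,000 extra visits")
```

With the reported slope this works out to roughly a 23.5% increase per 1,000 additional visits, holding nothing else constant because visits is the only predictor.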


2. lm(formula = log(gross_sales) ~ platform, data = sales)

Coefficients:
                       Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)             3.31492      0.15614    21.231    < 2e-16 ***
platformAndroid         3.14734      0.16345    19.255    < 2e-16 ***
platformBlackBerry     -0.05144      0.18442    -0.279    0.780321
platformChromeOS        2.15667      0.18148    11.884    < 2e-16 ***
platformiOS             5.21251      0.16255    32.068    < 2e-16 ***
platformiPad            4.18013      0.20332    20.560    < 2e-16 ***
platformiPhone          3.94906      0.19845    19.900    < 2e-16 ***
platformLinux           1.86229      0.17273    10.782    < 2e-16 ***
platformMacintosh       4.47188      0.22052    20.278    < 2e-16 ***
platformMacOSX          3.97496      0.16676    23.837    < 2e-16 ***
platformOther           3.10501      0.22714    13.670    < 2e-16 ***
platformUnknown         0.16862      0.17547     0.961    0.336587
platformWindows         4.37739      0.16533    26.477    < 2e-16 ***
platformWindowsPhone    0.61178      0.18497     3.308    0.000944 ***
Multiple R-squared: 0.3744

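All platforms except BlackBerry and Unknown differ significantly (p < 0.001) from the baseline platform absorbed into the intercept, so platform has a significant effect on log sales. Platform alone explains about 37% of the variance in log(gross_sales) (Multiple R-squared = 0.374).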
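R expands a factor such as platform with treatment (dummy) coding by default: one level becomes the baseline in the intercept, and each other coefficient is that level's difference from the baseline. With a single categorical predictor, the least-squares estimates reduce to group means of log(gross_sales), which the sketch below computes on hypothetical values. Python's `sorted()` uses code-point order (so "iOS" sorts after "Windows"), which may differ from R's locale-dependent level order; the baseline chosen here is only illustrative, since the report's table does not name its baseline.

```python
from collections import defaultdict


def treatment_contrasts(groups, y):
    """Mimic treatment coding for one categorical predictor.

    Returns (baseline, coefs): the intercept is the baseline group's mean of y,
    and every other level's coefficient is its group mean minus the baseline mean.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for g, value in zip(groups, y):
        sums[g] += value
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    levels = sorted(means)  # code-point order; R's ordering may differ
    baseline = levels[0]
    coefs = {"(Intercept)": means[baseline]}
    for level in levels[1:]:
        coefs[level] = means[level] - means[baseline]
    return baseline, coefs


# Hypothetical log(gross_sales) values by platform, illustrative only.
platforms = ["Android", "Android", "Windows", "Windows", "iOS", "iOS"]
log_sales = [6.0, 6.4, 7.5, 7.9, 5.0, 5.2]
baseline, coefs = treatment_contrasts(platforms, log_sales)
print(baseline, coefs)
```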

3. lm(formula = log(gross_sales) ~ site, data = sales)

Coefficients:
               Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)     7.20121      0.03615   199.228     <2e-16 ***
siteBotly      -0.09904      0.11445    -0.865      0.387
sitePinnacle   -1.47050      0.06326   -23.245     <2e-16 ***
siteSortly     -1.93302      0.06044   -31.983     <2e-16 ***
siteTabular     1.55513      0.11445    13.588     <2e-16 ***
siteWidgetry    1.49026      0.11445    13.021     <2e-16 ***
Multiple R-squared: 0.1546

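Relative to Acme (the baseline level absorbed into the intercept), Pinnacle and Sortly have significantly lower log sales, while Tabular and Widgetry have significantly higher log sales; only Botly does not differ significantly from Acme (p = 0.387). Site alone explains about 15% of the variance in log(gross_sales) (Multiple R-squared = 0.155).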
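Because the response is log(gross_sales), exponentiating a site coefficient gives the ratio of that site's geometric-mean gross sales to Acme's (exact here, since site is the only predictor). The sketch below applies this to the estimates and p-values in the table above and flags significance at an assumed 5% level; "<2e-16" is stored as 2e-16, an upper bound.

```python
import math

# Estimates and p-values transcribed from lm(log(gross_sales) ~ site); baseline: Acme.
# "<2e-16" is stored as 2e-16 (an upper bound on the p-value).
site_effects = {
    "Botly": (-0.09904, 0.387),
    "Pinnacle": (-1.47050, 2e-16),
    "Sortly": (-1.93302, 2e-16),
    "Tabular": (1.55513, 2e-16),
    "Widgetry": (1.49026, 2e-16),
}
ALPHA = 0.05  # assumed significance level


def describe(effects, alpha=ALPHA):
    """Return {site: (ratio vs Acme, percent difference, significant?)}."""
    rows = {}
    for site, (estimate, p_value) in effects.items():
        ratio = math.exp(estimate)  # ratio of geometric-mean sales vs Acme
        rows[site] = (ratio, (ratio - 1) * 100, p_value < alpha)
    return rows


summary = describe(site_effects)
for site, (ratio, pct, significant) in summary.items():
    flag = "significant" if significant else "not significant"
    print(f"{site:9s} x{ratio:.3f} ({pct:+.1f}% vs Acme), {flag}")

not_significant = sorted(s for s, (_, _, sig) in summary.items() if not sig)
```

Read this way, a typical Sortly observation has roughly 86% lower gross sales than a typical Acme observation, while Tabular's is several times higher; these are per-observation comparisons, not totals.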


4. lm(formula = log(gross_sales) ~ new_customer, data = sales)

Coefficients:
               Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)     7.15069      0.03493    204.69     <2e-16 ***
new_customer   -1.02895      0.05023    -20.48     <2e-16 ***
Multiple R-squared: 0.03594

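New customers are associated with significantly lower log sales than old customers (estimate -1.029, p < 2e-16), although customer type on its own explains only about 3.6% of the variance in log(gross_sales) (Multiple R-squared = 0.036).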
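Since new_customer is a 0/1 indicator and the response is logged, its coefficient converts to a ratio of geometric means (GM) of gross sales. A worked version using the estimate above, where the last step holds exactly for the fitted values because new_customer is the only predictor:

```latex
\begin{aligned}
\log(\text{gross\_sales}) &= \beta_0 + \beta_1\,\text{new\_customer} + \varepsilon \\
\hat\beta_1 &= \overline{\log y}_{\,\text{new}=1} - \overline{\log y}_{\,\text{new}=0} = -1.02895 \\
\frac{\mathrm{GM}(y \mid \text{new}=1)}{\mathrm{GM}(y \mid \text{new}=0)} &= e^{\hat\beta_1} = e^{-1.02895} \approx 0.357
\end{aligned}
```

So a typical new customer's gross sales are about 36% of a typical old customer's, i.e. roughly 64% lower.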

5.